Creating separate rollup tables

Hi,

In my scenario I would like to keep the original data queryable and also create separate rollup tables that have a coarser time granularity or drop certain dimensions. An example is to have data aggregated hourly, daily, and monthly. I understand Druid can aggregate over hour/day/month from the original data, but I would like to make these queries even faster and save compute cycles, hence the separate rollup tables.

So far the only way I could find to achieve this is to create a separate ingestion spec for each rollup table. However, this seems like a waste of resources, because each rollup will be calculated from the original data. I could instead calculate the daily rollup from the hourly rollup, and the monthly from the daily, rather than from the original data. It feels like I should be able to “ingest” one rollup table and specify a coarser time granularity to create a new table :). Is there a recommended way of doing this?

Thanks! James

Hi James,
You can achieve this by reindexing from existing segments.

See http://druid.io/docs/latest/ingestion/update-existing-data.html

Wouldn’t reindexing overwrite existing segments? I actually want 4 tables in the end: wikipedia, wikipedia_1h, wikipedia_1d, wikipedia_1mo. Please let me know if I missed something. - Thanks! James

You should be able to read from one datasource and write to a different datasource, e.g.:

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia_1d",
      "parser" : {....},
      "metricsSpec" : [....],
      "granularitySpec" : ....
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "dataSource",
        "ingestionSpec" : {"dataSource": "wikipedia_1h",...}
      }
    },
    "tuningConfig" : ....
  }
}
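
To chain the rollups (hourly from the original data, daily from hourly, monthly from daily), a small script could generate one such task per step. This is only a sketch under assumptions: the helper `reindex_spec`, the interval, the granularity values, and the `count` metric are hypothetical choices for illustration, the parser is left as a placeholder for you to fill in, and tuningConfig is omitted. It assumes each source table already has a `count` metric, which is re-aggregated with longSum so that row counts stay correct across rollups.

```python
import json

def reindex_spec(source, target, query_granularity, segment_granularity,
                 intervals, parser, metrics_spec):
    """Build an index_hadoop task dict that reads `source` and writes `target`.

    Hypothetical helper: it only assembles JSON in the shape shown above.
    """
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": target,
                "parser": parser,
                "metricsSpec": metrics_spec,
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": query_granularity,
                    "intervals": intervals,
                },
            },
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {
                    "type": "dataSource",
                    # Read the existing segments of the finer-grained table.
                    "ingestionSpec": {"dataSource": source, "intervals": intervals},
                },
            },
        },
    }

# Placeholder values (assumptions, not taken from a real cluster).
PARSER = {}                             # fill in your own parser spec here
INTERVALS = ["2016-01-01/2016-02-01"]   # hypothetical interval
METRICS = [
    # Sum the existing "count" column instead of counting rolled-up rows.
    {"type": "longSum", "name": "count", "fieldName": "count"},
]

# (source, target, queryGranularity, segmentGranularity); each step reads
# the table produced by the previous one.
CHAIN = [
    ("wikipedia",    "wikipedia_1h",  "HOUR",  "DAY"),
    ("wikipedia_1h", "wikipedia_1d",  "DAY",   "MONTH"),
    ("wikipedia_1d", "wikipedia_1mo", "MONTH", "MONTH"),
]

specs = [
    reindex_spec(src, dst, query_gran, segment_gran, INTERVALS, PARSER, METRICS)
    for src, dst, query_gran, segment_gran in CHAIN
]

if __name__ == "__main__":
    # Print the daily-from-hourly task as JSON for inspection.
    print(json.dumps(specs[1], indent=2))
```

Each task would have to finish before the next one is submitted, since every step reads the segments written by the previous one.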