Unclear exception during ingestion (indexing) with cardinality aggregation

Hi guys,
I'm trying to ingest some data with a cardinality aggregation. Here's my config file:
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "impression_cardinality",
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "cardinality",
          "name": "uniquesClient",
          "fieldNames": ["client", "viewer_id"],
          "byRow": true
        }
      ],
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "day",
        "intervals": [
          "2015-11-01/2015-11-02"
        ]
      }
    }
  }
}
and I get a rather unclear exception:
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:138) ~[druid-indexing-service-0.8.2.jar:0.8.2]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:206) ~[druid-indexing-service-0.8.2.jar:0.8.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:221) [druid-indexing-service-0.8.2.jar:0.8.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:200) [druid-indexing-service-0.8.2.jar:0.8.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_45]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:135) ~[druid-indexing-service-0.8.2.jar:0.8.2]
... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.LegacyIndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:202) ~[druid-indexing-hadoop-0.8.2.jar:0.8.2]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:96) ~[druid-indexing-hadoop-0.8.2.jar:0.8.2]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:259) ~[druid-indexing-service-0.8.2.jar:0.8.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_45]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:135) ~[druid-indexing-service-0.8.2.jar:0.8.2]
... 7 more
[INFO ] 2015-12-11 11:39:50.341 [task-runner-0] ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_impression_cardinality_2015-12-11T11:35:52.373Z",
  "status" : "FAILED",
  "duration" : 230714
}

If I run the same ingestion without the cardinality aggregation, it works fine:
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "impression_cardinality",
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "day",
        "intervals": [
          "2015-11-01/2015-11-02"
        ]
      }
    }
  }
}

Could you please help me figure out what's wrong with the cardinality aggregation?

The cardinality aggregator can only be used at query time. Use the 'hyperUnique' aggregator at ingestion time. At some point, we should consolidate these aggregators…

Could you please provide a short example?

In the documentation I can only find "cardinality":

{
  "type": "cardinality",
  "name": "<output_name>",
  "fieldNames": [ <dimension1>, <dimension2>, ... ],
  "byRow": <false | true> # (optional, defaults to false)
}

which you say is for query time only,

and "hyperUnique":

{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> }

but it does not let me select a set of fields. Does it calculate hyperUnique on only one field? What if I want it for the (country, device) pair, for instance?

Thanks, Vadim

HyperUnique is an aggregator that needs to be used at ingestion time to build a hyperUnique column. Then at query time you can use the same aggregator on that column to get your results.
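
For example, here's a sketch of what the ingestion-time metric could look like (note that hyperUnique takes a single fieldName; I'm using "viewer_id" from your spec just as an illustration):

"metricsSpec": [
  {
    "type": "count",
    "name": "count"
  },
  {
    "type": "hyperUnique",
    "name": "uniquesClient",
    "fieldName": "viewer_id"
  }
]

Then at query time you use the same aggregator against the column it built, e.g. in a timeseries query:

{
  "queryType": "timeseries",
  "dataSource": "impression_cardinality",
  "granularity": "day",
  "intervals": ["2015-11-01/2015-11-02"],
  "aggregations": [
    {
      "type": "hyperUnique",
      "name": "uniquesClient",
      "fieldName": "uniquesClient"
    }
  ]
}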

Honestly, it might be easier for you to just use https://github.com/implydata/plyql and issue SQL queries. There is a verbose mode that shows the Druid queries the SQL is being translated to.
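
For example, something like this (assuming your Broker is at localhost:8082; see the plyql README for the exact flags):

plyql -h localhost:8082 -q "SELECT COUNT(DISTINCT viewer_id) FROM impression_cardinality" --verbose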