All metric values are 0 after running an index_hadoop task

Hi,

I am trying to run an index_hadoop task on one of my datasources for reindexing. The job succeeds and the segment is created, but when I query the result with a select query, all metric values are 0. Could you please help me with this? Below are the details I used to run the task.

index_hadoop task:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "index_hadoop_test20",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "format": "auto",
            "column": "timestamp"
          },
          "columns": ["timestamp", "dim1", "metric1_sum"],
          "dimensionsSpec": {
            "dimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "name": "metric1_sum",
          "type": "doubleSum",
          "fieldName": "metric1"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2017-03-01T02:00:00/2017-03-02T02:00:00"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "arpan1",
          "intervals": ["2017-03-01T02:00:00/2017-03-02T02:00:00"]
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {}
    }
  }
}

Select query:

{
  "queryType": "select",
  "dataSource": "index_hadoop_test20",
  "descending": "true",
  "dimensions": [],
  "metrics": [],
  "granularity": "all",
  "intervals": ["2017-01-01/2017-03-31"],
  "pagingSpec": { "pagingIdentifiers": {}, "threshold": 300 }
}

Result:

{
  "timestamp": "2017-03-01T07:00:00.000Z",
  "result": {
    "pagingIdentifiers": {
      "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z": -11
    },
    "dimensions": ["dim4", "dim3", "dim2", "dim1"],
    "metrics": ["metric1_sum"],
    "events": [
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -1,
        "event": { "timestamp": "2017-03-01T09:45:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -2,
        "event": { "timestamp": "2017-03-01T09:30:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -3,
        "event": { "timestamp": "2017-03-01T09:15:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -4,
        "event": { "timestamp": "2017-03-01T09:00:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -5,
        "event": { "timestamp": "2017-03-01T08:45:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -6,
        "event": { "timestamp": "2017-03-01T08:30:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -7,
        "event": { "timestamp": "2017-03-01T08:00:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -8,
        "event": { "timestamp": "2017-03-01T07:45:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -9,
        "event": { "timestamp": "2017-03-01T07:30:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -10,
        "event": { "timestamp": "2017-03-01T07:15:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      },
      {
        "segmentId": "index_hadoop_test20_2017-03-01T00:00:00.000Z_2017-03-02T00:00:00.000Z_2017-03-03T07:04:25.068Z",
        "offset": -11,
        "event": { "timestamp": "2017-03-01T07:00:00.000Z", "dim4": "dim4", "dim3": "dim3", "dim2": "dim2", "dim1": "dim1", "metric1_sum": 0 }
      }
    ]
  }
}

Please suggest a solution for this.

Hey Santosh,

What version of Druid is this and what did the ingestion spec for the original load (not reindexing) look like?

Hi Gian,

I have tested it on druid-0.9.2, druid-0.10.0-SNAPSHOT, and druid-0.10.0-rc1. Below is the original ingestion spec I used to load the data.

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "arpan1",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "ddMMyyyyHHmmss"
        },
        "dimensionsSpec": {
          "dimensions": []
        }
      }
    },
    "metricsSpec": [
      {
        "name": "metric1",
        "fieldName": "metric1",
        "type": "doubleSum"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE",
      "rollup": false
    }
  },
  "ioConfig": {
    "topic": "testdemo",
    "consumerProperties": {
      "bootstrap.servers": "localhost:9092"
    }
  }
}

Thanks,

Santosh

Does it work if you add metric1 to the dataSource ingestionSpec metrics list when you do reindexing? Like this:

"ioConfig": {
  "type": "hadoop",
  "inputSpec": {
    "type": "dataSource",
    "ingestionSpec": {
      "dataSource": "arpan1",
      "intervals": ["2017-03-01T02:00:00/2017-03-02T02:00:00"],
      "metrics": ["metric1"]
    }
  }
}

Hi Gian,

I made the modification to the ingestionSpec as you suggested, but got the same result.

Please have a look again.

Thanks in advance.

Ah, what's going on is that the Hadoop reindexing mechanism is being sneaky. It doesn't apply your metricsSpec to the segments as-is; it applies the aggregators in their "combining" form. This is convenient in that it lets you use the same metricsSpec while reindexing as you would on your raw data, but it also means you can't use the metricsSpec to define new aggregators. Supporting that would be a useful new feature, and the docs could also use some clarification on this point.
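In concrete terms (assuming the combining form of a doubleSum reads its input from the column named after the aggregator itself rather than from fieldName), an aggregator named "metric1_sum" would look for a "metric1_sum" column in the source segments, find nothing, and sum to 0. A sketch of a reindex metricsSpec that avoids that under this assumption, by keeping the aggregator name equal to the metric column that already exists in arpan1:

"metricsSpec": [
  {
    "name": "metric1",
    "type": "doubleSum",
    "fieldName": "metric1"
  }
]

The reindexed datasource would then expose the metric as "metric1" rather than "metric1_sum"; if you need the other name, you would have to rename it at query time instead.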

Hi Gian, I was wondering how you managed to solve this issue? I am facing the same problem right now: https://groups.google.com/d/msg/druid-user/4I9lUreb60k/JdWIyE1xAAAJ