Druid IO - Values not getting aggregated in Batch Ingestion

Hello

I am trying to perform 4-5 batch ingestions, one after the other, on the same datasource and index, using multiple input JSON data files that share the same schema-index-json file.

My schema-index-json file looks like this:

{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "…/…/order-data.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "order-data-ds",
      "granularity" : "all",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "year",
        "queryGranularity" : "none"
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [],
            "dimensionExclusions" : [
              "timestamp",
              "isGood",
              "isBad"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "timestamp"
          }
        }
      },
      "aggregations" : [
        { "type" : "count", "name" : "TotalIngestedRows" },
        { "type" : "doubleSum", "name" : "CountOfGood", "fieldName" : "isGood" },
        { "type" : "doubleSum", "name" : "CountOfBad", "fieldName" : "isBad" }
      ],
      "metricsSpec" : [
        { "type" : "count", "name" : "TotalIngestedRows" },
        { "type" : "doubleSum", "name" : "CountOfGood", "fieldName" : "isGood" },
        { "type" : "doubleSum", "name" : "CountOfBad", "fieldName" : "isBad" }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}

The problem I am facing is that, after ingesting all the files, my total record count (i.e. TotalIngestedRows) is not equal to the sum of the records in the 4 ingested files.

Is there a problem with the aggregators, or should I use post-aggregators? What should I change to get

TotalIngestedRows = sum of total records in the 4 ingested files

Please help!

Hello,

I think you’ll need a longSum aggregator at query time to get the total count of ingested events (i.e., sum the “TotalIngestedRows” count metric across all timestamp-dimension tuples).

More info at:

http://druid.io/docs/latest/querying/aggregations.html
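
As a rough sketch of what such a query could look like (an illustration, not something taken from the setup above): a native timeseries query that sums the ingested “TotalIngestedRows” metric with longSum. The interval is a placeholder assumption and should be replaced with one that covers the ingested data, and the output name “StoredRows” is made up for the comparison. A count aggregator at query time returns the number of Druid rows stored after rollup, so it can be lower than the number of ingested events, while the longSum over the ingestion-time count should give the number of events that went in.

```json
{
  "queryType" : "timeseries",
  "dataSource" : "order-data-ds",
  "granularity" : "all",
  "intervals" : [ "2000-01-01/2030-01-01" ],
  "aggregations" : [
    { "type" : "count", "name" : "StoredRows" },
    { "type" : "longSum", "name" : "TotalIngestedRows", "fieldName" : "TotalIngestedRows" },
    { "type" : "doubleSum", "name" : "CountOfGood", "fieldName" : "CountOfGood" },
    { "type" : "doubleSum", "name" : "CountOfBad", "fieldName" : "CountOfBad" }
  ]
}
```

Note that the fieldName values here refer to the metric names defined in metricsSpec (the stored columns), not to the raw input fields isGood and isBad.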

Thanks,

Jon