Hi All,
I want to understand which factors and parameters Druid uses to decide whether to roll up two entries. I assume Druid computes something like a hash to determine whether two rows are equal and then rolls them up accordingly. If so, how is that hash calculated?
I am sharing my use case with an example below:
I am doing a batch ingestion into Druid (versions druid-0.9.0 and 0.9.1.1):
Here is my batch ingestion spec:
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "MyEventListFile.json"
      }
    },
    "dataSchema": {
      "dataSource": "<<DATASOURCE_NAME>>",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "millis"
          },
          "dimensionsSpec": {
            "dimensions": ["A", "B", "C", "D"],
            "dimensionExclusions": []
          }
        }
      },
      "metricsSpec": [
        { "type": "longMax", "name": "eventCount", "fieldName": "count" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "none",
        "intervals": ["2019-07-18/2019-12-01"]
      }
    },
    "tuningConfig": {
      "type": "hadoop"
    }
  }
}
{"timestamp":"1572557460000","A":"0","B":"Pravesh300000","C":"30000","D":"praveshgmail.com","NOTADIMENSION":"test1","count":1}
{"timestamp":"1572557460000","A":"0","B":"Pravesh300000","C":"30000","D":"praveshgmail.com","NOTADIMENSION":"test2","count":1}
The two events above have the same timestamp and the same values for all dimensions (A, B, C, D), but they differ in a column that I cannot declare as a dimension (NOTADIMENSION).
After ingestion I get eventCount = 1 for these events, but I want it to be 2.
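To make the behaviour concrete, here is a minimal Python sketch of how I currently picture rollup. It assumes that rows are grouped by the timestamp (left untruncated, since my queryGranularity is "none") plus the declared dimension values, and that the metric aggregator is then applied within each group. The `rollup` function and the `DIMENSIONS` constant are my own illustration, not Druid's actual implementation.

```python
from collections import defaultdict

# Assumption: this grouping logic is only my mental model of rollup,
# not Druid's real code. Names here are hypothetical.
DIMENSIONS = ["A", "B", "C", "D"]


def rollup(rows, dimensions, aggregate):
    """Group rows by (timestamp, declared dimension values) and aggregate 'count'."""
    groups = defaultdict(list)
    for row in rows:
        # queryGranularity "none" keeps millisecond precision, so the
        # timestamp is used as-is; NOTADIMENSION never enters the key.
        key = (int(row["timestamp"]),) + tuple(row[d] for d in dimensions)
        groups[key].append(row["count"])
    return {key: aggregate(values) for key, values in groups.items()}


rows = [
    {"timestamp": "1572557460000", "A": "0", "B": "Pravesh300000", "C": "30000",
     "D": "praveshgmail.com", "NOTADIMENSION": "test1", "count": 1},
    {"timestamp": "1572557460000", "A": "0", "B": "Pravesh300000", "C": "30000",
     "D": "praveshgmail.com", "NOTADIMENSION": "test2", "count": 1},
]

as_long_max = rollup(rows, DIMENSIONS, max)  # mirrors the longMax metric in my spec
as_long_sum = rollup(rows, DIMENSIONS, sum)  # what a summing metric would produce

print(list(as_long_max.values()))  # [1]
print(list(as_long_sum.values()))  # [2]
```

If this model is right, both events collapse into one row, and taking the maximum of two counts of 1 gives 1, whereas summing them would give 2.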
What is the solution here? Is there anything I can specify explicitly to tell Druid how to roll up, i.e. which columns and which timestamp it should use when calculating the hash?
I hope to hear back soon, as I am blocked on this.
Thanks,
Pravesh Gupta