Losing segments in delta ingestion

We have a specific use case: every week new events arrive, and we want all of the events to end up in the same datasource. After reading the documentation, I was planning to use a delta ingestion task.

My task:

{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "multi",
        "children": [
          {
            "type": "dataSource",
            "ingestionSpec": {
              "dataSource": "test",
              "intervals": ["2018-01-01/1W"]
            }
          },
          {
            "type": "static",
            "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
            "paths": "test/test"
          }
        ]
      }
    },
    "dataSchema": {
      "dataSource": "test",
      "parser": {
        "type": "parquet",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "ts",
            "format": "posix"
          },
          "dimensionsSpec": {
            "dimensions": [
              "user_id",
              "artist_id"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "WEEK",
        "queryGranularity": "WEEK",
        "intervals": ["2018-01-01/2019-01-01"]
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 20000000
      },
      "jobProperties": {
        "io.compression.codecs": "org.apache.hadoop.io.compress.SnappyCodec"
      },
      "leaveIntermediate": true,
      "indexSpec": {
        "bitmap": {
          "type": "concise"
        }
      }
    }
  }
}

The first delta ingestion works perfectly: the datasource test ends up with the segments for the first and second weeks. However, when I index the third week, I lose the segment for the first week. What am I doing wrong?

Do you mean "delta ingestion" worked fine after the 2nd week (where the 1st week's data was already stored in Druid and the 2nd week's data was new), and then on the 3rd week "delta ingestion" did not work? It's unlikely for it to work once and then not again.

Check your task logs to see whether it actually read from both places, or feel free to share your task log for us to look at.

– Himanshu

I fixed my problem: I hadn't correctly specified the segments I wanted to keep from the existing datasource (the intervals in the dataSource part of the inputSpec).
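For reference, here is a sketch of what the corrected ioConfig could look like when indexing the third week (the dates and path are only illustrative): the intervals of the dataSource child have to cover all of the existing weeks you want to keep, while the static child points at the new week's files.

"ioConfig": {
  "type": "hadoop",
  "inputSpec": {
    "type": "multi",
    "children": [
      {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "test",
          "intervals": ["2018-01-01/2018-01-15"]
        }
      },
      {
        "type": "static",
        "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
        "paths": "test/week3"
      }
    ]
  }
}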
I have another question: is there any way to enable topN queries through Hive with Druid? I saw an open JIRA about accepting approximate results, but I was hoping for a workaround in the meantime.

Thanks a lot,

Eloïse