[druid-user] Native Batch(index_parallel) task is always waiting

Hi,

We are using a Kafka ingestion task to index real-time data. A lot of late data arrives, so lots of small segments are created.

I'm trying to start an index_parallel job to reindex the datasource. I assume there will be time-chunk locks held by the Kafka task due to the late-arriving data, so I turned off forceTimeChunkLock for both the Kafka ingestion task and the native batch task.

However, when I submitted the native batch task, its status code was always "WAITING" until it changed to "FAILED", and there are no logs for the task. I have no clue what the reason might be.

The reindex task is as follows:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "metric_duration_result",
      "timestampSpec": {
        "column": "timestamp",
        "format": "millis"
      },
      "dimensionsSpec": {
        "dimensions": [],
        "dimensionExclusions": [
          "duration",
          "durationSum",
          "durationSketch",
          "count",
          "timestamp"
        ]
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "longSum",
          "name": "durationSum",
          "fieldName": "duration",
          "expression": null
        },
        {
          "type": "quantilesDoublesSketch",
          "name": "durationSketch",
          "fieldName": "duration",
          "k": 4096
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true,
        "intervals": ["2020-12-20T21:00:00Z/P1D"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "metric_duration_result",
        "interval": "2020-12-20T21:00:00Z/P1D"
      },
      "inputFormat": {
        "type": "json"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 2
    }
  },
  "context": {
    "forceTimeChunkLock": false
  }
}
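For reference, this spec was submitted to the Overlord's task endpoint. A sketch of the submission (OVERLORD_HOST:8090 is a placeholder for our Overlord address, and reindex-task.json is the spec above saved to a file):

```shell
# Submit the spec to the Overlord; it returns the assigned task id as JSON.
# OVERLORD_HOST:8090 is a placeholder -- substitute your Overlord address.
curl -X POST -H 'Content-Type: application/json' \
  -d @reindex-task.json \
  http://OVERLORD_HOST:8090/druid/indexer/v1/task
```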

Thanks.

Out of curiosity, why are you doing a re-index rather than a compaction process?

https://druid.apache.org/docs/latest/ingestion/data-management.html#compaction-and-reindexing
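For comparison, a minimal manual compaction task over the same interval might look like the sketch below (datasource and interval taken from this thread; this is only an illustration, not a tuned spec):

```json
{
  "type": "compact",
  "dataSource": "metric_duration_result",
  "interval": "2020-12-20T21:00:00Z/P1D"
}
```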

Are there any errors in any of the log files?

We enabled compaction before and found that the compaction results were far from perfect. Segment counts were high compared to Hadoop reindex tasks (~20k segments compared to ~300), and I am not certain whether that is by design (no guaranteed perfect rollup).

The waiting parallel task has no logs. Checked from the unified console, the status of the task is:

{
  "id": "index_parallel_metric_duration_result_loaecpih_2020-12-24T08:06:05.480Z",
  "groupId": "index_parallel_metric_duration_result_loaecpih_2020-12-24T08:06:05.480Z",
  "type": "index_parallel",
  "createdTime": "2020-12-24T08:06:05.481Z",
  "queueInsertionTime": "1970-01-01T00:00:00.000Z",
  "statusCode": "RUNNING",
  "status": "RUNNING",
  "runnerStatusCode": "WAITING",
  "duration": -1,
  "location": {
    "host": null,
    "port": -1,
    "tlsPort": -1
  },
  "dataSource": "metric_duration_result",
  "errorMsg": null
}

and its "Logs" tab says:

Request failed with status code 404

Rachel Pedreschi <rachel.pedreschi@imply.io> wrote on Thursday, December 24, 2020, at 6:43 AM:

No errors in the coordinator, overlord, or historical logs? Can you pull a live report? https://druid.apache.org/docs/latest/ingestion/tasks.html#live-report
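A sketch of pulling the live report and the task log through the Overlord's task API (OVERLORD_HOST:8090 is a placeholder; the task id is the one from the status above). Note that a 404 on the log endpoint usually just means the task has not yet started on any worker:

```shell
# Placeholders: substitute your Overlord address.
TASK_ID='index_parallel_metric_duration_result_loaecpih_2020-12-24T08:06:05.480Z'

# Live report -- only available while the task is actually running:
curl "http://OVERLORD_HOST:8090/druid/indexer/v1/task/$TASK_ID/reports"

# Task log -- returns 404 until the task has been assigned and started:
curl "http://OVERLORD_HOST:8090/druid/indexer/v1/task/$TASK_ID/log"
```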