Error when batch ingesting from a different Druid datasource

Hi, maybe somebody could help me, I'm out of ideas.
I get an error when I try to overwrite data in one datasource with data from a different datasource. Here is the task manifest:
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "my-events-v2",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "MINUTE",
        "rollup": false
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "my-events",
        "interval": "2019-12-09/2020-07-15"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 5000000,
      "maxRowsInMemory": 25000
    }
  }
}
Both datasources have exactly the same dimensions, metrics, and segment granularity.

The error that I get is:
{
  "id": "index_parallel_my-events-v2_elkccoja_2020-08-18T15:34:44.028Z",
  "groupId": "index_parallel_my-events-v2_elkccoja_2020-08-18T15:34:44.028Z",
  "type": "index_parallel",
  "createdTime": "2020-08-18T15:34:44.029Z",
  "queueInsertionTime": "1970-01-01T00:00:00.000Z",
  "statusCode": "FAILED",
  "status": "FAILED",
  "runnerStatusCode": "WAITING",
  "duration": 5514,
  "location": {
    "host": "localhost",
    "port": 8103,
    "tlsPort": -1
  },
  "dataSource": "my-events-v2",
  "errorMsg": "java.lang.NullPointerException: inputRowParser\n\tat com.google.common.base.Preconditions.checkNotNull…"
}

And this is in the report:
{
  "ingestionState": "DETERMINE_PARTITIONS",
  "unparseableEvents": {},
  "rowStats": {
    "determinePartitions": {
      "processed": 0,
      "processedWithError": 0,
      "thrownAway": 0,
      "unparseable": 0
    },
    "buildSegments": {
      "processed": 0,
      "processedWithError": 0,
      "thrownAway": 0,
      "unparseable": 0
    }
  },
  "errorMsg": "java.lang.NullPointerException: inputRowParser\n\tat com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)\n\tat org.apache.druid.segment.indexing.DataSchema.getTimestampSpec(DataSchema.java:200)\n\tat org.apache.druid.indexing.common.task.IndexTask.collectIntervalsAndShardSpecs(IndexTask.java:749)\n\tat org.apache.druid.indexing.common.task.IndexTask.createShardSpecsFromInput(IndexTask.java:678)\n\tat org.apache.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:641)\n\tat org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:497)\n\tat org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:124)\n\tat org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:826)\n\tat org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:454)\n\tat org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:124)\n\tat org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421)\n\tat org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n"
}

Thank you

Hi Catalin,

I have never used batch ingestion, so this is just a guess: your data might have some null values which it is not able to ingest. Try transforming them to a default value and then retry.

Regards,
Poonam

Hi Poonam,

It doesn't look to be related to the data itself but rather to how it's organized. If I add intervals to the granularitySpec ("intervals": ["2019-12-09/2020-07-15"]), the error changes from:

"ERROR o.a.d.i.c.t.IndexTask [task-runner-0-priority-0] Encountered exception in DETERMINE_PARTITIONS.
java.lang.NullPointerException: inputRowParser"

to

"ERROR o.a.d.i.c.t.IndexTask [task-runner-0-priority-0] Encountered exception in BUILD_SEGMENTS.
java.lang.NullPointerException: inputRowParser"

So this makes me think it is unable to parse the structure of the dataset.
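
For reference, this is roughly what the modified granularitySpec looked like (same values as in the original task, only the intervals line added):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "MINUTE",
  "rollup": false,
  "intervals": ["2019-12-09/2020-07-15"]
}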

Yes, so it's failing on null values while parsing the data. Add a transformSpec to handle the null values.
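
Maybe something like this inside the dataSchema would work (just a sketch; "someDimension" and the 'unknown' default are placeholders, adjust them to your own columns):

"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "someDimension",
      "expression": "nvl(someDimension, 'unknown')"
    }
  ]
}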

Regards,
Poonam

I found it: I had to include in the JSON task definition all the characteristics of my destination dataset: timestampSpec, dimensionsSpec, and metricsSpec. I thought it would just reuse the existing specs, but apparently you have to specify them.
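
For anyone hitting the same issue, the dataSchema needs to look roughly like this (the dimension and metric names below are only placeholders, use whatever your destination datasource actually has; the __time/millis timestampSpec is what the rows read from a Druid inputSource carry):

"dataSchema": {
  "dataSource": "my-events-v2",
  "timestampSpec": {
    "column": "__time",
    "format": "millis"
  },
  "dimensionsSpec": {
    "dimensions": ["dimA", "dimB"]
  },
  "metricsSpec": [
    { "type": "count", "name": "count" }
  ],
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "MINUTE",
    "rollup": false
  }
}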