Error ingesting data from local CSV: java.lang.NullPointerException: baseDir

When ingesting from a local CSV file I’m getting this error:

2020-08-12T22:04:29,761 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19.3 10/24/2016 03:43 PM'
2020-08-12T22:04:29,981 WARN [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Chat handler is already registered. Skipping chat handler registration.
2020-08-12T22:04:29,988 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Skipping determine partition scan
2020-08-12T22:04:30,048 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS.
java.lang.NullPointerException: baseDir
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229) ~[guava-16.0.1.jar:?]
at org.apache.druid.segment.realtime.firehose.LocalFirehoseFactory.initObjects(LocalFirehoseFactory.java:86) ~[druid-server-0.18.1.jar:0.18.1]
at org.apache.druid.data.input.impl.AbstractTextFilesFirehoseFactory.initializeObjectsIfNeeded(AbstractTextFilesFirehoseFactory.java:93) ~[druid-core-0.18.1.jar:0.18.1]
at org.apache.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:59) ~[druid-core-0.18.1.jar:0.18.1]
at org.apache.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:49) ~[druid-core-0.18.1.jar:0.18.1]
at org.apache.druid.data.input.impl.FirehoseToInputSourceReaderAdaptor$1.&lt;init&gt;(FirehoseToInputSourceReaderAdaptor.java:55) ~[druid-core-0.18.1.jar:0.18.1]
at org.apache.druid.data.input.impl.FirehoseToInputSourceReaderAdaptor.read(FirehoseToInputSourceReaderAdaptor.java:53) ~[druid-core-0.18.1.jar:0.18.1]
at org.apache.druid.segment.transform.TransformingInputSourceReader.read(TransformingInputSourceReader.java:43) ~[druid-processing-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:122) ~[druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:945) ~[druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:526) [druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:123) [druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:858) [druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:469) [druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:123) [druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.18.1.jar:0.18.1]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.18.1.jar:0.18.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_261]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
2020-08-12T22:04:30,060 WARN [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - handler[index_parallel_test-2921090-iodjephi_2020-08-12T22:04:21.780Z] not currently registered, ignoring.

Here is my spec file:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "test-2921090",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR",
        "intervals": [
          "2020-04-01/2020-08-01"
        ],
        "rollup": true
      },
      "parser": {
        "parseSpec": {
          "format": "csv",
          "columns": [
            "col1",
            "col2",
            "quantity"
          ],
          "timestampSpec": {
            "format": "iso",
            "column": "col1"
          },
          "dimensionsSpec": {
            "dimensions": [
              "col2"
            ]
          },
          "metricsSpec": [
            {
              "type": "doubleSum",
              "name": "quantitySum",
              "fieldName": "quantity"
            }
          ]
        }
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "local",
        "inputSource": {
          "type": "local",
          "files": ["/Users/bretselby/Downloads/export6/sheet1.csv"]
        },
        "inputFormat": {
          "type": "csv",
          "columns": [
            "col1",
            "col2",
            "quantity"
          ]
        }
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 5000000
    }
  }
}

I am running the micro-quickstart script to start Druid 0.18.1 on Mac OSX.

You're missing the baseDir parameter. I don't think local ingestion works the way you expect it to: the local firehose takes a baseDir (the directory to read from) plus a filter (the file name, relative to baseDir), rather than a list of full file paths.

From the docs:

"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "local",
    "baseDir": "examples/indexing/",
    "filter": "wikipedia_data.json"
  },

https://druid.apache.org/docs/latest/ingestion/index.html
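
Applied to your spec, the ioConfig might look something like this (an untested sketch: baseDir and filter are just your original files path split into directory and file name, and the inputSource/inputFormat wrapper is dropped since your dataSchema already defines a CSV parser):

"ioConfig": {
  "type": "index_parallel",
  "firehose": {
    "type": "local",
    "baseDir": "/Users/bretselby/Downloads/export6",
    "filter": "sheet1.csv"
  }
},

Alternatively, you could drop the firehose entirely and put inputSource and inputFormat directly under ioConfig, as in the doc excerpt above.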

Happy ingesting!