Issues with Batch ingestion in Druid Cluster

Hi Team,

We have a Druid cluster with 1 master server, 1 data server, and 1 query server:

- Overlord, Coordinator & ZooKeeper running on the master server

- Historical & MiddleManager running on the data server

- Broker & Router running on the query server

We are facing a challenge with batch ingestion (ioConfig as follows):

"ioConfig" : {
  "type" : "index",
  "firehose" : {
    "type" : "local",
    "baseDir" : "/{filepath}",
    "filter" : "{filename}"
  },
  "appendToExisting" : false
},

We tried submitting the indexing task by:

option 1) placing the data file only on the data server

option 2) placing the data file only on the master server

but both options failed.

We could only load the batch file once the data file was placed on both the master server and the data server.

Could you please advise how to make batch ingestion work with the data file placed only on the data server?
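(Would something like the http firehose help here? A rough sketch of what we imagine, where the URI is only a placeholder for a location the data server can reach:)

"ioConfig" : {
  "type" : "index",
  "firehose" : {
    "type" : "http",
    "uris" : ["http://some-host/path/filename.json"]
  },
  "appendToExisting" : false
}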

There is also another issue: indexing logs created on the data server are not visible from the Overlord unified console.

When looking for the index logs from the Overlord console, it says 'Error: No log was found for this task. The task may not exist, or it may not have begun running yet'.

But we are able to find the indexing logs on the data server. How can these be made available in the Overlord console?
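(Is this controlled by the task log settings in common.runtime.properties? Our understanding, which may be wrong, is that the log directory must be on storage visible to both the Overlord and the MiddleManager, e.g. something like:)

# common.runtime.properties
druid.indexer.logs.type=file
# must point at a directory both the Overlord and the MiddleManager can read
druid.indexer.logs.directory=var/druid/indexing-logs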

Thanks

Soumya

What version of Druid are you using? Druid 0.15.0 includes a data loader that makes batch and streaming ingestion much easier.

https://druid.apache.org/docs/latest/tutorials/tutorial-batch.html

If you are using a previous version, what command are you using to load the file? Also, can you please send the entire ingestion specification you are using?

Eric Graham

Solutions Engineer - Imply

cell: 303-589-4581

email: eric.graham@imply.io

www.imply.io

Hi Eric,

We use 0.15.2

We tried loading the batch file using the data loader (from the Overlord console), and also by submitting the spec to the Overlord from the master server using a curl command.

Spec as follows

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "datasource_name",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "eventID",
              "eventDesc",
              "eventType",
              "location",
              "startDate",
              "endDate",
              "phoneNumber"
            ]
          },
          "timestampSpec" : {
            "column" : "startDate",
            "format" : "MM/dd/yyyy hh:mm:ss a"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/filepath",
        "filter" : "filename.json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}
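For reference, the curl submission was roughly along these lines (the spec file name and host are placeholders; the Overlord's default port is 8090):

curl -X POST -H 'Content-Type: application/json' -d @spec.json http://{master-server}:8090/druid/indexer/v1/task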

Correction: version 0.15.1

Are you seeing any errors in your Overlord or MiddleManager logs? I'm assuming the location of the file and the filename are correct.

Eric Graham

Solutions Engineer - Imply

cell: 303-589-4581

email: eric.graham@imply.io

www.imply.io

Can you attach the runtime.properties of all the given nodes?

I think there is a coordination problem between the nodes.
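In particular, it may be worth checking that every server's common.runtime.properties points at the same ZooKeeper and announces a hostname the other servers can resolve, for example (hostnames below are placeholders):

# common.runtime.properties - should be consistent across all servers
druid.zk.service.host=master-server-host:2181
# each server should announce a name/IP reachable from the other servers
druid.host=this-server-host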

Hi Umar,

Please find the runtime.properties files of the three servers attached.

regards

Sumeet Lalvani

common.runtime_DATA.properties (4.63 KB)

common.runtime_MASTER.properties (4.63 KB)

common.runtime_QUERY.properties (4.63 KB)