Batch Ingestion alternatives other than Hadoop indexing

Hi Team,

We have a 7-node Druid cluster: 4 Historicals, 2 MiddleManagers, and 1 node running the Broker, Overlord, and Coordinator. We are using NFS as our deep storage, since we don't have a dedicated Hadoop cluster.

We will be ingesting around 20 GB of data daily, and we are currently evaluating alternatives for batch ingestion.

We tried the local firehose with the native index task:

"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "local",
    "baseDir": "/var/druid/ingest/lfiles",
    "filter": "druid_events.txt"
  }
},

but the task has been running for more than 8 hours and has still not completed.

Please advise us on the best mechanism for ingesting our data, and let me know if you need any other details.

PS: On a single-node cluster, ingesting files over 700 MB resulted in tasks failing with an OutOfMemory exception.

Thanks,

Sathish

Hey Sathish,

You could try the Hadoop task in local mode, perhaps. If you don't configure a remote Hadoop cluster, the indexing just runs directly on the MiddleManager, single-threaded, and doesn't need any Hadoop infrastructure.
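A rough sketch of what such a spec might look like, reusing your local file path. The dataSource name, timestamp column, intervals, and schema details here are placeholder assumptions you'd replace with your own (an empty dimensions list lets Druid auto-discover string dimensions):

```json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "druid_events",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": [] }
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2016-01-01/2016-01-02"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "/var/druid/ingest/lfiles/druid_events.txt"
      }
    },
    "tuningConfig": {
      "type": "hadoop"
    }
  }
}
```

With no remote Hadoop configuration (no jobProperties pointing at a cluster), the MapReduce job runs in local mode inside the peon process.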

You could also use streaming ingestion, i.e. your data -> Kafka/Tranquility -> Druid. This doesn't require Hadoop, and it will scale out as much as you need.
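If you go the Kafka route, the Kafka indexing service takes a supervisor spec that you POST to the Overlord. A minimal sketch, assuming a hypothetical topic name and broker address (the schema fields mirror the batch example and are placeholders):

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "druid_events",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": { "dimensions": [] }
      }
    },
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "ioConfig": {
    "topic": "druid_events",
    "consumerProperties": { "bootstrap.servers": "kafka01:9092" },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 75000
  }
}
```

Lowering maxRowsInMemory is also one of the knobs to look at for the OutOfMemory failures you saw, since it bounds how many rows are buffered before being persisted to disk.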