Question about Druid data ingestion

Hi Druid Community members,

Currently, I have a server that accumulates data in memory and dumps a batch to a JSON file periodically. The server then runs a bash command to have Druid ingest the JSON file, and once the ingestion is done, it runs another command to remove the file.
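For concreteness, the ingestion step is roughly equivalent to submitting a native batch (index_parallel) task to the Overlord; this is only a rough sketch, and the Overlord address, datasource name, file paths, and schema below are placeholders:

```python
import json
import requests  # assumed available for HTTP calls

# Placeholder spec: datasource, baseDir/filter, and columns are illustrative only.
task_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/tmp/batches", "filter": "batch-*.json"},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "my_datasource",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["some_dimension"]},
        },
    },
}

# Submit the batch ingestion task to the Overlord (default port 8090).
resp = requests.post(
    "http://localhost:8090/druid/indexer/v1/task",
    headers={"Content-Type": "application/json"},
    data=json.dumps(task_spec),
)
resp.raise_for_status()
print(resp.json())  # returns the task id, which I poll before deleting the file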

This seems inefficient, since the server writes the data to local disk first and only then ingests it. Is there a way for me to start a batch process in Druid and send JSON objects directly from my server to Druid, without writing them to a local file first? Thank you!

I looked at streaming ingestion but it doesn’t seem to fit my needs exactly because I don’t need real-time query features.

Best regards,

Huck

The counterintuitive approach to batch would be to get rid of the pipeline that writes to a file and have it write to a stream (Kafka, Kinesis, etc.) instead. But you say that doesn't fit your purpose. What would your ideal pipeline be?
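On the server side, replacing the file dump with a stream write is a small change. A minimal sketch using the kafka-python client (broker address and topic name are assumptions), with a Kafka supervisor configured on the Druid side to consume the topic:

```python
import json
from kafka import KafkaProducer  # kafka-python; assumed as the client library

# Broker address and topic name are placeholders for illustration.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def send_event(event: dict) -> None:
    """Send one JSON event to the topic instead of buffering it for a file dump."""
    producer.send("druid-ingest-topic", value=event)

send_event({"timestamp": "2024-01-01T00:00:00Z", "some_dimension": "value"})
producer.flush()  # make sure buffered events reach the broker
```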

In my team, we write it to S3 instead of the local filesystem and then run the batch ingestion…
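Roughly, that looks like the sketch below: upload the batch to S3 (using boto3 here) and submit a native batch task with an s3 inputSource (requires the druid-s3-extensions). The bucket, key, datasource, and Overlord address are placeholders:

```python
import json
import boto3
import requests

BUCKET = "my-ingest-bucket"        # placeholder bucket name
KEY = "batches/batch-0001.json"    # placeholder object key

# 1. Upload the batch to S3 instead of the local filesystem (newline-delimited JSON).
events = [{"timestamp": "2024-01-01T00:00:00Z", "some_dimension": "value"}]
body = "\n".join(json.dumps(e) for e in events)
boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=body.encode("utf-8"))

# 2. Submit a native batch task that reads straight from S3.
task_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "s3", "uris": [f"s3://{BUCKET}/{KEY}"]},
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "my_datasource",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["some_dimension"]},
        },
    },
}
resp = requests.post("http://localhost:8090/druid/indexer/v1/task", json=task_spec)
resp.raise_for_status()
```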