Data Ingestion to Druid using StreamSets

Hi Team,

We are currently looking for a way to ingest data (batch or stream) into a Druid datasource using StreamSets Data Collector.

We are able to read a Druid datasource from StreamSets using the Avatica driver, but we could not find a direct way to load data into Druid.

Could anyone advise on the feasibility of ingesting into Druid from StreamSets, and on the approach to take?

Thanks

Soumya

Hi Soumya:

Yes, I have tested using StreamSets to load data from a local source or AWS to a Kafka Producer, and then creating a Druid Kafka ingestion task to read the stream data from that Kafka broker. The configuration should be straightforward.

Hope this helps.
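
For illustration, here is a rough Python sketch of what the Kafka ingestion (supervisor) spec on the Druid side might look like. The datasource name, topic, broker address, and column names are placeholders I made up, not values from my test, and the layout assumes a Druid version that supports `inputFormat` in the `ioConfig`, so please check it against the docs for your version.

```python
import json


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers,
                                dimensions, timestamp_column="timestamp"):
    """Return a Druid Kafka supervisor spec as a plain dict.

    All argument values are supplied by the caller; nothing here is
    specific to StreamSets, which only needs to write to the topic.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Column holding the event time (assumed ISO 8601 here).
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                # Assumes the pipeline writes JSON records to Kafka.
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
                "taskCount": 1,
                "replicas": 1,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    spec = build_kafka_supervisor_spec(
        datasource="events",
        topic="events-topic",
        bootstrap_servers="localhost:9092",
        dimensions=["user", "page"],
    )
    print(json.dumps(spec, indent=2))
```

The topic in the spec just has to match the one the Kafka Producer writes to; Druid then consumes from it independently of the pipeline.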

Hi Ming,

Thank you. We explored the Kafka option. Was the Kafka ingestion into Druid also triggered from StreamSets?

Thanks & Regards

Hi Soumya, no, Druid is running outside of StreamSets. I don't follow StreamSets closely, but does it support Druid in its modules now? :)

Hi Ming,

Thank you for the suggestions. We went with submitting the supervisor spec to the Druid supervisor API from StreamSets.

Thanks

Soumya
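
For anyone finding this later, the supervisor submission is an HTTP POST of the spec to Druid's `/druid/indexer/v1/supervisor` endpoint. Below is a minimal Python sketch of that call made outside StreamSets, not our actual pipeline configuration. The base URL, the helper names, and the tiny example spec are assumptions for illustration; use the address of your own Router or Overlord and your real spec.

```python
import json
import urllib.request

SUPERVISOR_PATH = "/druid/indexer/v1/supervisor"


def build_supervisor_request(spec, base_url):
    """Build (but do not send) the POST request carrying the spec."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + SUPERVISOR_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_supervisor(spec, base_url, timeout=30):
    """Send the spec to Druid and return the parsed JSON response.

    Raises urllib.error.HTTPError if Druid rejects the spec.
    """
    request = build_supervisor_request(spec, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical, heavily trimmed spec; a real one needs dataSchema,
# ioConfig, and tuningConfig filled in for your datasource and topic.
example_spec = {"type": "kafka", "spec": {"dataSchema": {"dataSource": "events"}}}

# Assumed local address; replace with your own Druid endpoint.
example_request = build_supervisor_request(example_spec, "http://localhost:8888")
```

Calling `submit_supervisor(full_spec, "http://localhost:8888")` would perform the actual submission; inside a pipeline the same POST can be issued by whatever HTTP-capable stage is available.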