Ingesting batch and stream data into one datasource

I want to ingest data into a datasource first from a Parquet file and then from Kafka.
The reason is that I want to have everything in one table so I can query it there. Is this impossible?
I think it would be good to add the Parquet data after creating the Kafka source; the Parquet only needs to be loaded once.
There may be a way to push the Parquet file into Kafka using Spark, but it would be nice if I could solve it with Druid alone.

Welcome @seon_park!

Are you asking if, after ingestion, Druid will store a queryable parquet file?

Parquet is definitely a supported format. If you want to set up two ingestion jobs from the same data in your source system, you can certainly do that. Druid will store the ingested data in an optimized file format called a segment.

Hi @seon_park,
If you are asking about loading historical data from Parquet, you can do that through batch ingestion, either using an index_parallel task or SQL-based batch ingestion.
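
In case it helps, here is a minimal sketch of what a native batch (index_parallel) spec for Parquet input might look like. The datasource name, file path, timestamp column, and dimensions below are placeholders you would replace with your own, and the Parquet extension has to be loaded on your cluster:

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "timestampSpec": { "column": "ts", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["dim1", "dim2"] },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "/data/history",
        "filter": "*.parquet"
      },
      "inputFormat": { "type": "parquet" }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": { "type": "dynamic" }
    }
  }
}
```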

You can then set up ongoing ingestion from Kafka messages in one of the supported formats (typically JSON or Protobuf).
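
And a similar sketch of a Kafka supervisor spec, which you submit to the supervisor endpoint rather than as a task. The broker address, topic, and columns are again placeholders, and exact fields can vary by Druid version. The important part is that dataSchema.dataSource is the same string as in the batch spec above, which is what makes both land in one table:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "timestampSpec": { "column": "ts", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["dim1", "dim2"] },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "ioConfig": {
      "type": "kafka",
      "topic": "my_topic",
      "consumerProperties": { "bootstrap.servers": "kafka-broker:9092" },
      "inputFormat": { "type": "json" },
      "useEarliestOffset": false
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```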

Thank you for your answer. But I was asking whether Parquet and Kafka data can be collected into the same datasource, that is, whether it is possible to keep collecting from Kafka continuously after collecting the Parquet data once.

Should I modify the ingestion spec, or is it something else?

@seon_park ,
You will need a spec for batch ingestion from parquet and another spec for the streaming ingestion.


I’ve done this too, first ingesting old data from files and then continuing onwards with streaming data. It should just work if you set the datasource name to the same string in both specs.

That’s correct. Are you just using the batch to load the initial history? So, a one-time batch load and then all streaming?

I posted the question because it didn’t work when I first tried it, but after trying again it works well. It was a basic mistake on my part; there must have been some other problem the first time. Sorry for the confusion. :slight_smile:
