Hi Druid team!
I am ingesting data from a Kafka stream and would like to create several datasources from that one stream: one “raw” datasource (no rollup, all information preserved) and several “abbreviated” metric datasources. The abbreviated datasources would use Druid’s roll-up functionality and metrics so that queries such as “give me the hourly event count for this event type over the past year” run more quickly. Because of roll-up, the abbreviated datasources would hold specialized, coarse-grained summaries, while the raw datasource would keep all the information exactly as ingested, so nothing is lost.
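For concreteness, here is roughly what I have in mind for one abbreviated datasource, sketched as a rollup-enabled schema (the datasource, column, and topic names are placeholders for my real ones):

```json
{
  "dataSchema": {
    "dataSource": "events-hourly-by-type",
    "timestampSpec": { "column": "timestamp", "format": "iso" },
    "dimensionsSpec": { "dimensions": ["eventType"] },
    "metricsSpec": [
      { "type": "count", "name": "eventCount" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "HOUR",
      "rollup": true
    }
  }
}
```

With `queryGranularity: HOUR` and rollup enabled, many raw events collapse into one row per hour per event type, which is exactly what I want for the fast aggregate queries.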
I would like to serve the data from both my “raw” datasource and my “abbreviated” datasources in real time. Druid lets me serve the “raw” datasource in real time using stream ingestion. However, I am currently filling my “abbreviated” datasources using the Druid input source, which is a batch-based process.
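This is roughly the batch ioConfig I am using today to re-index the raw datasource into an abbreviated one (datasource name and interval are placeholders):

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "druid",
      "dataSource": "events-raw",
      "interval": "2023-01-01/2024-01-01"
    }
  }
}
```

Since this task only runs when I submit it, the abbreviated datasources always lag behind the raw one.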
How can I fill and query my “abbreviated” datasources in real time? I could give each of those datasources its own Kafka stream ingestion supervisor, but wouldn’t that be wasteful (each supervisor consuming the same topic) and very resource intensive? Thank you for your help.