I have several questions about realtime data ingestion.
Conditions of my task:
I don't use Java. I have many scripts (more than 500) that produce data.
These producers generate events that could be described as one datasource, but in general they have different timestamps (one is realtime, others can lag by a day or so). Also, events from different producers are not logically related to each other. Moreover, one producer can emit 10k events per second while others emit fewer than 100. So it seems like a good idea to store events from different producers in different datasources.
How could I implement realtime ingestion for a dynamic number of datasources?
I see three options:
- Generate a new druid.realtime.specFile each time a new producer (new datasource) appears and restart the realtime node.
- Launch a realtime index task for each datasource. I couldn't get this working due to a bug: https://groups.google.com/forum/#!searchin/druid-development/DruidServerMetadata$20/druid-development/XwOWCw7ac9U/pduNG6vwijYJ
- Use one datasource with an extra dimension "producer_type" and tune segment sharding by this dimension.
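For the third option, this is roughly the realtime task spec I have in mind (a sketch only; field names are taken from the classic realtime ingestion spec in the docs, and the datasource/dimension names other than "producer_type" are made up, so the exact schema may differ for your Druid version):

```json
{
  "type": "index_realtime",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {"column": "timestamp", "format": "auto"},
          "dimensionsSpec": {"dimensions": ["producer_type", "event_name"]}
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    },
    "tuningConfig": {
      "type": "realtime",
      "maxRowsInMemory": 75000,
      "intermediatePersistPeriod": "PT10M",
      "windowPeriod": "PT10M"
    }
  }
}
```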
As I understand it, submitting a realtime index task turns a middle manager node into a realtime node. Does that mean that if I have only one middle manager node, I can submit only one realtime index task?
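My guess (please correct me if this is wrong) is that one middle manager can run several tasks in parallel, one per peon, and that `druid.worker.capacity` controls how many. Something like this in the middle manager's runtime.properties:

```properties
# Middle manager runtime.properties (sketch; values are placeholders)
druid.worker.capacity=4
druid.indexer.runner.javaOpts=-server -Xmx2g
```

Is that the right knob, or does a realtime index task occupy the whole middle manager?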
How can I tune realtime ingestion to shard by dimension? Should I use tuningConfig.partitionsSpec? Will it ensure that each segment shard stores events with only one value of a specific dimension?
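From the batch ingestion docs I see a dimension-based partitionsSpec; I am not sure it applies to realtime tasks at all, but this is what I imagine (field names copied from the batch docs, so treat this as an assumption, not a working config):

```json
"tuningConfig": {
  "type": "realtime",
  "partitionsSpec": {
    "type": "dimension",
    "partitionDimension": "producer_type",
    "targetPartitionSize": 5000000
  }
}
```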
Since I don't use Java, I can't use Tranquility. Right now I just want to set up a working prototype, but in the future I will probably want to implement my own push scheme. However, there is no information about EventReceiverFirehose in the documentation or examples. Is this approach deprecated? Where can I find a description of its proper usage?
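For context, this is how I imagine pushing events to an EventReceiverFirehose over HTTP. The endpoint path and payload shape are my assumptions from skimming how Tranquility seems to talk to the firehose, so please correct me if the real API differs:

```python
import json
from urllib import request

# Sketch of pushing a batch of events to an EventReceiverFirehose.
# ASSUMPTION: the firehose listens on the middle manager's chat handler at
# /druid/worker/v1/chat/<serviceName>/push-events and accepts a JSON array.

def build_push_request(host: str, service_name: str, events: list) -> request.Request:
    """Build an HTTP POST that pushes a batch of events to the firehose."""
    url = f"http://{host}/druid/worker/v1/chat/{service_name}/push-events"
    body = json.dumps(events).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

# Usage (not executed here -- needs a running middle manager):
# req = build_push_request("localhost:8091", "myServiceName",
#                          [{"timestamp": "2015-01-01T00:00:00Z",
#                            "producer_type": "script_42", "value": 1}])
# request.urlopen(req)
```

Is this roughly the intended usage pattern, or is there a better-supported push path for non-Java producers?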