How can I bring in a new ingestion under an existing datasource name?

I currently have a Druid ingestion that uses a non-native ingestion method to load a datasource named “event_source”. I am planning to create a new ingestion of the same stream using Kinesis.

How can I do this while:

  1. Having the new ingestion keep the same datasource name, i.e. “event_source”
  2. Being able to access the data from before and after the ingestion change in the same query, i.e. I can do count(*) from event_source and it would include all the data from before and after the change

Can you tell us a bit about your current ingestion? Are you running a supervisor? If so, you might be able to accomplish 1 & 2 by updating the existing supervisor.

Thanks Mark. The current ingestion uses Tranquility. This uses a supervisor, but that is handled on the Tranquility instances and does not appear in the list of supervisors on the Druid console.

Hi @jamesvk,

If I understand correctly, you want to replace your current Tranquility ingestion with a Druid native Kinesis ingestion.

I am no expert in this, so take this with a grain of salt, but based on what I’ve read here, I believe the steps you would need to take are:

  • Stop publishing to the Tranquility server and start publishing events to the Kinesis stream
    – Make sure Kinesis can hold the data for the duration of the switchover period
    – You may want to do this just a few minutes before the end of a segmentGranularity cycle, so that you do not need to wait long for its tasks to complete.
  • I’m not sure whether there’s anything to do on the Tranquility side to “flush” it. At a minimum, I think you will need to wait for the current segmentGranularity time period to end and for the real-time tasks associated with the Tranquility ingestion to finish. That should take a few minutes after the period ends, while the segments are published and hand-off occurs, after which the tasks should terminate successfully on their own.
  • Once all the Tranquility-created tasks have completed, start the new Kinesis ingestion job
    – It will catch up with the data that has accumulated in Kinesis
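For the last step, here is a rough sketch of what the new Kinesis supervisor spec might look like, built as a Python dict so it can be dumped to JSON and submitted to the Overlord (POST to /druid/indexer/v1/supervisor). The stream name, endpoint, timestamp column, and granularities are all assumptions for illustration, not values from this thread; the one thing that matters for this question is that dataSource stays "event_source". Setting useEarliestSequenceNumber to true is meant to make the first run start from the oldest records still retained in the stream, which is what lets it catch up.

```python
import json


def build_kinesis_supervisor_spec(datasource, stream, endpoint):
    """Return a Kinesis supervisor spec as a plain dict (a sketch, not a complete spec)."""
    return {
        "type": "kinesis",
        "spec": {
            "dataSchema": {
                # Same name as the datasource Tranquility was writing to.
                "dataSource": datasource,
                # Assumed timestamp column and format; match your event schema.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Placeholder; list your real dimensions here.
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",  # assumed
                    "queryGranularity": "NONE",  # assumed
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kinesis",
                "stream": stream,
                "endpoint": endpoint,
                "inputFormat": {"type": "json"},
                # Start from the oldest retained records on first run.
                "useEarliestSequenceNumber": True,
            },
            "tuningConfig": {"type": "kinesis"},
        },
    }


# Hypothetical stream name and region endpoint.
spec = build_kinesis_supervisor_spec(
    "event_source", "event-stream", "kinesis.us-east-1.amazonaws.com"
)
print(json.dumps(spec, indent=2))
```

The dimensions, metrics, and granularities should mirror whatever the Tranquility config used, so that old and new segments look the same to queries.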

Your users will experience a period of time where no new data arrives, until the Kinesis job has caught up.

In theory, if your new ingestion job is to the same datasource, querying it should span both the before and after the switch. The new job will start to create segments for the same datasource, but it will not remove the ones that are already there.
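One way to check this after the switch is an hourly row count over the datasource: hours on both sides of the cutover should show up in a single result, and any gap during the catch-up period should be visible. The sketch below only builds the request body for the Druid SQL API (POST to /druid/v2/sql); the hourly bucket and the column aliases are my own choices for illustration.

```python
import json


def build_hourly_count_query(datasource):
    """Build a Druid SQL API request body that counts rows per hour."""
    sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS \"hour\", COUNT(*) AS \"row_count\" "
        f'FROM "{datasource}" '
        "GROUP BY 1 ORDER BY 1"
    )
    return {"query": sql}


payload = build_hourly_count_query("event_source")
print(json.dumps(payload, indent=2))
```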


Great, thanks a lot Sergio, this is very helpful.

Another, possibly totally mad, solution (!) would be to ingest into a second table and query both through a union datasource. But that may be a bit weird… and obviously it would add some latency to queries.

/me wonders if this is a silly idea…
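For what it's worth, a union datasource in a native query would look roughly like the sketch below. The second table name, "event_source_kinesis", is hypothetical, and the wide interval is just a stand-in for "all time". Note that this does not satisfy requirement 1, since the new ingestion would write to a different datasource name.

```python
import json


def build_union_count_query(tables):
    """Build a native timeseries query that counts rows across several tables."""
    return {
        "queryType": "timeseries",
        # A union datasource presents the listed tables as one.
        "dataSource": {"type": "union", "dataSources": list(tables)},
        # Deliberately wide interval to cover data from both ingestions.
        "intervals": ["2000-01-01/3000-01-01"],
        "granularity": "all",
        "aggregations": [{"type": "count", "name": "row_count"}],
    }


# "event_source_kinesis" is a made-up name for the second table.
query = build_union_count_query(["event_source", "event_source_kinesis"])
print(json.dumps(query, indent=2))
```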