stream push or stream pull

Hi,

I have developed a firehose in Python that continuously streams JSON data to me (using HTTP GET requests to an external site).

It streams about 500 messages per second. I would like to ingest this data into Druid and report on it using Caravel.

  1. I am not sure whether I should use stream push or stream pull. Please suggest how to decide between the two.

  2. If I use stream push, can I use Tranquility? I heard that Tranquility might not scale for a production system. If so, what are the other options apart from Tranquility?

  3. If I use stream pull, are there examples I can use? I have not been able to find a detailed example.

Thanks

VR

See inline.

  1. I am not sure whether I should use stream push or stream pull. Please suggest how to decide between the two.

Generally, it is advisable to use a messaging system such as Kafka to store your messages. From Kafka, you can use either Tranquility Kafka or the Kafka indexing service to index the data into Druid.
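As a rough sketch of that first step, your Python firehose could publish each JSON record to a Kafka topic, which Tranquility Kafka or the Kafka indexing service would then read. The sketch assumes the third-party kafka-python package (referenced only in comments, so the function itself has no dependency), a hypothetical topic named `events`, and a hypothetical `timestamp` field. Druid needs a timestamp column in every row, so the sketch fills one in when a record lacks it.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Iterable


def publish_events(
    events: Iterable[dict],
    send: Callable[[str, bytes], object],
    topic: str = "events",  # hypothetical topic name
    timestamp_field: str = "timestamp",  # hypothetical column for Druid's timestampSpec
) -> int:
    """Serialize each event as UTF-8 JSON and pass it to send(topic, value).

    Returns the number of events handed to send.
    """
    count = 0
    for event in events:
        record = dict(event)  # copy so the caller's dict is not mutated
        # Druid rows need a timestamp; add one (UTC, ISO 8601) if it is missing.
        record.setdefault(timestamp_field, datetime.now(timezone.utc).isoformat())
        send(topic, json.dumps(record).encode("utf-8"))
        count += 1
    return count


# Usage with kafka-python (assumed installed, broker assumed at localhost:9092;
# my_firehose_records is a placeholder for your firehose's output):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   publish_events(my_firehose_records, producer.send)
#   producer.flush()
```

Passing `send` as a callable keeps the serialization logic testable without a running broker.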

  2. If I use stream push, can I use Tranquility? I heard that Tranquility might not scale for a production system. If so, what are the other options apart from Tranquility?

That's not true. Tranquility is a helper library that can be used to manage Druid indexing tasks and send data to Druid. I am not aware of any scalability issues with it.
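If you push with Tranquility, one option is Tranquility Server, which accepts events over HTTP POST at `/v1/post/<dataSource>`. The sketch below only builds the request with the standard library. The host, the port 8200, and the datasource name `events` are assumptions, so check the Tranquility Server docs for your version (including which body formats it accepts). Grouping the ~500 messages per second into batches, rather than sending one POST per message, keeps HTTP overhead down.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def batches(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group events into lists of at most `size` items."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch


def build_tranquility_request(
    events: List[dict],
    datasource: str = "events",  # hypothetical datasource name
    host: str = "localhost",  # assumed Tranquility Server host
    port: int = 8200,  # assumed port
) -> urllib.request.Request:
    """Build a POST whose body is a JSON array of events."""
    url = f"http://{host}:{port}/v1/post/{datasource}"
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# Sending (requires a running Tranquility Server; records is your event stream):
#   for batch in batches(records, 500):
#       with urllib.request.urlopen(build_tranquility_request(batch)) as resp:
#           print(resp.status, resp.read().decode("utf-8"))
```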

  3. If I use stream pull, are there examples I can use? I have not been able to find a detailed example.

If you decide to write your own firehose to pull data into Druid, you can use FileIteratingFirehose.java in the Druid source code as a reference and start from there.
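Roughly, FileIteratingFirehose walks an iterator of line iterators, parses each line into a row, and exposes `hasMore()` and `nextRow()`. A real firehose has to be written in Java against Druid's `Firehose` interface. The Python below is only an analogue of that control flow, to make the pattern concrete. The class name `LineIteratingReader` is hypothetical, and parsing lines as JSON and skipping blank lines are choices made in this sketch, not necessarily Druid's behavior.

```python
import json
from typing import Iterable, Iterator, Optional


class LineIteratingReader:
    """Hypothetical Python analogue of the FileIteratingFirehose pattern.

    Walks a sequence of line sources (e.g. open files) and parses each
    non-blank line as a JSON row. Not Druid's API.
    """

    def __init__(self, sources: Iterable[Iterable[str]]):
        self._sources = iter(sources)
        self._lines: Optional[Iterator[str]] = None
        self._pending: Optional[dict] = None  # next parsed row, if any

    def _advance(self) -> None:
        # Move forward until a row is buffered or every source is exhausted.
        while self._pending is None:
            if self._lines is None:
                try:
                    self._lines = iter(next(self._sources))
                except StopIteration:
                    return  # no more sources
            try:
                line = next(self._lines)
            except StopIteration:
                self._lines = None  # current source done; try the next one
                continue
            if line.strip():  # this sketch skips blank lines
                self._pending = json.loads(line)

    def has_more(self) -> bool:
        self._advance()
        return self._pending is not None

    def next_row(self) -> dict:
        if not self.has_more():
            raise LookupError("no more rows")
        row, self._pending = self._pending, None
        return row
```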

Additional docs on choosing between the two that may be helpful: https://imply.io/docs/latest/ingestion-streams