Question on Memory management in Druid

Hi All,

I have a few questions on how memory and persistence are handled. Kindly help clarify the following.

Scenario: I will be ingesting data from a Kafka stream into Druid. Currently the data size is small, but over time millions of records will be added to the datasource in Druid.

Question 1: If the in-memory data grows beyond the allocated range, how can we specify where the data should be stored? In my case, I need to store the actual data in a Postgres table.

Question 1.1: If the actual data is persisted in that case, will the new incoming data be sent to the DB, or will the old data be moved to the DB? (Is it configurable which data, old or new, should be written to the DB?)

Question 1.2: How badly will SELECT query performance be impacted if some records are stored in the DB?


1) All data/segments are kept in deep storage as the backup/long-term store. Druid manages which segments to pull from deep storage and load onto Historicals for query purposes (they are kept in the segment cache location). Not all segments are mapped into memory, so Druid will also query segments on disk, but queries against segments already in memory typically perform better. I'm not sure why you would be storing data in Postgres. A sketch of the relevant settings follows after this reply.

1.1) You can only store segments in deep storage and nowhere else. Which segments stay loaded on Historicals (versus sitting only in deep storage) is controlled by Coordinator retention rules; see the sample rules after this reply.

1.2) refer to 1 & 1.1 above.
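To make (1) concrete, here is a minimal sketch of the deep storage and Historical segment-cache settings involved. The paths, sizes, and storage type below are illustrative assumptions, not values from this thread:

```properties
# common.runtime.properties -- deep storage (local example; S3/HDFS are
# configured the same way with a different druid.storage.type)
druid.storage.type=local
druid.storage.storageDirectory=/data/druid/deep-storage

# historical runtime.properties -- local segment cache pulled from deep storage.
# Segments here are memory-mapped on demand; the OS page cache decides what is
# actually in RAM, so queries fall back to disk when a segment is not cached.
druid.segmentCache.locations=[{"path":"/data/druid/segment-cache","maxSize":300000000000}]
druid.server.maxSize=300000000000
```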
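On 1.1, which data (old or new) stays queryable on Historicals is governed by Coordinator retention rules, not by pushing data to an external DB. A hedged example, assuming you only want the most recent month loaded on Historicals while older data stays only in deep storage:

```json
[
  { "type": "loadByPeriod", "period": "P1M", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]
```

Rules are evaluated top to bottom per segment, so segments newer than one month are loaded with two replicas and everything older is dropped from Historicals, but it remains in deep storage and can be loaded again later by changing the rules.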

Rommel Garcia