I am trying to use lookups in a clustered deployment of Druid and was wondering which of the Druid processes (Broker, Coordinator, Historical, MiddleManager) require the files for the lookup data. When I try this on a Druid instance running locally, it seems that the Broker and Historical processes are the ones looking for the data, because they are listed as the pending nodes for loading the lookups. I also saw messages about lookups in the logs for these two processes. Does this mean that only the Broker and Historical nodes would require the lookup data? If so, could you explain why this is the case?
All of those processes have a reason to access lookup data at some point. Are you having issues getting the lookups to work in a clustered environment?
Thanks for your reply! Would this mean that we need to populate the lookup data files locally for all of our containers? Our Druid is running in a container environment where the Historical, Broker, Coordinator and MiddleManager processes run in separate containers. We'd like to do file-based lookups while minimising unnecessary copies of the data, so we were wondering which processes do not have the lookup data pushed to them by Druid's built-in orchestration and would therefore need it populated locally.
The lookups are currently stored on-heap on the Historical and Broker processes and, during ingestion, on the MiddleManager. The file that you use to populate them would need to be available to those processes.
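For reference, a file-based lookup is typically defined as a `cachedNamespace` lookup with a `uri` extraction namespace (this requires the `druid-lookups-cached-global` extension to be loaded). A minimal sketch, assuming a hypothetical CSV file path that would need to exist on each of those processes' containers:

```json
{
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "uri",
    "uri": "file:/opt/druid/lookups/countries.csv",
    "namespaceParseSpec": {
      "format": "csv",
      "columns": ["key", "value"]
    },
    "pollPeriod": "PT5M"
  }
}
```

Because the `uri` points at a local file path, each process that loads the lookup resolves it independently, which is why the same file (or a shared mount) has to be present wherever the lookup is loaded.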