Kafka ingestion task runs out of memory while loading lookups

I am currently trying to stream data from Kafka to a small Apache Druid 0.21.1 cluster.

As the index_kafka task starts up, it seems to load many lookups into memory, logging something like this:

2022-01-10T17:38:23,677 INFO [NamespaceExtractionCacheManager-0] org.apache.druid.server.lookup.namespace.JdbcCacheGenerator - Finished loading 1110832 values for namespace

We happen to have many lookup tables for other projects. The index_kafka task ends up crashing with an out-of-memory error while trying to load all these lookup tables.

The Kafka ingestion I am testing doesn't require lookups in dimensions or metrics. Is there a way to disable lookup loading in the index_kafka task?


At this point, there isn’t. (There may be in the future.) Your current choices are to increase the peon heap sizes, or reduce your lookup sizes.

I wonder if it would help to use Indexers instead of Middle Managers? That should reduce the copies of lookup cache maps that need to be kept in memory.
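To make the reasoning concrete, here is a rough back-of-the-envelope sketch. It assumes each peon forked by a Middle Manager is a separate JVM holding its own copy of the lookup maps, while an Indexer runs tasks as threads in one JVM and so holds a single copy. The function name and the numbers (500 MB of lookups, 4 task slots) are made up for illustration and are not measurements from your cluster.

```python
def lookup_heap_mb(lookup_mb: float, task_slots: int, shared_jvm: bool) -> float:
    """Rough heap used by lookup caches on one worker host.

    shared_jvm=False models Middle Manager + peons (one copy per task JVM).
    shared_jvm=True models an Indexer (one copy shared by all task threads).
    """
    copies = 1 if shared_jvm else task_slots
    return lookup_mb * copies


# Hypothetical figures: 500 MB of lookup data, 4 concurrent task slots.
middle_manager_total = lookup_heap_mb(500, 4, shared_jvm=False)
indexer_total = lookup_heap_mb(500, 4, shared_jvm=True)

print(f"Middle Manager + peons: {middle_manager_total:.0f} MB")
print(f"Indexer:                {indexer_total:.0f} MB")
```

This ignores JVM overhead and the heap the tasks need for ingestion itself, so treat it only as a way to compare the two deployment styles.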


I also forgot another option: if none of your ingestion tasks need lookups, you might be able to use lookup tiers, and not include the MM processes in the tier. See the Lookups page in the Apache Druid documentation.
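As a small illustration of why tiers help, the sketch below models the idea in plain Python: lookups are registered per tier, and a process only loads the lookups registered for the tier it belongs to. As far as I know, a process's tier is set with the `druid.lookup.lookupTier` property and defaults to `__default`. The lookup names and the `ingest_no_lookups` tier name are hypothetical. For peons, the setting would probably have to be passed down from the Middle Manager (for example through the `druid.indexer.fork.property.` prefix), but that is an assumption to verify against the docs for your version.

```python
DEFAULT_TIER = "__default"

# Hypothetical lookup configuration, keyed by tier and then by lookup name.
# All existing lookups live in the default tier.
lookup_config = {
    "__default": {
        "country_names": {"version": "v1"},
        "product_codes": {"version": "v1"},
    },
}


def lookups_loaded_by(config: dict, tier: str = DEFAULT_TIER) -> list:
    """Return the names of the lookups a process in `tier` would load."""
    return sorted(config.get(tier, {}))


# A process left in the default tier loads every lookup registered there.
print(lookups_loaded_by(lookup_config))

# A task placed in a tier with no lookups registered loads nothing.
print(lookups_loaded_by(lookup_config, "ingest_no_lookups"))
```

The point is that nothing has to change for the Brokers and Historicals: they stay in `__default`, and only the ingestion side moves to a tier that has no lookups assigned.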


@Hellmar_Becker thanks for the suggestion, can you please elaborate on “use Indexers instead of Middle Managers”? Is there documentation on how this works?

@Ben_Krug thank you for the three suggestions. I'll try not including the MMs in the lookup tier. Thanks.

Yw! Regarding the indexer process, there’s documentation here. This would be a significant change, so be sure to test thoroughly if you decide to use it.

Thanks for the documentation.

As a follow-up, I've managed to get the index_kafka task to not load the lookups by not including the Middle Managers in the __default lookup tier.

Thanks again for the suggestion.

Excellent, glad to hear it!