Questions regarding broker config

  1. Does the broker use these in any way:
  • druid.server.maxSize
  • druid.segmentCache.locations
    If not, are these properties used only by Historicals, or by other nodes too?
  1. Is druid.processing.buffer.sizeBytes per thread or across threads?

E.g., if I set the value to 1GB, will each processing thread use 1GB, or will 1GB be shared across all threads?

  1. Does the local cache use off-heap or heap memory?

  1. The default value of druid.broker.cache.unCacheable is ["groupBy", "select"].

Does that imply that groupBy and select queries are not recommended for caching?

If yes, then doesn't that make the cache pretty useless, since those are the most frequently performed queries?

If no, then why are those two the default?

  1. druid.broker.cache.populateCache

What is the purpose of this? If I set it to false, I guess the cache never gets populated and thus never has anything in it.

Isn't that the same as disabling the cache?

  1. What factors determine the Xmx and MaxDirectMemorySize values for the JVM running the broker?

Is MaxDirectMemorySize = processing.buffer.sizeBytes * processing.numThreads?

  1. /druid/broker/v1/loadstatus

This endpoint does not work on 0.6.160.

What is the equivalent URL for this version of Druid?

Answers inline.

  1. Does the broker use these in any way:
  • druid.server.maxSize
  • druid.segmentCache.locations

The broker doesn't download segments locally, so it does not use these properties.

If not, are these properties used only by Historicals, or by other nodes too?

Historicals. Realtime nodes technically have these properties too, but you can set the values to 0.

  1. Is druid.processing.buffer.sizeBytes per thread or across threads?

Per thread.

E.g., if I set the value to 1GB, will each processing thread use 1GB, or will 1GB be shared across all threads?

  1. Does the local cache use off-heap or heap memory?
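Since the buffer is allocated per processing thread, the total processing-buffer memory grows with the thread count. A minimal sketch of that arithmetic, assuming 1 GiB buffers and a hypothetical druid.processing.numThreads of 7:

```python
GIB = 1024 ** 3


def total_processing_buffer_bytes(size_bytes: int, num_threads: int) -> int:
    """Each processing thread gets its own buffer of size_bytes."""
    return size_bytes * num_threads


# Hypothetical values: 1 GiB per buffer, 7 processing threads.
total = total_processing_buffer_bytes(1 * GIB, 7)
print(total // GIB)  # 7
```

So a 1GB setting means 1GB for each thread, not 1GB shared.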

Heap.

  1. The default value of druid.broker.cache.unCacheable is ["groupBy", "select"].

Does that imply that groupBy and select queries are not recommended for caching?

Yes, the result sets can be so large that they fill up your entire cache, which impacts other queries.

If yes, then doesn't that make the cache pretty useless, since those are the most frequently performed queries?

Our production cluster uses timeseries and topN queries almost exclusively for our UI. We don't use groupBys at all. GroupBy is a flexible but slow query type, and we found that many workflows can be accomplished with topNs instead.

If no, then why are those two the default?
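The effect of the default list can be sketched as a simple membership check. `is_cacheable` is a hypothetical helper that only mirrors the default value quoted above; it is not Druid's actual implementation:

```python
# Default value of druid.broker.cache.unCacheable, as quoted above.
DEFAULT_UNCACHEABLE = frozenset({"groupBy", "select"})


def is_cacheable(query_type: str, uncacheable=DEFAULT_UNCACHEABLE) -> bool:
    """Hypothetical check: a query type is cached only if it is not listed as uncacheable."""
    return query_type not in uncacheable


for qt in ["timeseries", "topN", "groupBy", "select"]:
    print(qt, is_cacheable(qt))
```

With the default, timeseries and topN results are eligible for caching, while groupBy and select results are not. Overriding the list (for example, leaving only "select") would make groupBy results eligible again.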
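As an illustration, a groupBy over a single dimension, ranked by one metric and limited to N rows, can often be rewritten as a topN. A minimal sketch that builds such a topN query body; the dataSource, dimension, metric, and interval values are hypothetical placeholders:

```python
import json


def top_n_query(data_source, dimension, metric, threshold, interval):
    """Build a Druid topN query body from caller-supplied (hypothetical) names."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "metric": metric,        # rank dimension values by this aggregator
        "threshold": threshold,  # keep only the top N values
        "granularity": "all",
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
        "intervals": [interval],
    }


query = top_n_query("pageviews", "page", "views", 10, "2014-01-01/2014-02-01")
print(json.dumps(query, indent=2))
```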

  1. druid.broker.cache.populateCache

What is the purpose of this? If I set it to false, I guess the cache never gets populated and thus never has anything in it.

Isn't that the same as disabling the cache?

You can still read from the cache without populating it with the results of new queries. Whatever items are already in the cache will remain there. We generally only disable this for testing, though.
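The difference between reading from and writing to the cache can be sketched with a toy cache. `ToyBrokerCache` is hypothetical; `populate_cache` mirrors druid.broker.cache.populateCache, and `use_cache` assumes a separate read-side flag (Druid has a druid.broker.cache.useCache setting, but whether 0.6.160 exposes it is an assumption here):

```python
class ToyBrokerCache:
    """Hypothetical cache illustrating separate read (use) and write (populate) flags."""

    def __init__(self, use_cache=True, populate_cache=True):
        self.use_cache = use_cache
        self.populate_cache = populate_cache
        self._store = {}

    def run(self, key, compute):
        # Serve an existing entry when reading is enabled.
        if self.use_cache and key in self._store:
            return self._store[key]
        result = compute()
        # Store new results only when population is enabled.
        if self.populate_cache:
            self._store[key] = result
        return result


cache = ToyBrokerCache()
cache.run("q1", lambda: "cached-result")   # populates q1
cache.populate_cache = False
cache.run("q2", lambda: "fresh-result")    # computed but not stored
print(cache.run("q1", lambda: "recomputed"))  # cached-result
print(cache.run("q2", lambda: "recomputed"))  # recomputed
```

Entries written before population was turned off keep serving hits, which is why this is not the same as disabling the cache.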

  1. What factors determine the Xmx and MaxDirectMemorySize values for the JVM running the broker?

Is MaxDirectMemorySize = processing.buffer.sizeBytes * processing.numThreads?

Yes for MaxDirectMemorySize, although I think the requirement is sizeBytes * (numThreads + 1). I have to double-check.

We use heaps of about 25GB on our production brokers. We'll try to write some docs about heuristics for picking heap size. We run about 10 brokers for a 350TB (cumulative Druid segment size) cluster.
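A minimal sketch of the direct-memory arithmetic that emits the corresponding JVM flag. It uses the more conservative sizeBytes * (numThreads + 1) guess, which is unconfirmed in this thread. The buffer size and thread count are hypothetical inputs:

```python
MIB = 1024 ** 2


def max_direct_memory_bytes(size_bytes: int, num_threads: int, extra_buffers: int = 1) -> int:
    """Conservative estimate: one buffer per processing thread plus extra_buffers spare."""
    return size_bytes * (num_threads + extra_buffers)


def jvm_flag(num_bytes: int) -> str:
    # Express the limit in whole megabytes, rounding up.
    mb = -(-num_bytes // MIB)
    return f"-XX:MaxDirectMemorySize={mb}m"


size = 1024 ** 3  # hypothetical: 1 GiB druid.processing.buffer.sizeBytes
threads = 7       # hypothetical: druid.processing.numThreads
print(jvm_flag(max_direct_memory_bytes(size, threads)))  # -XX:MaxDirectMemorySize=8192m
```

Setting `extra_buffers=0` gives the plain sizeBytes * numThreads formula from the question.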

  1. /druid/broker/v1/loadstatus

This endpoint does not work on 0.6.160.

What is the equivalent URL for this version of Druid?

It does not exist in that version; it was added later.