Query performance - long time for a timeseries query

Hey all,

We are currently evaluating Druid as our main analytics engine.

We’ve set up a Druid server where all components live on the same machine.
We have little to no expertise in tuning and configuring Druid, so I’m seeking help from you guys!

The Druid server currently holds ~100 million records, and we are running queries against it to test performance.

The timeseries query we tried took 18 seconds (!) to complete, which - for an engine like Druid - seems like a really long time.

So I’m posting all the details I thought might be needed, and I hope you guys can help out with tuning/configuration and general advice.

Google Drive Folder link:

https://drive.google.com/drive/folders/1CB4LRSAXKw7eHaxMVfC0VtqADOXFTjfp?usp=sharing

Thank you very much,

Michael

P.S.: I’m sharing via Google Drive; for some reason Groups won’t let me attach the files…

Hey Michael,

The most common reasons for sub-par performance in fresh Druid setups are using too few processing threads or having too many small segments.

For processing threads, you know you’re doing a good job if your CPUs are running near 100% while a query is in flight. If not, it means you don’t have enough threads running. Most of the work is generally done by historicals, so in your case you probably want to set druid.processing.numThreads = 7 or 8 in the historical’s runtime.properties.
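For example, in the historical’s runtime.properties that could look something like the following (the numbers here are just an example to sketch the idea, not a recommendation for your exact hardware):

    druid.processing.numThreads=7
    druid.processing.buffer.sizeBytes=256000000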

Small segments are generally not an issue with Hadoop ingestion (it sizes segments automatically using M/R jobs). But it can be an issue with Kafka ingestion if you have a lot of late data or are doing a backfill through Kafka. In general the way to approach this is to use compaction, which combines smaller segments into bigger ones. You can do it manually with the ‘compact’ task, and in the future we are working to automate it so it happens in the background.
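A minimal ‘compact’ task spec looks roughly like this (the dataSource and interval are placeholders you’d replace with your own values), submitted to the overlord at /druid/indexer/v1/task:

    {
      "type": "compact",
      "dataSource": "your_datasource",
      "interval": "2018-01-01/2018-01-08"
    }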

Hope this helps!

Gian

Hello Gian,

Thank you so much for taking the time to answer!

There were a lot of segments for each day, as you suspected, so for a week’s worth of data I compacted half of the days and left the other half uncompacted.

Also, we increased the Historical node’s processing threads to 8 and set its JVM heap and direct memory to 3g each.

Now, reading the Druid documentation, I see that the Historical will use heap memory for its groupBy queries and “segment memory” for the others(?).

If there are fewer segments (now 3 vs. 20+), I assume each one needs more segment memory than before? So Druid might be forced to page again?

I’m asking because when I query the days that are not compacted I get slightly better performance than on the compacted ones. Maybe Druid handles them better because they are smaller?

Of course, all these assumptions come with the caveat that Druid runs on a 16 GB server for PoC purposes.

I’m also uploading 2 images showing the segment state before and after the compaction task.

Overall performance is around 2x-3x faster after your recommendation, which is great!

Now, as another question in the same context,

Druid documentation states “Broker nodes uses the JVM heap mainly to merge results from historicals and real-times.”

If Druid is running as a monolith (all services on the same machine) and each service is a single instance, does the Broker actually “merge” anything?

I will also post the JVM settings of each service:

Broker: processing threads: 2, buffer size: 256000000 bytes

druid 10646 10626 3 12:05 ? 00:01:31 java -server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1792m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=var/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath lib/*:conf/druid/_common:conf/druid/broker io.druid.cli.Main server broker


Historical: processing threads: 8, buffer size: 256000000 bytes

druid 10656 10626 9 12:05 ? 00:04:31 java -server -Xms3g -Xmx3g -XX:MaxDirectMemorySize=3g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.io.tmpdir=var/tmp -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -classpath lib/*:conf/druid/_common:conf/druid/historical io.druid.cli.Main server historical

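For context, here is my rough memory budget on this 16 GB box (the figure for the remaining services is just my own estimate, since I left them at roughly the quickstart-sized heaps):

    Historical:  3 GB heap + 3 GB direct    = 6 GB
    Broker:      1 GB heap + 1.75 GB direct ≈ 2.75 GB
    Coordinator / Overlord / MiddleManager / ZooKeeper: ~2-3 GB combined (estimate)
    Left for the OS and page cache (memory-mapped segments): roughly 4-5 GB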

Thank you very much,

Michael

Attachments: compact.png, non-compact.png

Kindly bumping, any more info would be much appreciated!

Thank you,

Michael