high query response time

Hi,
We are looking to improve query response times on our Druid cluster. Right now some queries take thousands of milliseconds to respond. We expect them to be <1s, which is what Druid describes as a normal value for 'query/time'. We are not sure whether it's the type of queries we have or the cluster configuration that needs to be addressed to improve response times. Any guidance from folks here is appreciated.

Please find attached files containing the query, the peon task log, the broker log, and config info.

I will be happy to provide more information.

Thanks,

Nikhil.

query.txt (2.85 KB)

broker.txt (18.4 KB)

indexer-peon.txt (13.6 KB)

important-configs.txt (1.63 KB)

Hi Nikhil,
Have you read this? http://druid.io/docs/latest/operations/metrics.html

I would recommend that you enable some kind of metrics logging in order to break the query time down into smaller pieces, so that you can figure out where the bottleneck is.

Please activate and log metrics, and then I will be more than happy to work with you on this.
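As a rough illustration, something along these lines in common.runtime.properties would emit query metrics to the process logs (a minimal sketch; exact property values and monitor class names depend on your Druid version):

# Send metrics to the logging emitter (sketch, not your exact config)
druid.emitter=logging
druid.emitter.logging.logLevel=info
# Optional: also emit JVM-level metrics
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]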

Thanks for responding Slim.
Actually the attachments have logs with metrics enabled.

So this particular query I ran only has to go to the indexer peon task to fetch the current hour's data. Query results for previous hours are already in the cache because I ran the same query a few minutes before.

For example, the indexer peon task shows this metric:

"metric":"query/partial/time","value":17667

Thanks,

Nikhil.

Hi,
Could someone please check the logs and comment on our query and cluster config?
Thanks.
Nikhil.

Hi Nikhil,
Reading the logs, I see that most of your query time is consumed by one index node on one specific incremental segment (an open segment). It took 17667 ms to scan it.
So one way to fix this is to have more partitions, so that the work can be done in parallel and the broker will merge the results.

Broker:
query/time 17697
INDEX NODE:
query/partial/time 17667 -> time to scan an incremental segment
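For example, each realtime task could carry its own partition number in its tuningConfig (a sketch assuming the standard realtime spec with a linear shardSpec; the second task would use partitionNum 1, and so on):

"tuningConfig" : {
  "type" : "realtime",
  ...
  "shardSpec" : { "type" : "linear", "partitionNum" : 0 }
}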

Thank you for taking the time to look into it.
We tried with 2 indexer peon tasks such that each task reads from 2 partitions (we have 4 partitions on Kafka). That resulted in 2 different segments for each hour (segmentGranularity is HOUR). The segments were very small (~50 MB) and automatic segment consolidation using 'druid.coordinator.merge.on=true' stopped working. So I tried combining them into one big segment using a custom Append task, as sketched below. It didn't work out. Later I read in this forum that segment consolidation works only when there is 1 peon task creating the segments, so I had to switch back to a single peon task.
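For reference, an append task spec roughly has this shape (per the Druid miscellaneous-tasks docs; task id and the actual segment descriptors are elided here):

{
  "type" : "append",
  "id" : <task_id>,
  "dataSource" : "svctrace",
  "segments" : <list of segment descriptors to append, in order>
}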

Let me know if my understanding about merging tasks is wrong. I can give it another try after your suggestions.

How many rows do you have per segment?
Usually you don't want to have more than 5M rows per segment. If that is the case, it might be the bottleneck.

I have configured the following in task.json, so it ends up creating ~30-40 directories for an hour in the indexer peon:

"tuningConfig" : {
  "type" : "realtime",
  "maxRowsInMemory" : 50000,
  "intermediatePersistPeriod" : "PT2M",
  "windowPeriod" : "PT5M",

Segment info:

{"metadata":{"dataSource":"svctrace","interval":"2016-02-23T18:00:00.000Z/2016-02-24T12:00:00.000Z","version":"2016-02-24T12:13:07.300Z","loadSpec":{"type":"local","path":"/druid03/export/svctrace/2016-02-23T18:00:00.000Z_2016-02-24T12:00:00.000Z/2016-02-24T12:13:07.300Z/0/index.zip"},"dimensions":"cp,dp,p,s,so,sp","metrics":"count,duration","shardSpec":{"type":"none"},"binaryVersion":9,"size":482952089,"identifier":"svctrace_2016-02-23T18:00:00.000Z_2016-02-24T12:00:00.000Z_2016-02-24T12:13:07.300Z"},"servers":["historical1:8080","historical2:8080"]}

How do I find the number of rows in a segment?

http://druid.io/docs/latest/operations/performance-faq.html

or

http://imply.io/ if you need dedicated help
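For the row-count question, one way to check is a segmentMetadata query against the broker, which reports per-segment details (numRows is included in the response in recent Druid versions). A minimal sketch, using the dataSource and interval from the segment info above:

{
  "queryType" : "segmentMetadata",
  "dataSource" : "svctrace",
  "intervals" : ["2016-02-23T18:00:00.000Z/2016-02-24T12:00:00.000Z"],
  "merge" : false
}

POST it to the broker's /druid/v2/ endpoint and inspect the analysis returned for each segment.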