I have a Druid cluster, and I use Spark Streaming + Tranquility to push streamed events to Druid. While ingesting, I also run queries on the data. 90% of the queries return within 1s, but occasionally a query takes more than 100s. I checked the query metrics on the broker, which show that query/node/ttfb is about 100s and that the query is served by worker-node:8100. The task metrics on that worker node show that query/wait/time is very long:
2018-07-17T09:10:47,985 INFO [timeseries_apm_metrics_[2018-07-17T09:00:00.000Z/2018-07-17T09:04:50.000Z]] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2018-07-17T09:10:47.985Z","service":"druid/middleManager","host":"emr-worker-1.cluster-64941:8100","version":"0.11.0","metric":"query/wait/time","value":107970,"dataSource":"apm_metrics","duration":"PT290S","hasFilters":"true","id":"994f07d2-0fd0-4a5c-8583-2b05fa1a107d","interval":["2018-07-17T09:00:00.000Z/2018-07-17T09:04:50.000Z"],"numComplexMetrics":"0","numMetrics":"4","segment":"apm_metrics_2018-07-17T09:00:00.000Z_2018-07-17T09:05:00.000Z_2018-07-17T09:02:39.915Z_1","type":"timeseries"}]
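To see how often this happens and which segments are affected, the task log can be scanned for query/wait/time events above a threshold. This is a minimal sketch that assumes the log lines look like the one above (a LoggingEmitter line ending in `Event [...]` with a JSON array, using straight quotes as in the raw log file); the function name and the 10s default threshold are hypothetical choices, not part of Druid.

```python
import json
from typing import Iterable, List, Tuple

# Text that precedes the JSON payload in LoggingEmitter lines (assumed from
# the log line above; other emitter configurations may log differently).
EVENT_MARKER = "Event ["


def slow_segment_waits(lines: Iterable[str], threshold_ms: float = 10_000,
                       metric: str = "query/wait/time") -> List[Tuple[str, str, float]]:
    """Return (query id, segment, value in ms) for metric events above threshold_ms."""
    hits = []
    for line in lines:
        pos = line.find(EVENT_MARKER)
        if pos < 0:
            continue  # not a metric event line
        # From "[" to the end of the line is a JSON array of event objects.
        payload = line[pos + len("Event "):].strip()
        try:
            events = json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line: skip it
        for ev in events:
            if not isinstance(ev, dict):
                continue
            if ev.get("metric") == metric and ev.get("value", 0) > threshold_ms:
                hits.append((ev.get("id", ""), ev.get("segment", ""), float(ev["value"])))
    return hits
```

For example, `with open("task.log") as f: print(slow_segment_waits(f))` lists every wait longer than 10s; grouping the hits by segment shows whether the slow queries all hit the segment that is currently being ingested into.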
My event stream is about 10k events/s, and the query rate is about 30 queries/s. I also tried increasing the number of partitions of the stream, but it didn't seem to help.
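query/wait/time measures how long a segment waits before it is scanned, so a long wait usually means the peon's processing threads (druid.processing.numThreads) are saturated. A rough back-of-envelope check is offered load divided by capacity. The sketch below is hypothetical: only the 30 queries/s comes from this post, while segments per query, scan time per segment, and thread count are placeholders to replace with measured values (for example from query/segment/time).

```python
def processing_utilization(queries_per_s: float, segments_per_query: float,
                           scan_ms_per_segment: float, processing_threads: int) -> float:
    """Fraction of processing-thread capacity consumed by queries.

    Values near or above 1.0 mean segments queue up and query/wait/time grows;
    at or above 1.0 the queue grows without bound until load drops.
    """
    # Thread-seconds of scanning work arriving per second of wall-clock time.
    busy_thread_seconds_per_s = queries_per_s * segments_per_query * scan_ms_per_segment / 1000.0
    return busy_thread_seconds_per_s / processing_threads
```

With hypothetical numbers of 30 queries/s, 1 segment per query, 50 ms per scan, and 2 threads, utilization is 0.75; if each query instead touches 4 segments, it rises to 3.0 and waits keep growing. Adding stream partitions would only help if it also spreads the query load over more processing threads.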
Can anybody give me some hints?