Doubling box capacity doesn't lower latency

One of the biggest factors attracting me to Druid was that, given enough hardware, one can drive latency down to whatever level is needed.

However, I am having trouble achieving that.

Scenario:

My data source is 40 GB, replicated across 2 historicals (40 GB per historical * 2 = 80 GB total).

This 40 GB covers 4 months of data; however, we only ever query 3 months at a time.

My aim is to get the 3-month query (across approx. 30 GB of data) down to about 200 ms latency.

Here is what I tried:

2 historicals on Google Cloud.

First try:

n1-highmem-16 instances for historicals.

Each instance has 16 cores and 104 GB of RAM.

Thus each query got 32 cores to execute on.

Here we saw response times of up to 800 ms-1 s.

Second try:

n1-standard-32 instances

Each instance has 32 cores and 120 GB of RAM.

Thus each query got 64 cores to execute on.

However, I now see latency of 1s.

So not only did the latency not improve by doubling the cores, it actually got worse!

Here is a graph showing performance:

http://screencast.com/t/WFxZp99neM

The historical-32 nodes are the second-try ones with 32 cores.

historical-1 and historical-2 are the first-try nodes with 16 cores.

Configuration

In all cases, the config was:

# 1 GB
druid.processing.buffer.sizeBytes=107374182
druid.processing.numThreads=
# 100 GB
druid.server.maxSize=100000000000
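
(Note: 107374182 bytes is actually about 100 MB rather than 1 GB, so a digit may have been dropped when pasting, and the numThreads value got cut off above.) For comparison, here is a sketch of how these settings are commonly sized for the two node types, following the usual numThreads ≈ cores - 1 guidance; these are illustrative values, not our actual config:

# n1-highmem-16 historical (16 cores) -- illustrative values, not our real settings
druid.processing.numThreads=15
druid.processing.buffer.sizeBytes=1073741824

# n1-standard-32 historical (32 cores) -- illustrative values, not our real settings
druid.processing.numThreads=31
druid.processing.buffer.sizeBytes=1073741824

# Direct memory has to cover roughly buffer.sizeBytes * (numThreads + 1),
# so -XX:MaxDirectMemorySize must be raised to match.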

Is there anything I could be doing wrong here?

Why doesn't increasing the core count lower latency?

What can I do to get to my desired latency level?

Hi Prashant, when trying to understand query latency, it is also good to understand where the bottlenecks lie.

One of the most important metrics to look at is how long queries are taking at the broker level, versus the time taken at the historical level, versus the time taken to scan segments.
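
If it helps, one rough way to get those three numbers is to have each node emit its query metrics to its own log, then compare query/time as reported by the broker against query/time and query/segment/time as reported by the historicals. A minimal sketch of the emitter settings (please double-check the property names against your Druid version):

# common.runtime.properties (or per-node runtime.properties) -- sketch only
druid.emitter=logging
druid.emitter.logging.logLevel=info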

I'd also be curious to learn more about how you are running the benchmarks. Are you running the same queries over again without cache?

How many segments are in this 40 GB of data? I recall from looking at your cluster before that you had very small segments; you should be creating segments of approx. 250 MB.

I believe with your setup, the query times are not bounded by your hardware but instead by your configuration and general setup. Sharing that information with us would help with debugging things further.

Fangjin,
As you can see in the screenshots I posted, the response times shown are for the historical nodes only.

The reason is that, comparing the response time at the broker (about 1 s) with the response time at the historicals (also about 1 s), it's clear the slowdown is in the historicals.

If it weren't, the broker's response time would have been much higher than the historicals'.

The queries are indeed running without cache.

It's not the exact same query, since in our dashboard we do things like 'last 90 days', 'last 30 days', etc.

Since 'last N days' changes over time, we can't really cache these queries.
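
For completeness, caching can also be switched off explicitly via the broker and historical cache properties; a sketch (from memory, not a paste of our actual files):

# broker runtime.properties
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

# historical runtime.properties
druid.historical.cache.useCache=false
druid.historical.cache.populateCache=false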

It seems there are about 5,800 segments loaded. Since the replication factor is 2, I assume there are 5,800/2 = 2,900 segments in the 40 GB of data.

The segment size varies, but the last couple hundred segments range from about 30 MB up to a max of 45 MB each.

The point is that, since all the data is in RAM in both scenarios, the only limiting factor can be the CPU.

Since the total CPU count was increased from 32 cores to 64 cores, we should have seen at least some latency drop.

However, that is not the case, which is highly surprising.

Hi Prashant, what queries are you issuing against the nodes? How many times did you issue the queries?

In any case, I think that for you to drop query latency substantially, you will need larger segments. We typically size our segments between 250 MB and 500 MB. Druid has built-in merging logic, leveraging the indexing service, to create segments of a target size, and I think that will be beneficial for your setup.
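
To put some numbers on it: 2,900 segments holding roughly 40 GB works out to about 14 MB per segment on average, far below the 250-500 MB target, so each query is paying per-segment scheduling and result-merging overhead across thousands of tiny segments, and that overhead, rather than raw CPU, tends to dominate the scan time. If I remember the coordinator configuration of this era correctly (treat the property name as an assumption and verify it against your version's docs), automatic merging of small segments can be turned on at the coordinator, provided the indexing service (overlord plus middle managers) is running:

# coordinator runtime.properties -- sketch, assuming the indexing service is available
druid.coordinator.merge.on=true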