Issue: large-size thetaSketch performance

Hi guys,

Druid and thetaSketch are awesome, but I have run into a performance issue and a suspected bug.

I am processing a big datasource that has about 250,000,000 rows and about 2,000,000 unique user IDs per day.

I ingest a thetaSketch metric with size 4194304 (16384 * 256) to get an accurate result. Segment granularity is HOUR, and each segment has around 1~30 shards (limited by targetPartitionSize=500000); each shard is about 100MB.
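(As a quick arithmetic sanity check on the shard count, assuming rows are spread evenly across hours: 250,000,000 rows / 24 hours / 500,000 rows per shard ≈ 21 shards per hour, which is consistent with the 1~30 range.)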

ingest schema:

```json
{
  "type": "thetaSketch",
  "name": "uid_sketch_x256",
  "fieldName": "uid",
  "size": 4194304
}
```

query:

```json
{
  "queryType": "timeseries",
  "dataSource": "myds",
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "pv", "fieldName": "cnt" },
    { "type": "thetaSketch", "name": "uv", "fieldName": "uid_sketch_x256", "size": 4194304 }
  ],
  "postAggregations": [],
  "intervals": [
    "2016-01-06T00:00:00+08:00/2016-01-07T00:00:00+08:00"
  ]
}
```

response (in 1~2 minutes), CORRECT result:

```json
[
  {
    "timestamp": "2016-01-05T16:00:00.000Z",
    "result": {
      "uv": 2252174,
      "pv": 247924867
    }
  }
]
```

Because that takes too long, I tried some optimization. First I tried "chunkPeriod": "PT1H".

query:

```json
{
  "queryType": "timeseries",
  "dataSource": "myds",
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "pv", "fieldName": "cnt" },
    { "type": "thetaSketch", "name": "uv", "fieldName": "uid_sketch_x256", "size": 4194304 }
  ],
  "postAggregations": [],
  "intervals": [
    "2016-01-06T00:00:00+08:00/2016-01-07T00:00:00+08:00"
  ],
  "context": {
    "chunkPeriod": "PT1H"
  }
}
```

response (in 8 seconds), but WRONG result:

```json
[
  {
    "timestamp": "2016-01-06T15:00:00.000Z",
    "result": {
      "uv": 218137,
      "pv": 12712584
    }
  }
]
```

I also tried another way to improve performance: switching the thetaSketch size to 65536 at query time.

query:

```json
{
  "queryType": "timeseries",
  "dataSource": "myds",
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "pv", "fieldName": "cnt" },
    { "type": "thetaSketch", "name": "uv", "fieldName": "uid_sketch_x256", "size": 65536 }
  ],
  "postAggregations": [],
  "intervals": [
    "2016-01-06T00:00:00+08:00/2016-01-07T00:00:00+08:00"
  ]
}
```

response (in 3 seconds), which seems "reasonably correct":

```json
[
  {
    "timestamp": "2016-01-05T16:00:00.000Z",
    "result": {
      "uv": 2246892.3929119823,
      "pv": 247924867
    }
  }
]
```

My questions are:

  1. How could I improve thetaSketch performance if I use a big sketch size such as 4194304?

When I inspected the broker, it seemed to spend a very long time merging results from the historicals in a single thread.

a) Why does the broker take so long? Is merging the results of tens/hundreds of segments really that complex?

b) Can we improve performance by using multiple threads? I mean divide first and then conquer in multiple threads.

c) Can we do some merging on the historical nodes and reduce the pressure on the broker's single thread?

d) Can we use a GPU to improve performance? (I guess most of the computation is parallel bit manipulation, which a GPU may be good at.)

  2. If I use a smaller thetaSketch size of 65536 at query time (compared to the ingest thetaSketch size of 4194304), can I still get a "reasonably correct" result? I ask because I saw this in the docs:
  //note that after you index with a particular size, druid will persist sketch in segments
  //and you "will" use size greater or equal to that at query time.

Does "will" mean "must"? Or is it just a recommendation?

  3. What does "chunkPeriod" mean? Why does it produce a WRONG result? Is it a bug?

That's a lot of questions; thanks for any responses.

Hi,

Please see inline.


My questions are:

  1. How could I improve thetaSketch performance if I use a big sketch size such as 4194304?

H: Unfortunately, there isn't much you can do, except that if the bottleneck is GC, increasing the heap will help. Monitor the GC activity and see whether more memory is needed. It may be that you need to increase the size of the young generation so that objects are not promoted to the old generation prematurely.
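For example, a starting point for inspecting GC and sizing the young generation on the broker/historical JVMs might look like the following (these are standard HotSpot flags for the Java 8 era; the sizes are illustrative placeholders, not recommendations):

```
# hypothetical jvm.config fragment -- adjust sizes to your hardware
-Xms24g              # initial heap
-Xmx24g              # max heap; raise this if GC is the bottleneck
-Xmn8g               # a larger young generation delays premature promotion
-verbose:gc          # log GC activity to see where the time goes
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
```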

When I inspected the broker, it seemed to spend a very long time merging results from the historicals in a single thread.

a) Why does the broker take so long? Is merging the results of tens/hundreds of segments really that complex?

H: Yes, sketch merge expense (both CPU and memory) goes up as the size of the sketch increases.

b) Can we improve performance by using multiple threads? I mean divide first and then conquer in multiple threads.

H: chunkPeriod is supposed to do that. But your result shows that the combination of sketches and chunkPeriod is probably not playing well together, and there is potentially a bug. I will look into it to see whether there is really a bug or something else is going on.

c) Can we do some merging on the historical nodes and reduce the pressure on the broker's single thread?

H: As much as possible, merging is done at the historicals (unless you are using caching at the brokers).

d) Can we use a GPU to improve performance? (I guess most of the computation is parallel bit manipulation, which a GPU may be good at.)

H: Maybe; I haven't experimented.

  2. If I use a smaller thetaSketch size of 65536 at query time (compared to the ingest thetaSketch size of 4194304), can I still get a "reasonably correct" result? I ask because I saw this in the docs:
  //note that after you index with a particular size, druid will persist sketch in segments
  //and you "will" use size greater or equal to that at query time.

Does "will" mean "must"? Or is it just a recommendation?

H: It will "work", but if you use a smaller size at query time, then the accuracy you get will correspond to the size used at query time.
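As a rough sanity check (assuming the commonly cited relative standard error of roughly 1/sqrt(k) for a theta sketch with nominal size k; exact bounds depend on the sketch configuration), the expected error at each size can be computed directly. The deviation observed above (2246892 vs. 2252174, about 0.23%) fits comfortably within the k = 65536 band:

```java
// Back-of-the-envelope theta sketch accuracy, assuming RSE ~= 1/sqrt(k).
public class SketchError {
    public static void main(String[] args) {
        long uniques = 2_250_000L;            // ~daily unique user IDs
        long[] sizes = {65_536L, 4_194_304L}; // query-time vs. ingest-time k
        for (long k : sizes) {
            double rse = 1.0 / Math.sqrt(k);  // relative standard error
            System.out.printf("k=%d: RSE ~= %.4f%% (~%.0f users at %d uniques)%n",
                    k, rse * 100, rse * uniques, uniques);
        }
        // Prints approximately:
        // k=65536: RSE ~= 0.3906% (~8789 users at 2250000 uniques)
        // k=4194304: RSE ~= 0.0488% (~1099 users at 2250000 uniques)
    }
}
```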

  3. What does "chunkPeriod" mean? Why does it produce a WRONG result? Is it a bug?

H: It breaks the query up into multiple queries (one per interval chunk), processes each chunk in parallel, and then merges the results. It appears the combination of sketches and chunkPeriod is probably not playing well together, and there is potentially a bug. I will look into it to see whether there is really a bug or something else is going on.
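To illustrate the divide-and-conquer idea outside of Druid, here is a minimal sketch of a parallel theta sketch merge using the DataSketches library (org.apache.datasketches in current releases). The partitioning scheme, nominal size, and thread count are illustrative assumptions, not Druid's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.datasketches.theta.CompactSketch;
import org.apache.datasketches.theta.SetOperation;
import org.apache.datasketches.theta.Sketch;
import org.apache.datasketches.theta.Union;

public class ParallelMerge {
    // Merge per-segment sketches in parallel chunks, then merge the partials.
    static Sketch merge(List<Sketch> segments, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<CompactSketch>> partials = new ArrayList<>();
        int chunk = (segments.size() + threads - 1) / threads;
        for (int i = 0; i < segments.size(); i += chunk) {
            List<Sketch> slice =
                    segments.subList(i, Math.min(i + chunk, segments.size()));
            partials.add(pool.submit(() -> {
                // Each thread unions its own slice -- this is the expensive part.
                Union u = SetOperation.builder()
                        .setNominalEntries(4_194_304).buildUnion();
                slice.forEach(u::union);
                return u.getResult();
            }));
        }
        // The final merge of a handful of partial results is cheap by comparison.
        Union fin = SetOperation.builder()
                .setNominalEntries(4_194_304).buildUnion();
        for (Future<CompactSketch> f : partials) {
            fin.union(f.get());
        }
        pool.shutdown();
        return fin.getResult();
    }
}
```

This is essentially what chunkPeriod aims for at the query level: each one-hour chunk is merged independently, and only the small per-chunk results meet on a single thread at the end.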

Thank you very much for your quick response.

"As much as possible, merging is done at the historicals (unless you are using caching at the brokers)."

Do you mean that caching at the broker may be slower than caching at the historicals? If that is true, I will try to use the cache only on the historicals.

chunkPeriod looks promising; I hope the fix lands soon :slight_smile:

On Tuesday, March 1, 2016 at 2:15:19 AM UTC+8, Himanshu Gupta wrote:

Hi,

Yes, if you populate the cache at the broker, then merging is not done at the historicals. So don't do that if you are using thetaSketch (or anything else where merging is expensive).
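A minimal sketch of the corresponding runtime properties (double-check the property names against the Druid docs for your version): populate and use the cache on the historicals, and leave the broker cache off so merging stays on the historicals:

```
# historical runtime.properties -- cache per-segment results locally
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true

# broker runtime.properties -- don't populate the broker cache,
# so merged results are produced at the historicals
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
```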

– Himanshu

Thanks, I will try it.

On Wednesday, March 2, 2016 at 1:06:55 AM UTC+8, Himanshu Gupta wrote:

Hi,

Is it possible for you to check whether the patch in https://github.com/druid-io/druid/pull/2601 fixes the chunkPeriod problem?

– Himanshu

I will try it on version 163e536.

On Tuesday, March 8, 2016 at 5:06:24 AM UTC+8, Himanshu Gupta wrote:

It works; the results are correct. Query times:

without chunkPeriod: 57793ms

with chunkPeriod: 16799ms

Not ideal, but much faster.

On Wednesday, March 9, 2016 at 2:03:59 PM UTC+8, 李斯宁 wrote:

Thanks for checking; glad to know it worked. With chunkPeriod you might have to experiment a little to see what works best. It appears it reduced latency to about a third, so that is good.

– Himanshu

Himanshu, should we enable chunkPeriod by default? Others will experience these performance problems but may not ask in the forums.

Also, this may be interesting to people: