The raw thetaSketch byte array is too large (>> 16k) when I set context.finalize to false

I want to compute the intersection of unique users (thetaSketch) across timeframes T1 and T2, but Druid cannot do this with a built-in function.

So I set context.finalize to false to fetch the raw thetaSketch byte array, and tried to do the set operation on the client side.

I expected the byte array to be about 16k in size, but the result was 174796 characters long (base64-encoded), and I got an exception:


java.lang.RuntimeException: Input sketch too large for allocated memory
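Note that 174796 is a count of base64 characters, not bytes. Base64 encodes every 3 bytes as 4 characters, so the raw sketch is at most 174796 / 4 * 3 = 131097 bytes. The minimal sketch below (class and method names are hypothetical, standard library only) does that arithmetic and shows the client-side decoding step that would produce the byte array handed to the sketch library:

```java
import java.util.Base64;

public class SketchSize {
    // Upper bound on decoded bytes for a padded base64 string of the given length:
    // every 4 base64 characters encode at most 3 bytes (padding can only reduce it).
    static int maxDecodedBytes(int base64Chars) {
        return (base64Chars / 4) * 3;
    }

    // Decodes the base64 sketch string returned by Druid (context.finalize = false)
    // into the raw byte array a client-side sketch library would wrap.
    static byte[] decodeSketch(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // The 174796-character result from the query above.
        System.out.println(maxDecodedBytes(174796)); // 131097
    }
}
```

So the decoded sketch is roughly 128 KiB, which is much larger than the 16k nominal-entry count but still a normal size for a sketch with 16384 nominal entries.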

I ingested data into Druid with the default settings:

{
  "name": "unique_user",
  "type": "thetaSketch",
  "fieldName": "suuid"
}

What am I doing wrong, and how can I fix it?

Thank you all.




Sorry, I misunderstood the meaning of Nominal Entries.

Druid's default is 16k (16384) nominal entries, while the Yahoo DataSketches library defaults to 4k (4096).

It can be fixed by building the client-side set operation with SetOperationBuilder.setNominalEntries(1024*16), so that it matches Druid's default.

On Tuesday, May 17, 2016 at 11:00:41 PM UTC+8, hui li wrote:

The “size” set by the user is the number of nominal entries; the byte size of the contents is upper-bounded by SetOperation.getMaxUnionBytes(size). So after decoding the base64 string into bytes, the byte length should be less than SetOperation.getMaxUnionBytes(size).

For a 16K sketch, that value is 262176 bytes.
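To see why the default 4k set operation rejects this sketch, here is a stdlib-only sketch of the bound. It assumes getMaxUnionBytes rounds the nominal entries up to a power of two, allows 16 bytes per entry, and adds 4 preamble longs (32 bytes). This is our reading of the library, chosen because it reproduces the 262176-byte figure above; it is not the library's code, and the class name is hypothetical.

```java
public class UnionSizing {
    // Assumption: 4 preamble longs of 8 bytes each; this reproduces
    // the 262176-byte figure quoted for a 16K sketch.
    static final int UNION_PREAMBLE_BYTES = 4 << 3;

    // Smallest power of two >= n (for n >= 1).
    static int ceilingPowerOf2(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // Assumed bound: 16 bytes per rounded nominal entry
    // (a hash table of 2k 8-byte longs) plus the preamble.
    static int maxUnionBytes(int nominalEntries) {
        return (ceilingPowerOf2(nominalEntries) << 4) + UNION_PREAMBLE_BYTES;
    }

    public static void main(String[] args) {
        int sketchBytes = 131097; // upper bound for a 174796-character base64 string
        for (int k : new int[] {4096, 16384}) {
            int max = maxUnionBytes(k);
            System.out.printf("k=%d max=%d fits=%b%n", k, max, sketchBytes <= max);
        }
    }
}
```

Under these assumptions, a 4096-entry set operation is bounded at 65568 bytes, which is too small for a roughly 131 KB sketch. That is consistent with the "Input sketch too large for allocated memory" exception. A 16384-entry set operation (262176 bytes) has enough room.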

– Himanshu