I am exporting data from Druid, processing it, and finally importing the processed data back into Druid.
I have a problem with hyperUnique metric fields.
I can export long, int, and double columns, but hyperUnique is special.
I see that a Hadoop index task can import a hyperUnique metric from datasourceA to datasourceB, and that works very well.
How can I store these values in CSV or Parquet and then import them again?
I also wonder how Druid currently stores this data type internally.
Continuing to debug:
HLLCV1 seems like a black box; I don't know how to get the string value of this object.
Two functions I found are getStorageBuffer and toByteBuffer, but they return a ByteBuffer, and one of them is protected.
So I guessed it must go through HyperUniquesSerde. I read getExtractor, but it seems to just call getRaw, which returns an HLLCV1 instance.
I wonder how HLLCV1 becomes the string shown in the select query below.
That string is the base64 encoding of the bytes you get from toByteBuffer. You should be able to read it into an HLLCV1 object by base64-decoding it and then doing HyperLogLogCollector.makeCollector(bytes).
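A minimal sketch of the decode path described above. The base64 handling is plain JDK; the `HyperLogLogCollector.makeCollector` step assumes Druid's processing module is on the classpath, so it is left as a comment, and the encoded string here is a hypothetical placeholder, not a real HLL payload:

```java
import java.util.Base64;

public class DecodeHllSketch {
    public static void main(String[] args) {
        // The string a select query returns is base64 text.
        // Placeholder payload for illustration only:
        String encoded = "AQAAAg==";

        // Step 1: base64-decode to raw bytes (no Druid needed).
        byte[] bytes = Base64.getDecoder().decode(encoded);
        System.out.println("decoded " + bytes.length + " bytes");

        // Step 2 (requires Druid on the classpath): rebuild the collector, e.g.
        // HyperLogLogCollector collector =
        //     HyperLogLogCollector.makeCollector(ByteBuffer.wrap(bytes));
    }
}
```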
Thank you very much, Gian.
I got it working, and I just want to explain for anyone interested:
HyperLogLogCollector has a toByteArray() method that lets us skip the ByteBuffer-to-byte-array conversion; Base64 can then handle the rest with its encodeToString function.
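The export side can be sketched the same way. The `collector.toByteArray()` call is Druid's API and assumes a collector instance, so it is shown as a comment with a placeholder byte array standing in for its result; the encoding itself is JDK Base64:

```java
import java.util.Arrays;
import java.util.Base64;

public class EncodeHllSketch {
    public static void main(String[] args) {
        // With Druid on the classpath you would get the bytes directly:
        // byte[] bytes = collector.toByteArray();
        byte[] bytes = {0x01, 0x00, 0x00, 0x02};  // placeholder payload

        // Base64-encode so the sketch fits in a CSV/Parquet string column.
        String encoded = Base64.getEncoder().encodeToString(bytes);
        System.out.println(encoded);

        // Round-trip check: decoding restores the original bytes.
        byte[] roundTrip = Base64.getDecoder().decode(encoded);
        System.out.println("round trip ok: " + Arrays.equals(bytes, roundTrip));
    }
}
```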