How to export hyperUnique to CSV or Parquet and then import it again?

Hi everyone,

I am exporting data from Druid, processing it, and finally importing the processed data back into Druid.

I have a problem with hyperUnique metric fields.

I can export long, int, and double columns, but hyperUnique is kind of special.

I see that we can use Hadoop index tasks to import hyperUnique metrics from datasourceA into datasourceB, and that works very well.

How can I store them in CSV or Parquet and then import them again?

I also wonder how Druid currently stores this data type.

Regards,

Chanh

Continuing to debug:

HLLCV1 seems like a black box; I don't know how to get the string value of this object.

Some functions I looked at are getStorageBuffer and toByteBuffer, but these functions return bytes, and one of them is protected.

So I guess it must go through HyperUniquesSerde. I read getExtractor, but it seems to just call getRaw, which returns an HLLCV1 instance.

I wonder how the HLLCV1 becomes the string shown in the select query result below.

"event": {
  "timestamp": "2016-12-12T04:00:00.000Z",
  "carrier_id": "1000",
  "time_frame": "35",
  "age_range_id": "-1",
  "gender_id": "3",
  "user_sketches": "AQAAAgAAAAHVMAN3Ag==",
  "viewable": 2,
  "impression": 0,
  "click": 0
}

Hey Chanh,

That string is the base64 encoding of the bytes you get from toByteBuffer. You should be able to read it into an HLLCV1 object by base64-decoding it and then doing HyperLogLogCollector.makeCollector(bytes).
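For example, a minimal sketch of that decoding step in Scala (assuming Druid's io.druid.query.aggregation.hyperloglog.HyperLogLogCollector; the package path varies across Druid versions):

import java.nio.ByteBuffer
import java.util.Base64
import io.druid.query.aggregation.hyperloglog.HyperLogLogCollector

// The base64 string as it appears in the select query result above
val encoded = "AQAAAgAAAAHVMAN3Ag=="

// Decode to raw bytes and rebuild the collector from them
val bytes = Base64.getDecoder.decode(encoded)
val collector = HyperLogLogCollector.makeCollector(ByteBuffer.wrap(bytes))

// The rebuilt collector can be folded with others or queried for its estimate
println(collector.estimateCardinality())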

Thank you very much, Gian.

I got it, and I just want to explain it for anyone who is interested:

There is a function called toByteArray() in HyperLogLogCollector that lets us skip the conversion from ByteBuffer to byte array; then Base64 can handle the rest with its encodeToString function.

Base64.getEncoder.encodeToString(value.asInstanceOf[HLLCV1].toByteArray)
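Putting both directions together, a hedged round-trip sketch (same class and package assumptions as above; makeLatestCollector() and estimateCardinality() are from HyperLogLogCollector's public API):

import java.nio.ByteBuffer
import java.util.Base64
import io.druid.query.aggregation.hyperloglog.HyperLogLogCollector

// Export side: collector -> base64 string (the value written to the CSV/Parquet column)
def encode(collector: HyperLogLogCollector): String =
  Base64.getEncoder.encodeToString(collector.toByteArray)

// Import side: base64 string -> collector (what the ingestion step reads back)
def decode(encoded: String): HyperLogLogCollector =
  HyperLogLogCollector.makeCollector(ByteBuffer.wrap(Base64.getDecoder.decode(encoded)))

// A round trip should preserve the sketch and its cardinality estimate
val original = HyperLogLogCollector.makeLatestCollector()
val restored = decode(encode(original))
assert(original.estimateCardinality() == restored.estimateCardinality())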

Regards,

Chanh