Druid ingestion-time roll-up vs. query-time roll-up discrepancy for Theta Sketch

Hi Druid team,

I observed something odd.

I use K=4096 in my Spark job to pre-compute the theta sketches and store each one as a Base64-encoded string of its byte array. I then use K=67108864 when ingesting into Druid with roll-up set to true. Many of the resulting estimates are far below the expected values (expecting 150 million unique users but getting 13k unique users).
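
For illustration, the setup described above might look roughly like the following. This is only a minimal sketch, not the actual job or ingestion spec: the column names `user_sketch` and `unique_users` and the placeholder bytes are assumptions made up for the example, and `isInputThetaSketch` and `size` are the thetaSketch aggregator options as I understand them from the Druid docs.

```python
import base64
import json

# Placeholder bytes standing in for a serialized theta sketch.
# In the real pipeline these would come from the Spark job (K=4096),
# not from this literal.
sketch_bytes = b"\x01\x02\x03\x04"

# One input row: the sketch bytes stored as a Base64-encoded string.
# "user_sketch" is a hypothetical column name.
row = {"user_sketch": base64.b64encode(sketch_bytes).decode("ascii")}

# thetaSketch aggregator for the metricsSpec of the ingestion spec.
# "isInputThetaSketch" marks the input column as already containing sketches;
# "size" is the K used on the Druid side and is expected to be a power of 2.
aggregator = {
    "type": "thetaSketch",
    "name": "unique_users",      # hypothetical metric name
    "fieldName": "user_sketch",  # hypothetical input column
    "isInputThetaSketch": True,
    "size": 67108864,
}

print(json.dumps(row))
print(json.dumps({"metricsSpec": [aggregator]}, indent=2))
```

Note that the two K values differ by a large factor here (4096 in Spark vs. 67108864 in Druid), which is the part of the setup I am least sure about.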

However, when I ingest the same pre-computed Spark data into Druid without roll-up at ingestion time and then query the unique users with APPROX_COUNT_DISTINCT_DS_THETA, I get the correct result. Why is there a discrepancy?

Also, is a Base64 string the only format in which Druid can load and interpret a pre-computed sketch object?

I'm using Druid 0.15.1 and DataSketches core 0.13.3.