Ingest Approximate Histogram objects into Druid

Hi All,

I need to ingest a real-time data stream into Druid, and the stream also contains some histogram objects. I looked into the source code and found the class below, which implements ComplexMetricSerde (required for complex column types):

https://github.com/druid-io/druid/blob/master/extensions-core/histogram/src/main/java/io/druid/query/aggregation/histogram/ApproximateHistogramFoldingSerde.java

Now I am wondering whether it is possible to use the above type to ingest my histogram object into Druid as a complex column.

Currently we use Tranquility to ingest real-time data into Druid.

Please help me out.

Regards,

_ashwani

Follow-up: Can I ingest an ApproximateHistogram via Tranquility by creating the ApproximateHistogram object at run time and passing it in using ApproximateHistogramAggregator?

ping

ping

Hello Druid Users - It would be great if anyone could help me with this topic.

Regards,

_ashwani

Hi Ashwani,

I think the code is not currently in place for you to be able to load an actual approximate histogram object at ingest time. But what you can do is load an array of values. So you could pass in something like “histogram_value” : [10, 10, 15, 20] and then the histogram aggregator would incorporate all four values (two 10s, one 15, and one 20).
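To make the array approach concrete, here is a minimal sketch of an event carrying such an array and of what "incorporating all four values" amounts to. It is only an illustration: the class, the helper methods, and the "timestamp" field are hypothetical, and the exact-count map is a stand-in for the real approximate histogram, not Druid's implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: none of these names come from Druid or Tranquility.
final class HistogramArrayEventSketch {

    // Builds one event as a plain map. "histogram_value" matches the field name
    // used in the reply above; the "timestamp" field and its value are assumptions.
    static Map<String, Object> buildEvent() {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("timestamp", "2016-01-01T00:00:00Z");
        event.put("histogram_value", Arrays.asList(10, 10, 15, 20));
        return event;
    }

    // Stand-in for the aggregator: every array element is offered as its own
    // observation. Exact counts are kept here purely to show the effect.
    static Map<Double, Long> foldValues(List<? extends Number> values) {
        Map<Double, Long> counts = new TreeMap<>();
        for (Number v : values) {
            counts.merge(v.doubleValue(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Object> event = buildEvent();
        @SuppressWarnings("unchecked")
        List<Integer> values = (List<Integer>) event.get("histogram_value");
        // prints {10.0=2, 15.0=1, 20.0=1}
        System.out.println(foldValues(values));
    }
}
```

The point is that the event carries raw observations rather than a serialized histogram, so the sender does not need to know the histogram's internal format.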

Thanks for response Gian.

If I understand correctly, I can use the approxHistogram aggregator for the histogram field within Tranquility’s metricsSpec configuration and pass an array of values for that field to generate the approximate histogram.

Also, do we have any plans to support ingestion of histograms in the future?

Regards,

_ashwani

It would require additional development under the histogram plugin (io.druid.query.aggregation.histogram) to support a precomputed histogram object. An example of the same pattern can be found under the hyperunique aggregator, which accepts a precomputed HLL object. See io.druid.query.aggregation.hyperloglog.PreComputedHyperUniquesSerde and its usages for an example implementation approach.

Kyle

Hi Ashwani,

What Kyle said is accurate – it is definitely doable but would require some work on the histogram extension. If you are willing to get your hands dirty then a contribution would be welcome :)

Hi Gian,

I would be more than happy to contribute, but I would need some guidance from your side.

Can you point me to some related documentation which would be helpful?

Regards,

_ashwani

Hi Ashwani,

Check out how the “isInputHyperUnique” flag is handled in the HyperUniquesAggregatorFactory for an example of what you can do. The important part is in getTypeName - it changes the type to “preComputedHyperUnique”, which switches the serde from HyperUniquesSerde to PreComputedHyperUniquesSerde. Check out AggregatorsModule for where those are registered.
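For anyone following along, the pattern described above can be sketched roughly as follows. This is not the Druid code: the class names, the "isInputHistogram" flag, the type-name strings, and the registry map are all hypothetical stand-ins, chosen only to show how a flag on the factory can select a different serde through the type name.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a simplified model of the "flag switches the type name,
// type name selects the serde" pattern. Nothing here is a real Druid class.
final class PrecomputedTypeSwitchSketch {

    // Stand-in for a complex-metric serde.
    interface Serde {
        String describe();
    }

    // Stand-in for an aggregator factory carrying a hypothetical flag,
    // analogous in spirit to isInputHyperUnique.
    static final class HistogramFactorySketch {
        private final boolean isInputHistogram;

        HistogramFactorySketch(boolean isInputHistogram) {
            this.isInputHistogram = isInputHistogram;
        }

        // The flag changes the reported type name (both names are assumptions).
        String getTypeName() {
            return isInputHistogram ? "preComputedHistogramSketch" : "histogramSketch";
        }
    }

    // Stand-in for the module where serdes are registered by type name.
    static final Map<String, Serde> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("histogramSketch", () -> "builds a histogram from raw numeric values");
        REGISTRY.put("preComputedHistogramSketch", () -> "reads an already-built histogram object");
    }

    // Looks up the serde that matches the factory's type name.
    static Serde serdeFor(HistogramFactorySketch factory) {
        return REGISTRY.get(factory.getTypeName());
    }

    public static void main(String[] args) {
        System.out.println(serdeFor(new HistogramFactorySketch(false)).describe());
        System.out.println(serdeFor(new HistogramFactorySketch(true)).describe());
    }
}
```

In this shape, supporting precomputed input means adding one flag, one extra type name, and one extra registered serde, while the raw-value path stays untouched.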