JavaScript in virtual columns


I know I cannot use JavaScript in a virtual column, but I am in a situation where I need to: I have a complicated column that needs some preprocessing before I run the actual aggregations in the query. The preprocessing has to happen on the fly; it’s not something I can do beforehand. How can I do this? Is there any way to “select” a JavaScript transform? It looks like the only option right now is to create a subquery that is a groupBy, include all columns in the dimensions, and add a dimension that does an extraction using this JavaScript. But that’s obviously not ideal.


On a similar note - is there a way for me to write my own custom functions that can be used in expressions?

What transformation are you trying to achieve? Perhaps someone in the community has already done something similar or has ideas on how to achieve it.

You can add your own extensions to Druid and build a new transformation function. The “Creating extensions” page of the Apache Druid docs gives the first hint on how to do that:

  1. Add new ingest transform by implementing the org.apache.druid.segment.transform.Transform interface from the druid-processing package.

Hi Sergio, one of my columns is a string that is formatted so I can use it like a map (because it seems Druid does not support ingesting a map or array; please correct me if that is not the case). So I need to parse the map, filter on the keys of the map, and aggregate the values in the map.

Currently I store the map using two delimiters: one separates a key from its value, the other separates the key-value pairs from each other. I can use a JavaScript aggregation for certain aggregations (e.g. if I am just summing up the values across all rows), but if I want to do something like variance, I think I would need to create a virtual column that sums up the values per row, and then compute the variance across that virtual column.
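To make the per-row preprocessing concrete, here is a minimal Python sketch of the computation described above. It is not Druid code. The delimiters (`,` between pairs, `:` between key and value), the sample rows, and the function names are all assumptions made for illustration.

```python
from statistics import pvariance

def parse_map(encoded, pair_sep=",", kv_sep=":"):
    """Parse a delimited string such as 'a:1,b:2' into a dict of floats.

    The delimiters are hypothetical; substitute whatever the column uses.
    """
    result = {}
    if not encoded:
        return result
    for pair in encoded.split(pair_sep):
        key, _, value = pair.partition(kv_sep)
        result[key] = float(value)
    return result

def row_sum(encoded, wanted_keys):
    """Sum the values of one row whose keys pass the filter."""
    parsed = parse_map(encoded)
    return sum(v for k, v in parsed.items() if k in wanted_keys)

# Hypothetical rows standing in for the string column.
rows = ["a:1,b:2,c:3", "a:4,c:5", "b:6"]

# Step 1: the would-be virtual column, one filtered sum per row.
per_row = [row_sum(r, {"a", "b"}) for r in rows]

# Step 2: the aggregation across rows (population variance here).
variance = pvariance(per_row)
```

The two steps mirror the two stages in the query: a row-level value first, then an aggregator over that value.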

It looks like my understanding might be wrong. I think what I can do here is have two columns with arrays of values, one with the map keys and one with the map values, and then use cartesian_fold to filter on one column and aggregate the other.
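As a rough illustration of the two-array idea, the Python sketch below folds over keys and values paired by position, adding a value to the accumulator only when its key passes the filter. This shows the intended result only; the function name and sample data are hypothetical, and whether Druid's cartesian_fold pairs elements by position like this (rather than visiting every combination of the two arrays) should be checked against the expression docs.

```python
from functools import reduce

def filtered_sum(keys, values, wanted_keys):
    """Fold over (key, value) pairs matched by position.

    The accumulator grows by the value only when the key is wanted.
    """
    return reduce(
        lambda acc, kv: acc + kv[1] if kv[0] in wanted_keys else acc,
        zip(keys, values),
        0.0,
    )

# Hypothetical row: one array column of keys, one of values.
keys = ["a", "b", "c"]
values = [1, 2, 3]
total = filtered_sum(keys, values, {"a", "c"})
```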