Multi-value field

Hi,
I have a use case where multi-value functionality could be useful.

Searching the documentation and the user group, I found that it was available up to version 0.8.0.

Can you please confirm that it is available in version 0.8.1?

Also, some time ago I read a page about how to query multi-value data, but I can no longer find it. If someone can give me the link, I’d really appreciate it.

Thanks

Maurizio

http://druid.io/docs/0.9.0-rc2/querying/optimizations.html

Hi Fangjin,
thanks for the feedback.

Is “type”: “listFiltered” only available starting from version 0.9.0?

Suppose I have a field, prices, that contains a list of prices. How could I aggregate these values into a single result (the sum of the values)?

Thanks

Maurizio

Hey Maurizio,

listFiltered is available starting in 0.9.0.
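For reference, a minimal sketch of a groupBy query body that uses a listFiltered dimension spec, written as a Python dict. The datasource name "offers", the interval, and the filter value are hypothetical; "price" is the multi-value dimension from this thread. The selector filter keeps rows whose list contains "0.10", and listFiltered additionally drops the other values of those rows from the grouping output.

```python
import json

query = {
    "queryType": "groupBy",
    "dataSource": "offers",  # hypothetical datasource name
    "granularity": "all",
    "intervals": ["2016-03-15/2016-03-16"],  # hypothetical interval
    # On a multi-value dimension, a selector filter matches any row
    # whose list contains the value.
    "filter": {"type": "selector", "dimension": "price", "value": "0.10"},
    "dimensions": [
        {
            # listFiltered (0.9.0+) keeps only the listed values in the
            # grouped output, instead of every value of each matching row.
            "type": "listFiltered",
            "delegate": {"type": "default", "dimension": "price", "outputName": "price"},
            "values": ["0.10"],
        }
    ],
    "aggregations": [{"type": "count", "name": "rows"}],
}

# POST this JSON to the broker's query endpoint.
print(json.dumps(query, indent=2))
```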

About your list-summing question: I assume you have raw data with a field like “prices”: [0.10, 0.20] and you want to store it in Druid as 0.30 in a floating-point column, rather than storing the raw values. You might (I haven’t tested this) be able to do this with a JavaScript aggregator at ingestion time, which receives the array and sums the values.
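Druid's JavaScript aggregator is defined by three JavaScript functions: fnReset, fnAggregate and fnCombine. The sketch below models that contract in Python so the summing logic is easy to follow. It is an illustration only. The real functions are JavaScript strings, and exactly how Druid passes a multi-value field to fnAggregate is an untested assumption here, so the aggregate step accepts either a list or a single value.

```python
from decimal import Decimal
from functools import reduce

# Python model of fnReset / fnAggregate / fnCombine (illustration only).

def fn_reset():
    # Starting value for each aggregation.
    return Decimal("0")

def fn_aggregate(current, prices):
    # Assumption: the field may arrive as a list or as a single value.
    if prices is None:
        return current
    if not isinstance(prices, list):
        prices = [prices]
    # Prices arrive as strings; Decimal avoids float noise like 0.30000000000000004.
    return current + sum((Decimal(p) for p in prices), Decimal("0"))

def fn_combine(partial_a, partial_b):
    # Merge two partial results (e.g. from different segments).
    return partial_a + partial_b

rows = [["0.10", "0.20"], ["0.30", "0.20"], ["0.10", "0.40"]]

# Per-row value, as a rolled-up metric for a single event would store it.
print(fn_aggregate(fn_reset(), rows[0]))  # 0.30

# Two partial aggregations, then combined.
part1 = reduce(fn_aggregate, rows[:2], fn_reset())
part2 = reduce(fn_aggregate, rows[2:], fn_reset())
total = fn_combine(part1, part2)
print(total)  # 1.30
```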

Hi Gian,
thanks for your feedback.

To explain my request better: if I have “price”:[0.10,0.20], I’d like a post-aggregation that calculates the sum of the values (0.30 in this example). Mainly I’m trying to understand whether there is a JavaScript function, or something else, that can loop over the individual values of “price” and do something with them, such as summing them.

Thanks!

Maurizio

Is the “price”:[0.10,0.20] something that you have in your raw data, or that you want to have in your Druid table? If it’s just in the raw data, what were you hoping to have in your Druid table?

Hi Gian,
I’m not sure I’ve correctly understood your question.

I’m ingesting JSON events like these through Kafka:

{"insert_datetime": "2016-03-15 10:15:09", "offer_id": "2790", "price": ["0.10", "0.20"], "total_price": "0.30"}

{"insert_datetime": "2016-03-15 10:15:10", "offer_id": "2791", "price": ["0.30", "0.20"], "total_price": "0.50"}

{"insert_datetime": "2016-03-15 10:15:11", "offer_id": "2790", "price": ["0.10", "0.40"], "total_price": "0.50"}

The realtime node is processing them correctly; offer_id, price and total_price are among the configured dimensions.

Now the idea is to remove the total_price field and calculate total_price from price, by looping over the price values in a post-aggregation.

I’m not sure whether this can be done, or whether there is a way to loop over the multiple values in each row.
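Some background on why a per-row loop is hard: post-aggregations operate on the outputs of aggregators, not on individual rows, and when you group on a multi-value dimension Druid counts each row once under every value it contains. The Python model below (not Druid code) applies that grouping rule to the sample events above, which shows that a query never sees a row's list as a unit.

```python
from collections import defaultdict

events = [
    {"offer_id": "2790", "price": ["0.10", "0.20"]},
    {"offer_id": "2791", "price": ["0.30", "0.20"]},
    {"offer_id": "2790", "price": ["0.10", "0.40"]},
]

def group_by_multi_value(rows, dim):
    """Model of grouping on a multi-value dimension: a row is counted
    once in the group of every value it contains."""
    counts = defaultdict(int)
    for row in rows:
        for value in row[dim]:
            counts[value] += 1
    return dict(counts)

groups = group_by_multi_value(events, "price")
print(groups)  # {'0.10': 2, '0.20': 2, '0.30': 1, '0.40': 1}
# Three input rows produce six group memberships in total.
print(sum(groups.values()))  # 6
```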

Thanks

Maurizio

Maurizio, what SQL are you trying to run? Perhaps https://github.com/implydata/plyql can help you out.

Hi Fangjin,
I’m just sending JSON queries to the broker. I can try plyql as you suggested, but mainly what I don’t know is whether there is a way to iterate over the multiple values inside the field.

Multi-value is a completely new feature, so I wasn’t able to find any examples of it except on this page: http://druid.io/docs/0.9.0-rc2/querying/optimizations.html

Thanks

Maurizio

We recently added a bunch of improvements to our docs about multi-value dimensions here: https://github.com/druid-io/druid/pull/2701/files

For your use case, though, we don’t really support multi-value metrics. Could you break the price field up into separate events at ETL time? You could also store price as a dimension and use a JavaScript aggregator, but performance will be slow.
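A minimal sketch of the "break up the price field into different events" idea, assuming a Python preprocessing step that runs before the events reach Kafka. The function name explode_prices is hypothetical. Each output event carries one numeric price, which Druid could then ingest as a doubleSum metric; the local defaultdict loop only stands in for what a doubleSum grouped by offer_id would return.

```python
from collections import defaultdict

def explode_prices(event):
    """Hypothetical ETL step: one output event per entry in "price"."""
    # Keep the other fields, drop the list and the precomputed total.
    base = {k: v for k, v in event.items() if k not in ("price", "total_price")}
    # One event per price, with the price as a float suitable for a metric.
    return [dict(base, price=float(p)) for p in event["price"]]

events = [
    {"insert_datetime": "2016-03-15 10:15:09", "offer_id": "2790", "price": ["0.10", "0.20"], "total_price": "0.30"},
    {"insert_datetime": "2016-03-15 10:15:10", "offer_id": "2791", "price": ["0.30", "0.20"], "total_price": "0.50"},
    {"insert_datetime": "2016-03-15 10:15:11", "offer_id": "2790", "price": ["0.10", "0.40"], "total_price": "0.50"},
]

exploded = [row for e in events for row in explode_prices(e)]

# Local stand-in for a doubleSum over "price" grouped by offer_id.
totals = defaultdict(float)
for row in exploded:
    totals[row["offer_id"]] += row["price"]

print(len(exploded))  # 6
print({k: round(v, 2) for k, v in totals.items()})  # {'2790': 0.8, '2791': 0.5}
```

One consequence of this design: a row count now counts prices rather than offers, so queries that need the number of original events would need a separate marker (for example, only one exploded event per original carrying a count field).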

Why not just keep total_price around and remove the price instead?

Hi Fangjin,
thanks for your feedback.

I’ll keep total_price and add another multi-value field listing the categories instead of the prices; then, after querying, I’ll map each category to its price.
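A minimal sketch of the client-side mapping step, assuming a Python client. The category names, the CATEGORY_PRICE table and the result rows are all hypothetical; the rows mimic the version/timestamp/event shape of groupBy results. Because a groupBy on a multi-value dimension counts each event once under every category it contains, summing price times count over the groups gives the same figure as summing each event's own total (as long as no category repeats within one event).

```python
from decimal import Decimal

# Hypothetical category -> price table kept outside Druid.
CATEGORY_PRICE = {
    "basic": Decimal("0.10"),
    "premium": Decimal("0.20"),
    "extra": Decimal("0.40"),
}

def row_total(categories):
    """Price of one event, given its multi-value category list."""
    return sum((CATEGORY_PRICE[c] for c in categories), Decimal("0"))

# Hypothetical groupBy results on the "category" dimension with a row count.
results = [
    {"version": "v1", "timestamp": "2016-03-15T00:00:00.000Z", "event": {"category": "basic", "rows": 2}},
    {"version": "v1", "timestamp": "2016-03-15T00:00:00.000Z", "event": {"category": "premium", "rows": 1}},
]

# Map each category back to its price and weight it by how many events had it.
grand_total = sum(
    (CATEGORY_PRICE[r["event"]["category"]] * r["event"]["rows"] for r in results),
    Decimal("0"),
)

print(row_total(["basic", "premium"]))  # 0.30
print(grand_total)  # 0.40
```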

Thanks a lot!

Maurizio