Can Druid aggregate on string dimensions?

Dear all,
Can Druid aggregate on string dimensions?
For example, my datasource has four columns: a timestamp column, a url column, a title column, and a pv column.
The url and title columns are dimension columns.
The query I want to execute is something like: SELECT url, FIRST(title) FROM datasource GROUP BY url.
The same url may have multiple titles, and I want only one title for each url.
Does this make sense?


Yufeng Wang

I guess you’re asking if Druid has support for an aggregator like “FIRST(title)”. It does for numbers, but not for strings. It’s possible to write one. The main restriction is that, currently, aggregators must allocate a fixed-size buffer for their work up front, so you’d need to allocate a buffer as big as the largest string you plan to accept. If the aggregator sees a string longer than the buffer, it would have to truncate or drop it. Subject to that restriction, though, it is possible to write such an aggregator as an extension or contribute it to core Druid.
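To make the fixed-buffer restriction concrete, here is a minimal standalone sketch of the storage logic such a "first string" aggregator might use. It is NOT Druid's actual aggregator interface: the class name `FirstStringSlot`, its methods, and the slot layout (8-byte timestamp, 4-byte length, then up to `maxBytes` bytes of UTF-8) are all hypothetical. It also assumes "first" means "the value with the earliest timestamp". Strings longer than the slot are truncated, backing off so a multi-byte UTF-8 character is never cut in half.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not Druid's real API: keeps the string with the
 * earliest timestamp inside a fixed-size slot of a ByteBuffer.
 * Slot layout: [8-byte timestamp][4-byte length][maxBytes bytes of UTF-8].
 */
public class FirstStringSlot {
    private final int maxBytes;

    public FirstStringSlot(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Bytes one slot needs; this size is fixed up front. */
    public int slotSize() {
        return Long.BYTES + Integer.BYTES + maxBytes;
    }

    /** Marks the slot as empty: no timestamp seen yet, length -1. */
    public void init(ByteBuffer buf, int position) {
        buf.putLong(position, Long.MAX_VALUE);
        buf.putInt(position + Long.BYTES, -1);
    }

    /** Stores the value only if its timestamp is earlier than the stored one. */
    public void aggregate(ByteBuffer buf, int position, long timestamp, String value) {
        if (value == null || timestamp >= buf.getLong(position)) {
            return;
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int len = Math.min(bytes.length, maxBytes);
        // If the cut lands on a UTF-8 continuation byte, back off to a character boundary.
        while (len > 0 && len < bytes.length && (bytes[len] & 0xC0) == 0x80) {
            len--;
        }
        buf.putLong(position, timestamp);
        buf.putInt(position + Long.BYTES, len);
        int dataStart = position + Long.BYTES + Integer.BYTES;
        for (int i = 0; i < len; i++) {
            buf.put(dataStart + i, bytes[i]);
        }
    }

    /** Returns the stored (possibly truncated) string, or null if none was seen. */
    public String get(ByteBuffer buf, int position) {
        int len = buf.getInt(position + Long.BYTES);
        if (len < 0) {
            return null;
        }
        byte[] bytes = new byte[len];
        int dataStart = position + Long.BYTES + Integer.BYTES;
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(dataStart + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FirstStringSlot agg = new FirstStringSlot(8);
        ByteBuffer buf = ByteBuffer.allocate(agg.slotSize());
        agg.init(buf, 0);
        agg.aggregate(buf, 0, 200L, "Later title");
        agg.aggregate(buf, 0, 100L, "Earliest title");
        agg.aggregate(buf, 0, 300L, "Latest title");
        System.out.println(agg.get(buf, 0)); // prints "Earliest" (truncated to 8 bytes)
    }
}
```

The 8-byte limit in `main` is deliberately small to show the truncation trade-off the reply describes; a real extension would pick `maxBytes` from the longest title it expects, and every row in a group-by result would pay that full slot size.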