ingestSegment - changing dimension name and values


I need to add a new data source with a new column name and default values. I’d like to use the ingestSegment firehose for this. Basically, this is what I need to do:

"dimensions": [
  {
    "type": "extraction",
    "dimension": "type",
    "value": "xyz_percentage",
    "extractionFn": {
      "type": "javascript",
      "function": "function(str) { if (str === 'x') { return 30; } else { return 0; } }"
    }
  }
]

I can use this code to query from a data source. I’d like to build a new data source using it. Do you have any suggestions on how to do that?

I was thinking that maybe I could add this code to the ioConfig->firehose->dimensions list, but it only takes strings.



Hey Indrek, I don’t think the segment ingest firehose stuff supports dimension extractions. That would be useful, though, so IMO a filed issue or a pull request would be welcome. Other than that, your best bet for things that the segment ingest firehose doesn’t support is to go back to your raw data and re-index it.

Thanks. Do you know if there are any alternatives? E.g. exporting the given data source and then ingesting it again as a new data source. I don’t have the raw data for it anymore.


There isn’t an officially supported export feature. But you could try getting the raw rows out of Druid with the select query or the new DatasourceInputFormat (it’s in 0.8.1-rc3, although not directly documented; it’s an implementation detail of the datasource inputSpec).
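To make the select-query route a bit more concrete, here is a rough Python sketch of how the export could look. It only builds the query body and post-processes a response; it does not talk to a broker. The data source name "old_source", the interval, and the sample response are made up for illustration, and the response shape (a list of entries with result.events[].event) is assumed from the select query documentation, so check it against your Druid version.

```python
import json


def build_select_query(data_source, interval, threshold=1000, paging_identifiers=None):
    """Build the JSON body of a Druid select query.

    Empty "dimensions" and "metrics" lists ask for all columns.
    """
    return {
        "queryType": "select",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": [interval],
        "dimensions": [],
        "metrics": [],
        "pagingSpec": {
            "pagingIdentifiers": paging_identifiers or {},
            "threshold": threshold,
        },
    }


def add_percentage(event):
    """Mirror the JavaScript extraction function from the question."""
    row = dict(event)
    row["xyz_percentage"] = 30 if row.get("type") == "x" else 0
    return row


def rows_from_select_result(result):
    """Flatten a select response into transformed rows ready for re-indexing."""
    rows = []
    for entry in result:
        for item in entry["result"]["events"]:
            rows.append(add_percentage(item["event"]))
    return rows


# Hypothetical data source name and interval.
query = build_select_query("old_source", "2015-01-01/2015-02-01")

# Hand-written sample shaped like a select response (assumed shape, not real output).
sample_response = [
    {
        "timestamp": "2015-01-01T00:00:00.000Z",
        "result": {
            "pagingIdentifiers": {"segment_a": 1},
            "events": [
                {"segmentId": "segment_a", "offset": 0,
                 "event": {"timestamp": "2015-01-01T00:00:00.000Z", "type": "x", "count": 3}},
                {"segmentId": "segment_a", "offset": 1,
                 "event": {"timestamp": "2015-01-01T01:00:00.000Z", "type": "y", "count": 5}},
            ],
        },
    }
]

rows = rows_from_select_result(sample_response)
print(json.dumps(query, indent=2))
print(json.dumps(rows, indent=2))
```

You would POST the query body to the broker, feed the returned pagingIdentifiers into the next request until no more events come back, and write the transformed rows out as newline-delimited JSON for a normal indexing task. Keep in mind the rows you get back are whatever Druid stored, which may already be rolled up.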

Fwiw I think it’s generally best to have a copy of your raw data in some other system. For two reasons: because Druid can partially aggregate when ingesting, so it’s not necessarily storing your raw data as-is; and because it can be useful to re-index raw data from time to time.
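As a toy illustration of the partial aggregation point, the Python sketch below models rollup in a very simplified way: rows that share the same (already truncated) timestamp and dimension values are collapsed into one stored row with a summed metric. The column names and the sum aggregator are assumptions for the example; this is not how Druid is implemented internally.

```python
from collections import defaultdict


def rollup(rows, dimensions, metric):
    """Collapse rows sharing a timestamp and dimension values, summing the metric."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["timestamp"],) + tuple(row[d] for d in dimensions)
        totals[key] += row[metric]

    stored = []
    for key, total in totals.items():
        stored_row = {"timestamp": key[0]}
        for name, value in zip(dimensions, key[1:]):
            stored_row[name] = value
        stored_row[metric] = total
        stored.append(stored_row)
    return stored


# Three raw events; the first two share the same hour and dimension value.
raw_rows = [
    {"timestamp": "2015-01-01T00:00:00Z", "type": "x", "count": 1},
    {"timestamp": "2015-01-01T00:00:00Z", "type": "x", "count": 1},
    {"timestamp": "2015-01-01T00:00:00Z", "type": "y", "count": 1},
]

stored_rows = rollup(raw_rows, ["type"], "count")
print(stored_rows)
```

Three raw events go in and two stored rows come out, so the individual events cannot be recovered from the data source afterwards.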