ingestSegment - changing dimension name and values

Hi,

I need to add a new data source with a new column name and default values. I’d like to use the ingestSegment firehose for this. Basically, this is what I need to do:

```json
"dimensions": [
  {
    "type": "extraction",
    "dimension": "type",
    "value": "xyz_percentage",
    "extractionFn": {
      "type": "javascript",
      "function": "function(str) { if (str === 'x') { return 30; } else { return 0; } }"
    }
  }
],
```

I can use this code when querying an existing data source. Now I’d like to build a new data source using it. Do you have any suggestions on how to do that?

I was thinking that maybe I could add this spec to the ioConfig->firehose->dimensions list, but that list only takes plain strings.
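For reference, the ioConfig I’m talking about looks roughly like this (the data source name, interval, and column names are just placeholders from my setup):

```json
"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "ingestSegment",
    "dataSource": "existing_datasource",
    "interval": "2015-01-01/2015-09-01",
    "dimensions": ["type", "country"],
    "metrics": ["count"]
  }
}
```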

Thanks,

Indrek

Hey Indrek, I don’t think the segment ingest firehose stuff supports dimension extractions. That would be useful, though, so IMO a filed issue or a pull request would be welcome. Other than that, your best bet for things that the segment ingest firehose doesn’t support is to go back to your raw data and re-index it.
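To give you an idea, if the raw data were available as JSON files, a plain batch index task would look roughly like this. This is just a sketch: the data source name, paths, interval, and column names are placeholders, and the new xyz_percentage column would have to be precomputed in the raw files, since there’s no extraction support at ingest time here either:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "new_datasource",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["type", "xyz_percentage"] }
        }
      },
      "metricsSpec": [ { "type": "count", "name": "count" } ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2015-01-01/2015-09-01"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": { "type": "local", "baseDir": "/path/to/raw/data", "filter": "*.json" }
    }
  }
}
```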

Thanks. Do you know if there are any alternatives? E.g., exporting the given data source and then ingesting it again as a new data source. I don’t have the raw data for this anymore.

Regards,
Indrek

There isn’t an officially supported export feature. But you could try getting the raw rows out of Druid with the select query (http://druid.io/docs/latest/development/select-query.html) or the new DatasourceInputFormat (https://github.com/gianm/druid/blob/master/indexing-hadoop/src/main/java/io/druid/indexer/hadoop/DatasourceInputFormat.java). The latter is in 0.8.1-rc3, although not directly documented; it’s an implementation detail of the datasource inputSpec.
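A minimal select query would look something like this (data source name, interval, and threshold are placeholders); you’d page through the results by feeding back the pagingIdentifiers returned in each response:

```json
{
  "queryType": "select",
  "dataSource": "existing_datasource",
  "granularity": "all",
  "intervals": ["2015-01-01/2015-09-01"],
  "dimensions": [],
  "metrics": [],
  "pagingSpec": { "pagingIdentifiers": {}, "threshold": 100 }
}
```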

Fwiw, I think it’s generally best to keep a copy of your raw data in some other system, for two reasons: Druid can partially aggregate when ingesting, so it’s not necessarily storing your raw data as-is; and it can be useful to re-index the raw data from time to time.