TransformSpec filtering before transforming?

Hi,

My goal is to reindex existing data in Druid and update a particular column based on its previous values. For that, I'm using an ingestSegment firehose with the appropriate filters, and a transformSpec to update the values in the column when the filters match.

I’m using a spec akin to the following:

    {
      "type": "index",
      "spec": {
        "dataSchema": {
          "dataSource": "sampleDatasource",
          "parser": {
            "type": "string",
            "parseSpec": {
              "format": "timeAndDims",
              "timestampSpec": {
                "column": "timestamp",
                "format": "auto"
              },
              "dimensionsSpec": {}
            }
          },
          "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "DAY",
            "queryGranularity": "NONE",
            "intervals": ["2016-01-01T13:00:00.000Z/2019-05-21T13:56:52.889Z"]
          },
          "transformSpec": {
            "transforms": [
              {
                "type": "expression",
                "name": "animal",
                "expression": "concat('platypus')"
              }
            ]
          }
        },
        "ioConfig": {
          "type": "index",
          "firehose": {
            "type": "ingestSegment",
            "dataSource": "sampleDatasource",
            "interval": "2016-01-01T13:00:00.000Z/2019-06-21T13:56:52.889Z"
          },
          "filter": {
            "field": {
              "type": "selector",
              "dimension": "animal",
              "value": "duck",
              "extractionFn": null
            }
          }
        }
      }
    }

I want to replace the value duck in the column animal with platypus. Other rows shouldn't be affected.

Observation: all the values in column animal are changed to platypus.

I have read that filters are applied after transformation, but that's not what I need, since I am replacing duck with platypus entirely (after the transform, no row would still match duck).

I have also tried using a regex to replace duck with platypus, but I see no change to the data even though the logs show n events processed. I also find myself struggling with the regex because there are so few examples out there (escaping? unescaping?).

This was the expression I was using in the second case:

    "expression": "replace(animal,'\bduck\b','platypus')"

I also tried this:

    "expression": "replace(animal,'\bduck\b','platypus')"

Observation: No change to the data even though the logs show n events processed.

Please ease my misery.

For the expression, you can use if(animal == 'duck', 'platypus', animal). It returns platypus for rows whose animal is duck and passes every other value through unchanged.
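As a sketch, this is how the conditional could be dropped into the transformSpec from the question (it sits under dataSchema, replacing the concat('platypus') transform). Because the transform is named animal, the same as the input column, its output shadows the original animal values; that is also why concat('platypus') overwrote every row.

```json
{
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "animal",
        "expression": "if(animal == 'duck', 'platypus', animal)"
      }
    ]
  }
}
```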

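As for replace(animal, 'duck', 'platypus'): the pattern for replace is a literal string, not a regex. The underlying implementation follows the standard Java String.replace(CharSequence target, CharSequence replacement), so a pattern such as '\bduck\b' is treated as literal text and never matches a plain duck value, which is why the data was unchanged.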
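If you do want to use replace, the pattern is written as the plain literal, as in the transform below. Keep in mind that String.replace substitutes every occurrence of the target inside a value, so a hypothetical value such as duckling would become platypusling; the if() form above only rewrites exact matches. Note also that inside a JSON string, \b is itself an escape for the backspace character, so '\bduck\b' does not even reach Druid as the backslash-b text you typed.

```json
{
  "type": "expression",
  "name": "animal",
  "expression": "replace(animal, 'duck', 'platypus')"
}
```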

You'll also want to remove the filter from the ingestSegment firehose. By filtering on "animal = duck", you will only reingest the "duck" rows. The new output segment will overwrite the old segment, and you'll lose everything but the "duck" rows. (If you have appendToExisting set to true, the new segment will instead be appended alongside the old rows, and you'll have duplicates of the former "duck" rows.)
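A sketch of the question's ioConfig with the filter removed, so every row in the interval is reingested and the new segments fully replace the old ones. appendToExisting is set explicitly to false here for clarity; as far as I know that is already the default for the index task.

```json
{
  "ioConfig": {
    "type": "index",
    "firehose": {
      "type": "ingestSegment",
      "dataSource": "sampleDatasource",
      "interval": "2016-01-01T13:00:00.000Z/2019-06-21T13:56:52.889Z"
    },
    "appendToExisting": false
  }
}
```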