Hi,
My goal is to reindex existing data in Druid and update a particular column based on that column's previous values. To do that, I'm using an `ingestSegment` firehose with the appropriate filters, and a `transformSpec` to update the values in the column when the filters match.
I’m using a spec akin to the following:
```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "sampleDatasource",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {}
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2016-01-01T13:00:00.000Z/2019-05-21T13:56:52.889Z"]
      },
      "transformSpec": {
        "transforms": [
          {
            "type": "expression",
            "name": "animal",
            "expression": "concat('platypus')"
          }
        ]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "ingestSegment",
        "dataSource": "sampleDatasource",
        "interval": "2016-01-01T13:00:00.000Z/2019-06-21T13:56:52.889Z"
      },
      "filter": {
        "field": {
          "type": "selector",
          "dimension": "animal",
          "value": "duck",
          "extractionFn": null
        }
      }
    }
  }
}
```
I want to replace the value `duck` in the `animal` column with `platypus`. Other rows shouldn't be affected.
Observation: all the values in the `animal` column are changed to `platypus`.
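That observation matches what the constant expression would predict: `concat('platypus')` ignores the existing value, and a transform named `animal` overwrites that column in every row it sees. The minimal Python model below (not Druid code; `apply_transform`, `constant_transform` and `conditional_transform` are hypothetical names) contrasts it with a conditional version. The Druid expression in `CONDITIONAL_EXPR` is an assumption: it relies on the expression language's `if(predicate, then, else)` function and string `==` comparison, which should be checked against the expression docs for the Druid version in use.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Assumed Druid expression for "duck -> platypus, otherwise keep the old value"
# (relies on if() and string == being available in this Druid version).
CONDITIONAL_EXPR = "if(animal == 'duck', 'platypus', animal)"


def constant_transform(row: Row) -> str:
    # Models concat('platypus'): the input value is ignored entirely.
    return "platypus"


def conditional_transform(row: Row) -> str:
    # Models CONDITIONAL_EXPR: only rows whose animal is 'duck' change.
    return "platypus" if row["animal"] == "duck" else row["animal"]


def apply_transform(rows: List[Row], column: str,
                    fn: Callable[[Row], str]) -> List[Row]:
    # Hypothetical model: a transform named after an existing column
    # replaces that column in every row read, returning new row dicts.
    return [{**row, column: fn(row)} for row in rows]


rows = [{"animal": "duck"}, {"animal": "goose"}, {"animal": "duck"}]
print(apply_transform(rows, "animal", constant_transform))
print(apply_transform(rows, "animal", conditional_transform))
```

With the conditional form, the "only change some rows" logic lives in the expression itself, so rows that aren't `duck` are written back with their original value.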
I have read that filters are applied after the transformation, but that isn't what I need: by the time the filter runs, `duck` has already been replaced with `platypus`.
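A toy model of the two possible orderings (plain Python with hypothetical names, not Druid's implementation) shows why a filter alone can't express "change some rows, keep the rest": a filter only decides which rows survive. Filtering after the transform finds no `duck` left; filtering before it keeps the converted row but drops the `goose` row from the output.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def to_platypus(row: Row) -> Row:
    # Same effect as the concat('platypus') transform on the animal column.
    return {**row, "animal": "platypus"}


def is_duck(row: Row) -> bool:
    # Same predicate as the selector filter animal = duck.
    return row["animal"] == "duck"


def transform_then_filter(rows: List[Row], transform: Callable[[Row], Row],
                          keep: Callable[[Row], bool]) -> List[Row]:
    # The filter sees values that have already been transformed.
    return [row for row in map(transform, rows) if keep(row)]


def filter_then_transform(rows: List[Row], transform: Callable[[Row], Row],
                          keep: Callable[[Row], bool]) -> List[Row]:
    # The filter sees the original values, but non-matching rows are discarded.
    return [transform(row) for row in rows if keep(row)]


rows = [{"animal": "duck"}, {"animal": "goose"}]
print(transform_then_filter(rows, to_platypus, is_duck))  # []
print(filter_then_transform(rows, to_platypus, is_duck))  # [{'animal': 'platypus'}]
```

In a reindexing task that overwrites the interval, rows missing from the output would presumably be missing from the rewritten segments as well, which suggests the condition belongs in the transform expression rather than in a filter.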
I have also tried using a regex to replace `duck` with `platypus`, but I'm struggling with the regex syntax because there are so few examples out there (do the backslashes need escaping or not?).
This is the expression I used for the regex attempt:

`"expression": "replace(animal,'\bduck\b','platypus')"`

I also tried this:

`"expression": "replace(animal,'\bduck\b','platypus')"`

Observation: no change to the data, even though the logs show n events processed.
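On the escaping question, one thing worth checking: inside a JSON string, `\b` is the escape for a backspace character, so if the spec file contains a single backslash as shown, the pattern handed to Druid is not the regex word boundary that was intended. The sketch below uses only Python's standard `json` and `re` modules to show what each spelling decodes to. It says nothing definitive about Druid itself: what Druid's expression parser does with the decoded backslash, and whether `replace` treats its pattern as a regex or as a literal string, are assumptions to verify in the docs. The last line shows that a literal replace would never match `duck` with either spelling.

```python
import json
import re

# JSON text exactly as typed in the spec file (raw strings keep the backslashes).
single_backslash = r'"\bduck\b"'
double_backslash = r'"\\bduck\\b"'

# What a JSON parser produces after decoding each string literal.
decoded_single = json.loads(single_backslash)
decoded_double = json.loads(double_backslash)

print(repr(decoded_single))  # '\x08duck\x08'  (\b is JSON's backspace escape)
print(repr(decoded_double))  # '\\bduck\\b'    (a literal backslash before each b)

text = "duck and duckling"
# Only the double-backslash form gives a regex engine the \b word-boundary token.
print(re.sub(decoded_double, "platypus", text))  # platypus and duckling
print(re.sub(decoded_single, "platypus", text))  # duck and duckling
# A literal (non-regex) replace searches for the characters \bduck\b verbatim.
print(text.replace(decoded_double, "platypus"))  # duck and duckling
```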
Please ease my misery.