ingestSegment - Reducing cardinality of dimension with re-index task

I need to reduce the cardinality of one of the dimensions in a data source by replacing some of its values with a default value.

Is it possible? It seems that I can’t use the extraction type in the ingestSegmentFirehose dimensions list. But maybe there are other possibilities that I can’t see?

Hi,
You could use filters; please refer to https://druid.apache.org/docs/latest/querying/filters.html

In your case you can use an extractionFn in a selector filter to achieve your goal.

Hope this helps!

Thanks

Hi,

Please also have a look at https://druid.apache.org/docs/latest/querying/dimensionspecs.html in case you want to transform the data.

Thanks

I need to re-index (re-ingest) the data. Your links are about querying data, not ingesting it.

Filters are also available at ingestion time, but they remove whole rows of data rather than change dimension values.
What I need to achieve is this: if I have data such as
field1,10
field2,1
field13,3


After re-indexing and replacing the dimension value field13 with field1, I need to get this in the final segment (the two counts summed, 10 + 3 = 13):

field1,13
field2,1


The dimension spec that you pointed to is for querying data only and can’t be used during ingestion.

I tried to use it in ioConfig -> ingestSegmentFirehose -> dimensions, but it does not work there.

You can use a transformSpec at ingest time; that might be a good approach: https://druid.apache.org/docs/latest/ingestion/index.html#transformspec
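
As a minimal sketch of that approach for the field13 example above: the dimension name `dim` is hypothetical (the thread does not give the real one), and the fragment is meant to sit inside the dataSchema of the re-index task. It assumes that a transform with the same name as an input field shadows that field, so the value is rewritten before rollup.

```json
{
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "dim",
        "expression": "if(dim == 'field13', 'field1', dim)"
      }
    ]
  }
}
```

With rollup enabled, rows that end up with identical dimension values should then be merged during re-indexing, which is what would produce the combined count for field1.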

Yeah, I tried it, but it seems to support only simple operations. What if I need to replace more than 1000 dimension values at a time? Can I write functions there, as in an extractionFn?

Any of these functions would work: https://druid.apache.org/docs/latest/misc/math-expr.html

I mean that I need to write a complex custom function that replaces all the needed values at once, not one by one. Is it possible with a transform spec? I can’t find such a possibility in the docs.
Something like this:
function(str) {
  if (str.match(…)) {
    return 'value1';
  }
  if (str.match(…)) {
    return 'value2';
  }

  if (str.match(…)) {
    return 'valueN';
  }
}


I was thinking you could do it with a cascade of “if” functions, or maybe a “case_searched” function.
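
A rough sketch of the `case_searched` variant, mirroring the shape of the function above. The dimension name `dim`, the `like` patterns, and the replacement values are all hypothetical placeholders, not taken from the thread. Conditions are checked in order, and the last argument is the fallback, here the unchanged value.

```json
{
  "type": "expression",
  "name": "dim",
  "expression": "case_searched(like(dim, 'field1%'), 'field1', like(dim, 'test%'), 'test', dim)"
}
```

`like` uses SQL LIKE patterns; recent Druid versions also document `regexp_like` for regular-expression conditions, which is closer to `str.match`. For a very large number of replacements the expression string gets long, so it is probably easier to generate it with a script than to write it by hand.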

Shubham,

What do you mean by a simple concat not working? Can you send your ingestion spec with the transform? I have used the transform below in my ingestion spec, and it works:

"transformSpec": {
  "filter": null,
  "transforms": [
    {
      "type": "expression",
      "name": "f1",
      "expression": "concat(case_simple(timestamp_extract(__time,'HOUR'),0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,2),id)"
    },
    {
      "type": "expression",
      "name": "f2",
      "expression": "concat(case_simple(timestamp_extract(__time,'HOUR'),12,1,13,1,14,1,15,1,16,1,17,1,18,1,19,1,20,1,21,1,22,1,23,1,3),id)"
    },
    {
      "type": "expression",
      "name": "dow",
      "expression": "timestamp_extract(__time,'DOW')"
    }
  ]
}

vijay

Hi Vijay,

Sorry, I was using double quotes and only afterwards realised that a string needs single quotes, and that is when I deleted my post.