How to use regexFiltered and extractionFns together

Hi all,

How can I use regexFiltered and extractionFns together?

  1. I have the multi-value dimension for tileIds and I want to grep all records which tile id should match with one of the tile ids from the multi-value list from the start of the string.
  • For this I am using javascript function. “function”: “function(x) { if(x){ if(x.indexOf(‘03131313113031310022’) != -1) return true;} return false;}” , But I want to check this string from start of String. x.startsWith(‘03131313113031310022’) not working.
  • To solve above problem I am using regex type Filter, But we know javascript is much faster that regex filter. (I am putting more that 500 values in filter)
    Please help me to grep accurate data.
  1. For GroupBy Query I want to grep result with count those match with tile Ids. I am using regexFiltered to get all match tileIds., but in result its giving full String match and not giving aggregated data together.

**This Query returning ERROR : **

{

“queryType”: “groupBy”,

“dataSource”: “source_segment”,

“granularity”: {

“period”: “P1W”,

“type”: “period”

},

“dimensions”: [

{

“type”: “regexFiltered”,

“delegate”: {

“type”: “default”,

“dimension”: “tileIds”,

“outputName”: “tileIds”

},

“pattern”: “^(03131313113031310022).*”

},

{

“type” : “extraction”,

“dimension” : “tileIds”,

“outputName” : “tileIds”,

“extractionFns”: [

{

“type” : “substring”,

“index” : 0,

“length” : 20

}

]

}

],

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “javascript”,

“dimension”: “tileIds”,

“function”: “function(x) { if(x){ if(x.indexOf(‘03131313113031310022’) != -1) return true;} return false;}”

}

]

},

“aggregations”: [

{

“type”: “count”,

“name”: “total_numz_count”,

“fieldName”: “numz_sum”

}

],

“intervals”: [

“2016-10-09T00:00/2016-10-09T08:00”

]

}

**Result : **

{

“error”: “Instantiation of [simple type, class io.druid.query.dimension.ExtractionDimensionSpec] value failed: extractionFn must not be null (through reference chain: java.util.ArrayList[1])”

}

This Query returning data w/o aggregator:

{

“queryType”: “groupBy”,

“dataSource”: “source_audience_segment”,

“granularity”: {

“period”: “P1W”,

“type”: “period”

},

“dimensions”: [

{

“type”: “regexFiltered”,

“delegate”: {

“type”: “default”,

“dimension”: “tileIds”,

“outputName”: “tileIds”

},

“pattern”: “^(03131313113031310022).*”

}

],

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “regex”,

“dimension”: “tileIds”,

“pattern”: “03131313113031310022.*”

}

]

},

“aggregations”: [

{

“type”: “count”,

“name”: “total_numz_count”,

“fieldName”: “numz_sum”

}

],

“intervals”: [

“2016-10-09T04:00/2016-10-09T08:00”

]

}

**Result : **

[{“version”:“v1”,“timestamp”:“2016-10-03T00:00:00.000Z”,“event”:{“tileIds”:“03131313113031310022103”,“total_numz_count”:1}},{“version”:“v1”,“timestamp”:“2016-10-03T00:00:00.000Z”,“event”:{“tileIds”:“03131313113031310022111”,“total_numz_count”:1}},{“version”:“v1”,“timestamp”:“2016-10-03T00:00:00.000Z”,“event”:{“tileIds”:“03131313113031310022203”,“total_numz_count”:1}}]

I am expecting : count = 3 with subString 20 length

[{“version”:“v1”,“timestamp”:“2016-10-03T00:00:00.000Z”,“event”:{“tileIds”:“03131313113031310022”,“total_numz_count”:3}}]

It would be great if someone from the druid team could have a look at this use case to see if druid has the capability support this use case. If yes, how could I optimize the query?

Thanks,

Jitesh

Do I need to post process result to aggregate with 20 digit tileId for the count ?

Please, someone, help me to get it.

Thanks,

Jitesh