Lookup Query

Hello,

I am trying to use lookup feature that is released recently. I am not sure how to setup or enable that, it will be good if somebody can point out to some tutorial.

I have tried setting it up as per current documentation but not able to do that. I have added a jdbc namespace in Broker and Historical runtime.properties. Then tried to query on that with given json

{

*** “queryType”: “groupBy”,***

*** “intervals”: {***

*** “intervals”: [“2015-11-02T07:00:01.054/2015-11-02T13:00:01.054”],***

*** “type”: “intervals”***

*** },***

*** “granularity”: “fifteen_minute”,***

*** “metric”: “cost”,***

*** “dataSource”: {***

*** “name”: “stagingRealtimeImpressionFeed”,***

*** “type”: “table”***

*** },***

*** “aggregations”: [{***

*** “fieldName”: “cost”,***

*** “name”: “cost”,***

*** “type”: “doubleSum”***

*** }],***

*** “postAggregations”: ,***

*** “dimensions”: [{***

*** “type”: “lookup”,***

*** “lookup”: {***

*** “type”: “namespace”,***

*** “namespace”: “postcode_id_lookup”***

*** },***

*** “replaceMissingValueWith”: “Unknown”,***

*** “injective”: true***

*** },***

*** {***

*** “type”: “default”,***

*** “dimension”: “advId”,***

*** “outputName”: “advId”***

*** }],***

*** “threshold”: 1000,***

*** “limitSpec”: {***

*** “type”: “default”,***

*** “limit”: 500***

*** }***

}

Thanks,

Navneet

Hi Navneet!

Can you clarify some stuff on what the query is returning, what is in the historical and broker logs, and what your expectations are?

Thanks,

Charles Allen

Hey Navneet,

I noticed a syntax error in your query related to lookup functions that could be part of the issue. The lookup function is an extraction function and should be written as the “extractionFn” of an extraction type dimension. In other words, the block:

*** “dimensions”: [{***

*** “type”: “lookup”,***

*** “lookup”: {***

*** “type”: “namespace”,***

*** “namespace”: “postcode_id_lookup”***

*** },***

*** “replaceMissingValueWith”: “Unknown”,***

*** “injective”: true***

*** },***

should look more like this:

“dimensions”: [{
“type”: “extraction”,
“dimension”: “”,
“outputName”: “”,
“extractionFn”: {
“type”: “lookup”,
“lookup”: {
“type”: “namespace”,
“namespace”: “postcode_id_lookup”
},
“replaceMissingValueWith”: “Unknown”,
“injective”: true
}
},

Hello David,

I tried with mentioned changes now but now I am getting this error:

Query:

{

  • “queryType”: “groupBy”,*

  • “intervals”: {*

  • “intervals”: [“2015-11-02T07:00:01.054/2015-11-02T13:00:01.054”],*

  • “type”: “intervals”*

  • },*

  • “granularity”: “fifteen_minute”,*

  • “metric”:“cost”,*

  • “dataSource”: {*

  • “name”: “stagingRealtimeImpressionFeed”,*

  • “type”: “table”*

  • },*

  • “aggregations”: [{*

  • “fieldName”: “cost”,*

  • “name”: “cost”,*

  • “type”: “doubleSum”*

  • }],*

  • “postAggregations”: , *

  • “dimensions”: [ { “type” : “extraction”, “dimension” : “postalCode”, “outputName” : “postalId”, “extractionFn” : {“type”:“lookup”, “lookup”:{“type”:“namespace”,“namespace”:“postcode_id_lookup”},*

  • “replaceMissingValueWith”:“Unknown”,*

  • “injective”:false}], *

  • “threshold”: 1000,*

  • “limitSpec”: { “type”: “default”, “limit”: 500 }*

}

Response:

{

  • “error”: “Instantiation of [simple type, class io.druid.query.extraction.NamespacedExtractor] value failed: Namespace [postcode_id_lookup] not found (through reference chain: java.util.ArrayList[0])”*

}

is your historical (or realtime node) knows about your lookup namespace configuration ? can you share how you are configuring those namespace ?

Please make sure to sanitize any usernames, passwords, or database URLs.

Hello,
I have updated the namespace lookup and restarted the historical node. This is the namespace configuration

druid.query.extraction.namespaceList=[{ “type”:“jdbc”, “namespace”:“postcode_id_lookup”, “connectorConfig”:{“createTables”:true,“connectURI”:“jdbc:mysql://test”,“user”:“test”,“password”:“test”}, “table”: “string_ids”, “keyColumn”: “REFID”, “valueColumn”: “STRING”,“pollPeriod”:P7D}]

But now I am getting different error in historical node :

io.druid.server.QueryResource - Exception occurred on request [GroupByQuery{limitSpec=NoopLimitSpec, dimFilter=null, granularity=DurationGranularity{length=900000, origin=0}, dimensions=[DefaultDimensionSpec{dimension=‘null’, outputName=‘null’}, DefaultDimensionSpec{dimension=‘advId’, outputName=‘advId’}], aggregatorSpecs=[CountAggregatorFactory{name=‘rows’}], postAggregatorSpecs=[], limitFn=identity}]

java.lang.NullPointerException

With this query:

{

“queryType”: “groupBy”,

“intervals”: {

“intervals”: [“2015-11-02T07:00:01.054/2015-11-02T13:00:01.054”],

“type”: “intervals”

},

“granularity”: “fifteen_minute”,

“dataSource”: {

“name”: “stagingRealtimeImpressionFeed”,

“type”: “table”

},

“aggregations”: [{

“name”: “rows”,

“type”: “count”

}],

“postAggregations”: ,

“dimensions”: [{

“type”: “lookup”,

“lookup”: {

“type”: “namespace”,

“namespace”: “postcode_id_lookup”

},

“replaceMissingValueWith”: “Unknown”,

“injective”: true

},

{

“type”: “default”,

“dimension”: “advId”,

“outputName”: “advId”

}],

“threshold”: 1000,

“limitSpec”: {

“type”: “default”,

“limit”: 500

}

}

Can you give more of the logging available in the historical logs?

Also, could you start the historical node with this file in the classpath, but named “log4j2.xml” ?

Hello Charles,

I have tried with log4j2.xml and changed ‘ROOT’ logger to debug mode, but I am still the same error. Is anything else I can try?

Hi i guess your dimension spec is not correct
it should be


“dimension”: {
“type” : “extraction”,
“dimension” : “dimensionsName",
“outputName” : “TheOutputDimensionName",
“extractionFn” : { “type”:“lookup”,
“lookup”:{“type”:“namespace”,“namespace”:“NameSpaceID”}
},
“replaceMissingValueWith”:“MISSING”,
“retainMissingValue”:false
}


So i guess in your case

“dimensions”: [ { “type” : “extraction”, “dimension” : “postalCode”, “outputName” : “postalId”, “extractionFn” : {“type”:“lookup”, “lookup”:{“type”:“namespace”,“namespace”:“postcode_id_lookup”},

“replaceMissingValueWith”:“Unknown”,

“injective”:false}}

]

please try this and tell me if it does not work.

Hi i guess your dimension spec is not correct
it should be

"dimension": {
  "type" : "extraction",
  "dimension" : “dimensionsName",
  "outputName" : “TheOutputDimensionName",
  "extractionFn" : { "type":"lookup",
    "lookup":{"type":"namespace","namespace":"NameSpaceID"}
    },
    "replaceMissingValueWith":"MISSING",
    "retainMissingValue":false
  }

So i guess in your case

“dimensions”: [ { “type” : “extraction”, “dimension” : “postalCode”, “outputName” : “postalId”, “extractionFn” : {“type”:“lookup”, “lookup”:{“type”:“namespace”,“namespace”:“postcode_id_lookup”},
“replaceMissingValueWith”:“Unknown”,
“injective”:false}}
]

please try this and tell me if it does not work.

I’m still curious what shows up in the logs, the NullPointerException should have a stack trace with it.

Hello Slim,

Great!! Thanks a lot!!

After doing that change it worked. But now when I am trying to run ‘Select’ query, I am not able to retrieve the lookup column. In the results, all the columns are there but lookup column is missing.

This is the query I am trying:

{

  • “queryType”: “select”,*

  • “intervals”: {*

  •    "intervals": [*
    
  •        "2015-11-19T07:00:01.054/2015-11-19T13:00:01.054"*
    
  •    ],*
    
  •    "type": "intervals"*
    
  • },*

  • “granularity”: “fifteen_minute”,*

  • “metric”: “cost”,*

  • “dataSource”: {*

  •    "name": "stagingRealtimeImpressionFeed",*
    
  •    "type": "table"*
    
  • },*

  • “dimension”: {*

  •    "type": "extraction",*
    
  •    "dimension": "postalCode",*
    
  •    "outputName": "postalId",*
    
  •    "extractionFn": {*
    
  •        "type": "lookup",*
    
  •        "lookup": {*
    
  •            "type": "namespace",*
    
  •            "namespace": "postcode_id_lookup"*
    
  •        }*
    
  •    },*
    
  •    "replaceMissingValueWith": "MISSING",*
    
  •    "retainMissingValue": false*
    
  • },*

  • “threshold”: 1000,*

  • “pagingSpec”:{“pagingIdentifiers”: {}, “threshold”:5}*

}

Any logs ?

Slim BOUGUERRA

Hello,

Query is returning response like this with any error in logs:

“pagingIdentifiers”: {

“testFeed_2015-11-19T00:00:00.000Z_2015-11-20T00:00:00.000Z_2015-11-19T00:00:00.000Z”: 4

},

“events”: [

{

“segmentId”: “testFeed_2015-11-19T00:00:00.000Z_2015-11-20T00:00:00.000Z_2015-11-19T00:00:00.000Z”,

“offset”: 0,

“event”: {

“timestamp”: “2015-11-19T07:01:00.000Z”,

“country”: “US”,

“lo”: null,

“mod”: null,

“mdCode”: “840”,

“postalCode”: “15862”,

“mcc”: “0”,

“cbsaCode”: “47900”,

“ecp”: “0”,

“deviceAAID”: null,

“height”: “250”,

“naicsCode”: “334220”,

“orgName”: “-1”,

“apid”: null,

“ip”: “4.35.234.196”,

“advId”: “${ADV_ID}”,

“cityid”: “2281”,

“creativeSize”: “300x250”,

“sitedomain”: null,

“region”: “47”,

“aid”: “0”,

“reservePrice”: “0”,

“csaCode”: “548”,

“ozoneId”: “${OZONE_ID}”,

“tagId”: “1”,

“cpId”: “${CP_ID}”,

“homebizType”: “3”,

“advFreq”: “${ADV_FREQ}”,

“ispName”: “-1”,

“d0”: null,

“d1”: null,

“creativeId”: “31492361”,

“d2”: null,

“d3”: null,

“sid”: “0”,

“d4”: null,

“ioId”: “${IO_ID}”,

“uid”: null,

“publisherId”: null,

“car”: null,

“mak”: null,

“supplyType”: “0”,

“mnc”: “0”,

“userAgent”: “300”,

“url”: null,

“cpgId”: “${CPG_ID}”,

“la”: null,

“sicCode”: “366302”,

“width”: “300”,

“siteId”: “856268”,

“connSpeed”: “-1”,

“pricePaid”: “0”,

“cost”: 0,

“count”: 1

}

But there is no PostalId column which should come from lookup.

Thanks,

Navneet

I think it is a query syntax….Select query takes dimensions with S not dimension

Hey Slim,

Thanks a lot. Is it possible for you to provide me correct syntax or sample query? I have also tried with Dimensions but I am getting following error:

{
    "error": "Can not deserialize instance of java.util.ArrayList out of START_OBJECT token\n at [Source: HttpInputOverHTTP@1ea7fa0e; line: 14, column: 6]"
}

With this query:
{
    "queryType": "select",
    "intervals": {
        "intervals": [
            "2015-11-19T07:00:01.054/2015-11-19T13:00:01.054"
        ],
        "type": "intervals"
    },
    "granularity": "fifteen_minute",
    "metric": "cost",
    "dataSource": {
        "name": "stagingRealtimeImpressionFeed",
        "type": "table"
    },
    "dimensions": {
        "type": "extraction",
        "dimension": "postalCode",
        "outputName": "postalId",
        "extractionFn": {
            "type": "lookup",
            "lookup": {
                "type": "namespace",
                "namespace": "postcode_id_lookup"
            }
        },
        "replaceMissingValueWith": "MISSING",
        "retainMissingValue": false
    },
    "threshold": 1000,
    "pagingSpec":{"pagingIdentifiers": {}, "threshold":5}
}

Thanks,

Navneet

Hi since it is dimensions with S it usually means an array of objects like “dimensions”:[{object_1},{object_2}]

{
  "queryType": "select",
  "intervals": {
    "intervals": [
      "2015-11-19T07:00:01.054/2015-11-19T13:00:01.054"
    ],
    "type": "intervals"
  },
  "granularity": "fifteen_minute",
  "metric": "cost",
  "dataSource": {
    "name": "stagingRealtimeImpressionFeed",
    "type": "table"
  },
  "dimensions": [
    {
      "type": "extraction",
      "dimension": "postalCode",
      "outputName": "postalId",
      "extractionFn": {
        "type": "lookup",
        "lookup": {
          "type": "namespace",
          "namespace": "postcode_id_lookup"
        }
      },
      "replaceMissingValueWith": "MISSING",
      "retainMissingValue": false
    }
  ],
  "threshold": 1000,
  "pagingSpec": {
    "pagingIdentifiers": {},
    "threshold": 5
  }
}

Hi Navneet,

Unfortunately, it doesn’t seem like you can currently use extraction dimensions with select-type queries. Select queries expect an array of strings of dimensions to include in the results.

Hi Navneet,

Pivot and PlyQL just added lookup support, you could express your lookup as $postalCode.lookup(“postcode_id_lookup”) in the Pivot config or LOOKUP(postalCode, “postcode_id_lookup”) in the PlyQL SQL. Running ether tool with the --verbose flag will show you the underlying Druid query that they are making.