I ran into an issue upgrading to Druid version 0.6.172


I am still using 0.6.73, where I already have an issue with time zone granularity, so I need to upgrade to 0.6.172 or 0.7.1.1.

While testing the upgrade, I found that in 0.6.172 a dimension cannot be found at query time if it is also used as the input field of a hyperUnique aggregator in the realtime spec.

We still use the hyperloglog extension with 0.6.73, but 0.6.172, 0.6.173, and later versions no longer support that extension, and hyperUnique has been added to the main code, so we changed hyperloglog to hyperUnique.

The hyperUnique aggregator works well in 0.6.73, but the same setup does not work in Druid 0.6.172 and later.

Below is the test case on Druid 0.6.172 that reproduces the issue: the realtime spec, the query, and the result I get.

[

{

"schema" : { 

  "dataSource":"data",

  "aggregators":[ 

    { "type":"count", "name":"events" },

    { "type": "hyperUnique", "name": "testHyperUnique", "fieldName":"test" },

    { "type": "hyperUnique", "name": "testHyperUnique", "fieldName":"test" }

  ],

  "indexGranularity":"minute",

  "shardSpec" : { 

    "type": "linear","partitionNum":"1" 

  } 

},

"config" : {

  "maxRowsInMemory" : 500000,

  "intermediatePersistPeriod" : "PT10m"

},

"firehose" : {

  "type" : "kafka-0.7.2",
  "consumerProps" : { 
    "zk.connect" : "<zookeeper>:8020",
    "zk.connectiontimeout.ms" : "15000",
    "zk.sessiontimeout.ms" : "15000",
    "zk.synctime.ms" : "5000",
    "groupid" : "<gorupid>",
    "fetch.size" : "1048586",
    "autooffset.reset" : "largest",

    "autocommit.enable" : "false" 

  },

  "feed" : "topic",

  "parser" : {

    "timestampSpec" : {

      "column" : "ts", "format" : "millis" 

    },

    "data" : {

      "format" : "json" 

    },

    "dimensionExclusions" : [""] 

  }

},

"plumber" : {

  "type" : "realtime",

  "windowPeriod" : "PT30M",

  "segmentGranularity":"hour",

  "basePersistDirectory" : "/tmp/druid/basePersist",

  "rejectionPolicy": {

    "type": "serverTime"

  } 

}

}

]

{
  "queryType": "groupBy",
  "dataSource": "data",
  "granularity": "all",
  "dimensions": ["test"],
  "aggregations": [
    { "type": "count", "name": "event", "fieldName":"event" }
  ],
  "intervals": ["2015-04-28T05:17:00Z/2015-4-28T09:00:00Z"]
}

[{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"event":144294}}]


I think we need to clearly identify the root cause of this issue.

I am not sure whether this is a bug in Druid, so we are testing it on multiple clusters (0.6.73, 0.6.172, 0.6.173, 0.7.1.1).

If you have any ideas or a solution, please let me know.

Thanks.

Murry,

It looks like you are defining the hll aggregator twice on the ingestion task. Also, the query you are running doesn’t seem to include an hll aggregator.

What is your expected output for the query?

–Eric
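For reference, a query that does read the ingested HLL metric could look like the sketch below. It assumes the metric name `testHyperUnique` and the count metric `events` from the realtime spec above; the output names `uniqueTest` and `totalEvents` are made up for illustration. At query time the hyperUnique aggregator's `fieldName` points at the ingested metric, not at the raw `test` column, and the ingested `events` count is summed with `longSum` because a query-time `count` would count rolled-up rows instead.

```json
{
  "queryType": "timeseries",
  "dataSource": "data",
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "totalEvents", "fieldName": "events" },
    { "type": "hyperUnique", "name": "uniqueTest", "fieldName": "testHyperUnique" }
  ],
  "intervals": ["2015-04-28T05:17:00Z/2015-04-28T09:00:00Z"]
}
```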

I made a copy-paste mistake in the realtime spec.
The expected output looks like this.

[
{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"test":"test1","event":2}},
{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"test":"test2","event":2}},
{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"test":"test3","event":2}},
{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"test":"test4","event":2}},
{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"test":"test5","event":2}},
{"version":"v1","timestamp":"2015-04-28T05:17:00.000Z","event":{"test":"test6","event":2}}
]

But when I use a key from the logs as the input to a hyperUnique aggregator, I can't use it as a dimension.

On Tuesday, April 28, 2015 at 11:07:05 PM UTC+9, Eric Tschetter wrote:

Ah, I think that's because of a change that automatically removes
columns used to compute metrics from the dimension sets.

Fwiw, if you want to keep the test column, then you are going to have
to either not aggregate it or have it in the dimension whitelist. It
looks like a backwards incompatible change made it in without us
realizing it.

--Eric
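As a rough sketch of the whitelist approach, the fragment below lists `test` explicitly as a dimension while still feeding it to the hyperUnique aggregator. It assumes the new-style `dataSchema` layout as documented for Druid 0.7 and reuses the names from the realtime spec earlier in this thread; the exact field names and accepted granularity values should be checked against the documentation for the version in use. Note that with an explicit `dimensions` list, only the listed columns are ingested as dimensions.

```json
{
  "dataSchema": {
    "dataSource": "data",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "ts", "format": "millis" },
        "dimensionsSpec": {
          "dimensions": ["test"],
          "dimensionExclusions": []
        }
      }
    },
    "metricsSpec": [
      { "type": "count", "name": "events" },
      { "type": "hyperUnique", "name": "testHyperUnique", "fieldName": "test" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "MINUTE"
    }
  }
}
```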

Before your reply, I had already guessed that there was a backwards-incompatible change affecting the old-style spec format, so I am working on a new-style spec with a dimension whitelist to address this issue.

Once it is finished, I will report the result.

Thanks.

On Wednesday, April 29, 2015 at 2:39:41 PM UTC+9, Eric Tschetter wrote:

After changing the ingestion spec from the old style to the new style, realtime ingestion is working well!

On Wednesday, April 29, 2015 at 2:46:22 PM UTC+9, Munchang Jeong wrote:

@xvrl / @fjy : We should add this behavior to release-notes as it might confuse more users migrating from 0.6

– Himanshu

If I remember correctly, the fact that metrics were not excluded was considered a bug, which we fixed in 0.6.171, as indicated in the release notes:
https://github.com/druid-io/druid/releases/tag/druid-0.6.171

If there is a difference in behavior between the new and old schema with the same whitelist, then I would assume this is a bug in 0.6.x

OK, if this behavior existed in druid-0.6.171 then we don't need to do anything. I thought that the change went into 0.7 and wanted to ensure that people migrating from 0.6 would notice this in the release notes.

– Himanshu