Why does the compact feature hit a bug in the new version 0.15.0?

{
  "type" : "compact",
  "dataSource" : "vehicleParking",
  "interval" : "2019-07-09T00:00:00.000Z/2019-07-10T00:00:00.000Z"
}

I have posted the compaction request above. I want to merge the segments for 2019-07-09, but I have run into a problem.

Before compacting, I issued a query and recorded the sum of parkingDuration. The number was 62881583913. The query spec is as follows.

{
  "queryType": "timeseries",
  "dataSource": "vehicleParking",
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "parkingDuration", "fieldName": "parkingDuration" }
  ],
  "intervals": [
    "2019-07-09T00:00:00.000Z/2019-07-10T00:00:00.000Z"
  ]
}

After compacting the segments, the same query returned 7971084738, which is smaller than before. I don't know whether this is a bug in the latest version. Can anyone help me?

Hey Scoffi:

It does sound like you lost data after compaction. Did you run a count of the number of rows before and after compaction?
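(For reference, such a row count could be obtained with a timeseries query using a count aggregator over the same interval; this is just a sketch, assuming the same datasource and interval as above:)

{
  "queryType": "timeseries",
  "dataSource": "vehicleParking",
  "granularity": "all",
  "aggregations": [
    { "type": "count", "name": "numRows" }
  ],
  "intervals": [
    "2019-07-09T00:00:00.000Z/2019-07-10T00:00:00.000Z"
  ]
}

Note that a query-time count aggregator counts the stored (rolled-up) Druid rows, not the original input events.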

Yes, I did. I already described that in the email.

Ming F ming.fang@imply.io wrote on Friday, July 12, 2019 at 5:18 AM:

Hi, this shouldn't happen; it would be a bug if compaction changed the result. I'm not sure yet how it could change the result, though.
Could you please post your ingestion spec here as well?

Dear sir,
Thank you for your reply. I am sorry I could not reply 10 hours ago; I was at church.

I have posted my ingestion spec in the attachment.

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "vehicleParking",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "startParkingTime",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": ["vehicleId", "geoHash", "cityCode"]
        }
      }
    },
    "metricsSpec": [
      { "type": "longMax", "name": "parkingDuration", "fieldName": "parkingDuration" },
      { "type": "longMax", "name": "endParkingTime", "fieldName": "endParkingTime" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "day",
      "queryGranularity": "second"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 2147483647
  },
  "ioConfig": {
    "topic": "VehicleParking",
    "consumerProperties": {
      "bootstrap.servers": "master:6667,slave1:6667,slave2:6667"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}

Jihoon Son ghoonson@gmail.com wrote on Sunday, July 14, 2019 at 11:05 AM:

ingest-from-kafka.rtf (1.64 KB)

Mr. Wei, my WeChat has been muted, so I can only receive your messages and cannot reply. Sorry about that :D

Just reply to me here. Getting around the firewall is a real pain for me, and sometimes I can't get through at all.

Ming F ming.fang@imply.io wrote on Monday, July 15, 2019 at 10:18 AM:

Sorry for the late reply.

So, you're using Kafka ingestion. Is it possible that new data arrived after the compaction task succeeded? That could change your results after compaction.
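(One way to check whether the set of segments or their row counts changed around compaction could be a segmentMetadata query over the same interval; a sketch, assuming the same datasource. It reports per-segment ids, intervals, row counts, stored aggregators, and rollup status:)

{
  "queryType": "segmentMetadata",
  "dataSource": "vehicleParking",
  "intervals": ["2019-07-09T00:00:00.000Z/2019-07-10T00:00:00.000Z"],
  "analysisTypes": ["interval", "aggregators", "rollup"],
  "merge": false
}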

But the result is smaller than before. I don't see how that could happen.

Jihoon Son ghoonson@gmail.com wrote on Wednesday, July 17, 2019 at 3:41 AM:

Dear sir,

We are using the latest version of Druid. When we ingest the data from Kafka, we use rollup to compress the data, and we use longMax aggregators in the metricsSpec. Could you please help me find a solution? Thanks. The ingestion spec is as follows.

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "vehicleParking",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "startParkingTime",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": ["vehicleId", "geoHash", "cityCode"]
        }
      }
    },
    "metricsSpec": [
      { "type": "longMax", "name": "parkingDuration", "fieldName": "parkingDuration" },
      { "type": "longMax", "name": "endParkingTime", "fieldName": "endParkingTime" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "day",
      "queryGranularity": "second"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 2147483647
  },
  "ioConfig": {
    "topic": "VehicleParking",
    "consumerProperties": {
      "bootstrap.servers": "master:6667,slave1:6667,slave2:6667"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
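(A note on this spec: because parkingDuration is stored with a longMax rollup, a query-time longSum adds up the per-row maxima, so its value depends on how many rolled-up rows exist for each timestamp/dimension combination; a query-time longMax, by contrast, is unaffected by rows being merged further. As a cross-check, one could run the following sketch over the same datasource and interval:)

{
  "queryType": "timeseries",
  "dataSource": "vehicleParking",
  "granularity": "all",
  "aggregations": [
    { "type": "longMax", "name": "maxParkingDuration", "fieldName": "parkingDuration" }
  ],
  "intervals": [
    "2019-07-09T00:00:00.000Z/2019-07-10T00:00:00.000Z"
  ]
}

If this returns the same value before and after compaction while the row count and the longSum both drop, that would point at rows being merged together under rollup rather than raw data being lost.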