How to add custom transform logic on a field when doing realtime ingestion from Kafka

Hi community,

I'm trying to ingest data from Kafka in realtime. The problem is that the timestamp column of our data is in the format "yyyy-MM-dd HH:mm:ss" in our local timezone (UTC+8). Since the realtime node is recommended to start with "-Duser.timezone=UTC", which is 8 hours behind UTC+8, all the ingested timestamps are interpreted as 8 hours in the future, so the rows are considered too new and are thrown away.

My configuration for timestampSpec is like this:

    "timestampSpec" : {
      "column" : "_datetime",
      "format" : "yyyy-MM-dd HH:mm:ss"
    }
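To illustrate the problem (this example is mine, not part of the original spec): parsing the same zone-less wall-clock string under UTC versus Asia/Shanghai yields instants exactly 8 hours apart, which is why the rows land outside the ingestion window. A minimal sketch using java.time (Druid itself uses Joda-Time, but the arithmetic is identical):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimezoneDemo {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Epoch millis if the zone-less string is (mis)read as UTC,
    // which is what a node started with -Duser.timezone=UTC does.
    static long parsedAsUtc(String ts) {
        return LocalDateTime.parse(ts, FMT).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Epoch millis if the string is read in the source data's real zone (UTC+8).
    static long parsedAsShanghai(String ts) {
        return LocalDateTime.parse(ts, FMT)
                .atZone(ZoneId.of("Asia/Shanghai")).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        String ts = "2016-03-30 14:26:12";
        // The UTC reading lands 8 hours (28,800,000 ms) in the future
        // relative to the true instant, so the row looks "too new".
        System.out.println(parsedAsUtc(ts) - parsedAsShanghai(ts)); // prints 28800000
    }
}
```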

I know the problem could be easily fixed if I could change the time format to a POSIX timestamp or to "yyyy-MM-dd HH:mm:ss ZZ", but unfortunately that's not under my control. Another way would be to add an extra transformation using a stream processor, but I think that's overkill for this case.

I've also considered adding an extra argument to timestampSpec to convey the timezone of the source data, like this:

    "timestampSpec" : {
      "column" : "_datetime",
      "format" : "yyyy-MM-dd HH:mm:ss",
      "timeZone" : "Asia/Shanghai"
    }
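A sketch of what such a timezone-aware parse might look like. The class and method names here are hypothetical, purely to show the idea; Druid's actual TimestampSpec is Joda-Time based, while this sketch uses java.time:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: parse a zone-less timestamp string using an explicit
// source-data zone, then return epoch millis (i.e. the true UTC instant).
public class ZonedTimestampParser {
    private final DateTimeFormatter formatter;
    private final ZoneId sourceZone;

    public ZonedTimestampParser(String pattern, String zoneId) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
        this.sourceZone = ZoneId.of(zoneId);
    }

    public long parseToEpochMillis(String value) {
        return LocalDateTime.parse(value, formatter)
                .atZone(sourceZone)   // attach the source zone, e.g. UTC+8
                .toInstant()          // convert to the absolute instant
                .toEpochMilli();
    }

    public static void main(String[] args) {
        ZonedTimestampParser p =
                new ZonedTimestampParser("yyyy-MM-dd HH:mm:ss", "Asia/Shanghai");
        // "2016-03-30 14:26:12" in Asia/Shanghai corresponds to 06:26:12 UTC
        System.out.println(p.parseToEpochMillis("2016-03-30 14:26:12"));
    }
}
```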

But it seems I would have to change code in all three repositories (druid, druid-api, java-util) to achieve this, which makes me hesitate.

So, what’s your recommended way to add this kind of simple transformation on source data?

Found out I can add the "timeZone" parameter by only changing TimestampSpec.java in druid-api :)
PR is here https://github.com/druid-io/druid/pull/2762

On Wednesday, March 30, 2016 at 2:26:12 PM UTC+8, Dayue Gao wrote: