message getting dropped

Hi,

I am new to Druid.io and am trying to set up a cluster of 3 nodes on AWS EC2.

I am just getting a message-dropped exception and nothing else. Where should I look to resolve this issue? The Coordinator, Overlord, Historical, and MiddleManager logs are not showing any exceptions, but I am getting the exception below:

2016-07-05 23:14:25,445 [KafkaConsumer-1] ERROR c.m.tranquility.kafka.KafkaConsumer - Exception:
java.lang.RuntimeException: com.metamx.tranquility.tranquilizer.MessageDroppedException: Message dropped
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[com.google.guava.guava-16.0.1.jar:na]
	at com.metamx.tranquility.kafka.writer.TranquilityEventWriter.maybeThrow(TranquilityEventWriter.java:138) ~[io.druid.tranquility-kafka-0.8.2.jar:0.8.2]
	at com.metamx.tranquility.kafka.writer.TranquilityEventWriter.send(TranquilityEventWriter.java:105) ~[io.druid.tranquility-kafka-0.8.2.jar:0.8.2]
	at com.metamx.tranquility.kafka.KafkaConsumer$2.run(KafkaConsumer.java:231) ~[io.druid.tranquility-kafka-0.8.2.jar:0.8.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: com.metamx.tranquility.tranquilizer.MessageDroppedException: Message dropped
	at com.twitter.finagle.NoStacktrace(Unknown Source) ~[na:na]

I followed the configuration suggested in http://druid.io/docs/0.9.1.1/tutorials/cluster.html, http://druid.io/docs/0.9.1.1/configuration/production-cluster.html, and http://druid.io/docs/0.9.1.1/ingestion/stream-push.html.

My kafka.json:

{
  "dataSources" : {
    "splunk_data" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "splunk_data",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "nDateTime",
                "format" : "YYYY-MM-DD HH:mm:SS"
              },
              "dimensionsSpec" : {
                "dimensions" : [],
                "dimensionExclusions" : []
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "day",
            "queryGranularity" : "none"
          },
          "metricsSpec" : [
            {
              "type" : "count",
              "name" : "count"
            },
            {
              "name" : "nEventCount",
              "type" : "longSum",
              "fieldName" : "nEventCount"
            },
            {
              "fieldName" : "nEventErrorCount",
              "name" : "nEventErrorCount",
              "type" : "longSum"
            }
          ]
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "intermediatePersistPeriod" : "PT60M",
          "windowPeriod" : "PT60M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1",
        "topicPattern" : "DruidInput_1"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "xx.x.x.xx:2181",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "commit.periodMillis" : "15000",
    "consumer.numThreads" : "2",
    "kafka.zookeeper.connect" : "xx.x.x.xx:2181",
    "kafka.group.id" : "tranquility-splunk_data_test_1",
    "reportDropsAsExceptions" : "true"
  }
}

And I am sending this data:

{"nDateTime":"2016-07-05 22:48:00","nEventCount":"1","nEventErrorCount":"0"}

Thanks

Tarun

Hey Tarun,

I think your issue is probably related to a bad timestamp format. While ‘YYYY-MM-DD HH:mm:SS’ is a valid JodaTime format, it’s probably not parsing the string as you expect it to and thus the message gets dropped for not being a ‘recent’ event. Take a look at: http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html and in particular note that the format string is case sensitive. The main issues I see are:

  • ‘DD’ means day of year, for day of month use ‘dd’
  • ‘SS’ means fraction of a second, for seconds use ‘ss’
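With those two letters corrected, the timestampSpec in your kafka.json would look like this (the rest of the parseSpec stays the same):

```json
"timestampSpec" : {
  "column" : "nDateTime",
  "format" : "yyyy-MM-dd HH:mm:ss"
}
```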

You’re also not supplying any timezone information in your date formatter so make sure that your event source, Tranquility, and Druid are all configured to use the same timezone. We recommend running everything in UTC, and it’s probably best to explicitly include time zone information in the formatter so there’s no ambiguity.
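To sanity-check a format string outside of Tranquility, here's a quick stand-alone sketch. It uses java.time rather than Joda-Time (so it needs no extra jars), but the pattern letters involved here behave the same way in both libraries. It parses your sample event's timestamp with the corrected pattern pinned explicitly to UTC:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampCheck {
    public static void main(String[] args) {
        // Corrected pattern: 'dd' = day of month, 'ss' = second of minute,
        // with an explicit UTC zone so there is no ambiguity at resolution time.
        DateTimeFormatter fmt = DateTimeFormatter
                .ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC);
        ZonedDateTime t = ZonedDateTime.parse("2016-07-05 22:48:00", fmt);
        System.out.println(t.toInstant()); // 2016-07-05T22:48:00Z
    }
}
```

If the parsed instant comes out hours away from "now" (or in the wrong year entirely, as 'YYYY-MM-DD' can produce), that's exactly the situation where Tranquility drops the event as outside the window period.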

If you’re just getting started with Druid ingestion from Kafka, you might be interested in the new Kafka indexing service that was just released. It provides exactly-once ingestion guarantees and can ingest non-recent data, which Tranquility currently can’t do. The caveat is that since it’s a new feature, it’s not as battle-tested as Tranquility Kafka, so there are probably a few bugs still lurking in there. If you’re interested, you can read more about it:

http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html
http://imply.io/post/2016/07/05/exactly-once-streaming-ingestion.html