Overlord does not write segments to S3

Hi -

I enabled the HTTP emitter on all Druid nodes and have a REST endpoint that collects the emitted metrics and pumps them into Druid using the Tranquility API.
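Roughly, the emitter settings on each node look like this (the collector URL is a placeholder for my endpoint, and the emission period is just the value I happen to use):

# Emit metrics over HTTP to my collector endpoint
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://<my-collector-host>:8080/druid/metrics
druid.monitoring.emissionPeriod=PT1M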

I can see that tasks are successfully submitted to the Overlord. The MiddleManager and peons are working on them, and the tasks are completing with SUCCESS status.

However, I notice a few strange things -

1 - I use S3 as deep storage, and as mentioned in the production-cluster document, I set the following on the Overlord node -

# Upload all task logs to deep storage
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=druid
druid.indexer.logs.s3Prefix=prod/logs/v1

I can see the task's log file being created inside the druid bucket, but its content is empty (0 bytes), both before and after the task completes.

2 - The datasource's segment path and the segments themselves are never created in S3 after the task succeeds.
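For context, my understanding is that segment deep storage is configured with its own properties (plus loading the druid-s3-extensions extension), separate from the task-log settings above - roughly along these lines, where the bucket, prefix, and keys are placeholders:

# Store segments in S3 (bucket/prefix/keys are placeholders)
druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=prod/segments/v1
druid.s3.accessKey=<access key>
druid.s3.secretKey=<secret key>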

Note: I see no errors in the Overlord, MiddleManager, or peon logs. Of course, I had to juggle the peon memory settings to get them right.

What am I missing? Any advice?

Thanks!

I also see some exceptions in my app -

com.metamx.tranquility.tranquilizer.MessageDroppedException: Message dropped
	at com.twitter.finagle.NoStacktrace(Unknown Source)
Apr 27, 2016 4:52:43 PM com.twitter.finagle.loadbalancer.LoadBalancerFactory$StackModule$$anonfun$5 apply
INFO: druidTask!druid:overlord!index_realtime_DruidMetrics_2016-04-27T20:50:00.000Z_0_0: name resolution is negative (local dtab: Dtab())
com.metamx.tranquility.tranquilizer.MessageDroppedException: Message dropped
	at com.twitter.finagle.NoStacktrace(Unknown Source)
com.metamx.tranquility.tranquilizer.MessageDroppedException: Message dropped
	at com.twitter.finagle.NoStacktrace(Unknown Source)

But index_realtime_DruidMetrics_2016-04-27T20:50:00.000Z_0_0 is getting submitted to the Overlord and completing with SUCCESS status.

And the payload for the above task is as follows -

{"task":"index_realtime_DruidMetrics_2016-04-27T20:50:00.000Z_0_0","payload":{"id":"index_realtime_DruidMetrics_2016-04-27T20:50:00.000Z_0_0","resource":{"availabilityGroup":"DruidMetrics-50-0000","requiredCapacity":1},"spec":{"dataSchema":{"dataSource":"DruidMetrics","parser":{"type":"map","parseSpec":{"format":"json","timestampSpec":{"column":"timestamp","format":"auto","missingValue":null},"dimensionsSpec":{"dimensions":["host","metric","service"],"spatialDimensions":[]}}},"metricsSpec":[{"type":"count","name":"count"},{"type":"doubleSum","name":"sum","fieldName":"value"},{"type":"doubleMin","name":"min","fieldName":"value"},{"type":"doubleMax","name":"max","fieldName":"value"}],"granularitySpec":{"type":"uniform","segmentGranularity":"MINUTE","queryGranularity":{"type":"none"},"intervals":null}},"ioConfig":{"type":"realtime","firehose":{"type":"clipped","delegate":{"type":"timed","delegate":{"type":"receiver","serviceName":"firehose:druid:overlord:DruidMetrics-50-0000-0000","bufferSize":100000},"shutoffTime":"2016-04-27T21:16:00.000Z"},"interval":"2016-04-27T20:50:00.000Z/2016-04-27T20:51:00.000Z"},"firehoseV2":null},"tuningConfig":{"type":"realtime","maxRowsInMemory":75000,"intermediatePersistPeriod":"PT10M","windowPeriod":"PT20M","basePersistDirectory":"/tmp/1461779543852-0","versioningPolicy":{"type":"intervalStart"},"rejectionPolicy":{"type":"none"},"maxPendingPersists":0,"shardSpec":{"type":"linear","partitionNum":0},"indexSpec":{"bitmap":{"type":"concise"},"dimensionCompression":null,"metricCompression":null},"buildV9Directly":false,"persistThreadPriority":0,"mergeThreadPriority":0,"reportParseExceptions":false}},"context":null,"groupId":"index_realtime_DruidMetrics","dataSource":"DruidMetrics"}}

Wondering where the actual content that should be part of the segment is?

Anyone? Pretty much stuck here for a couple of hours now!

Anyone out there who can help me debug Tranquility?

druidTask!druid:overlord!index_realtime_DruidMetrics_2016-04-27T20:50:00.000Z_0_0: name resolution is negative (local dtab: Dtab())

Not sure what that means in the client app where the Tranquility sender is running.

I see these …

com.metamx.tranquility.tranquilizer.MessageDroppedException: Message dropped
	at com.twitter.finagle.NoStacktrace(Unknown Source)

and it looks like the stack trace is swallowed :slight_smile:

Hey Jagadeesh,

The good news is that if you’re seeing tasks being created on the overlord, that means that things are wired up correctly and the services are discovering one another. MessageDroppedException usually indicates that Tranquility dumped the message because the event was too old (Tranquility is intended to work with current data only). I noticed in your ingestion spec that you had a segmentGranularity of ‘MINUTE’ but a windowPeriod of 20 minutes which is a little odd; can you try setting your segmentGranularity to ‘HOUR’ and make sure that the events are recent data (have timestamps that fall within the hour)?
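For example, the granularity-related part of your spec would look like this after that change (everything else can stay the same):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": {"type": "none"},
  "intervals": null
}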

It also sounds like you might be having problems writing to S3 as a separate issue, so you should try to query the indexing task through a broker first to see if the data is getting into Druid, and if it is but the segments aren’t being written, we can troubleshoot that separately.
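For example, something along these lines against the broker should return rows if events are making it in (the host and port are just the defaults, and the interval covers the task you posted):

curl -X POST -H 'Content-Type: application/json' 'http://<broker-host>:8082/druid/v2/?pretty' -d '{
  "queryType": "timeseries",
  "dataSource": "DruidMetrics",
  "granularity": "minute",
  "aggregations": [{"type": "longSum", "name": "count", "fieldName": "count"}],
  "intervals": ["2016-04-27T20:00/2016-04-27T22:00"]
}'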

Thanks David!

I figured out that the issue was due to an invalid JSON spec for the datasource. It looks like exceptions from this don't get surfaced very clearly across the nodes. At the end of the day, after fixing my JSON, I am able to insert data into Druid.

I was originally using - https://github.com/druid-io/tranquility/blob/master/core/src/test/resources/example.json

But it looks like I was supposed to use - https://github.com/implydata/distribution/blob/master/src/conf-quickstart/tranquility/server.json
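From what I can tell, the first file is a raw task payload used by the core tests, while Tranquility Server expects a config file that wraps the dataSchema and tuningConfig under a "dataSources" section plus server-level "properties". Going from memory, the general shape is roughly the following - field values here are only illustrative, and the linked server.json is the authoritative reference:

{
  "dataSources" : [
    {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "DruidMetrics",
          "parser" : { ... },
          "metricsSpec" : [ ... ],
          "granularitySpec" : { ... }
        },
        "tuningConfig" : {
          "type" : "realtime",
          "windowPeriod" : "PT10M",
          "intermediatePersistPeriod" : "PT10M",
          "maxRowsInMemory" : "100000"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  ],
  "properties" : {
    "zookeeper.connect" : "localhost",
    "http.port" : "8200",
    "http.threads" : "8"
  }
}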

Thanks,

Jagadeesh

That’s great to hear!