[druid-user] Re: No Segments created - Kafka Indexing Service

Hey Guys,

I'm stuck here without a clue about what's happening.

Please find the images of the coordinator console, cluster view, overlord console, middle manager runtime properties, and the resources allocated to the middle manager.

The curl command below has been showing the same count for hours now:
curl -sX POST "http:///druid/v2/?pretty=true" -H 'content-type: application/json' -d @/tmp/count-searchkeywords.json
[ {
  "version" : "v1",
  "timestamp" : "2018-01-22T00:00:00.000Z",
  "event" : {
    "actor.junk-field" : null,
    "total-count" : 150000
  }
} ]
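(For reference, count-searchkeywords.json itself isn't shown in this thread. A query that produces a result shaped like the above would typically be a groupBy query along the lines of the sketch below; the datasource name, granularity, and interval here are only guesses, not the actual spec from the attachment.)

{
  "queryType": "groupBy",
  "dataSource": "searchkeywords",
  "granularity": "day",
  "dimensions": ["actor.junk-field"],
  "aggregations": [
    { "type": "count", "name": "total-count" }
  ],
  "intervals": ["2018-01-01/2018-02-01"]
}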

Coordinator Console: No Datasources visible

Coordinator Old Console - Cluster View:

middle-manager.png

Some snippets from the task logs are below. The incremental-persist thread seems to create the smoosh files and all the version/metadata files in temporary storage; however, the segments are never pushed, and I don't see the S3 segment pusher being invoked at all.

It is worth reiterating that I'm trying to ingest several million events (dated Jan 2018) via the Kafka indexing task.

2018-06-20T13:49:31,742 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[timestamp] inverted with cardinality[7,461] in 140,921 millis.
2018-06-20T13:49:31,789 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Completed index.drd in 10 millis.
2018-06-20T13:49:31,820 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.java.util.common.io.smoosh.FileSmoosher - Created smoosh file [/mnt//task/index_kafka_caliper-sample-sessions-avro-kafka-prod-timestamp-test1_eacb41e33fd1473_flpgjfbf/work/persist/caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T19:00:00.000Z_2018-01-22T20:00:00.000Z_2018-06-20T13:44:32.472Z/0/00000.smoosh] of size [410760] bytes.
2018-06-20T13:49:32,164 DEBUG [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexIO - Mapping v9 index[/mnt/<Masked>/index_kafka_caliper-sample-sessions-avro-kafka-prod-timestamp-test1_eacb41e33fd1473_flpgjfbf/work/persist/caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T19:00:00.000Z_2018-01-22T20:00:00.000Z_2018-06-20T13:44:32.472Z/0]
2018-06-20T13:49:32,198 DEBUG [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexIO - Mapped v9 index[/mnt/<Masked>/index_kafka_caliper-sample-sessions-avro-kafka-prod-timestamp-test1_eacb41e33fd1473_flpgjfbf/work/persist/caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T19:00:00.000Z_2018-01-22T20:00:00.000Z_2018-06-20T13:44:32.472Z/0] in 34 millis
2018-06-20T13:49:32,203 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - Segment[caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T12:00:00.000Z_2018-01-22T13:00:00.000Z_2018-06-20T13:44:26.802Z], persisting Hydrant[FireHydrant{, queryable=caliper-sample-sessions-avro-kafka-prod-timestamp-test1_2018-01-22T12:00:00.000Z_2018-01-22T13:00:00.000Z_2018-06-20T13:44:26.802Z, count=0}]
2018-06-20T13:49:32,241 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Starting persist for interval[2018-01-22T12:00:00.000Z/2018-01-22T13:00:00.000Z], rows[1,106]
2018-06-20T13:49:32,263 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Using SegmentWriteOutMediumFactory[TmpFileSegmentWriteOutMediumFactory]
2018-06-20T13:49:32,290 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Completed version.bin in 18 millis.
2018-06-20T13:49:32,307 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.IndexMergerV9 - Completed factory.json in 16 millis
2018-06-20T13:49:32,387 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[eventType] conversions with cardinality[2] in 80 millis.
2018-06-20T13:49:32,455 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[userbrowser.name] conversions with cardinality[0] in 34 millis.
2018-06-20T13:49:32,682 DEBUG [main-SendThread(ip-10-35-35-239.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0xf64134df7f2d07f after 1ms
2018-06-20T13:49:34,562 INFO [caliper-sample-sessions-avro-kafka-prod-timestamp-test1-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[actor.districtPid] conversions with cardinality[111] in 2,071 millis.


Hi guys,

It'd be great if the experts could point me in the right direction.

There's 1 topic with 2 partitions, and there are 13 million events (dated Jan 2018) on this topic. Each event has 35 fields, many of them with high cardinality.

I tried altering segmentGranularity to 5 minutes, 2 minutes, hour, day, etc.

I also tried altering taskDuration to 5 minutes, 20 minutes, 30 minutes, 1 hour, and 4 hours.

I have also tried setting completionTimeout to values larger than the default (although, based on the index task description in the logs, it does not appear to be honored).

I also tried setting taskCount to 2, with each task having a 2 GB heap and 1 GB off-heap, and with 2 replicas.

Finally, I tried a single task with no replicas, with a 10 GB heap and 3 GB off-heap. For that run, I kept only the 2 lowest-cardinality dimensions.

Every one of them failed.
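(For concreteness, a Kafka supervisor spec exercising the settings above would look roughly like the sketch below, submitted to the overlord at /druid/indexer/v1/supervisor. This is only an illustration: the datasource name, topic, broker address, and timestamp column are placeholders, the dimensions are just two names taken from the task log above, and the real setup here appears to use Avro per the topic name, so the actual parser would differ from the simple JSON parseSpec shown.)

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "searchkeywords",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": { "dimensions": ["eventType", "actor.districtPid"] }
      }
    },
    "metricsSpec": [ { "type": "count", "name": "count" } ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 75000
  },
  "ioConfig": {
    "topic": "searchkeywords-topic",
    "consumerProperties": { "bootstrap.servers": "<kafka-broker>:9092" },
    "taskCount": 2,
    "replicas": 2,
    "taskDuration": "PT1H",
    "completionTimeout": "PT30M"
  }
}

(taskDuration, taskCount, replicas, and completionTimeout in the ioConfig are the fields that were being varied across the attempts above.)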

Any timely help here would be much appreciated.

Thanks a million

Varaga

Hi Druid Experts,

   Any ideas/help here?

Best Regards

Varaga

> There are no segments created in deep storage for the intervals.
> Every one of them failed.

Do you have some sample complete task logs for these two cases?

  • Jon

Varaga: In the log you sent me via direct mail, it looks like the task didn't make any attempt to publish/hand off segments. What was the taskDuration in that attached task's supervisor spec?

I would recommend checking your coordinator logs and overlord logs to see if there’s anything interesting relating to the task in question and its segments.

The persists that were occurring seem to be extremely slow, e.g.:


2018-07-06T12:15:41,672 INFO [caliper-sample-atomic-avro-kafka-int-w3-incremental-persist] io.druid.segment.StringDimensionMergerV9 - Completed dim[id] inverted with cardinality[5,048] in 86,457 millis.

I would suggest setting your log level to INFO; the DEBUG-level logs don't really add much useful information for ingestion and could be hurting your performance.

You may want to allocate fewer worker tasks per middle manager, allocate fewer resources per worker, or use bigger middle manager machines; it looks like the machine might be too small for your current worker settings.

I also noticed there's data for the year 2051; this will likely cause issues with segment handoff in Kafka indexing (https://github.com/apache/incubator-druid/issues/4137), so I would make sure your input doesn't have future data like that.
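(If scrubbing the future-dated rows upstream isn't practical, the Kafka supervisor ioConfig also has an earlyMessageRejectionPeriod option, assuming your Druid version supports it, which rejects events stamped too far after the task's coverage window. A sketch of the relevant ioConfig fragment is below; the topic, broker, and period values are placeholders, not a recommendation for this cluster.)

{
  "topic": "searchkeywords-topic",
  "consumerProperties": { "bootstrap.servers": "<kafka-broker>:9092" },
  "taskDuration": "PT1H",
  "earlyMessageRejectionPeriod": "PT1H"
}

(The companion lateMessageRejectionPeriod option drops old events, which you would not want to set here, since the January 2018 data is intentionally being ingested months after the fact.)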

> Finally, tried with 1 task with no replicas and the task having 10g heap, 3 gb off heap. When this was run, I had set only 2 dimensions with least cardinalities. Every one of them failed.

If you run a task like that with a short taskDuration like 10M, what is the specific failure that you see?

Thanks,

Jon