[Kafka Indexing Service] Not able to store data in deep storage

Hey,
I am using the Kafka indexing service to ingest our data. It connects to the consumer properly and is receiving messages, but for some reason the data is not being handed off to deep storage. The task also does not update its status even after the task duration has elapsed; it just keeps running.

Here is my supervisor spec:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "prismnew25",
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "day",
      "queryGranularity": "none",
      "intervals": ["2016-04-01/2016-12-30"]
    },
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "dimensionsSpec": {
          "dimensions": ["lang", "share_clicks", "ts_bucket", "old_hash_id", "ab_test", "event_name", "title", "noti_opened", "fullstory_time_total", "ts_back_valid", "custom_title", "targeted_city", "at", "short_view_event", "published_dt", "short_time", "notification_type", "variants", "device_id", "category", "toss_opened", "noti_shown", "event_source", "score", "author", "bookmark", "is_video", "source", "like_count", "share_view", "vid_length", "content", "fullstory_view", "ts_valid", "targeted_country", "video_event", "shortened_url", "toss_clicked", "hashId", "group_id", "img_url", "is_deleted"]
        },
        "timestampSpec": {
          "format": "millis",
          "column": "at"
        }
      }
    },
    "metricsSpec": [
      { "name": "count", "type": "count" },
      { "name": "fullstory_total_time", "type": "doubleSum", "fieldName": "fullstory_time_total" },
      { "name": "total_like_count", "type": "longSum", "fieldName": "like_count" },
      { "name": "total_share_views", "type": "longMax", "fieldName": "share_views" },
      { "name": "total_short_time", "type": "doubleSum", "fieldName": "short_time" },
      { "name": "distinct_user", "type": "hyperUnique", "fieldName": "device_id" },
      { "name": "distinct_hash_Id", "type": "hyperUnique", "fieldName": "hashId" },
      { "name": "total_bookmark", "type": "longSum", "fieldName": "bookmark" },
      { "name": "total_fullstory_view", "type": "longSum", "fieldName": "fullstory_view" },
      { "name": "total_noti_opened", "type": "longSum", "fieldName": "noti_opened" },
      { "name": "total_noti_shown", "type": "longSum", "fieldName": "noti_shown" },
      { "name": "total_toss_clicked", "type": "longSum", "fieldName": "toss_clicked" },
      { "name": "total_toss_opened", "type": "longSum", "fieldName": "toss_opened" },
      { "name": "total_share_click", "type": "longSum", "fieldName": "share_clicks" },
      { "name": "total_short_views", "type": "longSum", "fieldName": "short_view_event" },
      { "name": "total_video_views", "type": "longSum", "fieldName": "video_event" },
      { "name": "total_ts_valid", "type": "longSum", "fieldName": "ts_valid" },
      { "name": "total_full_ts_valid", "type": "longSum", "fieldName": "ts_back_valid" },
      { "name": "is_ab", "type": "longMax", "fieldName": "ab_test" },
      { "name": "ab_variants", "type": "longMax", "fieldName": "variants" }
    ]
  },
  "tuningConfig": {
    "type": "kafka"
  },
  "ioConfig": {
    "topic": "prism-final-testing-4",
    "consumerProperties": {
      "bootstrap.servers": "172.16.3.142:9092,172.16.3.113:9092"
    }
  }
}

Attached are the MiddleManager jvm.config, runtime.properties, and common.runtime.properties.

Do let us know if you need any other configuration. I am not seeing any error logs on any of the nodes.

Thanks,

Saurabh

common.runtime.properties (4.04 KB)

runtime.properties (710 Bytes)

jvm.config (159 Bytes)

Update: I am now able to persist the data to deep storage, but it takes about 3 hours to do that. Why is that? As per the config, the task should take 1 to 1.5 hours at most to persist the data.

One more question:

Does Druid support having 2 Coordinator nodes? I took one of them down and the data got persisted. Could that be why it is taking so long?

Hey Saurabh,

If you take a look at your task log and the timestamps of the entries, it might give you a better idea of what's taking so long or whether anything is going wrong. You might also want to add '-XX:+PrintGCDetails -XX:+PrintGCTimeStamps' to druid.indexer.runner.javaOpts so that you can see if your task is spending a lot of time in GC.
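For example, in your MiddleManager runtime.properties the line might look something like the sketch below; the heap and direct-memory settings here are placeholder assumptions for illustration, only the two GC-logging flags are the point:

  # MiddleManager runtime.properties (illustrative; -Xmx2g / MaxDirectMemorySize are assumed values)
  druid.indexer.runner.javaOpts=-server -Xmx2g -XX:MaxDirectMemorySize=4g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps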

The default value for taskDuration is 1H, but the time it takes to publish the segment depends on the complexity of your index plus other factors. However, the task should have timed out trying to publish after 30M (by default), so I'd also expect the task logs to show it being killed by the supervisor.
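If you want to make those two windows explicit (or lengthen them while debugging), both knobs live in the supervisor's ioConfig. A minimal sketch based on your spec, with the documented defaults spelled out rather than tuned recommendations:

  "ioConfig": {
    "topic": "prism-final-testing-4",
    "consumerProperties": {
      "bootstrap.servers": "172.16.3.142:9092,172.16.3.113:9092"
    },
    "taskDuration": "PT1H",
    "completionTimeout": "PT30M"
  }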

One thing that could likely be happening is that the task does persist the segment to deep storage and then sits around waiting for it to be noticed by the coordinator and loaded by a historical. I would check things in this order:

  1. indexing task logs - if you see logs for 'Awaiting handoff of segments' but don't see 'Segment Handoff complete' (a quick way to pull and grep the task log is sketched after this list), then look at
  2. coordinator logs - check if it discovers the new segments around the time the indexing task published them and if it notified a historical to load them, and if so then look at
  3. historical logs - see if the segment loaded or if there's something preventing it from loading (which would normally be capacity issues)
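For step 1, one way to grab the task log is through the Overlord's task log endpoint and grep for the handoff messages; the host and task id below are placeholders for your setup (8090 is the default Overlord port):

  curl http://<overlord-host>:8090/druid/indexer/v1/task/<task-id>/log | grep -i "handoff"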

Yes, Druid supports multiple Coordinators for HA. One Coordinator will be elected leader and act as the active Coordinator, while the rest remain idle until a failover condition occurs.