Druid 0.6.146: segments are not being handed off from realtime to historical

I looked at the latest docs, since the docs for 0.6.146 did not cover this:
http://druid.io/docs/latest/ingestion/faq.html#my-realtime-node-is-not-handing-segments-off

– Metadata config is correct for MySQL:

druid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid

druid.db.connector.user=druid

druid.db.connector.password=druid
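As a sanity check on the metadata store, you can inspect the segments table directly. This is a hypothetical query that assumes the default segments table name (`druid_segments`); adjust it if you have overridden the `druid.db.tables.*` settings. Note that the segment size and binary version live inside the `payload` JSON rather than in dedicated columns:

```sql
-- Hypothetical check; assumes the default segments table name.
SELECT id, version, used, payload
FROM druid_segments
WHERE dataSource = 'XXXX'        -- substitute the affected dataSource
ORDER BY created_date DESC
LIMIT 20;
```

If `version` is NULL here as well, the problem was introduced at announce/persist time rather than by the coordinator.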

– Historical nodes are well within their capacity. Checked this via the coordinator’s cluster view.

– There was one exception in the historical logs.

INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.coordination.BaseZkCoordinator.start() throws java.io.IOException] on object[io.druid.server.coordination.ZkCoordinator@134806b7].

In the coordinator’s cluster view I found that the segment binary version is null and the size is shown as 0, and nothing is being handed off to the historical nodes. Everything stays on the realtime node.

– No exceptions in coordinator logs.

Any idea what could be causing the versions to become null? I suppose those segments could be considered corrupt, yet queries still return correct results. The kill task does not work on these segments either.

This is happening for some newly created dataSources. Their specs are exactly the same as the older ones except for the granularity.

Some older segments also have null versions and are not being handed off to historical nodes either. The only thing they all have in common is the null version.

There were also some exceptions in realtime.log, though without much detail:

com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.client.ServerInventoryView.start() throws java.lang.Exception] on object[io.druid.client.SingleServerInventoryView@3e794c9c]

com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.segment.realtime.RealtimeManager.start() throws java.io.IOException] on object[io.druid.segment.realtime.RealtimeManager@28317c6f]

[XXXX-2015-06-15T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[XXXX]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class com.metamx.common.IAE, exceptionMessage=Bad number of metrics[3], expected [2], interval=2015-06-15T00:00:00.000Z/2015-06-15T01:00:00.000Z}

com.metamx.common.IAE: Bad number of metrics[3], expected [2]
	at io.druid.segment.IndexMerger.merge(IndexMerger.java:270)
	at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:170)
	at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:163)
	at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:348)
	at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

XXXX is the dataSource name, but these were not the newly created dataSources.

Bump

Hi Aseem, did you stop the node at any point and change the metrics schema?

I stopped druid -> Yes
Change the metrics schema -> I don’t know what that is.

Hi Aseem,
by “changing the schema” FJ means any addition or removal of metrics across the restart.

From the exception trace, it seems you removed a metric and restarted the node, and that caused this.
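For illustration, a change like the following between restarts would produce exactly the “Bad number of metrics[3], expected [2]” error, because hydrants persisted before the restart carry three metrics while the new schema expects two. The metric names here are made up for the example, not taken from the thread:

```
metricsSpec before the restart (3 metrics):
[
  { "type": "count",     "name": "count" },
  { "type": "doubleSum", "name": "value_sum", "fieldName": "value" },
  { "type": "doubleMin", "name": "value_min", "fieldName": "value" }
]

metricsSpec after the restart (2 metrics):
[
  { "type": "count",     "name": "count" },
  { "type": "doubleSum", "name": "value_sum", "fieldName": "value" }
]
```

When the plumber tries to merge the old on-disk hydrants with indexes built under the new spec, the metric counts disagree and the merge fails, so the segment is never finished and never handed off.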

You will need to clear the realtime node’s persist directory and restart it for things to work again.
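A minimal sketch of that cleanup, with the caveat that the path below is a placeholder assumption: the real location is whatever `basePersistDirectory` is set to in your realtime spec, and the realtime node must be stopped first (this discards any unpersisted/unmerged realtime data for the affected interval):

```shell
# Sketch only: stop the realtime node before doing this.
# The path is an assumption; use the basePersistDirectory from your
# realtime spec, not this placeholder.
PERSIST_DIR="/tmp/realtime/basePersist"
mkdir -p "$PERSIST_DIR"          # ensure the path exists for this sketch
rm -rf "${PERSIST_DIR:?}"/*      # ${VAR:?} aborts if the variable is unset/empty
```

After clearing the directory, restart the node so it rebuilds its hydrants from scratch under the current schema.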

If you want to support schema changes on the fly, you can use the Druid indexing service together with Tranquility (https://github.com/druid-io/tranquility), which also supports schema updates.