Segment [%s] is different than expected size. Expected [%d] found [%d]

Hi, from time to time I see this warning in the Historical logs:

2015-07-19T16:15:50,485 WARN [ZkCoordinator-0] io.druid.segment.loading.SegmentLoaderLocalCacheManager - Segment [sdgadserver-videoimpression_2015-07-19T15:00:00.000Z_2015-07-19T16:00:00.000Z_2015-07-19T15:00:13.577Z] is different than expected size. Expected [346239] found [346284]

Is this somehow important? What is the reason for this warning?

Lukas Havrlant

Hmm, are you using Tranquility with replication? It’s possible that your historical node got metadata from one of the replicated segments and actual data from a different one, and they’re slightly mismatched. That could happen if the segments are slightly different due to processing the data in a different order or one of them having some duplicate data. (Tranquility-based ingestion can generate duplicates when it retries after network issues.)
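
To see which segments are affected and how large each mismatch is, the warnings can be pulled out of the Historical log. This is only a rough sketch: it assumes the log lines look like the one quoted above, and the names `WARNING_RE`, `SAMPLE_LINE` and `parse_size_warnings` are made up for illustration.

```python
import re

# Pattern mirrors the warning text quoted in this thread (an assumption
# about the log format; adjust it if your log lines differ).
WARNING_RE = re.compile(
    r"Segment \[(?P<segment>[^\]]+)\] is different than expected size\. "
    r"Expected \[(?P<expected>\d+)\] found \[(?P<found>\d+)\]"
)

# The warning line from the original question, used as sample input.
SAMPLE_LINE = (
    "2015-07-19T16:15:50,485 WARN [ZkCoordinator-0] "
    "io.druid.segment.loading.SegmentLoaderLocalCacheManager - Segment "
    "[sdgadserver-videoimpression_2015-07-19T15:00:00.000Z_"
    "2015-07-19T16:00:00.000Z_2015-07-19T15:00:13.577Z] "
    "is different than expected size. Expected [346239] found [346284]"
)


def parse_size_warnings(lines):
    """Return (segment_id, expected, found, found - expected) per matching line."""
    results = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # not a size-mismatch warning
        expected = int(match.group("expected"))
        found = int(match.group("found"))
        results.append((match.group("segment"), expected, found, found - expected))
    return results


if __name__ == "__main__":
    for segment, expected, found, delta in parse_size_warnings([SAMPLE_LINE]):
        print(segment, expected, found, delta)
```

For the line in the question, the difference is 346284 - 346239 = 45.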

Hi Gian,
thank you! Yes, we use Tranquility with replication, so maybe that’s the problem. Is there an easy way to find out whether the segments contain duplicates, or differ in some other way, because of retries?

The easiest way is to compare counts in Druid to a gold copy of the data you have somewhere.
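
As a rough sketch of that comparison, assume you already have per-interval event counts from both sides as plain dictionaries. How you get them is up to you (on the Druid side, for example, a timeseries query at hourly granularity). The function name and the numbers below are invented for illustration, not real results.

```python
def find_count_mismatches(druid_counts, gold_counts):
    """Compare per-interval event counts from Druid and from a gold copy.

    Both arguments map an interval label (e.g. an hour) to an event count.
    Returns {interval: (druid_count, gold_count)} for intervals that differ;
    an interval missing on one side is treated as a count of 0.
    """
    mismatches = {}
    for interval in sorted(set(druid_counts) | set(gold_counts)):
        druid = druid_counts.get(interval, 0)
        gold = gold_counts.get(interval, 0)
        if druid != gold:
            mismatches[interval] = (druid, gold)
    return mismatches


if __name__ == "__main__":
    # Made-up counts: the 15:00 hour has a few extra events in Druid,
    # which is what duplicates from retries would look like.
    druid = {"2015-07-19T14:00": 1200, "2015-07-19T15:00": 1305}
    gold = {"2015-07-19T14:00": 1200, "2015-07-19T15:00": 1300}
    for interval, (d, g) in find_count_mismatches(druid, gold).items():
        print(interval, "druid:", d, "gold:", g, "diff:", d - g)
```

A Druid count higher than the gold count points at duplicates; a lower one points at dropped events.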

If these kinds of discrepancies are a problem for you, the best way to deal with them in Druid today is to run a lambda architecture. You’d tee off a copy of your data into HDFS/S3 and periodically replace recently realtime-ingested intervals with batch-ingested data. This works because batch ingestion is one-shot and all-or-nothing, so the result will exactly match your input data.

Ok, thank you again. I don’t think it’s a problem right now but maybe some day it will.