Some segments more than a month old still show is_realtime = 1 when taskDuration is PT1H

Hello Experts,
I am observing certain segments, like the one below, where is_realtime is 1.

{
  "datasource": "xyz",
  "end": "2019-07-01T21:00:00.000Z",
  "is_available": 1,
  "is_published": 1,
  "is_realtime": 1,
  "num_replicas": 8,
  "num_rows": 0,
  "partition_num": 2,
  "payload": "{"datapayload here"",
  "size": 168002426,
  "start": "2019-07-01T20:00:00.000Z",
  "version": "2019-07-01T20:00:26.442Z"
}

As per the documentation at https://druid.apache.org/docs/latest/querying/sql, this means the segment is served by a realtime task. But it has been more than a month. How is it still being handled by realtime tasks?

Am I missing something here?

Also, num_rows is 0 but size is very high. Is that because the row count is not known, so 0 is returned?

Any help on this would be appreciated.

Thanks in advance

Hi Vindhya,

Are you consistently getting these results for a segment? I see that you have 8 replicas; it's possible to get inconsistent results for realtime tasks, depending on which ingestion task the Broker queries.

A segment can be both published and realtime if the handoff is not complete yet. The num_rows of 0 with a non-zero size seems a bit odd; I will check on that.
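To tell an incomplete handoff apart from a stale flag, one option is to compare each segment's interval end with the current time. The following is only a minimal sketch, assuming the rows of sys.segments have already been fetched as Python dicts; the function names and the one-day threshold are hypothetical and not part of Druid.

```python
from datetime import datetime, timedelta, timezone


def parse_druid_time(value):
    # Parses UTC timestamps in the form shown above, e.g. 2019-07-01T21:00:00.000Z.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_realtime_segments(rows, now, max_age=timedelta(days=1)):
    # Keep segments still flagged realtime whose interval ended more than
    # max_age ago. max_age is an assumed threshold, not a Druid setting.
    return [
        row
        for row in rows
        if row["is_realtime"] == 1
        and now - parse_druid_time(row["end"]) > max_age
    ]
```

With a taskDuration of PT1H, a segment that ended a month ago and still appears in this list is unlikely to be explained by a handoff that is merely in progress.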

It looks like you found a bug. I opened an issue for this: https://github.com/apache/incubator-druid/issues/8142
About the non-zero size and zero num_rows: those values come from two different sources. We get the size from the DataSegment object and num_rows from a SegmentMetadata query, so for a brief period num_rows may not be known yet and 0 is returned. You should not see that for long if you repeat the query after a few minutes.

Thanks, Surekha. I queried again almost 24 hours later and I still get the same result,
i.e. is_realtime = 1 and num_rows = 0. I also observed that some segments with is_realtime = 0 return num_rows as 0, but segments with is_realtime = 1 always return num_rows as 0.

Hmm, in our test cluster I did not get any results for "select * from sys.segments where num_rows = 0 and size > 0 and is_realtime = 1;", but I do see segments for the query "select * from sys.segments where num_rows = 0 and size > 0 and is_realtime = 0;". Do you get some data back for both of these queries? Also, which Druid version are you on?
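For comparing the two result sets side by side, the same filter can be applied to rows already pulled from sys.segments. This is a rough sketch, assuming the rows are available as Python dicts with the column names shown in the JSON above; the helper names are hypothetical.

```python
def zero_row_query(is_realtime):
    # Builds the same SQL text as the two queries above for a given flag value.
    return (
        "select * from sys.segments "
        "where num_rows = 0 and size > 0 "
        f"and is_realtime = {int(is_realtime)}"
    )


def zero_row_segments(rows):
    # Splits segments with num_rows = 0 and size > 0 by their is_realtime flag.
    buckets = {0: [], 1: []}
    for row in rows:
        if row["num_rows"] == 0 and row["size"] > 0:
            buckets[row["is_realtime"]].append(row)
    return buckets
```

If both buckets are non-empty, the zero num_rows is not limited to realtime segments, which is what the two queries are meant to show.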

Yes, I get results for both of these queries.
The Druid version we are using is 0.13.

Some bug fixes for the system schema went into Druid 0.14. I think you might be hitting https://github.com/apache/incubator-druid/pull/6888
If you can upgrade to 0.15 (the latest release), you will get this fix. If you also want the fix for "is_realtime", I have opened a PR for that here: https://github.com/apache/incubator-druid/pull/8154

Druid 0.16, which is planned to code freeze in a week or so, will have that fix.

It will take some time for us to upgrade to the latest version, so we will have to live with this for now. Thanks a lot for the help.