Data not available after successful Hadoop indexing

Hi Team,

We are setting up a 30-node Druid cluster with Hadoop indexing.

All our ingestion tasks succeed, and the segments appear in the metadata store's druid_segments table, as shown below.

But when querying, and also in the overlord console, the datasources show no data.

select * from druid_segments order by created_date desc limit 10;

pageviews_zkrstrt_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T22:05:30.637Z | pageviews_zkrstrt | 2016-10-18T22:07:29.478Z | 2016-10-01T00:00:00.000Z | 2016-10-02T00:00:00.000Z | 0 | 2016-10-18T22:05:30.637Z | 1 | {"dataSource":"pageviews_zkrstrt","interval":"2016-10-01T00:00:00.000Z/2016-10-02T00:00:00.000Z","version":"2016-10-18T22:05:30.637Z","loadSpec":{"type":"hdfs","path":"hdfs://stampy:8020/apps/dt/pxp/druid/segments/pageviews_zkrstrt/20161001T000000.000Z_20161002T000000.000Z/2016-10-18T22_05_30.637Z/0/index.zip"},"dimensions":"url,user","metrics":"views,latencyMs","shardSpec":{"type":"none"},"binaryVersion":9,"size":2861,"identifier":"pageviews_zkrstrt_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T22:05:30.637Z"} |

pageviews_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T21:15:46.005Z | pageviews | 2016-10-18T21:43:39.790Z | 2016-10-01T00:00:00.000Z | 2016-10-02T00:00:00.000Z | 0 | 2016-10-18T21:15:46.005Z | 1 | {"dataSource":"pageviews","interval":"2016-10-01T00:00:00.000Z/2016-10-02T00:00:00.000Z","version":"2016-10-18T21:15:46.005Z","loadSpec":{"type":"hdfs","path":"hdfs://stampy:8020/apps/dt/pxp/druid/segments/pageviews/20161001T000000.000Z_20161002T000000.000Z/2016-10-18T21_15_46.005Z/0/index.zip"},"dimensions":"url,user","metrics":"views,latencyMs","shardSpec":{"type":"none"},"binaryVersion":9,"size":2861,"identifier":"pageviews_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T21:15:46.005Z"} |

pageviews_stampy_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T19:07:44.659Z | pageviews_stampy | 2016-10-18T19:14:51.249Z | 2016-10-01T00:00:00.000Z | 2016-10-02T00:00:00.000Z | 0 | 2016-10-18T19:07:44.659Z | 1 | {"dataSource":"pageviews_stampy","interval":"2016-10-01T00:00:00.000Z/2016-10-02T00:00:00.000Z","version":"2016-10-18T19:07:44.659Z","loadSpec":{"type":"hdfs","path":"hdfs://stampy:8020/apps/dt/pxp/druid/segments/pageviews_stampy/20161001T000000.000Z_20161002T000000.000Z/2016-10-18T19_07_44.659Z/0/index.zip"},"dimensions":"url,user","metrics":"views,latencyMs","shardSpec":{"type":"none"},"binaryVersion":9,"size":2861,"identifier":"pageviews_stampy_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T19:07:44.659Z"} |

test_pageviews_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T18:22:56.583Z | test_pageviews | 2016-10-18T18:24:46.944Z | 2016-10-01T00:00:00.000Z | 2016-10-02T00:00:00.000Z | 0 | 2016-10-18T18:22:56.583Z | 1 | {"dataSource":"test_pageviews","interval":"2016-10-01T00:00:00.000Z/2016-10-02T00:00:00.000Z","version":"2016-10-18T18:22:56.583Z","loadSpec":{"type":"hdfs","path":"hdfs://stampy:8020/apps/dt/pxp/druid/segments/test_pageviews/20161001T000000.000Z_20161002T000000.000Z/2016-10-18T18_22_56.583Z/0/index.zip"},"dimensions":"url,user","metrics":"views,latencyMs","shardSpec":{"type":"none"},"binaryVersion":9,"size":2861,"identifier":"test_pageviews_2016-10-01T00:00:00.000Z_2016-10-02T00:00:00.000Z_2016-10-18T18:22:56.583Z"} |

Query: curl -X POST 'host:8080/druid/v2/?pretty' -H 'Content-Type: application/json' -d @pageviews_ts.query
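The contents of pageviews_ts.query are not shown in the thread; as a point of reference, a minimal timeseries query for this datasource might look like the following sketch. The dimension and metric names (url, user, views, latencyMs) and the interval are taken from the segment metadata above; the granularity and aggregator types are assumptions.

```python
import json

# Hypothetical contents of pageviews_ts.query -- the real file is not
# shown in the thread. Field names come from the segment metadata above;
# granularity and aggregator types are assumptions.
query = {
    "queryType": "timeseries",
    "dataSource": "pageviews",
    "granularity": "day",
    "intervals": ["2016-10-01T00:00:00.000Z/2016-10-02T00:00:00.000Z"],
    "aggregations": [
        {"type": "longSum", "name": "views", "fieldName": "views"},
        {"type": "longSum", "name": "latencyMs", "fieldName": "latencyMs"},
    ],
}

# Serialize to the JSON body that curl would POST to the broker.
body = json.dumps(query, indent=2)
print(body)
```

If segments are loaded and announced correctly, this should return one result row per day in the interval; an empty array response is the symptom described above.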

Response: (empty, no rows returned)

I restarted ZooKeeper and all the overlord and coordinator nodes, but I am still facing the same issue.

Please advise on how to proceed.

Thanks,

Sathish

Do you see any exceptions in the coordinator/overlord logs?

Can you share those?

Also, please make sure the Druid version matches the metadata storage extension version.
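For context, the metadata storage extension is selected in common.runtime.properties; if the extension jar was built for a different Druid release, segment metadata reads can fail silently. A sketch of the relevant settings, assuming a MySQL metadata store (the host and credentials below are placeholders, not from the thread):

```
# common.runtime.properties -- metadata storage sketch (placeholder values)
druid.extensions.loadList=["mysql-metadata-storage"]
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://metadata-host:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=...
```

The mysql-metadata-storage extension version should come from the same Druid release as the rest of the cluster.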

Thanks Slim,

Restarting ZooKeeper followed by the other nodes fixed the issue. We have a 3-node ZK cluster where we also run historicals and middle managers; will that be a problem?

Regards,

Sathish

Three ZK nodes is standard and shouldn't be a problem. Make sure Druid's common.runtime.properties lists all the ZK nodes.
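For reference, the ZooKeeper ensemble is given to Druid as a comma-separated list in common.runtime.properties; the hostnames below are placeholders for the three ZK nodes:

```
# common.runtime.properties -- all three ZK nodes (placeholder hostnames)
druid.zk.service.host=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```

If only one ZK node is listed, a restart of that node can leave Druid processes unable to announce or discover segments, which matches the symptom seen here.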