Unable to see the new data that has been indexed

All,

This is the first time I am trying out Druid. It looks very promising, but I am running into some teething issues. I hope you can help.

I have ingested some files using batch mode (non-Hadoop). The ingestion was successful as seen in the overlord console, and I also checked the ingestion log:

SegmentInsertAction{segments=[DataSegment{size=120158798, shardSpec=NoneShardSpec, metrics=[m1,m2,m3,m4,m5], dimensions=[a,b,c,d,e,f,g,h,i,j], version='2016-04-06T05:33:36.716Z', loadSpec={type=local, path=/druid/storage/pos_big/2016-02-18T00:00:00.000Z_2016-02-19T00:00:00.000Z/2016-04-06T05:33:36.716Z/0/index.zip}, interval=2016-02-18T00:00:00.000Z/2016-02-19T00:00:00.000Z, dataSource='pos_big', binaryVersion='9'}]}

2016-04-06T06:03:03,997 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_pos_big_2016-04-06T05:33:36.716Z",
  "status" : "SUCCESS",
  "duration" : 1761899
}

I looked at the storage directories and they do have the files, split by date. However, when I query the data, the newly loaded data does not show up. I restarted Druid but no luck. Is there any metadata update I need to do or check?

Thanks for your help

Giri

I just found that /druid/indexCache/ doesn't have the files for the dates that are not showing up. Is there a way to recreate or rebuild the indexCache for the newly ingested data?

Hi Giri,
Once the data is indexed, the Druid coordinator loads it onto a historical node, which then serves the queries.

Do you have your coordinator and historical nodes running?

Can you see the segments loaded in the coordinator console?
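
If the console is not handy, the coordinator also exposes a load-status API. A rough sketch of checking it from Python (assuming the default coordinator port 8081 and the requests library; adjust the host/port to your setup):

    # Ask the coordinator what fraction of published segments the
    # historicals have actually loaded, per datasource.
    import requests

    resp = requests.get("http://localhost:8081/druid/coordinator/v1/loadstatus")
    resp.raise_for_status()
    print(resp.json())   # e.g. {"pos_big": 100.0} once everything is served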

Nishant,

Thanks for your reply. I did look at localhost:8090/console.html and it shows a status of SUCCESS for the load. I was loading each week separately as an index job and running the jobs in parallel (non-overlapping time windows). I can see in the logs that the segments are being added. However, when I query the data I don't get all the results back for the dates I have loaded … I only see 4 days or so instead of the full history of 365+ days. I can see that the data has been indexed because

/druid/storage/pos_big has directories for each day and is of the size I expect.

However

/druid/indexCache/pos_big has directories for only 4 days instead of the full year.

The historical query results only show the dates in /druid/indexCache/pos_big, which is only 4 days.

Hi Giri,
A SUCCESS status for the index task means that the data was indexed successfully and the segments were pushed to deep storage. It does not guarantee that they have been handed over to a historical node; loading segments onto the historical nodes is done by the coordinator.

A few things to check:

  1. Check the coordinator console (http://localhost:8081/console.html), which shows how much data is loaded on the historical nodes.

  2. Make sure that the historical node has enough capacity to load the segments (the capacity of a historical node is defined via druid.server.maxSize; see the example after this list).

  3. Check the coordinator/historical logs for any errors related to segment loading.
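
For reference, the relevant settings live in the historical node's runtime.properties. An illustrative sketch (the paths and sizes are examples, not recommendations; the point is that druid.server.maxSize and the segment cache must be large enough to hold every segment the node is expected to serve):

    # historical runtime.properties (illustrative values)
    druid.server.maxSize=300000000000
    druid.segmentCache.locations=[{"path":"/druid/indexCache","maxSize":300000000000}]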

Nishant,

Thanks a million for the help. You are right: druid.server.maxSize was too low and the historical did not load all the segments. After increasing druid.server.maxSize and the segment max size and restarting the cluster, I was able to see the entire date range in the queries.
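
In case it helps anyone else hitting the same issue, a quick way to confirm that the whole range is actually queryable is a timeBoundary query against the broker. A rough sketch in Python (assuming the default broker port 8082 and the requests library):

    # Ask the broker for the earliest and latest queryable timestamps in
    # the datasource; they should cover the full ingested range.
    import json
    import requests

    query = {"queryType": "timeBoundary", "dataSource": "pos_big"}
    resp = requests.post(
        "http://localhost:8082/druid/v2/",
        data=json.dumps(query),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()
    print(resp.json())   # [{"timestamp": ..., "result": {"minTime": ..., "maxTime": ...}}]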

Cheers

Giri