All Druid queries return an empty list

I have a very puzzling situation with a Druid cluster (0.8.0). This cluster was installed a few months back, and I’m trying to get it running.

I have successfully ingested and indexed a data file (I think). According to the coordinator node's web interface, the data source is fully available, and I can see that all the segments are stored on node cluster-c. I can retrieve the dimensions and metrics of the data set with the command:

curl cluster-a:8082/druid/v2/datasources/opportunity_histogram_1M?interval=2015-01-01/2015-12-31

I have tried simple queries such as the following (I have also tried more complex queries, with the same result):

curl -X POST -H 'Content-Type: application/json' -d @query-metadata.json cluster-a:8082/druid/v2/

{
  "queryType": "dataSourceMetadata",
  "dataSource": "opportunity_histogram_1M",
  "intervals": ["2015-01-01/2015-12-31"]
}

curl -X POST -H 'Content-Type: application/json' -d @query-timeboundary.json cluster-a:8082/druid/v2/

{
  "queryType": "timeBoundary",
  "dataSource": "opportunity_histogram_1M"
}

I enabled request logging on both the historical node that happens to host these segments (cluster-c), and on the broker node (cluster-a). I can see the requests coming into both nodes.

On the broker:

2015-11-18T00:09:06.951Z 10.0.10.246 {"queryType":"timeBoundary","dataSource":{"type":"table","name":"opportunity_histogram_1M"},"intervals":{"type":"intervals","intervals":["0000-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]},"bound":"","context":{"queryId":"3aa88e03-5e7d-4dc0-929d-e54b16dd2e6d","timeout":300000}} {"query/time":18,"success":true}

On the historical node:

2015-11-18T00:18:06.399Z 10.0.10.248 {"queryType":"timeBoundary","dataSource":{"type":"table","name":"opportunity_histogram_1M"},"intervals":{"type":"segments","segments":[{"itvl":"2015-02-16T00:00:00.000Z/2015-02-17T00:00:00.000Z","ver":"2015-11-10T23:03:30.838Z","part":0},{"itvl":"2015-03-18T00:00:00.000Z/2015-03-19T00:00:00.000Z","ver":"2015-11-10T23:03:30.838Z","part":0}]},"bound":"","context":{"finalize":false,"queryId":"3aa88e03-5e7d-4dc0-929d-e54b16dd2e6d","timeout":300000}} {"query/time":11,"success":true}

Both nodes seem to be reporting success, but no data is returned. No errors or warnings are displayed on the standard output or standard error of any of the cluster nodes.

Any suggestions on how to troubleshoot this problem?

Hi Craig, can you post a screenshot of the coordinator console main page?

Hi Craig, can you post the full command and response of the timeboundary query?

Thank you for the quick response! The full command, JSON content, and request logs of the time boundary query are in my original post. The query returns an empty list: [].

Craig

Does it make a difference if you use the other datasource?

Both data sources return the same result: an empty list, []. I should note that the two data sources are actually ingested from the same data file, but with a different number of lines imported (ten thousand and one million).

Any exceptions in the broker logs?

I don’t see any exceptions. A request log entry from the broker is shown in my original post. The broker node is printing output to the console like this:

2015-11-18T00:08:04,076 INFO [ServerInventoryView-0] io.druid.client.BatchServerInventoryView - New Server[DruidServerMetadata{name='cluster-c:8081', host='cluster-c:8081', maxSize=300000000000, tier='_default_tier', type='historical', priority='0'}]

2015-11-18T00:26:08,334 INFO [qtp640275932-40] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://cluster-c:8081

That is the only output after the broker service is started. I don’t see any other logs from the broker. Is there a way to configure more verbose logging?

Craig

Hmmm, very odd. Are there any interesting logs for your historical? If you issue a timeBoundary query to your historical directly, do any results return?

Fangjin,
Queries directly to the historical node return the same result. cluster-c is the historical node that contains my data:

curl -X POST -H 'Content-Type: application/json' -d @query-timeboundary.json cluster-c:8081/druid/v2/

I see the request logged on cluster-c:

2015-11-22T22:26:29.997Z 10.0.10.246 {"queryType":"timeBoundary","dataSource":{"type":"table","name":"opportunity_histogram_10k"},"intervals":{"type":"intervals","intervals":["0000-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]},"bound":"","context":{"queryId":"5ad7bf12-daf4-4469-a01c-96c84b7e8533","timeout":300000}} {"query/time":16,"success":true}

An empty list is returned to the querying node.

We are considering “tearing down” this cluster and starting from scratch, rather than spending a lot more time finding obscure errors in our setup. Do you have any further suggestions before we start over?

Craig

I have never seen this problem before. Can you dump the output of the curl command in verbose mode? I don't think tearing down the cluster will fix this. I suspect the problem is something in the environment where things are set up.
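
If curl is awkward to use on that machine, roughly the same information (status line, response headers, raw body) can be dumped from a short script. This is only a sketch: the function names are made up for illustration, and the URL and data source name in the commented example are the ones quoted earlier in this thread.

```python
import json
import urllib.request


def build_timeboundary_request(broker_url, data_source):
    """Build (but do not send) a POST request carrying a timeBoundary query."""
    query = {"queryType": "timeBoundary", "dataSource": data_source}
    return urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_and_dump(request, timeout=30):
    """Send the request, print status, headers and raw body, return the body."""
    # Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        print(response.status, response.reason)
        for name, value in response.getheaders():
            print(f"{name}: {value}")
        print(body)
        return body


# Example (needs network access to the broker):
# send_and_dump(build_timeboundary_request(
#     "http://cluster-a:8082/druid/v2/", "opportunity_histogram_1M"))
```

Building the request separately from sending it makes it easy to check exactly which headers and JSON body go over the wire before involving the cluster.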

Hi All,

I am also facing a similar issue.

When I run the example at the link below, I get output.

http://druid.io/docs/latest/tutorials/tutorial-loading-streaming-data.html

But when I use the spec file from the location below, I get an empty result set.

http://druid.io/docs/latest/ingestion/realtime-ingestion.html

Is there anything else that we have to change at the configuration level?

Thanks and Regards,

Pratik

Pratik, are you sure you have data in the cluster?

Can you copy and paste your ingest/* metrics?
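
A rough way to pull those numbers out of a log is sketched below. It assumes each metric event appears as a JSON object with "metric" and "value" fields somewhere on a log line; that layout is an assumption about the emitter setup, and the function name and log file name are made up for illustration.

```python
import json
from collections import defaultdict


def sum_ingest_metrics(lines):
    """Sum the values of all ingest/* metrics found in an iterable of log lines."""
    totals = defaultdict(float)
    for line in lines:
        # Take the outermost {...} on the line, if there is one.
        start, end = line.find("{"), line.rfind("}")
        if start == -1 or end < start:
            continue
        try:
            event = json.loads(line[start:end + 1])
            metric = event.get("metric", "")
            if isinstance(metric, str) and metric.startswith("ingest/"):
                totals[metric] += float(event.get("value", 0))
        except (ValueError, TypeError, AttributeError):
            continue  # not a usable metric event, skip the line
    return dict(totals)


# Example with a hypothetical log file name:
# with open("realtime.log") as log_file:
#     print(sum_ingest_metrics(log_file))
```

A non-zero total for ingest/events/thrownaway or ingest/events/unparseable would suggest the events never made it into a segment.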

I was wrong. I went through the ingest metrics and found that events were being thrown away for my new data. I caught it in the 'ingest/events/thrownaway' metric. It was because of the timestamps I was ingesting.

I changed tuningConfig -> rejectionPolicy -> type from "serverTime" to "messageTime", and it worked for me.
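
For anyone who lands here with the same symptom, here is a simplified model of why old timestamps get dropped. It is not Druid's actual code, only a sketch of the general idea: with serverTime the event timestamp is compared against the server's clock, while with messageTime it is compared against the latest event timestamp seen so far. The class names are made up for illustration, and the window length is whatever your spec uses.

```python
from datetime import datetime, timedelta


class ServerTimePolicy:
    """Accept events only if they fall within `window` of the current clock."""

    def __init__(self, window, now=datetime.now):
        self.window = window
        self.now = now  # callable returning the current time

    def accept(self, event_time):
        current = self.now()
        return current - self.window <= event_time <= current + self.window


class MessageTimePolicy:
    """Accept events within `window` of the newest event timestamp seen so far."""

    def __init__(self, window):
        self.window = window
        self.max_seen = None

    def accept(self, event_time):
        if self.max_seen is None or event_time > self.max_seen:
            self.max_seen = event_time
        return event_time >= self.max_seen - self.window


# The change described above, as a fragment of the ingestion spec
# (all other tuningConfig fields omitted):
TUNING_CONFIG_FRAGMENT = {
    "tuningConfig": {"rejectionPolicy": {"type": "messageTime"}}
}
```

In this model, data stamped months in the past is always rejected under the server-clock policy, but accepted under the message-time policy as long as the events arrive roughly in timestamp order.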

Thank you Fangjin.