Error while loading TSV data into Druid: java.lang.RuntimeException: No buckets?? seems there is no data to index

Hello,

I started using Druid two days ago and was able to do batch ingestion of the JSON quickstart data.

However, when I try to load sample TSV data, I get the following error:

2018-08-10T18:57:32,958 INFO [task-runner-0-priority-0] io.druid.indexer.HadoopDruidIndexerJob - No metadataStorageUpdaterJob set in the config. This is cool if you are running a hadoop index task, otherwise nothing will be uploaded to database.
2018-08-10T18:57:32,972 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_rams-test_2018-08-10T18:57:24.010Z, type=index_hadoop, dataSource=rams-test}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:222) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:238) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.1.jar:0.12.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.1.jar:0.12.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	... 7 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: No buckets?? seems there is no data to index.
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:229) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	... 7 more
Caused by: java.lang.RuntimeException: No buckets?? seems there is no data to index.
	at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:182) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.12.1.jar:0.12.1]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:293) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:219) ~[druid-indexing-service-0.12.1.jar:0.12.1]
	... 7 more
2018-08-10T18:57:32,981 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_rams-test_2018-08-10T18:57:24.010Z] status changed to [FAILED].
2018-08-10T18:57:32,983 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_rams-test_2018-08-10T18:57:24.010Z",
  "status" : "FAILED",
  "duration" : 4402




My sample data is as follows:

229367	F_q2haNs6nj0d8pL--ddf122GtW	sb	false	0	ro	pass	none	true	true	212	0	2018-08-08	yidap	0	0	prod
228389	phJVKct_nvTbEMZ7yXW1Dk6Ms	sp	false	0	ro	pass	none	true	true	2876	0	2018-08-08	yidap	0	0	prod
226702	ECwcLeGlPHWlIhP1_7vkJTqXK	sp	false	0	ro	pass	none	true	true	64	8	2018-08-08	yidap	0	0	prod
225020	ECwcLeGlPHWlIhP1_7vkJTqXK	sp	false	0	ro	pass	none	true	true	272	10	2018-08-08	yidap	0	0	prod

My index task config is as follows:

{
   "type":"index_hadoop",
   "spec":{
      "ioConfig":{
         "type":"hadoop",
         "inputSpec":{
            "type":"static",
            "paths":"quickstart/000000_0"
         }
      },
      "dataSchema":{
         "dataSource":"rams-test",
         "granularitySpec":{
            "type":"uniform",
            "segmentGranularity":"day",
            "queryGranularity":"none",
            "intervals":[
               "2018-08-09/2018-08-10"
            ],
            "rollup":true
         },
         "parser":{
            "type":"hadoopyString",
            "parseSpec":{
               "format":"tsv",
               "timestampSpec":{
                  "format":"yyyy-mm-dd",
                  "column":"date_key"
               },
               "columns":[
                  "app_id",
                  "tag_id",
                  "ad_inv_type",
                  "ssai_req",
                  "ads_limit_tracking",
                  "event_type",
                  "redirect_type",
                  "redirect_reason",
                  "is_roku",
                  "is_raf",
                  "ad_request_count",
                  "rida_replaced_count",
                  "date_key",
                  "rida_type",
                  "lat_turned_on_count",
                  "lat_turned_off_count",
                  "bucket_test_id"
               ],
               "dimensionsSpec":{
                  "dimensions":[
                     "app_id",
                     "tag_id",
                     "ad_inv_type",
                     "ssai_req",
                     "ads_limit_tracking",
                     "event_type",
                     "redirect_type",
                     "redirect_reason",
                     "is_roku",
                     "is_raf",
                     "rida_type",
                     "bucket_test_id"
                  ]
               }
            }
         },
         "metricsSpec":[
            {
               "type":"count",
               "name":"count"
            }
         ]
      },
      "tuningConfig":{
         "type":"hadoop",
         "partitionsSpec":{
            "type":"hashed",
            "targetPartitionSize":5000000
         },
         "jobProperties":{

         }
      }
   }
}

If somebody could help me with this issue, it would be great.

Thank you

Hi,

this error can happen when the input data is empty or when no rows fall within the configured interval. The interval in your taskSpec is "2018-08-09/2018-08-10", but no data falls inside it: all of your rows have the timestamp "2018-08-08". It should work if you change the interval in your taskSpec to "2018-08-08/2018-08-09".
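
For reference, the granularitySpec with the corrected interval would look like this (only the "intervals" value changes from your original spec):

"granularitySpec":{
   "type":"uniform",
   "segmentGranularity":"day",
   "queryGranularity":"none",
   "intervals":[
      "2018-08-08/2018-08-09"
   ],
   "rollup":true
}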

Best,

Jihoon

Hi Jihoon,

Thank you so much for your reply. I made the change you suggested, but I was still seeing the same error. Then I changed the "format" field of the timestampSpec to "auto", and I was able to load the data into Druid successfully.
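
For reference, the timestampSpec that worked is below. One likely explanation for the original failure: Druid treats the format string as a Joda-Time pattern, in which lowercase "mm" means minute-of-hour rather than month, so "yyyy-mm-dd" parses "2018-08-08" to a January timestamp that falls outside either interval. The explicit pattern "yyyy-MM-dd" (uppercase MM) should also work:

"timestampSpec":{
   "format":"auto",
   "column":"date_key"
}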

Best,

Vinay

I would guess there is no data in your original file.

On Saturday, August 11, 2018 at 3:05:34 AM UTC+8, Vinay Shetty wrote: