Cannot see segments on S3 with Hadoop index task

Hello, my setup is as follows:

  • Druid 0.15.1 on a private cloud
  • Hadoop (EMR) on AWS

I have successfully submitted an ingestion job from the Druid console, and I can see that it completed successfully in Hadoop.
However, in S3 I only see the logs for the job, not the segments. As a result, segments are never loaded into Druid, and the task fails with the error:

Caused by: java.lang.RuntimeException: No buckets?? seems there is no data to index.

Attaching my job spec and common properties:

Since the job in Hadoop terminates successfully, how can I further debug why the segments are not showing up on S3?

spec.json (2.28 KB)

common.properties (975 Bytes)

Hi,

Can you check that your data is consistent with the interval in your spec?

If you load data from 2018 but your spec specifies an interval in 2019, the ingestion won’t fail, but you won’t get any segments.
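For example, with a granularitySpec along these lines (the interval and granularities here are illustrative, not taken from your attached spec), every 2018 row would be silently dropped:

```json
{
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "NONE",
    "intervals": ["2019-01-01/2020-01-01"]
  }
}
```

As far as I know, rows whose timestamps fall outside "intervals" are counted as thrown away rather than failing the job.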

I don’t remember the exact string to search for, but there is a line in the ingestion logs that shows how many rows were ingested.

Hi,

Yes, I’ve checked the sample file I am using, and all the timestamps fall within the ingestion interval I have set in the spec.

Are the ingestion logs on Hadoop YARN?

This is the Druid task report:

```json
{
  "ingestionStatsAndErrors": {
    "taskId": "index_hadoop_data_2019-09-04T09:26:25.973Z",
    "payload": {
      "ingestionState": "BUILD_SEGMENTS",
      "unparseableEvents": null,
      "rowStats": {
        "determinePartitions": {
          "rowsProcessedWithErrors": 0,
          "rowsProcessed": 0,
          "rowsUnparseable": 0,
          "rowsThrownAway": 478627
        },
        "buildSegments": null
      },
      "errorMsg": "java.lang.RuntimeException: No buckets?? seems there is no data to index."
    },
    "type": "ingestionStatsAndErrors"
  }
}
```

I am guessing the rowsThrownAway count is the problem?

OK, the problem was that I had "auto" in my timestampSpec, but my data was in POSIX (epoch seconds) format. I mistakenly thought that "auto" meant it would automatically detect any timestamp format.
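For anyone who hits the same thing, here is a minimal sketch of the fix (the column name "timestamp" is just an assumption; use whatever your data actually calls it). Setting the format explicitly to "posix" makes Druid parse epoch-second values:

```json
{
  "timestampSpec": {
    "column": "timestamp",
    "format": "posix"
  }
}
```

As I understand it, "auto" only detects ISO-8601 or millis-style timestamps, so my epoch-second values were presumably read as milliseconds, which put them in January 1970, outside the spec interval, hence the large rowsThrownAway count.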

Solved.