I have a csv file which has 5.6 million records (around 30 days of data) and 75 columns.
I am successfully able to ingest the data using native ingestion (index_parallel). But when I try to do the same using hadoop index, it is failing.
I used the same ingestion spec (the wiki hadoop ingestion JSON, which works) and replaced it with my data-specific entries.
Attached yarn log and the ingestion spec.
Can you please help?
failing_task_hadoop_index_08Aug2018.json (2.35 KB)
yarn_log_08Aug2018.log (169 KB)
Seems like a parsing error. Can you check whether the data in your file is correctly formatted? Try with a smaller file of 10-20 records first.
Yes, I tried with a sample of 10 records. Even that is failing.
If the data were the issue, then the native ingestion should also fail. Am I missing something?
This is the error I’m seeing in the task log.
Caused by: java.lang.UnsupportedOperationException: hasHeaderRow or maxSkipHeaderRows is not supported. Please check the indexTask supports these options.
Try removing the header of the csv file.
Skip hasHeaderRow; it means you have to define all columns explicitly in the parseSpec, like columns: [column1, column2, …], instead of relying on hasHeaderRow.
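As a sketch, the parser section of the Hadoop ingestion spec would look something like the fragment below, with every column listed explicitly. The column names here (timestamp, column1, column2, metric1) are placeholders; substitute your own 75 columns in file order.

```json
"parser": {
  "type": "hadoopyString",
  "parseSpec": {
    "format": "csv",
    "columns": ["timestamp", "column1", "column2", "metric1"],
    "timestampSpec": {
      "column": "timestamp",
      "format": "auto"
    },
    "dimensionsSpec": {
      "dimensions": ["column1", "column2"]
    }
  }
}
```

Note there is no hasHeaderRow or skipHeaderRows here, so the input file must not contain a header row, or the header line will be parsed as data.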
Thanks for your replies.
I was trying multiple combinations.
With data in ‘csv’ format, the hadoop-index service does not work when the file has a header row, even with header=true/skiprows=1. If I remove the header and run the job, the MR job works fine and segments are getting created.
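For anyone hitting the same issue: one way to drop the header before handing the file to the Hadoop indexer is a quick shell step. The file names below are placeholders.

```shell
# Strip the first (header) line from the CSV so every remaining
# line is data; the columns are then declared explicitly in the
# parseSpec instead of being read from the header.
tail -n +2 data_with_header.csv > data_no_header.csv
```

After this, point inputSpec at the header-less file and list the columns in the same order the header originally gave them.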