Error while Loading Data from Hive

Hi,

While trying to load data from an external Hive table (on Azure HDInsight 4), I'm seeing the following error:

Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1574676501569_0032_212_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

Caused by: java.io.FileNotFoundException: Operation failed: “The specified path does not exist.”, 404, PUT, https://samplestorageaccount.dfs.core.windows.net/test-cotnainer/tmp/druid-indexing/.staging-hive_20191215140547_c22c5c7a-a76e-42fd-97bd-cab14fe506bb/intermediateSegmentDir/sales_data/81ae240150904c878c740db92825de7c/0_index.zip?action=append&position=916&timeout=90, PathNotFound, “The specified path does not exist. RequestId:2b0e984c-901f-0061-3250-b37626000000 Time:2019-12-15T14:07:14.8829493Z”

Can someone help with this?

Hi Aditya,

Something similar happened to me. It turned out that Druid first records the names of the files to be ingested and only then starts ingesting them. In my case the HDFS files were continuously being replaced with the same data, so even though the data in Hive looked the same, the underlying file names kept changing and a file recorded at the start could be gone by the time it was read. Try copying the files to a different folder that nobody touches and ingest from there.
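
If it helps, here is a rough Python sketch of the copy step. It only wraps the standard `hdfs dfs -mkdir -p` and `hdfs dfs -cp` commands. The function names and the two paths in the usage comment are made up for illustration, and it assumes the `hdfs` CLI is on the PATH of the node where you run it.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_commands(source_dir, snapshot_root, stamp=None):
    """Build the `hdfs dfs` commands that copy source_dir into a new snapshot folder.

    Returns (target_path, list_of_commands). Nothing is executed here.
    """
    # A timestamp keeps every snapshot in its own folder (hypothetical naming scheme).
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    root = snapshot_root.rstrip("/")
    target = f"{root}/snapshot_{stamp}"
    commands = [
        # Make sure the parent folder exists.
        ["hdfs", "dfs", "-mkdir", "-p", root],
        # Copy the whole source directory to the new, not yet existing target.
        ["hdfs", "dfs", "-cp", source_dir.rstrip("/"), target],
    ]
    return target, commands


def run_snapshot(source_dir, snapshot_root):
    """Run the copy and return the snapshot path to ingest from."""
    target, commands = snapshot_commands(source_dir, snapshot_root)
    for cmd in commands:
        # check=True raises CalledProcessError if a command fails.
        subprocess.run(cmd, check=True)
    return target


# Example (hypothetical paths):
# stable_path = run_snapshot("/data/sales_data", "/data/snapshots")
```

After the copy you would point the external Hive table (or a new one) at the returned folder, for example with `ALTER TABLE ... SET LOCATION`, and run the load against that. Whether this fixes your exact error I can't say for sure, but it rules out files changing underneath the job.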