How do I transfer my raw event data from HDFS into Druid?

What are my options for transferring my raw event data from HDFS into Druid?

Option 1: Save the raw data into HDFS, then use a Hadoop job to transform it into a format Druid can parse (e.g., delimited text or JSON), and finally load it with Druid's batch data ingestion functionality (see the first sketch below).

Option 2: Save the raw data into HDFS, then create a HadoopDruidIndexer task that uses a custom InputFormat (see the second sketch below, using MyAvroToTextInputFormat).
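
For the transform step in Option 1, a map-only MapReduce job is usually enough. Here is a minimal sketch, assuming hypothetical pipe-delimited raw events; the class name FlattenEventsMapper and the field layout are invented for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only job: reshape raw event lines into TSV rows (timestamp, dimensions,
// metrics) that the Druid batch indexer's delimited parser can read.
public class FlattenEventsMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

  private final Text out = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Hypothetical raw layout: "timestamp|user|country|latencyMs".
    String[] fields = value.toString().split("\\|");
    if (fields.length < 4) {
      return; // drop malformed records
    }
    out.set(String.join("\t", fields[0], fields[1], fields[2], fields[3]));
    context.write(NullWritable.get(), out);
  }
}
```

Writing the output with TextOutputFormat produces plain TSV files; you would then point the batch ingestion spec at that output directory and describe the columns with a delimited parseSpec.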
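
For Option 2, the custom InputFormat's job is to present each record to the indexer as a line of text. A rough sketch of what a MyAvroToTextInputFormat could look like, assuming Avro input and leaning on GenericRecord.toString() for a JSON-ish rendering (an illustration under those assumptions, not code from this thread):

```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Reads Avro files and presents each record as a line of text, so the
// indexer can consume it like any other text input.
public class MyAvroToTextInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
    // Delegate the actual Avro decoding to the stock Avro input format.
    RecordReader<AvroKey<GenericRecord>, NullWritable> avroReader =
        new AvroKeyInputFormat<GenericRecord>().createRecordReader(split, context);
    return new RecordReader<LongWritable, Text>() {
      private long pos = 0;
      private final Text value = new Text();

      @Override
      public void initialize(InputSplit s, TaskAttemptContext c)
          throws IOException, InterruptedException {
        avroReader.initialize(s, c);
      }

      @Override
      public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!avroReader.nextKeyValue()) {
          return false;
        }
        // GenericRecord.toString() emits a JSON-style rendering of the record.
        value.set(avroReader.getCurrentKey().datum().toString());
        pos++;
        return true;
      }

      @Override
      public LongWritable getCurrentKey() { return new LongWritable(pos); }

      @Override
      public Text getCurrentValue() { return value; }

      @Override
      public float getProgress() throws IOException, InterruptedException {
        return avroReader.getProgress();
      }

      @Override
      public void close() throws IOException {
        avroReader.close();
      }
    };
  }
}
```

The indexer then sees plain text lines that it can parse with its built-in JSON parser.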

Hi Mark, what format is your raw data in right now? FWIW, if you have data in CSV, delimited, or JSON format, you should be able to ingest it directly using Druid’s built-in Hadoop indexer. If you don’t have data in these formats, do you have an ETL process that can flatten the data into one of them?

I was just wondering what my available options are when transferring my raw event data from HDFS into Druid. Is there any “built-in” Hadoop-to-Druid ingestion option that uses an ETL process to flatten and reformat this data?

Has anyone had any experience with the functionality described here: https://github.com/druid-io/druid/pull/1177?

– Mark

The available Hadoop-based ingestion methods are listed here:
http://druid.io/docs/latest/ingestion/batch-ingestion.html