We are currently using the static-s3 firehose with the index task to ingest data into our Druid cluster, but this requires us to convert our Parquet data to CSV before ingestion. I would like to use the Parquet ingestion extension without having to set up Hadoop, but the static-s3 firehose seems to require a StringInputRowParser, which causes an error with the Parquet extension. Is there a way to work around the firehose so that we can read the Parquet files we have on S3 without setting up a Hadoop cluster?
As far as I know, the Hadoop index task is currently the only available way to load Parquet files from S3. I raised https://github.com/druid-io/druid/issues/5584.
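For reference, a Hadoop index task spec for reading Parquet from S3 looks roughly like the sketch below. This is an illustrative outline only, not a tested spec: the datasource name, bucket path, column names, and intervals are placeholders, and the exact fields (e.g. the `inputFormat` class from the druid-parquet-extensions module) depend on your Druid version, so check them against the documentation for your release:

```json
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
        "paths": "s3n://example-bucket/path/to/data.parquet"
      }
    },
    "dataSchema": {
      "dataSource": "example_datasource",
      "parser": {
        "type": "parquet",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["dim1", "dim2"] }
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2018-01-01/2018-01-02"]
      }
    }
  }
}
```

Note that with this approach the parser type is `parquet` rather than a StringInputRowParser, which is what the static-s3 firehose path does not allow.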
On Thu, Apr 5, 2018 at 7:08 AM, firstname.lastname@example.org wrote:
Thanks for opening the ticket! We'll see if we can set up the Hadoop index task in the meantime.