Ingesting Parquet files into Druid

I have some Parquet files in S3 that need to be ingested into Druid hosted on AWS EC2 boxes. We have no active Hadoop cluster. Is it possible to ingest these files without starting a Hadoop cluster?

I'm trying the spec below:

{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "org.apache.druid.data.input.parquet.DruidParquetInputFormat",
        "paths": "s3n:///andesextractsdruid/PARQUET_TABLE/marketplace_name=Amazon.com/snapshot_day=2019-02-10/*"
      }
    },
    "dataSchema": {
      "dataSource": "PARQUET_TABLE",
      "parser": {
        "type": "parquet",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {
            "column": "snapshot_day",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "marketplace_name",
              "sellable_onhand_quantity"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "reportParseExceptions": true,
      "jobProperties": {
        "fs.s3.awsAccessKeyId": "key",
        "fs.s3.awsSecretAccessKey": "acckey",
        "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.awsAccessKeyId": "key",
        "fs.s3n.awsSecretAccessKey": "acckey",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
      },
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      }
    }
  }
}
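
Whichever route this ends up taking, a spec like the one above only resolves the Parquet parser if the Parquet extension is loaded on the nodes running the task. A minimal sketch of the relevant line in common.runtime.properties, assuming the druid-parquet-extensions core extension that ships with Druid; the second entry is purely illustrative:

# common.runtime.properties on the Overlord/MiddleManagers (entries illustrative)
druid.extensions.loadList=["druid-parquet-extensions", "druid-s3-extensions"]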

Hi Avi,

What version of Druid are you using? I believe native ingest only supports flat files today. It is on the roadmap to support binary file types in the future.
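
For reference, a native (no-Hadoop) parallel batch spec on that line of releases looks roughly like the sketch below, assuming the data were first exported from Parquet to flat JSON in S3 and that druid-s3-extensions is loaded. The JSON_EXPORT prefix is hypothetical; the datasource and column names are carried over from the spec above:

{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "static-s3",
        "prefixes": ["s3://andesextractsdruid/JSON_EXPORT/"]
      }
    },
    "dataSchema": {
      "dataSource": "PARQUET_TABLE",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "snapshot_day",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["marketplace_name", "sellable_onhand_quantity"]
          }
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none"
      }
    },
    "tuningConfig": {
      "type": "index_parallel"
    }
  }
}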

Hi Daniel, we are using version 15.

Hi Avi,
Unfortunately, no, you still can't do this without Hadoop.

Here you can find out why: https://groups.google.com/forum/#!searchin/druid-user/Parquet%7Csort:date/druid-user/0ubPu-i7kmM/3xKvTltwBgAJ