S3 to Druid data ingestion

Hi, my ingestion task reports SUCCESS when I import data from S3 into Druid, but Druid never creates the datasource. I am submitting the spec through the command line:


bin/post-index-task --file quickstart/tutorial/wikipedia-index-s3.json --url http://localhost:8081


{
  "type" : "index_parallel",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipediaTest",
      "timestampSpec": {
        "column": "time",
        "format": "iso"
      },
      "dimensionsSpec" : {
        "dimensions" : [
          { "name": "added", "type": "long" },
          { "name": "deleted", "type": "long" },
          { "name": "delta", "type": "long" }
        ]
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "s3",
        "prefixes": ["s3://druids3migration/druid/segments/wikiticker-2015-09-12-sampled2.json"]
      },
      "inputFormat" : {
        "type" : "json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}

I once faced a problem like yours. In my case, the source data had no rows within the specified interval, which is 2015-09-12/2015-09-13 in your case. Druid will not create any segments if none of your source rows fall inside that interval, and you will not see any warning on the console. I am not an expert in Druid; I just wanted to mention it in case you missed checking this.
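One quick way to check this before submitting the task is to count how many rows of the newline-delimited JSON file actually fall inside the ingestion interval. This is just a local sanity-check sketch, not part of Druid; the "time" column name and the interval match the spec above, and the file/sample data are assumptions.

```python
import json
from datetime import datetime, timezone

def rows_in_interval(lines, start, end, column="time"):
    """Count JSON lines whose ISO-8601 timestamp falls in [start, end)."""
    count = 0
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        # Python's fromisoformat (3.7+) does not accept a trailing "Z",
        # so normalize it to an explicit UTC offset first.
        ts = datetime.fromisoformat(row[column].replace("Z", "+00:00"))
        if start <= ts < end:
            count += 1
    return count

start = datetime(2015, 9, 12, tzinfo=timezone.utc)
end = datetime(2015, 9, 13, tzinfo=timezone.utc)

# In practice you would read the real file, e.g.:
#   with open("wikiticker-2015-09-12-sampled2.json") as f:
#       print(rows_in_interval(f, start, end))
sample = ['{"time": "2015-09-12T00:46:58.771Z", "added": 36}']
print(rows_in_interval(sample, start, end))  # -> 1
```

If this prints 0 for your real file, the task will succeed but produce no segments, which matches the symptom above.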

On Wed, Apr 22, 2020 at 21:03, Aditya Verma <vermaaaditya038@gmail.com> wrote:

Hi Sedat, thanks for the reply. I have found the reason: the segment files Druid stores in deep storage are in zip format, but my spec reads the data as JSON. When I uploaded the file to the S3 bucket in plain JSON format instead, I was able to read the data using the command above.

I want to know how one can pull data into Druid directly from S3 when it is in .zip format, so that I can read my data back after it has been dropped from the Historical nodes.
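If the goal is to reingest data that Druid itself already wrote to deep storage, one option (assuming a reasonably recent Druid version) is the "druid" input source, which reads existing segments of a datasource directly, so you never have to parse the zipped segment files as JSON yourself. A minimal ioConfig sketch, reusing the datasource and interval from the spec above:

```json
"ioConfig" : {
  "type" : "index_parallel",
  "inputSource" : {
    "type" : "druid",
    "dataSource" : "wikipediaTest",
    "interval" : "2015-09-12/2015-09-13"
  }
}
```

Note that with the "druid" input source no inputFormat is needed, since the rows come from segments rather than raw files. Also, segments dropped from Historicals by load rules are normally still present in deep storage and can simply be reloaded, so reingestion may not even be necessary in that case.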