Hi,
I am using Druid to ingest 1 TB of files from S3, running a local cluster on a single c4.4xlarge machine. The ingestion fails after about 1,400 files and stops processing any further files from there. The error doesn't seem to indicate what is wrong:
2017-04-16T20:09:35,104 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_searches_2017-04-14T02:26:05.089Z, type=index_hadoop, dataSource=searches}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.jar:0.9.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_05]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_05]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_05]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_05]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_05]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_05]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_05]
at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_05]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
… 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_05]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_05]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_05]
at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_05]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
… 7 more
2017-04-16T20:10:02,611 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_searches_2017-04-14T02:26:05.089Z] status changed to [FAILED].
2017-04-16T20:10:39,674 WARN [Curator-Framework-0] org.apache.curator.ConnectionState - Connection attempt unsuccessful after 232110 (greater than max timeout of 30000). Resetting connection and trying again with a new connection.
2017-04-16T20:10:40,232 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_hadoop_searches_2017-04-14T02:26:05.089Z",
"status" : "FAILED",
"duration" : 236634704
}
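As far as I can tell, the `ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!` only says that the MapReduce job failed; the actual root cause should be in the Hadoop task attempt logs rather than in this overlord log. This is a minimal sketch of how I have been filtering those logs for candidate root-cause lines (the patterns are my guesses at common failure modes, not anything Druid-specific):

```python
import re

# Heuristic: the real failure usually shows up as a "Caused by:" line,
# an OutOfMemoryError, or a disk/container message in the task attempt logs.
ROOT_CAUSE_PATTERNS = re.compile(
    r"Caused by:|OutOfMemoryError|No space left on device|Container killed"
)

def root_cause_lines(log_text):
    """Return the lines most likely to explain why the MR job failed."""
    return [line.strip() for line in log_text.splitlines()
            if ROOT_CAUSE_PATTERNS.search(line)]
```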
Settings-wise, I am using the recommended configuration, with the heap sizes reduced slightly so that total memory consumption stays within the capacity of the machine.
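In case the failure turns out to be memory-related in the Hadoop child JVMs (rather than the Druid heaps I already reduced), I understand the standard MapReduce memory properties can be passed through `jobProperties` in the `tuningConfig`. A sketch of how I would merge them in; the values are illustrative, not tuned recommendations:

```python
# Hypothetical memory-related jobProperties; values are illustrative only.
EXTRA_JOB_PROPERTIES = {
    "mapreduce.map.memory.mb": "2048",
    "mapreduce.map.java.opts": "-Xmx1536m",
    "mapreduce.reduce.memory.mb": "4096",
    "mapreduce.reduce.java.opts": "-Xmx3072m",
}

def with_extra_job_properties(spec):
    """Return a copy of the inner 'spec' object with extra Hadoop job properties
    merged into tuningConfig.jobProperties (existing properties are kept)."""
    spec = dict(spec)  # shallow copies are enough for this sketch
    tuning = dict(spec.get("tuningConfig", {"type": "hadoop"}))
    props = dict(tuning.get("jobProperties", {}))
    props.update(EXTRA_JOB_PROPERTIES)
    tuning["jobProperties"] = props
    spec["tuningConfig"] = tuning
    return spec
```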
I am currently using the following spec to ingest the files from S3. It works when I change the path to consume a single file, but when ingesting the full 1 TB (10,000+ files) I get the error above.
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "s3n://<ACCESS_KEY>:<SECRET_ACCESS_KEY>@//dt=2017-04-04/*"
      }
    },
    "dataSchema": {
      "dataSource": "searches",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "DAY",
        "intervals": ["2017-04-04/2017-04-05"]
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "flattenSpec": {
            "useFieldDiscovery": true,
            "fields": [
              {
                "type": "path",
                "name": "timestamp",
                "expr": "$.eventHeader.createdOn.unixTimeMillis"
              },
              {
                "type": "path",
                "name": "id",
                "expr": "$.downstreamId.identifier"
              },
              {
                "type": "path",
                "name": "type",
                "expr": "$.type"
              },
              {
                "type": "path",
                "name": "exceptionType",
                "expr": "$.exception.exceptionType"
              },
              {
                "type": "path",
                "name": "qCount",
                "expr": "$.qCount"
              },
              {
                "type": "path",
                "name": "region",
                "expr": "$.eventHeader.serviceInstance.region"
              },
              {
                "type": "path",
                "name": "searchKind",
                "expr": "$.search.kind"
              },
              {
                "type": "path",
                "name": "engine",
                "expr": "$.engineName"
              },
              {
                "type": "path",
                "name": "requestClient",
                "expr": "$.requestClientKind"
              }
            ]
          },
          "dimensionsSpec": {
            "dimensions": [
              "id",
              "type",
              "exceptionType",
              "region",
              "searchKind",
              "engine",
              "requestClient"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          },
          "timestampSpec": {
            "format": "millis",
            "column": "timestamp"
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "type": "longSum",
          "name": "qCountSum",
          "fieldName": "qCount"
        }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "fs.s3n.awsAccessKeyId": "<ACCESS_KEY>",
        "fs.s3n.awsSecretAccessKey": "<SECRET_KEY>",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
      }
    }
  }
}
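One workaround I am considering is splitting the day into several smaller tasks instead of one giant glob, e.g. one task per hour-sized prefix. A sketch of how I would generate the per-task `paths` values; the bucket name and the `dt=`/`hh=` key layout here are placeholders, not the real layout:

```python
# Sketch: split one day's input into per-hour globs so each index_hadoop
# task ingests a manageable slice of the 10,000+ files.
# BUCKET and the dt=/hh= layout are assumptions about how the keys look.
BUCKET = "s3n://my-bucket"

def hourly_paths(day):
    """Build one input glob per hour for the given dt= partition."""
    return ["%s/dt=%s/hh=%02d/*" % (BUCKET, day, hour) for hour in range(24)]
```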
Hopefully someone out there knows what might be causing this.
Thanks!
Yong Cheng