Hadoop ingestion task gets stuck while running

Hi,

I have successfully run the Druid components in cluster mode, configured to use an external HDFS cluster. My Hadoop index task starts and keeps running (it appears to be reading the input file from storage), but the data never gets added, and there are no error messages. These are some lines from my task log:

```
2017-02-16T11:41:38,543 WARN [task-runner-0-priority-0] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-02-16T11:41:38,544 WARN [task-runner-0-priority-0] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2017-02-16T11:41:38,546 INFO [task-runner-0-priority-0] org.apache.hadoop.hdfs.PeerCache - SocketCache disabled.
2017-02-16T11:41:39,028 INFO [task-runner-0-priority-0] io.druid.guice.JsonConfigurator - Loaded class[class com.metamx.emitter.core.LoggingEmitterConfig] from props[druid.emitter.logging.] as [LoggingEmitterConfig{loggerClass='com.metamx.emitter.core.LoggingEmitter', logLevel='info'}]
2017-02-16T11:41:39,097 INFO [task-runner-0-priority-0] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.metrics.DruidMonitorSchedulerConfig] from props[druid.monitoring.] as [io.druid.server.metrics.DruidMonitorSchedulerConfig@18a35628]
2017-02-16T11:41:39,110 INFO [task-runner-0-priority-0] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.metrics.MonitorsConfig] from props[druid.monitoring.] as [MonitorsConfig{monitors=[class com.metamx.metrics.JvmMonitor]}]
2017-02-16T11:41:39,114 INFO [task-runner-0-priority-0] io.druid.server.metrics.MetricsModule - Adding monitor[com.metamx.metrics.JvmMonitor@3967aafe]
2017-02-16T11:41:39,115 INFO [task-runner-0-priority-0] io.druid.server.metrics.MetricsModule - Adding monitor[io.druid.query.ExecutorServiceMonitor@1313c5cf]
2017-02-16T11:41:39,115 INFO [task-runner-0-priority-0] io.druid.server.metrics.MetricsModule - Adding monitor[io.druid.server.initialization.jetty.JettyServerModule$JettyMonitor@a0cf4da]
2017-02-16T11:41:39,119 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Assigning value [536870912] for [druid.processing.buffer.sizeBytes] on [io.druid.query.DruidProcessingConfig#intermediateComputeSizeBytes()]
2017-02-16T11:41:39,122 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Using method itself for [druid.computation.buffer.poolCacheMaxCount, ${base_path}.buffer.poolCacheMaxCount] on [io.druid.query.DruidProcessingConfig#poolCacheMaxCount()]
2017-02-16T11:41:39,123 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Using method itself for [${base_path}.numMergeBuffers] on [io.druid.query.DruidProcessingConfig#getNumMergeBuffers()]
2017-02-16T11:41:39,123 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Assigning value [2] for [druid.processing.numThreads] on [io.druid.query.DruidProcessingConfig#getNumThreads()]
2017-02-16T11:41:39,124 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Using method itself for [${base_path}.columnCache.sizeBytes] on [io.druid.query.DruidProcessingConfig#columnCacheSizeBytes()]
2017-02-16T11:41:39,124 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Using method itself for [${base_path}.fifo] on [io.druid.query.DruidProcessingConfig#isFifo()]
2017-02-16T11:41:39,124 INFO [task-runner-0-priority-0] org.skife.config.ConfigurationObjectFactory - Assigning default value [processing-%s] for [${base_path}.formatString] on [com.metamx.common.concurrent.ExecutorServiceConfig#getFormatString()]
2017-02-16T11:41:39,248 INFO [task-runner-0-priority-0] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.indexer.HadoopKerberosConfig] from props[druid.hadoop.security.kerberos.] as [io.druid.indexer.HadoopKerberosConfig@57e05773]
2017-02-16T11:41:39,448 INFO [task-runner-0-priority-0] io.druid.guice.PropertiesModule - Loading properties from common.runtime.properties
2017-02-16T11:41:39,449 INFO [task-runner-0-priority-0] io.druid.guice.PropertiesModule - Loading properties from runtime.properties
2017-02-16T11:41:39,469 INFO [task-runner-0-priority-0] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, directory='extensions', hadoopDependenciesDir='hadoop-dependencies', hadoopContainerDruidClasspath='null', loadList=[druid-hdfs-storage, mysql-metadata-storage]}]
2017-02-16T11:41:39,470 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.HadoopIndexTask - Starting a hadoop determine configuration job...
2017-02-16T11:41:39,500 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - trying to authenticate user [admin] with keytab [/opt/druid-0.9.2/conf/druid/_common/user.keytab]
2017-02-16T11:41:39,578 INFO [task-runner-0-priority-0] org.apache.hadoop.security.UserGroupInformation - Login successful for user admin using keytab file /opt/druid-0.9.2/conf/druid/_common/user.keytab
2017-02-16T11:41:39,614 INFO [task-runner-0-priority-0] io.druid.indexer.path.StaticPathSpec - Adding paths[/pageviews.json]
2017-02-16T11:41:39,650 WARN [task-runner-0-priority-0] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2017-02-16T11:42:24,955 INFO [task-runner-0-priority-0] io.druid.indexer.path.StaticPathSpec - Adding paths[/pageviews.json]
2017-02-16T11:42:25,201 WARN [task-runner-0-priority-0] org.apache.hadoop.hdfs.BlockReaderLocal - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2017-02-16T11:42:25,348 INFO [task-runner-0-priority-0] org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 425047 for admin on ha-hdfs:hacluster
2017-02-16T11:42:25,363 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://hacluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 425047 for admin)
2017-02-16T11:42:25,737 INFO [task-runner-0-priority-0] org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider - Failing over to 40
2017-02-16T11:42:26,109 WARN [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2017-02-16T11:42:26,476 WARN [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-02-16T11:43:11,375 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-02-16T11:43:13,254 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-02-16T11:43:14,208 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1486083704046_0035
2017-02-16T11:43:14,209 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.JobSubmitter - Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 425047 for admin)
2017-02-16T11:43:14,662 INFO [task-runner-0-priority-0] org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
```

Here's my task:

```
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "/pageviews.json"
      }
    },
    "dataSchema": {
      "dataSource": "pageviews",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": ["2015-09-01/2015-09-02"]
      },
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "dimensionsSpec": {
            "dimensions": ["url", "user"]
          },
          "timestampSpec": {
            "format": "auto",
            "column": "time"
          }
        }
      },
      "metricsSpec": [
        { "name": "views", "type": "count" },
        { "name": "latencyMs", "type": "doubleSum", "fieldName": "latencyMs" }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {}
    }
  }
}
```
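One thing I am unsure about is the empty jobProperties in the tuningConfig. My understanding from the Druid docs is that Hadoop classpath/classloader settings would go there if Druid's bundled Hadoop jars conflict with the cluster's, something like the following (illustrative only; I have not actually set this):

```
"tuningConfig": {
  "type": "hadoop",
  "jobProperties": {
    "mapreduce.job.classloader": "true"
  }
}
```

Would a setting like that matter here, or is it fine to leave jobProperties empty when the MapReduce job runs on an external cluster?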

Has anyone seen a similar situation before? I have tried upgrading the Hadoop dependency used by the HDFS extension from 2.3.0 to 2.7.2, but it still can't ingest. However, if I ingest data from a local file instead of HDFS, it works fine.
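In case it is relevant, here is roughly how I attempted the 2.7.2 upgrade (approximate, from memory): I pulled the newer hadoop-client into the hadoop-dependencies directory with the pull-deps tool and referenced it from the task via hadoopDependencyCoordinates, along these lines:

```
{
  "type": "index_hadoop",
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.2"],
  "spec": { ... }
}
```

Is that the right way to switch Hadoop client versions, or do I also need to change druid.extensions.hadoopDependenciesDir or the classpath on the middleManagers?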

Thanks

Jeremy Lv