Error: io.druid.java.util.common.ISE: No suitable partitioning dimension found!

**Any help is very much appreciated.**

**I have been trying to load our data from Hadoop, but every run fails on the second Hadoop job with the error below.**

Error: io.druid.java.util.common.ISE: No suitable partitioning dimension found!
	at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionReducer.innerReduce(DeterminePartitionsJob.java:754)
	at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:497)
	at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:471)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

I looked into the Hadoop job logs, and these are the failure messages. I am not sure what the message below means where it says the expected row count is -1. The odd thing is that when I reduce the number of dimensions from 10 to 3 while still keeping accountId, the ingestion job completes successfully. The shard size is around 700 MB.

2019-09-24T13:24:48,061 INFO [main] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 29,999,769 rows and 41,514 unique values: SingleDimensionShardSpec{dimension='accountId', start='988028571', end='993517804', partitionNum=183}
2019-09-24T13:24:52,016 INFO [main] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 29,997,660 rows and 44,650 unique values: SingleDimensionShardSpec{dimension='accountId', start='993517804', end='998895034', partitionNum=184}
2019-09-24T13:24:54,275 INFO [main] io.druid.indexer.DeterminePartitionsJob - Removing possible shard: SingleDimensionShardSpec{dimension='accountId', start='993517804', end='998895034', partitionNum=184}
2019-09-24T13:24:54,275 INFO [main] io.druid.indexer.DeterminePartitionsJob - Adding possible shard with 37,288,717 rows and 53,794 unique values: SingleDimensionShardSpec{dimension='accountId', start='993517804', end='null', partitionNum=184}
2019-09-24T13:24:54,275 INFO [main] io.druid.indexer.DeterminePartitionsJob - Completed dimension[accountId]: 185 possible shards with 8,826,599 unique values
2019-09-24T13:24:54,276 INFO [main] io.druid.indexer.DeterminePartitionsJob - Dimension[accountId] is not present in all rows (row count 1,251,890,080 != expected row count -1)

2019-09-24 13:11:35,587 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
2019-09-24 13:11:35,588 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
2019-09-24 13:11:35,589 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
2019-09-24 13:11:35,692 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-09-24 13:11:36,143 INFO [main] org.hibernate.validator.internal.util.Version: HV000001: Hibernate Validator 5.1.3.Final
2019-09-24 13:15:09,649 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
2019-09-24 13:24:54,278 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : io.druid.java.util.common.ISE: No suitable partitioning dimension found!
	at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionReducer.innerReduce(DeterminePartitionsJob.java:754)
	at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:497)
	at io.druid.indexer.DeterminePartitionsJob$DeterminePartitionsDimSelectionBaseReducer.reduce(DeterminePartitionsJob.java:471)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)


We can identify the issue by looking at your ingestion spec if you can share it.

Here is the ingestion spec, along with some test scenarios that seem to indicate the issue is related to the number of records I am loading.

  1. When I tried only loading 20% of the data, it worked just fine.

  2. When I tried more than 20%, it failed with the same error that I mentioned above.

Any help is very much appreciated. Thanks.

```
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "granularity",
        "dataGranularity": "DAY",
        "inputPath": "hdfs://hadoodprod-nn-ha:8020/apps/datadump/",
        "filePattern": ".*\\.gz",
        "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd/"
      }
    },
    "dataSchema": {
      "dataSource": "test_db_source",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": {
          "type": "period",
          "period": "P1D",
          "timeZone": "America/Phoenix"
        },
        "queryGranularity": "hour",
        "intervals": [
          "2019-08-31T00:00-07:00/2019-09-02T00:00-07:00"
        ]
      },
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "tsv",
          "columns": [
            "skuId",
            "bookId",
            "timestamp",
            "siteId",
            "accountId",
            "cId",
            "fmtItem",
            "plId",
            "clientbookId",
            "lcId",
            "accountIdMod",
            "eventType",
            "iRank"
          ],
          "dimensionsSpec": {
            "dimensions": [
              "accountId",
              "skuId",
              "bookId",
              "siteId",
              "cId",
              "eventType"
            ],
            "dimensionExclusions": [
              "accountIdMod",
              "iRank"
            ]
          },
          "timestampSpec": {
            "format": "auto",
            "column": "timestamp"
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        },
        {
          "type": "longSum",
          "name": "views",
          "expression": "if(substring(eventType,0,2)=='OI', 1,0)"
        },
        {
          "type": "longSum",
          "name": "viewsSearch",
          "expression": "if(eventType=='OIS', 1,0)"
        },
        {
          "type": "longSum",
          "name": "check",
          "expression": "if(substring(eventType,0,2)=='OC', 1,0)"
        },
        {
          "type": "longSum",
          "name": "checkOnyelo",
          "expression": "if(eventType=='OCE', 1,0)"
        },
        {
          "type": "longSum",
          "name": "checkOffyelo",
          "expression": "if(eventType=='OCNE', 1,0)"
        }
      ],
      "transformSpec": {
        "transforms": [
          {
            "type": "expression",
            "name": "accountId",
            "expression": "if(strlen(trim(accountId))==0, '0', accountId)"
          }
        ]
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "type": "dimension",
        "targetPartitionSize": 10000000,
        "maxPartitionSize": 20000000,
        "partitionDimension": "accountId"
      },
      "useCombiner": "true",
      "ignoreInvalidRows": "false",
      "logParseExceptions": "true",
      "jobProperties": {
        "mapreduce.job.classloader": "true",
        "mapreduce.reduce.memory.mb": 10240,
        "mapreduce.job.reduces": 50,
        "mapreduce.map.log.level": "DEBUG"
      }
    }
  }
}
```
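
In case it helps, here is a rough sketch of how I would spot-check whether some raw rows actually have a blank or missing accountId, since the reducer log above says the dimension is not present in all rows. This is only an illustration, not something taken from our pipeline: it assumes the column order from the parseSpec above (accountId is the fifth field) and a local directory of gzip-compressed sample files.

```python
# Diagnostic sketch: count TSV rows whose accountId column is blank or missing.
# Assumptions: gzip-compressed TSV files downloaded locally, and the same
# column order as the parseSpec above (accountId is the fifth column, index 4).
import glob
import gzip
import sys

ACCOUNT_ID_INDEX = 4  # position of accountId in the "columns" list above

def count_blank_account_ids(path_glob):
    total = blank = 0
    for path in glob.glob(path_glob):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                fields = line.rstrip("\n").split("\t")
                total += 1
                # Treat a too-short row or an empty/whitespace value as "missing".
                if len(fields) <= ACCOUNT_ID_INDEX or not fields[ACCOUNT_ID_INDEX].strip():
                    blank += 1
    return total, blank

if __name__ == "__main__":
    # e.g. python check_account_id.py "sample/*.gz"
    total, blank = count_blank_account_ids(sys.argv[1])
    print(f"{blank} of {total} rows have a blank or missing accountId")
```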