Hi,
I am trying to use the standalone Hadoop indexer to run the Wikipedia example that comes with Druid and I am getting the following error after the map and reduce phase:
…
Caused by: java.lang.RuntimeException: No buckets?? seems there is no data to index.
at io.druid.indexer.IndexGeneratorJob.run(IndexGeneratorJob.java:160) ~[assembly_druid-assembly-0.1-SNAPSHOT.jar:0.1-SNAPSHOT]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:182) ~[assembly_druid-assembly-0.1-SNAPSHOT.jar:0.1-SNAPSHOT]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:96) ~[assembly_druid-assembly-0.1-SNAPSHOT.jar:0.1-SNAPSHOT]
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:182) ~[assembly_druid-assembly-0.1-SNAPSHOT.jar:0.1-SNAPSHOT]
at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:132) ~[assembly_druid-assembly-0.1-SNAPSHOT.jar:0.1-SNAPSHOT]
at io.druid.cli.Main.main(Main.java:91) ~[assembly_druid-assembly-0.1-SNAPSHOT.jar:0.1-SNAPSHOT]
I’ve looked at other topics in this group alluding to the same error referencing an incorrect timestamp spec but that doesn’t seem to apply here since I am using the sample Wikipedia data.
It must some obvious mistake but I am unable to figure it out
I did build a custom fat jar to get around the jackson version conflict issue between Druid and Hadoop.
Here is the command:
java -Xmx256m -Dhdp.version=2.3.4.0-3485 -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /root/lib/*:$(hadoop classpath) io.druid.cli.Main index hadoop examples/indexing/wikipedia_hadoop_config.json
The sample data is from the examples directory:
{“timestamp”: “2013-08-31T01:02:33Z”, “page”: “Gypsy Danger”, “language” : “en”, “user” : “nuclear”, “unpatrolled” : “true”, “newPage” : “true”, “robot”: “false”, “anonymous”: “false”, “namespace”:“article”, “continent”:“North America”, “country”:“United States”, “region”:“Bay Area”, “city”:“San Francisco”, “added”: 57, “deleted”: 200, “delta”: -143}
{“timestamp”: “2013-08-31T03:32:45Z”, “page”: “Striker Eureka”, “language” : “en”, “user” : “speed”, “unpatrolled” : “false”, “newPage” : “true”, “robot”: “true”, “anonymous”: “false”, “namespace”:“wikipedia”, “continent”:“Australia”, “country”:“Australia”, “region”:“Cantebury”, “city”:“Syndey”, “added”: 459, “deleted”: 129, “delta”: 330}
{“timestamp”: “2013-08-31T07:11:21Z”, “page”: “Cherno Alpha”, “language” : “ru”, “user” : “masterYi”, “unpatrolled” : “false”, “newPage” : “true”, “robot”: “true”, “anonymous”: “false”, “namespace”:“article”, “continent”:“Asia”, “country”:“Russia”, “region”:“Oblast”, “city”:“Moscow”, “added”: 123, “deleted”: 12, “delta”: 111}
{“timestamp”: “2013-08-31T11:58:39Z”, “page”: “Crimson Typhoon”, “language” : “zh”, “user” : “triplets”, “unpatrolled” : “true”, “newPage” : “false”, “robot”: “true”, “anonymous”: “false”, “namespace”:“wikipedia”, “continent”:“Asia”, “country”:“China”, “region”:“Shanxi”, “city”:“Taiyuan”, “added”: 905, “deleted”: 5, “delta”: 900}
{“timestamp”: “2013-08-31T12:41:27Z”, “page”: “Coyote Tango”, “language” : “ja”, “user” : “cancer”, “unpatrolled” : “true”, “newPage” : “false”, “robot”: “true”, “anonymous”: “false”, “namespace”:“wikipedia”, “continent”:“Asia”, “country”:“Japan”, “region”:“Kanto”, “city”:“Tokyo”, “added”: 1, “deleted”: 10, “delta”: -9}
Here is the slightly modified spec file:
{
“dataSchema”: {
“dataSource”: “wikipedia”,
“parser”: {
“type”: “string”,
“parseSpec”: {
“format”: “json”,
“timestampSpec”: {
“column”: “timestamp”,
“format”: “auto”
},
“dimensionsSpec”: {
“dimensions”: [
“page”,
“language”,
“user”,
“unpatrolled”,
“newPage”,
“robot”,
“anonymous”,
“namespace”,
“continent”,
“country”,
“region”,
“city”
],
“dimensionExclusions”: ,
“spatialDimensions”:
}
}
},
“metricsSpec”: [
{
“type”: “count”,
“name”: “count”
},
{
“type”: “doubleSum”,
“name”: “added”,
“fieldName”: “added”
},
{
“type”: “doubleSum”,
“name”: “deleted”,
“fieldName”: “deleted”
},
{
“type”: “doubleSum”,
“name”: “delta”,
“fieldName”: “delta”
}
],
“granularitySpec”: {
“type”: “uniform”,
“segmentGranularity”: “DAY”,
“queryGranularity”: “NONE”,
“intervals”: [“2013-08-31/2013-09-01”]
}
},
“ioConfig”: {
“type”: “hadoop”,
“inputSpec”: {
“type”: “static”,
“paths”: “hdfs:///data/wikipedia_data.json”
},
“metadataUpdateSpec”: {
“type”: “mysql”,
“connectURI”: “jdbc:mysql://localhost:3306/druid”,
“user”: “druid”,
“password”: “diurd”,
“segmentTable”: “druid_segments”
},
“segmentOutputPath”: “/tmp/segments”
},
“tuningConfig”: {
“type”: “hadoop”,
“workingPath”: “/tmp/working_path”,
“partitionsSpec”: {
“type” : “dimension”,
“targetPartitionSize”: 5000000
}
}
}
Thanks!