Posting a real-time index task to the overlord returns a 500 error

I am new to Druid and running some tests against it. When I post a task to my overlord server, I get a 500 error.

This is my request:

curl -XPOST -H 'Content-type: application/json' \

"http://localhost:8090/druid/indexer/v1/task" -d '{ "type": "index_realtime", "id": "example", "resource": { "availabilityGroup": "someGroup", "requiredCapacity": 1 }, "spec": { "dataSchema": { "dataSource": "wikipedia", "parser": { "type": "string", "parseSpec": { "format": "json", "timestampSpec": { "column": "timestamp", "format": "auto" }, "dimensionsSpec": { "dimensions": [ "page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city" ], "dimensionExclusions": [ ], "spatialDimensions": [ ] } }, "metricsSpec": [ { "type": "count", "name": "count" } ], "granularitySpec": { "type": "uniform", "segmentGranularity": "PT5m", "queryGranularity": "NONE" } } }, "ioConfig": { "type": "realtime", "firehose": { "type": "kafka-0.8", "consumerProps": { "zookeeper.connect": "localhost:2181", "zookeeper.connection.timeout.ms": "15000", "zookeeper.session.timeout.ms": "15000", "zookeeper.sync.time.ms": "5000", "group.id": "consumer-group", "fetch.message.max.bytes": "1048586", "auto.offset.reset": "largest", "auto.commit.enable": "false" }, "feed": "kafka_topic_1" }, "plumber": { "type": "realtime" } }, "tuningConfig": { "type": "realtime", "maxRowsInMemory": 500000, "intermediatePersistPeriod": "PT5m", "windowPeriod": "PT5m", "basePersistDirectory": "/tmp/realtime/basePersist", "rejectionPolicy": { "type": "messageTime" } } }}'

and the response is:

Error 500

HTTP ERROR: 500

Problem accessing /druid/indexer/v1/task. Reason:

    javax.servlet.ServletException: com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.segment.indexing.DataSchema] value failed: null

Powered by Jetty://

My Druid version is 0.7.1-rc1.

Could someone help me out, please?

Thank you.

Hi,

What version of Druid are you using? FWIW, for submitting realtime index tasks, we highly recommend looking at the Tranquility library (https://github.com/metamx/tranquility) for managing things.

My Druid version is 0.7.1-rc1.

I am not familiar with Scala. I checked out Tranquility and ran sbt package, and I got errors like this:

[error] src/main/scala/com/metamx/tranquility/druid/DruidBeamMaker.scala:75: too many arguments for constructor MapInputRowParser: (x$1: io.druid.data.input.impl.ParseSpec)io.druid.data.input.impl.MapInputRowParser

[error] new MapInputRowParser(

[error] ^

[error] src/tranquility/src/main/scala/com/metamx/tranquility/druid/DruidBeamMaker.scala:83: too many arguments for constructor RealtimeIndexTask: (x$1: String, x$2: io.druid.indexing.common.task.TaskResource, x$3: io.druid.segment.realtime.FireDepartment)io.druid.indexing.common.task.RealtimeIndexTask

[error] new RealtimeIndexTask(

[error] ^

[error] two errors found

[error] (compile:compile) Compilation failed

I found some invisible characters in my JSON. I fixed them, and got this.
Could somebody help me? What did I do to cause this error?
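One generic way to catch stray or invisible characters before submitting is to run the spec through a JSON validator. For example, assuming Python is installed (the file name here is just illustrative):

python -m json.tool < mytest/index2.spec

On invalid input it reports the line and column of the first character it cannot parse, which makes characters like these much easier to find.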

Hi,
It looks like you are trying to define a plumber in the task JSON.

Can you try removing the plumber if it is specified in the task spec?

If that is not the case, can you share the task JSON you are submitting?
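For reference, based on the suggestion above, the task's ioConfig would keep only the firehose once the plumber block is dropped. A minimal sketch, reusing the kafka-0.8 firehose values from the spec in the first post (treat the ZooKeeper address and topic as placeholders; consumerProps abridged):

"ioConfig": {
  "type": "realtime",
  "firehose": {
    "type": "kafka-0.8",
    "consumerProps": {
      "zookeeper.connect": "localhost:2181",
      "group.id": "consumer-group",
      "auto.offset.reset": "largest",
      "auto.commit.enable": "false"
    },
    "feed": "kafka_topic_1"
  }
}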

I'd just like to add that you should probably look into Tranquility (http://github.com/metamx/tranquility) for running realtime index tasks. It will make life much simpler. If you are interested in ingesting directly with Kafka, you should look into realtime nodes. The ingestion story in Druid is confusing right now, but there are multiple proposals out to clean it up.

Please let me know if you’d like me to elaborate on my previous post.

Thank you for your reply. About Tranquility, as I mentioned in my post, I cannot compile it correctly. My Druid version is 0.7.1-rc1, and I got errors like the ones shown earlier.

I used curl to submit the task like this:

curl -XPOST -H 'Content-type: application/json' \

"http://inex_server:8090/druid/indexer/v1/task" -d @mytest/index2.spec

This is my spec file for the test:

{
  "type": "index_realtime",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia3",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city"],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [{
        "type": "count",
        "name": "count"
      }, {
        "type": "doubleSum",
        "name": "added",
        "fieldName": "added"
      }, {
        "type": "doubleSum",
        "name": "deleted",
        "fieldName": "deleted"
      }, {
        "type": "doubleSum",
        "name": "delta",
        "fieldName": "delta"
      }],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "minute",
        "queryGranularity": "NONE"
      }
    },
    "ioConfig": {
      "type": "realtime",
      "firehose": {
        "type": "kafka-0.8",
        "consumerProps": {
          "zookeeper.connect": "zk_ip:2181",
          "zookeeper.connection.timeout.ms": "15000",
          "zookeeper.session.timeout.ms": "15000",
          "zookeeper.sync.time.ms": "5000",
          "group.id": "druid-example",
          "fetch.message.max.bytes": "1048586",
          "auto.offset.reset": "largest",
          "auto.commit.enable": "false"
        },
        "feed": "kafka_topic_1"
      },
      "plumber": {
        "type": "realtime"
      }
    },
    "tuningConfig": {
      "type": "realtime",
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m",
      "windowPeriod": "PT10m",
      "basePersistDirectory": "/tmp/realtime/basePersist",
      "rejectionPolicy": {
        "type": "messageTime"
      }
    }
  }
}

Hi roc, see inline.

In your ioConfig:

"ioConfig": {
  "type": "realtime",
  "firehose": {
    "type": "kafka-0.8",
    "consumerProps": {
      "zookeeper.connect": "zk_ip:2181",
      "zookeeper.connection.timeout.ms": "15000",
      "zookeeper.session.timeout.ms": "15000",
      "zookeeper.sync.time.ms": "5000",
      "group.id": "druid-example",
      "fetch.message.max.bytes": "1048586",
      "auto.offset.reset": "largest",
      "auto.commit.enable": "false"
    },
    "feed": "kafka_topic_1"
  },

Remove starting from here

  "plumber": {
    "type": "realtime"
  }

End Remove

}

That should work around your ingestion errors.

Yes, it works. Thank you. Could you tell me what the plumber really is, and how to use it?

I copied the task from http://druid.io/docs/0.7.0/Tasks.html.

So why does it not work in my case?

Another question: this realtime index task works, but still no data is handed off to deep storage. When I try the realtime node, it can ingest data and serve queries, but it never pushes data to deep storage. How can I make the data persist into my HDFS? By the way, the realtime node publishes its segments to the metadata store (MySQL), but this realtime index task writes nothing to the MySQL table. What should I do to make it publish the segment info to MySQL?

I have this in my _common/common.runtime.properties:

druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage","io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage"]

# Zookeeper
druid.zk.service.host=zk_ip

# Metadata Storage (mysql)
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://mysql_ip:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

# Deep storage (local filesystem for examples - don't use this in production)
druid.storage.type=hdfs
druid.storage.storage.storageDirectory=hdfs://hdfs_namenode:8020/druid/storage

# Query Cache (we use a simple 10mb heap-based local cache on the broker)
druid.cache.type=local
druid.cache.sizeInBytes=10000000

Are you setting the prop "druid.publish.type" manually in any way? Or, make sure it is set to

druid.publish.type=metadata

"metadata" is the default value. I don't see "druid.publish.type" in your runtime.properties below, but this is one usual cause for not seeing handoffs.

-- Himanshu

I added this prop, but nothing happened.

Do you see any errors at all on your realtime node? Also, try enabling debug logs to see if anything of interest is printed.

Also, check whether HDFS is getting new files written at "hdfs://hdfs_namenode:8020/druid/storage", and whether new entries are being made in the table "druid.druid_segments".
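To check both, something along these lines should work, assuming the hdfs CLI is on the path and the default metadata table layout (host, database, and user names are taken from the config above; adjust to your setup):

hdfs dfs -ls hdfs://hdfs_namenode:8020/druid/storage

mysql -h mysql_ip -u druid -p -e "SELECT id, created_date FROM druid.druid_segments ORDER BY created_date DESC LIMIT 10;"

New files under the storage directory and new rows in druid_segments are the signs that handoff is actually completing.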

Also try enabling realtime monitoring, if not already, by setting:

druid.monitoring.monitors=["io.druid.segment.realtime.RealtimeMetricsMonitor"]
druid.emitter=logging

Hi,

Handoff is generally controlled by three configs: rejectionPolicy, segmentGranularity, and windowPeriod.

In your configs you are using the messageTime rejectionPolicy, in which case "current-time" events can be accepted as long as the data is relatively in sequence.

What is the interval of the data you are trying to ingest? Do you have a constant stream of events with increasing timestamps being ingested?

Also, if your events are "current-time" (based on the wall clock), you should consider using the serverTime rejection policy, as that is what almost every production cluster uses.

More details on RejectionPolicy can be found here: http://druid.io/docs/latest/Realtime-ingestion.html
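For a wall-clock stream, the change would look roughly like this in the tuningConfig (only the relevant fields are shown; everything else stays as in the spec earlier in the thread):

"tuningConfig": {
  "type": "realtime",
  "windowPeriod": "PT10m",
  "rejectionPolicy": {
    "type": "serverTime"
  }
}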

I am doing some tests for now. I used messageTime so I can fake some data and put it into Kafka with the console producer. I use the data copied from http://druid.io/docs/0.7.0/Tutorial:-Loading-Streaming-Data.html, with the timestamps replaced.

Actually, I can run queries; it's just that no data is pushed to HDFS.

I run my server with nohup, and found an exception in nohup.out.

Exception in thread "plumber_merge_0" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName

My Hadoop version is 2.6.0, and I found a doc on the Druid home page about using other Hadoop versions (http://druid.io/docs/0.7.0/Other-Hadoop.html), but it seems that is for making the Hadoop index task work. Can it also be used for the realtime node and the indexing service?

I found an error in the log file. I am running against Hadoop 2.6.0.

2015-03-31T10:06:01,923 ERROR [roc_test7501_4-2015-01-31T01:02:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[roc_test7501_4]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.io.IOException, exceptionMessage=Mkdirs failed to create /roc_test7501_4/20150131T010200.000Z_20150131T010300.000Z/2015-01-31T01_02_00.000Z/0 (exists=false, cwd=file:/home/hbase/bin/druid), interval=2015-01-31T01:02:00.000Z/2015-01-31T01:03:00.000Z}

java.io.IOException: Mkdirs failed to create /roc_test7501_4/20150131T010200.000Z_20150131T010300.000Z/2015-01-31T01_02_00.000Z/0 (exists=false, cwd=file:/home/hbase/bin/druid)

at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442) ~[hadoop-common-2.6.0.jar:?]

at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428) ~[hadoop-common-2.6.0.jar:?]

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) ~[hadoop-common-2.6.0.jar:?]

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) ~[hadoop-common-2.6.0.jar:?]

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) ~[hadoop-common-2.6.0.jar:?]

at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775) ~[hadoop-common-2.6.0.jar:?]

at io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:79) ~[?:?]

Hi Roc, can you make sure you have permission to write to the local directory it is complaining about?
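A quick sanity check is to try creating that directory as the user the task runs under (the working directory in the error suggests it may be the hbase user, but that is an assumption; the path is the one from the stack trace):

sudo -u hbase mkdir -p /roc_test7501_4
ls -ld /roc_test7501_4

If the mkdir fails with a permission error, the process cannot write there, which matches the exception above.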