No datasource can be found in Druid

Hi guys,

we want to evaluate Druid for our business case, but we currently have a problem getting our data in. Our data is delivered via RabbitMQ as JSON.

What we have done:

(1) Set up Druid manually using the commands from this Docker script: https://github.com/druid-io/docker-druid/blob/master/Dockerfile (using 0.7.3)

(2) Set up a realtime spec, which is the following:

```json
[{
  "dataSchema" : {
    "dataSource" : "queuehub",
    "metricsSpec" : [
      {"type" : "doubleSum", "name" : "cost", "fieldName" : "c"},
      {"type" : "count", "name" : "count"}
    ],
    "indexGranularity" : "minute",
    "parser" : {
      "type" : "string",
      "parseSpec" : {
        "format" : "json",
        "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
        "dimensionsSpec" : {
          "dimensionExclusions" : [],
          "dimensions" : [
            "c",
            "#b",
            "#m",
            "#pm",
            "bl",
            "q",
            "pt",
            "v",
            "-kca",
            "-kcc",
            "-kt"
          ],
          "spatialDimensions" : []
        }
      }
    }
  },
  "shardSpec" : {"type" : "none"},
  "ioConfig" : {
    "type" : "realtime",
    "firehose" : {
      "type" : "rabbitmq",
      "connection" : {
        "host" : "192.168.123.2",
        "port" : "5672",
        "username" : "druid",
        "password" : "durid",
        "virtualHost" : "/"
      },
      "config" : {
        "exchange" : "staging.druid",
        "queue" : "staging.druid",
        "routingKey" : "#",
        "durable" : "true",
        "exclusive" : "false",
        "autoDelete" : "false",
        "maxRetries" : "10",
        "retryIntervalSeconds" : "1",
        "maxDurationSeconds" : "300",
        "autoAck" : "true"
      },
      "parser" : {
        "timestampSpec" : { "column" : "timestamp", "format" : "posix" },
        "data" : { "format" : "json" }
      }
    },
    "plumber" : {
      "type" : "realtime",
      "windowPeriod" : "PT1m",
      "segmentGranularity" : "minute",
      "basePersistDirectory" : "/tmp/druid/localStorage"
    }
  },
  "tuningConfig" : {
    "type" : "realtime",
    "maxRowsInMemory" : 500000,
    "intermediatePersistPeriod" : "PT1m",
    "windowPeriod" : "PT10m",
    "basePersistDirectory" : "/tmp/realtime/basePersist",
    "rejectionPolicy" : {
      "type" : "serverTime"
    }
  }
}]
```
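For context, here is a small Python sketch of the kind of message this spec expects. Only the `timestamp` column and the dimension/metric names come from the spec above; all values are invented for illustration:

```python
import json
import time

# Hypothetical event shaped the way the spec above expects: a "timestamp"
# column (epoch seconds), the metric field "c", and some of the dimensions.
# All values here are made up.
event = {
    "timestamp": int(time.time()),
    "c": 0.25,
    "#b": "example",
    "q": "example-queue",
}

payload = json.dumps(event)   # what would travel over RabbitMQ as a JSON string
parsed = json.loads(payload)

# With "format": "posix" the timestamp is read as epoch *seconds*;
# it must be close to "now" to fall inside the window period.
age_seconds = time.time() - parsed["timestamp"]
```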

The realtime node is started with the following command:

```shell
java -server -Xms5g -XX:NewSize=2g -XX:MaxNewSize=2g -XX:MaxDirectMemorySize=10g \
  -Ddruid.processing.buffer.sizeBytes=10000000 \
  -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -Ddruid.extensions.coordinates='["io.druid.extensions:druid-rabbitmq"]' \
  -Ddruid.host=192.168.2.123 \
  -Ddruid.realtime.specFile=/usr/local/druid/config/realtime.spec \
  -Ddruid.zk.service.host=localhost \
  -Ddruid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid \
  -Ddruid.db.connector.user=druid \
  -Ddruid.db.connector.password=diurd \
  -Ddruid.database.segmentTable=prod_segments \
  -cp /usr/local/druid/lib/* io.druid.cli.Main server realtime
```

and it consumes the data:

```
2015-06-26T06:32:16,654 INFO [queuehub-incremental-persist] io.druid.firehose.rabbitmq.RabbitMQFirehoseFactory - Acknowledging delivery of messages up to tag: 21380107
2015-06-26T06:33:16,655 INFO [chief-queuehub] io.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[queuehub]
2015-06-26T06:33:16,655 INFO [queuehub-incremental-persist] io.druid.firehose.rabbitmq.RabbitMQFirehoseFactory - Acknowledging delivery of messages up to tag: 21420766
2015-06-26T06:34:16,656 INFO [chief-queuehub] io.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[queuehub]
2015-06-26T06:34:16,656 INFO [queuehub-incremental-persist] io.druid.firehose.rabbitmq.RabbitMQFirehoseFactory - Acknowledging delivery of messages up to tag: 21461882
```

Is there maybe a problem with parsing the data? The `timestamp` key of the JSON coming from RabbitMQ is a Unix timestamp.
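One thing worth checking on that front (a sketch of the arithmetic only, not a statement about what this Druid version actually does): if an epoch-seconds value were ever interpreted as milliseconds, every event would land in January 1970 and fall far outside any realistic window period. A quick Python illustration:

```python
from datetime import datetime, timezone

ts = 1435300000  # epoch seconds for late June 2015

# Interpreted correctly as seconds:
as_seconds = datetime.fromtimestamp(ts, tz=timezone.utc)

# Misread as milliseconds, the same number lands in January 1970:
as_millis = datetime.fromtimestamp(ts / 1000, tz=timezone.utc)

print(as_seconds.year, as_millis.year)  # 2015 1970
```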

This is the output of the other daemons:

```
2015-06-26T06:34:27,428 INFO [DatabaseRuleManager-Exec--0] io.druid.metadata.SQLMetadataRuleManager - Polled and found rules for 1 datasource(s)
2015-06-26 06:34:28,013 DEBG 'druid-indexing-service' stdout output:
2015-06-26T06:34:28,013 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 0 tasks from storage (0 tasks added, 0 tasks removed).
2015-06-26 06:34:48,492 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:34:48,492 INFO [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue is empty.
2015-06-26T06:34:48,492 INFO [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant terminate queue is empty.
2015-06-26 06:34:48,492 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:34:48,492 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorBalancer - [_default_tier]: One or fewer servers found. Cannot balance.
2015-06-26T06:34:48,492 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Load Queues:
2015-06-26T06:34:48,492 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[192.168.123.2:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 0 bytes served.
2015-06-26 06:35:27,220 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:35:27,220 WARN [DatabaseSegmentManager-Exec--0] io.druid.metadata.SQLMetadataSegmentManager - No segments found in the database!
2015-06-26 06:35:27,430 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:35:27,430 INFO [DatabaseRuleManager-Exec--0] io.druid.metadata.SQLMetadataRuleManager - Polled and found rules for 1 datasource(s)
2015-06-26 06:35:28,013 DEBG 'druid-indexing-service' stdout output:
2015-06-26T06:35:28,013 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 0 tasks from storage (0 tasks added, 0 tasks removed).
2015-06-26 06:35:48,493 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:35:48,492 INFO [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue is empty.
2015-06-26T06:35:48,493 INFO [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant terminate queue is empty.
2015-06-26 06:35:48,493 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:35:48,493 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorBalancer - [_default_tier]: One or fewer servers found. Cannot balance.
2015-06-26T06:35:48,493 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Load Queues:
2015-06-26T06:35:48,493 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorLogger - Server[192.168.123.2:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 0 bytes served.
2015-06-26 06:36:27,221 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:36:27,221 WARN [DatabaseSegmentManager-Exec--0] io.druid.metadata.SQLMetadataSegmentManager - No segments found in the database!
2015-06-26 06:36:27,431 DEBG 'druid-coordinator' stdout output:
2015-06-26T06:36:27,431 INFO [DatabaseRuleManager-Exec--0] io.druid.metadata.SQLMetadataRuleManager - Polled and found rules for 1 datasource(s)
2015-06-26 06:36:28,013 DEBG 'druid-indexing-service' stdout output:
2015-06-26T06:36:28,013 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 0 tasks from storage (0 tasks added, 0 tasks removed).
```

These have been started using supervisord with the following config:

```ini
[supervisord]
nodaemon=true
loglevel=debug

[program:zookeeper]
command=/usr/local/zookeeper/bin/zkServer.sh start-foreground
user=daemon
priority=0

#[program:mysql]
#command=/usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe
#user=mysql
#priority=0

[program:druid-coordinator]
user=druid
command=java
  -server
  -Xms10g
  -XX:NewSize=512m
  -XX:MaxNewSize=512m
  -Duser.timezone=UTC
  -Dfile.encoding=UTF-8
  -Ddruid.host=%(ENV_HOSTIP)s
  -Ddruid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage"]
  -Ddruid.extensions.localRepository=/usr/local/druid/repository
  -Ddruid.metadata.storage.type=mysql
  -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
  -Ddruid.metadata.storage.connector.user=druid
  -Ddruid.metadata.storage.connector.password=diurd
  -Ddruid.coordinator.startDelay=PT20S
  -Ddruid.storage.type=local
  -Ddruid.storage.storageDirectory="/tmp/druid/localStorage"
  -cp /usr/local/druid/lib/*
  io.druid.cli.Main server coordinator
redirect_stderr=true
priority=100

[program:druid-indexing-service]
user=druid
command=java
  -server
  -Xms5g
  -XX:NewSize=2g
  -XX:MaxNewSize=2g
  -XX:MaxDirectMemorySize=10g
  -Duser.timezone=UTC
  -Dfile.encoding=UTF-8
  -Ddruid.host=%(ENV_HOSTIP)s
  -Ddruid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage"]
  -Ddruid.extensions.localRepository=/usr/local/druid/repository
  -Ddruid.metadata.storage.type=mysql
  -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
  -Ddruid.metadata.storage.connector.user=druid
  -Ddruid.metadata.storage.connector.password=diurd
  -Ddruid.indexer.storage.type=metadata
  -Ddruid.peon.mode=local
  -Ddruid.indexer.queue.startDelay=PT0M
  -Ddruid.storage.type=local
  -Ddruid.storage.storageDirectory="/tmp/druid/localStorage"
  -Ddruid.indexer.runner.javaOpts="-server -Xmx1g"
  -cp /usr/local/druid/lib/*
  io.druid.cli.Main server overlord

[program:druid-historical]
user=druid
command=java
  -server
  -Xmx4g
  -Xms4g
  -XX:NewSize=1g
  -XX:MaxDirectMemorySize=9g
  -Duser.timezone=UTC
  -Dfile.encoding=UTF-8
  -Ddruid.host=%(ENV_HOSTIP)s
  -Ddruid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions"]
  -Ddruid.extensions.localRepository=/usr/local/druid/repository
  -Ddruid.metadata.storage.type=mysql
  -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
  -Ddruid.metadata.storage.connector.user=druid
  -Ddruid.metadata.storage.connector.password=diurd
  -Ddruid.s3.accessKey=AKIAIMKECRUYKDQGR6YQ
  -Ddruid.s3.secretKey=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
  -Ddruid.computation.buffer.size=67108864
  -Ddruid.segmentCache.locations=[{"path":"/var/tmp/druid/indexCache","maxSize":5000000000}]
  -Ddruid.server.maxSize=5000000000
  -cp /usr/local/druid/lib/*
  -Ddruid.storage.type=local
  -Ddruid.storage.storageDirectory="/tmp/druid/localStorage"
  io.druid.cli.Main server historical
redirect_stderr=true
priority=100

[program:druid-broker]
user=druid
command=java
  -server
  -Xmx10g
  -Duser.timezone=UTC
  -Dfile.encoding=UTF-8
  -Ddruid.host=%(ENV_HOSTIP)s
  -Ddruid.computation.buffer.size=67108864
  -Ddruid.broker.cache.sizeInBytes=33554432
  -Ddruid.broker.cache.useCache=true
  -Ddruid.broker.cache.populateCache=true
  -Ddruid.metadata.storage.type=mysql
  -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
  -Ddruid.metadata.storage.connector.user=druid
  -Ddruid.metadata.storage.connector.password=diurd
  -Ddruid.storage.type=local
  -Ddruid.storage.storageDirectory="/tmp/druid/localStorage"
  -cp /usr/local/druid/lib/*
  io.druid.cli.Main server broker
redirect_stderr=true
priority=100
```

inside a screen session with:

```shell
export HOSTIP="192.168.123.2" && exec /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
```

But if we now look into the web interface, there is no datasource shown, and nothing appears either when I take a manual look inside MySQL or at the filesystem.

Thanks for your help in advance.

One thing that commonly results in this sort of issue is that realtime
ingestion is written assuming an up-to-date stream of data, where
up-to-date means that the timestamps are happening "now". Is it
possible that the data you are flowing through RabbitMQ has "old"
timestamps (where old is defined as more than windowPeriod, e.g. 10
minutes, in the past)?

If so, you might be better off setting your "rejectionPolicy" to
"messageTime" instead of "serverTime". "messageTime" defines "now" as
the max timestamp seen so far, whereas "serverTime" defines it as
System.currentTimeMillis(). Even when using "messageTime", though,
your data should be delivered in near-time order.
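The distinction can be sketched roughly like this (the function names, the constant, and the simplification to a symmetric window are mine; the real rejection policies in Druid do more than this):

```python
import time

WINDOW_MS = 10 * 60 * 1000  # a windowPeriod of PT10M, in milliseconds


def accept_server_time(event_ms, now_ms=None):
    # "serverTime": "now" is the wall clock of the realtime node.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - event_ms) <= WINDOW_MS


def accept_message_time(event_ms, max_seen_ms):
    # "messageTime": "now" is the max event timestamp seen so far,
    # so a stream of uniformly "old" data can still be accepted.
    return abs(max_seen_ms - event_ms) <= WINDOW_MS
```

With "serverTime", an event stamped an hour ago is always rejected; with "messageTime", the same event is accepted as long as it is close to the newest event seen so far.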

You can verify that messages are being thrown away by turning on
metric logging with:

```
druid.emitter=logging
```

With that set, you *should* see minutely entries in the logs with
metrics like "events/processed" and "events/thrownAway". If messages
are being thrown away because their timestamps are "too old", the
"events/thrownAway" counter will increment.

--Eric