Configuration Issues

I am having some difficulty understanding the Druid configuration options. Currently, my understanding is based on the following documentation pages ( http://druid.io/docs/0.7.3/Configuration.html , http://druid.io/docs/0.7.3/Production-Cluster-Configuration.html ) and the Druid configuration directory given below.

druid-0.7.3/config/

druid-0.7.3/config/_common

druid-0.7.3/config/_common/log4j2.xml

druid-0.7.3/config/_common/common.runtime.properties

druid-0.7.3/config/broker

druid-0.7.3/config/broker/runtime.properties

druid-0.7.3/config/coordinator

druid-0.7.3/config/coordinator/runtime.properties

druid-0.7.3/config/historical

druid-0.7.3/config/historical/runtime.properties

druid-0.7.3/config/overlord

druid-0.7.3/config/overlord/runtime.properties

druid-0.7.3/config/realtime

druid-0.7.3/config/realtime/runtime.properties

Issue 1:

In the Production Cluster Configuration it mentions a “MiddleManager Node” and specific configuration options including a sample “Runtime.properties” file yet there is no “middlemanager” config folder (Shown above). Where does the MiddleManager Node “Runtime.properties” file exist? Is it possible that the only place to set properties for this MiddleManager Node is via the “common.runtime.properties” file?

Issue 2:

I have noticed that after running a Druid Hadoop Index Task, a huge amount of temp base flush files are created in my /tmp/ directory: “/tmp/base988516203438225223flush”, “base996856649186480337flush”, “base999908642298810186flush”.

Am I correct in guessing that the “druid.indexer.logs.directory” property described on http://druid.io/docs/0.7.3/Indexing-Service-Config.html is responsible for the output location? Where can I set this value?

Also, the documentation at http://druid.io/docs/0.7.3/Indexing-Service-Config.html says that some values “Must be set on Overlord and Middle Manager”. How do I set these values? Which properties files do I edit, specifically for the Middle Manager?

– My Hadoop Index Task –

I have created a Druid Hadoop Index Task similar to the example “wikipedia_index_hadoop_task.json” shown on http://druid.io/docs/latest/Tutorial:-Loading-Batch-Data.html

I then execute this task with the following curl command:

curl -X 'POST' -H 'Content-Type:application/json' -d @my_test_index_hadoop.json localhost:8090/druid/indexer/v1/task

– Found this in my Logs –

2015-06-04T04:02:00,745 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base4637660840424535801flush/merged/v8-tmp] completed index.drd in 3 millis.

2015-06-04T04:02:00,766 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base4637660840424535801flush/merged/v8-tmp] completed dim conversions in 21 millis.

2015-06-04T04:02:00,819 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base4637660840424535801flush/merged/v8-tmp] completed walk through of 45 rows in 53 millis.

2015-06-04T04:02:00,837 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base4637660840424535801flush/merged/v8-tmp] completed inverted.drd in 18 millis.

2015-06-04T04:02:00,844 INFO [pool-20-thread-1] io.druid.segment.IndexIO$DefaultIndexIOHandler - Converting v8[/tmp/base4637660840424535801flush/merged/v8-tmp] to v9[/tmp/base4637660840424535801flush/merged]

2015-06-04T04:02:01,069 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base3783834297381322300flush/merged/v8-tmp] completed index.drd in 0 millis.

2015-06-04T04:02:01,076 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base3783834297381322300flush/merged/v8-tmp] completed dim conversions in 6 millis.

2015-06-04T04:02:01,093 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base3783834297381322300flush/merged/v8-tmp] completed walk through of 115 rows in 16 millis.

2015-06-04T04:02:01,110 INFO [pool-20-thread-1] io.druid.segment.IndexMerger - outDir[/tmp/base3783834297381322300flush/merged/v8-tmp] completed inverted.drd in 17 millis.

2015-06-04T04:02:01,112 INFO [pool-20-thread-1] io.druid.segment.IndexIO$DefaultIndexIOHandler - Converting v8[/tmp/base3783834297381322300flush/merged/v8-tmp] to v9[/tmp/base3783834297381322300flush/merged]

Hi Mark, please see inline.

Issue 1:

In the Production Cluster Configuration it mentions a “MiddleManager Node” and specific configuration options including a sample “Runtime.properties” file yet there is no “middlemanager” config folder (Shown above). Where does the MiddleManager Node “Runtime.properties” file exist? Is it possible that the only place to set properties for this MiddleManager Node is via the “common.runtime.properties” file?

The middle manager is a node that is part of the indexing service. See: http://druid.io/docs/latest/Indexing-Service.html

The example configs don’t list it because the simple proof-of-concept setup on your laptop just uses an overlord running in local mode to do the indexing. For higher-scale workloads, indexing can be done in a distributed fashion.

Issue 2:

I have noticed that after running a Druid Hadoop Index Task, a huge amount of temp base flush files are created in my /tmp/ directory: “/tmp/base988516203438225223flush”, “base996856649186480337flush”, “base999908642298810186flush”.

Am I correct in guessing that the “druid.indexer.logs.directory” property described on http://druid.io/docs/0.7.3/Indexing-Service-Config.html is responsible for the output location? Where can I set this value?

No, that property only sets where task logs are stored.

Also, the documentation at http://druid.io/docs/0.7.3/Indexing-Service-Config.html says that some values “Must be set on Overlord and Middle Manager”. How do I set these values? Which properties files do I edit, specifically for the Middle Manager?

For every type of Druid node, you can put configuration in the runtime.properties for that node. When you start up a node, the directory containing its runtime.properties should be on the classpath.
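As a rough sketch of what that could look like for a Middle Manager: the script below creates a config folder for it and prints a launch command that puts that folder on the classpath. The folder name config/middlemanager, the DRUID_HOME variable, and the port and baseDir values are assumptions chosen for illustration; they are not part of the 0.7.3 distribution, so adjust them to your setup.

```shell
#!/bin/sh
# Assumption: DRUID_HOME is the unpacked druid-0.7.3 directory (defaults to the current directory).
DRUID_HOME="${DRUID_HOME:-.}"
# Hypothetical folder name; the distribution does not ship a middlemanager config folder.
CONF_DIR="$DRUID_HOME/config/middlemanager"

mkdir -p "$CONF_DIR"

# Illustrative values only; pick a port and directory that suit your machine.
cat > "$CONF_DIR/runtime.properties" <<'EOF'
druid.service=middleManager
druid.port=8091
druid.indexer.task.baseDir=/tmp/druid-tasks
EOF

# Print the launch command instead of running it, so the sketch works without Druid installed.
# The classpath lists the shared config folder, the node's own config folder, and the jars.
echo "java -classpath \"$DRUID_HOME/config/_common:$CONF_DIR:$DRUID_HOME/lib/*\" io.druid.cli.Main server middleManager"
```

Since the properties are picked up from the classpath, the folder name itself should not matter, as long as the same folder appears in the -classpath argument.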

Just to follow up with my issues.

Issue 1:

What is the location for the runtime.properties file for the Middle Manager mentioned in the documentation ( http://druid.io/docs/0.7.3/Production-Cluster-Configuration.html )?

Issue 2:

How do I change where the task logs are stored? For example, from “/tmp/base988516203438225223flush” to “/tmp/druid-tasklogs/base988516203438225223flush”?

What is the location for the runtime.properties file for the Middle Manager mentioned in the documentation ( http://druid.io/docs/0.7.3/Production-Cluster-Configuration.html )?

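Druid looks for its configuration files on the classpath, so the Middle Manager's runtime.properties can live in any directory that you add to that node's classpath.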

Issue 2:

How do I change where the task logs are stored? For example, from “/tmp/base988516203438225223flush” to “/tmp/druid-tasklogs/base988516203438225223flush”?

You can set this using

druid.indexer.task.baseDir
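A minimal sketch of setting it, assuming tasks run through an overlord in local mode so that the overlord's runtime.properties is the file to edit (with a separate Middle Manager you would point PROPS at its file instead). The set_prop helper and the PROPS variable are hypothetical names used for illustration. The druid.indexer.logs.* lines configure task log storage, which is separate from the working directory, and the paths are just the ones from the question.

```shell
#!/bin/sh
# Assumption: run from the druid-0.7.3 directory; override PROPS to edit a different node's file.
PROPS="${PROPS:-./config/overlord/runtime.properties}"

# Make sure the file exists so the sketch also runs outside a real install.
mkdir -p "$(dirname "$PROPS")"
touch "$PROPS"

# Hypothetical helper: replace any existing "key=..." line, then append the new value.
# The dots in the key are treated as regex wildcards, which is good enough for a sketch.
set_prop() {
  key="$1"
  value="$2"
  grep -v "^${key}=" "$PROPS" > "$PROPS.tmp" || true
  mv "$PROPS.tmp" "$PROPS"
  printf '%s=%s\n' "$key" "$value" >> "$PROPS"
}

# Base working directory for tasks.
set_prop druid.indexer.task.baseDir /tmp/druid-tasklogs

# Task logs are configured separately from the working directory.
set_prop druid.indexer.logs.type file
set_prop druid.indexer.logs.directory /tmp/druid-tasklogs/logs
```

The node has to be restarted for the new values to take effect.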