Quickstart on DigitalOcean failed

I created a new Ubuntu Droplet to run Druid.

http://imply.io/docs/latest/quickstart#start-up-services

I get the following FAILED message:

root@ubuntu-512mb-sfo1-01:~/imply-1.1.1# bin/post-index-task --file quickstart/wikiticker-index.json

Task started: index_hadoop_wikiticker_2016-03-18T06:34:43.891Z

**Task log:** http://localhost:8090/druid/indexer/v1/task/index_hadoop_wikiticker_2016-03-18T06:34:43.891Z/log

**Task status:** http://localhost:8090/druid/indexer/v1/task/index_hadoop_wikiticker_2016-03-18T06:34:43.891Z/status

Task index_hadoop_wikiticker_2016-03-18T06:34:43.891Z still running…

Task index_hadoop_wikiticker_2016-03-18T06:34:43.891Z still running…

Task index_hadoop_wikiticker_2016-03-18T06:34:43.891Z still running…

Task finished with status: FAILED

However, the services appear to work fine:

root@ubuntu-512mb-sfo1-01:~/imply-1.1.1# bin/supervise -c conf/supervise/quickstart.conf

[Fri Mar 18 02:33:22 2016] Running command[zk], logging to[/root/imply-1.1.1/var/sv/zk.log]: bin/run-zk conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[coordinator], logging to[/root/imply-1.1.1/var/sv/coordinator.log]: bin/run-druid coordinator conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[broker], logging to[/root/imply-1.1.1/var/sv/broker.log]: bin/run-druid broker conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[historical], logging to[/root/imply-1.1.1/var/sv/historical.log]: bin/run-druid historical conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[overlord], logging to[/root/imply-1.1.1/var/sv/overlord.log]: bin/run-druid overlord conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[middleManager], logging to[/root/imply-1.1.1/var/sv/middleManager.log]: bin/run-druid middleManager conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[pivot], logging to[/root/imply-1.1.1/var/sv/pivot.log]: bin/run-pivot conf-quickstart

[Fri Mar 18 02:33:22 2016] Running command[tranquility-server], logging to[/root/imply-1.1.1/var/sv/tranquility-server.log]: bin/tranquility server -configFile conf-quickstart/tranquility/server.json

Hi Timothy, what does the Task log say?

I wasn’t able to connect via curl to get it once the task reached ‘failed’ status. I was finally able to fetch the log while it was still in ‘running’ mode: it didn’t have enough memory. I was looking for machine-size specs in the quickstart. The Droplet had 2GB of RAM, so I’m bumping it up to 8GB, but I don’t know for sure that’s enough; trying it now. Is there someplace in the docs that lists the prerequisite hardware for the quickstart? Thanks.
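For reference, the task log and status can be pulled with curl from the overlord endpoints that post-index-task prints (a sketch; substitute the task ID from your own run):

curl http://localhost:8090/druid/indexer/v1/task/index_hadoop_wikiticker_2016-03-18T06:34:43.891Z/status
curl http://localhost:8090/druid/indexer/v1/task/index_hadoop_wikiticker_2016-03-18T06:34:43.891Z/log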

At minimum:

  • 8GB of RAM
  • 2 vCPUs

although ideally 16GB of RAM and 8 vCPUs would be nice. The quickstart was designed to run locally on your laptop.

Hey Timothy,

4GB of RAM should be enough; I generally do my testing on a docker-machine VM with that much RAM. We’ll update the docs to reflect that.

It worked, but I’ve now moved over to Docker.

Hi, after trying to do a file data load per the quick start instructions I get the following:

Traceback (most recent call last):
  File "bin/post-index-task", line 110, in <module>
    main()
  File "bin/post-index-task", line 99, in main
    task_id = json.loads(post_task(args, read_task_file(args), submit_timeout_at))["task"]
  File "bin/post-index-task", line 34, in post_task
    raise_friendly_error(e)
  File "bin/post-index-task", line 84, in raise_friendly_error
    raise Exception("HTTP Error {0}: {1}, check overlord log for more details.\n{2}".format(e.code, e.reason, text))
Exception: HTTP Error 500: Server Error, check overlord log for more details.
javax.servlet.ServletException: com.fasterxml.jackson.databind.JsonMappingException: The end instant must be greater or equal to the start (through reference chain: java.util.ArrayList[0])

Here is the overlord.log…

2016-03-21T17:22:29,146 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-histogram] for class[io.druid.cli.CliCommandCreator]
2016-03-21T17:22:34,703 INFO [main] io.druid.initialization.Initialization - Added URL[s-repo/io/druid/extensions/druid-histogram/0.8.3-iap3/druid-histogram-0.8.3-iap3.jar]
2016-03-21T17:22:34,704 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-datasketches] for class[io.druid.cli.CliCommandCreator]
2016-03-21T17:22:34,833 INFO [main] io.druid.initialization.Initialization - Added URL[s-repo/io/druid/extensions/druid-datasketches/0.8.3-iap3/druid-datasketches-0.8.3-iap3.jar]
2016-03-21T17:22:34,833 INFO [main] io.druid.initialization.Initialization - Added URL[s-repo/com/yahoo/datasketches/sketches-core/0.2.2/sketches-core-0.2.2.jar]
2016-03-21T17:22:37,890 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-histogram] for class[io.druid.initialization.DruidModule]
2016-03-21T17:22:37,927 INFO [main] io.druid.initialization.Initialization - Adding remote extension module[io.druid.query.aggregation.histogram.ApproximateHistogramDruidModule] for class[io.druid.initialization.DruidModule]
2016-03-21T17:22:37,928 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-datasketches] for class[io.druid.initialization.DruidModule]
2016-03-21T17:22:37,932 INFO [main] io.druid.initialization.Initialization - Adding remote extension module[io.druid.query.aggregation.datasketches.theta.SketchModule] for class[io.druid.initialization.DruidModule]
2016-03-21T17:22:38,011 INFO [main] io.druid.initialization.Initialization - Adding remote extension module[io.druid.query.aggregation.datasketches.theta.oldapi.OldApiSketchModule] for class[io.druid.initialization.DruidModule]
2016-03-21T17:22:46,148 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.metadata.MetadataStorageConnectorConfig] from props[druid.metadata.storage.connector.] as [DbConnectorConfig{createTables=true, connectURI='jdbc:derby://localhost:1527/var/druid/metadata.db;create=true', user='null', passwordProvider=null}]
2016-03-21T17:22:46,328 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.metadata.MetadataStorageTablesConfig] from props[druid.metadata.storage.tables.] as [io.druid.metadata.MetadataStorageTablesConfig@a20b94b]
2016-03-21T17:22:47,252 INFO [main] io.druid.metadata.storage.derby.DerbyConnector - Configured Derby as metadata storage
2016-03-21T17:22:48,024 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.common.config.ConfigManagerConfig] from props[druid.manager.config.] as [io.druid.common.config.ConfigManagerConfig@730e5763]
2016-03-21T17:22:48,193 INFO [main] io.druid.metadata.storage.derby.DerbyConnector - Configured Derby as metadata storage
2016-03-21T17:22:48,355 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.DruidNode] from props[druid.] as [DruidNode{serviceName='druid/overlord', host='3563cc04fcd8', port=8090}]
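The JsonMappingException above ("The end instant must be greater or equal to the start") typically means an interval in the indexing spec has its endpoints reversed: in each "start/end" interval string, the start must come before the end. A minimal sketch of a correctly ordered entry (dates are illustrative):

"intervals": ["2016-03-21/2016-03-22"]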


Can you post your indexing spec?


Here you go, I attached it. I probably did something wrong there.

Here is a sample of the json (non-flattened):

{
  "type": "waf",
  "rayid": "xxxxxxxxx",
  "zone_id": xxxxxxxxx2,
  "timestamp": "2016-03-21T20:47:25.07Z",
  "client_ip": "xxx.xxx.xxx.",
  "host": "xxxxx…com",
  "http_method": "POST",
  "protocol": "HTTP/2.0",
  "uri": "/event_logger",
  "country": "us",
  "action": "allow",
  "rule_id": "xxxxxxxxx",
  "colo": xxxx,
  "edge_dur": 47000064,
  "user_agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36",
  "rule_group": "xxxx_rule",
  "activated_rules": [
    "xxx",
    "yyyyy"
  ],
  "exit_code": xxxx,
  "anomaly_score": xxxxx,
  "sql_injection_score": xxxxx,


waf-index.json (1.44 KB)

Never mind, I found the bug in my spec: a typo.


Do you still have your indexing spec? The thing you used in this command:

bin/post-index-task --file <my_indexing_spec.json>


I sent it, but I found a bug and got it working. It’s awesome. I want to understand scaling, among other things, when we meet. Our small subset does 300MB a day.


I copied a larger file under the same name to the Docker image and adjusted the intervals in the indexing spec to include the added timestamp range, but the data I get back remains the same as before. Do I need to do anything if I change the data but the filename remains the same?

When I change the filename, datasource name and path, the new data is reflected.
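For context, a batch index task is one-shot: Druid doesn’t watch the input file, so after swapping the file contents the task should be re-submitted. Re-ingesting the same datasource and intervals should write a new segment version that replaces the old data. Assuming the spec attached above:

bin/post-index-task --file waf-index.json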

I am also trying to get a feel for what value to use for segmentGranularity. The time interval is a day, but the size of the data is 90MB.
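For 90MB a day, daily segments are a reasonable starting point, since Druid’s general guidance is to aim for segments of a few hundred MB; finer segmentGranularity would produce many tiny segments. A sketch of the relevant granularitySpec section (the values and interval are illustrative, not taken from the attached spec):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "day",
  "queryGranularity": "hour",
  "intervals": ["2016-03-21/2016-03-22"]
}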


My file has 130k events, but Druid, with segmentGranularity and queryGranularity both set to an hour, only shows 599 events. The time interval values appear to be correct in the indexing spec…


Hi Timothy, Druid rolls up data as it ingests it, which greatly reduces the amount of data it has to store.

http://druid.io/docs/latest/design/index.html

Also, ensure that the “intervals” in your indexing spec match the interval of the data you want to ingest.

Also: http://druid.io/docs/latest/ingestion/faq.html#not-all-of-my-events-were-ingested
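Per that FAQ, one way to verify how much the data is rolling up is to include a count aggregator at ingestion time and sum it at query time; the sum gives the number of raw events behind the rolled-up rows, which you can compare against the number of rows. A sketch (the metric names are assumptions):

In the indexing spec’s "metricsSpec":

{ "type": "count", "name": "count" }

Then, at query time, sum it back up:

{ "type": "longSum", "name": "rawEvents", "fieldName": "count" }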

Hi, yes, I read that earlier and looked for the logged metrics. I checked in var/sv but didn’t see them. Where in the Docker file system would I find the ingestion metrics?

Are you doing batch or streaming ingestion? For batch ingestion, the windowPeriod stuff doesn’t apply.

Can you verify how much your data is rolling up?

Batch. Is there a way to see the raw events associated with a rolled-up row, to see how much the data is rolling up?

That’s also a related question: is there a way to drill into raw events, for example at a point in a time series?