Change LocalStorage to S3 as deep storage

Hello,

I am using Tranquility to fetch data from Kafka and send it to the indexing service for indexing. Everything related to indexing and segment creation is working fine.

But all the tasks are writing segments to local storage instead of deep storage. This is the configuration for the overlord and middle manager:

Overlord

druid.host={HOST_IP}:{OVERLORD_PORT}

druid.port=${OVERLORD_PORT}

druid.service=druid/overlord

# Run the overlord in local mode with a single peon to execute tasks

#druid.indexer.runner.type=local

#druid.indexer.queue.startDelay=PT0M

#druid.indexer.runner.javaOpts="-server -Xmx256m"

#druid.indexer.fork.property.druid.processing.numThreads=1

#druid.indexer.fork.property.druid.computation.buffer.size=100000000

# Run the overlord in remote mode

druid.indexer.runner.type=remote

druid.indexer.runner.minWorkerVersion=0

# Upload all task logs to deep storage

druid.indexer.logs.type=log

druid.indexer.logs.directory=/mnt/xvdf/druid/prod/indexing-logs/v1

# Store all task state in the metadata storage

druid.indexer.storage.type=metadata

# Deep storage (local filesystem for examples - don't use this in production)

druid.storage.type=s3

druid.storage.bucket=${bucket}

druid.storage.baseKey=druid/prod/segments/

druid.s3.accessKey=${DRUID_S3_ACCESS_KEY}

druid.s3.secretKey=${DRUID_S3_SECRET_KEY}

MiddleManager

druid.host={HOST_IP}:{MIDDLEMANAGER_PORT}

druid.port=${MIDDLEMANAGER_PORT}

druid.service=druid/middlemanager

# Store task logs in deep storage

druid.indexer.logs.type=log

druid.indexer.logs.directory=/mnt/xvdf/druid/prod/indexing-logs/v1/

# Resources for peons

druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

druid.indexer.task.baseDir=/mnt/xvdf/druid/prod/

druid.indexer.task.baseTaskDir=/mnt/xvdb/druid/prod/persistent/task/

# Peon properties

druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]

druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912

druid.indexer.fork.property.druid.processing.numThreads=2

druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/mnt/xvdb/druid/persistent/zk_druid", "maxSize": 0}]

druid.indexer.fork.property.druid.server.http.numThreads=50

druid.indexer.fork.property.druid.storage.directory=/mnt/xvdf/druid/prod/indexing-logs/v1

druid.indexer.fork.property.druid.storage.type=log

# Deep storage (local filesystem for examples - don't use this in production)

druid.storage.type=s3

druid.storage.bucket=${bucket}

druid.storage.baseKey=druid/prod/segments/

druid.s3.accessKey=${DRUID_S3_ACCESS_KEY}

druid.s3.secretKey=${DRUID_S3_SECRET_KEY}

druid.worker.capacity=5

druid.worker.ip=${HOST_IP}

druid.worker.version=0

This is the log message from the indexing task, where I can see it is using local storage instead of S3.

2016-07-05T23:13:25,772 INFO [apnxs-streaming-lld-2016-07-05T22:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Pushing [apnxs-streaming-lld_2016-07-05T22:00:00.000Z_2016-07-05T23:00:00.000Z_2016-07-05T21:54:22.071Z] to deep storage

2016-07-05T23:13:25,792 INFO [apnxs-streaming-lld-2016-07-05T22:00:00.000Z-persist-n-merge] io.druid.segment.loading.LocalDataSegmentPusher - Copying segment[apnxs-streaming-lld_2016-07-05T22:00:00.000Z_2016-07-05T23:00:00.000Z_2016-07-05T21:54:22.071Z] to local filesystem at location[/tmp/druid/localStorage/apnxs-streaming-lld/2016-07-05T22:00:00.000Z_2016-07-05T23:00:00.000Z/2016-07-05T21:54:22.071Z/0]

2016-07-05T23:13:25,794 INFO [apnxs-streaming-lld-2016-07-05T22:00:00.000Z-persist-n-merge] io.druid.segment.loading.LocalDataSegmentPusher - Compressing files from[/mnt/xvdb/druid/prod/persistent/task/index_realtime_apnxs-streaming-lld_2016-07-05T22:00:00.000Z_0_0/work/persist/apnxs-streaming-lld/2016-07-05T22:00:00.000Z_2016-07-05T23:00:00.000Z/merged] to [/tmp/druid/localStorage/apnxs-streaming-lld/2016-07-05T22:00:00.000Z_2016-07-05T23:00:00.000Z/2016-07-05T21:54:22.071Z/0/index.zip]

I am not able to find out why it is using the /tmp directory or how to change it.

Thanks,

Navneet

Any help regarding this? It looks like a configuration issue. If somebody can point me to where to look, that would be great.

Hey Navneet,

Have you included the S3 extension (‘druid-s3-extensions’) in your config? For details on including extensions see: http://druid.io/docs/latest/development/extensions.html

Hello David,

I have included these extensions in Common properties file:

druid.extensions.coordinates=["graphite-emitter", "druid-s3-extensions", "mysql-metadata-storage"]

Thanks,

Navneet

Ah, what version of Druid are you running? Starting from Druid 0.9, the extension loading mechanism has changed to use druid.extensions.loadList instead of druid.extensions.coordinates. If you're using 0.9 or later, try: druid.extensions.loadList=["graphite-emitter", "druid-s3-extensions", "mysql-metadata-storage"]
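For reference, a minimal sketch of the relevant lines in common.runtime.properties (the extensions directory value here is just an illustrative assumption; it should match wherever your extensions are unpacked):

# Assumed: extension folders live under the Druid install directory's extensions/ folder
druid.extensions.directory=extensions
druid.extensions.loadList=["graphite-emitter", "druid-s3-extensions", "mysql-metadata-storage"]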

In your startup logs, you should see some lines that look like this:

2016-07-07T04:35:57,665 INFO [main] io.druid.initialization.Initialization - Loading extension [druid-s3-extensions] for class [io.druid.cli.CliCommandCreator]
2016-07-07T04:35:57,665 INFO [main] io.druid.initialization.Initialization - added URL[file:/…/druid-s3-extensions/druid-s3-extensions-0.9.1.1.jar]

If you’re seeing those lines in your logs, then the extension is being loaded and we can look into what else might not quite be correct.

Thanks a lot, it worked. But now I am facing an issue with the graphite emitter. I have added the graphite extension under the extensions folder, but now there are class loading issues.
Do we need to add some libraries to the lib folder to use it?

I am getting this Exception:

ClassNotFoundException: com.codahale.metrics.graphite.PickledGraphite

Hey Navneet,

Because the graphite extension is a contributed extension rather than core, you need to download it using pull-deps (imply distro: http://imply.io/docs/latest/extensions.html, see “community and third-party extensions”; community distro: http://druid.io/docs/latest/operations/including-extensions.html). This will get the extension jar and all of its dependencies as well.
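As a rough sketch of the pull-deps invocation, run from the Druid install directory (the Maven coordinate and version here are assumptions; substitute the ones matching your Druid release):

java -classpath "lib/*" io.druid.cli.Main tools pull-deps --no-default-hadoop -c "io.druid.extensions.contrib:graphite-emitter:0.9.0"

The downloaded jars should then end up under your extensions directory, alongside the other extensions you already load.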

Hey David,

Sorry, it was not fixed by the extension configuration. It is still creating segments in local storage. Do I need to restart ZooKeeper after these changes?

Thanks,

Navneet

Hey Navneet,

Hmm, no, you don't have to restart ZK, only the middle manager. Are you seeing the S3 extension loading logs when the middle manager and the indexing task start up? If the extension is getting loaded, are there any other interesting log messages in the task logs related to authenticating with S3 or reading/writing to that bucket?

Hi Navneet,

What happens if you move this configuration block:

druid.storage.type=s3

druid.storage.bucket=${bucket}

druid.storage.baseKey=druid/prod/segments/

druid.s3.accessKey=${DRUID_S3_ACCESS_KEY}

druid.s3.secretKey=${DRUID_S3_SECRET_KEY}

to the common runtime properties instead of overlord/middlemanager?

- Jon

Hello David,

This is the startup log from the middle manager:

2016-07-09T04:05:17,729 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, directory='/mnt/xvdb/apps/druid-0.9.0//extensions', hadoopDependenciesDir='hadoop-dependencies', loadList=[graphite-emitter, druid-s3-extensions, mysql-metadata-storage]}]

2016-07-09T04:05:17,737 INFO [main] io.druid.initialization.Initialization - Loading extension [graphite-emitter] for class [io.druid.cli.CliCommandCreator]

2016-07-09T04:05:17,756 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/graphite-emitter/slf4j-api-1.7.7.jar]

2016-07-09T04:05:17,756 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/graphite-emitter/graphite-emitter-0.9.1.jar]

2016-07-09T04:05:17,756 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/graphite-emitter/metrics-core-3.1.2.jar]

2016-07-09T04:05:17,757 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/graphite-emitter/metrics-graphite-3.1.2.jar]

2016-07-09T04:05:17,758 INFO [main] io.druid.initialization.Initialization - Loading extension [druid-s3-extensions] for class [io.druid.cli.CliCommandCreator]

2016-07-09T04:05:17,759 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/druid-s3-extensions/druid-s3-extensions-0.9.0.jar]

2016-07-09T04:05:17,759 INFO [main] io.druid.initialization.Initialization - Loading extension [mysql-metadata-storage] for class [io.druid.cli.CliCommandCreator]

2016-07-09T04:05:17,760 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/mysql-metadata-storage/mysql-metadata-storage-0.9.1.1.jar]

2016-07-09T04:05:17,760 INFO [main] io.druid.initialization.Initialization - added URL[file:/mnt/xvdb/apps/druid-0.9.0/extensions/mysql-metadata-storage/mysql-connector-java-5.1.38.jar]

2016-07-09T04:05:18,017 INFO [main] io.druid.initialization.Initialization - Loading extension [graphite-emitter] for class [io.druid.initialization.DruidModule]

2016-07-09T04:05:18,019 INFO [main] io.druid.initialization.Initialization - Adding local file system extension module [io.druid.emitter.graphite.GraphiteEmitterModule] for class [io.druid.initialization.DruidModule]

2016-07-09T04:05:18,019 INFO [main] io.druid.initialization.Initialization - Loading extension [druid-s3-extensions] for class [io.druid.initialization.DruidModule]

2016-07-09T04:05:18,224 INFO [main] io.druid.initialization.Initialization - Adding local file system extension module [io.druid.storage.s3.S3StorageDruidModule] for class [io.druid.initialization.DruidModule]

2016-07-09T04:05:18,225 INFO [main] io.druid.initialization.Initialization - Adding local file system extension module [io.druid.firehose.s3.S3FirehoseDruidModule] for class [io.druid.initialization.DruidModule]

2016-07-09T04:05:18,225 INFO [main] io.druid.initialization.Initialization - Loading extension [mysql-metadata-storage] for class [io.druid.initialization.DruidModule]

2016-07-09T04:05:18,228 INFO [main] io.druid.initialization.Initialization - Adding local file system extension module [io.druid.metadata.storage.mysql.MySQLMetadataStorageModule] for class [io.druid.initialization.DruidModule]

4.235: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(707840K)] 212692K(1014528K), 0.0846012 secs] [Times: user=0.11 sys=0.00, real=0.08 secs]

This is the log from the indexing task, which shows it is using local storage as deep storage:

2016-07-07T15:59:31,557 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.CuratorDiscoveryConfig] from props[druid.discovery.curator.] as [io.druid.server.initialization.CuratorDiscoveryConfig@629ae7e]
2016-07-07T15:59:31,690 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.indexing.common.RetryPolicyConfig] from props[druid.peon.taskActionClient.retry.] as [io.druid.indexing.common.RetryPolicyConfig@56681eaf]
2016-07-07T15:59:31,694 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.segment.loading.LocalDataSegmentPusherConfig] from props[druid.storage.] as [io.druid.segment.loading.LocalDataSegmentPusherConfig@5d7835a8]
2016-07-07T15:59:31,695 INFO [main] io.druid.segment.loading.LocalDataSegmentPusher - Configured local filesystem as deep storage
2016-07-07T15:59:31,699 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.common.aws.AWSCredentialsConfig] from props[druid.s3.] as [io.druid.common.aws.AWSCredentialsConfig@e48bf9a]
2016-07-07T15:59:31,767 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.storage.s3.S3DataSegmentPusherConfig] from props[druid.storage.] as [io.druid.storage.s3.S3DataSegmentPusherConfig@71d8cfe7]
2016-07-07T15:59:31,771 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.storage.s3.S3DataSegmentArchiverConfig] from props[druid.storage.] as [io.druid.storage.s3.S3DataSegmentArchiverConfig@31d6f3fe]
2016-07-07T15:59:31,790 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.client.DruidServerConfig] from props[druid.server.] as [io.druid.client.DruidServerConfig@249e0271]
2016-07-07T15:59:31,795 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.BatchDataSegmentAnnouncerConfig] from props[druid.announcer.] as [io.druid.server.initialization.BatchDataSegmentAnnouncerConfig@65753040]
2016-07-07T15:59:31,801 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.ZkPathsConfig] from props[druid.zk.paths.] as [io.druid.server.initialization.ZkPathsConfig@20876eed]
2016-07-07T15:59:31,806 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[interface io.druid.server.coordination.DataSegmentAnnouncerProvider] from props[druid.announcer.] as [io.druid.server.coordination.BatchDataSegmentAnnouncerProvider@2c58dcb1]
2016-07-07T15:59:31,808 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.client.coordinator.CoordinatorSelectorConfig] from props[druid.selectors.coordinator.] as [io.druid.client.coordinator.CoordinatorSelectorConfig@288214b1]
2016-07-07T15:59:31,812 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifierConfig] from props[druid.segment.handoff.] as [io.druid.segment.realtime.plumber.CoordinatorBasedSegmentHandoffNotifierConfig@1a87b51]


Ah, I see what’s wrong now - in your middle manager config, you have this line:

druid.indexer.fork.property.druid.storage.type=log

druid.indexer.fork.property.* properties are used to explicitly set peon properties and take precedence over properties inherited from the middle manager. Remove this line and hopefully things will start working ('log' is also not a valid value for druid.storage.type, so it falls back to the default of local).
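For illustration, a sketch of the relevant middle manager lines after the fix (assuming the druid.storage.* and druid.s3.* settings shown earlier stay in place on the middle manager or in the common properties):

# Removed: this peon override forced an invalid storage type
# druid.indexer.fork.property.druid.storage.type=log

# Inherited by the peons from the middle manager / common properties
druid.storage.type=s3
druid.storage.bucket=${bucket}
druid.storage.baseKey=druid/prod/segments/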

Thanks a lot David. It worked.

Hi, I am new to Druid and Amazon S3. I want to push my data from Druid to Amazon S3.
What are the steps to do it?

I have made a few changes in my common.runtime.properties.

I want to ask: do I need to create a bucket on S3? If yes, how and why?

How can I push my data on a regular basis? Is it through a CLI command or some UI pull?

Looking forward

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-s3-extensions"]

druid.storage.type=s3

#druid.storage.bucket=your-bucket → what is this location

druid.storage.baseKey=druid/segments

druid.s3.accessKey=*JOGAH2

druid.s3.secretKey=*****GYLH+Y6nx+j



Hi Aditya,

If you want to push data from Druid to S3, you can use S3 as deep storage and set the storage type to s3; see https://druid.apache.org/docs/latest/development/extensions-core/s3.html#deep-storage

Yes, you would need to create a bucket on S3 and provide that in your properties. For how to create a bucket in S3, refer to the AWS docs: https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html or https://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html
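For example, a minimal sketch using the AWS CLI (the bucket name and region are placeholders, not values from this thread):

aws s3api create-bucket --bucket my-druid-deep-storage --region us-east-1

The same bucket name then goes into druid.storage.bucket in your common.runtime.properties.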

And why you need to create a bucket: you need to create a bucket and point Druid to it so that Druid can write segments into your S3 bucket. I'm not sure if that's your question, though.

What is the source of your data? You will ingest from that source, and Druid will then push the resulting segments to your S3 bucket.

Thanks,

Surekha

Hi,
Thanks for the reply. If you see above, I have mentioned the settings in my common.runtime.properties.

Is there anything else that I need to mention?

Also, how does the push mechanism work? Is it for every segment, or after a certain time interval?

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-s3-extensions"]

druid.storage.type=s3

#druid.storage.bucket=your-bucket → what is this location

druid.storage.baseKey=druid/segments

druid.s3.accessKey=*JOGAH2

druid.s3.secretKey=*****GYLH+Y6nx+j

Can anyone reply to this, please?

Hey,

You'd need to mention your S3 bucket name in your config in addition to the rest. Druid creates segments for each time interval, which is configurable in your ingestion spec; it's defined by the segmentGranularity parameter of the granularitySpec. These segments are published depending on your tuningConfig parameters: https://druid.apache.org/docs/latest/ingestion/native-batch.html#tuningconfig Are you planning on doing batch or stream ingestion?
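For example, a granularitySpec that produces one segment per hour might look like this (the values are illustrative only):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": "NONE"
}

Segments are pushed to deep storage when they are published; exactly when that happens depends on the ingestion method and the tuningConfig settings linked above.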

Hi, I am planning to do stream ingestion.
I tried doing what you just mentioned, but I was not able to push anything to the bucket. I am pasting my config below; please check.

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-s3-extensions"]

For S3:

druid.storage.type=s3
druid.storage.bucket=druids3migration --> This is my bucket name in S3. I have also created a folder inside it; is it necessary to mention that, or will this work as-is?
druid.storage.baseKey=druid/segments
druid.s3.accessKey=AKI*************PVOQ
druid.s3.secretKey=************************5HjnHY6nx+jXn7Pzisbq+s

Please let me know what else I need to change now.

Are your ingestion tasks succeeding? Do you see any errors in the task logs? Also check your MM and overlord logs.