Error using Azure Data Lake as deep storage

Hello,
I am new to Druid. To use Azure Data Lake as deep storage, I followed this link, Microsoft Azure · Apache Druid, and made changes to my common.runtime.properties file.

In the unified console, when I try to connect to data by clicking Azure Data Lake and providing the URI, I get the following error:

```
Error: Cannot construct instance of org.apache.druid.data.input.azure.AzureInputSource, problem: Invalid URI scheme [https://storage.blob.core.windows.net/databsesnew/output_csv/events.csv] must be [azure] at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 190] (through reference chain: org.apache.druid.indexing.overlord.sampler.IndexTaskSamplerSpec["spec"]->org.apache.druid.indexing.common.task.IndexTask$IndexIngestionSpec["ioConfi
```

Here is a copy of my common.runtime.properties file:

```
druid.extensions.loadList=["druid-azure-extensions", "druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches"]

druid.host=localhost

# Logging

# Log all runtime properties on startup. Disable to avoid logging properties on startup:
druid.startup.logging.logProperties=true

# Zookeeper

druid.zk.service.host=localhost
druid.zk.paths.base=/druid

# Metadata storage

# For Derby server on your Druid Coordinator (only viable in a cluster with a single Coordinator, no fail-over):
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527

# Deep storage

druid.storage.type=azure
druid.azure.account=s_____orage
druid.azure.key=*************
druid.azure.container=*************
#druid.azure.prefix=""
druid.azure.protocol=https
druid.azure.maxTries=3
druid.azure.maxListingLength=1024

druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs

# Service discovery

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

# Monitoring

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns
# Omitting this will lead to indexing double as float at the storage layer:
druid.indexing.doubleStorage=double

# Security

druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]

# SQL

druid.sql.enable=true

# Planning SQL query when there is aggregate distinct in the statement
druid.sql.planner.useGroupingSetForExactDistinct=true

# Lookups

druid.lookup.enableLookupSyncOnStartup=false

# Expression processing config

druid.expressions.useStrictBooleans=true

# HTTP client

druid.global.http.eagerInitialization=false
```

Welcome @Shuhool-fotedar!

Still looking for more information about that error, but I came across a short video about best practices for deploying on Azure.

Thanks Mark, I solved the current problem. The issue was with the path I was giving in the unified console; the corrected form is sketched below.
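
For anyone who finds this later: based on the error message above, the fix was to use the azure:// scheme with the container name first, rather than the https:// blob URL, roughly like this:

```
# Rejected by the console (https scheme):
https://storage.blob.core.windows.net/databsesnew/output_csv/events.csv

# Expected by the Azure input source (azure scheme, container name first):
azure://databsesnew/output_csv/events.csv
```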


I had the same question as the OP and found the documentation a bit unclear, because I was unsure how to authenticate with Azure Blob Storage and where the azure:// URI scheme came from.

I managed to get it to work with these settings:

  1. I already had the Microsoft Azure extension installed and configured correctly. It appears that the access key in the configuration is used not only for storing data in deep storage but also for retrieving it.
  2. With this in mind, the URIs really do need to look like the ones in the documentation, even though they read like sample values. This means azure://<CONTAINER-NAME>/<PREFIX>/. The container name is not the name of the blob storage account, but the name of the "root" folder you want to open (see the sketch after this list).
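
To illustrate, here is a minimal sketch of the ioConfig section of an ingestion spec using the azure input source; the container, prefix, and file name are placeholders, not values from a real setup:

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "azure",
    "uris": ["azure://<CONTAINER-NAME>/<PREFIX>/events.csv"]
  },
  "inputFormat": {
    "type": "csv",
    "findColumnsFromHeader": true
  }
}
```

The same azure://<CONTAINER-NAME>/<PREFIX>/ form applies when pasting a URI into the unified console's connect dialog, which is where the OP's error came from.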

Hope that helps the next person looking for this answer <3


Very informative and useful article for beginners!
