Failing to configure data ingestion from MinIO object storage

Hi Druiders,

I’m looking for a configuration that allows me to ingest data from a MinIO object storage bucket.

I’m running the local single-server nano quickstart for testing.

I found some material on setting up deep storage on MinIO, but that is not what I need. I want to set up ingestion.

I tinkered with the jets3t config and the ioConfig part of the ingestion spec, but the latter does not allow setting a specific endpoint.
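For reference, the ioConfig I was experimenting with looked roughly like this (bucket and file names are placeholders), and I see no field where an endpoint could go:

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "uris": ["s3://my-bucket/path/to/data.json"]
      },
      "inputFormat": {
        "type": "json"
      }
    }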

I mainly hit two errors: first “host not found” when it tried to use the bucket DNS, which I overrode in the config, and then “Unable to find a region via the region provider”, even though MinIO does not require a region.

I’m lost…

Have you tried letting the UI create the ingestion spec for you? You would just use the S3 option but provide MinIO-specific details like host name, bucket, keys, etc.

To my knowledge, the UI does not allow you to specify a custom S3 endpoint.

The UI is just a convenient way to build the ingestion spec automatically. You can use it as a starting point, build the spec like you would for an S3 source, add or modify the MinIO-specific details, and submit the result to Druid as your ingestion spec.

I know that, in the end, the UI produces a JSON document describing the ingestion. But I haven’t found how to set a custom S3 endpoint in that spec.

This should help: Native batch ingestion · Apache Druid

I know this page. It offers AWS S3, Azure Blob and Google Cloud Storage via the “type” property, but no custom S3 endpoint.

I am not sure what you mean by a custom S3 endpoint. Are you running MinIO locally?

I’m running MinIO on its own FQDN (https://mydomain.tld/), but even locally I would have to set the local endpoint (http://localhost:9000) somewhere, and there seems to be no way to do it.

You can set the endpoint using: druid.s3.endpoint.url

This can be found at S3-compatible · Apache Druid
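Roughly, in common.runtime.properties it would look something like this (access key, secret and port are placeholders; MinIO usually also needs path-style access enabled):

    # make sure the S3 extension is in your load list
    druid.extensions.loadList=["druid-s3-extensions"]

    # credentials for the MinIO instance (placeholders)
    druid.s3.accessKey=<minio access key>
    druid.s3.secretKey=<minio secret key>

    # point the S3 client at MinIO instead of AWS
    druid.s3.endpoint.url=http://localhost:9000
    druid.s3.endpoint.signingRegion=us-east-1
    druid.s3.protocol=http
    druid.s3.enablePathStyleAccess=true

Setting a signing region (any value should do for MinIO) and enabling path-style access should also take care of the “region provider” and bucket-DNS errors you mentioned.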

If you want, I can try setting up a local MinIO instance and testing it out, but this will take some time.

This extension allows you to do 2 things:

  • Ingest data from files stored in S3.
  • Write segments to deep storage in S3.

It lets you set a custom S3 endpoint for ingestion and for deep storage, but only once and for all, which seems like a strong assumption. What if I want to use on-prem MinIO for deep storage and ingest data from AWS?

One of my clients is using another S3-compatible COTS product for object storage. They have two endpoints per region, one for low cost and one for high performance, and they will want to mix offers and regions.

Don’t bother. Now that I know where to look, I’ll use your tip as a workaround for my experimentation, and thanks for that. But it will not make its way to production, unfortunately.

Deep storage and ingestion are completely separate things, and Druid allows them to be in the same or different S3 locations. There is no requirement for them to both use the same endpoint.
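For example, deep storage gets its own bucket and prefix through the druid.storage.* properties (the bucket names below are placeholders), while each ingestion spec names whatever bucket it reads from in its inputSource:

    # where Druid writes segments (deep storage)
    druid.storage.type=s3
    druid.storage.bucket=druid-deep-storage
    druid.storage.baseKey=segments

    # an ingestion task can still read from a completely different bucket,
    # e.g. "uris": ["s3://raw-events/2021/01/data.json"] in its inputSource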

That’s good news.

But then there should be at least two locations where I can set the S3 endpoint, one for deep storage and one for ingestion. I can’t find them.