insert-segment-to-db: load metadata for segments located in Google Cloud Storage

Hi everyone,

I have a Druid 0.12 cluster up and running with GCS deep storage.

I’m now trying to use insert-segment-to-db to load some old segments located in a Google Cloud Storage bucket:

java \
  -Ddruid.metadata.storage.type=postgresql \
  -Ddruid.metadata.storage.connector.connectURI=jdbc:postgresql://localhost:5432/db \
  -Ddruid.metadata.storage.connector.user=user \
  -Ddruid.metadata.storage.connector.password=######## \
  '-Ddruid.extensions.loadList=["druid-google-extensions","postgresql-metadata-storage"]' \
  -Ddruid.storage.type=hdfs \
  -Ddruid.google.bucket=bucket \
  -Ddruid.google.prefix=segments \
  -cp /lib/:/opt/druid/lib/:conf/druid/_common:/opt/druid/extensions/druid-google-extensions/:/opt/druid/extensions/druid-hdfs-storage/ \
  io.druid.cli.Main tools insert-segment-to-db \
  --workingDir gs://path-to-segments


The command above successfully updates our metadata, but queries over the time period covered by these segments still return nothing.

Querying the metadata storage, I found that:

  • the bucket info seems to be ignored
  • the loadSpec type is hdfs rather than google

Newly loaded segment:

{
    [...]
    "loadSpec": {
        "type": "hdfs",
        "path": "gs://path-to-segments"
    },
    [...]
}


Regular segment:

{
    [...]
    "loadSpec": {
        "type": "google",
        "bucket": "bucket",
        "path": "gs://path-to-segments"
    },
    [...]
}
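
To make the difference between the two loadSpec shapes concrete, here is a minimal Python sketch that turns an hdfs-style loadSpec holding a gs:// URI into a google-style one. The function name `to_google_load_spec` is hypothetical, and the sketch assumes the google loadSpec stores the bucket name and the object path inside that bucket as separate fields; check one of your regular segments before relying on it. Hand-editing payloads this way is only a possible workaround, not something confirmed to work in this thread.

```python
from urllib.parse import urlparse


def to_google_load_spec(load_spec):
    """Build a google-style loadSpec from an hdfs-style one with a gs:// path.

    Hypothetical helper: the output field layout is an assumption modeled on
    the "regular segment" example above, not taken from the Druid source.
    """
    if load_spec.get("type") != "hdfs":
        # Leave anything that is not hdfs-typed untouched (return a copy).
        return dict(load_spec)

    uri = urlparse(load_spec["path"])
    if uri.scheme != "gs" or not uri.netloc:
        raise ValueError("expected a gs://bucket/object path, got %r" % load_spec["path"])

    return {
        "type": "google",
        "bucket": uri.netloc,           # bucket name comes from the URI authority
        "path": uri.path.lstrip("/"),   # object path inside the bucket (assumed layout)
    }


if __name__ == "__main__":
    # The bucket and object names below are made up for illustration.
    broken = {"type": "hdfs", "path": "gs://bucket/segments/ds/index.zip"}
    print(to_google_load_spec(broken))
```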


I suspect that I’m using the hdfs extension rather than the google one, but if I pass google as -Ddruid.storage.type in the Java arguments, I get the following error:

Exception in thread "main" com.google.inject.ProvisionException: Unable to provision, see the following errors:

  1. Unknown provider[google] of Key[type=io.druid.segment.loading.DataSegmentFinder, annotation=[none]], known options[[local]]
    at io.druid.guice.PolyBind.createChoice(PolyBind.java:70) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.LocalDataStorageDruidModule)
    while locating io.druid.segment.loading.DataSegmentFinder

1 error
    at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1028)
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
    at io.druid.cli.InsertSegment.run(InsertSegment.java:102)
    at io.druid.cli.Main.main(Main.java:116)


Has anyone gone through this process already?

Thank you in advance,

It seems that druid-google-extensions doesn’t provide an implementation of the DataSegmentFinder interface (only used by insert-segment-to-db), so that tool is not currently supported with GCS.
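
For context, a finder's job is roughly to walk the working directory and pick out the directories that hold a complete segment. The Python sketch below illustrates that idea over a plain list of object names. It is an assumption based on how the bundled local finder appears to behave (a descriptor.json sitting next to an index.zip), not the actual Druid code, and `find_segment_dirs` is a hypothetical name.

```python
import posixpath


def find_segment_dirs(object_names):
    """Return the directories that contain both descriptor.json and index.zip.

    Hypothetical illustration of what a DataSegmentFinder has to discover;
    a real implementation would list objects from the bucket instead.
    """
    names = set(object_names)
    found = []
    for name in sorted(names):
        if posixpath.basename(name) != "descriptor.json":
            continue
        directory = posixpath.dirname(name)
        # Only report the directory if the segment data is present alongside
        # its descriptor.
        if posixpath.join(directory, "index.zip") in names:
            found.append(directory)
    return found


if __name__ == "__main__":
    # Made-up object names for illustration only.
    listing = [
        "segments/ds/2018-01-01/0/descriptor.json",
        "segments/ds/2018-01-01/0/index.zip",
        "segments/ds/2018-01-02/0/descriptor.json",  # no index.zip next to it
    ]
    print(find_segment_dirs(listing))
```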

I opened an issue here: https://github.com/druid-io/druid/issues/5628

Thanks,

Jon

Thank you, Jonathan.

I just opened a pull request to add this to the Google Cloud Storage adapter: https://github.com/druid-io/druid/pull/5686