How to move a segment from one data source to another?

Hi all,
what is the easiest way to move a segment from one data source to another? E.g. I re-index yesterday's data into a temporary data source, check that all the data is correct, and then I want to move these segments to the production data source. I suppose I could use Hadoop batch re-indexing and re-index the data with a different output data source, but that sounds too complicated. Is there an easier way?

Lukáš Havrlant

Hi again,
so I’ve tried the procedure with the Hadoop Index Task and it worked, but still – isn’t there an easier way to move segments from data source A to data source B?

Lukáš

Hi,

Can you try http://druid.io/docs/0.9.0/ingestion/update-existing-data.html and use delta ingestion with the datasource name changed?
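
Roughly, the idea is a Hadoop index task whose ioConfig reads from the old data source while the dataSchema names the new one. A minimal sketch of the spec, assuming the dataSource inputSpec from the linked page (the data source names and interval are placeholders, and the rest of the dataSchema is elided):

  {
    "type": "index_hadoop",
    "spec": {
      "dataSchema": {
        "dataSource": "production_datasource",
        ...
      },
      "ioConfig": {
        "type": "hadoop",
        "inputSpec": {
          "type": "dataSource",
          "ingestionSpec": {
            "dataSource": "temporary_datasource",
            "intervals": ["2016-06-01/2016-06-02"]
          }
        }
      }
    }
  }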

Thank you Fangjin, I’ll take a look at it. So there is no way to move segments to another data source without running some Druid MapReduce job?

Actually with 0.9.0, you can look into using or extending http://druid.io/docs/0.9.0/operations/insert-segment-to-db.html
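
For reference, the invocation looks roughly like this, based on the linked page (the metadata-store settings, classpath, and working directory are placeholders for your own setup; the page documents the exact options):

  java \
    -Ddruid.metadata.storage.type=mysql \
    -Ddruid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid \
    -Ddruid.metadata.storage.connector.user=druid \
    -Ddruid.metadata.storage.connector.password=diurd \
    -Ddruid.extensions.loadList='["mysql-metadata-storage","druid-hdfs-storage"]' \
    -Ddruid.storage.type=hdfs \
    -cp "$DRUID_CLASSPATH" \
    io.druid.cli.Main tools insert-segment-to-db \
    --workingDir hdfs://host:port/druid/storage/wikipedia \
    --updateDescriptor true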

We are currently trying to use the insert-segment-to-db CLI tool to import segments, but we are running into an issue.
Our deep storage is S3. We manually copied the segments into the target location where segments of the same kind are located, and we pointed the workingPath property to where the segments are located in S3.
Then we started the CLI tool to insert these segments, but we got the following error back:

  1. Unknown provider[s3] of Key[type=io.druid.segment.loading.DataSegmentFinder, annotation=[none]], known options[[hdfs, local]]
    at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:86)
    while locating io.druid.segment.loading.DataSegmentFinder

1 error
at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1014)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1040)
at io.druid.cli.InsertSegment.run(InsertSegment.java:94)
at io.druid.cli.Main.main(Main.java:105)

Searching GitHub for implementation classes of DataSegmentFinder only turns up LocalDataSegmentFinder.java and HdfsDataSegmentFinder.java, which register the local and hdfs schemes; there doesn’t seem to be a class in the Druid repo that registers the s3 scheme.

Is it currently not possible to insert segments with this tool if the deep storage is S3, or am I doing something wrong?

Hi,
Right now DataSegmentFinder is not implemented for S3.

To make the insert-segment-to-db tool work with S3, an S3DataSegmentFinder needs to be implemented.

It would be great if you could submit a PR or a GitHub issue for this.
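
To give a sense of the shape such an implementation might take, here is a rough sketch only, not working code: it assumes the 0.9.x DataSegmentFinder interface and the jets3t client that the existing S3 extension uses, and it omits the updateDescriptor handling. See the comments for further assumptions.

  // Rough sketch. The extension's Guice module would also need to bind this
  // class under the "s3" key via PolyBind so the CLI tool can find it.
  import com.fasterxml.jackson.databind.ObjectMapper;
  import io.druid.segment.loading.DataSegmentFinder;
  import io.druid.segment.loading.SegmentLoadingException;
  import io.druid.timeline.DataSegment;
  import org.jets3t.service.impl.rest.httpclient.RestS3Service;
  import org.jets3t.service.model.S3Object;

  import java.io.InputStream;
  import java.util.HashSet;
  import java.util.Set;

  public class S3DataSegmentFinder implements DataSegmentFinder
  {
    private final RestS3Service s3Client;
    private final ObjectMapper jsonMapper;
    private final String bucket;

    public S3DataSegmentFinder(RestS3Service s3Client, ObjectMapper jsonMapper, String bucket)
    {
      this.s3Client = s3Client;
      this.jsonMapper = jsonMapper;
      this.bucket = bucket;
    }

    @Override
    public Set<DataSegment> findSegments(String workingDirPath, boolean updateDescriptor)
        throws SegmentLoadingException
    {
      final Set<DataSegment> segments = new HashSet<>();
      try {
        // Each segment directory in deep storage holds a descriptor.json next
        // to its index.zip; deserializing those descriptors yields the
        // DataSegment metadata the tool writes back into the metadata store.
        for (S3Object obj : s3Client.listObjects(bucket, workingDirPath, null)) {
          if (obj.getKey().endsWith("descriptor.json")) {
            try (InputStream in = s3Client.getObject(bucket, obj.getKey()).getDataInputStream()) {
              segments.add(jsonMapper.readValue(in, DataSegment.class));
            }
          }
        }
      }
      catch (Exception e) {
        throw new SegmentLoadingException(e, "Failed to scan S3 under [%s]", workingDirPath);
      }
      return segments;
    }
  }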

Hi fellas,

I implemented it: https://github.com/druid-io/druid/pull/3446 … in case anybody needs it.

Hi,
I am getting the following error:

Exception in thread "main" com.google.inject.ProvisionException: Unable to provision, see the following errors:

  1. Unknown provider[mysql] of Key[type=io.druid.metadata.SQLMetadataConnector, annotation=[none]], known options[[derby]]
    at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:86) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.metadata.storage.derby.DerbyMetadataStorageDruidModule)
    while locating io.druid.metadata.SQLMetadataConnector
      for the 3rd parameter of io.druid.metadata.IndexerSQLMetadataStorageCoordinator.<init>(IndexerSQLMetadataStorageCoordinator.java:92)
    while locating io.druid.metadata.IndexerSQLMetadataStorageCoordinator
    at io.druid.guice.PolyBind.createChoiceWithDefault(PolyBind.java:86) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.metadata.storage.derby.DerbyMetadataStorageDruidModule)
    while locating io.druid.indexing.overlord.IndexerMetadataStorageCoordinator

1 error
  at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1028)
  at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1054)
  at io.druid.cli.InsertSegment.run(InsertSegment.java:92)
  at io.druid.cli.Main.main(Main.java:108)

Any idea about this?

Regards
Sidharth

Hey Sidharth,

To use the MySQL metadata store, you need to include the mysql-metadata-storage extension. Please check out http://druid.io/docs/latest/development/extensions-core/mysql.html.
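
Concretely, that means loading the extension and pointing Druid at your MySQL instance, e.g. in common.runtime.properties (the connection details below are placeholders):

  druid.extensions.loadList=["mysql-metadata-storage"]
  druid.metadata.storage.type=mysql
  druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
  druid.metadata.storage.connector.user=druid
  druid.metadata.storage.connector.password=diurd

When running the insert-segment-to-db tool, the same settings can be passed as -D system properties on the command line.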

Thanks,

Jihoon

On Tue, Jul 11, 2017 at 3:19 PM, Sidharth Singla sidpkl.singla@gmail.com wrote:

Hi,
I have already included that.

It works now. I had to change the owner of the mysql-metadata-storage directory.
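
For anyone hitting the same thing, the fix was a one-line permission change on the extension directory, roughly as below; the user, group, and install path are specific to our setup:

  # make the extension directory readable by the user Druid runs as
  chown -R druid:druid /opt/druid/extensions/mysql-metadata-storage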

Regards.

I’m using a Scala script for that, in case anyone needs it: https://gist.github.com/l15k4/cfe3109fb9b65c3cafe0433efc8e9de2