Copy Existing Table

What’s the best way to copy an existing Druid table/datasource (structure and data)? Basically looking for the rough equivalent of “create table as select * from mytable where mycolumn = ‘some value’;”

I’m sure it is more complicated than that, so I’m looking for guidance on the best approach.

Chris

Hi Chris,

I think the ingestSegment firehose can be used to read data from existing Druid segments. A sample ingestSegment firehose spec is shown below:

{
  "type": "ingestSegment",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02"
}

You can add a filter spec to restrict which rows are read.
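To make the "where mycolumn = 'some value'" part of the question concrete, here is a minimal Python sketch that builds the firehose JSON with a selector filter attached. The helper name `ingest_segment_firehose` is hypothetical, the dataSource and interval come from the sample above, and the column and value are the placeholders from the original question. It assumes the firehose's optional `filter` field; please check the docs for your Druid version.

```python
import json


def ingest_segment_firehose(data_source, interval, dimension=None, value=None):
    """Build an ingestSegment firehose dict, optionally with a selector filter.

    Hypothetical helper for illustration only; it just assembles JSON.
    """
    firehose = {
        "type": "ingestSegment",
        "dataSource": data_source,
        "interval": interval,
    }
    if dimension is not None:
        # Selector filter: keep only rows where `dimension` equals `value`.
        firehose["filter"] = {
            "type": "selector",
            "dimension": dimension,
            "value": value,
        }
    return firehose


firehose = ingest_segment_firehose(
    "wikipedia", "2013-01-01/2013-01-02", "mycolumn", "some value"
)
print(json.dumps(firehose, indent=2))
```

The printed JSON would go under the `firehose` key of the ioConfig in the ingestion spec.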

Please refer to the ingestSegment firehose documentation for more details:

https://druid.apache.org/docs/latest/ingestion/native-batch.html#firehoses

Thanks,

VaibhaV

Thanks, Vaibhav. That is very helpful!

Quick follow-up: will it use all the workers I specify in my ingestion spec to parallelize the job?

@Chris Lavigne: Glad to hear that.
@Karthik: In my opinion, no. You need to use parallel indexing: support for the ingestSegment firehose in index_parallel tasks was introduced in Druid 0.15.
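As a rough illustration of what a parallel re-index task might look like, here is a Python sketch that wraps an ingestSegment firehose in an `index_parallel` task. Several things here are assumptions: the helper name `parallel_reindex_task` is hypothetical, the target name `wikipedia_copy` is made up, the dataSchema is deliberately left incomplete, and the `maxNumSubTasks` tuning key is how the 0.15-era docs named the cap on concurrent sub-tasks (later releases use `maxNumConcurrentSubTasks`), so verify it against your version.

```python
import json


def parallel_reindex_task(data_schema, firehose, max_sub_tasks):
    """Assemble an index_parallel task dict (hypothetical helper, sketch only)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": data_schema,
            "ioConfig": {
                "type": "index_parallel",
                "firehose": firehose,
            },
            "tuningConfig": {
                "type": "index_parallel",
                # Assumed key name for Druid 0.15; caps concurrent sub-tasks.
                "maxNumSubTasks": max_sub_tasks,
            },
        },
    }


# Incomplete placeholder: a real dataSchema also needs the parser,
# metricsSpec and granularitySpec that match your source datasource.
data_schema = {"dataSource": "wikipedia_copy"}

firehose = {
    "type": "ingestSegment",
    "dataSource": "wikipedia",
    "interval": "2013-01-01/2013-01-02",
}

task = parallel_reindex_task(data_schema, firehose, max_sub_tasks=4)
print(json.dumps(task, indent=2))
```

With a plain `index` task the same firehose is read by a single task, which is why the task type matters for the follow-up question.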

Thanks,

Vaibhav

I’m experiencing an issue with my filter. If I use just a few values (like below), it works fine. When I add the number I need (about 50), it does not work. Is there a known limitation to the number of selector filters I can use? And/or maybe the selector filter is not the best way to approach the problem?

"transformSpec": {
  "filter": {
    "type": "or",
    "fields": [
      {
        "type": "selector",
        "dimension": "id",
        "value": "myvalue1"
      },
      {
        "type": "selector",
        "dimension": "id",
        "value": "myvalue2"
      }
    ]
  }
}

You may want to try the "in" filter instead, something like this:

{
  "type": "in",
  "dimension": "id",
  "values": ["myvalue1", "myvalue2", "myvalue3"]
}
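With around 50 values, writing the list by hand gets error-prone, so it may be easier to generate the filter. A minimal Python sketch follows; the helper name `in_filter` is hypothetical, and the `myvalue1` to `myvalue50` ids are stand-ins for the real values.

```python
import json


def in_filter(dimension, values):
    """Build an "in" filter dict for one dimension (hypothetical helper)."""
    # dict.fromkeys drops duplicates while keeping the original order.
    unique_values = list(dict.fromkeys(values))
    return {"type": "in", "dimension": dimension, "values": unique_values}


# Stand-in ids; replace with the real list of values to keep.
ids = [f"myvalue{i}" for i in range(1, 51)]

transform_spec = {"filter": in_filter("id", ids)}
print(json.dumps(transform_spec, indent=2))
```

The printed object can be pasted in as the `transformSpec` of the ingestion spec, replacing the long "or" of selector filters.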

Thanks,

Vaibhav

Thanks for your message. I did, and it worked perfectly!

:+1: Cool :slight_smile: