Is merging segments created through the Kafka indexing service not possible? Does one have to reindex from raw data?

Is it true that only non-sharded segments can be merged? Segments created by the Kafka indexing service are sharded on partition and time.

With the druid.coordinator.merge.on feature, yes. But the “index” task (with ingestSegment firehose) and “index_hadoop” task (with dataSource input) can merge any kind of segment.

At this time (0.10.0) no.

But the “index” task (with ingestSegment firehose) and “index_hadoop” task (with dataSource input) can merge any kind of segment.

Let me be clear. You are suggesting two possible ways of doing this:

  • “index” task (with ingestSegment firehose)

  • “index_hadoop” task (with dataSource input)

Is that correct? It seems that they have to be used in combination, but I am not sure how.

I use

{
  "type"       : "ingestSegment",
  "dataSource" : "kraken_test",
  "interval"   : "2017-05-02/2017-05-03"
}

based on http://druid.io/docs/latest/ingestion/firehose.html.

This, however, does not allow me to set the desired merged segment size, etc.

Could you give me, or point me to, an example?

Thanks.

These are two separate approaches; you can use either of them. Reindexing via the Hadoop index task is recommended if you are trying to reindex larger datasets.
Hopefully the following will give some more info: http://druid.io/docs/latest/ingestion/update-existing-data.html

My spec looks like

{
  "type": "index",
  "dataSource": "kraken_test",
  "firehose": {
    "type": "ingestSegment",
    "dataSource": "kraken_test",
    "interval": "2017-05-02/2017-05-03"
  }
}

And this returns

{"error":"Instantiation of [simple type, class io.druid.indexing.common.task.IndexTask] value failed: null"}

curl -X 'POST' -H 'Content-Type:application/json' -d @mergeSegments.json [host]:[port]/druid/indexer/v1/task

I am missing something here.
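
For what it is worth, the "index" task in 0.10.0 seems to expect everything nested under a top-level "spec" (a dataSchema, an ioConfig holding the firehose, and a tuningConfig) rather than a bare firehose next to "type". Below is a rough Python sketch that builds such a payload. The dimension name (`dim1`), the `count` metric, the `timestamp` column, the granularities and the `targetPartitionSize` value are all placeholders I made up; they would have to match the real kraken_test schema.

```python
import json


def build_index_task(data_source, interval, dimensions, metrics_spec,
                     segment_granularity="DAY", query_granularity="NONE",
                     target_partition_size=5000000):
    """Build an 'index' task that re-reads existing segments via ingestSegment."""
    return {
        "type": "index",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                # Placeholder parser; column names are assumptions.
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "json",
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": dimensions},
                    },
                },
                "metricsSpec": metrics_spec,
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": query_granularity,
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index",
                "firehose": {
                    "type": "ingestSegment",
                    "dataSource": data_source,
                    "interval": interval,
                },
            },
            # targetPartitionSize is where the desired segment size would go.
            "tuningConfig": {
                "type": "index",
                "targetPartitionSize": target_partition_size,
            },
        },
    }


task = build_index_task(
    data_source="kraken_test",
    interval="2017-05-02/2017-05-03",
    dimensions=["dim1"],  # hypothetical dimension
    # Hypothetical metric: an existing rolled-up "count" is summed on reindex.
    metrics_spec=[{"type": "longSum", "name": "count", "fieldName": "count"}],
)
print(json.dumps(task, indent=2))
```

The printed JSON could be saved as mergeSegments.json and posted with the same curl command as above.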

I think we posted at about the same time. Will try the Hadoop approach and let you know.

Hmm

{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "dataSource",
        "ingestionSpec" : {
          "dataSource": "kraken_test",
          "intervals": ["2017-05-04/2017-05-05"]
        }
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "numShards":"1"
      }
    }
  }
}

returns

{"error":"Instantiation of [simple type, class io.druid.indexing.common.task.HadoopIndexTask] value failed: null"}

which is not very informative. I am sure I am missing some params in the spec.

I only want to merge the shards of an already existing segment into a single shard (or a single segment without shards), while retaining the granularity etc. as is.

Any ideas?

OK. I think it needs the whole dataSchema etc. I was naive enough to think that those were accessible through the segments to be merged.
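
In case it helps someone else, here is a rough sketch of the index_hadoop spec with the dataSchema added, built in Python so the pieces are easier to see. It keeps the ioConfig from my spec above and adds a hashed partitionsSpec with a single shard. The parser columns, the `dim1` dimension, the `count` metric and the DAY/NONE granularities are placeholders, not my real schema; the granularities would need to match the existing segments to keep them as is.

```python
import json


def build_hadoop_merge_task(data_schema, data_source, interval, num_shards=1):
    """Build an 'index_hadoop' task that re-reads a datasource into num_shards shards."""
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": data_schema,
            "ioConfig": {
                "type": "hadoop",
                "inputSpec": {
                    "type": "dataSource",
                    "ingestionSpec": {
                        "dataSource": data_source,
                        "intervals": [interval],
                    },
                },
            },
            "tuningConfig": {
                "type": "hadoop",
                # One shard per time chunk, i.e. the merged result.
                "partitionsSpec": {"type": "hashed", "numShards": num_shards},
            },
        },
    }


interval = "2017-05-04/2017-05-05"

# Placeholder schema: every column name below is an assumption.
data_schema = {
    "dataSource": "kraken_test",
    "parser": {
        "type": "hadoopyString",
        "parseSpec": {
            "format": "json",
            "timestampSpec": {"column": "timestamp", "format": "auto"},
            "dimensionsSpec": {"dimensions": ["dim1"]},
        },
    },
    "metricsSpec": [{"type": "longSum", "name": "count", "fieldName": "count"}],
    "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",   # should match the existing segments
        "queryGranularity": "NONE",    # should match the existing segments
        "intervals": [interval],
    },
}

hadoop_task = build_hadoop_merge_task(data_schema, "kraken_test", interval)
print(json.dumps(hadoop_task, indent=2))
```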

I do hope that we can get a better error response for this.

Thanks.