Convert Segment Task via Indexing Service not using Hadoop

Hi everybody,

We have a Druid 0.9 cluster with the indexing service set up and the middle manager configured to use Hadoop to process the data. For regular ingestion that works fine, but we ran into a problem with segment conversion tasks. Apparently the middle manager (we only have one, as the work is usually done by the Hadoop cluster anyway) doesn’t run such a conversion task on Hadoop, but executes it directly. That’s pretty slow, as it runs on a single node instead of the whole Hadoop cluster.

Is there a way to process these conversion tasks with Hadoop as well, when submitting them through the indexing service?

In addition, it looks like the conversion task stores the results locally until all segments are processed, and only then writes them back to deep storage, which doesn’t fit within the storage constraints of the middle manager.

For completeness, here is the conversion task spec we submitted to the indexing service:

{
  "type": "convert_segment",
  "dataSource": "our-datasource",
  "interval": "2016-05-01T00:00:00.000/2016-05-08T00:00:00.000",
  "indexSpec": {"bitmap": {"type": "roaring"}},
  "force": true,
  "validate": true
}

Best regards,

Daniel

Hi Daniel,
The current ConvertSegmentTask is not designed to run on a Hadoop cluster.

If you want to parallelize and speed up the conversion, you can submit multiple convert_segment tasks, each covering a smaller interval.
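
As a rough illustration of that approach, the week-long interval from the spec above could be split into one task spec per day. This is only a sketch: the helper names (split_interval, build_specs) and the one-day step are assumptions made for the example, and splitting only helps if the segments are no coarser than the chosen step.

```python
import json
from datetime import datetime, timedelta


def split_interval(start, end, step=timedelta(days=1)):
    """Yield (chunk_start, chunk_end) pairs that together cover [start, end)."""
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


def fmt(ts):
    """Format a timestamp the same way as in the original task spec."""
    return ts.strftime("%Y-%m-%dT%H:%M:%S.000")


def build_specs(data_source, start, end, step=timedelta(days=1)):
    """Build one convert_segment task spec per sub-interval (hypothetical helper)."""
    return [
        {
            "type": "convert_segment",
            "dataSource": data_source,
            "interval": f"{fmt(s)}/{fmt(e)}",
            "indexSpec": {"bitmap": {"type": "roaring"}},
            "force": True,
            "validate": True,
        }
        for s, e in split_interval(start, end, step)
    ]


specs = build_specs("our-datasource", datetime(2016, 5, 1), datetime(2016, 5, 8))
for spec in specs:
    # Each line is a standalone task spec that can be submitted separately.
    print(json.dumps(spec))
```

Each printed spec would then be submitted to the indexing service on its own, the same way as the original task. How many of them actually run at the same time depends on the free worker capacity of your middle manager(s), so with a single middle manager you may need to raise its capacity or add nodes to see a speedup.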