Since I’ve hit a wall running the Druid Hadoop indexing service (our cloud provider limits the size of the intermediate objects that get created in HDFS during the indexing run), I’ve been looking for other ways of doing this. druid-spark-batch seems like a promising approach, if not for one minor thing: I can’t figure out how to run it.
The project’s README describes the various pieces needed but doesn’t include any example invocations. I’ve built the jar (for Spark 1.6.0 and Druid 0.9.0) and put it under the extensions dir, where `io.druid.cli.Main` can find it, but I don’t know how to invoke it. Does it supplement the `index hadoop` job? Or should I provide a different name on the command line?
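For reference, this is roughly how I invoke the Hadoop indexer today via the CLI (the classpath entries and spec file name are placeholders from my setup); my question is whether the spark-batch extension hooks into this same `index hadoop` command or wants a different subcommand:

```shell
# Current Hadoop indexer invocation (Druid 0.9.0);
# classpath and spec file name are specific to my setup
java -cp "config/_common:lib/*" \
  io.druid.cli.Main index hadoop hadoop_index_spec.json
```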