Hey Guys -
We’re trying to find a way to do batch indexing using Spark instead of Hadoop.
We tried Metamarkets’ druid-spark-batch extension, but encountered many problems and decided to leave it for now.
Instead, we were thinking of writing our own simple Spark job to scan past data and then use Tranquility Core to index it.
We have two questions:
- Can we assume Tranquility Core will happily index data with old timestamps, or are we going to face window period issues?
- We want to use Tranquility Core’s MapPartitioner to partition our segments based on a single dimension (similar to the Hadoop Indexer’s single-dimension partitioning feature). Can anyone point us to an example of how to create a Tranquility Beam that uses a non-default partitioner?
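For context, here is roughly what we have in mind on the Spark side (an untested sketch — `buildCustomBeam()` is a placeholder for exactly the Beam-construction code we don’t know how to write, and the `Row` type stands in for our event class):

```scala
import com.metamx.tranquility.beam.Beam
import com.metamx.tranquility.tranquilizer.Tranquilizer

// Sketch: one Tranquilizer per Spark executor partition.
// buildCustomBeam() is hypothetical -- it should return a Beam[Row]
// configured with single-dimension partitioning, which is the part
// we're asking about.
def buildCustomBeam(): Beam[Row] = ???

historicalData.foreachPartition { rows =>
  val sender = Tranquilizer.create(buildCustomBeam())
  sender.start()
  try {
    rows.foreach(sender.send)  // send() is async; we ignore the Futures here
    sender.flush()             // block until buffered events are handed off
  } finally {
    sender.stop()
  }
}
```

If the windowPeriod question above means Tranquility simply can’t accept old timestamps, this whole approach is moot, so that’s the first thing we’d like to confirm.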
I noticed that there aren’t many examples out there for using Tranquility Core, so we would gladly open-source our code once we have it working.
Thank you guys!