As you all know, the newest Druid with the Kafka Indexing Service produces a huge number of segments/shards that are supposed to be merged, but they're not. So we had to run a Hadoop Index Task, and we decided to use Amazon EMR for it. While setting this up, we ran into many unexpected problems we had to solve, and we share that whole bumpy road in the blog post. We'd appreciate your feedback and comments; maybe we could have done some things differently.
Hopefully, with Druid 0.10.1 and this patch, you won't need all these workarounds to use the s3a scheme.