Hi All,
Looking for some guidance on batch ingestion and best practices; any suggestions or tips are welcome.
I have tens of terabytes of historical data in S3 that I want to bring into Druid. I also have some aggregated/pre-computed data (terabyte scale) in a key-value store.
So far I have been trying the indexing service to ingest the data. What is the most efficient way to bring all of it into Druid as an initial load?
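
For context, here is a minimal sketch of the direction I have been experimenting with: a native parallel batch (index_parallel) task reading straight from S3, submitted to the Overlord. The host, datasource name, bucket, and schema below are placeholders, not my real setup, and the S3 input source and Parquet input format need druid-s3-extensions and druid-parquet-extensions loaded on the cluster.

import requests  # assumes the requests library is available

# Default Overlord port; host is a placeholder.
OVERLORD_URL = "http://overlord-host:8090/druid/indexer/v1/task"

# Minimal index_parallel spec; datasource, bucket, columns, and
# granularities below are illustrative only.
spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "historical_events",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["country", "device"]},
            "granularitySpec": {
                "segmentGranularity": "DAY",
                "queryGranularity": "HOUR",
                "rollup": True,
            },
        },
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {
                "type": "s3",
                "prefixes": ["s3://my-bucket/historical/"],
            },
            "inputFormat": {"type": "parquet"},
        },
        "tuningConfig": {
            "type": "index_parallel",
            # Parallelism knob for the initial load; tune to cluster size.
            "maxNumConcurrentSubTasks": 8,
        },
    },
}

resp = requests.post(OVERLORD_URL, json=spec)
resp.raise_for_status()
print("Submitted task:", resp.json()["task"])
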
We do have Spark pipelines in a few places, so my plan is to use Spark rather than Hadoop. Any references, or is anyone using a similar approach?
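
My current thinking is to use Spark only for the preparation step (flattening, cleaning, bucketing by time) and still let Druid's native parallel task build the segments, rather than wiring Spark to write segments directly. A rough PySpark sketch, with placeholder bucket names and columns:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("druid-prep")
         .getOrCreate())

# Read the raw historical data; path is a placeholder.
raw = spark.read.parquet("s3a://my-bucket/historical/raw/")

# Derive the event timestamp and a date column to bucket by.
prepped = (raw
           .withColumn("ts", F.to_timestamp("event_time"))
           .withColumn("dt", F.to_date("ts")))

# Write one directory per day so a later ingestion run can target
# specific intervals.
(prepped
 .repartition("dt")
 .write
 .partitionBy("dt")
 .mode("overwrite")
 .parquet("s3a://my-bucket/historical/prepped/"))

spark.stop()

Partitioning the output by day lines up with a DAY segmentGranularity, so the initial load could be broken into per-interval ingestion runs. Does that sound reasonable, or is there a better pattern?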