I am currently trying to solve a problem where mass data needs to be searched in real time — a classic OLTP -> DW problem. The software uses the C++ Papillon SDK to check images for a match. Each image's vector of floats is stored as a BLOB in a conventional RDBMS that is fully normalised, with relatively low data volumes (50-100M rows) on what would be the fact table. Given that the joins run about 4 deep to reach any data that could predicate a scan, this clearly won't scale, so I am looking for a real-time tool that can flatten the dataset and return the BLOB / image vectors after first reducing the dataset by standard dimension values such as dates or strings.
I was thinking of using CDC (Debezium) from the 3NF tables into Kafka, then a microservice to flatten the events into a new topic, which is in turn ingested into Druid to handle the RT queries filtered on the dimensions of the data source, before handing back a much smaller dataset for the C++ process to run image recognition on.
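To make the flattening step concrete, here is a rough sketch of what that microservice might do per fact event — joining it against pre-cached dimension rows into one denormalised record for Druid ingestion. All field and table names here (camera, site, embedding, etc.) are hypothetical placeholders for whatever the real 3NF schema looks like, and in practice the inputs would come off Debezium topics rather than in-memory dicts:

```python
import base64
import struct

def flatten_event(fact, dims_by_table):
    """Join a fact-table change event with cached dimension rows into one
    flat record suitable for Druid ingestion (hypothetical schema)."""
    camera = dims_by_table["camera"][fact["camera_id"]]
    site = dims_by_table["site"][camera["site_id"]]
    return {
        "captured_at": fact["captured_at"],    # candidate for Druid's __time
        "camera_name": camera["name"],         # string dimension to filter on
        "site_region": site["region"],         # string dimension to filter on
        # Assuming Druid has no native BLOB column: ship the float-vector
        # BLOB as base64 text so it survives as an ordinary string column.
        "embedding_b64": base64.b64encode(fact["embedding"]).decode("ascii"),
    }

# Toy usage: a 4-float little-endian vector packed the way the BLOB might be.
vec = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)
fact = {"captured_at": "2023-01-01T00:00:00Z", "camera_id": 7, "embedding": vec}
dims = {
    "camera": {7: {"name": "gate-3", "site_id": 2}},
    "site": {2: {"region": "EMEA"}},
}
flat = flatten_event(fact, dims)
```

The point of the base64 step is that the vector payload only passes through Druid opaquely; all filtering happens on the ordinary date/string dimensions.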
So it all hinges on whether Druid can support BLOBs or vectors of floats!
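If it can't hold them natively, the fallback I'm assuming is the encode-as-string trick above, with the consumer side decoding the column back into floats before handing it to the C++ matcher. A minimal round-trip sketch (assuming the BLOB is packed little-endian float32, which may not match the SDK's actual layout):

```python
import base64
import struct

def decode_embedding(b64_text):
    """Unpack a base64-encoded little-endian float32 vector (assumed BLOB
    layout) back into a list of Python floats for the matching step."""
    raw = base64.b64decode(b64_text)
    n = len(raw) // 4          # 4 bytes per float32
    return list(struct.unpack(f"<{n}f", raw))

# Round-trip check on a toy 3-float vector.
blob = struct.pack("<3f", 1.0, 2.0, 3.0)
vec = decode_embedding(base64.b64encode(blob).decode("ascii"))
```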
Can anyone advise please?!