Metabase and Druid

Two questions that I would love answers to, please.

1) Is it still true that only data loaded on the historicals can serve queries, meaning data that is only in deep storage cannot be served by a query? I would assume that if a request were made for data that is in deep storage, Druid would decide to load that data into the segment cache on the historicals.

2) Is Metabase a good tool for visualizing data in Druid? I tested it out today and it crashed one of my historical nodes. I think it may have crashed the node because a segment handoff was being attempted while the historical service was overloaded. Eventually the segment load failed and, at the same time, the historical Druid service crashed.

Fortunately we are moving to Imply, and I will be able to start using Pivot and the other great tools that they have.

Hi Chris

1. Yes, only data loaded on the historicals is served for queries. There is no change.
2. I don't know about Metabase, but I don't think it would be crashing Druid directly. It's possible, though, that it is issuing a non-optimized query that leads to unexpected resource usage. Does it crash every time you use Metabase?

Thanks & Rgds
Venkat

Thank you Venkat.

Question 1. Thanks. How does Druid respond when queried for data that is not in the segment cache but only in deep storage? Does Druid decide, based on query requests, to pull data from deep storage so it can be served to future queries?

Question 2. It is regularly crashing a historical service for some reason, and maybe it is for the reason you stated.

Hey Chris,

For 1) Druid will never pull data from deep storage during a query – only data that’s been configured to be on the historicals, and has been pre-pulled, is queryable. (Druid’s query design is based on memory mapping local files rather than reading dynamically from deep storage)
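
To make "configured to be on the historicals" concrete, here is a minimal Python sketch of a retention rule set. The `loadByPeriod` and `dropForever` rule types and the `_default_tier` tier name are standard Druid. The helper name `build_load_rules`, the one-month period, and the replica count are only illustrative assumptions.

```python
import json


def build_load_rules(period="P1M", tier="_default_tier", replicas=2):
    """Build a rule list: keep recent data on historicals, load nothing else.

    The period and replica count are hypothetical defaults; pick values
    that match your own retention needs.
    """
    return [
        # Segments inside the most recent `period` are loaded on this tier.
        {"type": "loadByPeriod", "period": period,
         "tieredReplicants": {tier: replicas}},
        # Everything older is not loaded, so it is not queryable.
        {"type": "dropForever"},
    ]


rules = build_load_rules()
print(json.dumps(rules, indent=2))
```

A payload like this would normally be set per datasource through the Coordinator console or POSTed to the Coordinator rules endpoint (`/druid/coordinator/v1/rules/{dataSourceName}`). Rules are evaluated in order, and query traffic plays no part in which segments get loaded.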

For 2) It could be caused by Select queries being issued. They are known resource hogs and can crash historicals. We recently enhanced the more resource-friendly Scan query to support time ordering and are encouraging people to switch over. (For any query made through Druid SQL, you’re already switched automatically – it doesn’t issue Select queries.)
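
For reference, a time-ordered Scan query looks roughly like the sketch below. The datasource name, interval, and column names are made-up placeholders and `build_scan_query` is just an illustrative helper. The `queryType`, `intervals`, `columns`, `limit`, and `order` fields are the native Scan query fields.

```python
import json


def build_scan_query(datasource, interval, columns, limit=100, order="descending"):
    """Build a native Druid Scan query body as a plain dict."""
    if order not in ("ascending", "descending", "none"):
        raise ValueError("order must be 'ascending', 'descending' or 'none'")
    return {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": [interval],
        # Time ordering expects __time to be among the requested columns.
        "columns": columns,
        "limit": limit,
        "order": order,
    }


# Placeholder datasource, interval and columns, for illustration only.
query = build_scan_query(
    "my_datasource",
    "2019-01-01/2019-01-02",
    ["__time", "page", "user"],
)
print(json.dumps(query, indent=2))
```

Keeping `limit` small and the interval narrow is what keeps a query like this cheap on the historicals.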

Thanks Gian. Back to the historical question: if there are several requests for data that is only in deep storage, could Druid automatically decide to load those segments to serve future requests? Or would this require manual load rules?

Hi,

For Metabase, maybe some queries took more than 6 seconds. You can try native queries in Metabase, setting the timeout in the query context to more than 6 seconds. I think it will work.
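
A rough sketch of what I mean, in Python for illustration. The `timeout` key in the Druid query context is in milliseconds. The helper name `with_timeout`, the sample query, and the 60 second value are only assumptions, and I have not verified how Metabase passes the context through.

```python
import copy
import json


def with_timeout(query, timeout_ms):
    """Return a copy of a native Druid query with context.timeout set (ms)."""
    patched = copy.deepcopy(query)
    patched.setdefault("context", {})["timeout"] = timeout_ms
    return patched


# Placeholder native query, for illustration only.
base_query = {
    "queryType": "timeseries",
    "dataSource": "my_datasource",
    "intervals": ["2019-01-01/2019-01-02"],
    "granularity": "hour",
    "aggregations": [{"type": "count", "name": "rows"}],
}

patched_query = with_timeout(base_query, 60000)  # 60 seconds
print(json.dumps(patched_query, indent=2))
```

The resulting JSON is what would go into the native query editor.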

Sudhanshu Lenka