I have been facing an issue where some of my queries take ~5 seconds and the broker throws an internal server error. It happens only occasionally; when I run similar queries at any other time I do not see the issue. I have checked response times on the indexing tasks as well as the historical nodes, and they do not show any such problem. Otherwise my average response time is around 15 ms and the 99th percentile is around 100 ms.
Has anyone else seen anything like this? Can someone help me figure out where the issue could be?
For an intermittent issue like this, you might get some clues from the exact error being thrown by the broker, as well as checking for issues like network problems, servers losing connection to ZK, servers coming on and offline, or long GC pauses.
I will have a look at the network. But how would the ZK connection impact query performance, given that the ZK data is normally fetched and cached at the query node? I haven’t seen GC pauses, and the performance of subqueries on the historical and realtime nodes looks fine in our monitoring tools.
As Gian suggested, looking at the broker error logs might give some clues. Have you had a chance to look at the logs? If so, please post them.
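If the broker log is large, one way to narrow it down is to pull only the WARN/ERROR lines close to the time of a slow query. The following is a rough sketch, not a Druid tool: it assumes each log line starts with an ISO-8601 timestamp followed by the level (adjust the regex to whatever your log4j pattern actually produces), and the function name `errors_near`, the sample lines, and the `io.example.Server` class in them are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: lines look like
#   "2015-06-01T12:00:05,250 ERROR [thread] some.Class - message"
# Change this regex if your broker's log pattern differs.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+(.*)$"
)


def errors_near(lines, when, window_seconds=10, levels=("WARN", "ERROR")):
    """Return log lines at the given levels within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            # Lines without a timestamp (e.g. stack-trace continuations)
            # are skipped in this simple version.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        if m.group(2) in levels and abs(ts - when) <= window:
            hits.append(line.rstrip("\n"))
    return hits


# Hypothetical sample lines, only to show how the function is called.
sample = [
    "2015-06-01T12:00:01,000 INFO [qtp-1] io.example.Server - query ok",
    "2015-06-01T12:00:05,250 ERROR [qtp-2] io.example.Server - query failed",
    "2015-06-01T12:30:00,000 ERROR [qtp-3] io.example.Server - unrelated",
]
hits = errors_near(sample, datetime(2015, 6, 1, 12, 0, 3))
for h in hits:
    print(h)
```

On a real log you would pass an open file instead of `sample`, with `when` set to the time the slow query was issued. The lines that come back, plus any stack trace that follows them in the file, are the part worth posting here.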