I feel the Kafka Indexing Service has a lot left to be documented. There are a lot of options there that are not clearly defined.
- What is Intermediate Persist? Fields such as `maxPendingPersists` refer to it, but there is no documentation for what it is.
Does this require disk space? Which property configures the location?
- Is the `workerThreads` property used on the overlord, middle manager, or peon? The description says “The number of threads that will be used by the supervisor”. Does ‘supervisor’ refer to the overlord here? Does this mean I can control the number of threads on the overlord from this config?
- `chatThreads`: similar to the above. Which node (overlord, MM, or peon) does this affect?
- `segmentWriteOutMediumFactory`: Does specifying ‘tmpFile’ have the advantage that the data can be reused by the peon if it is shut down unexpectedly?
  There needs to be some indication of the size or contents of this data. The documentation just says “Druid temporarily stores some pre-processed data in some buffers”. What does ‘some’ mean here?
  If my segment is 100MB in size, is this ‘pre-processed data’ going to be in kilobytes, single-digit megabytes, or tens of megabytes?
  At least some information should be given about what the data is and whether using disk has any extra advantage. Without it, choosing a value is like picking a random number.
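For reference, the fields I am asking about sit in the `tuningConfig` of the Kafka supervisor spec, roughly as sketched below. The values are placeholders I picked for illustration, not recommendations, and the rest of the spec is omitted:

```json
{
  "tuningConfig": {
    "type": "kafka",
    "maxPendingPersists": 0,
    "workerThreads": 4,
    "chatThreads": 4,
    "segmentWriteOutMediumFactory": { "type": "tmpFile" }
  }
}
```

It is not obvious from the documentation which of these take effect on the overlord and which on the peon running the task.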
MM nodes functionality:
- Let's say all my MM nodes are shut down and recovered unexpectedly as part of maintenance by the cloud provider.
  When they come back up, do they need to restart ingesting the segment from the very beginning?
  Let's say I make 24-hour segments. At 23:15 all my MM nodes reboot. When they come back up, do they have to start scanning from 00:00?
  This is important to know, since it would mean there is a period of time during which the realtime data is not available, and that gap grows with the segment size. For example, what if I had a monthly segment?
- Can MM nodes themselves have really low heap sizes? It seems their job is simply to spawn peons. Can they have a -Xmx of just 512MB?