There’s no single read-only mode button or switch!
In addition to stopping the Kafka supervisors (which are constantly writing and creating segments), you would want to make sure that auto-compaction is stopped and that no one makes any configuration changes. Without ingestion or compaction/re-indexing, no new segments are created, and once the coordinator finishes loading segments onto historicals and distributing them according to the balancing strategy, they shouldn’t be moved around either.
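For the ingestion and auto-compaction part, a minimal sketch might look like the following. It assumes the Overlord endpoint `POST /druid/indexer/v1/supervisor/suspendAll` and the Coordinator endpoints `GET /druid/coordinator/v1/config/compaction` and `DELETE /druid/coordinator/v1/config/compaction/{dataSource}` as described in the Druid API reference (check them against your Druid version). The helper names and host URLs are hypothetical. Deleting a compaction config discards it, so the sketch fetches the current config first so it can be saved and restored later.

```python
import json
import urllib.parse
import urllib.request


def fetch_compaction_config(coordinator_url, timeout=30):
    """GET the coordinator's current auto-compaction config (save it so it can be restored)."""
    url = f"{coordinator_url}/druid/coordinator/v1/config/compaction"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read())


def compacted_datasources(compaction_config):
    """Extract the names of datasources that currently have an auto-compaction config."""
    return [c["dataSource"] for c in compaction_config.get("compactionConfigs", [])]


def build_quiesce_requests(overlord_url, coordinator_url, datasources):
    """Build (but do not send) the requests that pause ingestion and auto-compaction."""
    reqs = [
        # Suspend every supervisor (e.g. Kafka) so no new segments are published.
        urllib.request.Request(
            f"{overlord_url}/druid/indexer/v1/supervisor/suspendAll", method="POST"
        )
    ]
    for ds in datasources:
        # Remove one datasource's auto-compaction config; quote the name for the URL path.
        path = urllib.parse.quote(ds, safe="")
        reqs.append(
            urllib.request.Request(
                f"{coordinator_url}/druid/coordinator/v1/config/compaction/{path}",
                method="DELETE",
            )
        )
    return reqs


def send_all(reqs, timeout=30):
    """Send each request in order and return the HTTP status codes."""
    statuses = []
    for req in reqs:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            statuses.append(resp.status)
    return statuses
```

Usage, with hypothetical hosts: `cfg = fetch_compaction_config("http://coordinator.example.com:8081")`, write `json.dumps(cfg)` to a file, then `send_all(build_quiesce_requests("http://overlord.example.com:8090", "http://coordinator.example.com:8081", compacted_datasources(cfg)))`. Building the requests separately from sending them makes it easy to review exactly what will be changed before anything hits the cluster.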
This would approximate a read-only mode quite closely, but I don’t think I can guarantee zero writes against the metastore DB.
Could you tell us a little more about the underlying need behind this question?
I don’t know which infrastructure you’re using now or which one you’re migrating to, so my answer might be of limited use, but one thing you could consider is reducing the permissions of the relevant users in the relevant places (e.g. the metadata DB user and the credentials used to write to deep storage) to read-only. That ensures nothing changes in the metadata store or in deep storage during the migration.
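If the metadata DB happens to be PostgreSQL, the metadata-user part of this could look like the sketch below, which only generates the SQL to run (e.g. via psql) and the statement to undo it afterwards. The role name `druid` and schema `public` are assumptions. Note that privilege changes have no effect on a superuser, and `ON ALL TABLES IN SCHEMA` only covers tables that exist at the time it runs.

```python
# Write privileges to strip from the (assumed) Druid metadata role.
WRITE_PRIVILEGES = "INSERT, UPDATE, DELETE, TRUNCATE"


def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap in double quotes and double any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_statement(role, schema="public"):
    """SQL that revokes write privileges on every existing table in the schema."""
    return (
        f"REVOKE {WRITE_PRIVILEGES} ON ALL TABLES IN SCHEMA "
        f"{quote_ident(schema)} FROM {quote_ident(role)};"
    )


def restore_statement(role, schema="public"):
    """SQL that grants the same write privileges back after the migration."""
    return (
        f"GRANT {WRITE_PRIVILEGES} ON ALL TABLES IN SCHEMA "
        f"{quote_ident(schema)} TO {quote_ident(role)};"
    )


if __name__ == "__main__":
    print(read_only_statement("druid"))
    print(restore_statement("druid"))
```

Generating the restore statement alongside the revoke is deliberate: if something does start failing (as described below), rolling back is a single statement.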
The risk is that if Druid attempts to change something for some reason (for example, because of Drop Rules), the write will fail and might cause instability in the cluster you’re migrating from.
Are you cloud-based or on-prem? What are you using for deep storage?
Itai is right: in theory you should be able to avoid most writes, but if you try to guarantee it, you risk things failing in unpredictable ways. I’m not sure I could recommend it, but it seems worth testing in a dev environment to see whether errors start showing up in any of the logs.
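For that kind of dev-environment test, a small scanner can flag error and warning lines in the service logs. This is a minimal sketch that assumes Druid's default log4j2 pattern, where the level is the second whitespace-separated token right after the ISO-8601 timestamp (e.g. `2021-03-01T10:00:01,000 ERROR [main] ...`); if your log4j2.xml uses a different pattern, the parsing needs adjusting. The function name is hypothetical.

```python
from collections import Counter


def scan_druid_log(lines, levels=("ERROR", "WARN")):
    """Count and collect log lines whose level token is in `levels`.

    Stack-trace continuation lines have no level token, so they are skipped.
    """
    counts = Counter()
    hits = []
    for line in lines:
        # Split off the timestamp and level; the rest of the line stays intact.
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] in levels:
            counts[parts[1]] += 1
            hits.append(line.rstrip("\n"))
    return counts, hits
```

Usage: `with open("coordinator.log") as f: counts, hits = scan_druid_log(f)`, where the log path is a placeholder. Running it before and after restricting permissions makes it easier to spot errors that the change introduced.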
From what you describe, it sounds like you should be able to get the cluster into a minimal-writes state, copy everything over, spin up the new cluster, and then switch users over, but there’s no specific procedure for this.
Creative thinking here, Max and Itai (!!!), but do you think Alex might be able to just stop the coordinator and overlord processes? I mean, only the broker does querying, and it just does a read on the metadata database to build up the timeline so it knows where to send queries (Hists / MMs, etc.), so maybe that would do it??
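One way to check this idea in a dev environment is to stop the coordinator and overlord and then confirm that the broker still answers queries. A minimal sketch, assuming Druid's SQL endpoint `POST /druid/v2/sql` on the broker; the broker address and datasource name in the usage line are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_broker_sql_request(broker_url, sql):
    """Build a POST request for the broker's SQL endpoint with a JSON body."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{broker_url}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(request, timeout=30):
    """Send the request and decode the JSON result rows."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Usage: `run(build_broker_sql_request("http://broker.example.com:8082", 'SELECT COUNT(*) FROM "wikipedia"'))`. Repeating the same query over several hours with the coordinator and overlord down would help show whether any side effects appear over time.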
Hmm… TBH, I’m not 100% sure, Peter.
I was trying to think whether stopping the coordinator and overlord completely, for a long period of time, might have undesired side effects (even for a cluster that’s in “read-only” mode).
Alex, I guess it could potentially work; it’s probably worth trying in a Dev environment first (assuming you have one).
BTW, for future reference, we used a method similar to the one I described above to create a Dev cluster that’s a replica (in terms of data) of the Prod cluster, but in “read-only” mode (see this slide deck from Virtual Druid Summit, slide 55).