Determine thetaSketch size

After reading the page, I am still not sure how to decide the size for thetaSketch.

Suppose I have an ID column which contains 7M unique IDs.

In the ingestion JSON, I specify 20 dimensions in dimensionSpec. After grouping by these 20 dimensions, the number of unique IDs in each group ranges from 1 to ~2K. Does the size of the thetaSketch for this ingestion depend on the 7M or on the 2K? If it is the 2K, can I set the thetaSketch size to less than the default (16K)?

I would suggest starting from the accuracy you want to achieve.
The Druid default of 16K (16384) results in about 1.56% relative error at 95% confidence.
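
To make the accuracy/size trade-off concrete, here is a rough sketch in Python. It assumes the usual approximation that the relative standard error of a theta sketch is about 1/sqrt(k), so the 95% (two standard deviations) bound is about 2/sqrt(k). It also assumes sizes are powers of two. The function names are made up for illustration and are not part of Druid or any sketch library.

```python
import math

def theta_error_95(k: int) -> float:
    """Approximate relative error at 95% confidence for nominal size k.

    Assumption: relative standard error ~ 1/sqrt(k), and 95% confidence
    is taken as two standard errors.
    """
    return 2.0 / math.sqrt(k)

def smallest_k_for_error(target: float) -> int:
    """Smallest power-of-two size whose approximate 95% error is <= target."""
    k = 16
    while theta_error_95(k) > target:
        k *= 2
    return k

print(f"{theta_error_95(16384):.4%}")   # default 16K
print(smallest_k_for_error(0.05))       # size needed for about 5% error
```

With these assumptions the default 16384 gives 2/128 = 1.5625%, which matches the figure above.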

However, a 16K sketch is going to stay in exact mode if it sees only 2K distinct values, so for groups of at most ~2K IDs it simply keeps all of them and the count is exact.

The serialized size of the resulting sketch in bytes will be roughly 8n + 24, where n is the number of distinct values, limited by that configuration parameter (16K in this case). Simply put, after seeing 16K distinct values the sketch starts throwing some of them away (it goes into estimation mode). This is a very simplified picture, just to give a general idea.
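
Applied to the numbers in the question, the simplified formula can be sketched like this in Python. This is only the 8n + 24 approximation from above, with n capped at the configured size; the helper names are hypothetical and the real serialized size may differ somewhat.

```python
def approx_sketch_bytes(distinct_values: int, size: int = 16384) -> int:
    """Rough serialized size: 8 bytes per retained value plus ~24 bytes of header.

    Assumption: the number of retained values is min(distinct_values, size).
    """
    retained = min(distinct_values, size)
    return 8 * retained + 24

def stays_exact(distinct_values: int, size: int = 16384) -> bool:
    """True if the sketch never has to discard values (simplified view)."""
    return distinct_values <= size

# A group with ~2K unique IDs and the default 16K size
print(approx_sketch_bytes(2_000), stays_exact(2_000))

# All 7M unique IDs merged into one sketch with the default 16K size
print(approx_sketch_bytes(7_000_000), stays_exact(7_000_000))
```

Under this approximation a 2K group takes about 16 KB and stays exact, while a sketch over all 7M IDs is capped at about 128 KB and is an estimate.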