Multiple small indices vs one big index

I am collecting data from different devices, all of which send exactly the same dimensions. The data is collected from RabbitMQ and pushed to Druid through Tranquility.

My question is: should I have an individual index per device, or should I put the data for all devices in a single index, with the device ID as a dimension for filtering?
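To make the single-index option concrete, here is a minimal sketch of a Druid native timeseries query that restricts results to one device with a selector filter. The datasource name `devices`, the dimension name `device_id`, the device value, and the interval are all made up for illustration; only the query structure is the point.

```python
import json


def device_timeseries_query(datasource, device_id, interval):
    """Build a Druid native timeseries query restricted to one device.

    The dimension name "device_id" is an assumption for this sketch;
    use whatever dimension actually carries the device identifier.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        # Selector filter: keep only rows whose device_id equals the given value.
        "filter": {
            "type": "selector",
            "dimension": "device_id",
            "value": device_id,
        },
        "aggregations": [{"type": "count", "name": "rows"}],
    }


# Hypothetical datasource, device, and interval.
query = device_timeseries_query("devices", "device-42", "2016-01-01/2016-01-02")
print(json.dumps(query, indent=2))
```

With one index per device, the filter would go away and the `dataSource` value would change per device instead.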

Having individual indices for each device will let me delete a specific index when the corresponding device is no longer relevant.

It will also make it easy to configure segment granularity and window period. With a single index, the number of devices can change at any time, which makes it difficult to configure segment granularity properly.

But having an individual index for each device will also create a job for each index. Will this have an adverse effect on performance?

Are there any guidelines or best practices to follow in this type of scenario?



Hi Ravish, do you know how big the segments end up being when you have individual indexes per device? We’ve seen production setups with hundreds of datasources, and I believe Druid will be fine up to thousands of datasources. Using multiple datasources is definitely more flexible, and you can assign different rules to different datasources. The one drawback is that if certain datasources are very small, you end up with tiny segments, and queries on these segments end up being sub-optimal.

Hi Fangjin,
The segments will be relatively small with individual datasources (I expect data to arrive no more often than once per second per device). Also, is there a way to delete specific data from segments based on a dimension value rather than a time interval?
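To put "relatively small" in numbers, a rough back-of-the-envelope sketch follows. It assumes exactly one event per second for a single device and no rollup, so the figures are an upper bound on rows per segment; the granularity names here are just labels for this calculation.

```python
# Seconds covered by one segment at each (illustrative) segment granularity.
SECONDS_PER_PERIOD = {"hour": 3600, "day": 86400}


def rows_per_segment(events_per_second, segment_granularity):
    """Upper bound on rows in one segment for a single-device datasource.

    Assumes a steady event rate and no rollup at ingestion time.
    """
    return int(events_per_second * SECONDS_PER_PERIOD[segment_granularity])


for granularity in ("hour", "day"):
    print(granularity, rows_per_segment(1.0, granularity))
```

At this rate a per-device datasource produces at most a few thousand rows per hourly segment, which is the tiny-segment case described above.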



Hi Ravish, deleting data based on a dimension value would require modifications to the rule interface and is not supported out of the box. For your use case, I think individual datasources make sense, but you should probably enable automated merging in the indexing service. This will try to merge segments to a desired size and ensure you don’t have a bunch of tiny segments lying around.
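As an illustration of what merging toward a desired size means, here is a simplified greedy sketch. It is not Druid's actual merge logic; the function name `plan_merges`, the size units, and the target value are all hypothetical.

```python
def plan_merges(segment_sizes, target_bytes):
    """Group consecutive segments into batches no larger than target_bytes.

    A segment that alone exceeds the target ends up in its own batch.
    Illustrative only; this is not how Druid implements merging.
    """
    batches = []
    current = []
    current_size = 0
    for size in segment_sizes:
        # Close the current batch if adding this segment would overshoot.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current = []
            current_size = 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical sizes of time-adjacent segments, in arbitrary units.
sizes = [40, 10, 30, 80, 20, 20]
print(plan_merges(sizes, 100))  # [[40, 10, 30], [80, 20], [20]]
```

Each inner list stands for a group of small, time-adjacent segments that would be combined into one larger segment.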

Thanks Fangjin, this helps.

Small correction: you can probably get away with writing a new rule instead of modifying the rule interface.