I’m evaluating Druid as the backend datastore and filtering engine for a data visualization project.
I’ve gone through the documentation and the paper, and tried out a few examples.
The only problem I have is its dependence on a timestamp dimension.
I don’t expect my datasets to have any timestamp dimension. Is Druid still the right choice for such datasets? I want the following features:
Sub-second querying on large datasets (fast filtering).
Fast binned aggregation over different attributes.
All my queries would be categorical, ranged, or (maybe) geospatial.
We want to do the usual slicing, dicing, and drilling down in our datasets.
Would Druid be able to work with such datasets? Are there any “hacks” we could use (for example, appending a proxy timestamp attribute to each record)?
I know that the timestamp dimension is used to shard the data. How much of a performance hit would it be to use a fake timestamp dimension? Some questions:
Are there better ways of doing this? I might be missing something basic, since I haven’t used Druid much.
If people are using it for non-timeseries data, then I’d be interested in hearing about it.
If there are alternate systems that do this and would be a better fit for me, then I’d like to know about them too.
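To clarify, here’s a rough sketch (in Python) of the “proxy timestamp” hack I had in mind: before ingestion, stamp every record with the same constant timestamp so the required time column exists but carries no real information. The column name `__time` and the epoch constant are just placeholders I picked for illustration, not anything Druid-specific I’ve verified.

```python
# Sketch of the proxy-timestamp idea: add a constant, meaningless
# timestamp field to every record of a timestamp-less dataset.
FAKE_TS = "2000-01-01T00:00:00Z"  # arbitrary constant used for all rows

def add_proxy_timestamp(rows):
    """Yield each record with a constant '__time' field prepended."""
    for row in rows:
        yield {"__time": FAKE_TS, **row}

# Example: two records with only categorical/numeric attributes
records = [
    {"country": "US", "category": "A", "value": 3},
    {"country": "DE", "category": "B", "value": 7},
]
stamped = list(add_proxy_timestamp(records))
```

My worry is exactly what this implies for sharding: every row lands in the same time chunk, which is why I’m asking about the performance hit.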