We are trying to build a generic Metrics Service that applications can send their data to so they can gain insights into it. We would like to bring in new clients and add each one as a new data source / system in the metrics service with minimal configuration changes. We would also like to keep enterprise-level metrics across all the systems we are building metrics for. From what I’ve seen so far, Druid provides great out-of-the-box capability for building analytics on a single data source being ingested. That said, I am now running into problems trying to understand whether and how we can extend Druid to our actual data types and multiple systems. Here are my big questions:
- I have a data source whose events contain other time-based events nested within them. I don’t know how to count both the top-level events and the nested events, since each has its own unique timestamp (although the nested events fall within the timeframe of the parent event). It seems like these nested events should be handled as a separate datasource. Is there another way? If not, that leads me to my second question.
- How can Druid handle separate datasources from realtime sources? It appears that Druid can support different collections of data, but I don’t understand how I can ingest those data sources through the realtime node at the same time. I only see the realtime node supporting a single realtime spec file. Is it possible to accomplish this? Would I need to run a separate realtime node, each with its own spec, for every new datatype? It seems like this would be a configuration/maintenance nightmare and costly from a resource perspective.
- Is it possible to configure a realtime parser to ingest ALL of the properties within the data instead of having to list every dimension? Again, I am hoping for a generic data store that will count all the properties in our data and expose them at query time. I realize this is a performance & storage trade-off, and it would be good to know (or be able to test) the point at which too many properties make this unreasonable. So being able to specify “dimensions”: * would be great.
- I think I understand how we could handle enterprise metrics across all the systems, but it would depend on being able to store multiple data sources within Druid. A simple example would be counting how many events we have processed for each client. We could accomplish this by generating a new “enterprise” event whenever we get new data from a client and ingesting that into Druid as well. Is there a better way to do this? Does Druid provide enough of its own metrics that we might not need to do this?
- Am I going about this wrong? Should I be trying to take all these different data sources and merge them into a common format? I think this would be nearly impossible, as the different systems could have nothing in common.
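To make the first and fourth questions more concrete, here is a rough Python sketch of the kind of preprocessing I have in mind before anything reaches Druid. It is only an illustration under assumptions: the field names (`id`, `timestamp`, `children`), the `split_event` helper, and the datasource naming scheme are all made up for this example, and nothing here calls a Druid API.

```python
from typing import Any, Dict, List


def split_event(client: str, event: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Split one parent event into flat rows, grouped by a (hypothetical) target datasource."""
    # Parent row: everything except the nested events, tagged with the client.
    parent_row = {k: v for k, v in event.items() if k != "children"}
    parent_row["client"] = client

    # Child rows: each nested event keeps its own timestamp and a link to its parent.
    child_rows = []
    for child in event.get("children", []):
        row = dict(child)
        row["client"] = client
        row["parent_id"] = event["id"]
        child_rows.append(row)

    # "Enterprise" row: one small counting event per parent event received.
    enterprise_row = {
        "timestamp": event["timestamp"],
        "client": client,
        "parent_events": 1,
        "nested_events": len(child_rows),
    }

    return {
        f"{client}_events": [parent_row],
        f"{client}_nested_events": child_rows,
        "enterprise": [enterprise_row],
    }


# Example input with two nested events (all values are invented).
sample = {
    "id": "e1",
    "timestamp": "2014-01-01T00:00:00Z",
    "type": "session",
    "children": [
        {"timestamp": "2014-01-01T00:00:05Z", "type": "click"},
        {"timestamp": "2014-01-01T00:00:09Z", "type": "view"},
    ],
}
rows_by_datasource = split_event("acme", sample)
```

With something like this, each client would produce two datasources of its own plus rows for one shared “enterprise” datasource, which is why the multiple-datasource question matters so much to us.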
It seems like Druid should be able to handle what I’m trying to accomplish, but I’m having a hard time wrapping my head around some of the concepts. Any help or guidance would be greatly appreciated.