Here at Luminis, we want to integrate a time series database into our Information Grid (IG) product, and after extensive evaluation we decided on Druid. IG is a Java program, so for data ingestion we want to integrate Tranquility core (0.8.2) into our codebase. The issue I have is that IG is an OSGi project, using Eclipse, BndTools and a Gradle build. In other words, we can’t use Maven for the moment. What we need to do is build the Tranquility core packages plus all necessary dependencies into an OSGi bundle. I have started doing this, but the number of required (transitive) dependencies seems to be growing exponentially. I’m up to 27 jars already, and BndTools is still showing a huge list of “calculated imports”.
I am sure that not all packages in these jars are required for Tranquility core, so my question is, is there any definitive list of the package dependencies actually required by Tranquility core, or even a jar containing them all? Any help would be much appreciated.
Great to hear you chose Druid. If you don’t mind my asking, what factors led you to do so and what else did you consider? That kind of info helps us when we write and speak about the project.
On your Tranquility question, it sure does have a lot of dependencies. I think most of it is because it depends on a couple of Druid modules (for parsing, serializing and the like) and those modules have a lot of dependencies themselves. You could run sbt dependency-tree to see where it all comes from. I bet many of them aren’t needed, but nobody has yet done the work to pick out the unneeded ones and add exclusions for them.
Fwiw if you want to see the full list that may get pulled, you can check the distribution tarball of tranquility 0.8.2 at http://druid.io/downloads.html. They should all be in there.
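If it helps to see what is actually inside those jars before deciding what goes into a bundle, here is a minimal Java sketch that lists the packages in every jar of a directory. The class name JarPackageLister and the default "lib" directory are made up for illustration; point it at wherever the tarball was unpacked. It only reports which packages each jar contains, not which of them Tranquility core really uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tranquility or Druid.
public class JarPackageLister {

    /** Returns the packages that hold at least one .class file in the given jar. */
    static Set<String> packagesIn(Path jar) throws IOException {
        Set<String> packages = new TreeSet<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            for (JarEntry entry : Collections.list(jarFile.entries())) {
                String name = entry.getName();
                int slash = name.lastIndexOf('/');
                // Skip non-class files, default-package classes and META-INF content.
                if (name.endsWith(".class") && slash > 0 && !name.startsWith("META-INF/")) {
                    packages.add(name.substring(0, slash).replace('/', '.'));
                }
            }
        }
        return packages;
    }

    /** Maps each jar file name in libDir to the packages it contains. */
    static Map<String, Set<String>> packagesByJar(Path libDir) throws IOException {
        Map<String, Set<String>> result = new TreeMap<>();
        List<Path> jars;
        try (Stream<Path> files = Files.list(libDir)) {
            jars = files.filter(p -> p.toString().endsWith(".jar")).collect(Collectors.toList());
        }
        for (Path jar : jars) {
            result.put(jar.getFileName().toString(), packagesIn(jar));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "lib" is an assumed default; pass the real directory as the first argument.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        packagesByJar(libDir).forEach((jar, pkgs) ->
                System.out.println(jar + " (" + pkgs.size() + " packages): " + pkgs));
    }
}
```

Comparing that output against the “calculated imports” BndTools reports should at least show which jars contribute packages that are never imported.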
Thanks for the reply. Our evaluation process started with ~30 different TSDBs, which we eventually narrowed down to 5: Druid, Riak TS, OpenTSDB, Hawkular and Prometheus. We chose Druid, frankly, because each of the others had an issue which gave us problems. For example, OpenTSDB has a limitation on tag/dimension values, Riak doesn’t seem to be able to aggregate into time slots, and we didn’t like the pull paradigm in Prometheus. Many of the others we discounted because of lack of a community, too few developers, or lack of documentation. Druid does everything we need, but it’s certainly not the easiest to deploy or bind to.
Since I posted my message, I created a Maven project for test purposes, just to see what dependencies Tranquility core draws in. Too many to count, but there are well over 200 of them, including over 50 aws-java-sdk* jars. This is almost unmanageable, so we will have to try to cut this down. That is a daunting prospect, as the only way I can see to do it is by trial and error. It’s difficult to believe that no one has come across this problem before. Any suggestions as to how we can go about pruning these dependencies?
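As a first step towards pruning, it may help to see which dependency families dominate the list. The following is a rough Java sketch, assuming the output of Maven's dependency:list goal has been saved to a text file with one groupId:artifactId:type:version:scope coordinate per line. The class name DependencyFamilies and the default file name deps.txt are invented for this example. It counts artifacts per groupId, so large groups such as the AWS SDK stand out as candidates to try excluding first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Maven, Tranquility or Druid.
public class DependencyFamilies {

    /** Counts artifacts per groupId, ignoring lines that are not Maven coordinates. */
    static Map<String, Long> countByGroup(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            // Drop the "[INFO]" prefix and keep only the first whitespace-delimited token.
            String coord = line.replace("[INFO]", "").trim().split("\\s+")[0];
            // Expect group:artifact:type[:classifier]:version[:scope].
            if (!coord.matches("[\\w.-]+(:[\\w.-]+){3,5}")) {
                continue;
            }
            counts.merge(coord.substring(0, coord.indexOf(':')), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "deps.txt" is an assumed file name; pass the real one as the first argument.
        List<String> lines = Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "deps.txt"));
        countByGroup(lines).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

This only summarises the list; whether a given group can safely be excluded still has to be checked against what the ingestion path actually loads at runtime.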
I think that most people deal with the dependencies either by just including them all (even though there are a lot) or by using the standalone Tranquility Server. It adds another process to run, but it means you don’t need to worry about jar conflicts. You can run it on localhost to avoid additional network hops.
Another option is the Kafka indexing service, if you don’t mind adding Kafka to your setup. It has some nice properties too, for example the ability to load late-arriving data.
Well, we may be forced to consider using the server. With OSGi, everything must be in a bundle, and there is no way at all that we can include all classes from all the dependency jars in our Druid binding bundle, even counting on the fact that some dependencies (Scala runtime, Jackson) are available as OSGi bundles themselves. But this is frustrating, as we’re trying to reduce the complexity, size and resource profile of our deployment as much as possible, given that this will be a small part of a much larger product which already binds to a number of other data storage facilities.
The availability of the in-process client was a large factor in our decision to go with Druid, as it is convenient and looks simple enough to use. If we can’t use it, we may have to reconsider, which would be a real shame as I do like Druid’s data architecture, and I think it’s a good fit with our product. Anyway, thanks for your help, Gian. Much appreciated!
Is the issue the size of your distribution or is it that you’re worried about jar conflicts? If it’s conflicts, you could probably build a shaded/relocated version of Tranquility Core and get around any of those issues that way.
Best of luck.