So I want to add an analytics dashboard for our advertisers that tracks their campaigns (similar to Google Analytics, but specifically tailored for our site).
I looked at InfluxDB and many of the alternatives, and Druid seems far and away the best fit… just very complex.
Data ingestion: We get ~300 million pageviews a month, and I want to track every pageview along with some basic metadata that arrives in the request headers (timestamp, device, URL of current page, referrer with subdomain, type of impression, etc.). Pretty basic stuff.
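To put that volume in perspective, here is a rough back-of-the-envelope sketch of the ingest rate. The 30-day month and the 3x peak factor are assumptions for illustration, not measured numbers from our traffic.

```python
# Rough ingest-rate estimate for ~300 million pageviews a month.
PAGEVIEWS_PER_MONTH = 300_000_000
DAYS_PER_MONTH = 30   # assumption: a 30-day month
PEAK_FACTOR = 3       # assumption: peak traffic is about 3x the average

SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(per_month: int, days: int = DAYS_PER_MONTH) -> float:
    """Average number of events per second if traffic were perfectly flat."""
    return per_month / (days * SECONDS_PER_DAY)


avg_rate = average_events_per_second(PAGEVIEWS_PER_MONTH)
peak_rate = avg_rate * PEAK_FACTOR

print(f"average: {avg_rate:.0f} events/s")
print(f"assumed peak: {peak_rate:.0f} events/s")
```

So the average works out to roughly 116 events per second, which is useful to have in hand when sizing the ingestion side.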
Then, in the realtime dashboard, the types of queries being done would be very simple. I would imagine 3-4 different ones total, only basic slicing of data along 1 dimension over time. For example, I might have total visits by device over x days, total visits by URL over x days, etc.
There will be a dropdown with fixed time periods (1d, 2d, 7d, 30d), as well as 1 very basic stacked line chart.
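As a sketch of what one of those queries might look like, the snippet below builds a Druid native `groupBy` query body for a single dimension over one of the dropdown periods. The datasource name `pageviews`, the dimension name `device`, the output name `visits`, and the hour/day granularity cutoff are all hypothetical placeholders. It also assumes rollup is disabled at ingestion, so a `count` aggregator counts raw pageviews; with rollup enabled, a sum over an ingested count metric would be needed instead.

```python
import json
from datetime import datetime, timedelta, timezone

# Fixed dropdown periods mapped to a number of days.
PERIODS = {"1d": 1, "2d": 2, "7d": 7, "30d": 30}

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_query(dimension, period, now=None):
    """Build a Druid native groupBy query body as a plain dict.

    Names such as "pageviews" and "visits" are hypothetical placeholders.
    """
    days = PERIODS[period]
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    interval = f"{start.strftime(ISO_FORMAT)}/{now.strftime(ISO_FORMAT)}"
    return {
        "queryType": "groupBy",
        "dataSource": "pageviews",  # hypothetical datasource name
        # Assumption: hourly buckets for short periods, daily otherwise.
        "granularity": "hour" if days <= 2 else "day",
        "dimensions": [dimension],
        # Assumes rollup is off, so each row is one pageview.
        "aggregations": [{"type": "count", "name": "visits"}],
        "intervals": [interval],
    }


query = build_query("device", "7d", now=datetime(2024, 1, 8, tzinfo=timezone.utc))
print(json.dumps(query, indent=2))
```

The resulting JSON would be sent to the cluster's query endpoint; each time bucket in the response would become one point per dimension value on the stacked line chart.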
Overall, I think it’s about as easy an implementation as it gets. I really need this to be solid for our advertisers though, which is why I’m going the extra mile and choosing something flexible, even if complicated, rather than something like InfluxDB where I’m essentially locked in.
I’ve spent a good 6 hours so far today, and I’ve barely got a hacked-together, modified Docker container working. I feel like I know about 2% of what’s going on, and time is becoming somewhat of an issue. Not good.
I’m looking for someone to help me do the following - and I’m ok with paying more for quality work.
- Look at my project in detail and recommend the best overall strategy for using Druid (I’ve already coded it so I can show you exactly how it will look.)
- Deploy the ~7-9 server EC2 production setup using Docker, Ansible, or any other method you prefer, as long as it lets us easily add or remove nodes of each type.
- Do a full, proper, tailored configuration for our project, doing your best to make it production ready and not just leaving a bunch of things at default or non-optimal values. I’d expect you to ask us some questions to figure out our needs, such as how long to keep data and what level of approximation we want on TopN queries.
- Not required, but would be amazing: wait a few days while we push production data to it, and we can fine-tune the configuration if anything was missed.
I know this isn’t the best place to post this, but it seems Druid information is sadly a bit thin.
I would be forever grateful if anyone is interested in this offer… don’t hesitate to reply with a way for me to contact you.