Druid suitability for historical data with out-of-order timestamps

I have historical data of cab trails. It contains the locations of cabs moving around the city, each with a timestamp of when the location was read from the device.

I am experimenting with various databases to answer the following question:

Q. Given a time and day of the week (e.g., Monday, 7 PM, week 1 of the year), I want to look up in Druid how long a cab took to get from point A to point B.

I want to have week, day, and cab type as dimensions.

The data I have arrives in random order: within a single day, I might see a 7 PM reading followed by 3 AM, followed by 10 PM. Totally random. Can I still batch-load this into Druid and later query it correctly?

If there is an easier way to answer this question using Druid, let me know. :slight_smile:

Hi Karim,

You can store a copy of your raw data in a distributed filesystem such as HDFS or S3 and run a Druid batch-ingestion job over it. For example, you can collect these events over the course of a day and, at the end of the day, run the batch job. Batch ingestion does not require the input to be time-ordered; Druid partitions and sorts the data by timestamp when it builds segments, so the random ordering you describe is not a problem. As for measuring the time taken, you can calculate that metric at ETL time, before loading into Druid.
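As a rough illustration of that ETL step, here is a sketch in Python. The field names (`cab_id`, `cab_type`, `location`) and the trip-boundary logic (pairing a cab's appearance at point A with its next appearance at point B) are assumptions for the example, not something from this thread; the idea is just to sort the out-of-order readings per cab, compute the travel-time metric, and emit rows carrying the week, day, and cab-type dimensions that Druid will ingest.

```python
from collections import defaultdict
from datetime import datetime

def build_trip_rows(readings, point_a, point_b):
    """Turn raw location readings into Druid-ready rows with a
    travel_time_minutes metric. Input may be in any order; we sort
    each cab's readings by timestamp before pairing A with B."""
    by_cab = defaultdict(list)
    for r in readings:
        by_cab[(r["cab_id"], r["cab_type"])].append(r)

    rows = []
    for (cab_id, cab_type), cab_readings in by_cab.items():
        cab_readings.sort(key=lambda r: r["timestamp"])  # fix random order
        start = None
        for r in cab_readings:
            if r["location"] == point_a:
                start = r["timestamp"]            # cab last seen at A
            elif r["location"] == point_b and start is not None:
                rows.append({
                    "timestamp": start.isoformat(),   # Druid time column
                    "week": start.isocalendar()[1],   # dimension
                    "day": start.strftime("%A"),      # dimension
                    "cab_type": cab_type,             # dimension
                    "travel_time_minutes":            # metric
                        (r["timestamp"] - start).total_seconds() / 60,
                })
                start = None
    return rows

# Out-of-order sample input, as in the question: B arrives before A.
readings = [
    {"cab_id": 1, "cab_type": "sedan", "location": "B",
     "timestamp": datetime(2024, 1, 1, 19, 25)},
    {"cab_id": 1, "cab_type": "sedan", "location": "A",
     "timestamp": datetime(2024, 1, 1, 19, 0)},
]
rows = build_trip_rows(readings, "A", "B")
```

The resulting rows can be written out (e.g., as newline-delimited JSON to HDFS or S3) and loaded with a normal Druid batch-ingestion spec.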

– FJ