Timestamp assigned by Druid

Is there any way to tell Druid to just assign a new timestamp when it sees a new event, and to use the serverTime rejection policy so that no events are thrown away? It is not critical for our application that we assign the timestamp ourselves. We are using a realtime node with the messageTime rejection policy and the window set to 5 minutes. Our data should be roughly in order, but we are still seeing events thrown away. Our segment size is 1 minute. Could that be causing problems if handoff occurs before the window?
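To make the question concrete, here is a rough sketch of how the two rejection policies are understood to differ. It is a simplified model and not Druid's actual code: the class names (SketchMessageTimePolicy, SketchServerTimePolicy) are made up for illustration, and only the "too old" side of the window is modelled. The assumption is that messageTime measures the window back from the newest event timestamp seen so far, while serverTime measures it back from the wall clock.

```java
import java.time.Clock;

// Simplified model of a rejection policy; hypothetical, not a Druid interface.
interface SketchRejectionPolicy {
    boolean accept(long eventMillis);
}

// messageTime-style: the window trails the newest event timestamp seen so far.
final class SketchMessageTimePolicy implements SketchRejectionPolicy {
    private final long windowMillis;
    private long maxSeenMillis = Long.MIN_VALUE;

    SketchMessageTimePolicy(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    @Override
    public boolean accept(long eventMillis) {
        // A single event with a far-ahead timestamp moves the window forward.
        maxSeenMillis = Math.max(maxSeenMillis, eventMillis);
        return eventMillis >= maxSeenMillis - windowMillis;
    }
}

// serverTime-style: the window trails the wall clock.
final class SketchServerTimePolicy implements SketchRejectionPolicy {
    private final long windowMillis;
    private final Clock clock;

    SketchServerTimePolicy(long windowMillis, Clock clock) {
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    @Override
    public boolean accept(long eventMillis) {
        return eventMillis >= clock.millis() - windowMillis;
    }
}

final class SketchRejectionPolicyDemo {
    public static void main(String[] args) {
        long minute = 60_000L;
        SketchRejectionPolicy policy = new SketchMessageTimePolicy(5 * minute);
        System.out.println(policy.accept(0));           // first event sets the window
        System.out.println(policy.accept(10 * minute)); // jumps the window ahead
        System.out.println(policy.accept(4 * minute));  // older than 10 min minus the 5 min window
    }
}
```

Under this model, an event stamped with the current time always passes the serverTime check, which is why the two ideas in the question go together. It also shows one possible way that "roughly in order" data can still lose events under messageTime: one event far ahead of the rest shifts the window for everything that follows.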

-drew

Hi Drew, there is no way to do this in Druid, but you can do this at ETL time or wait for this proposal to be implemented:

https://groups.google.com/forum/#!searchin/druid-development/windowPeriod/druid-development/kHgHTgqKFlQ/fXvtsNxWzlMJ

I like the proposal, but it seems like overkill for what we need. We just need to put events into whatever segment is rolling as the data is processed, which would seem much more straightforward to implement.

-drew

If assigning timestamps is all you want to do, you can try writing a custom parser, similar to MapInputRowParser, that uses the current timestamp instead of the one present in the message.
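As a rough illustration of that idea, the sketch below stamps every event with the current time and ignores whatever timestamp the message carries. It deliberately does not use the Druid API: StampedRow and CurrentTimeRowParser are hypothetical names, and a real parser would need to implement Druid's own parser interface and return its row type instead.

```java
import java.time.Clock;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a parsed row: a timestamp plus the event's fields.
final class StampedRow {
    final long timestampMillis;
    final Map<String, Object> event;

    StampedRow(long timestampMillis, Map<String, Object> event) {
        this.timestampMillis = timestampMillis;
        this.event = Collections.unmodifiableMap(new HashMap<>(event));
    }
}

// Hypothetical parser: ignores any timestamp in the event and reads the clock instead.
final class CurrentTimeRowParser {
    private final Clock clock;

    CurrentTimeRowParser(Clock clock) {
        this.clock = clock;
    }

    StampedRow parse(Map<String, Object> event) {
        // The event's fields are kept as-is; only the row timestamp comes from the clock.
        return new StampedRow(clock.millis(), event);
    }
}

final class CurrentTimeRowParserDemo {
    public static void main(String[] args) {
        CurrentTimeRowParser parser = new CurrentTimeRowParser(Clock.systemUTC());
        Map<String, Object> event = new HashMap<>();
        event.put("timestamp", "2015-01-01T00:00:00Z"); // stale value, not used for the row time
        event.put("page", "home");
        StampedRow row = parser.parse(event);
        System.out.println(row.timestampMillis + " " + row.event);
    }
}
```

Passing a Clock in, rather than calling System.currentTimeMillis() directly, is only a design choice to make the behaviour easy to test with a fixed clock.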

Just curious: is there any specific reason for having segments with a 1-minute period?

Sorry for the late reply. Correct me if I am wrong, but isn’t the timestamp parsing code implemented here:

So I would just need to add a ‘system’ timestamp type that returns a new Date()?

-drew

Hi Drew, the raw event is stored in a map and Druid will try to use the column name you specified to pull out a value from the map and try to parse that value as the timestamp. This logic is in TimestampSpec.java. If you have a timestamp column in your data, you can modify the parser to always return the current time instead of trying to parse the actual value in the event.
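The lookup described above can be sketched roughly as follows. This is a simplified model and not the contents of TimestampSpec.java: SketchTimestampSpec is a hypothetical class, only the "iso" and "millis" formats are handled, and the "system" format is the hypothetical addition discussed in this thread, which skips the column entirely and returns the current time.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.Map;

// Simplified model of a timestamp spec; hypothetical, not Druid's TimestampSpec class.
final class SketchTimestampSpec {
    private final String column;
    private final String format; // "iso", "millis", or the hypothetical "system"
    private final Clock clock;

    SketchTimestampSpec(String column, String format, Clock clock) {
        this.column = column;
        this.format = format;
        this.clock = clock;
    }

    long extractMillis(Map<String, Object> event) {
        // Hypothetical "system" format: never look at the event, just read the clock.
        if ("system".equals(format)) {
            return clock.millis();
        }
        // Otherwise pull the configured column out of the raw event map and parse it.
        Object raw = event.get(column);
        if (raw == null) {
            throw new IllegalArgumentException("Missing timestamp column: " + column);
        }
        switch (format) {
            case "iso":
                return Instant.parse(raw.toString()).toEpochMilli();
            case "millis":
                return Long.parseLong(raw.toString());
            default:
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }
}
```

With a spec built as new SketchTimestampSpec("timestamp", "system", Clock.systemUTC()), the value in the "timestamp" column is never read, so late or out-of-order values in the data no longer affect which segment an event lands in.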

Thanks Fangyin,

I assume the standard way to make changes like this is to build our own version of the druid-api jar file and include it with our Druid deploy. What is the best way to point Druid at our repo so that it pulls in our version of druid-api instead of the standard one? Can we do this using druid.extensions.remoteRepositories in the config?

-drew

I think I have solved this by including our own jar file and bundling all dependencies per the instructions at http://druid.io/docs/0.7.3/Including-Extensions.html

-drew