What is index_realtime doing?

I've run the coordinator, overlord, historical, and broker nodes, but I couldn't run a realtime node without druid.realtime.specFile:
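For reference, the standalone realtime node reads its ingestion specs from the file named by druid.realtime.specFile; when that property is missing, FireDepartmentsProvider fails with the NullPointerException shown below. A minimal runtime.properties fragment would look like this (the path is a placeholder):

```properties
# runtime.properties for the realtime node (hypothetical path)
druid.realtime.specFile=/path/to/realtime.spec
```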

2016-01-21T13:53:41,033 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.RealtimeManagerConfig] from props[druid.realtime.] as [io.druid.guice.RealtimeManagerConfig@7c447c76]
Exception in thread "main" com.google.inject.CreationException: Guice creation errors:

  1. Error injecting constructor, java.lang.NullPointerException
    at io.druid.guice.FireDepartmentsProvider.(FireDepartmentsProvider.java:41)
    while locating io.druid.guice.FireDepartmentsProvider
    at io.druid.guice.RealtimeModule.configure(RealtimeModule.java:79)
    while locating java.util.List<io.druid.segment.realtime.FireDepartment>
    for parameter 0 at io.druid.segment.realtime.RealtimeMetricsMonitor.(RealtimeMetricsMonitor.java:42)
    while locating io.druid.segment.realtime.RealtimeMetricsMonitor
    at io.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:78)
    at io.druid.server.metrics.MetricsModule.getMonitorScheduler(MetricsModule.java:78)
    while locating com.metamx.metrics.MonitorScheduler
    at io.druid.server.metrics.MetricsModule.configure(MetricsModule.java:63)
    while locating com.metamx.metrics.MonitorScheduler annotated with @com.google.inject.name.Named(value=ForTheEagerness)


And I've run an index_realtime task:

{
  "type": "index_realtime",
  "resource": {
    "availabilityGroup": "someGroup",
    "requiredCapacity": 1
  },
  "spec": {
    "dataSchema": {
      "dataSource": "test_source",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "tsv",
          "timestampSpec": {
            "column": "timestamp",
            "format": "posix"
          },
          "columns": […],
          "dimensionsSpec": {…}
        }
      },
      "metricsSpec": […],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR",
        "intervals": ["2016-01-21T00:00:00/2016-01-22T00:00:00"]
      }
    },
    "ioConfig": {
      "type": "realtime",
      "firehose": {
        "type": "local",
        "baseDir": "/dir",
        "filter": "2016-01-21.tsv"
      }
    },
    "tuningConfig": {
      "type": "realtime",
      "maxRowsInMemory": 500000,
      "intermediatePersistPeriod": "PT10m",
      "windowPeriod": "PT10m",
      "rejectionPolicy": {
        "type": "serverTime"
      }
    }
  }
}
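One thing worth noting about this spec: with the serverTime rejection policy, a row is only accepted if its timestamp falls within windowPeriod of the server's current time, so reading a file of past data tends to drop everything. A toy model of that check (not Druid's actual code; names are made up):

```python
from datetime import datetime, timedelta

# Toy model of the serverTime rejection policy: an event is accepted
# only if its timestamp lies within windowPeriod of the server clock.
def accepted(event_time, now, window=timedelta(minutes=10)):
    return now - window <= event_time <= now + window

now = datetime(2016, 1, 21, 13, 49)
# A row from early in the day is outside the PT10m window and is dropped:
print(accepted(datetime(2016, 1, 21, 0, 5), now))   # False
# A row within the last ten minutes is accepted:
print(accepted(now - timedelta(minutes=5), now))    # True
```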
The task completed with "SUCCESS" (without a realtime node?), but I can't find the results.

Part of the logs:

2016-01-21T13:49:09,134 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Creating plumber using rejectionPolicy[serverTime-PT10M]
2016-01-21T13:49:09,138 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Expect to run at [2016-01-22T00:10:00.000Z]
2016-01-21T13:49:09,140 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2016-01-21T13:49:09,141 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] segments. Attempting to hand off segments that start before [2016-01-21T00:00:00.000Z].
2016-01-21T13:49:09,141 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2016-01-21T13:49:09,170 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [2016-01-21.tsv] in and beneath [/dir]
2016-01-21T13:49:09,186 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/dir/2016-01-21.tsv]
2016-01-21T13:49:09,837 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[test_source]
2016-01-21T13:49:09,838 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Shutting down...
2016-01-21T13:49:09,840 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /tmp/persistent/task/index_realtime_test_source_0_2016-01-21T13:49:01.897Z_nldhhglf/work
2016-01-21T13:49:09,847 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_realtime_test_source_0_2016-01-21T13:49:01.897Z_nldhhglf",
  "status" : "SUCCESS",
  "duration" : 904
}


Pet Nik, do you have the logs of that task?

FWIW, I think if you are just getting started with Druid, you might have an easier time with this quickstart: http://imply.io/docs/latest/quickstart

We're trying to migrate that quickstart over to Druid right now.

I ran the task and it went well, but the data from the first task gets overwritten after running a second task. What is the difference between the ordinary index task and the index_realtime task? I've attached the logs of both tasks.

log2.txt (63.4 KB)

log1.txt (64.9 KB)

Hi Pet,

You shouldn’t ever use the index_realtime task on its own without the Tranquility library. It is going to be a hassle to manage. The index task is designed to read from files, and realtime indexing is designed to read from streams. Druid segments are versioned and immutable. Druid does a replace-by-interval strategy when new segments are created for an interval. So if you reindex data, it replaces existing data.
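The replace-by-interval behavior can be sketched with a toy model (this is not Druid's code; all names are made up for illustration): segments are keyed by (dataSource, interval) and carry a version, and a newer version for the same interval shadows the older one.

```python
# Toy model of Druid's replace-by-interval semantics.
# Real Druid segments are immutable files; this dict just illustrates
# how a newer version shadows an older one for the same interval.
segments = {}

def publish(datasource, interval, version, rows):
    """Publish a segment; a later version for the same interval replaces the earlier one."""
    key = (datasource, interval)
    if key not in segments or version > segments[key][0]:
        segments[key] = (version, rows)

publish("test_source", "2016-01-21/2016-01-22", "v1", ["rows from the first task"])
publish("test_source", "2016-01-21/2016-01-22", "v2", ["rows from the second task"])
# Queries now only see the second task's segment for that interval.
```

This is why reindexing the same interval appears to "overwrite" earlier data: the new segment replaces the old one for the whole interval rather than merging with it.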