Does the local disk persist data saved from real-time ingestion using this spec?

I am currently using the RabbitMQ firehose and successfully saving the data with the following spec.

{
  "type": "index_realtime",
  "id": "sairam_testing_131",
  "spec": {
    "dataSchema": {
      "dataSource": "rabbitmq_test",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "name"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "hour",
        "queryGranularity": "none"
      }
    },
    "ioConfig": {
      "type": "realtime",
      "firehose": {
        "type": "rabbitmq",
        "connection": {
          "host": "127.0.0.1",
          "port": "5672",
          "username": "admin",
          "password": "admin",
          "virtualHost": "/",
          "uri": "amqp://127.0.0.1/"
        },
        "config": {
          "exchange": "sairam",
          "queue": "sairam1",
          "durable": "true",
          "exclusive": "false",
          "routingKey": "#",
          "autoDelete": "false",
          "maxRetries": "10",
          "retryIntervalSeconds": "1",
          "maxDurationSeconds": "300"
        },
        "plumber": {
          "type": "realtime"
        }
      }
    },
    "tuningConfig": {
        "type": "realtime",
        "maxRowsInMemory": 500000,
        "intermediatePersistPeriod": "P1D",
        "windowPeriod": "P1D",
        "rejectionPolicy": {
          "type": "serverTime"
        }
      }
   
  }
}


But my question is: once the task is killed or the Druid server goes down, will the data saved in segments be lost too?

Hey! There is a push at the end of the ingestion task itself – that push places the data into deep storage.
I am not familiar with how that firehose works when it comes to “catching up” if the task fails completely and has to “go back” to get older messages. In Kafka, for example, Druid remembers the topic offset once it has “pushed” successfully – so if the task fails, it just picks up again from the last offset it committed, and no data is lost. I’m not sure about RabbitMQ, though…

BUT if the push is successful, then yes, I believe you are safe – the data is in deep storage, and it will then be picked up by the Coordinator and distributed out to the Historical processes.

I just took a look at the docs on this – firehoses (as I suspected) were deprecated in 0.17, and I don’t believe that the extension has been updated for like a billion years. Could you get your data into Druid some other way?

Any chance you could use Kafka?
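
Just as a rough pointer – not something I’ve tested against your setup – a Kafka supervisor spec carrying the same schema would look something like the sketch below. The topic name and bootstrap.servers are placeholders you’d swap for your own, and you’d POST the spec to the Overlord at /druid/indexer/v1/supervisor:

{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "rabbitmq_test",
      "timestampSpec": {
        "column": "timestamp",
        "format": "auto"
      },
      "dimensionsSpec": {
        "dimensions": ["name"]
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "hour",
        "queryGranularity": "none"
      }
    },
    "ioConfig": {
      "topic": "sairam_topic",
      "inputFormat": {
        "type": "json"
      },
      "consumerProperties": {
        "bootstrap.servers": "localhost:9092"
      },
      "taskDuration": "PT1H",
      "useEarliestOffset": true
    },
    "tuningConfig": {
      "type": "kafka",
      "maxRowsInMemory": 500000
    }
  }
}

The Kafka indexing service checkpoints its offsets in Druid’s metadata store, which is what gives you the “no data lost” behaviour I described above.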

The RabbitMQ connection did work for me. But the question here is: if I don’t have deep storage, will the data still be saved on the local disk itself?

All Druid deployments have deep storage – if you’re just running in single-server mode, that deep storage will be on the local disk.
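
For example, on a single-server install the deep storage settings in common.runtime.properties look roughly like this (the storage directory below is just the quickstart-style default, so treat the path as a placeholder for wherever your install points):

# Deep storage on the local filesystem
druid.storage.type=local
druid.storage.storageDirectory=var/druid/segments

On a clustered deployment you’d point druid.storage.type at something shared like S3 or HDFS instead, because a plain local directory only works when every process can see the same path.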

There are some more explanations about that in the Druid Basics course that you might want to run through.

Thanks a ton, Pete. I hope to finish it and understand it.

@petermarshallio Will you please help me with this too? Unable to inject data from postgresql to druid (deployed using helm) - Stack Overflow

Hey! I’m afraid I am very old (!!!) and I am not someone who knows about Helm charts and deploying extensions. Maybe @Sergio_Ferragut might be able to advise?

Hi @Ram, I responded directly on Stack Overflow.