Re-index existing segments using ingestSegment

Hi,

I am re-indexing existing data to remove two dimensions and to create a hyperUnique aggregator metric. I am running the JSON spec below, but the task stops when it cannot find a segment. The log is also appended below. I am using the "ignoreWhenNoSegments" option, but it does not work with "type": "ingestSegment". There are two solutions in my mind:

  1. Skip that day's data.
  2. Re-ingest that day's data from the raw data.

My questions are:

  1. Please let me know if there is any other way.
  2. Also, please check the index task config and let me know if it is correct.
  3. Should I repeatedly run the same task, modifying the interval date each time, to cover one year of data?

{
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": "testdatasource",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "timestampSpec": {
                        "column": "timestamp",
                        "format": "auto"
                    },
                    "dimensionsSpec": {
                        "dimensions": [
                            "segmentId",
                            "departmentid",
                            "event",
                            "dim1",
                            "dim5",
                            "dim7",
                            "userid"
                        ],
                        "dimensionExclusions": [
                            "timestamp"
                        ]
                    },
                    "format": "json"
                }
            },
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "hour",
                "queryGranularity": "hour"
            },
            "metricsSpec": [
                {
                    "type": "count",
                    "name": "count"
                },
                {
                    "name": "value_sum",
                    "type": "doubleSum",
                    "fieldName": "value"
                },
                {
                    "type": "hyperUnique",
                    "name": "uniqe_users",
                    "fieldName": "userid",
                    "isInputHyperUnique": false,
                    "round": false
                }
            ]
        },
        "ioConfig": {
            "type": "index",
            "firehose": {
                "type": "ingestSegment",
                "dataSource": "testdatasource",
                "interval": "2017-01-23/2017-01-24"
            }
        }
    }
}

It looks like your segment metadata in the metadata store ("druid_segments" table) is out of sync with the deep storage: the druid_segments table has entries for some segments whose files are missing from deep storage.

The solution is to manually remove the faulty segment metadata from metadata storage.
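For reference, the cleanup could look roughly like this. It is only a sketch: it assumes a MySQL metadata store with the default druid_segments table, and the WHERE clause simply targets the failing day from the task above; verify the exact segment id from the task log before running anything destructive.

```shell
# Print the cleanup statement for review before piping it into mysql.
# Assumptions: MySQL metadata store, default druid_segments table, and
# that the faulty segment covers 2017-01-23 -- confirm via the task log.
sql="DELETE FROM druid_segments
WHERE dataSource = 'testdatasource'
  AND start >= '2017-01-23T00:00:00.000Z'
  AND \`end\` <= '2017-01-24T00:00:00.000Z';"
echo "$sql"
# apply with: echo "$sql" | mysql -u druid -p druid
```

An alternative is to set used = 0 on those rows instead of deleting them, which is easier to undo if you picked the wrong segment.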

Thanks Nishant. I will do that.

Is the way I am doing the re-indexing correct? Please see my points 2 and 3 (restated below) and let me know.

  1. Also, please check the index task config and let me know if it is correct.
  2. Should I repeatedly run the same task after modifying the interval date to cover one year of data, or is there another way? Can I run a single task for the whole year? (The docs say it is not good to run this for more than 1 GB of data; my hourly segments are between 300 and 700 MB.)
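In case it helps clarify what I mean by point 2: rather than editing the interval by hand 365 times, I could drive the task submission with a small script. A rough sketch, assuming GNU date, a saved copy of the spec above in task.json, and an Overlord at localhost:8090 (all assumptions); the loop here only prints what it would submit:

```shell
#!/bin/sh
# Sketch: generate one daily interval per iteration and show the task
# submission that would be made.  OVERLORD, task.json, and the 3-day
# range are assumptions -- widen the loop to 365 for a full year.
OVERLORD="http://localhost:8090"
start="2017-01-23"
days=3
i=0
while [ "$i" -lt "$days" ]; do
  from=$(date -u -d "$start + $i day" +%F)
  to=$(date -u -d "$start + $((i + 1)) day" +%F)
  # To actually submit, rewrite the interval in a copy of the spec:
  #   sed "s|2017-01-23/2017-01-24|$from/$to|" task.json > "task-$from.json"
  #   curl -X POST -H 'Content-Type: application/json' \
  #        -d @"task-$from.json" "$OVERLORD/druid/indexer/v1/task"
  echo "would submit interval $from/$to"
  i=$((i + 1))
done
```

This keeps each task to a single day's segments, which stays well under the 1 GB guideline given my 300-700 MB hourly segments.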

Thanks again