segmentGranularity vs interval

I tried searching this group but didn’t find an answer.

The motivation is to have one segment between 300 and 700 MB. My current hourly job produces only 30 MB, so it would be nice to have the 24 hourly jobs combine into a single segment (360 MB). Is that possible?

At first I did this, which succeeds, but each segment is too small:

"granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "HOUR",
    "queryGranularity": "HOUR",
    "intervals": [
        "2015-11-23T15:00:00.000Z/2015-11-23T16:00:00.000Z"
    ]
}

Then I tried this. The segments do seem to aggregate into one DAY segment, but each hour’s data seems to overwrite the previous hour’s, so a query only shows data for the last hourly batch ingested. For example, if I ran the 15th hour and then the 16th hour, only the 16th hour’s data shows in the result:

"granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "intervals": [
        "2015-11-23T15:00:00.000Z/2015-11-23T16:00:00.000Z"
    ]
}

Finally I tried this, expanding the interval to a full day (23rd to 24th) regardless of the hours the actual data covers. The behavior is the same as in the previous case:

"granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "intervals": [
        "2015-11-23T00:00:00.000Z/2015-11-24T00:00:00.000Z"
    ]
}

Again, the motivation is to have all hourly data accumulate into only 1 segment PER DAY, yet still have the ability to query by the HOUR, but I seem to be able to query only the last hour ingested no matter what.

How do I achieve my objective?

Geoff

Turn on automatic merging of segments.

Hi,

How do I turn on automatic merging of segments?

Regards,

Martin

One note is that when you specify an interval in your indexing configuration, you must include ALL the data for that interval as input to the task.
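
To illustrate why the 16th hour's run hid the 15th hour's data, here is a rough Python sketch of the behavior described above. It is a toy model and not Druid code: the `Timeline` class, its methods, and the integer versions are all made up for illustration. The only assumption it encodes is that each batch task writes a new version of the DAY segment, and the newest version replaces the older one for that day.

```python
from datetime import datetime, timedelta


def day_bucket(ts):
    """Truncate a timestamp to the start of its DAY segment bucket."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


class Timeline:
    """Hypothetical toy model: one visible segment per DAY bucket, newest version wins."""

    def __init__(self):
        self.segments = {}  # bucket start -> (version, rows)

    def run_batch_task(self, version, rows):
        # Group this task's input rows by the DAY bucket they fall into.
        by_bucket = {}
        for ts, value in rows:
            by_bucket.setdefault(day_bucket(ts), []).append((ts, value))
        # A newer version replaces the whole segment; rows are never merged in.
        for bucket, bucket_rows in by_bucket.items():
            current = self.segments.get(bucket)
            if current is None or version > current[0]:
                self.segments[bucket] = (version, bucket_rows)

    def query_hours(self, day):
        # Return the distinct hours that are visible for the given day.
        _, rows = self.segments.get(day_bucket(day), (0, []))
        return sorted({ts.hour for ts, _ in rows})


day = datetime(2015, 11, 23)
hour15 = [(day + timedelta(hours=15), 1)]
hour16 = [(day + timedelta(hours=16), 1)]

timeline = Timeline()
timeline.run_batch_task(1, hour15)
timeline.run_batch_task(2, hour16)      # input holds only the 16th hour
print(timeline.query_hours(day))        # [16]

timeline.run_batch_task(3, hour15 + hour16)  # input holds all data so far
print(timeline.query_hours(day))        # [15, 16]
```

In this model, the way to end up with one DAY segment that still answers HOUR queries is to give the task every hour of that day as input each time it runs, which matches the note above.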