Compaction task - no space left on device

Still trying to get a grip on Druid. I started my first 'real' compaction task last night, and I think my understanding of how it compacts may be flawed.

{
  "type": "compact",
  "dataSource": "msg_detail",
  "interval": "2019-05-01/2019-06-01",
  "tuningConfig": {
    "type": "index",
    "maxRowsPerSegment": 5000000,
    "maxRowsInMemory": 50000,
    "forceExtendableShardSpecs": true
  }
}

The task ran for about 2 hours, then died with:

2019-06-04T16:30:11,330 INFO [task-runner-0-priority-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://pst-aws-lab.aws-dev./druid/segments/msg_detail/2019-02-25T09:15:00.000Z_2019-02-25T09:30:00.000Z/2019-06-03T02:48:07.508Z/3/e03a782a-8fb2-4db1-a150-fd1887fc7f49/index.zip] to outDir[/opt/data/druid/task/compact_msg_detail_2019-06-04T14:09:12.610Z/work/msg_detail/2019-02-25T09:15:00.000Z_2019-02-25T09:30:00.000Z/2019-06-03T02:48:07.508Z/3]
2019-06-04T16:30:12,114 INFO [task-runner-0-priority-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Loaded 39930401 bytes from [s3://pst-aws-lab.aws-dev./druid/segments/msg_detail/2019-02-25T09:15:00.000Z_2019-02-25T09:30:00.000Z/2019-06-03T02:48:07.508Z/3/e03a782a-8fb2-4db1-a150-fd1887fc7f49/index.zip] to [/opt/data/druid/task/compact_msg_detail_2019-06-04T14:09:12.610Z/work/msg_detail/2019-02-25T09:15:00.000Z_2019-02-25T09:30:00.000Z/2019-06-03T02:48:07.508Z/3]
2019-06-04T16:30:12,115 INFO [task-runner-0-priority-0] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[s3://pst-aws-lab.aws-dev./druid/segments/msg_detail/2019-02-25T09:30:00.000Z_2019-02-25T09:45:00.000Z/2019-06-03T02:48:20.373Z/0/840407c5-4f5d-446f-ab4c-5b99dac6f709/index.zip] to outDir[/opt/data/druid/task/compact_msg_detail_2019-06-04T14:09:12.610Z/work/msg_detail/2019-02-25T09:30:00.000Z_2019-02-25T09:45:00.000Z/2019-06-03T02:48:20.373Z/0]
2019-06-04T16:30:12,535 WARN [task-runner-0-priority-0] com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2019-06-04T16:30:12,535 WARN [task-runner-0-priority-0] com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2019-06-04T16:30:12,537 WARN [task-runner-0-priority-0] org.apache.druid.java.util.common.RetryUtils - Retrying (1 of 2) in 1,062ms.
java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes(Native Method) ~[?:1.8.0_212]
at java.io.RandomAccessFile.write(RandomAccessFile.java:525) ~[?:1.8.0_212]
at org.apache.druid.java.util.common.io.NativeIO.chunkedCopy(NativeIO.java:219) ~[druid-core-0.14.0-incubating-iap10.jar:0.14.0-incubating-iap10]
at org.apache.druid.java.util.common.CompressionUtils.unzip(CompressionUtils.java:312) ~[druid-core-0.14.0-incubating-iap10.jar:0.14.0-in


Naturally, my first question is: which device/node? (It would, IMHO, be nice to get the host name on each log statement, perhaps behind a config parameter to enable it.)

So, my setup: 1 master node, 1 query node, 6 data nodes.

Poking around in the logs (and looking at Grafana), I was able to determine the issue occurred on data node 2.

When I started the compaction process, that node had 500 GB of free space (as did pretty much all the data nodes).

The TOTAL space consumed by the datasource is 660 GB, and the datasource is configured with 15-minute segment granularity.

My question is: why did it consume so much disk space? (Not only that, but only that node's disk space changed.)

My only guess is that it's compacting all of the 15-minute segment windows at once across the full month? (I would have expected it to compact one segment window at a time, especially since I'm not changing the segment granularity.)

Surely if it did one segment window at a time, it would not have run out of disk space?

I am retrying this morning, but setting a 24-hour interval on the compaction task instead of the entire month.
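For reference, the retry spec is the same as above with only the interval narrowed; roughly this (the specific day here is just an example):

{
  "type": "compact",
  "dataSource": "msg_detail",
  "interval": "2019-05-01/2019-05-02",
  "tuningConfig": {
    "type": "index",
    "maxRowsPerSegment": 5000000,
    "maxRowsInMemory": 50000,
    "forceExtendableShardSpecs": true
  }
}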

Cheers

Dan

So I was able to run without issue when using a smaller interval (1 day, again with 15-minute segments, split across 3 Kafka partitions).

What has me puzzled: when the compaction task ran, it seems to have run on only one data node.

Or rather, looking at disk space, I see my datanode-005 dropped in free disk space while the task was running; no other nodes saw this.

Why am I puzzled?

Well, I would have assumed the data for each time window (15-minute segment) is 'sharded' across multiple nodes.

My naive understanding (and gross assumption) was that the compaction task would select a time window (likely starting at the beginning and working through to the last time window in the 24-hour interval) and compact each one.

Now, I would assume that with this config, not all time windows would even have data on this 005 node.

So, my guess as to what is happening:

The compaction TASK was assigned to node 5.

It then pulls data from (S3?) one time window at a time, compacts it, and re-uploads it to S3 (deleting the old S3 segment files?).

Then the coordinator would re-assign segments, etc.

Anyway, my next compaction test:

I ran 2 compactions at once (intervals: 2019-05-02/2019-05-04 for the first, 2019-05-04/2019-05-06 for the second); the two task specs look something like the sketch below.
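Roughly what I submitted (tuningConfig omitted here; I used the same one as in my first task):

{
  "type": "compact",
  "dataSource": "msg_detail",
  "interval": "2019-05-02/2019-05-04"
}

{
  "type": "compact",
  "dataSource": "msg_detail",
  "interval": "2019-05-04/2019-05-06"
}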

This time I would have assumed the compaction tasks would be assigned to different data nodes, but to my surprise they both appear to be running on the same node? Wouldn't the Overlord try to spread out the tasks?

We ran into similar issues when running compaction to compact segments worth a few hundred GBs generated by real-time ingestion. The alternative approach that worked out great was to instead run a Hadoop batch ingestion job with the input being the Druid datasource itself. This scales much better, as we end up having a mapper for every shard.

Below is an example of the input spec to use for such an ingestion job.

"inputSpec": {
  "type": "dataSource",
  "ingestionSpec": {
    "dataSource": "xyz",
    "intervals": [
      "2019-04-22T00:00:00.000Z/2019-04-24T00:00:00.000Z"
    ]
  }
}
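For context, this is roughly where that inputSpec sits in a full Hadoop batch ingestion task. The dataSchema is trimmed to a placeholder here; in a real spec you would carry over your existing parser, metricsSpec, and granularitySpec:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "xyz"
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "xyz",
          "intervals": [
            "2019-04-22T00:00:00.000Z/2019-04-24T00:00:00.000Z"
          ]
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop"
    }
  }
}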

In this scenario, if I understand what you're doing:

  1. You're basically 'moving' data from the streamed-into datasource into a NEW datasource (by using Hadoop batch ingestion), but this approach will allow it to 'compact' (and apparently doesn't actually need Hadoop?), since the datasource you're copying from is in fact just Druid? (Sorry, not up to speed on the Hadoop part; we are using S3 for our 'cold storage'.)

  2. Once your Hadoop task completes, you then need to go back and delete the old data from the 'source' datasource (presumably with something like the kill task sketched below?).
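A minimal sketch of what I'm assuming that delete step would look like: a kill task against the source datasource (dataSource and interval here are placeholders, and my understanding is the segments have to be marked unused first):

{
  "type": "kill",
  "dataSource": "xyz",
  "interval": "2019-04-22T00:00:00.000Z/2019-04-24T00:00:00.000Z"
}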