Hi Mark and Tijo,
Thanks for your valuable input, and sorry for the late reply. Since the segments in deep storage were corrupted, I had to clear them and re-ingest the data from scratch.
Is there an easier way or tool to back up and restore this data?
If we end up in a similar situation, we might lose a lot of time again. Having a backup that can be restored would help us a lot.
Also, we are currently running ingestion and aggregation again with the newer data. Although the aggregation works properly (with barely 1% of the queries failing), we see similar problems in the historical logs.
The historicals go down constantly, while the aggregation eventually succeeds for most of the queries fired.
{"instant":{"epochSecond":1655455133,"nanoOfSecond":438000000},"thread":"ZKCoordinator--0","level":"ERROR","loggerName":"org.apache.druid.server.coordination.SegmentLoadDropHandler","message":"Failed to load segment for dataSource: {class=org.apache.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[pmdata_2022-06-17T03:00:00.000Z_2022-06-17T04:00:00.000Z_2022-06-17T03:00:30.263Z_162]
We also see the exception below while loading a segment from HDFS:
{"instant":{"epochSecond":1655446838,"nanoOfSecond":613000000},"thread":"ZKCoordinator--0","level":"ERROR","loggerName":"org.apache.druid.segment.loading.SegmentLocalCacheManager","message":"Failed to load segment in current location [/opt/druid/var/druid/segment-cache], try next location if any: {class=org.apache.druid.segment.loading.SegmentLocalCacheManager, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Error loading [hdfs://apache-hadoop-namenode.apps.svc.cluster.local:8020/druid/segments/pmdata/20220617T030000.000Z_20220617T040000.000Z/2022-06-17T03_00_30.263Z/179_b1445888-b60b-4916-a966-7aa22184fc23_index.zip], location=/opt/druid/var/druid/segment-cache}","thrown":{"commonElementCount":0,"localizedMessage":"Error loading [hdfs://apache-hadoop-namenode.apps.svc.cluster.local:8020/druid/segments/pmdata/20220617T030000.000Z_20220617T040000.000Z/2022-06-17T03_00_30.263Z/179_b1445888-b60b-4916-a966-7aa22184fc23_index.zip]","message":"Error loading [hdfs://apache-hadoop-namenode.apps.svc.cluster.local:8020/druid/segments/pmdata/20220617T030000.000Z_20220617T040000.000Z/2022-06-17T03_00_30.263Z/179_b1445888-b60b-4916-a966-7aa22184fc23_index.zip]","name":"org.apache.druid.segment.loading.SegmentLoadingException","cause":{"commonElementCount":18,"localizedMessage":"No space left on device","message":"No space left on device","name":"java.io.IOException","extendedStackTrace":[{"class":"java.io.RandomAccessFile","method":"writeBytes","file":"RandomAccessFile.java","line":-2,"exact":false,"location":"?","version":"1.8.0_332"},{"class":"java.io.RandomAccessFile","method":"write","file":"RandomAccessFile.java","line":525,"exact":false,"location":"?","version":"1.8.0_332"},{"class":"org.apache.druid.java.util.common.io.NativeIO","method":"chunkedCopy","file":"NativeIO.java","line":221,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.utils.CompressionUtils","method":"unzip","file":"CompressionUtils.java","line":304,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.utils.CompressionUtils","method":"lambda$unzip$1","file":"CompressionUtils.java","line":188,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":129,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":81,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":163,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.java.util.common.RetryUtils","method":"retry","file":"RetryUtils.java","line":153,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.utils.CompressionUtils","method":"unzip","file":"CompressionUtils.java","line":187,"exact":false,"location":"druid-core-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.storage.hdfs.HdfsDataSegmentPuller","method":"getSegmentFiles","file":"HdfsDataSegmentPuller.java","line":243,"exact":false,"location":"?","version":"?"}]},"extendedStackTrace":[{"class":"org.apache.druid.storage.hdfs.HdfsDataSegmentPuller","method":"getSegmentFiles","file":"HdfsDataSegmentPuller.java","line":292,"exact":false,"location":"?","version":"?"},{"class":"org.apache.druid.storage.hdfs.HdfsLoadSpec","method":"loadSegment","file":"HdfsLoadSpec.java","line":57,"exact":false,"location":"?","version":"?"},{"class":"org.apache.druid.segment.loading.SegmentLocalCacheManager","method":"loadInLocation","file":"SegmentLocalCacheManager.java","line":327,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.segment.loading.SegmentLocalCacheManager","method":"loadInLocationWithStartMarker","file":"SegmentLocalCacheManager.java","line":315,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.segment.loading.SegmentLocalCacheManager","method":"loadInLocationWithStartMarkerQuietly","file":"SegmentLocalCacheManager.java","line":276,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.segment.loading.SegmentLocalCacheManager","method":"loadSegmentWithRetry","file":"SegmentLocalCacheManager.java","line":255,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.segment.loading.SegmentLocalCacheManager","method":"getSegmentFiles","file":"SegmentLocalCacheManager.java","line":211,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.segment.loading.SegmentLocalCacheLoader","method":"getSegment","file":"SegmentLocalCacheLoader.java","line":52,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.server.SegmentManager","method":"getSegmentReference","file":"SegmentManager.java","line":272,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.server.SegmentManager","method":"loadSegment","file":"SegmentManager.java","line":219,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.server.coordination.SegmentLoadDropHandler","method":"loadSegment","file":"SegmentLoadDropHandler.java","line":278,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.server.coordination.SegmentLoadDropHandler","method":"addSegment","file":"SegmentLoadDropHandler.java","line":329,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.server.coordination.SegmentChangeRequestLoad","method":"go","file":"SegmentChangeRequestLoad.java","line":61,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.server.coordination.ZkCoordinator","method":"lambda$childAdded$2","file":"ZkCoordinator.java","line":150,"exact":false,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"java.util.concurrent.Executors$RunnableAdapter","method":"call","file":"Executors.java","line":511,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.FutureTask","method":"run","file":"FutureTask.java","line":266,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.ThreadPoolExecutor","method":"runWorker","file":"ThreadPoolExecutor.java","line":1149,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.ThreadPoolExecutor$Worker","method":"run","file":"ThreadPoolExecutor.java","line":624,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.lang.Thread","method":"run","file":"Thread.java","line":750,"exact":true,"location":"?","version":"1.8.0_332"}]},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":85,"threadPriority":5,"timestamp":"2022-06-17T06:20:38.613+0000"}
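The nested cause in the second log entry is a java.io.IOException with the message "No space left on device", raised while the segment zip was being unpacked into /opt/druid/var/druid/segment-cache. Below is a minimal sketch for checking how full the filesystem behind that directory is. It assumes Python 3 is available on the historical node or pod; the function name `cache_disk_report` is hypothetical and not part of Druid.

```python
import shutil


def cache_disk_report(path: str) -> dict:
    """Return total, used and free bytes for the filesystem that holds `path`."""
    # shutil.disk_usage reports on the whole filesystem containing `path`,
    # not only on the files stored under that directory.
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        # Guard against a zero-sized filesystem to avoid division by zero.
        "used_fraction": usage.used / usage.total if usage.total else 0.0,
    }
```

Calling `cache_disk_report("/opt/druid/var/druid/segment-cache")` on a historical (the path is taken from the log above) would show whether the volume is actually full when the load fails.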
Regards,
Chaitanya