Migrated from Single Server to cluster, now my historicals are throwing errors

Hi, new user here. I recently migrated my Druid installation from a single server to a cluster, following the “Clustered Deployment” tutorial.

Things generally work, but my historical nodes seem to have some problem with old data that causes the logs to churn like bananas, and I don’t really get why. I’d love some advice to help me find out more.

I’ll describe my architecture in detail below, but basically I have 1 controller, 3 data servers, 1 query server, a postgres server for metadata storage and a server running minio pretending to be AWS S3. I load my data from Kafka.

Queries run fine, but 2 of the 3 historical nodes absolutely trash their hard drives with logs. The relevant log message is always something like this:

Caused by: java.lang.IllegalArgumentException: [/home/druid/druid/var/druid/segments/telemetry-signals-tagged/2021-03-12T13:00:00.000Z_2021-03-12T14:00:00.000Z/2021-03-24T12:42:13.180Z/3/9b4949ac-7381-4c34-9605-c6412c0d5adf/index.zip] does not exist

Note the path of the missing file: the directory /home/druid/druid/var/druid/segments/ does not exist. I assume it is where the segments were stored back when I was still using a single-server deployment.

I am assuming it’s not minio’s fault since loading and distribution of segments works partially. If minio didn’t work as an S3 replacement, it would fail completely.

Please tell me how I can mitigate this!

  • Ideally I’d like to know where the paths are stored so I can try to fix them manually or in batch (a cursory search of the metadata postgres revealed nothing)
  • Worst case, I’d like to see a list of all missing segments and just drop them, since they’re not loaded right now anyway.
  • Tips on how to dig deeper into diagnosing the problem are also very much appreciated.

Things I've tried
  • Dropping Segments: At first I thought this only affected very old segments, so I dropped the oldest two months or so of my data. This didn’t help; the error messages just moved on to reporting newer segments as missing.
  • Resetting my master server: Setting up a new master server and replacing the old one did not help.
  • Resetting the data servers: Setting up new data servers did not help either.

Architecture
  • druid-controller-1: 10.0.1.11: Unused for now
  • druid-controller-2: 10.0.0.6: Runs ZK and master using start-cluster-master-with-zk-server
  • druid-data-1: 10.0.1.21: Runs historical and middle manager using start-cluster-data-server. Somehow this server has all the data and never complains about files not existing.
  • druid-data-2: 10.0.1.22: Runs historical and middle manager using start-cluster-data-server. Shows the file does not exist error frequently.
  • druid-data-3: 10.0.1.23: Runs historical and middle manager using start-cluster-data-server. Shows the file does not exist error frequently.
  • druid-query-1: 10.0.1.31: Runs router and broker using start-cluster-query-server
  • druid-metadata-storage: 10.0.0.7: Runs a postgres database
  • druid-deep-storage: 10.0.0.8: Runs a minio instance

Logs and Config Files

common.runtime.properties

druid.host=host

druid.startup.logging.logProperties=true

druid.zk.service.host=10.0.0.6
druid.zk.paths.base=/druid

druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://10.0.0.7:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=thisistotallymyrealpassword

druid.storage.type=s3
druid.storage.bucket=deepstorage
druid.storage.baseKey=druid/segments
druid.s3.accessKey=some_user
druid.s3.secretKey=ofcourseIampastingmysecretkeyintothis

druid.s3.protocol=http
druid.s3.enablePathStyleAccess=true
druid.s3.endpoint.url=http://10.0.0.8:9000

druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=logs
druid.indexer.logs.s3Prefix=druid/indexing-logs

druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
druid.emitter.logging.logLevel=info

druid.indexing.doubleStorage=double

druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"]

druid.sql.enable=true
druid.lookup.enableLookupSyncOnStartup=false

historical.log from data-2

Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of `org.apache.druid.segment.loading.LocalLoadSpec`, problem: [/home/druid/druid/var/druid/segments/telemetry-signals-tagged/2021-04-24T07:00:00.000Z_2021-04-24T08:00:00.000Z/2021-04-24T07:00:02.680Z/0/6f2ad38e-efca-4b25-a1dc-5c4c85c16853/index.zip] does not exist
 at [Source: UNKNOWN; line: -1, column: -1]
	at com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1735) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:491) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:514) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:285) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:229) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:198) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:488) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1292) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:194) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3933) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3869) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:303) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:292) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:253) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:225) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:186) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:278) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:224) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:272) ~[druid-server-0.21.1.jar:0.21.1]
	... 8 more
Caused by: java.lang.IllegalArgumentException: [/home/druid/druid/var/druid/segments/telemetry-signals-tagged/2021-04-24T07:00:00.000Z_2021-04-24T08:00:00.000Z/2021-04-24T07:00:02.680Z/0/6f2ad38e-efca-4b25-a1dc-5c4c85c16853/index.zip] does not exist
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148) ~[guava-16.0.1.jar:?]
	at org.apache.druid.segment.loading.LocalLoadSpec.<init>(LocalLoadSpec.java:51) ~[druid-server-0.21.1.jar:0.21.1]
	at sun.reflect.GeneratedConstructorAccessor58.newInstance(Unknown Source) ~[?:?]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_292]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_292]
	at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:229) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:198) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:488) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1292) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:194) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3933) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3869) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:303) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:292) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:253) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:225) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:186) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:278) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:224) ~[druid-server-0.21.1.jar:0.21.1]
	at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:272) ~[druid-server-0.21.1.jar:0.21.1]
	... 8 more
2021-08-10T13:39:19,503 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: telemetry-signals-tagged_2021-04-24T07:00:00.000Z_2021-04-24T08:00:00.000Z_2021-04-24T07:00:02.680Z]
2021-08-10T13:39:19,503 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.0.1.23:8083/telemetry-signals-tagged_2021-04-24T07:00:00.000Z_2021-04-24T08:00:00.000Z_2021-04-24T07:00:02.680Z] was removed
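
For anyone wanting a list of the affected segments out of a log like the one above, here is a minimal Python sketch. It only relies on the "[...index.zip] does not exist" wording and the directory layout visible in the excerpt. The helper names are made up for illustration, and the rule for rebuilding a segment id (no suffix for partition 0, `_<n>` otherwise) is an assumption inferred from the ids that appear in these logs.

```python
import re

# Matches the "[<path>/index.zip] does not exist" fragment seen in the log above.
MISSING_FILE = re.compile(r"\[(?P<path>[^\]]+/index\.zip)\] does not exist")


def missing_segment_paths(log_lines):
    """Return the distinct missing index.zip paths, in first-seen order."""
    seen = {}
    for line in log_lines:
        match = MISSING_FILE.search(line)
        if match:
            seen.setdefault(match.group("path"), None)
    return list(seen)


def segment_id_from_path(path):
    """Rebuild a segment id from
    .../<datasource>/<interval>/<version>/<partition>/<uuid>/index.zip.

    Assumption: partition 0 carries no suffix, other partitions append _<n>.
    """
    datasource, interval, version, partition = path.split("/")[-6:-2]
    segment_id = f"{datasource}_{interval}_{version}"
    return segment_id if partition == "0" else f"{segment_id}_{partition}"


# Path copied from the historical.log excerpt; the same path shows up in
# several "Caused by" lines, so it should be reported only once.
PATH = (
    "/home/druid/druid/var/druid/segments/telemetry-signals-tagged/"
    "2021-04-24T07:00:00.000Z_2021-04-24T08:00:00.000Z/"
    "2021-04-24T07:00:02.680Z/0/6f2ad38e-efca-4b25-a1dc-5c4c85c16853/index.zip"
)
SAMPLE = [
    f"Caused by: java.lang.IllegalArgumentException: [{PATH}] does not exist",
    f"Caused by: java.lang.IllegalArgumentException: [{PATH}] does not exist",
    "\t... 8 more",
]

for missing in missing_segment_paths(SAMPLE):
    print(segment_id_from_path(missing))
```

To run it against a real file, something like `missing_segment_paths(open("historical.log"))` should work, since the function only iterates over lines.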

coordinator-overlord.log from controller-2

2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T16:00:00.000Z_2021-01-01T17:00:00.000Z_2021-03-23T04:50:49.855Z_14] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T15:00:00.000Z_2021-01-01T16:00:00.000Z_2021-03-23T04:50:19.026Z_12] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T14:00:00.000Z_2021-01-01T15:00:00.000Z_2021-03-24T20:52:26.514Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T13:00:00.000Z_2021-01-01T14:00:00.000Z_2021-03-23T08:03:36.019Z_11] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T12:00:00.000Z_2021-01-01T13:00:00.000Z_2021-03-24T20:51:38.526Z_5] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T11:00:00.000Z_2021-01-01T12:00:00.000Z_2021-03-23T09:22:42.188Z_12] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T10:00:00.000Z_2021-01-01T11:00:00.000Z_2021-03-23T08:02:43.834Z_12] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T09:00:00.000Z_2021-01-01T10:00:00.000Z_2021-03-23T08:02:37.244Z_11] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T08:00:00.000Z_2021-01-01T09:00:00.000Z_2021-03-23T08:02:11.249Z_12] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T07:00:00.000Z_2021-01-01T08:00:00.000Z_2021-03-24T15:49:36.978Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T06:00:00.000Z_2021-01-01T07:00:00.000Z_2021-03-24T20:55:05.296Z_5] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T05:00:00.000Z_2021-01-01T06:00:00.000Z_2021-03-24T20:55:05.094Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T04:00:00.000Z_2021-01-01T05:00:00.000Z_2021-03-24T20:55:05.049Z_3] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T03:00:00.000Z_2021-01-01T04:00:00.000Z_2021-03-24T20:54:59.528Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T02:00:00.000Z_2021-01-01T03:00:00.000Z_2021-03-24T20:54:59.297Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T01:00:00.000Z_2021-01-01T02:00:00.000Z_2021-03-24T20:54:58.995Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2021-01-01T00:00:00.000Z_2021-01-01T01:00:00.000Z_2021-03-24T20:54:58.942Z_6] in tier [_default_tier]
2021-08-10T13:21:59,293 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,294 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment [telemetry-signals-tagged_2020-12-31T23:00:00.000Z_2021-01-01T00:00:00.000Z_2021-03-23T07:59:43.044Z_14] to server [10.0.1.23:8083] in tier [_default_tier]
2021-08-10T13:21:59,294 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,295 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment [telemetry-signals-tagged_2020-12-31T22:00:00.000Z_2020-12-31T23:00:00.000Z_2021-03-23T07:59:11.472Z_11] to server [10.0.1.22:8083] in tier [_default_tier]
2021-08-10T13:21:59,295 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,296 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment [telemetry-signals-tagged_2020-12-31T21:00:00.000Z_2020-12-31T22:00:00.000Z_2021-03-23T07:59:02.400Z_10] to server [10.0.1.23:8083] in tier [_default_tier]
2021-08-10T13:21:59,296 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,296 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Throttling replication for segment [telemetry-signals-tagged_2020-12-31T20:00:00.000Z_2020-12-31T21:00:00.000Z_2021-03-24T16:52:55.564Z_8] in tier [_default_tier]
2021-08-10T13:21:59,296 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2021-08-10T13:21:59,350 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - Found 3 active servers, 0 decommissioning servers
2021-08-10T13:21:59,350 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - Processing 4 segments for moving from decommissioning servers
2021-08-10T13:21:59,350 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - All servers to move segments from are empty, ending run.
2021-08-10T13:21:59,350 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - Processing 5 segments for balancing between active servers
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - [_default_tier]: Segments Moved: [1] Segments Let Alone: [4]
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Assigned 253 segments among 3 servers
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Dropped 0 segments among 3 servers
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Moved 1 segment(s)
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Let alone 4 segment(s)
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Load Queues:
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Server[10.0.1.23:8083, historical, _default_tier] has 1 left to load, 0 left to drop, 170,968 bytes queued, 337,219,668 bytes served.
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Server[10.0.1.22:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 423,058,261 bytes served.
2021-08-10T13:21:59,375 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Server[10.0.1.21:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 2,157,019,084 bytes served.
2021-08-10T13:22:00,133 INFO [KafkaSupervisor-telemetry-signals-tagged] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [telemetry-signals-tagged] supervisor is running.
2021-08-10T13:22:00,133 INFO [KafkaSupervisor-telemetry-signals-tagged] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id='telemetry-signals-tagged', generationTime=2021-08-10T13:22:00.133Z, payload=KafkaSupervisorReportPayload{dataSource='telemetry-signals-tagged', topic='telemetry-signals-tagged', partitions=1, replicas=1, durationSeconds=3600, active=[{id='index_kafka_telemetry-signals-tagged_c3282f66ec65ace_pcggidaf', startTime=2021-08-10T12:42:36.694Z, remainingSeconds=1236}], publishing=[], suspended=false, healthy=true, state=RUNNING, detailedState=RUNNING, recentErrors=[]}}

Relates to Apache Druid 0.21.1

Is it possible that the path should be:
/home/druid/var/druid/segments/
rather than
/home/druid/druid/var/druid/segments/
?

Hey @BTS, you can look inside the metadata store to see exactly where Druid has recorded each of your segments, on a segment-by-segment basis. Here I can see the bucket that a segment is in, for example:
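
One way to sketch that kind of check in Python, under assumptions that are mine and not from this thread: the segment rows live in the default `druid_segments` table, each row's `payload` column holds the segment descriptor as JSON (possibly as bytes in PostgreSQL), and the descriptor has a `loadSpec` object whose `type` is `local` for on-disk segments and `s3_zip` for S3-style deep storage. The sample ids and paths are hypothetical, and fetching the rows (for example with `SELECT id, payload FROM druid_segments WHERE used = true`) is left to whatever database client is at hand.

```python
import json
from collections import defaultdict


def group_by_load_spec(rows):
    """Group segment ids by the `type` field of their loadSpec.

    `rows` is an iterable of (segment_id, payload) pairs, where payload is the
    JSON document from the metadata store, given as str or bytes.
    """
    groups = defaultdict(list)
    for segment_id, payload in rows:
        if isinstance(payload, (bytes, bytearray, memoryview)):
            payload = bytes(payload).decode("utf-8")
        load_spec = json.loads(payload).get("loadSpec", {})
        groups[load_spec.get("type", "unknown")].append(segment_id)
    return dict(groups)


# Hypothetical rows standing in for the result of the metadata query.
SAMPLE_ROWS = [
    (
        "example_segment_a",
        json.dumps({"loadSpec": {"type": "local",
                                 "path": "/old/single-server/path/index.zip"}}),
    ),
    (
        "example_segment_b",
        json.dumps({"loadSpec": {"type": "s3_zip", "bucket": "deepstorage",
                                 "key": "druid/segments/example/index.zip"}}).encode("utf-8"),
    ),
]

for spec_type, ids in group_by_load_spec(SAMPLE_ROWS).items():
    print(spec_type, ids)
```

Segments that end up under `local` would be the ones still pointing at a filesystem path rather than at the bucket.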

If you want to just get rid of the segments, mark them as unused using the Drop rules, and then you can either use the Kill API or you can set up automatic kill.

There’s also a tool to migrate – though I’ve never used it :smiley:

Hi Rachel and Peter,

Rachel, I’ve just double-checked and the directory is correct. I have a druid directory inside my user’s home directory at /home/druid/. I believe the problem was introduced by the migration.

Peter, thanks, that looks very promising. I’ve run a query on the postgres server and found the list of segments. Can I just delete these entries from the database, or would that have any bad side effects?

Thanks, you two!
Daniel

Best to use the Drop rules if you can, tbh, and then run Kill: this deletes the segment metadata as well as the files in deep storage.

Ohhh, you can also mark segments as unused through the API.
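
A rough Python sketch of what those two calls could look like, to be checked against the API docs for your Druid version before use. It assumes the Coordinator's `markUnused` endpoint accepts a JSON body with `segmentIds`, and that a kill is submitted as a task of type `kill` to the Overlord's task endpoint. The base URL and helper names are hypothetical. The functions only build the requests and send nothing, because a kill permanently removes the segments from metadata and deep storage.

```python
import json
import urllib.request

# Hypothetical address; in this thread Coordinator and Overlord share one process.
BASE_URL = "http://coordinator.example:8081"


def _json_post(url, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mark_unused_request(datasource, segment_ids, base_url=BASE_URL):
    """Request that marks the given segment ids of a datasource as unused."""
    url = f"{base_url}/druid/coordinator/v1/datasources/{datasource}/markUnused"
    return _json_post(url, {"segmentIds": list(segment_ids)})


def kill_task_request(datasource, interval, base_url=BASE_URL):
    """Request that submits a kill task for unused segments in an interval."""
    url = f"{base_url}/druid/indexer/v1/task"
    return _json_post(url, {"type": "kill", "dataSource": datasource,
                            "interval": interval})


req = mark_unused_request(
    "telemetry-signals-tagged",
    ["telemetry-signals-tagged_2021-04-24T07:00:00.000Z_"
     "2021-04-24T08:00:00.000Z_2021-04-24T07:00:02.680Z"],
)
print(req.get_method(), req.full_url)
```

Sending would be a matter of passing the request to `urllib.request.urlopen(req)` once the target and the segment list have been double-checked.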

Thanks! I’ll play around with that and report back :grinning:

Soo here’s an update: I marked the segments that were somehow stored locally as unused. It turns out that was basically all the data from before my move, so I must have botched the migration somehow. I killed those segments and replayed the data from my long-term storage (I log all the JSONs to files before handing them over to Kafka, to prevent data loss in exactly these kinds of cases). Now things work as expected.

Thank you very much for your help.