Hi all,
I am trying to upgrade a cluster from 0.10.1 to 0.12.0 and have run into a problem with the lookups feature.
I started with one historical node and upgraded it from 0.10.1 to 0.12.0. Everything went smoothly, except that lookups do not work on queries sent to that node.
I checked the log of the coordinator (which is still on 0.10.1) and found this exception:
2018-03-15T14:19:23,620 ERROR [LookupCoordinatorManager--7] io.druid.server.lookup.cache.LookupCoordinatorManager - Failed to finish lookup management loop.: {class=io.druid.server.lookup.cache.LookupCoordinatorManager, exceptionType=class java.lang.IllegalStateException, exceptionMessage=null}
java.lang.IllegalStateException
    at com.google.common.base.Preconditions.checkState(Preconditions.java:161) ~[guava-16.0.1.jar:?]
    at com.google.common.net.HostAndPort.getPort(HostAndPort.java:110) ~[guava-16.0.1.jar:?]
    at io.druid.server.lookup.cache.LookupCoordinatorManager.lookupManagementLoop(LookupCoordinatorManager.java:517) ~[druid-server-0.10.1.jar:0.10.1]
    at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run(MoreExecutors.java:582) [guava-16.0.1.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_151]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
I then checked ZooKeeper and noticed that the upgraded node shows up as http:historical-hostname-X:8083 under /druid/listeners/lookups/__default, while every other node shows up as historical-hostname-Y:8083. It seems that pull request https://github.com/druid-io/druid/pull/4270, which was included in the 0.11.0 release, changed the host format to include the scheme.
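My working theory for the exception, as a rough sketch: the stack trace shows the 0.10.1 coordinator failing in Guava's HostAndPort.getPort, and I assume the parser only extracts a port when the string contains exactly one colon. The class and method names below are hypothetical and this is a simplified stand-in for that rule, not Guava's or Druid's actual code.

```java
import java.util.OptionalInt;

// Hypothetical, simplified stand-in for a host:port splitting rule.
// Assumption: a port is only recognized when the string has exactly one colon.
final class HostPortSketch {

    /** Returns the port if the string has exactly one colon, otherwise empty. */
    static OptionalInt parsePort(String hostPort) {
        int first = hostPort.indexOf(':');
        boolean exactlyOneColon = first >= 0 && hostPort.indexOf(':', first + 1) == -1;
        if (!exactlyOneColon) {
            // No colon: no port. Several colons: the whole string is assumed
            // to be taken as the host, so no port is available.
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(hostPort.substring(first + 1)));
    }

    public static void main(String[] args) {
        // Old-style entry announced by the 0.10.1 nodes.
        System.out.println(parsePort("historical-hostname-Y:8083"));
        // New-style entry announced by the upgraded 0.12.0 node.
        System.out.println(parsePort("http:historical-hostname-X:8083"));
    }
}
```

If that assumption holds, the old-style entry yields port 8083, while the scheme-prefixed entry yields no port at all, which would explain a failed state check when the coordinator asks for the port.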
I could not find anything about this format change in the 0.11.0 or 0.12.0 release notes. Is there an upgrade path from 0.10.1 to 0.12.0 that does not incur downtime on the query nodes? Ingestion can be delayed: we run batch jobs every few minutes, so those can be paused for a while.
Thanks,
Alex