Druid uri lookup with http

Hi all,

I posted this in the slack, but I thought that it would be good if I asked here too.

I’m interested in using a uri lookup where the lookup data is served over http, but I am getting this error message in my logs:

2020-02-18T14:11:41,743 ERROR [NamespaceExtractionCacheManager-1] org.apache.druid.server.lookup.namespace.cache.CacheScheduler - Failed to update namespace [UriExtractionNamespace{uri=http://localhost:8000/user-lt.csv, uriPrefix=null, namespaceParseSpec=CSVFlatDataParser{columns=[user-id, name], keyColumn='user-id', valueColumn='name'}, fileRegex='null', pollPeriod=PT20S}] : org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@24f3b507
org.apache.druid.java.util.common.IAE: Unknown loader type[http].  Known types are [hdfs, file]
	at org.apache.druid.server.lookup.namespace.UriCacheGenerator.generateCache(UriCacheGenerator.java:77) ~[druid-lookups-cached-global-0.17.0.jar:0.17.0]
	at org.apache.druid.server.lookup.namespace.UriCacheGenerator.generateCache(UriCacheGenerator.java:47) ~[druid-lookups-cached-global-0.17.0.jar:0.17.0]
	at org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl.tryUpdateCache(CacheScheduler.java:229) [druid-lookups-cached-global-0.17.0.jar:0.17.0]
	at org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl.updateCache(CacheScheduler.java:208) [druid-lookups-cached-global-0.17.0.jar:0.17.0]
	at org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl.access$600(CacheScheduler.java:144) [druid-lookups-cached-global-0.17.0.jar:0.17.0]
	at org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl$2.run(CacheScheduler.java:190) [druid-lookups-cached-global-0.17.0.jar:0.17.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]

The Unknown loader type[http] suggests that http is not supported. Is it not possible to have the lookup data on http?

Here is my lookup config:

{
  "__default": {
    "user_lookup": {
      "version": "v7",
      "lookupExtractorFactory": {
        "type": "cachedNamespace",
        "extractionNamespace": {
          "type": "uri",
          "uri": "http://localhost:8000/user-lt.csv",
          "namespaceParseSpec": {
            "format": "csv",
            "columns": ["user-id", "name"]
          },
          "pollPeriod": "PT20S"
        }
      },
      "firstCacheTimeout": 0
    }
  }
}

and my extension load list in conf/druid/single-server/micro-quickstart/_common/common.runtime.properties looks like this:

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "druid-lookups-cached-global"]

My druid version is 0.17 and I am running it using ./bin/start-micro-quickstart. How can I get these lookups to work?

Thanks,

Emily

Update: it turns out that uri lookups over http are not currently supported; as the error message says, the only known loader types are hdfs and file. A feature request to add http support has been made: https://github.com/apache/druid/issues/9377
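
In the meantime, since the error lists file as a known loader type, one possible workaround is to mirror the http file to local disk and point the lookup at the local copy. Below is a minimal sketch of that idea, not something I have run against Druid. The function name mirror_to_file and the destination path are my own assumptions, and it assumes the script runs on the same machine as the Druid processes that load the lookup.

```python
import os
import shutil
import tempfile
import urllib.request


def mirror_to_file(url: str, dest: str, timeout: float = 10.0) -> str:
    """Download `url` and atomically replace `dest` with its contents.

    Hypothetical helper: the name and signature are illustrative only.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)

    # Write to a temp file in the same directory first, so that a reader
    # polling `dest` never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, \
                urllib.request.urlopen(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, tmp)
        # mkstemp creates the file as owner-only; widen the permissions in
        # case the reader runs as a different user (an assumption).
        os.chmod(tmp_path, 0o644)
        # os.replace overwrites `dest` in a single step.
        os.replace(tmp_path, dest)
    except BaseException:
        # Clean up the partial download and leave the old copy in place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest
```

Calling mirror_to_file("http://localhost:8000/user-lt.csv", "/tmp/druid-lookups/user-lt.csv") on a schedule (cron, or a loop with a sleep) would keep the local copy fresh. The "uri" in the lookup config would then presumably be a file URI for that path, such as file:/tmp/druid-lookups/user-lt.csv, instead of the http one.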

Emily