[druid-user] Druid Data source tab shows 0 rows all of a sudden

Hi,

Druid ingestion has been running for almost 3 days now. However, I see that the Datasources tab doesn't show any rows.

The datasource also isn't visible when querying.

Any idea how to get past this issue?
Now I see the Broker and historicals restarting, with the error below seen in the broker logs:

{"instant":{"epochSecond":1658126146,"nanoOfSecond":180000000},"thread":"ServerInventoryView-0","level":"WARN","loggerName":"org.apache.druid.curator.inventory.CuratorInventoryManager","message":"Exception while getting data for node /druid/segments/:8103/:8103_indexer-executor__default_tier_2022-07-18T06:25:25.846Z_c18fec55f1b14ed48a57726e0625c2580","thrown":{"commonElementCount":0,"localizedMessage":"KeeperErrorCode = NoNode for /druid/segments/:8103/:8103_indexer-executor__default_tier_2022-07-18T06:25:25.846Z_c18fec55f1b14ed48a57726e0625c2580","message":"KeeperErrorCode = NoNode for /druid/segments/:8103/:8103_indexer-executor__default_tier_2022-07-18T06:25:25.846Z_c18fec55f1b14ed48a57726e0625c2580","name":"org.apache.zookeeper.KeeperException$NoNodeException","extendedStackTrace":[{"class":"org.apache.zookeeper.KeeperException","method":"create","file":"KeeperException.java","line":118,"exact":false,"location":"zookeeper-3.5.9.jar","version":"3.5.9"},{"class":"org.apache.zookeeper.KeeperException","method":"create","file":"KeeperException.java","line":54,"exact":false,"location":"zookeeper-3.5.9.jar","version":"3.5.9"},{"class":"org.apache.zookeeper.ZooKeeper","method":"getData","file":"ZooKeeper.java","line":2131,"exact":false,"location":"zookeeper-3.5.9.jar","version":"3.5.9"},{"class":"org.apache.curator.framework.imps.GetDataBuilderImpl$4","method":"call","file":"GetDataBuilderImpl.java","line":327,"exact":false,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.imps.GetDataBuilderImpl$4","method":"call","file":"GetDataBuilderImpl.java","line":316,"exact":false,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.connection.StandardConnectionHandlingPolicy","method":"callWithRetry","file":"StandardConnectionHandlingPolicy.java","line":67,"exact":false,"location":"curator-client-4.3.0.jar","version":"?"},{"class":"org.apache.curator.RetryLoop","method":"callWithRetry","file":"RetryLoop.java","line":81,"exact":false,"location":"curator-client-4.3.0.jar","version":"?"},{"class":"org.apache.curator.framework.imps.GetDataBuilderImpl","method":"pathInForeground","file":"GetDataBuilderImpl.java","line":313,"exact":false,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.imps.GetDataBuilderImpl","method":"forPath","file":"GetDataBuilderImpl.java","line":304,"exact":false,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.imps.GetDataBuilderImpl$1","method":"forPath","file":"GetDataBuilderImpl.java","line":107,"exact":false,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.imps.GetDataBuilderImpl$1","method":"forPath","file":"GetDataBuilderImpl.java","line":67,"exact":false,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.druid.curator.inventory.CuratorInventoryManager","method":"getZkDataForNode","file":"CuratorInventoryManager.java","line":188,"exact":true,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.curator.inventory.CuratorInventoryManager","method":"access$400","file":"CuratorInventoryManager.java","line":59,"exact":true,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.druid.curator.inventory.CuratorInventoryManager$ContainerCacheListener$InventoryCacheListener","method":"childEvent","file":"CuratorInventoryManager.java","line":389,"exact":true,"location":"druid-server-0.22.1.jar","version":"0.22.1"},{"class":"org.apache.curator.framework.recipes.cache.PathChildrenCache$5","method":"apply","file":"PathChildrenCache.java","line":538,"exact":true,"location":"curator-recipes-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.recipes.cache.PathChildrenCache$5","method":"apply","file":"PathChildrenCache.java","line":532,"exact":true,"location":"curator-recipes-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.listen.ListenerContainer$1","method":"run","file":"ListenerContainer.java","line":100,"exact":true,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor","method":"execute","file":"DirectExecutor.java","line":30,"exact":true,"location":"curator-client-4.3.0.jar","version":"?"},{"class":"org.apache.curator.framework.listen.ListenerContainer","method":"forEach","file":"ListenerContainer.java","line":92,"exact":true,"location":"curator-framework-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.recipes.cache.PathChildrenCache","method":"callListeners","file":"PathChildrenCache.java","line":530,"exact":true,"location":"curator-recipes-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.recipes.cache.EventOperation","method":"invoke","file":"EventOperation.java","line":35,"exact":true,"location":"curator-recipes-4.3.0.jar","version":"4.3.0"},{"class":"org.apache.curator.framework.recipes.cache.PathChildrenCache$9","method":"run","file":"PathChildrenCache.java","line":808,"exact":true,"location":"curator-recipes-4.3.0.jar","version":"4.3.0"},{"class":"java.util.concurrent.Executors$RunnableAdapter","method":"call","file":"Executors.java","line":511,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.FutureTask","method":"run","file":"FutureTask.java","line":266,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.Executors$RunnableAdapter","method":"call","file":"Executors.java","line":511,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.FutureTask","method":"run","file":"FutureTask.java","line":266,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.ThreadPoolExecutor","method":"runWorker","file":"ThreadPoolExecutor.java","line":1149,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.util.concurrent.ThreadPoolExecutor$Worker","method":"run","file":"ThreadPoolExecutor.java","line":624,"exact":true,"location":"?","version":"1.8.0_332"},{"class":"java.lang.Thread","method":"run","file":"Thread.java","line":750,"exact":true,"location":"?","version":"1.8.0_332"}]},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":110,"threadPriority":5,"timestamp":"2022-07-18T06:35:46.180+0000"}
{"instant":{"epochSecond":1658126146,"nanoOfSecond":196000000},"thread":"ServerInventoryView-0","level":"WARN","loggerName":"org.apache.druid.curator.inventory.CuratorInventoryManager","message":"Ignoring event: Type - CHILD_ADDED , Path - /druid/segments/:8103/:8103_indexer-executor__default_tier_2022-07-18T06:25:25.846Z_c18fec55f1b14ed48a57726e0625c2580 , Version - 2","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","threadId":110,"threadPriority":5,"timestamp":"2022-07-18T06:35:46.196+0000"}

Could someone help us with this query?

Regards,
Chaitanya

Hey, I'm not able to see the screenshot very well, I'm afraid. You may have better luck posting it in the Druid Forum?

What I can say, though, is that the coordinator is the service that queries your metadata database to find out which segments exist and are “used” – it then instructs the historicals, via Zookeeper, to go and get that data. Then, as the historicals load their data, they advertise what they have back through Zookeeper so that the broker knows what you can query.

I would start by looking at the coordinator logs to check that it can read the metadata database OK; you should also see it sending messages in the log to the historicals to load their data.

In the historical logs, you'll see them pick up those requests from Zookeeper and then go and pull the segments out of deep storage.
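If it helps, one quick way to see what the broker currently knows about is Druid's sys.segments table – this is just a sketch, assuming you can reach the query console or API:

    -- Roughly: how many segments the broker knows about per datasource,
    -- and how many of those are actually loaded (available) on historicals
    SELECT "datasource",
           COUNT(*)          AS known_segments,
           SUM(is_available) AS available_segments
    FROM sys.segments
    GROUP BY 1

If available_segments is well below known_segments, the problem usually sits on the coordinator/historical side rather than in ingestion.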

I hope that’s a help as a start?

Some of these concepts are covered in the Druid Basics course at https://learn.imply.io which may help you find the root cause.

Regarding the query, is it large? If it is, you might want to filter by time.

Out of curiosity, what’s your druid.server.http.maxSubqueryRows?
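(If it isn't set, I believe the default is 100,000 rows. It lives in the broker's runtime.properties, something like the line below – the value shown is just that default, not a recommendation for your cluster:)

    druid.server.http.maxSubqueryRows=100000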

Hi,

druid.server.http.maxSubqueryRows isn't set in my YAML, so it probably takes the default value.
Yes, the data is large: there are around 1 billion rows, which occupy 2.3 TB of data.

Multiple queries are fired concurrently by up to 30 users. Some queries have up to 100 counters.

Below is the config of my historicals
historicals:
  druidPort: 8083
  extraJvmOptions: |-
    -Xmx15g
    -Xms512M
    -XX:+UseG1GC
    -XX:+ExitOnOutOfMemoryError
    -XX:+PrintGCDetails
    -XX:MaxDirectMemorySize=13g
  nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/historical
  nodeType: historical
  replicas: 2
  resources:
    limits:
      cpu: 16
      memory: 50Gi
    requests:
      cpu: 500m
      memory: 20Gi
  runtimeProperties: |
    druid.service=druid/historical
    druid.plaintextPort=8083
    druid.server.http.numThreads=50
    druid.processing.buffer.sizeBytes=500MiB
    druid.processing.numMergeBuffers=4
    druid.processing.numThreads=15

    # Segment storage
    druid.segmentCache.locations=[{"path": "/opt/druid/var/druid/segment-cache/", "maxSize": "1500000000000"}]
    druid.server.maxSize=1500GiB

    # Query cache
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=256MiB
    druid.cache.sizeInBytes=1GiB

    druid.server.http.defaultQueryTimeout=1500000
    druid.segmentCache.lazyLoadOnStart=true
    druid.storage.type=hdfs
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1500Gi
  volumeMounts:
    - mountPath: /opt/druid/var/druid/
      name: data
    - mountPath: /opt/druid/conf/druid/cluster/_common/hadoop-xml
      name: hadoop-config
  volumes:
    - configMap:
        name: apache-hadoop-configmap
      name: hadoop-config
    - emptyDir: {}
      name: data

The broker restart problem got resolved after tuning a parameter.
The UI is back and I can see the datasource now. However, the historicals still aren't loading – they try to load segments at startup even though druid.segmentCache.lazyLoadOnStart=true.

The historicals keep restarting constantly, and the logs only show segment-loading messages, without any other errors.

Regards,
Chaitanya

Hey! How many historicals do you have?

I got a better look at your screenshot today – I see the datasource is only 37% available, which would indicate that you have an issue with the segments getting to the historical servers.

You have your location at 1500000000000 bytes – that's 1.5TB if my maths is right? The load rules will default to 2 replicas of each segment – so if you have only one historical server, there will not be enough space to load all segments. I wonder if this is why you are getting NoNode warnings. A good way to calculate how many historicals you need is to think about how much RAM you will have in each one. Druid will memory-map segments to speed up calculations as a query runs. So if you have a server with 64GB RAM, maybe you would only put 256GB of segments on that server – so with your data set of 2.3TB, you may want to have 10 historicals. (Of course, this sizing work can get a lot more detailed!)
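To make that arithmetic explicit – these are just the rough assumptions above, not a formal recommendation:

    2.3 TB of segments × 2 replicas          ≈ 4.6 TB that has to fit across the historical tier
    4.6 TB ÷ 1.5 TB maxSize per historical   ≈ at least 4 historicals just for disk space
    64 GB RAM → ~256 GB of segments per node (the rough 4:1 ratio above)
    2.3 TB ÷ 256 GB per node                 ≈ 9–10 historicals for comfortable memory mapping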

Also, the ideal segment size is about 500MB. I see you say you have 2.3TB, and your screenshot shows around 20,000 segments – which my maths (!!!) says is about 115MB per segment. The more segments there are, the more have to be processed in a query, and the more overhead there is in maintaining them. You may want to apply a greater segment granularity at ingestion time, or turn on automatic compaction (remembering that this will consume ingestion cores, and it'll take a while).
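If you do go the auto-compaction route, it's configured per datasource through the coordinator. A minimal sketch – the datasource name and row target here are placeholders, so check the compaction docs for your Druid version before using it:

    POST /druid/coordinator/v1/config/compaction
    {
      "dataSource": "your_datasource",
      "skipOffsetFromLatest": "P1D",
      "tuningConfig": {
        "partitionsSpec": {
          "type": "dynamic",
          "maxRowsPerSegment": 5000000
        }
      }
    }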

Another item you may need to check is the setting in the coordinator dynamic configuration – it’s maxSegmentsToMove:
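As a sketch of where that lives: it's part of the coordinator dynamic config, which you can edit from the web console (the coordinator dynamic config dialog) or via the coordinator API, and it sits alongside the other dynamic-config fields. The relevant entry looks something like this, where 100 is only an illustrative value:

    {
      "maxSegmentsToMove": 100
    }

It controls how many segments the coordinator is allowed to move per run while balancing the cluster.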

Hope this is helpful!

– Pete

Hi Peter,

Thanks for this suggestion. There were 2 historicals earlier. Increasing the number of historicals (and reducing the segment cache and max server size) helped resolve those issues with the historicals.

Regards,
Chaitanya