OOMKilled Druid Operator. How to add resource in examples/tiny-cluster.yaml

I am deploying Druid-operator on my enterprise Kube. I am getting OOM Killed error:

NAME                                READY   STATUS      RESTARTS   AGE
druid-operator-c68cf5bc8-n6pzt      1/1     Running     0          107m
druid-tiny-cluster-brokers-0        0/1     OOMKilled   1          62s
druid-tiny-cluster-coordinators-0   0/1     OOMKilled   2          62s
druid-tiny-cluster-historicals-0    0/1     OOMKilled   1          62s
druid-tiny-cluster-routers-0        0/1     OOMKilled   1          62s
tiny-cluster-zk-0                   1/1     Running     0          16h

I tried using the "resources" field to add resource requests/limits, but I am getting the error below:

error: error validating "examples/tiny-cluster.yaml": error validating data: ValidationError(Druid.spec): unknown field "resources" in org.apache.druid.v1alpha1.Druid.spec; if you choose to ignore these errors, turn validation off with --validate=false

Please suggest how to add "resources" in tiny-cluster.yaml.

Thanks.

Kindly go through druid-operator/examples.md at master · druid-io/druid-operator · GitHub

This example has all the key/value pairs the druid-operator Custom Resource Spec supports.
If you have applied "resources" correctly, this should work. Regardless, can you share your current YAML and describe how you installed the druid operator?
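For reference, a minimal sketch of where per-node resource requests/limits usually go in the Druid CR: under the individual node spec (e.g. `nodes.brokers`) rather than directly under `spec`, which matches the validation error above. This assumes a druid-operator version whose node spec accepts `resources`; the values here are illustrative only.

```yaml
# Sketch: per-node resources in the Druid CR (values are illustrative).
nodes:
  brokers:
    nodeType: "broker"
    druid.port: 8088
    nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
    replicas: 1
    resources:          # standard Kubernetes ResourceRequirements
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "1"
        memory: "2Gi"
```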

AdheipSingh

Hi AdheipSingh,

I have followed the steps given in: druid-operator/getting_started.md at master · druid-io/druid-operator · GitHub

I used the same YAMLs given in the Deploy and Examples folders for ZK, the Operator, and tiny-cluster, and made the changes below:

  1. Changed deep storage to S3 in common.runtime.properties. When I deployed this change in Minikube it went fine, but it failed with the OOMKilled error in my enterprise Kube.
  2. Commented out the readinessProbe section, as I was getting a readinessProbe error.
  3. Commented out the hostPath section, as I was getting a host path error.

Can you also help with the Ingress section? I am using a Router ingress for the Unified Console but getting the error "error validating data: ValidationError(Ingress): unknown field "routers" in io.k8s.api.extensions.v1beta1.Ingress"

Note: I had the line druid.storage.type=s3 commented out. After un-commenting it, I am no longer getting the OOMKilled error. But I don't see the cluster pods in "get pods", though I can see them in "get svc".

The tiny-cluster YAML is given below:

# This spec only works on a single node kubernetes cluster(e.g. typical k8s cluster setup for dev using kind/minikube or single node AWS EKS cluster etc)
# as it uses local disk as "deep storage".
#
apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
  name: tiny-cluster
  namespace: <namespace>
  labels:
    sdr.appname: tiny-cluster
spec:
  image: apache/druid:0.20.0
  # Optionally specify image for all nodes. Can also be specified per node.
  # imagePullSecrets:
  # - name: tutu
  startScript: /druid.sh
  podLabels:
    environment: stage
    release: alpha
  podAnnotations:
    dummykey: dummyval
#  readinessProbe:
#    httpGet:
#      path: /status/health
#      port: 8088
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsGroup: 1000
  services:
    - spec:
        type: ClusterIP
        clusterIP: None
  commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
  jvm.options: |-
    -server
    -XX:MaxDirectMemorySize=10240g
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
    -Dlog4j.debug
    -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
    -Djava.io.tmpdir=/druid/data
  log4j.config: |-
    <?xml version="1.0" encoding="UTF-8" ?>
    <Configuration status="WARN">
        <Appenders>
            <Console name="Console" target="SYSTEM_OUT">
                <PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
            </Console>
        </Appenders>
        <Loggers>
            <Root level="info">
                <AppenderRef ref="Console"/>
            </Root>
        </Loggers>
    </Configuration>
  common.runtime.properties: |

    # Zookeeper
    druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
    druid.zk.paths.base=/druid
    druid.zk.service.compress=false

    # Metadata Store
    druid.metadata.storage.type=derby
    druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
    druid.metadata.storage.connector.host=localhost
    druid.metadata.storage.connector.port=1527
    druid.metadata.storage.connector.createTables=true

    # Deep Storage
    #druid.storage.type=local
    druid.storage.storageDirectory=/druid/deepstorage
    druid.storage.type=s3
    druid.storage.bucket=druid-cluster
    druid.storage.baseKey=druid/segments
    druid.s3.accessKey=<access_key>
    druid.s3.secretKey=<secret_key>
    druid.s3.endpoint.signingRegion=<region>
    druid.s3.endpoint.url=<url>
    druid.s3.protocol=http
    druid.s3.disableChunkedEncoding=true
    druid.s3.enablePathStyleAccess=true
    druid.s3.forceGlobalBucketAccessEnabled=false
    druid.storage.disableAcl=true
    druid.storage.useS3aSchema=true

    #
    # Extensions
    #
    druid.extensions.loadList=["druid-orc-extensions", "druid-s3-extensions", "druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-parquet-extensions", "druid-datasketches", "druid-basic-security"]

    #
    # Service discovery
    #
    druid.selectors.indexing.serviceName=druid/overlord
    druid.selectors.coordinator.serviceName=druid/coordinator

    #druid.indexer.logs.type=file
    #druid.indexer.logs.directory=/druid/data/indexing-logs
    druid.indexer.logs.type=s3
    druid.indexer.logs.s3Bucket=druid-cluster
    druid.indexer.logs.s3Prefix=druid/indexing-logs
    #added this below one
    druid.indexer.logs.disableAcl=true
    druid.lookup.enableLookupSyncOnStartup=false
  volumeMounts:
    - mountPath: /druid/data
      name: data-volume
    - mountPath: /druid/deepstorage
      name: deepstorage-volume
  volumes:
    - name: data-volume
      emptyDir: {}
#    - name: deepstorage-volume
#      emptyDir: {}
#      hostPath:
#        path: /tmp/druid/deepstorage
#        type: DirectoryOrCreate
  env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace

  nodes:
    brokers:
      # Optionally specify for running broker as Deployment
      # kind: Deployment
      nodeType: "broker"
      # Optionally specify for broker nodes
      # imagePullSecrets:
      # - name: tutu
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
      replicas: 1
      runtime.properties: |
        druid.service=druid/broker

        # HTTP server threads
        druid.broker.http.numConnections=5
        druid.server.http.numThreads=10

        # Processing threads and buffers
        druid.processing.buffer.sizeBytes=1
        druid.processing.numMergeBuffers=1
        druid.processing.numThreads=1
        druid.sql.enable=true
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M

    coordinators:
      # Optionally specify for running coordinator as Deployment
      # kind: Deployment
      nodeType: "coordinator"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
      replicas: 1
      runtime.properties: |
        druid.service=druid/coordinator

        # HTTP server threads
        druid.coordinator.startDelay=PT30S
        druid.coordinator.period=PT30S

        # Configure this coordinator to also run as Overlord
        druid.coordinator.asOverlord.enabled=true
        druid.coordinator.asOverlord.overlordService=druid/overlord
        druid.indexer.queue.startDelay=PT30S
        druid.indexer.runner.type=local
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M

    historicals:
      nodeType: "historical"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
      replicas: 1
      runtime.properties: |
        druid.service=druid/historical
        druid.server.http.numThreads=5
        druid.processing.buffer.sizeBytes=536870912
        druid.processing.numMergeBuffers=1
        druid.processing.numThreads=1
        # Segment storage
        druid.segmentCache.locations=[{\"path\":\"/druid/data/segments\",\"maxSize\":10737418240}]
        druid.server.maxSize=10737418240
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M
          
    routers:
      nodeType: "router"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/router"
      replicas: 1
      runtime.properties: |
        druid.service=druid/router

        # HTTP proxy
        druid.router.http.numConnections=10
        druid.router.http.readTimeout=PT5M
        druid.router.http.numMaxThreads=10
        druid.server.http.numThreads=10

        # Service discovery
        druid.router.defaultBrokerServiceName=druid/broker
        druid.router.coordinatorServiceName=druid/coordinator

        # Management proxy to coordinator / overlord: required for unified web console.
        druid.router.managementProxy.enabled=true       
      extra.jvm.options: |-
        -Xmx512M
        -Xms512M
---
# Network Policy
I have Network Policy here
---
# Router Ingress
apiVersion: extensions/v1beta1
kind: Ingress
routers:
      nodeType: "router"
      druid.port: 8088
      ingressAnnotations:
          "nginx.ingress.kubernetes.io/rewrite-target": "/"
      ingress:
        tls:
         - hosts:
            - URL
#           secretName: testsecret-tls
        rules:
         - host: URL
           http:
             paths:
             - path: /
               backend:
                 serviceName: service1
                 servicePort: 80
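
The validation error in the last document comes from mixing the operator's node-level `ingress` fields with a standalone Kubernetes `Ingress` manifest: `apiVersion: extensions/v1beta1` / `kind: Ingress` makes the API server validate `routers` against the Ingress schema, where that field does not exist. Assuming a druid-operator version that supports `ingressAnnotations`/`ingress` on a node spec, those fields belong inside the router node of the Druid CR itself, roughly as sketched below (host and backend names are the same placeholders as above):

```yaml
# Sketch: ingress configured inside the Druid CR's router node, not as
# a separate Ingress object. Assumes the installed druid-operator CRD
# supports ingressAnnotations/ingress on a node spec.
apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
  name: tiny-cluster
spec:
  # ... rest of the common spec as in the tiny-cluster YAML above ...
  nodes:
    routers:
      nodeType: "router"
      druid.port: 8088
      nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/router"
      replicas: 1
      ingressAnnotations:
        "nginx.ingress.kubernetes.io/rewrite-target": "/"
      ingress:
        tls:
          - hosts:
              - URL              # placeholder host
            # secretName: testsecret-tls
        rules:
          - host: URL            # placeholder host
            http:
              paths:
                - path: /
                  backend:
                    serviceName: service1   # placeholder backend
                    servicePort: 80
```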