Druid Segment Cache Issue

Hi,

I’ve set up a Druid cluster and started some ingestion jobs using S3.

I was able to import one day’s raw data (35 GB), but now when I try to load more data Druid throws an out-of-memory exception.

I’ve noticed the following things:

  1. The raw data that was indexed and stored in deep storage is now around 2.2 GB, but the segment-cache is 5.1 GB. How is this possible?
  2. Druid is only using my 8 GB EBS volume and not the 3.2 TB of SSD that I have provisioned for the data server.

Am I missing some configuration?

Thanks,

Darshan

Hi Darshan,

To answer your first question: segments in deep storage are compressed (for example, index.zip). You can see this under /segments/…/…/index.zip. The same segment on a historical node is not compressed; if you look under segment-cache/…/… you will see the unpacked files, which is why the segment-cache takes more space than deep storage.
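For illustration only (the bucket, datasource name, and interval below are made up, not taken from your cluster), the same segment lives in both places roughly like this:

  Deep storage (compressed):
    s3://my-bucket/druid/segments/my_datasource/2023-01-01T00:00:00.000Z_2023-01-02T00:00:00.000Z/2023-01-05T10:00:00.000Z/0/index.zip

  Historical segment-cache (index.zip unpacked into column and metadata files):
    segment-cache/my_datasource/2023-01-01T00:00:00.000Z_2023-01-02T00:00:00.000Z/2023-01-05T10:00:00.000Z/0/
      00000.smoosh
      meta.smoosh
      version.bin

The unpacked files are larger than the zip they came from, which is how 2.2 GB in deep storage can become 5.1 GB in the segment-cache.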

Also, the retention (load) rules defined for the datasource have a replication factor, which will also consume more disk space across the historical nodes.
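For example, a minimal forever-load rule with a replication factor of 2 (this JSON is only a sketch; _default_tier is the default tier name, and the replicant count here is an assumption for illustration) looks like:

  [
    {
      "type": "loadForever",
      "tieredReplicants": { "_default_tier": 2 }
    }
  ]

With 2 replicants, each segment is loaded into the segment-cache of two historical nodes, so the cluster-wide segment-cache footprint is roughly twice the uncompressed segment size.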

Thanks,
Hemanth