Re: [druid-user] Improve Ingestion Speed Kafka + Druid

There are a few concerns I have with the ingestionSpec you shared:

1) You should only define one of the properties below.
    "maxRowsInMemory": 1000000,
    "maxBytesInMemory": 524288000

2) You are running Kafka on localhost. I don't know the machine configuration, but if everything is running on your laptop, resource contention is a possibility.

3) You have dimensionExclusions defined. The dimensions listed there are excluded from ingestion and will not be stored.
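
If it helps, here is a rough Python sketch of how one might check a spec for the first and third points above. The spec layout (a "spec" wrapper around dataSchema and tuningConfig), the helper name review_spec, and the dimension name "example_dim" are my assumptions for illustration; only the two memory limits come from the spec you shared.

```python
# Hypothetical fragment of a Kafka supervisor spec. Only the two memory
# limits are taken from the thread; everything else is a placeholder.
spec = {
    "spec": {
        "dataSchema": {
            "dimensionsSpec": {
                "dimensionExclusions": ["example_dim"],  # placeholder name
            }
        },
        "tuningConfig": {
            "type": "kafka",
            "maxRowsInMemory": 1000000,
            "maxBytesInMemory": 524288000,
        },
    }
}


def review_spec(spec):
    """Return human-readable notes about memory limits and excluded dimensions."""
    notes = []
    # Assumption: tolerate specs with or without the outer "spec" wrapper.
    inner = spec.get("spec", spec)

    tuning = inner.get("tuningConfig", {})
    limits = [k for k in ("maxRowsInMemory", "maxBytesInMemory") if k in tuning]
    if len(limits) > 1:
        notes.append("both memory limits set: " + ", ".join(limits))

    excluded = (
        inner.get("dataSchema", {})
        .get("dimensionsSpec", {})
        .get("dimensionExclusions", [])
    )
    if excluded:
        notes.append("excluded dimensions (not stored): " + ", ".join(excluded))
    return notes


for note in review_spec(spec):
    print(note)
```

Running this against your real spec (loaded with json.load) would at least confirm which dimensions are being dropped before you compare counts.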

To add one idea, if you think the sketches are inaccurate, you could try increasing lgK. You have it at 12, the default, and it can go up to 21. If you have the time and energy, you could even try setting it to 21 and see whether that makes a difference. This could increase processing time and storage space, but it might shed some more light on the issue.
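
To get a feel for how much lgK might matter, here is a back-of-the-envelope Python sketch. It assumes the sketches are HyperLogLog-style and uses the textbook HyperLogLog estimate of relative standard error, 1.04 / sqrt(2^lgK). The implementation Druid actually uses may have a somewhat different constant, so treat the numbers as rough guidance only; the function name approx_hll_rse is made up for this example.

```python
import math


def approx_hll_rse(lg_k):
    """Textbook HyperLogLog relative standard error for 2**lg_k registers.

    Assumption: this is the classic 1.04 / sqrt(m) estimate, not a figure
    taken from Druid or its sketch library.
    """
    if lg_k > 21:
        raise ValueError("lgK can go up to 21")
    return 1.04 / math.sqrt(2 ** lg_k)


for lg_k in (12, 16, 21):
    print(f"lgK={lg_k}: ~{approx_hll_rse(lg_k):.4%} relative standard error")
```

If the gap you are seeing between the expected and reported counts is much larger than the estimate for lgK=12, sketch accuracy alone probably does not explain it.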

One comment you had makes it sound like it’s not performance lag - “Furthermore, after other checks, we noticed that shortly after the end of the day, the counts for that day, including the sum of the “count” column, stops increasing, which makes us think that the events are processed quite quickly anyway, but we are unable to understand what could be the cause of the difference.” So maybe it’s the accuracy of the sketches. If it were performance, I’d expect the counts to get closer after the end of the day, as things got processed.

Best regards,

Ben Krug

It may be worth checking the ingest/events/unparseable metric to see whether rows are failing to be ingested, and checking whether logParseExceptions is turned on so that parse errors show up in the log files. There was a short conversation about this a couple of weeks ago that is worth noting: