Druid ingesting the wrong messages from a Kafka topic

We are sending two different JSON-formatted message types to the same Kafka topic and have two ingestion tasks running on the Druid side to create two separate data sources. We noticed that some of the key fields in the data sources were missing, so we started to investigate.

We have two message types, A and B, sent to the same Kafka topic. To test, we kept producing A and stopped producing B to the topic. The following is a sample A message:

{
  "fields": {
    "cpu_usage_process/avg_run_time": 0,
    "cpu_usage_process/five_minutes": 0,
    "cpu_usage_process/five_seconds": 0,
    "cpu_usage_process/invocation_count": 1,
    "cpu_usage_process/name": "Policy bind Process",
    "cpu_usage_process/one_minute": 0,
    "cpu_usage_process/pid": 597,
    "cpu_usage_process/total_run_time": 0,
    "cpu_usage_process/tty": 0,
    "five_minutes": 3,
    "five_seconds": 3,
    "five_seconds_intr": 0,
    "one_minute": 3
  },
  "name": "Cisco-IOS-XE-process-cpu-oper:cpu-usage/cpu-utilization",
  "tags": {
    "host": "mtllab1",
    "path": "Cisco-IOS-XE-process-cpu-oper:cpu-usage/cpu-utilization",
    "source": "10.10.10.10",
    "subscription": "12"
  },
  "timestamp": 1581780294
}

We then terminated the Druid task that ingests A messages and left the B task running. We noticed that the A messages are still being partially ingested into the B data source. Somehow the spec file for B matches messages formatted for A. Please find attached a copy of the spec file. Is there something we can do to make ingestion stricter?

Can someone please take a look and let us know what we are missing here?

Thanks,

telemetry.json (3.48 KB)

Hi Arda,

Druid can filter out unwanted data at ingestion time by using a filter in the transformSpec of the ingestion spec. Please take a look at the documentation below.

https://druid.apache.org/docs/0.17.0/querying/filters.html
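
As a rough sketch of what such a filter might look like for the B spec: the fragment below would sit inside dataSchema and drops every row whose "name" field equals the value from the sample A message. It assumes the top-level "name" field is readable as an input field under that name, which depends on how the attached spec parses and flattens the JSON; the rest of the spec is left out here.

```json
{
  "transformSpec": {
    "filter": {
      "type": "not",
      "field": {
        "type": "selector",
        "dimension": "name",
        "value": "Cisco-IOS-XE-process-cpu-oper:cpu-usage/cpu-utilization"
      }
    }
  }
}
```

If B messages carry their own fixed "name" value, a plain selector filter on that value would be the stricter choice, since it would also reject any other message type that shows up on the topic later.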

Regards,

Muthu Lalapet.

Thank you. I will take a look at the link as soon as possible.

Arda

The proposed solution worked. Thank you!!

It worked. Thanks!!