Druid real-time indexing not working

I am using Kafka to send messages to Druid for a topic. I checked that this works fine because I can consume the messages from the console consumer, but when I try it with Druid I get the following error:

ERROR [MonitorScheduler-0] io.druid.segment.realtime.RealtimeMetricsMonitor - [2] Unparseable events! Turn on debug logging to see exception stack trace

I turned on debug logging and I don't see anything else in my log. Could you help me?
The files I am sending via streaming are CSV; I read in a forum that real-time ingestion only accepts JSON. Is that true?

Regards

Hi,

You can refer to the doc below, which could help you troubleshoot unparseable events; the page describes APIs that may be useful: https://druid.apache.org/docs/latest/ingestion/reports.html
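For example, once an indexing task has run, you can pull its ingestion report from the Overlord and check the unparseable-event counts. A minimal sketch in Python, assuming the Overlord runs at localhost:8090 and using a placeholder task ID (note the reports API comes from newer Druid releases and may not exist on very old versions):

import requests

# Placeholder values -- replace with your Overlord address and task ID.
OVERLORD = "http://localhost:8090"
TASK_ID = "index_kafka_test_0000000000000_0"

# GET /druid/indexer/v1/task/{taskId}/reports returns row stats,
# including how many events were processed vs. unparseable.
resp = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{TASK_ID}/reports")
resp.raise_for_status()
payload = resp.json()["ingestionStatsAndErrors"]["payload"]
print(payload["rowStats"])               # processed / unparseable counts per phase
print(payload.get("unparseableEvents"))  # sample parse errors, if reported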

Are you using the Data Loader UI?

Thanks,

Vaibhav

Kafka ingestion can handle a variety of formats: JSON, CSV, TSV, Avro, etc. (i.e., the formats that Druid has a built-in parser for, or that you've written a Druid extension for).

In my opinion, there should not be an issue with CSV.

Thanks,

Vaibhav

Hi Fernando,

FYI: I just tested Druid Kafka indexing with CSV data and I can ingest a CSV data file without any issue [Druid 0.16]. The ingestion spec will look something like the one below.

{
  "type": "kafka",
  "ioConfig": {
    "type": "kafka",
    "consumerProperties": {
      "bootstrap.servers": "<KAFKA_HOST>:9092"
    },
    "topic": "test"
  },
  "tuningConfig": {
    "type": "kafka"
  },
  "dataSchema": {
    "dataSource": "test",
    "granularitySpec": {
      "type": "uniform",
      "queryGranularity": "HOUR",
      "segmentGranularity": "HOUR",
      "rollup": true
    },
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "csv",
        "timestampSpec": {
          "column": "me",
          "format": "millis"
        },
        "hasHeaderRow": true,
        "dimensionsSpec": {
          "dimensions": [
            "event_name",
            "group_name",
            "response",
            "venue_name"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      }
    ]
  }
}
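If it helps, a spec like this is submitted to the Overlord's supervisor API. A small sketch in Python, assuming the spec above is saved as kafka-csv-spec.json and the Overlord is at localhost:8090 (both placeholders):

import json
import requests

OVERLORD = "http://localhost:8090"  # placeholder -- your Overlord address

# Load the supervisor spec shown above.
with open("kafka-csv-spec.json") as f:
    spec = json.load(f)

# POST /druid/indexer/v1/supervisor creates or updates the Kafka supervisor.
resp = requests.post(f"{OVERLORD}/druid/indexer/v1/supervisor", json=spec)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": "test"}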

I suggest you use the Data Loader UI, which may give you more details in case you have bad data (I mean unparseable rows) in your CSV file.


Thanks,

Vaibhav

Hi Vaibhav,
Thank you for your response. I am using an older version of Druid (0.10.1), which does not have the UI for loading files. I also realized that to send data to the Kafka topic from Spark, the way I found to do it is to convert the rows to JSON; the confusion arose because the file was in CSV before doing the ETL with Spark.
Here is the part of my code where I write to the topic after the ETL:

result_cdr = nuevojoinDF.selectExpr("to_json(struct(*)) AS value")
result_cdr.writeStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "nodo1.hadoop:6667") \
    .option("topic", "druid") \
    .option("checkpointLocation", "/tmp/checkpoint1") \
    .start() \
    .awaitTermination()


Therefore, in my Druid spec I have to specify that the format is JSON instead of CSV. This way I index the data in real time without problems.
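For anyone else hitting this, the parser section of the spec ends up looking roughly like the sketch below (the timestamp column and dimension names here are placeholders; use your own):

"parser": {
  "type": "string",
  "parseSpec": {
    "format": "json",
    "timestampSpec": {
      "column": "timestamp",
      "format": "millis"
    },
    "dimensionsSpec": {
      "dimensions": ["event_name", "group_name", "response", "venue_name"]
    }
  }
}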

If I have any other problem I will let you know.
Thanks for everything.

Cool! Glad to hear that :+1:!