Illegal unquoted character ((CTRL-CHAR, code 19)): has to be escaped using backslash to be included

Hello!

We occasionally receive JSON-related errors when batch uploading events that contain certain characters.

We are using Go to serialize the JSON used by Druid batch ingestion. Go leaves certain Unicode characters unescaped, which is correct as per the JSON spec: only the quotation mark, the backslash, and the control characters between \u0000 and \u001F have to be escaped. Here’s a Go playground link where the correct Unicode code points are encoded. The relevant passage from the spec:

The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
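To make this concrete, here is a minimal sketch of what Go's standard `encoding/json` package does with a control character versus an en dash. The helper name `encode` is just for illustration and is not taken from our ingestion code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode is a hypothetical helper: it marshals a single-key object
// and returns the resulting JSON text.
func encode(value string) string {
	out, err := json.Marshal(map[string]string{"a": value})
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	// A control character (U+0013) must be escaped, and Go escapes it.
	fmt.Println(encode("\u0013")) // prints {"a":"\u0013"}

	// An en dash (U+2013) is not a control character, so Go writes it
	// as raw UTF-8, which the JSON spec allows.
	fmt.Println(encode("\u2013")) // prints {"a":"–"}
}
```

So the producer output looks valid to me in both cases.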

For example, the following JSON will trigger an error:

{"a":"–"}


The – is the en dash, which maps to the U+2013 code point.

We’re using Druid 0.9.2 and we do the batch upload using the S3 extensions. Fortunately, Tranquility (version 0.8.2) does not exhibit the same behaviour: our real-time events are processed correctly.

There is a similar issue where another user requested configuring the flag to allow unquoted control characters, but I’m not sure this is the right way to go.
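One producer-side alternative to that flag, as a rough sketch only: post-process the marshaled JSON so every non-ASCII character is written as a `\uXXXX` escape, which keeps the file pure ASCII. The function name `escapeNonASCII` is made up for this example, and I am assuming (not claiming) that pure-ASCII input would sidestep the problem on the Druid side.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// escapeNonASCII (hypothetical helper) rewrites every non-ASCII rune in
// already-valid JSON text as a \uXXXX escape, using a surrogate pair for
// runes above U+FFFF. In valid JSON, non-ASCII runes can only occur inside
// string values, so it is safe to apply this to the whole document.
func escapeNonASCII(jsonText []byte) string {
	var b strings.Builder
	for _, r := range string(jsonText) {
		switch {
		case r < utf8.RuneSelf: // plain ASCII, copy through unchanged
			b.WriteRune(r)
		case r > 0xFFFF: // outside the BMP, needs a UTF-16 surrogate pair
			hi, lo := utf16.EncodeRune(r)
			fmt.Fprintf(&b, `\u%04x\u%04x`, hi, lo)
		default:
			fmt.Fprintf(&b, `\u%04x`, r)
		}
	}
	return b.String()
}

func main() {
	raw, err := json.Marshal(map[string]string{"a": "\u2013"})
	if err != nil {
		panic(err)
	}
	fmt.Println(escapeNonASCII(raw)) // prints {"a":"\u2013"}
}
```

Any conforming JSON parser should decode the escaped form back to the same string, so the ingested values would be unchanged.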

Your help would be greatly appreciated!

Best,

Alex

For reference, the following is the exception we receive when ingesting:

Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 19)): has to be escaped using backslash to be included in string value

at [Source: { … “Abcdef Vols � bas prix” … }; line: 1, column: 1565]


The event is truncated; I can supply it privately if necessary. The column mentioned by the exception points at the – character between “Abcdef” and “Vols”, which is not printed in the logs.
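One observation that may or may not be relevant: the code in the exception (19 decimal, 0x13) equals the low byte of U+2013. This is only a guess on my part, not something I have confirmed in Druid, but it would be consistent with the character being cut down to a single byte somewhere before Jackson parses it. A small sketch of the arithmetic (the helper `lowByte` is purely illustrative):

```go
package main

import "fmt"

// lowByte (hypothetical helper) keeps only the lowest 8 bits of a code
// point, mimicking a lossy one-byte-per-character conversion.
func lowByte(r rune) byte {
	return byte(r)
}

func main() {
	const enDash = '\u2013'

	// The correct UTF-8 encoding of the en dash is three bytes.
	fmt.Printf("UTF-8 bytes: % x\n", string(enDash)) // prints UTF-8 bytes: e2 80 93

	// Truncated to one byte, the code point becomes 0x13, i.e. 19,
	// the same code reported in the exception.
	fmt.Printf("low byte: %d\n", lowByte(enDash)) // prints low byte: 19
}
```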

Best,

Alex