We are occasionally receiving JSON-related errors when batch uploading data that contains certain characters.
We are using Go to serialize the JSON used by Druid batch ingestion, and Go seems to leave certain Unicode characters unescaped. This is correct as per the JSON spec, which only requires escaping the quotation mark, the reverse solidus, and anything between \u0000 and \u001F. Here’s a Go playground link where the correct Unicode code points are encoded.
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
For example, the following JSON will trigger an error:
The – is the en dash which maps to the U+2013 code point.
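To make the Go side concrete, here is a minimal sketch of how the standard `encoding/json` package treats the two kinds of characters. The `encode` helper and the sample strings are made up for illustration; they are not our actual ingestion payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode is a hypothetical helper: it marshals a single string to JSON
// and returns the encoded bytes as a Go string.
func encode(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A control character (U+0001) must be escaped, and Go escapes it.
	fmt.Println(encode("a\u0001b")) // prints "a\u0001b" (six-character escape)

	// The en dash (U+2013) is not a control character, so Go writes it
	// as raw UTF-8 bytes instead of a \u2013 escape.
	fmt.Println(encode("2010\u20132011")) // prints "2010–2011"
}
```

Both outputs are valid JSON according to the spec quoted above, so the unescaped en dash by itself should not be a problem for a conforming parser that reads the file as UTF-8.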
We’re using Druid 0.9.2 and we do the batch upload using the S3 extensions. Fortunately, Tranquility (version 0.8.2) does not exhibit the same behaviour, as our real-time events are processed correctly.
There is a similar issue where another user requested configuring the flag to allow unquoted control characters, but I’m not sure this is the right way to go.
Your help would be greatly appreciated!