Is it possible to read '.gz' files when using local firehose?


I’m trying to ingest data from a local gzipped file with the following ioConfig, but the task fails.

"ioConfig" : {

      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/web/druid/events/2019/07/12",
        "filter" : "my-events-8440-1563181693-0.json.gz",

I see errors like this in the task log (the row content is unreadable binary data): Unable to parse row [...]

How can I read a gz file when using the local firehose?



That’s pretty much it, actually. If you look at the Wikipedia example in your quickstart/tutorial/wikipedia-index.json, you will see… Can you run the wiki example and see if it reads fine? If it does, I’d check whether your file is actually gzip (run gunzip and cat the resulting my-events-8440-1563181693-0.json to confirm).
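
If you would rather check the file without unpacking it on disk, here is a minimal Python sketch of the same sanity check. It assumes the file holds one JSON object per line; the helper names `looks_like_gzip` and `first_json_row` are made up for illustration and are not part of Druid.

```python
import gzip
import json

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def first_json_row(path):
    """Decompress the file and parse its first non-empty line as JSON.

    Assumes newline-delimited JSON; returns None if the file has no rows.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return json.loads(line)
    return None
```

Calling `looks_like_gzip(...)` and then `first_json_row(...)` on /web/druid/events/2019/07/12/my-events-8440-1563181693-0.json.gz should tell you whether the file is a real gzip stream and whether its first row parses. If both pass, the file itself is probably fine and the problem is on the ingestion side.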

Yes, in general you can use gz files. As Karthik said, check that your files are not corrupted.
Also, if you are using Druid 0.15.0, try the data loader (“Load data” on the console) (see screenshot).

What does the “Connect” screen show for you?

Thanks for your answers.
I checked the file again and it is not corrupted. After running gunzip, I can load the JSON file.

It fails only with the gz file with the error I mentioned before.

I’m using an older version, 0.10.0.
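
Since the uncompressed file loads, one stopgap is to decompress before ingestion and point "filter" at the plain .json name instead. A rough sketch, assuming there is enough disk space for the uncompressed copy (`decompress_gz` is a hypothetical helper, not a Druid tool):

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dst_dir=None):
    """Write an uncompressed copy of a .gz file and return its path.

    The output keeps the original name minus the trailing ".gz",
    e.g. "events.json.gz" becomes "events.json".
    """
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    out_dir = Path(dst_dir) if dst_dir else src.parent
    dst = out_dir / src.stem
    # Stream the data so large files are not held in memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

This only works around the symptom; it does not explain why the gz file is not being decompressed by the task.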

Now that is a version I have not heard of in a long time!
Could you upgrade to 0.15.0? You would get a sweet data loader for your troubles.

It is easy to do: