How to skip header while ingesting CSV file into Druid?

Hi all,

Our data consists entirely of CSV files with a header row. Other than stripping the header before ingesting into Druid, is there a simple way to ingest them directly, e.g. some configuration to skip the header?

Thanks very much!

Druid 0.10.1 will have a feature to help with this, and a release candidate is expected soon. Stay tuned to the lists for an announcement!

I have tried many times to find a way to do this. Great to know, thanks Gian!

Hello Gian,

Great to see the 0.10.1 release and the docs. I found the solution for skipping the CSV header, along with this note:

Note that hasHeaderRow and skipHeaderRows are effective only for non-Hadoop batch index tasks. Other types of index tasks will fail with an exception.
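For anyone landing on this thread, here is a minimal sketch of what the CSV parseSpec might look like with those two settings, built as a Python dict and printed as JSON so it can be pasted into an index task spec. Only hasHeaderRow and skipHeaderRows come from the note above; the helper name csv_parse_spec and the column names ("timestamp", "page", "user") are made up for illustration, and the surrounding parseSpec layout is an assumption that should be checked against the docs for your Druid version.

```python
import json


def csv_parse_spec(timestamp_column, dimensions, skip_header_rows=0):
    """Build a CSV parseSpec dict that takes column names from the header row.

    Hypothetical helper for illustration; not part of Druid.
    """
    return {
        "format": "csv",
        # Treat the first (non-skipped) row as the header instead of as data.
        "hasHeaderRow": True,
        # Number of leading rows to drop before the header row is read.
        "skipHeaderRows": skip_header_rows,
        "timestampSpec": {"column": timestamp_column, "format": "auto"},
        "dimensionsSpec": {"dimensions": list(dimensions)},
    }


if __name__ == "__main__":
    # Column names here are placeholders; use the names from your own header.
    print(json.dumps(csv_parse_spec("timestamp", ["page", "user"]), indent=2))
```

As the note says, this only applies to the non-Hadoop batch index task.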

Does that note mean I can only use this configuration in an index task? The index task is very slow, though. Is there a way to ingest CSV files with a header using an index_hadoop or index_spark task?

There isn’t a way that I know of using index_hadoop, although it would be a neat feature to contribute if you want! Just watch out for things like a file being split across multiple mappers, or multiple files being combined into one mapper.
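In the meantime, the fallback for index_hadoop is the one mentioned in the original question: strip the header before ingestion. A minimal sketch of that pre-processing step is below. It assumes the files are reachable as local paths and that the header occupies exactly the first physical line(s), with no quoted fields containing line breaks; the function name strip_header is made up for illustration.

```python
import shutil


def strip_header(src_path, dst_path, header_rows=1):
    """Copy src_path to dst_path, dropping the first header_rows lines.

    Assumes each header row is a single physical line.
    """
    # newline="" keeps the original line endings untouched in the copy.
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for _ in range(header_rows):
            src.readline()  # discard one header line
        # Stream the remaining data rows without loading the whole file.
        shutil.copyfileobj(src, dst)
```

Because each file is rewritten without its header before Hadoop sees it, the concerns about files being split across mappers or combined into one mapper do not come into play, at the cost of an extra pass over the data.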