Parsing NaN in metric columns

Hi,

I’m ingesting a local TSV file with the indexing service, using the TSV parser spec.

Many of the metric columns that use the doubleSum aggregation can contain the string NaN, which Druid fails to parse:

2015-08-28T11:02:25,045 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_994_2015_08_12_12_standard_feed_2015-08-28T11:02:09.516Z, type=index, dataSource=994_2015_08_12_12_standard_feed}]
com.metamx.common.parsers.ParseException: Unable to parse metrics[served_tasks_time], value[NaN]
	at io.druid.data.input.MapBasedRow.getFloatMetric(MapBasedRow.java:112) ~[druid-api-0.3.8.jar:0.3.8]
	at io.druid.segment.incremental.IncrementalIndex$1$3.get(IncrementalIndex.java:113) ~[druid-processing-0.8.0.jar:0.8.0]
	at io.druid.query.aggregation.DoubleSumAggregator.aggregate(DoubleSumAggregator.java:60) ~[druid-processing-0.8.0.jar:0.8.0]
	at io.druid.segment.incremental.OnheapIncrementalIndex.addToFacts(OnheapIncrementalIndex.java:169) ~[druid-processing-0.8.0.jar:0.8.0]
	at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:452) ~[druid-processing-0.8.0.jar:0.8.0]
	at io.druid.segment.realtime.plumber.Sink.add(Sink.java:125) ~[druid-server-0.8.0.jar:0.8.0]
	at io.druid.indexing.common.index.YeOldePlumberSchool$1.add(YeOldePlumberSchool.java:115) ~[druid-indexing-service-0.8.0.jar:0.8.0]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:374) ~[druid-indexing-service-0.8.0.jar:0.8.0]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:205) ~[druid-indexing-service-0.8.0.jar:0.8.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.8.0.jar:0.8.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.8.0.jar:0.8.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_51]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_51]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_51]
Caused by: java.lang.NumberFormatException: For input string: "NaN"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043) ~[?:1.8.0_51]
	at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122) ~[?:1.8.0_51]
	at java.lang.Float.parseFloat(Float.java:451) ~[?:1.8.0_51]
	at java.lang.Float.valueOf(Float.java:416) ~[?:1.8.0_51]
	at io.druid.data.input.MapBasedRow.getFloatMetric(MapBasedRow.java:109) ~[druid-api-0.3.8.jar:0.3.8]
	... 14 more

Is there a way to tell Druid which default value to use for a given column when parsing fails (e.g. replace NaN with 0.0), or to skip the row?

Hi,
IndexTask doesn’t have any config to replace NaN with 0.0.

I think the best way to handle this is to clean the data in the ETL layer, before it reaches Druid.
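
As a rough illustration of that kind of cleanup, here is a minimal Java sketch that rewrites one TSV line so that unparseable or NaN values in the metric columns become a fallback such as 0.0. The class name TsvMetricCleaner, its methods, and the column indexes are hypothetical and not part of Druid; it also assumes plain tab-separated fields with no quoting.

```java
import java.util.Set;

// Hypothetical pre-ingestion helper; not part of Druid.
class TsvMetricCleaner {

    // Returns the original value if it parses as a finite-or-infinite float,
    // otherwise (unparseable text, empty string, or NaN) returns the fallback.
    static String cleanField(String value, String fallback) {
        try {
            float parsed = Float.parseFloat(value.trim());
            return Float.isNaN(parsed) ? fallback : value;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Cleans only the columns whose zero-based index is in metricColumns;
    // all other fields are passed through unchanged.
    static String cleanLine(String line, Set<Integer> metricColumns, String fallback) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            if (metricColumns.contains(i)) {
                fields[i] = cleanField(fields[i], fallback);
            }
        }
        return String.join("\t", fields);
    }
}
```

Running every line of the input file through cleanLine before submitting the index task would let the existing parser see only numeric text in those columns. Skipping the row instead of substituting a value would be a small variation on the same loop.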

Why not return Float.NaN (which is a float) when a NumberFormatException is thrown by getFloatMetric()?
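
To make the suggestion concrete, here is a standalone sketch of the behavior being proposed. It is not Druid's actual getFloatMetric() code: the class LenientMetricParser and its method are hypothetical, and treating a missing value as 0 is an assumption made only for this example.

```java
// Hypothetical illustration of "return NaN instead of throwing"; not Druid code.
class LenientMetricParser {

    static float parseFloatMetric(Object metricValue) {
        if (metricValue == null) {
            return 0.0f; // assumption: a missing metric counts as zero
        }
        if (metricValue instanceof Number) {
            return ((Number) metricValue).floatValue();
        }
        if (metricValue instanceof String) {
            try {
                return Float.parseFloat(((String) metricValue).trim());
            } catch (NumberFormatException e) {
                // Proposed change: report the bad value as NaN rather than
                // failing the whole task with an exception.
                return Float.NaN;
            }
        }
        throw new IllegalArgumentException("Unsupported metric type: " + metricValue.getClass());
    }
}
```

One consequence to keep in mind: in floating-point arithmetic any sum that includes NaN is itself NaN, so with a sum aggregator a single bad row would turn the aggregated value for its whole group into NaN. Whether that is preferable to a failed task depends on the data.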

Hi Bachr, we’d love a PR with this change and some unit tests.

– FJ