file index task failed: java.util.NoSuchElementException: No more lines

Hi all,

I am currently using the file index task. Here is a sample task spec:

{
  "type" : "index",
  "id" : "index_rb_flow_2015-07-07T08:47:54.594Z",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "rb_flow",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "ruby"
          },
          "dimensionsSpec" : {
            "dimensions" : [ "application_id_name", "biflow_direction", "building", "building_uuid", "campus", "campus_uuid", "client_id", "client_latlong", "client_mac", "client_mac_vendor", "client_rssi", "client_rssi_num", "client_snr", "client_snr_num", "conversation", "coordinates_map", "darklist_category", "darklist_direction", "darklist_protocol", "darklist_score", "darklist_score_name", "deployment", "deployment_uuid", "direction", "dot11_protocol", "dot11_status", "dst", "dst_as_name", "dst_country_code", "dst_map", "dst_net_name", "dst_port", "dst_vlan", "duration", "engine_id_name", "floor", "floor_uuid", "hnbgeolocation", "hnblocation", "http_host", "http_referer_l1", "http_social_media", "http_social_user", "http_user_agent_os", "https_common_name", "input_snmp", "input_vrf", "ip_protocol_version", "l4_proto", "market", "market_uuid", "namespace", "namespace_uuid", "organization", "organization_uuid", "output_snmp", "output_vrf", "rat", "scatterplot", "sensor_name", "sensor_uuid", "service_provider", "service_provider_uuid", "src", "src_as_name", "src_country_code", "src_map", "src_net_name", "src_port", "src_vlan", "srv_port", "tos", "type", "wireless_id", "wireless_station", "zone", "zone_uuid" ],
            "dimensionExclusions" : [ "timestamp", "bytes", "flow_end_reason", "src_as", "first_switched", "dst_as", "pkts", "http_url" ],
            "spatialDimensions" : [ ]
          }
        }
      },
      "metricsSpec" : [ {
        "type" : "count",
        "name" : "events"
      }, {
        "type" : "longSum",
        "name" : "sum_bytes",
        "fieldName" : "bytes"
      }, {
        "type" : "longSum",
        "name" : "sum_pkts",
        "fieldName" : "pkts"
      }, {
        "type" : "hyperUnique",
        "name" : "clients",
        "fieldName" : "client_mac"
      }, {
        "type" : "hyperUnique",
        "name" : "wireless_stations",
        "fieldName" : "wireless_station"
      }, {
        "type" : "longSum",
        "name" : "sum_rssi",
        "fieldName" : "client_rssi_num"
      } ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "HOUR",
        "queryGranularity" : {
          "type" : "duration",
          "duration" : 60000,
          "origin" : "1970-01-01T00:00:00.000Z"
        },
        "intervals" : [ "2015-07-06T00:00:00.000Z/2015-07-06T06:00:00.000Z" ]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/tmp/.data_migration",
        "filter" : "*.json",
        "parser" : null
      }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : -1,
      "rowFlushBoundary" : 500000,
      "numShards" : 1
    }
  },
  "groupId" : "index_rb_flow_2015-07-07T08:47:54.594Z",
  "dataSource" : "rb_flow",
  "interval" : "2015-07-06T00:00:00.000Z/2015-07-06T06:00:00.000Z",
  "resource" : {
    "availabilityGroup" : "index_rb_flow_2015-07-07T08:47:54.594Z",
    "requiredCapacity" : 1
  }
}

I noticed that when I use a raw file with no events inside the interval, the index task throws this exception:

java.util.NoSuchElementException: No more lines
    at org.apache.commons.io.LineIterator.nextLine(LineIterator.java:140) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at org.apache.commons.io.LineIterator.next(LineIterator.java:129) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:54) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexing.common.task.IndexTask.getDataIntervals(IndexTask.java:209) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:165) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) [?:1.7.0_03]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.7.0_03]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.7.0_03]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.7.0_03]
    at java.lang.Thread.run(Unknown Source) [?:1.7.0_03]

Is this normal? Can someone help me with this?

Regards,

Andres

Hi Andres, this looks like a bug. I think it should be okay to have a raw file with no events in the interval. It looks like the bug is actually triggered by a totally empty file. Do you have any of those? Can you remove them and see if things work after that?
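
If it helps, here is a rough Python sketch for spotting zero-byte files before you submit the task. It assumes the `baseDir` and `filter` values from your spec, and the helper name `find_empty_files` is just something made up for this example. It only checks the top level of the directory, so you would need `rglob` instead of `glob` if your files sit in subdirectories.

```python
from pathlib import Path


def find_empty_files(base_dir, pattern="*.json"):
    """Return files directly under base_dir that match pattern and are zero bytes long."""
    return sorted(
        p for p in Path(base_dir).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Directory and wildcard copied from the firehose section of the task spec.
    for path in find_empty_files("/tmp/.data_migration", "*.json"):
        print(path)
```

Moving the listed files out of the directory (rather than deleting them) keeps it easy to put them back once the fix is in.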

Also, I think this should fix things with empty files: https://github.com/druid-io/druid-api/pull/48/files
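
For what it's worth, here is a toy model of how I read the stack trace: `FileIteratingFirehose.nextRow` asks the line iterator for a line that an empty file does not have. This is plain Python with made-up function names, not the Druid code and not the code from the pull request. The guarded variant only illustrates the general idea of never reading from an exhausted iterator.

```python
def read_rows_unguarded(files):
    """Read the first line of every file without checking that it exists."""
    rows = []
    for lines in files:          # each "file" is modelled as a list of lines
        it = iter(lines)
        rows.append(next(it))    # raises StopIteration when the file is empty
        rows.extend(it)
    return rows


def read_rows_guarded(files):
    """Read every line of every file; an empty file simply contributes nothing."""
    rows = []
    for lines in files:
        rows.extend(lines)
    return rows


if __name__ == "__main__":
    files = [["row1", "row2"], [], ["row3"]]   # the middle file is empty
    print(read_rows_guarded(files))
    try:
        read_rows_unguarded(files)
    except StopIteration:
        print("unguarded reader failed on the empty file")
```

Python's `StopIteration` plays the role of the `NoSuchElementException` in the trace above.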

Thanks Gian. I will test it on the next release.

Regards,

Andres