Indexing Service for a local CSV file

I am trying to load data via CSV from my local filesystem so that I can test different data schema formats. I am having an issue configuring Druid to accept this as a data source. I get errors that make me think there is something wrong with my spec file, which is here:

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "authenticated",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
          "timestampSpec" : {
            "column" : "minute",
            "format" : "auto"
          },
          "columns" : ["minute", "cli_id", "domain_est", "xurl_domain"],
          "dimensionsSpec" : {
            "dimensions" : ["cli_id", "domain_est", "xurl_domain"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "type" : "doubleSum",
          "name" : "measured",
          "fieldName" : "measured"
        },
        {
          "type" : "doubleSum",
          "name" : "matched",
          "fieldName" : "matched"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2013-08-31/2015-09-01" ]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "examples/authenticated/",
        "filter" : "ads_data.csv"
      }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 0,
      "rowFlushBoundary" : 0
    }
  }
}

The errors that I see:

Can not deserialize instance of java.util.ArrayList out of START_OBJECT token

When I wrap the JSON object in an array, I get the following error:

1) Error injecting constructor, java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.segment.realtime.FireDepartment] value failed: dataSchema (through reference chain: java.util.ArrayList[0])
  at io.druid.guice.FireDepartmentsProvider.<init>(FireDepartmentsProvider.java:41)
  while locating io.druid.guice.FireDepartmentsProvider
  at io.druid.guice.RealtimeModule.configure(RealtimeModule.java:79)
  while locating java.util.List<io.druid.segment.realtime.FireDepartment>
    for parameter 0 at io.druid.segment.realtime.RealtimeManager.<init>(RealtimeManager.java:78)
  while locating io.druid.segment.realtime.RealtimeManager
  at io.druid.guice.RealtimeModule.configure(RealtimeModule.java:83)
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:89)
  while locating io.druid.server.QueryResource

Any advice on how to get a working spec file is appreciated.
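Separately from the deserialization errors, a spec can be sanity-checked before it is submitted anywhere. In the spec above, the two `doubleSum` metrics read `measured` and `matched`, but neither name is listed in `columns`, so as written those aggregators likely have no input field to sum. The sketch below is a hypothetical helper, not part of Druid: it only cross-checks names inside the spec (timestamp column, dimensions, metric `fieldName`s) against `parseSpec.columns` and never opens the CSV file.

```python
import json
import sys


def check_csv_spec(task):
    """Return warnings for column names an index task's CSV parseSpec never declares.

    Hypothetical helper, not part of Druid: it only cross-checks names inside
    the spec against parseSpec.columns and never reads the CSV file itself.
    """
    schema = task["spec"]["dataSchema"]
    parse_spec = schema["parser"]["parseSpec"]
    columns = set(parse_spec.get("columns", []))
    warnings = []

    # The timestamp column must be one of the declared CSV columns.
    ts_col = parse_spec["timestampSpec"]["column"]
    if ts_col not in columns:
        warnings.append(f"timestamp column {ts_col!r} not in columns")

    # Every dimension should map to a declared CSV column.
    for dim in parse_spec.get("dimensionsSpec", {}).get("dimensions", []):
        if dim not in columns:
            warnings.append(f"dimension {dim!r} not in columns")

    # Aggregators such as doubleSum read their input from fieldName;
    # "count" has no fieldName and is skipped.
    for metric in schema.get("metricsSpec", []):
        field = metric.get("fieldName")
        if field is not None and field not in columns:
            warnings.append(
                f"metric {metric['name']!r} reads undeclared column {field!r}"
            )
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_spec.py authenticated_task.json  (hypothetical names)
    with open(sys.argv[1]) as f:
        for warning in check_csv_spec(json.load(f)):
            print("WARNING:", warning)
```

Run against the spec in the question, this reports the `measured` and `matched` metrics and nothing else.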

What is the full stack trace (in the overlord log) that you get in the first case? You don’t need to put your top-level JSON object into an array.

– Himanshu

OK, I see now. You are trying to set up a standalone realtime node, which uses a different JSON format; see http://druid.io/docs/0.7.1.1/Realtime-ingestion.html#realtime-node-ingestion and correct your spec accordingly.

The JSON you’ve used is the format of a task (here, an index task) that is submitted to the overlord.

– Himanshu
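Both errors in the question fit this explanation. The second stack trace shows the realtime node building a `java.util.List<io.druid.segment.realtime.FireDepartment>` from its spec file, so a top-level object most likely produces "Can not deserialize instance of java.util.ArrayList out of START_OBJECT token". Wrapping the index task in an array then most likely fails on `dataSchema`, because an index task nests `dataSchema` under `spec`, while each realtime entry carries `dataSchema`, `ioConfig` and `tuningConfig` at its own top level. The hypothetical helper below only classifies a parsed spec by that outer shape; it does not validate the contents.

```python
import json
import sys


def describe_spec(spec):
    """Roughly classify a parsed Druid spec by its top-level shape.

    Hypothetical helper: it only looks at the outer structure, not at whether
    the inner dataSchema/ioConfig/tuningConfig contents are valid.
    """
    if isinstance(spec, dict):
        if spec.get("type") == "index":
            # A single task object: this is what the overlord accepts.
            return "index task: submit it to the overlord"
        # A realtime node reads its spec file as a JSON array, so a bare
        # object fails to deserialize into a list.
        return "single object: a realtime node expects a JSON array"
    if isinstance(spec, list):
        for i, entry in enumerate(spec):
            # Each realtime entry needs dataSchema at its own top level; an
            # index task keeps it nested under "spec" instead.
            if not isinstance(entry, dict) or "dataSchema" not in entry:
                return f"array entry {i} has no top-level dataSchema"
        return "array of entries with dataSchema: realtime-node layout"
    return "unrecognized spec"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_spec.py spec.json  (hypothetical names)
    with open(sys.argv[1]) as f:
        print(describe_spec(json.load(f)))
```

For the spec in the question, the bare object is reported as an index task, and the array-wrapped version is reported as an entry with no top-level `dataSchema`, matching the two errors above.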

Thanks. This helped me get the server started. I am running into another issue, which I will ask about in a new thread.

I am trying to do the same thing, loading data from a local CSV file, and I have run into the same error. Please let me know how you got things working; I would really appreciate it.

Error log:

1) Error injecting constructor, java.lang.RuntimeException: com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.segment.realtime.FireDepartment] value failed: dataSchema (through reference chain: java.util.ArrayList[0])
  at io.druid.guice.FireDepartmentsProvider.<init>(FireDepartmentsProvider.java:41)
  while locating io.druid.guice.FireDepartmentsProvider
  at io.druid.guice.RealtimeModule.configure(RealtimeModule.java:79)
  while locating java.util.List<io.druid.segment.realtime.FireDepartment>
    for parameter 0 at io.druid.segment.realtime.RealtimeManager.<init>(RealtimeManager.java:85)
  while locating io.druid.segment.realtime.RealtimeManager
  at io.druid.guice.RealtimeModule.configure(RealtimeModule.java:83)
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:89)
  while locating io.druid.server.QueryResource

1 error
    at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1014) ~[guice-4.0-beta.jar:?]
    at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1036) ~[guice-4.0-beta.jar:?]
    at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:134) ~[druid-api-0.3.8.jar:0.7.3]
    at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71) [druid-services-0.7.3.jar:0.7.3]
    at io.druid.cli.ServerRunnable.run(ServerRunnable.java:38) [druid-services-0.7.3.jar:0.7.3]
    at io.druid.cli.Main.main(Main.java:88) [druid-services-0.7.3.jar:0.7.3]

Hi Manvendra,

As Himanshu pointed out earlier in the thread, you are trying to set up a realtime node using the IndexTask format.

You can either run the indexing service and submit the index task to it, or set up a realtime node with a spec in the realtime format.
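For the first option, an index task like the one in the question is sent to the overlord with an HTTP POST to `/druid/indexer/v1/task`. Below is a minimal sketch using only the Python standard library. The overlord address `http://localhost:8090` and the helper names `build_task_request` and `submit_task` are assumptions for illustration, so substitute the host and port from your own overlord configuration.

```python
import json
import sys
import urllib.request

# Assumed overlord address; replace with your overlord's host and port.
OVERLORD_URL = "http://localhost:8090"


def build_task_request(task, overlord_url=OVERLORD_URL):
    """Build (but do not send) a POST of a task spec to the overlord."""
    return urllib.request.Request(
        overlord_url.rstrip("/") + "/druid/indexer/v1/task",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(task, overlord_url=OVERLORD_URL):
    """Send the task and return the overlord's parsed JSON response."""
    with urllib.request.urlopen(build_task_request(task, overlord_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python submit_task.py index_task.json  (hypothetical file names)
    with open(sys.argv[1]) as f:
        print(submit_task(json.load(f)))
```

Keeping `build_task_request` separate from `submit_task` makes it possible to inspect the URL, method and body without contacting a server.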