Unable to load data in Druid

Hi,
I am trying to load a very simple dataset in JSON format into Druid.

This is my index file:

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "datatemplate",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "Loc"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "Timestamp"
          }
        }
      },
      "metricsSpec" : [{"name" : "Qty", "type" : "doubleSum", "fieldName" : "Qty"}],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2016-01-01T00:00:00Z/2030-06-30T00:00:00Z"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "datatemplate/",
        "filter" : "datatemplate.json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 10000000,
      "maxRowsInMemory" : 40000,
      "forceExtendableShardSpecs" : true
    }
  }
}

And this is my data file:

{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "A", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "2", "Timestamp": "2017-12-01T00:00:00Z"}
{"Loc": "B", "Qty": "1", "Timestamp": "2017-12-01T00:00:00Z"}

The error message I am getting is not very clear:

2018-07-17T07:39:17,875 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2018-07-17T07:39:17,875 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.start()] on object[io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@7bf01cb].
2018-07-17T07:39:17,876 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.java.util.emitter.service.ServiceEmitter.start()] on object[ServiceEmitter{serviceDimensions={service=druid/peon, host=172.18.0.2:8100, version=0.12.0}, emitter=io.druid.java.util.emitter.core.NoopEmitter@64355120}].
2018-07-17T07:39:17,885 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2018-07-17T07:39:17,888 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.java.util.http.client.NettyHttpClient.start()] on object[io.druid.java.util.http.client.NettyHttpClient@42505474].
2018-07-17T07:39:17,889 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.start()] on object[io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@7b477141].
2018-07-17T07:39:17,889 INFO [main] io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - starting
2018-07-17T07:39:17,889 INFO [main] io.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - started
2018-07-17T07:39:17,889 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.curator.discovery.ServerDiscoverySelector.start() throws java.lang.Exception] on object[io.druid.curator.discovery.ServerDiscoverySelector@33a55bd8].
2018-07-17T07:39:17,907 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2181, initiating session



INFO: Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope “Singleton”
Jul 17, 2018 7:39:18 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope “Singleton”
2018-07-17T07:39:18,898 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down…
2018-07-17T07:39:18,902 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_datatemplate_2018-07-17T07:39:10.896Z, type=index, dataSource=datatemplate}]
java.lang.IllegalArgumentException: Parameter 'directory' is not a directory: /var/lib/druid/datatemplate
at org.apache.commons.io.FileUtils.validateListFilesParameters(FileUtils.java:536) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at org.apache.commons.io.FileUtils.listFiles(FileUtils.java:512) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.segment.realtime.firehose.LocalFirehoseFactory.initObjects(LocalFirehoseFactory.java:82) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:57) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:46) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:655) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:264) ~[druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-services-0.12.0-selfcontained.jar:0.12.0]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-services-0.12.0-selfcontained.jar:0.12.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2018-07-17T07:39:18,908 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_datatemplate_2018-07-17T07:39:10.896Z] status changed to [FAILED].
2018-07-17T07:39:18,919 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_datatemplate_2018-07-17T07:39:10.896Z",
  "status" : "FAILED",
  "duration" : 480
}
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.server.http.SegmentListerResource to GuiceManagedComponentProvider with the scope “PerRequest”
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.server.QueryResource to GuiceInstantiatedComponentProvider
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.segment.realtime.firehose.ChatHandlerResource to GuiceInstantiatedComponentProvider
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.server.http.security.ConfigResourceFilter to GuiceInstantiatedComponentProvider
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
Jul 17, 2018 7:39:19 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope “Undefined”
Jul 17, 2018 7:39:19 AM com.sun.jersey.spi.inject.Errors processErrorMessages
WARNING: The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.
2018-07-17T07:39:19,299 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@4f1fb828{/,null,AVAILABLE}
2018-07-17T07:39:19,310 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@7197b07f{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2018-07-17T07:39:19,311 INFO [main] org.eclipse.jetty.server.Server - Started @8139ms
2018-07-17T07:39:19,311 INFO [main] io.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void io.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[io.druid.query.lookup.LookupResourceListenerAnnouncer@7cff3f1d].
2018-07-17T07:39:19,335 INFO [main] io.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid/listeners/lookups/__default/http:172.18.0.2:8100]

Can someone please suggest a solution?

Hi

Did you check your index file (ingestion spec) and the data file location (file path)?

Hi Abhishek,

Did you find the issue behind this?

I am facing the same problem as well.

Regards,

Chethan G Puttaswamy

Hey Chethan,

Can you post the error log you are getting?

Hi Ankit,
I'm getting the same issue:

2018-11-13T14:51:33,083 WARN [main] com.sun.jersey.spi.inject.Errors - The following warnings have been detected with resource and/or provider classes:
  WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.


Any ideas on how to fix this issue would be highly appreciated!

Thanks,
Kiran

Hi Kiran,

It looks like the segments cannot be loaded from deep storage. Can you check whether they are accessible?

Hi Naveen,
It looks like when I give the index job CSV data from one specific local directory, it loads successfully.

When I copy the same data to another directory, it gives this error:

WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type

It's quite strange. I checked the permissions, group, and owner of both directories, and everything is the same.

Any idea why this is happening?

Thanks,

Kiran

Off the top of my head, if Druid is not able to find data to ingest locally, it wouldn't throw an error. You would simply not have any segments created.

That error specifically looks like it's trying to load segments and is not able to find them.

Thanks

Hi all,

java.lang.IllegalArgumentException: Parameter 'directory' is not a directory: /var/lib/druid/datatemplate

This may mean that there is no such directory. You probably need to specify an absolute path rather than a relative one for baseDir in your ioConfig.
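To illustrate why the relative baseDir can fail, here is a minimal Python sketch (not Druid code) that mimics how a relative path gets resolved against the working directory of the process running the task. The function name check_local_firehose and the recursive directory walk are assumptions made for illustration. In the log above, "datatemplate/" ended up as /var/lib/druid/datatemplate, which suggests the task process was running from /var/lib/druid. Note also that a local firehose reads from the filesystem of whichever machine runs the task, so the directory has to exist there.

```python
import fnmatch
import os
import tempfile


def check_local_firehose(base_dir, file_filter, working_dir):
    """Resolve base_dir against working_dir and list files matching file_filter.

    Hypothetical helper for illustration only; it is not part of Druid.
    An absolute base_dir ignores working_dir, a relative one depends on it.
    """
    resolved = os.path.normpath(os.path.join(working_dir, base_dir))
    if not os.path.isdir(resolved):
        # Same wording as the exception in the task log.
        raise ValueError(f"Parameter 'directory' is not a directory: {resolved}")
    matches = []
    # Assumption: files are searched recursively under the base directory.
    for root, _dirs, files in os.walk(resolved):
        for name in files:
            if fnmatch.fnmatch(name, file_filter):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "datatemplate")
        os.mkdir(data_dir)
        open(os.path.join(data_dir, "datatemplate.json"), "w").close()
        # Relative baseDir works only when the working directory is right.
        print(check_local_firehose("datatemplate/", "datatemplate.json", tmp))
        # Absolute baseDir works regardless of the working directory.
        print(check_local_firehose(data_dir, "datatemplate.json", "/somewhere/else"))
```

Running the same check on the node that executes the task, with the working directory the task actually uses, should show whether the directory is visible from there.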

WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type

@Kiran, the above warning message can be safely ignored. The real cause is probably recorded elsewhere in the task logs. Would you please check again?
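As a rough aid for finding the relevant lines, here is a small Python sketch that pulls a task log and keeps only ERROR entries and exception headers. It assumes the overlord exposes the task log at /druid/indexer/v1/task/{taskId}/log (please verify this against the API docs for your Druid version); the helper names task_log_url, fetch_task_log, and find_errors are made up for this example.

```python
import re
import urllib.parse
import urllib.request

# Log lines at ERROR level, e.g. "2018-07-17T07:39:18,902 ERROR [thread] ..."
ERROR_LINE = re.compile(r"^\S+ ERROR \[")
# Exception headers, e.g. "java.lang.IllegalArgumentException: ..."
EXCEPTION_LINE = re.compile(r"\s*(?:Caused by: )?[\w.$]+(?:Exception|Error): ")


def task_log_url(overlord, task_id):
    """Build the task log URL (endpoint path is an assumption, see above)."""
    quoted = urllib.parse.quote(task_id, safe="")
    return f"{overlord.rstrip('/')}/druid/indexer/v1/task/{quoted}/log"


def fetch_task_log(overlord, task_id):
    """Download the full task log as text."""
    with urllib.request.urlopen(task_log_url(overlord, task_id)) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_errors(log_text):
    """Return ERROR lines and exception headers, skipping INFO/WARN noise."""
    return [
        line
        for line in log_text.splitlines()
        if ERROR_LINE.match(line) or EXCEPTION_LINE.match(line)
    ]


if __name__ == "__main__":
    # Placeholder host and task id; replace with your own values.
    url = task_log_url("http://ip:8090", "index_datatemplate_2018-07-17T07:39:10.896Z")
    print(url)
```

Passing the downloaded text to find_errors should surface lines like the IllegalArgumentException quoted above while dropping the Jersey warning.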

Jihoon

Hi Jihoon,
Thanks for your feedback. I'm checking the logs from the coordinator console UI at

http://ip:8090/console.html by clicking "log (all)" and "log (last 8kb)" on the specific task.

I have a 15-node Druid cluster with:

3 nodes for - coordinator, overlord, zookeeper

6 nodes for - historical

3 nodes for - middle manager

3 nodes for - broker

Could you please give me some details on where I can find the task logs you mentioned ("Probably there're real logs about what's happening in the task logs")?

Your help is highly appreciated!

Thanks,

Kiran