Error while loading protobuf data into Imply

Hi Everyone,

Greetings of the day!!!

I am working on a POC to load protobuf data into Druid and analyze it in Imply.

For this, I installed imply-2.6.0 on macOS, but loading the data failed.

For more details, I have attached the log, config, and protobuf-related files:

  1. address-index.json

  2. address.desc

  3. address.gpb

  4. address.proto

  5. druid_index*.log

  6. common.runtime.properties

It would be great if I could get some pointers to fix this issue.

NOTE: I generated address.desc on Linux (Ubuntu). Does it need to be regenerated on Mac?

Thanks,

Abhishek

Command to load data:

imply.zip (15.1 KB)

Error in log:

2018-07-02T09:37:46,304 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_addressbook_2018-07-02T09:37:41.107Z, type=index, dataSource=addressbook}]

java.lang.ClassCastException: io.druid.segment.transform.TransformingInputRowParser cannot be cast to io.druid.data.input.impl.StringInputRowParser

at io.druid.data.input.impl.AbstractTextFilesFirehoseFactory.connect(AbstractTextFilesFirehoseFactory.java:46) ~[druid-api-0.12.1-iap3.jar:0.12.1-iap3]

at io.druid.indexing.common.task.IndexTask.collectIntervalsAndShardSpecs(IndexTask.java:467) ~[druid-indexing-service-0.12.1-iap3.jar:0.12.1-iap3]

at io.druid.indexing.common.task.IndexTask.createShardSpecsFromInput(IndexTask.java:401) ~[druid-indexing-service-0.12.1-iap3.jar:0.12.1-iap3]

at io.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:339) ~[druid-indexing-service-0.12.1-iap3.jar:0.12.1-iap3]

at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:237) ~[druid-indexing-service-0.12.1-iap3.jar:0.12.1-iap3]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:450) [druid-indexing-service-0.12.1-iap3.jar:0.12.1-iap3]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:422) [druid-indexing-service-0.12.1-iap3.jar:0.12.1-iap3]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

2018-07-02T09:37:46,313 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_addressbook_2018-07-02T09:37:41.107Z] status changed to [FAILED].

2018-07-02T09:37:46,316 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_addressbook_2018-07-02T09:37:41.107Z",

"status" : "FAILED",

"duration" : 367

}

Hi Abhishek,

Currently, the batch “index” task doesn’t support protobuf. It only supports line-oriented text files (JSON/CSV/TSV).
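For reference, this is roughly the shape of a parser block the batch “index” task does accept: a line-oriented string parser with a JSON parseSpec. The timestamp column and dimension names below are placeholders for illustration, not taken from your address.proto:

```json
"parser": {
  "type": "string",
  "parseSpec": {
    "format": "json",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "dimensionsSpec": { "dimensions": ["name", "email"] }
  }
}
```

The ClassCastException in your log (TransformingInputRowParser cannot be cast to StringInputRowParser) is the batch firehose insisting on a string-based parser like this one.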

You can, however, read protobuf from streams. Check out our Kafka tutorial for an example of reading from Kafka: https://docs.imply.io/on-premise/tutorial/kafka-indexing-service
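Based on the druid-protobuf-extensions documentation, a protobuf parser inside a Kafka supervisor spec looks roughly like this. The descriptor path, message type, and field names here are assumptions for illustration; adjust them to match your address.proto:

```json
"parser": {
  "type": "protobuf",
  "descriptor": "file:///tmp/address.desc",
  "protoMessageType": "Address",
  "parseSpec": {
    "format": "json",
    "timestampSpec": { "column": "timestamp", "format": "auto" },
    "dimensionsSpec": { "dimensions": ["name", "email"] }
  }
}
```

Note that this requires "druid-protobuf-extensions" in the druid.extensions.loadList of your common.runtime.properties.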

If you’re interested in contributing code to Druid, check out https://github.com/druid-io/druid/issues/5584 for a related issue to extend the batch tasks to support non-text formats.