Custom parser

Hey,

I’ve written a custom DruidModule that I want to use for realtime, based on the code of the avro one (https://github.com/druid-io/druid/pull/1858).

Basically, I’ve got data from Kafka (using kafka-0.8 firehose), want to parse the message myself, and pass it down to druid.

"dataSchema" : {
  "dataSource" : "custom-ds",
  "parser" : {
    "type" : "custom_stream",
    "parseSpec" : {
      "format" : "json",


But I’m struggling to get Druid to load it.

I’m not sure what needs to be taken into account.

I assume it’s possible to run a custom extension that is not part of Druid?

I added it to druid.extensions.coordinates so that it is taken into account (without it, Druid falls back to the “string” parser). I’m still on 0.8.3; I’ll migrate once I get this running, if that turns out to be possible.

I’ve copied my bundled package into druid/extensions-repo/…

But I get a bunch of errors from aether because it tries to fetch it from remote repositories:

I’m not that fluent with Maven packaging, so I don’t know whether I’m missing something obvious or whether something has to be done on the Druid side.

Do you have any tips or an example?

Thanks,

Stéphane

Hi Stephane,

I am not sure what is wrong, but to add an extra module to Druid all you need to do is implement DruidModule and have the right META-INF file. If you give me a pointer to the code, I can take a look.

Then, once the module is valid, all you need is to put it on the classpath, along with its dependencies of course.

Please let me know if you need more help.

Hi Slim,

Sorry but I can't really show the code.
To summarize, I have a project with some dependencies (druid-api,
scala-library, other custom Scala deps) and:

- A module implementing DruidModule
- A file "io.druid.initialization.DruidModule" in
"resources/META-INF/services" containing the fully qualified class name
of my custom DruidModule.
- A pom.xml to build a fat-jar with maven-assembly-plugin.
- A spec using "custom_stream" as the parser type (registered in
getJacksonModules).
- The "groupId:artifactId:version" coordinate of the package added to
druid.extensions.coordinates (with localRepository properly set).

Then I copied the fat-jar to extensions-repo (with its
maven-metadata.xml) (it's on a remote server).
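For reference, here is a rough sketch of where a Maven-style repository would expect the files for a given coordinate. It assumes the standard Maven repository layout (group ID split on dots, then artifact ID, then version); whether Druid's aether-based loader needs exactly these files in extensions-repo is an assumption, and the function name is made up for illustration.

```python
def coordinate_to_paths(coordinate):
    """Map "groupId:artifactId:version" to paths in a Maven-style repo.

    Assumes the standard layout: groupId dots become directories,
    followed by the artifactId and the version.
    """
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError("expected groupId:artifactId:version, got %r" % coordinate)
    group_id, artifact_id, version = parts

    directory = "/".join(group_id.split(".") + [artifact_id, version])
    base = "%s-%s" % (artifact_id, version)
    return {
        "directory": directory,
        "jar": "%s/%s.jar" % (directory, base),
        # aether also resolves the pom of the artifact, not just the jar.
        "pom": "%s/%s.pom" % (directory, base),
        "metadata": "%s/maven-metadata.xml" % directory,
    }
```

With the coordinate com.company:custom:1.0-SNAPSHOT, the "jar" entry is com/company/custom/1.0-SNAPSHOT/custom-1.0-SNAPSHOT.jar, relative to the repository root.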

But every time I start the realtime node, it fails with the errors below.

Am I doing something wrong ?

2016-02-15T16:34:32,911 DEBUG [main] org.eclipse.aether.internal.impl.DefaultUpdateCheckManager - Skipped remote update check for com.company:custom:1.0.0-SNAPSHOT/maven-metadata.xml, locally installed metadata up-to-date.
Resolved metadata com.company:custom:1.0.0-SNAPSHOT/maven-metadata.xml from (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local, releases+snapshots)
Resolving artifact com.company:custom:pom:1.0-SNAPSHOT
Resolving metadata com.company:custom:1.0-SNAPSHOT/maven-metadata.xml from /root/druid/extensions-repo (simple)
Resolved metadata com.company:custom:1.0-SNAPSHOT/maven-metadata.xml from /root/druid/extensions-repo (simple)
Resolving metadata com.company:custom:1.0-SNAPSHOT/maven-metadata.xml from (https://repo1.maven.org/maven2/, releases+snapshots)
Resolving metadata com.company:custom:1.0-SNAPSHOT/maven-metadata.xml from (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local, releases+snapshots)
2016-02-15T16:34:32,932 DEBUG [pool-1-thread-2] org.eclipse.aether.internal.impl.DefaultRepositoryConnectorProvider - Using connector AetherRepositoryConnector with priority 3.4028235E38 for https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local
2016-02-15T16:34:32,935 DEBUG [pool-1-thread-1] org.eclipse.aether.internal.impl.DefaultRepositoryConnectorProvider - Using connector AetherRepositoryConnector with priority 3.4028235E38 for https://repo1.maven.org/maven2/
Downloading: https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local/com/company/custom/1.0-SNAPSHOT/maven-metadata.xml
Downloading: https://repo1.maven.org/maven2/com/company/custom/1.0-SNAPSHOT/maven-metadata.xml
io.tesla.aether.connector.ResourceDoesNotExistException: Unable to locate resource https://repo1.maven.org/maven2/com/company/custom/1.0-SNAPSHOT/maven-metadata.xml. Error code 404

It finds the metadata installed locally, but still tries to fetch it
from metamx.artifactoryonline.com and repo1.maven.org, which ends in a
404. I don't know why.

I'll try to create a clean repo to share with you, with the
simplest possible code, to see if I can make it run (without deps).

Hi again,

I finally found my issue. My fat-jar was not built correctly: the
META-INF directory was not at the root of the jar, so Druid never
picked up the module. I also ended up using maven-shade-plugin instead
of the assembly one.
It's now working properly.
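For anyone hitting the same packaging problem, a quick check like the one below could catch it before deploying. It is only a sketch: it opens the jar as a zip archive and looks for the service file at the exact root-level path named earlier in this thread; the function name is made up for illustration.

```python
import zipfile

# Path of the service file, relative to the root of the jar.
SERVICE_ENTRY = "META-INF/services/io.druid.initialization.DruidModule"


def list_druid_modules(jar_path):
    """Return the module class names declared in the jar's service file.

    Returns an empty list when the entry is not at the jar root, which
    is the situation where the module is never discovered.
    """
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_ENTRY not in jar.namelist():
            return []
        text = jar.read(SERVICE_ENTRY).decode("utf-8")

    modules = []
    for line in text.splitlines():
        # Service files may contain '#' comments and blank lines.
        name = line.split("#", 1)[0].strip()
        if name:
            modules.append(name)
    return modules
```

Calling list_druid_modules on the built fat-jar should return the class name of the custom module; an empty list means the service file is missing or nested under another directory (for example under "resources/").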

With the latest release of Druid, I'm wondering whether a custom parser
is the right path (I've tried to keep it lean).
Should I consider writing a dedicated app using Tranquility instead?
(You guys seem to strongly recommend it in the docs.)
Is that worthwhile given that I will keep 1 topic = 1 Druid datasource?

Thanks,

Stéphane