What's the best way to extend InputRowParser in druid-api

Hi All

I'd like to know the best way to extend druid-api. In my case, that means the InputRowParser interface.

I have implemented an AvroInputRowParser that implements InputRowParser. But I can't use it through the InputRowParser interface without modifying InputRowParser in druid-api to add AvroInputRowParser as a new entry in @JsonSubTypes.

Following is what I think I should do.

@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = StringInputRowParser.class)
@JsonSubTypes(value = {
    @JsonSubTypes.Type(name = "string", value = StringInputRowParser.class),
    @JsonSubTypes.Type(name = "map", value = MapInputRowParser.class),
    @JsonSubTypes.Type(name = "avro", value = AvroInputRowParser.class) // Modify InputRowParser to add AvroInputRowParser sub class
})
public interface InputRowParser
{

}


I want to refer to the InputRowParser interface rather than the AvroInputRowParser subclass directly, because I want the flexibility to switch between subclasses.
Is there a better way to do this without modifying the druid-api code?

Here are my Druid versions:
druid: 0.6.160
druid-api: 0.2.14.1

best
xin

Hi Xin,

Some of these threads may help you with avro:

https://groups.google.com/forum/#!searchin/druid-development/avro/druid-development/5bow2fZxZ4g/gx8yO9dRacUJ

https://groups.google.com/forum/#!searchin/druid-development/avro/druid-development/MTvKPoJfxKU/L3XQsvi9HcIJ

Please let me know if these threads don’t answer your questions!

Thanks,

FJ

Hi Xin,

If you just want to create an extension with your implementation of InputRowParser, please see the following code (it implements PasswordProvider instead of InputRowParser, but the mechanism is pretty much the same):
https://github.com/himanshug/druid-pwd-provider-extn-sample

For the Jackson configuration, you would write the following in your DruidModule.getJacksonModules() implementation:

@Override
public List<? extends Module> getJacksonModules()
{
  return ImmutableList.of(
      new SimpleModule("MyModule")
          .registerSubtypes(
              new NamedType(AvroInputRowParser.class, "avro")
          )
  );
}

as in https://github.com/himanshug/druid-pwd-provider-extn-sample/blob/master/src/main/java/io/druid/samplepwdext/SecurePwdProviderModule.java
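
For anyone new to Jackson, here is a minimal standalone sketch of what that registration achieves. The Parser, StringParser and AvroParser types are hypothetical stand-ins, not the druid-api classes, and the ObjectMapper is built by hand here, whereas inside Druid the module would be picked up through DruidModule. It assumes only jackson-databind on the classpath.

```java
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.jsontype.NamedType;
import com.fasterxml.jackson.databind.module.SimpleModule;

import java.io.IOException;
import java.io.UncheckedIOException;

public class SubtypeRegistrationSketch
{
  // Stand-in for an interface you cannot edit: it only knows about "string".
  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
  @JsonSubTypes({@JsonSubTypes.Type(name = "string", value = StringParser.class)})
  public interface Parser
  {
  }

  public static class StringParser implements Parser
  {
  }

  // Stand-in for the class shipped in your extension.
  public static class AvroParser implements Parser
  {
  }

  public static Parser parse(String json)
  {
    // Register the extra subtype under the name "avro" without touching Parser.
    SimpleModule module = new SimpleModule("MyModule");
    module.registerSubtypes(new NamedType(AvroParser.class, "avro"));

    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(module);
    try {
      return mapper.readValue(json, Parser.class);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args)
  {
    // The "type" property selects the implementation at deserialization time.
    System.out.println(parse("{\"type\": \"avro\"}").getClass().getSimpleName());
    System.out.println(parse("{\"type\": \"string\"}").getClass().getSimpleName());
  }
}
```

The point is that the name-to-class mapping lives in the mapper, so the interface's own @JsonSubTypes list can stay as it is.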

Now add your extension jar to the classpath when starting the Druid process and you should be good to go. This should work for you in the realtime ingestion case.

– Himanshu

PS:
For Hadoop batch ingestion, you would have to write a Hadoop InputFormat instead of (or in addition to) the InputRowParser. See the links that FJ provided.

Thanks Himanshu,

I will try DruidModule.getJacksonModules(). That is much more elegant than modifying the druid-api code.

Is there any wiki/doc that describes how Druid uses Guice and Jackson? I am new to both. I have spent a day reading the Druid code, but I still don't fully understand how it is organized. Maybe I should spend more time on the code.

Druid pulls from Guice injections when the @JacksonInject annotation is used. In such a case it behaves as if @Inject were used from Guice.
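
To make that concrete, here is a rough sketch of @JacksonInject using plain Jackson only. SchemaRegistry and its URL are hypothetical, and InjectableValues.Std stands in for whatever Guice-backed lookup Druid sets up on its own mappers, which is not shown here.

```java
import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.InjectableValues;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.UncheckedIOException;

public class JacksonInjectSketch
{
  // Hypothetical service that would normally come from the injector, not from JSON.
  public static class SchemaRegistry
  {
    private final String url;

    public SchemaRegistry(String url)
    {
      this.url = url;
    }

    public String getUrl()
    {
      return url;
    }
  }

  public static class AvroParser
  {
    private final String topic;
    private final SchemaRegistry registry;

    @JsonCreator
    public AvroParser(
        @JsonProperty("topic") String topic,      // read from the JSON document
        @JacksonInject SchemaRegistry registry    // supplied by the mapper, keyed by type
    )
    {
      this.topic = topic;
      this.registry = registry;
    }

    public String getTopic()
    {
      return topic;
    }

    public SchemaRegistry getRegistry()
    {
      return registry;
    }
  }

  public static AvroParser parse(String json)
  {
    ObjectMapper mapper = new ObjectMapper();
    // Assumption: a fixed value registered by class, in place of a Guice lookup.
    mapper.setInjectableValues(
        new InjectableValues.Std()
            .addValue(SchemaRegistry.class, new SchemaRegistry("http://registry.example"))
    );
    try {
      return mapper.readValue(json, AvroParser.class);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling parse("{\"topic\": \"events\"}") yields a parser whose topic came from the JSON and whose registry came from the injectable values.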

As a side note, adding a

@JsonTypeName("avro")

annotation to your class, and simply calling registerSubtypes with the class itself rather than with a NamedType for it, is how other places in Druid tend to do JSON type naming when the interface cannot carry the name mapping.
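
A small sketch of that variant, again with hypothetical stand-in types (Parser, AvroParser) rather than the real druid-api ones, and assuming only jackson-databind:

```java
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.annotation.JsonTypeName;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.module.SimpleModule;

import java.io.IOException;
import java.io.UncheckedIOException;

public class TypeNameSketch
{
  // Stand-in interface with no @JsonSubTypes entry for the new class.
  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
  public interface Parser
  {
  }

  // The class carries its own type name.
  @JsonTypeName("avro")
  public static class AvroParser implements Parser
  {
  }

  public static Parser parse(String json)
  {
    // Register the class only; Jackson reads the name from @JsonTypeName.
    SimpleModule module = new SimpleModule("MyModule");
    module.registerSubtypes(AvroParser.class);

    ObjectMapper mapper = new ObjectMapper();
    mapper.registerModule(module);
    try {
      return mapper.readValue(json, Parser.class);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

This keeps the name next to the class it belongs to, so the module code does not need to repeat the string.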

Thanks Fangjin,

I am using KafkaFirehose for realtime ingestion of Avro from Kafka. I thought extending InputRowParser would be the easiest way.

I have read these two threads. Are both of them about batch Avro ingestion? Are there any components shared between realtime and batch ingestion?

Hi Xin,

If you just want realtime ingestion with the Kafka firehose and your own input parser, all you need to do is write your implementation of [Avro]InputRowParser and configure it in the "parseSpec" of the realtime spec file.

– Himanshu