NOOB Question: Best (easiest?) way to ingest a pandas DataFrame into Druid

I’ve been programming in Python for a while and am fairly confident in it, but I’m completely new to Druid and still making my way through the documentation. What’s the best way to ingest a pandas DataFrame into a datasource? I’ve tried SQLAlchemy, but it threw errors all over the place, so maybe I’m using it wrong.

Paritosh,

You could use either batch ingestion (files / Hadoop / S3) or streaming ingestion (Kafka / Kinesis).

Please look up the ingestion tutorials in the Druid documentation ->
http://druid.io/docs/latest/design/index.html

Also look up extensions.
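
For the streaming route, a rough sketch would be to push each row of the DataFrame as a JSON message onto a Kafka topic that a Druid Kafka-indexing supervisor is already configured to consume. The broker address, topic name, and columns below are just examples:

```python
import json

import pandas as pd
from kafka import KafkaProducer  # kafka-python

# Example frame; Druid expects a timestamp column on every event.
df = pd.DataFrame({
    "timestamp": ["2019-01-01T00:00:00Z", "2019-01-01T01:00:00Z"],
    "channel": ["web", "mobile"],
    "clicks": [10, 25],
})

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # example broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One JSON message per row; the supervisor spec tells Druid how to
# parse these messages into the datasource.
for record in df.to_dict(orient="records"):
    producer.send("clicks", record)  # "clicks" is a hypothetical topic
producer.flush()
```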

Thanks & Rgds

Venkat

So, I’ve been through all of that.
I’m not sure you understood what I’m trying to get at, or maybe your post is incomplete.

I think I understand the ingestion methods suggested here.

But my question remains almost completely unanswered.

If you feel that’s wrong, could you please elaborate on your answer?

Also, as a quick side note, I’d love to hear from people using pyDruid.

Hi Paritosh,

I do not know Druid well, so certainly question whatever I write.

I do not think there is a direct pandas-to-Druid import feature (or a pandas-friendly library, like the one BigQuery has).

It seems that for batch work you’d use pandas to write a file to disk and then use the traditional import methods.
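
Something like this might work as a rough sketch (completely untested on my end; the file path, datasource spec file, and Overlord URL below are placeholders, so check them against the ingestion docs for your Druid version):

```python
import json

import pandas as pd
import requests

# A made-up DataFrame; Druid needs a timestamp column on every row.
df = pd.DataFrame({
    "timestamp": ["2019-01-01T00:00:00Z", "2019-01-01T01:00:00Z", "2019-01-01T02:00:00Z"],
    "channel": ["web", "mobile", "web"],
    "clicks": [10, 25, 7],
})

# 1. Write the frame as newline-delimited JSON, which Druid's JSON
#    input format can read directly.
df.to_json("/tmp/clicks.json", orient="records", lines=True)

# 2. Submit a native batch ingestion spec to the Overlord. The spec
#    itself (datasource name, timestampSpec, dimensions, and the input
#    pointing at /tmp/clicks.json) should be copied from a working
#    example in the docs, since its exact layout depends on the
#    Druid version.
with open("clicks-index-spec.json") as f:
    spec = json.load(f)

resp = requests.post(
    "http://localhost:8090/druid/indexer/v1/task",  # default Overlord port
    json=spec,
)
print(resp.json())  # a successful submit returns the task id
```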

Cheers,

Austin

Hey Austin,

Honestly, from what I’ve seen as well, that seems to be the case.

I guess we’ll have to wait for someone who is an authority on this subject to correct us.

Either way, I guess this is a place where I can contribute to the project, but I still feel I need a MUCH better understanding of Druid.

Cheers,

Paritosh J C