Hey guys, I want to use data from Druid for machine learning.
I want to use Spark for that. How can I import data from Druid/HDFS into a Spark RDD/DataFrame?
Should I use the SQL/query API? The total size of the training sample can be very large.
Hi,
You may consider using https://github.com/himanshug/druid-hadoop-utils to read Druid segments stored on HDFS. It contains a Hadoop InputFormat and a Pig loader.
P.S. This code is at a very early stage, and things might change in the future.
Thanks,
Akash
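
On the SQL/query API part of the question: if you do go that route instead of reading segments directly, one way to keep each request bounded for a very large training set is to split the time range into slices and issue one query per slice. Below is a rough sketch of that idea only, assuming your Druid version exposes the SQL endpoint on the broker. The datasource name "events", the date range, and the helper names time_slices and sql_payload are made up for illustration.

```python
import json
from datetime import datetime, timedelta


def time_slices(start, end, step):
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def sql_payload(datasource, lo, hi):
    """Build the JSON request body for one Druid SQL query over [lo, hi)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    query = (
        f'SELECT * FROM "{datasource}" '
        f"WHERE __time >= TIMESTAMP '{lo.strftime(fmt)}' "
        f"AND __time < TIMESTAMP '{hi.strftime(fmt)}'"
    )
    return json.dumps({"query": query})


# Hypothetical datasource and range: three one-day slices.
payloads = [
    sql_payload("events", lo, hi)
    for lo, hi in time_slices(
        datetime(2015, 1, 1), datetime(2015, 1, 4), timedelta(days=1)
    )
]
```

Each payload would then be POSTed to the broker's /druid/v2/sql endpoint (host and port depend on your cluster), and the per-slice results loaded into Spark. For truly huge samples, reading the segments from HDFS as suggested above avoids pushing everything through the broker.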