Andrew
April 10, 2019, 3:15pm
#1
Hi!
I’m trying to use Druid and Hive together. The problem is that all column names must be lowercase, because Hive will not query a column if its name contains any uppercase letters.
I read about it here:

> Hi all, been trying to get Hive/Druid/Kafka working (noob to both Hive/Druid)… getting close, I think. I have gotten to the point of being able to load data via Kafka -> Druid, and I have been able to create ‘external Druid’ tables in Hive, and even…
And now I can’t understand how to rename columns in an existing datasource. I know that I could use a firehose to read the existing datasource as input for a new spec, and I know how to rename dimensions during raw-data ingestion from here: https://support.imply.io/hc/en-us/articles/360005727614-How-to-rename-dimension-during-raw-data-ingestion
But my spec doesn’t parse floats. Also, I have 400 GB of data and 1600 intervals, and the index task is very slow.
(I don’t use the Hadoop index task, I use the plain index task; I also can’t use index_parallel because of the JSON firehose input.)
So here is my very slow spec that renames columns but doesn’t parse floats:
spec.py

```python
data = {'type': 'index',
        'spec': {'dataSchema': {'dataSource': 'checkline',
                 'parser': {'type': 'string',
                            'parseSpec': {'format': 'json',
                                          'dimensionsSpec': {'dimensions': [
                                              {'name': 'CashCheckLineNo'.lower(), 'type': 'long'},
                                              {'name': 'BasePrice'.lower(), 'type': 'float'},
                                              {'name': 'Quantity'.lower(), 'type': 'double'},
                                              {'name': 'BaseSum'.lower(), 'type': 'float'},
                                              {'name': 'id_tt_cl'.lower(), 'type': 'long'},
                                              # ... (truncated in the original post)
```
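As an alternative to hand-listing every renamed dimension, Druid’s `transformSpec` (inside `dataSchema`) can copy each mixed-case input column into a lowercase field using expression transforms. A rough sketch, assuming your column list looks like the one above (the `source_columns` list here is illustrative, not your full schema):

```python
# Hypothetical sketch: build an expression-based transformSpec that exposes
# each original mixed-case column under its lowercase name.
source_columns = ['CashCheckLineNo', 'BasePrice', 'Quantity', 'BaseSum', 'id_tt_cl']

transform_spec = {
    'transforms': [
        # Each transform reads the original column (e.g. "BasePrice")
        # and makes it available as the lowercase field (e.g. "baseprice").
        {'type': 'expression',
         'name': col.lower(),
         'expression': '"%s"' % col}
        for col in source_columns
    ]
}
```

You would then reference only the lowercase names in `dimensionsSpec`.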
Can anybody help me?
Thanks!
The ingest spec has `appendToExisting` set to `true`, which means the new segments will not overwrite the old ones; you’ll want to set that to `false`.
You could also break the task apart into smaller intervals (e.g., reingest a week at a time).
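One way to script that week-at-a-time reingestion is to generate weekly interval strings and submit one index task per chunk to the overlord. A minimal sketch of the interval-splitting part (the helper name is made up):

```python
from datetime import datetime, timedelta

def weekly_intervals(start, end):
    """Split [start, end) into ISO-8601 interval strings of at most 7 days,
    suitable for the 'intervals' field of separate, smaller index tasks."""
    fmt = '%Y-%m-%dT%H:%M:%S'
    cur = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    out = []
    while cur < stop:
        nxt = min(cur + timedelta(days=7), stop)
        out.append('%s/%s' % (cur.strftime(fmt), nxt.strftime(fmt)))
        cur = nxt
    return out
```

Each resulting interval can go into its own task spec, so a failure only costs you one week of reindexing instead of the whole 400 GB.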