Hi all, I'm trying my first ‘update’ test (this is on the latest 0.14.0 release).
The data has been streamed in via the kafka-indexing-task, and the Kafka indexing task is still running, processing ‘today’s’ data (there is no active data streaming at the moment, but that is just a detail; in reality there would be).
Now I am trying to ‘update’ some data in the past (I have created a data file covering a 15-minute window, with rows only for that time window).
My spec file is shown below…
When I run the indexing task:
./bin/post-index-task --file updates-overwrite-index.json
Beginning indexing data for test_1day
Task started: index_test_1day_2019-05-24T10:12:59.856Z
Task log: http://localhost:8090/druid/indexer/v1/task/index_test_1day_2019-05-24T10%3A12%3A59.856Z/log
Task status: http://localhost:8090/druid/indexer/v1/task/index_test_1day_2019-05-24T10%3A12%3A59.856Z/status
Task index_test_1day_2019-05-24T10:12:59.856Z still running…
Task index_test_1day_2019-05-24T10:12:59.856Z still running…
Task finished with status: SUCCESS
Completed indexing data for test_1day. Now loading indexed data onto the cluster…
test_1day loading complete! You may now query your data
This didn’t appear to load my 15-minute file, as the new/updated data is not there.
Note that the data I was trying to load is for 2019-05-01, and the data file contains data only for that 15-minute time window.
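Roughly, the shape of spec I am using is the following, with appendToExisting set to false so the target interval gets overwritten. (The parser, dimensions, and local firehose paths below are simplified placeholders rather than my exact file; the task type and datasource match what the task log above shows.)

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "test_1day",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "iso" },
          "dimensionsSpec": { "dimensions": [] }
        }
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2019-05-01/2019-05-02"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/path/to/updates",
        "filter": "updates-15min.json"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index"
    }
  }
}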
I’m guessing (and somewhat fearing) that to update a datasource that was/is being streamed in, I will need to:
- First ‘kill’ the streaming task.
- Run the ‘update’ task with the file containing the updates.
- Resume the original streaming task?
Will this approach work? Is there a better approach?
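In case it helps, the rough sequence I was picturing is something like the following, using what I believe are the supervisor suspend/resume endpoints on the Overlord (assuming ‘test_1day’ is also the supervisor id, and the Overlord is the same localhost:8090 as in the task log above):

# 1. Suspend the Kafka supervisor so its streaming tasks stop while I overwrite
curl -X POST http://localhost:8090/druid/indexer/v1/supervisor/test_1day/suspend

# 2. Run the batch ‘update’ task with the file containing the updates
./bin/post-index-task --file updates-overwrite-index.json

# 3. Resume the supervisor so streaming ingestion continues from its saved offsets
curl -X POST http://localhost:8090/druid/indexer/v1/supervisor/test_1day/resume

Is something like that the intended workflow, or can the overwrite task run safely while the supervisor is still active?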