Quickstart Ingestion issues

I am trying to ingest data that looks like this - I modeled it after the example from the quickstart, and it's just sitting in a test.json file:

{"time":"2015-08-07T14:26:06.330000Z","channel":"U_BPOW","unit":"kW","value":24.122683}{"time":"2015-08-07T14:26:07.330000Z","channel":"U_BPOW","unit":"kW","value":25.004936}{"time":"2015-08-07T14:26:08.330000Z","channel":"U_BPOW","unit":"kW","value":25.730631}{"time":"2015-08-07T14:26:09.330000Z","channel":"U_BPOW","unit":"kW","value":26.400974}{"time":"2015-08-07T14:26:10.330000Z","channel":"U_BPOW","unit":"kW","value":27.158089}{"time":"2015-08-07T14:26:11.330000Z","channel":"U_BPOW","unit":"kW","value":27.767803}{"time":"2015-08-07T14:26:12.330000Z","channel":"U_BPOW","unit":"kW","value":28.540548}{"time":"2015-08-07T14:26:13.330000Z","channel":"U_BPOW","unit":"kW","value":29.271786}{"time":"2015-08-07T14:26:14.330000Z","channel":"U_BPOW","unit":"kW","value":29.992321}{"time":"2015-08-07T14:26:15.330000Z","channel":"U_BPOW","unit":"kW","value":30.803215}{"time":"2015-08-07T14:26:16.330000Z","channel":"U_BPOW","unit":"kW","value":31.545570}{"time":"2015-08-07T14:26:17.330000Z","channel":"U_BPOW","unit":"kW","value":32.437809}{"time":"2015-08-07T14:26:18.330000Z","channel":"U_BPOW","unit":"kW","value":33.209160}{"time":"2015-08-07T14:26:19.330000Z","channel":"U_BPOW","unit":"kW","value":33.937225}{"time":"2015-08-07T14:26:20.330000Z","channel":"U_BPOW","unit":"kW","value":34.898518}{"time":"2015-08-07T14:26:21.330000Z","channel":"U_BPOW","unit":"kW","value":35.557831}{"time":"2015-08-07T14:26:22.330000Z","channel":"U_BPOW","unit":"kW","value":36.362434}

With the following ingestion spec, the coordinator says the task is successful, but when I look in Pivot it only ever shows the first point of the data - any tips to get all the data into the system?

{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/test.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "engine_data_",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-08-07/2015-08-08"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "unit"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "value",
          "type" : "doubleSum",
          "fieldName" : "value"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}

Hey Steve,

Druid indexing expects newline-delimited JSON - it looks like you have all the objects on one line, so only the first one would be indexed.
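
If it helps, here is a rough sketch of one way to split a file like that into newline-delimited JSON. It assumes the input is quickstart/test.json, that the objects are flat (no nested braces), and writes to a hypothetical quickstart/test_ndjson.json - you would then point the "paths" in your ioConfig at the new file.

# Rough sketch: split back-to-back JSON objects into one object per line.
# Assumes flat objects with no nested braces, so every "}{" boundary
# separates two records.
import json

with open("quickstart/test.json") as f:
    raw = f.read().strip()

# Re-insert a newline between adjacent objects, then re-parse each one
# to confirm the result is still valid JSON.
records = raw.replace("}{", "}\n{").splitlines()

with open("quickstart/test_ndjson.json", "w") as out:
    for line in records:
        obj = json.loads(line)  # raises if a record is malformed
        out.write(json.dumps(obj) + "\n")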

Thanks, that worked - is there any way to view raw metric values in Pivot without any aggregates? It says the metricsSpec requires an aggregator - why can't I just view the raw data as tables or a time series graph in Pivot?

Hi Steve, Druid rolls up data as it is ingested. This can significantly reduce the amount of data you have to store. This behavior will become optional with https://github.com/druid-io/druid/pull/3020
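
To make the rollup idea concrete, here is a toy sketch (not Druid code) of what happens at ingestion time if, say, queryGranularity were "minute": rows whose truncated timestamp and dimension values match are combined, and the "value" metric is summed because of the doubleSum aggregator. With queryGranularity "none", as in the spec above, rows only combine when their timestamps are exactly equal.

# Toy illustration of ingestion-time rollup (not Druid code).
from collections import defaultdict

rows = [
    {"time": "2015-08-07T14:26:06Z", "channel": "U_BPOW", "unit": "kW", "value": 24.1},
    {"time": "2015-08-07T14:26:07Z", "channel": "U_BPOW", "unit": "kW", "value": 25.0},
    {"time": "2015-08-07T14:26:08Z", "channel": "U_BPOW", "unit": "kW", "value": 25.7},
]

# Pretend queryGranularity is "minute": truncate each timestamp to the minute,
# then sum the metric for rows sharing the same (timestamp, dimensions) key.
rolled = defaultdict(float)
for r in rows:
    key = (r["time"][:16] + ":00Z", r["channel"], r["unit"])
    rolled[key] += r["value"]

print(dict(rolled))
# {('2015-08-07T14:26:00Z', 'U_BPOW', 'kW'): 74.8}  (approximately)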

Looks good - as we are testing Druid for engineering time-series data, having it remove raw data at ingestion by default doesn't make a lot of sense (if I'm understanding correctly). Will this pull request be included in the next version?

Thanks

Steve, Druid isn't losing any information about the data; it is rolling it up.

Please read http://druid.io/docs/0.9.0/design/index.html

Yeah, I think that makes sense - however, in our application, when we are recording data from a windmill or automotive power train at 100 Hz, every point is important, and it's up to the analysis or data acquisition software to aggregate the data. If you sum or "roll up" ten data points you might miss something - also, certain emissions tests require that the raw data be left untouched for legal reasons.

Thanks

One workaround that you could try before that PR is merged is to include a unique key in each row and mark that as a dimension; that would effectively disable rollup (see the sketch below). If you don't have a natural unique key, you could use a UUID or something like that.
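
A minimal sketch of that workaround, assuming newline-delimited input and a hypothetical field name "row_id" - you would also add "row_id" to the "dimensions" list in your dimensionsSpec:

# Sketch of the unique-key workaround: tag every row with a UUID so no two
# rows share identical dimension values, which effectively disables rollup.
import json
import uuid

with open("quickstart/test_ndjson.json") as f, \
     open("quickstart/test_unique.json", "w") as out:
    for line in f:
        row = json.loads(line)
        row["row_id"] = str(uuid.uuid4())  # hypothetical unique-key field
        out.write(json.dumps(row) + "\n")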

I think odds are pretty decent that the no-rollup feature will make it into 0.9.2 though.