Confusion about single/double precision floating point numbers in ingestion and aggregation

Hi,

I am new to Druid, and while reading up on it I saw in a few places that numeric values are imported as floats. However, I could not find a clear statement of whether this means single or double precision. Some of our values are currency amounts (2 digits after the decimal point), so I could multiply every value by 100, store it as a long in Druid, and divide by 100 again on retrieval (roughly as sketched below). I don't think that is a good solution, though, because everybody dealing with this data would need to be aware of the scaling.
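To be concrete, the workaround I have in mind (and would rather avoid) looks roughly like this; `toCents`/`fromCents` are just hypothetical helper names for the scaling step:

```java
public class CurrencyScaling {
    // Hypothetical helpers for the scale-by-100 workaround:
    // store cents as a long at ingestion time, divide again at query time.

    static long toCents(double amount) {
        return Math.round(amount * 100);   // 72911.87 -> 7291187
    }

    static double fromCents(long cents) {
        return cents / 100.0;              // 7291187 -> 72911.87
    }

    public static void main(String[] args) {
        long stored = toCents(72911.87);
        System.out.println(stored);             // 7291187
        System.out.println(fromCents(stored));  // 72911.87
    }
}
```

Every producer and every consumer of the datasource would have to agree on this convention, which is exactly what I would like to avoid.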

That said, I tried importing the data directly into Druid, but the results I see are not entirely correct, or at least not in line with my expectations. Let me explain with an example. The load data JSON is:
```json
{
  "type": "index",
  "spec": {
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/tmp",
        "filter": "test.csv"
      }
    },
    "dataSchema": {
      "dataSource": "test2_data",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "intervals": [
          "2017-03-15/2017-03-16"
        ]
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "timestampSpec": {
            "column": "TestDate",
            "format": "yyyy-MM-dd"
          },
          "columns": [
            "TestDate",
            "Region",
            "id",
            "Amt"
          ],
          "dimensionsSpec": {
            "dimensions": [
              "Region",
              "id"
            ]
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "TotAmt", "fieldName": "Amt" }
      ]
    },
    "tuningConfig": {
      "type": "index",
      "partitionsSpec": {
        "type": "hashed",
        "targetPartitionSize": 5000000
      },
      "jobProperties": {}
    }
  }
}
```

The data that I am loading is:
```
2017-03-15,R1,1,72911.87
2017-03-15,R1,2,729118.7
2017-03-15,R1,3,7291187.0
2017-03-15,R1,4,72911870
2017-03-15,R2,0,729118
2017-03-15,R2,1,729118
2017-03-15,R2,2,729118
2017-03-15,R2,3,729118
2017-03-15,R2,4,729118
2017-03-15,R2,5,729118
2017-03-15,R2,6,729118
2017-03-15,R2,7,729118
2017-03-15,R2,8,729118
2017-03-15,R2,9,729118
```

The first four lines (region R1) contain values chosen to expose inaccuracies in single precision floats. The ten R2 lines are repeated 100 times, so the file contains 1,004 lines in total (4 + 10 × 100).
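Just to show what I mean by values that "lead to inaccuracies", here is a quick check in plain Java, using Java's `float` as a stand-in for single precision storage (that Druid stores the column this way is only my assumption):

```java
public class R1PrecisionCheck {
    public static void main(String[] args) {
        // The four R1 amounts from test.csv
        double[] amounts = {72911.87, 729118.7, 7291187.0, 72911870.0};
        for (double v : amounts) {
            // Round-tripping through float shows the nearest single-precision value
            System.out.println(v + " -> " + (double) (float) v);
        }
        // Prints:
        // 72911.87 -> 72911.8671875
        // 729118.7 -> 729118.6875
        // 7291187.0 -> 7291187.0
        // 7.291187E7 -> 7.2911872E7
    }
}
```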

When I run a query that groups the values by Region and id, the results are as follows:
```
┌────────┬────┬───────────────┐
│ Region │ id │ TotAmt        │
├────────┼────┼───────────────┤
│ R1     │ 1  │ 72911.8671875 │
│ R1     │ 2  │ 729118.6875   │
│ R1     │ 3  │ 7291187       │
│ R1     │ 4  │ 72911872      │
│ R2     │ 0  │ 72911800      │
│ R2     │ 1  │ 72911800      │
│ R2     │ 2  │ 72911800      │
│ R2     │ 3  │ 72911800      │
│ R2     │ 4  │ 72911800      │
│ R2     │ 5  │ 72911800      │
│ R2     │ 6  │ 72911800      │
│ R2     │ 7  │ 72911800      │
│ R2     │ 8  │ 72911800      │
│ R2     │ 9  │ 72911800      │
└────────┴────┴───────────────┘
```

As can be seen, for R1 the values appear to be approximated by single precision floating point numbers for id = 1, 2 and 4 (possibly id = 3 as well, but since that result is a whole number I cannot tell). For R2 each total is reported as exactly 100 times the base value, as expected.
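Incidentally, the R2 per-id totals would look exact even under single precision, because 100 × 729118 = 72911800 happens to be exactly representable as a float. A quick check, again using plain Java `float` as a stand-in for whatever Druid stores internally:

```java
public class ExactFloatCheck {
    public static void main(String[] args) {
        double r2Total = 729118.0 * 100;   // 72911800, the per-id R2 total
        float asFloat = (float) r2Total;   // round to single precision

        // 72911800 = 9113975 * 2^3, and 9113975 fits in float's 24-bit significand,
        // so the round trip through float loses nothing.
        System.out.println(asFloat);                      // 7.29118E7
        System.out.println((double) asFloat == r2Total);  // true
    }
}
```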

However, if I group at just the Region level, the results are:
```
┌────────┬───────────┐
│ Region │ TotAmt    │
├────────┼───────────┤
│ R1     │ 81005088  │
│ R2     │ 729118016 │
└────────┴───────────┘
```

So although TotAmt is defined as a doubleSum, the sum at the R2 level is not correct if we assume a double precision calculation, which should be accurate to about 15 significant digits (the exact total is 729118 × 1000 = 729,118,000).
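In fact, both region totals match what I get if I round the exact sums to the nearest single precision float (again just a plain-Java check based on my assumption about the storage):

```java
public class RegionSumCheck {
    public static void main(String[] args) {
        // Exact region totals computed from test.csv
        double r1Exact = 72911.87 + 729118.7 + 7291187.0 + 72911870.0; // ~81005087.57
        double r2Exact = 729118.0 * 1000;                              // 729118000

        // Nearest single-precision values to those exact totals
        System.out.println((double) (float) r1Exact); // 8.1005088E7   (query returns 81005088)
        System.out.println((double) (float) r2Exact); // 7.29118016E8  (query returns 729118016)
    }
}
```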

Can someone explain why this behavior is being seen?

Is it possible to force Druid to use double precision numbers in all cases?

Regards,
Manish.