I have a pretty simple use case with the following data:
the columns are id, time, sessionId, funnelLevel and segments
"1", "2018-07-17T00:00:00.001-06:00", "session-1", "1", "a"
"2", "2018-07-17T01:00:00.001-06:00", "session-1", "1", "a|b"
I am trying to group by segments and count distinct session ids per segment.
If I lay out the same data without multivalued segment dimension
"1", "2018-07-17T00:00:00.001-06:00", "session-1", "1", "a"
"2", "2018-07-17T01:00:00.001-06:00", "session-1", "1", "a"
"3", "2018-07-17T01:00:00.001-06:00", "session-1", "1", "b"
This query can be used:
{
“queryType” : “groupBy”,
“dataSource” : “s8”,
“intervals” : [ “2018-07-01/2018-08-01” ],
“granularity” : “all”,
“dimensions” : [ “segments” ],
“aggregations” : [ {
“type” : “cardinality”,
“name” : “distinct_count”,
“fields” : [ “segments”, “sessionId” ],
“byRow” : true
} ]
}
and it produces the expected result:
{
“version” : “v1”,
“timestamp” : “2018-07-01T00:00:00.000Z”,
“event” : {
“distinct_count” : 1.0002442201269182,
“segments” : “a”
}
}, {
“version” : “v1”,
“timestamp” : “2018-07-01T00:00:00.000Z”,
“event” : {
“distinct_count” : 1.0002442201269182,
“segments” : “b”
}
} ]
The same query with the multivalued data produces this output. Note that segment “a” has a count of 2 instead of th
[ {
“version” : “v1”,
“timestamp” : “2018-07-01T00:00:00.000Z”,
“event” : {
“distinct_count” : 2.000977198748901,
“segments” : “a”
}
}, {
“version” : “v1”,
“timestamp” : “2018-07-01T00:00:00.000Z”,
“event” : {
“distinct_count” : 1.0002442201269182,
“segments” : “b”
}
} ]