Hi

I am new to Druid. Here is what I am trying to solve:

Input events:

event 1 ->

data: {
  "key1": "value1",
  "key2": "value2",
  "pv": 1,
  "res": {
    "img": {
      "nm": "img1",
      "p1": 123,
      "p2": 456
    },
    "xhrs": [
      { "nm": "xhrs1", "s1": 345, "s2": 567 },
      { "nm": "xhrs2", "s1": 234, "s2": 673 }
    ]
  }
}

event 2 ->

data: {
  "key1": "value1",
  "key2": "value3",
  "pv": 1,
  "res": {
    "src": {
      "nm": "src1",
      "p1": 222,
      "p2": 333
    },
    "xhrs": [
      { "nm": "xhrs11", "s1": 444, "s2": 555 },
      { "nm": "xhrs2", "s1": 666, "s2": 777 },
      { "nm": "xhrs3", "s1": 888, "s2": 999 }
    ]
  }
}

Note: xhrs, nm, src, img, css, key1, and key2 are dimensions; pv, s1, s2, p1, and p2 are metrics.

What we want to query -

Get the nm values and metrics of the xhrs entries for events that match a filter criterion (say key1 = value1).

The result should include -

{ "nm": "name1", "s1": 345, "s2": 567 },
{ "nm": "name2", "s1": 234, "s2": 673 },
{ "nm": "name1", "s1": 444, "s2": 555 },
{ "nm": "name3", "s1": 666, "s2": 777 }
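To make the intent concrete, here is a minimal Python sketch of the query semantics applied to the raw events, outside Druid. It assumes the result should list every xhrs entry of every matching event with its own nm value. The helper name `xhrs_for` is hypothetical and is not a Druid API.

```python
from typing import Any, Dict, List

# Trimmed copies of the two sample events (only the fields used here).
events: List[Dict[str, Any]] = [
    {"key1": "value1", "key2": "value2", "pv": 1,
     "res": {"xhrs": [{"nm": "xhrs1", "s1": 345, "s2": 567},
                      {"nm": "xhrs2", "s1": 234, "s2": 673}]}},
    {"key1": "value1", "key2": "value3", "pv": 1,
     "res": {"xhrs": [{"nm": "xhrs11", "s1": 444, "s2": 555},
                      {"nm": "xhrs2", "s1": 666, "s2": 777},
                      {"nm": "xhrs3", "s1": 888, "s2": 999}]}},
]


def xhrs_for(events: List[Dict[str, Any]], key: str, value: Any) -> List[Dict[str, Any]]:
    """Hypothetical helper: xhrs entries of all events where event[key] == value."""
    out = []
    for event in events:
        if event.get(key) != value:
            continue  # event does not match the filter
        for entry in event.get("res", {}).get("xhrs", []):
            out.append({"nm": entry["nm"], "s1": entry["s1"], "s2": entry["s2"]})
    return out


rows = xhrs_for(events, "key1", "value1")
for row in rows:
    print(row)
```

The question below is how to index the data so that Druid can answer the same thing directly.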

How do I achieve this?

Below are the options that I could think of:

Option 1 - flatten everything and index one row per event, as below:

event1-> {"key1": "value1", "key2": "value2", "pv": 1, "res_img_nm": "img1", "res_img_p1": 123, "res_img_p2": 456, "res_xhrs_1_nm": "xhrs1", "res_xhrs_1_s1": 345, "res_xhrs_1_s2": 567, "res_xhrs_2_nm": "xhrs2", "res_xhrs_2_s1": 234, "res_xhrs_2_s2": 673 }

event2-> {"key1": "value1", "key2": "value3", "pv": 1, "res_src_nm": "src1", "res_src_p1": 222, "res_src_p2": 333, "res_xhrs_1_nm": "xhrs11", "res_xhrs_1_s1": 444, "res_xhrs_1_s2": 555, "res_xhrs_2_nm": "xhrs2", "res_xhrs_2_s1": 666, "res_xhrs_2_s2": 777, "res_xhrs_3_nm": "xhrs3", "res_xhrs_3_s1": 888, "res_xhrs_3_s2": 999 }

Pros:

No duplicate data is indexed.

Cons:

The query result has to be parsed to find the columns whose names contain xhrs and to pull out the corresponding nm and metric values. The number of columns also grows with the longest xhrs array.
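As a rough illustration of Option 1, the following Python sketch flattens an event before ingestion and then shows the client-side parsing that the con above refers to. It runs outside Druid. The function names `flatten` and `xhrs_from_flat` are hypothetical, and the column naming simply mirrors the example rows above.

```python
import re
from typing import Any, Dict, List

event1 = {
    "key1": "value1", "key2": "value2", "pv": 1,
    "res": {
        "img": {"nm": "img1", "p1": 123, "p2": 456},
        "xhrs": [{"nm": "xhrs1", "s1": 345, "s2": 567},
                 {"nm": "xhrs2", "s1": 234, "s2": 673}],
    },
}


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Join nested keys with '_'; list elements get a 1-based index."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}_"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj, start=1):
            flat.update(flatten(value, f"{prefix}{index}_"))
    else:
        flat[prefix[:-1]] = obj  # drop the trailing underscore
    return flat


# Matches columns such as res_xhrs_2_s1 and captures (index, field).
XHRS_COLUMN = re.compile(r"^res_xhrs_(\d+)_(nm|s1|s2)$")


def xhrs_from_flat(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Regroup the res_xhrs_<i>_<field> columns of one row into entries."""
    grouped: Dict[int, Dict[str, Any]] = {}
    for column, value in row.items():
        match = XHRS_COLUMN.match(column)
        if match:
            grouped.setdefault(int(match.group(1)), {})[match.group(2)] = value
    return [grouped[i] for i in sorted(grouped)]


flat_row = flatten(event1)
entries = xhrs_from_flat(flat_row)
print(flat_row)
print(entries)
```

Every query result would need this kind of regrouping step, and the query itself would have to name each res_xhrs_<i>_* column explicitly.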

Option 2 - index multiple rows for every event, as below:

event 1 ->

{"key1": "value1", "key2": "value2", "pv": 1, "res_img_nm": "img1", "res_img_p1": 123, "res_img_p2": 456 }

{"key1": "value1", "key2": "value2", "res_xhrs_nm": "xhrs1", "res_xhrs_s1": 345, "res_xhrs_s2": 567 }

{"key1": "value1", "key2": "value2", "res_xhrs_nm": "xhrs2", "res_xhrs_s1": 234, "res_xhrs_s2": 673 }

event 2 ->

{"key1": "value1", "key2": "value3", "pv": 1, "res_src_nm": "src1", "res_src_p1": 222, "res_src_p2": 333 }

{"key1": "value1", "key2": "value3", "res_xhrs_nm": "xhrs11", "res_xhrs_s1": 444, "res_xhrs_s2": 555 }

{"key1": "value1", "key2": "value3", "res_xhrs_nm": "xhrs2", "res_xhrs_s1": 666, "res_xhrs_s2": 777 }

{"key1": "value1", "key2": "value3", "res_xhrs_nm": "xhrs3", "res_xhrs_s1": 888, "res_xhrs_s2": 999 }

Pros:

Straightforward to get the list of xhrs names and metrics for any filter criterion.

Cons:

Duplicate data (key1, key2) is indexed multiple times, once per row.
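As a rough illustration of Option 2, the following Python sketch splits one event into several rows before ingestion, following the row layout shown above. It runs outside Druid. The function name `explode` and the constant `SHARED_KEYS` are hypothetical.

```python
from typing import Any, Dict, List

SHARED_KEYS = ("key1", "key2")  # dimensions copied onto every row

event2 = {
    "key1": "value1", "key2": "value3", "pv": 1,
    "res": {
        "src": {"nm": "src1", "p1": 222, "p2": 333},
        "xhrs": [{"nm": "xhrs11", "s1": 444, "s2": 555},
                 {"nm": "xhrs2", "s1": 666, "s2": 777},
                 {"nm": "xhrs3", "s1": 888, "s2": 999}],
    },
}


def explode(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one event-level row plus one row per array element under res."""
    shared = {key: event[key] for key in SHARED_KEYS}
    res = event.get("res", {})

    # First row: pv plus the single-object resources (img, src, ...).
    first = dict(shared, pv=event["pv"])
    for kind, value in res.items():
        if isinstance(value, dict):
            for field, field_value in value.items():
                first[f"res_{kind}_{field}"] = field_value
    rows = [first]

    # One further row per element of each array resource (xhrs).
    for kind, value in res.items():
        if isinstance(value, list):
            for entry in value:
                row = dict(shared)
                for field, field_value in entry.items():
                    row[f"res_{kind}_{field}"] = field_value
                rows.append(row)
    return rows


rows = explode(event2)
# With this layout the xhrs lookup is a plain filter on fixed column names.
xhrs_rows = [r for r in rows if r["key1"] == "value1" and "res_xhrs_nm" in r]
for row in xhrs_rows:
    print(row["res_xhrs_nm"], row["res_xhrs_s1"], row["res_xhrs_s2"])
```

Because pv appears only on the first row of each event, summing pv still counts each event once.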

Can you tell me if there is a better way of achieving what I want?

If not, which of the two options above would you suggest, and why?

Thanks

Shobana