Calculating Conversion Percentages

Hello All,

I have tried a bunch of things with aggregations and post-aggregations. I presume someone must have dealt with this before… I am trying to calculate a simple conversion rate (Clicks / Impressions). Here is my query.

{
  "dimensions": ["metric"],
  "filter": {
    "type": "or",
    "fields": [
      { "type": "selector", "dimension": "metric", "value": "Provider Click" },
      { "type": "selector", "dimension": "metric", "value": "Impression" }
    ]
  },
  "aggregations": [
    { "type": "count", "name": "count" }
  ],
  "intervals": "2016-09-20T20:08:21+00:00/2016-09-21T20:08:21+00:00",
  "limitSpec": {
    "limit": 5000,
    "type": "default",
    "columns": [
      { "direction": "descending", "dimension": "metric" }
    ]
  },
  "granularity": "all",
  "postAggregations": [],
  "queryType": "groupBy",
  "dataSource": "source"
}

This returns

[
  {
    "version": "v1",
    "timestamp": "2016-09-20T20:08:21.000Z",
    "event": { "count": 847, "metric": "Provider Click" }
  },
  {
    "version": "v1",
    "timestamp": "2016-09-20T20:08:21.000Z",
    "event": { "count": 19048, "metric": "Impression" }
  }
]

Ideally, I would want a response such as:

[
  {
    "version": "v1",
    "timestamp": "2016-09-20T20:08:21.000Z",
    "event": { "count": 847, "metric": "Provider Click" }
  },
  {
    "version": "v1",
    "timestamp": "2016-09-20T20:08:21.000Z",
    "event": { "count": 19048, "metric": "Impression" }
  },
  {
    "version": "v1",
    "timestamp": "2016-09-20T20:08:21.000Z",
    "event": { "count": 0.04, "metric": "Conversion" }
  }
]

Is this possible in a Druid query? If not, I would have to do a pre-ingestion workaround. Thank you for the help.

Please try this query; hopefully it gives the desired result:

{
  "dimensions": ["metric"],
  "aggregations": [
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "metric", "value": "Provider Click" },
      "aggregator": { "type": "count", "name": "Provider Click Count" }
    },
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "metric", "value": "Impression" },
      "aggregator": { "type": "count", "name": "Impression Count" }
    }
  ],
  "intervals": "2016-09-20T20:08:21+00:00/2016-09-21T20:08:21+00:00",
  "limitSpec": {
    "limit": 5000,
    "type": "default",
    "columns": [{ "direction": "descending", "dimension": "metric" }]
  },
  "granularity": "all",
  "postAggregations": [{
    "type": "arithmetic",
    "name": "div",
    "fn": "/",
    "fields": [
      { "type": "fieldAccess", "name": "Provider Click Count", "fieldName": "Provider Click Count" },
      { "type": "fieldAccess", "name": "Impression Count", "fieldName": "Impression Count" }
    ]
  }],
  "queryType": "timeseries",
  "dataSource": "source"
}

PS: I converted the groupBy query to a timeseries; if the results are not what you expect, change it back to groupBy.
The main takeaway here is the use of a "filtered aggregator" and a "post aggregator".
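As a quick sanity check against the counts from the original result, the arithmetic post-aggregator would compute:

```latex
\frac{\text{Provider Click Count}}{\text{Impression Count}} = \frac{847}{19048} \approx 0.044
```

which matches the ~0.04 "Conversion" value in the desired output.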

Rohit,

Thank you for the response. It looks like the groupBy does not work, but the timeseries does. We are using Airbnb Caravel, and Caravel tends to do everything with groupBy.

Any ideas?

Thanks,

Hey Pritesh,

You should be able to translate the strategy Rohit used (filtered aggregators and post aggregators) to a groupBy query too; groupBy supports all of those features.
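For reference, here is an untested sketch of what that groupBy translation could look like, reusing the datasource, interval, and aggregator names from the queries above. One caveat worth noting: if you keep "metric" in the dimensions list, each filtered count lands in a separate group (the "Provider Click" row would have an Impression Count of 0), which may be why the groupBy variant appeared not to work. With an empty dimensions list, both counts land in the same row and the division behaves:

```json
{
  "queryType": "groupBy",
  "dataSource": "source",
  "granularity": "all",
  "intervals": "2016-09-20T20:08:21+00:00/2016-09-21T20:08:21+00:00",
  "dimensions": [],
  "aggregations": [
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "metric", "value": "Provider Click" },
      "aggregator": { "type": "count", "name": "Provider Click Count" }
    },
    {
      "type": "filtered",
      "filter": { "type": "selector", "dimension": "metric", "value": "Impression" },
      "aggregator": { "type": "count", "name": "Impression Count" }
    }
  ],
  "postAggregations": [{
    "type": "arithmetic",
    "name": "Conversion",
    "fn": "/",
    "fields": [
      { "type": "fieldAccess", "name": "Provider Click Count", "fieldName": "Provider Click Count" },
      { "type": "fieldAccess", "name": "Impression Count", "fieldName": "Impression Count" }
    ]
  }]
}
```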

We are doing similar calculations in Caravel on top of Druid. We define the metrics on the datasource, so you would define a filtered metric for Provider Click and another for Impression, like so:

{
  "type": "filtered",
  "filter": { "type": "selector", "dimension": "metric", "value": "Provider Click" },
  "aggregator": { "type": "count", "name": "count_provider_clicks" }
}

{
  "type": "filtered",
  "filter": { "type": "selector", "dimension": "metric", "value": "Impression" },
  "aggregator": { "type": "count", "name": "count_impressions" }
}

Then you would define an additional metric for the conversion rate as a "postagg" metric:

{
  "type": "arithmetic",
  "name": "conversion",
  "fn": "/",
  "fields": [
    { "type": "fieldAccess", "name": "count_provider_clicks", "fieldName": "count_provider_clicks" },
    { "type": "fieldAccess", "name": "count_impressions", "fieldName": "count_impressions" }
  ],
  "ordering": null
}

Thanks, that did the trick.

I am unable to create aggregation and post-aggregation fields. Please give me some clue how to ingest data with these aggregated and post-aggregated values:

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "LargeDataCSV_Testing_3",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "tsv",
          "timestampSpec": { "format": "auto", "column": "visit_date" },
          "columns": [
            "id", "calc_answer", "answer", "area_name", "areas_code", "audit_type_code",
            "banner_code", "banner_name", "cat_code", "channel_code", "channel_name",
            "city_code", "city_name", "cust_subtype_code", "cust_subtype_name",
            "cust_type_code", "cust_type_name", "customer_code", "customer_name",
            "journey_date", "journey_plancode", "level", "menu_code", "menu_name",
            "prd_code", "prd_id", "prd_name", "province_code", "province_name",
            "question_code", "region_code", "region_name", "country_code", "country_name",
            "route_code", "route_name", "sales_region_code", "sales_region_name",
            "store_code", "store_name", "store_location_code", "store_location_name",
            "store_area_code", "store_area_name", "subchannel_code", "subchannel_name",
            "territory_code", "territory_name", "visit_date", "yearweek", "month", "year",
            "calendar_week", "calendar_month", "calendar_monthname", "calendar_year",
            "achivement", "totaluli", "totalcategory", "osaflag", "kpi", "source_id",
            "inserted_at", "division_id", "division_code", "division_name",
            "subdivision_id", "subdivision_code", "subdivision_name", "category_id",
            "category_code", "category_name", "market_id", "market_code", "market_name",
            "brand_id", "brand_code", "brand_name", "sector_id", "sector_code",
            "sector_name", "product_group_id", "product_group_code", "product_group_name",
            "sub_brand_id", "sub_brand_code", "sub_brand_name", "base_pack_id",
            "base_pack_code", "base_pack_name", "product_id", "product_code",
            "product_name", "prd_source_id", "prd_inserted_at"
          ],
          "delimiter": ";",
          "dimensionsSpec": {
            "dimensions": [
              "id", "calc_answer", "answer", "area_name", "areas_code", "audit_type_code",
              "banner_code", "banner_name", "cat_code", "channel_code", "channel_name",
              "city_code", "city_name", "cust_subtype_code", "cust_subtype_name",
              "cust_type_code", "cust_type_name", "customer_code", "customer_name",
              "journey_date", "journey_plancode", "level", "menu_code", "menu_name",
              "prd_code", "prd_id", "prd_name", "province_code", "province_name",
              "question_code", "region_code", "region_name", "country_code", "country_name",
              "route_code", "route_name", "sales_region_code", "sales_region_name",
              "store_code", "store_name", "store_location_code", "store_location_name",
              "store_area_code", "store_area_name", "subchannel_code", "subchannel_name",
              "territory_code", "territory_name", "visit_date", "yearweek", "month", "year",
              "calendar_week", "calendar_month", "calendar_monthname", "calendar_year",
              "achivement", "totaluli", "totalcategory", "osaflag", "kpi", "source_id",
              "inserted_at", "division_id", "division_code", "division_name",
              "subdivision_id", "subdivision_code", "subdivision_name", "category_id",
              "category_code", "category_name", "market_id", "market_code", "market_name",
              "brand_id", "brand_code", "brand_name", "sector_id", "sector_code",
              "sector_name", "product_group_id", "product_group_code", "product_group_name",
              "sub_brand_id", "sub_brand_code", "sub_brand_name", "base_pack_id",
              "base_pack_code", "base_pack_name", "product_id", "product_code",
              "product_name", "prd_source_id", "prd_inserted_at"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        { "name": "count", "type": "count" },
        { "name": "calc_answer_sum", "type": "longSum", "fieldName": "calc_answer" },
        { "name": "calc_answer_count", "type": "count", "fieldName": "calc_answer" }
      ],
      "aggregations": [
        { "type": "longSum", "name": "calc_answer_sum_agg", "fieldName": "calc_answer_sum" },
        { "type": "count", "name": "calc_answer_count_agg", "fieldName": "calc_answer_count" }
      ],
      "postAggregations": [
        {
          "type": "arithmetic",
          "name": "score",
          "fn": "/",
          "fields": [
            { "type": "fieldAccess", "name": "calc_answer_sum_agg", "fieldName": "calc_answer_sum_agg" },
            { "type": "fieldAccess", "name": "calc_answer_count_agg", "fieldName": "calc_answer_count_agg" }
          ]
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2017-01-01/2017-07-25"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": { "type": "static", "paths": "hdfs://localhost:9000/druid/part.csv" }
    },
    "tuningConfig": { "type": "hadoop" }
  }
}