Customer cohort analysis

Hi,

We are trying to do customer cohort analysis.

The objective is to find first transaction Qtr for each customer, and then analyzing how much of those customers over a period of time.

In the below example, we are first computing Year-Qtr, and then joining on customer. I know that Druid does not support joins, but the below SQL is only an example to explain our scenario.

select “__time” , first_trans_qtr, sum(row_count), sum(NetValue_USD)

from retail_index_test_4,

(select DW_EK_CRMConsumer, concat(TIME_EXTRACT(min(__time),‘year’), ‘-Q’ ,TIME_EXTRACT(min(__time),‘quarter’)) first_trans_qtr

from retail_index_test_4 group by DW_EK_CRMConsumer) tmp_first_cust_qtr

where retail_index_test_4.DW_EK_CRMConsumer = tmp_first_cust_qtr.DW_EK_CRMConsumer

group by “__time” asc , first_trans_qtr asc

order by “__time” asc , first_trans_qtr asc

I was going through the DS documentation (https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html). But the example in the document computes the DS of unique customers. It does not help in “tagging” a customer based on some condition (ex: first transaction date).

Can somebody help if this is possible?

Regards, Chari.

Hi,

An update. I managed to get the required functionality using LOOKUP.

I was apprehensive about performance, but surprisingly the latency is negligible with LOOKUP.

Kudos to Druid Architecture and Engineering team.

Regards, Chari.