How to know the count of ingested rows in Druid

I am ingesting my data into Druid using a realtime node. The total number of records ingested is 40 million, but when I query I see only 13 million records. I can see in the logs that no events are being rejected, yet the count is still less than the amount of data I ingested. Going through the Druid docs, I found:

"To count the number of ingested rows of data, include a count aggregator at ingestion time, and a longSum aggregator at query time."

What is the meaning of this line?

I am using the following metricsSpec while ingesting data:

"metricsSpec": [{
  "type": "count",
  "name": "COUNT"
}]

And querying using the lines below:

{
  "queryType": "groupBy",
  "dataSource": "abcd",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    {"type": "count", "name": "count"},
    {"type": "count", "name": "UNIQUE_CUSTOMERS", "fieldName": "CUSTOMER_ID"}
  ],
  "intervals": [""]
}


It gives me the 13M count (I am sure it is returning Druid row counts), but if I change the query (as suggested in the docs quoted above) from {"type": "count", "name": "count"} to {"type": "longSum", "name": "count"}, it gives a syntax error.

I even tried querying it using a segment metadata query, but this also gives me the same 13M count.

Can anyone suggest any way to know how many original rows were ingested by Druid?

At query time, instead of {"type": "count", "name": "count"} you want {"type": "longSum", "name": "count", "fieldName": "count"}. The idea is that at indexing time you're doing a count, but at query time you're summing an already-computed count.
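Put together, the two halves might look like this (a sketch; the metric name "count" matches the fieldName above, and "ingestedRows" is just an illustrative output name):

At ingestion time:

"metricsSpec": [
  {"type": "count", "name": "count"}
]

At query time:

"aggregations": [
  {"type": "longSum", "name": "ingestedRows", "fieldName": "count"}
]

A query-time "count" aggregator counts the rolled-up Druid rows, while the longSum over the ingestion-time count adds up how many original input rows were merged into each of them.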

Hey everyone,

I'm also facing the same issue. The total rows ingested are 21,731,674 but the longSum count is 16,570,932. Can I recover the rows that were collapsed, as this will affect the score I want to calculate? Is there any workaround to prevent rows from getting collapsed?

Hi Parveen,

Separate from the total row count questions, I noticed your query has this aggregator:

{"type": "count", "name": "UNIQUE_CUSTOMERS", "fieldName": "CUSTOMER_ID"}

The "count" aggregator doesn't provide the cardinality of a dimension and doesn't take a "fieldName" parameter; it only counts the number of rows returned by a query.

To get a cardinality estimate for a column, you'd want to use a cardinality, hyperUnique, or DataSketches aggregator, e.g.:
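One option is the query-time cardinality aggregator (a sketch, assuming CUSTOMER_ID is ingested as a dimension):

"aggregations": [
  {
    "type": "cardinality",
    "name": "UNIQUE_CUSTOMERS",
    "fields": ["CUSTOMER_ID"]
  }
]

Note that this gives an approximate distinct count; if you run it often, it is usually cheaper to ingest the column as a hyperUnique or DataSketches metric at indexing time and query that instead.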

Hi Akul,

I am not sure what you mean by “recover the data loss/collapsed” but if you are asking how to disable Druid’s rollup summarization feature, you can do that by setting rollup: false in your granularitySpec.
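For illustration, a granularitySpec with rollup disabled might look like this (the granularity values are just examples):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "DAY",
  "queryGranularity": "NONE",
  "rollup": false
}

With rollup off, each input row is stored as its own Druid row, so a plain query-time count matches the number of ingested rows, at the cost of larger segments.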