Re: COUNT(*) versus SUM("count"): note that the metric, "count", is created at ingestion time. I presume that you have the roll-up feature turned on. Roll-up does a GROUP BY on incoming rows to generate the metrics. Therefore, just like a GROUP BY, if 2 incoming rows have the same timestamp (after truncation to the query granularity) and the same dimension values, only 1 row will be stored, with a "count" of 2. COUNT(*) tells you the number of rows actually stored in the table, whereas SUM("count") gives you the total number of incoming rows.
E.g. dividing SUM("count") by COUNT(*) in a single query is a useful way to see what roll-up ratio you are getting.
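To make the ratio concrete, here is a minimal sketch that simulates roll-up with an in-memory SQLite table rather than a real Druid cluster. The table names (raw_events, rolled_up), the columns (ts_hour, page) and the sample rows are all made up for illustration. The SELECT in RATIO_SQL is the part you would adapt and run against your own datasource, assuming your count metric is named "count".

```python
import sqlite3

# Hypothetical raw input: 5 incoming rows, some with identical
# (truncated timestamp, dimension) values.
raw_rows = [
    ("2024-01-01T00", "home"),
    ("2024-01-01T00", "home"),
    ("2024-01-01T00", "home"),
    ("2024-01-01T00", "about"),
    ("2024-01-01T01", "home"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (ts_hour TEXT, page TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?)", raw_rows)

# Simulate roll-up: one stored row per distinct (ts_hour, page),
# with a "count" metric recording how many raw rows were merged.
conn.execute(
    'CREATE TABLE rolled_up AS '
    'SELECT ts_hour, page, COUNT(*) AS "count" '
    'FROM raw_events GROUP BY ts_hour, page'
)

# Roll-up ratio = incoming rows / stored rows.
RATIO_SQL = '''
SELECT COUNT(*)                      AS stored_rows,
       SUM("count")                  AS ingested_rows,
       SUM("count") * 1.0 / COUNT(*) AS rollup_ratio
FROM rolled_up
'''
stored_rows, ingested_rows, rollup_ratio = conn.execute(RATIO_SQL).fetchone()
print(stored_rows, ingested_rows, round(rollup_ratio, 2))
```

A ratio of 1.0 would mean roll-up is not merging any rows; the higher the ratio, the more rows are being merged at ingestion.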
If you disable roll-up, every incoming row is stored as its own row in the table, at the cost of a performance hit (more rows to store and scan).
As for the 10m rows in your source data versus what SUM("count") is telling you, there are various approaches to understanding why the row counts differ between source and ingestion. My first step would probably be to isolate time periods, e.g. compare one day in one system against the same day in the other, to see if I can pinpoint the point in time where there is a discrepancy. I can then move on to the ingestion task logs to see if there were failures on particular files etc.
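The day-by-day comparison could be sketched roughly as below. This assumes you have already exported per-day row counts from both systems into dictionaries. The function name find_discrepant_days and the sample numbers are hypothetical, and the query in the comment is only one possible way to get the Druid side.

```python
def find_discrepant_days(source_counts, druid_counts):
    """Return {day: (source_rows, druid_rows)} for days whose counts differ.

    A day missing from one side is treated as having 0 rows there.
    """
    days = sorted(set(source_counts) | set(druid_counts))
    return {
        day: (source_counts.get(day, 0), druid_counts.get(day, 0))
        for day in days
        if source_counts.get(day, 0) != druid_counts.get(day, 0)
    }


# Hypothetical per-day counts from the source system.
source_counts = {"2024-01-01": 1000, "2024-01-02": 1200, "2024-01-03": 900}

# Hypothetical per-day SUM("count") from Druid, e.g. from a query such as:
#   SELECT TIME_FLOOR(__time, 'P1D') AS "day", SUM("count")
#   FROM "your_datasource" GROUP BY 1
# ("your_datasource" is a placeholder name.)
druid_counts = {"2024-01-01": 1000, "2024-01-02": 950, "2024-01-03": 900}

mismatches = find_discrepant_days(source_counts, druid_counts)
for day, (src, dru) in mismatches.items():
    print(f"{day}: source={src} druid={dru} missing={src - dru}")
```

Once a day stands out, you can narrow it down further by hour and then check the ingestion task logs for that interval.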