I cross verified with my source and it has only single record.
Not sure what is causing the duplicates. I tried deleting segments from S3 and from druid console and did a fresh ingestion but then also facing the same issue.
Please let me know what is going wrong.
Also note that i am NOT setting appendToExisting
REPLACE INTO fact_extrapolated_revenue OVERWRITE ALL WITH fact_extrapolated_revenue_table AS
(SELECT * FROM TABLE( EXTERN( '{\"type\":\"s3\",\"prefixes\":[\"s3://data-dev/druid_poc_1/fact_extrapolated_revenue\"]}', '{\"type\":\"csv\",\"findColumnsFromHeader\":true}', '[{\"name\":\"product_id\",\"type\":\"string\"},
{\"name\":\"facilitytype_id\",\"type\":\"long\"},
{\"name\":\"month_end_datekey\",\"type\":\"string\"}
,{\"name\":\"dist_total_units\",\"type\":\"long\"},
{\"name\":\"dist_total_revenue\",\"type\":\"long\"},
{\"name\":\"modified\",\"type\":\"string\"}]' ) ))
SELECT coalesce(TIME_PARSE(fez.month_end_datekey,'yyyyMMdd'),current_timestamp) AS __time, EXTRACT(MONTH from coalesce(TIME_PARSE(fez.month_end_datekey,'yyyyMMdd'),current_timestamp)) AS sale_month, EXTRACT(YEAR from coalesce(TIME_PARSE(fez.month_end_datekey,'yyyyMMdd'),current_timestamp)) AS sale_year, EXTRACT(QUARTER from coalesce(TIME_PARSE(fez.month_end_datekey,'yyyyMMdd'),current_timestamp)) AS sale_quarter, fez.product_id, fez.facilitytype_id, fez.dist_total_units, fez.dist_total_revenue FROM fact_extrapolated_revenue_table fez PARTITIONED BY YEAR
can you rename fact_extrapolated_revenue to fact_extrapolated_revenue_csv and then use that in your ingestion query? Note fact_extrapolated_revenue_parquet will pick up pnly that folder but fact_extrapolated_revenue will pick up all four