Hi guys

I’m trying to run this query:
select avg(skucount) from (select count(distinct unique_skucode) skucount,doc_no_key from sales_sketch where “__time” BETWEEN timestamp ‘2018-01-01 00:00:00’ and timestamp ‘2018-12-31 00:00:00’ group by doc_no_key);

and getting error: org.apache.druid.java.util.common.IAE: Incompatible type for metric[unique_skucode], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder (datanode1:8083)

I think it is due to schema

please have a look at my schema, is it well define or not?

** “type” : “index_hadoop”,**

** “spec” : {**

** “dataSchema” : {**

** “dataSource” : “sales_sketch”,**

** “parser”: {**

** “type”: “hadoopyString”,**

** “parseSpec”: {**

** “format”: “tsv”,**

** “columns”:[“distcode_locationid_key”,“doc_no”,“skucode_key”,“cartons”,“units”,**

** “net_amount”,“gross_amount”,“foreign_amount”,“order_type”,“doc_date”,“saledate” ,“ex_factory_value”,**

** “shopcode_without_dist”,“lfl_ly”,“lfl_ty”,“distributorcode_dsr_code”,**

** “doc_no_key”,“dist_section_key”,“distributorcode_dsrcode_pjp_sell_category”,“shopcode_without_dist_saledate”],**

** “delimiter”: “;”,**

** “timestampSpec”: {**

** “column”: “saledate”,**

** “format”: “auto”**

** },**

** “dimensionsSpec”: {**

** “dimensions”:[“distcode_locationid_key”,“doc_no”,“skucode_key”,“order_type”,“doc_date”,**

** “shopcode_without_dist”,“lfl_ly”,“lfl_ty”,“distributorcode_dsr_code”,**

** “doc_no_key”,“dist_section_key”,“distributorcode_dsrcode_pjp_sell_category”,**

** “shopcode_without_dist_saledate”]**

** }**

** }**

** },**

** “metricsSpec” : [**

** {“type”: “count”,“name”: “count”},**

** {“type”:“thetaSketch”, “name”:“unique_skucode”, “fieldName”:“skucode_key”},**

** {“type”:“thetaSketch”, “name”:“unique_distcode”, “fieldName”:“distcode_locationid_key”},**

** {“type”:“thetaSketch”, “name”:“unique_doc_no”, “fieldName”:“doc_no_key”},**

** {“type”:“thetaSketch”, “name”:“unique_dist_sec”, “fieldName”:“dist_section_key”},**

** {“type”: “doubleSum”, “name”: “totalunits”, “fieldName”: “units”},**

** {“type”: “doubleSum”, “name”: “totalexfactoryvalue”, “fieldName”: “ex_factory_value”},**

** {“type”: “doubleSum”, “name”: “totalcartons”, “fieldName”: “cartons”},**

** {“type”: “doubleSum”, “name”: “totalnetamount”, “fieldName”: “net_amount”},**

** {“type”: “doubleSum”, “name”: “totalgrossamount”, “fieldName”: “gross_amount”},**

** {“type”: “doubleSum”, “name”: “totalforeingamount”, “fieldName”: “foreign_amount”}**

** ],**

** “granularitySpec” : {**

** “type” : “Uniform”,**

** “segmentGranularity” : “Day”,**

** “queryGranularity” : “None”,**

** “rollup” : “false”**

** }**

** },**

** “ioConfig” : {**

** “type” : “hadoop”,**

** “inputSpec” : {**

** “type” : “static”,**

** “paths” : “/user/FactSecondarySales_withoutheader.txt”**

** }**

** },**

** “tuningConfig” : {**

** “type” : “hadoop”,**

** “partitionsSpec” : {**

** “type” : “hashed”,**

** “targetPartitionSize” : 500000000**

** },**

** “forceExtendableShardSpecs” : true,**

** “jobProperties” : {**

** “fs.default.name” : “hdfs://namenode:9000”,**

** “fs.defaultFS” : “hdfs://namenode:9000”,**

** “dfs.datanode.address” : “namenode”,**

** “dfs.client.use.datanode.hostname” : “true”,**

** “dfs.datanode.use.datanode.hostname” : “true”,**

** “yarn.resourcemanager.hostname” : “namenode”,**

** “yarn.nodemanager.vmem-check-enabled” : “false”,**

** “mapreduce.map.java.opts” : “-Duser.timezone=UTC -Dfile.encoding=UTF-8”,**

** “mapreduce.job.user.classpath.first” : “true”,**

** “mapreduce.reduce.java.opts” : “-Duser.timezone=UTC -Dfile.encoding=UTF-8”,**

** “mapreduce.map.memory.mb” : 4096,**

** “mapreduce.reduce.memory.mb” : 4096,**

** “mapreduce.job.classloader”: “true”**

** }**

** }**

** },**

** “hadoopDependencyCoordinates”: [“org.apache.hadoop:hadoop-client:2.8.3”]**

}

Since unique_skucode is a theta sketch column, you’ll want to use APPROX_COUNT_DISTINCT_DS_THETA function instead (only available in 0.14.0-incubating and later), documented here: https://druid.apache.org/docs/latest/querying/sql#aggregation-functions

COUNT(DISTINCT expr) only works on numeric/string columns and HyperUnique columns (a different kind of aggregator than thetasketch).

Thanks

Please consider a more informative subject line next time. Thank you.