Question on Druid data export using DumpSegment tool

Hi,

I am trying to export data from Druid using the DumpSegment tool. I referred to the documentation at https://druid.apache.org/docs/0.14.2-incubating/operations/dump-segment.html. Following the steps there, I am able to use the command-line tool, passing the location of the unzipped index.zip as input, and write the output to a file.
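The unzip step can be done with any archive tool. As a minimal sketch using Python's standard `zipfile` module (the helper name `unzip_segment` and the paths are hypothetical, and the exact files inside `index.zip` vary by Druid version):

```python
import os
import zipfile


def unzip_segment(zip_path: str, dest_dir: str) -> list:
    """Extract a segment's index.zip into dest_dir (hypothetical helper).

    Returns the sorted list of extracted entry names so you can check
    what the segment directory contains before running dump-segment.
    """
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())
```

For example, `unzip_segment(os.path.expanduser("~/Downloads/index.zip"), os.path.expanduser("~/index"))` would produce the directory that is then passed to `--directory`.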

The data is written to the file in JSON format with all fields. Dimension values are present, and so are the values of aggregators like `count`/`longSum`, but for approximate aggregators like `hyperUnique` I see some string values in the output file. Below are the details:

Ingestion template:

```json
{
  "type": "index",
  "spec": {
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "static-s3",
        "uris": [
          "s3://test-bucket/druid-json/data.json.gz"
        ],
        "prefixes": [],
        "fetchTimeout": 180000
      }
    },
    "dataSchema": {
      "dataSource": "json_data_ingestion_export",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "DAY",
        "intervals": [
          "2019-09-01T00:00:00.000Z/2019-09-02T00:00:00.000Z"
        ]
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "format": "yyyy-MM-dd HH:mm:ss",
            "column": "date"
          },
          "dimensionsSpec": {
            "dimensions": [
              "platform",
              "manufacturer",
              "browser"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "name": "platform_count",
          "type": "longSum",
          "fieldName": "platform"
        },
        {
          "name": "time_spent_long_sum",
          "type": "longSum",
          "fieldName": "time_spent"
        },
        {
          "name": "manufacturer_hyperunique",
          "type": "hyperUnique",
          "fieldName": "manufacturer"
        }
      ]
    }
  }
}
```

Command used to export data:

```shell
java -classpath "/Users/patilv/Downloads/apache-druid-0.14.0-incubating/lib/*" org.apache.druid.cli.Main tools dump-segment --directory ~/index/ --out ~/Desktop/druid-data-export/data/json/output/output.json --dump rows --time-iso8601
```
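One way to look at those string values is to post-process the output file. The sketch below assumes the `--dump rows` output has one JSON object per line, with column names taken from the spec above, and it *assumes* the `hyperUnique` values are base64-encoded serialized sketches; check both against your own file. The helper names `column_types` and `decode_base64_column` are hypothetical.

```python
import base64
import json
from typing import Dict, Iterable, List, Set


def column_types(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each column name to the set of JSON value types seen in a JSON-lines dump."""
    types: Dict[str, Set[str]] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        for name, value in json.loads(line).items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


def decode_base64_column(lines: Iterable[str], column: str) -> List[bytes]:
    """Decode one string-valued column per row, assuming (unverified) base64 encoding.

    Raises binascii.Error if a value is not valid base64, which would mean
    the assumption does not hold for your data.
    """
    decoded: List[bytes] = []
    for line in lines:
        if not line.strip():
            continue
        value = json.loads(line)[column]
        # validate=True rejects characters outside the base64 alphabet
        decoded.append(base64.b64decode(value, validate=True))
    return decoded


if __name__ == "__main__":
    # Made-up row shaped like the spec above; the sketch value is a placeholder, not a real HLL.
    sample = [
        '{"__time": "2019-09-01T00:00:00.000Z", "platform": "ios", '
        '"time_spent_long_sum": 42, "manufacturer_hyperunique": "AQAAAQAAAAE="}',
    ]
    print(column_types(sample))
    print([len(b) for b in decode_base64_column(sample, "manufacturer_hyperunique")])
```

With a real file, open it separately for each call (for example `column_types(open(path))`), since a file object can only be iterated once.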

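I don't believe the DumpSegment tool fully supports DataSketches today. The tool is meant as a general troubleshooting aid, not as a way to export exact segment contents to a file; its documentation notes that the dump is not necessarily a full-fidelity translation of the segment and that complex metric values may not be complete. If you do work with sketch columns, make sure you include the DataSketches extension on the command line when you run DumpSegment.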
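Including an extension means adding it to `druid.extensions.loadList` (a JSON array) as a JVM system property before the main class. A minimal sketch that builds the command as an argument list, assuming the extension is named `druid-datasketches` and lives in the default `extensions` directory of the Druid install; the helper `dump_segment_command` is hypothetical:

```python
import json
import os
import shlex
from typing import List, Optional


def dump_segment_command(
    druid_home: str,
    segment_dir: str,
    out_file: str,
    extensions: Optional[List[str]] = None,
) -> List[str]:
    """Build an argv list for Druid's dump-segment tool (hypothetical helper).

    `extensions` is rendered as the JSON array expected by
    druid.extensions.loadList; assumes extensions live in <druid_home>/extensions.
    """
    load_list = json.dumps(extensions or [])
    return [
        "java",
        # The JVM expands the "lib/*" classpath wildcard itself; no shell globbing needed.
        "-classpath", os.path.join(druid_home, "lib", "*"),
        f"-Ddruid.extensions.directory={os.path.join(druid_home, 'extensions')}",
        f"-Ddruid.extensions.loadList={load_list}",
        "org.apache.druid.cli.Main",
        "tools", "dump-segment",
        "--directory", segment_dir,
        "--out", out_file,
        "--dump", "rows",
        "--time-iso8601",
    ]


if __name__ == "__main__":
    argv = dump_segment_command(
        druid_home=os.path.expanduser("~/Downloads/apache-druid-0.14.0-incubating"),
        # expanduser resolves "~" here; a shell does not expand "~" inside quotes
        segment_dir=os.path.expanduser("~/index/"),
        out_file=os.path.expanduser("~/Desktop/druid-data-export/data/json/output/output.json"),
        extensions=["druid-datasketches"],
    )
    # Print a copy-pasteable shell form; subprocess.run(argv, check=True) would run it.
    print(shlex.join(argv))
```

Building an argument list rather than a shell string avoids quoting problems with the `lib/*` wildcard and the JSON brackets in the `loadList` value.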