HTTP(s) Ingestion troubleshooting help

Hi Friends,

I'm new here, evaluating Druid. Amazing project!

I'm running a small local cluster using the 'docker-compose.yaml' found here:

https://github.com/apache/incubator-druid/tree/master/distribution/docker

I added the 'kafka' extension to the environment and am able to ingest, parse, and query from a Kafka topic with no issues. I therefore believe the cluster is configured properly: all 7 components are running, there are no exceptions in the logs, memory is properly tuned, and there are no exit-code-137 errors.

The next part of my evaluation is end-to-end ingestion of JSON data from an HTTP endpoint, then querying the resulting datasource table.

This is where my problem lies. I have set the root logging level to DEBUG and still cannot find any errors or problems. The Query tab does not show a table for the datasource, and the log pane in the console's 'Task' view only displays 'Request failed with status code 404', despite the task status showing 'SUCCESS'.

I'm not sure where the 404 comes from, since I was able to connect to the static HTTP endpoint through the Connect wizard and define columns, flattening, filters, etc. on the returned data.
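For reference, here is what those flatten expressions pull out of the response (a minimal Python sketch of what I expect the wizard-generated flattenSpec to do; the exact response shape is my assumption, inferred from the $.data.* paths in the payload below):

```python
import json

# Sample response in the shape the flattenSpec expects -- the exact
# shape is my assumption, inferred from the $.data.* paths in the payload.
sample = json.loads('{"data": {"base": "BTC", "currency": "USD", "amount": "7890.12"}}')

# Rough equivalent of the wizard-generated flattenSpec paths:
#   $.data.base, $.data.currency, $.data.amount
def flatten(doc):
    data = doc.get("data", {})
    return {
        "base": data.get("base"),
        "currency": data.get("currency"),
        "amount": data.get("amount"),
    }

row = flatten(sample)
print(row)  # {'base': 'BTC', 'currency': 'USD', 'amount': '7890.12'}
```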

The endpoint is a very simple, publicly available test endpoint. I have verified that the MiddleManager container can wget the endpoint and ping the host.

Under the 'Datasources' tab in the Druid console, it says my datasource is 0.0% available.
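For what it's worth, here is how I'm checking that number outside the console (a minimal sketch; the coordinator address assumes the default port mapping from the docker-compose file, and the response shape is based on the coordinator's loadstatus endpoint, which maps datasource names to percent of segments loaded):

```python
import json
from urllib.request import urlopen

# Assumed coordinator address (default docker-compose port mapping).
COORDINATOR = "http://localhost:8081"

def fetch_loadstatus(base_url=COORDINATOR):
    """GET /druid/coordinator/v1/loadstatus, which returns a map of
    datasource name -> percent of segments loaded."""
    with urlopen(f"{base_url}/druid/coordinator/v1/loadstatus") as resp:
        return json.loads(resp.read().decode("utf-8"))

def percent_loaded(loadstatus, datasource):
    # Returns the availability percentage, or None if the datasource
    # does not appear in the response at all.
    return loadstatus.get(datasource)

# What the console's "0.0% available" corresponds to in that response:
print(percent_loaded({"spot": 0.0}, "spot"))  # 0.0
```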

Any tips or suggestions on troubleshooting further?

All the best,

Ray

Task Payload:

```
{
  "type": "index_parallel",
  "id": "index_parallel_spot_baoanfjb_2019-11-24T17:22:03.471Z",
  "groupId": "index_parallel_spot_baoanfjb_2019-11-24T17:22:03.471Z",
  "resource": {
    "availabilityGroup": "index_parallel_spot_baoanfjb_2019-11-24T17:22:03.471Z",
    "requiredCapacity": 1
  },
  "spec": {
    "dataSchema": {
      "dataSource": "spot",
      "timestampSpec": null,
      "dimensionsSpec": null,
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "sum_amount", "fieldName": "amount", "expression": null },
        { "type": "doubleMax", "name": "max_amount", "fieldName": "amount", "expression": null },
        { "type": "doubleMin", "name": "min_amount", "fieldName": "amount", "expression": null }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "WEEK",
        "queryGranularity": "HOUR",
        "rollup": true,
        "intervals": null
      },
      "transformSpec": {
        "filter": null,
        "transforms": []
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "!!!no_such_column!!!",
            "missingValue": "2010-01-01T00:00:00Z"
          },
          "flattenSpec": {
            "fields": [
              { "type": "path", "name": "base", "expr": "$.data.base" },
              { "type": "path", "name": "currency", "expr": "$.data.currency" },
              { "type": "path", "name": "amount", "expr": "$.data.amount" }
            ]
          },
          "dimensionsSpec": {
            "dimensions": ["base", "currency"]
          }
        }
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "http",
        "uris": ["https://api.coinbase.com/v2/prices/BTC-USD/spot"],
        "maxCacheCapacityBytes": 1073741824,
        "maxFetchCapacityBytes": 1073741824,
        "prefetchTriggerBytes": 536870912,
        "fetchTimeout": 60000,
        "maxFetchRetry": 3,
        "httpAuthenticationUsername": null,
        "httpAuthenticationPassword": null
      },
      "inputSource": null,
      "inputFormat": null,
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": null,
      "maxRowsInMemory": 1000000,
      "maxBytesInMemory": 0,
      "maxTotalRows": null,
      "numShards": null,
      "splitHintSpec": null,
      "partitionsSpec": null,
      "indexSpec": {
        "bitmap": { "type": "concise" },
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
        "longEncoding": "longs"
      },
      "indexSpecForIntermediatePersists": {
        "bitmap": { "type": "concise" },
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
        "longEncoding": "longs"
      },
      "maxPendingPersists": 0,
      "forceGuaranteedRollup": false,
      "reportParseExceptions": false,
      "pushTimeout": 0,
      "segmentWriteOutMediumFactory": null,
      "maxNumConcurrentSubTasks": 1,
      "maxRetry": 3,
      "taskStatusCheckPeriodMs": 1000,
      "chatHandlerTimeout": "PT10S",
      "chatHandlerNumRetries": 5,
      "maxNumSegmentsToMerge": 100,
      "totalNumMergeTasks": 10,
      "logParseExceptions": false,
      "maxParseExceptions": 2147483647,
      "maxSavedParseExceptions": 0,
      "partitionDimensions": [],
      "buildV9Directly": true
    }
  },
  "context": {
    "forceTimeChunkLock": true
  },
  "dataSource": "spot"
}
```

Hi Ray,

From your master node and data nodes, can you try running `curl https://api.coinbase.com/v2/prices/BTC-USD/spot` and see if you can access the file?
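If curl isn't installed in the container image, here is an equivalent check with the Python standard library (a sketch; it assumes python3 is available in the image):

```python
import json
from urllib.request import urlopen

URL = "https://api.coinbase.com/v2/prices/BTC-USD/spot"

def endpoint_ok(url=URL, timeout=5):
    # Fetches the URL and returns the parsed JSON body; raises on
    # network errors (URLError) or a non-JSON body (JSONDecodeError).
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    try:
        print(endpoint_ok())
    except OSError as err:
        print(f"endpoint not reachable: {err}")
```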

Eric

Eric,

I'm not running a Hadoop cluster with master and data nodes. The deep storage root is local, at /tmp, for development and evaluation purposes. Perhaps this is the problem?

All the best,

Ray

Hi Ray,

So your Druid cluster is currently a single node? Please try the curl command from your Druid node.

Eric

Eric Graham

Solutions Engineer
Cell: +1-303-589-4581
egraham@imply.io