I recently created a Druid cluster with the following server configuration:
Master Node - m5a.xlarge
Data Node (Middle Manager) - a1.metal
Data Node (Historical) - m5a.2xlarge (with a mounted 200 GB EBS volume)
Query Node - m5a.xlarge
All EC2 nodes have at least 100 GB of storage.
I followed the official documentation for the setup.
Druid version: apache-druid-0.17.0
We use S3 as deep storage and RDS (MySQL) as the metadata store. The corresponding configuration and access are in place and working fine.
These are the extensions we load:
druid.extensions.loadList=["druid-parquet-extensions", "druid-avro-extensions", "druid-basic-security", "druid-google-extensions", "druid-protobuf-extensions", "druid-lookups-cached-global", "mysql-metadata-storage", "druid-s3-extensions", "druid-kafka-indexing-service", "druid-datasketches"]
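Because `druid.extensions.loadList` is parsed as a JSON array, typographic (curly) quotes, which forum and chat software often substitute when a config line is pasted, would make the value invalid. The file on the servers may well use straight quotes already; the minimal Python sketch below is one way to confirm it. It assumes simple single-line `key=value` entries (no `:` separators or backslash continuations) and is not Druid's own properties parser.

```python
import json
from typing import List


def parse_load_list(properties_text: str) -> List[str]:
    """Return the extensions named in druid.extensions.loadList.

    Raises ValueError if the property is missing or its value is not a
    JSON array of strings (for example, because of curly quotes).
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and .properties comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "druid.extensions.loadList":
            try:
                extensions = json.loads(value.strip())
            except json.JSONDecodeError as exc:
                raise ValueError(f"loadList is not valid JSON: {exc}") from exc
            if not isinstance(extensions, list) or not all(
                isinstance(name, str) for name in extensions
            ):
                raise ValueError("loadList must be a JSON array of strings")
            return extensions
    raise ValueError("druid.extensions.loadList not found")
```

For example, `parse_load_list(open(path, encoding="utf-8").read())`, where `path` points at your `common.runtime.properties` (in the 0.17 cluster layout this is typically under `conf/druid/cluster/_common/`, but adjust to your install).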
The problem is that whenever we try to ingest data, the ingestion task fails. We don't see any specific errors on the MiddleManager or the master server.
Segment data is not being saved to S3. However, some small files are being created there, so it does not appear to be an access issue.
Troubleshooting we have tried so far:
With all ports open to the public, the jobs still failed, so we concluded this is probably not a network issue and moved the cluster behind a private VPN.
To track down the issue, we took the steps below. Afterwards, we deleted all data from S3 and the metadata database on RDS, and restarted all services for a fresh start.
Changed the log level to DEBUG to get more details.
Changed all file permissions to 755.
Changed the owner of all files to root.
Verified that RDS and S3 access works.
Verified with telnet that all ports are reachable between the servers.
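Telnet only shows that a port is reachable at the moment you test it, and a small script makes the same check repeatable from every node. The sketch below is a minimal Python version using `socket.create_connection`. The port numbers in `DRUID_PORTS` are assumptions based on the defaults in the example cluster configs, and the hostnames in the usage note are hypothetical. Note that indexing tasks (peons) listen on their own ports, starting at 8100 by default (`druid.indexer.runner.startPort`), and only while a task is running, so a check run with no active tasks will not cover them.

```python
import socket
from typing import Dict, Iterable, Tuple

# Assumed defaults from the example cluster configs; check each service's
# druid.plaintextPort (and ZooKeeper's clientPort) before relying on these.
DRUID_PORTS = {
    "coordinator-overlord": 8081,
    "broker": 8082,
    "historical": 8083,
    "middlemanager": 8091,
    "router": 8888,
    "zookeeper": 2181,
}


def check_ports(targets: Iterable[Tuple[str, int]],
                timeout: float = 3.0) -> Dict[Tuple[str, int], bool]:
    """Attempt a TCP connection to each (host, port); True means it connected."""
    results: Dict[Tuple[str, int], bool] = {}
    for host, port in targets:
        try:
            # The connection is closed immediately; we only care that it opened.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Refused, timed out, or the host name did not resolve.
            results[(host, port)] = False
    return results
```

For example, from the Query node: `check_ports([("master.example.internal", DRUID_PORTS["coordinator-overlord"]), ("historical.example.internal", DRUID_PORTS["historical"])])`, replacing the hostnames with your own.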
Can anyone please help us with this?
Please find attached some screenshots for reference. They show segments with available = false, only 1% of the data available to load, and some failed tasks. Please let me know if you need more information.
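The failed tasks and unavailable segments in the screenshots can also be pulled as text from Druid's SQL system tables (`sys.tasks`, `sys.segments`), which often gives a more specific error than the console summary. The following is a minimal Python sketch using only the standard library. It assumes the Router (default port 8888) or Broker is reachable at a URL you supply and that Druid SQL is enabled. Because `druid-basic-security` is in the loadList, requests may need an `Authorization` header; the `basic_auth_header` helper here is a hypothetical convenience, not part of Druid.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Failed ingestion tasks, newest first, with the error message Druid recorded.
FAILED_TASKS_SQL = (
    "SELECT task_id, datasource, created_time, error_msg "
    "FROM sys.tasks WHERE status = 'FAILED' ORDER BY created_time DESC"
)

# Segments known to the cluster but not currently available for querying.
UNAVAILABLE_SEGMENTS_SQL = (
    "SELECT datasource, segment_id, is_published, is_available "
    "FROM sys.segments WHERE is_available = 0"
)


def basic_auth_header(user: str, password: str) -> str:
    """Hypothetical helper: build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_sql_request(base_url: str, query: str,
                      auth_header: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to the Druid SQL endpoint (/druid/v2/sql)."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if auth_header:
        headers["Authorization"] = auth_header
    return urllib.request.Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers=headers,
        method="POST",
    )


def run_sql(base_url: str, query: str, auth_header: Optional[str] = None,
            timeout: float = 30.0) -> List[Dict[str, Any]]:
    """Execute a query; the default result format is a JSON array of row objects."""
    request = build_sql_request(base_url, query, auth_header)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `run_sql("http://<router-host>:8888", FAILED_TASKS_SQL)` lists each failed task with its `error_msg`. The full log of a given task can then be fetched from the Overlord at `/druid/indexer/v1/task/{taskId}/log`.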