DronaBlog

Tuesday, June 27, 2023

How to fix: The Nomad service fails with error "No cluster leader" in EDC

The "No cluster leader" error in Informatica Enterprise Data Catalog (EDC) means that the Nomad service, which manages the cluster and coordinates its nodes, cannot identify a leader node within the cluster. It typically occurs when the cluster configuration is wrong or the leader node is unavailable.






To troubleshoot and resolve this error, you can follow these steps:

  1. Verify network connectivity: Ensure that all the nodes in the cluster can communicate with each other. Check if there are any network connectivity issues or firewall restrictions that might be preventing communication.
  2. Check Nomad service status: Verify the status of the Nomad service on each node of the cluster. Ensure that the Nomad service is running and healthy on all nodes. You can use commands like systemctl status nomad or service nomad status to check the status.
  3. Review Nomad configuration: Examine the Nomad configuration files on each node, typically located in the /etc/nomad/ directory. Pay attention to the cluster configuration settings, such as the addresses of other cluster nodes, leader election parameters, and any authentication or encryption settings. Ensure that the configuration is accurate and consistent across all nodes.
  4. Check for cluster inconsistencies: If the cluster configuration appears to be correct, investigate for any inconsistencies or issues within the cluster. Review the logs of each node to identify any error messages or warnings related to Nomad or cluster coordination. Look for any network partitioning or connectivity problems between nodes.
  5. Restart Nomad service: If there are no apparent configuration or cluster issues, try restarting the Nomad service on all nodes of the cluster. This can help refresh the cluster state and trigger leader election. Use commands like systemctl restart nomad or service nomad restart to restart the service.
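The leader check behind steps 2 and 4 can be partly scripted. The sketch below assumes the nomad CLI is on the PATH and can reach the Nomad server used by EDC; in an Informatica-managed installation the binary location and address may differ. The helper name has_leader is hypothetical. It reads the table printed by nomad server members and reports whether any server is marked as the leader.

```shell
# has_leader: reads `nomad server members` output on stdin.
# Exits 0 if any row has "true" in the Leader column, 1 otherwise.
# Assumption: the first line is a header that contains a "Leader" column.
has_leader() {
  awk '
    # Header row: remember which column is named "Leader".
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Leader") col = i; next }
    # Data rows: note any server flagged as the leader.
    col && $col == "true" { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example use on a cluster node (not run here):
#   nomad server members | has_leader && echo "leader elected" || echo "no leader"
```

If the check reports no leader while every server shows as alive, the problem is more likely stale cluster state than a stopped service.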







Root cause:

The Nomad service may fail with the following errors when the IP addresses of the cluster nodes change:

T16:14:25.145Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.Register server=xxxx

T16:14:25.145Z [ERROR] client: error registering: error="rpc error: No cluster leader" 

T16:14:25.489Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=xxx

T16:14:25.489Z [ERROR] client: error updating allocations: error="rpc error: No cluster leader" 


This error usually occurs when Enterprise Data Catalog is deployed in containers such as Docker. When the image is redeployed, the container runs on a new IP address, whereas the Nomad cache still contains the IP address from the earlier deployment.
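To confirm that a node is hitting this particular error, you can count the matching lines in its Nomad log. This is a minimal sketch: the helper name count_no_leader is hypothetical, and the log file path depends on your installation, so it is passed in as an argument.

```shell
# count_no_leader: prints how many lines in the given log file
# contain the "No cluster leader" RPC error.
count_no_leader() {
  # grep -c prints the count even when it is 0, but then exits with
  # status 1, so the status is deliberately ignored.
  grep -c 'rpc error: No cluster leader' "$1" || true
}

# Example use (the path is a placeholder, not a known EDC location):
#   count_no_leader /path/to/nomad/client.log
```

A count that keeps growing after a restart suggests that the servers still cannot elect a leader.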


Solution:

For a new deployment, perform the following steps to delete the cache files that store the old IP address:

  1. Delete the $clusterCustomDir/nomad/nomadserver/server/ directory.  
  2. Disable the Informatica Cluster Service. 
  3. Enable the Informatica Cluster Service. 
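Step 1 can be scripted with a small guard so that an empty or wrong path never reaches rm. This is a sketch under assumptions: the function name clear_nomad_server_cache is hypothetical, and the cluster custom directory is passed in explicitly because its value is specific to your deployment.

```shell
# clear_nomad_server_cache: deletes <clusterCustomDir>/nomad/nomadserver/server.
# Returns 2 if no directory argument is given, 1 if the target does not exist.
clear_nomad_server_cache() {
  custom_dir="$1"
  if [ -z "$custom_dir" ]; then
    echo "usage: clear_nomad_server_cache <clusterCustomDir>" >&2
    return 2
  fi
  target="$custom_dir/nomad/nomadserver/server"
  if [ ! -d "$target" ]; then
    echo "nothing to delete: $target does not exist" >&2
    return 1
  fi
  # Remove only the server cache directory, not the rest of the Nomad tree.
  rm -rf -- "$target"
  echo "deleted $target"
}

# Example use (not run here):
#   clear_nomad_server_cache "$clusterCustomDir"
# Steps 2 and 3 (disable, then enable the Informatica Cluster Service)
# are then done from the Administrator tool or with infacmd.
```

Passing the directory as an argument rather than reading $clusterCustomDir inside the function keeps an unset variable from expanding to a path under the filesystem root.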
