
Keepalived Blocking SSH? Troubleshooting Your High Availability Setup
Experiencing SSH connection issues after setting up Keepalived for high availability? You're not alone. This guide dives into common causes and solutions when Keepalived interferes with SSH access, ensuring your servers remain accessible even during failover.
Understanding the Problem: Why Keepalived Might Block SSH
Keepalived dynamically manages IP addresses. When a server takes over as MASTER, it assigns a virtual IP. However, if not configured correctly, this can disrupt existing network connections, including SSH. SSH failures are typically related to incorrect IP address assignments, firewall rules or routing issues after Keepalived transitions occur.
Step-by-Step Troubleshooting: Getting SSH Working with Keepalived
1. Firewall Rules:
- The Culprit: Firewalls are often the primary suspect. After Keepalived switches IPs, firewall rules might not be updated to allow SSH traffic to the new virtual IP.
- The Fix: Ensure your firewall (e.g., iptables, firewalld) allows SSH traffic (port 22 by default) to the virtual IP address (10.0.0.222 in your case). Consider adding rules that specifically permit traffic from your monitoring or administrative networks.
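One way to express such a rule with firewalld is sketched below, assuming the virtual IP 10.0.0.222 and the default SSH port 22 mentioned above. The helper name ssh_vip_rule is hypothetical; it only builds the rich-rule string so it can be reviewed before being passed to firewall-cmd.

```shell
#!/bin/sh
# Hypothetical helper: print a firewalld rich rule that accepts TCP traffic
# to the given destination address and port (the port defaults to 22).
ssh_vip_rule() {
    vip="$1"
    port="${2:-22}"
    printf 'rule family="ipv4" destination address="%s" port port="%s" protocol="tcp" accept\n' "$vip" "$port"
}

# Review the rule first:
#   ssh_vip_rule 10.0.0.222
# Then apply it permanently and reload (requires root):
#   firewall-cmd --permanent --add-rich-rule="$(ssh_vip_rule 10.0.0.222)"
#   firewall-cmd --reload
```

If the active zone already allows the ssh service for all destination addresses, this extra rule is probably unnecessary; firewall-cmd --list-all shows what is currently permitted.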
2. Virtual IP Address Configuration:
- The Culprit: The virtual IP address (10.0.0.222/24) configuration within Keepalived might be incorrect. Double-check that the interface (ens3) is the correct one. A misconfigured IP causes connectivity problems.
- The Fix: Use the ip addr command to verify that the virtual IP is correctly assigned to the ens3 interface after Keepalived takes over.
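The check above can be scripted. The sketch below assumes the ens3 interface and the 10.0.0.222 address from this guide; vip_in_output is a hypothetical helper that reads the one-line-per-address output of ip -o -4 addr show on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` output supplied on stdin.
vip_in_output() {
    # -F: fixed-string match, -q: quiet. The trailing "/" avoids matching a
    # longer address that merely starts with the same digits.
    grep -qF "inet $1/"
}

# Usage on the node that should currently hold the virtual IP:
#   ip -o -4 addr show dev ens3 | vip_in_output 10.0.0.222 && echo "VIP present on ens3"
```

Running it on both nodes right after a failover should show the address on exactly one of them.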
3. Routing Issues:
- The Culprit: Post-failover, traffic might not be routed correctly to the active server. This can stem from faulty routing tables or incorrect gateway configurations.
- The Fix: Examine your routing tables (route -n or ip route) on both servers. Ensure traffic destined for networks behind the virtual IP is correctly routed. Your virtual_routes section looks okay, but confirm that 10.0.0.138 is the correct gateway for the 10.0.0.0/24 network on both servers. Consider using more robust routing protocols if your network is complex.
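As a rough sketch of the gateway check, the helper below extracts the default gateway from ip route output so it can be compared on both servers. It assumes 10.0.0.138 is meant to be the default gateway; if it is only the next hop for a specific network, inspect that route instead. The name default_gateway is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the gateway of the first default route found in
# `ip route` output supplied on stdin. Prints nothing if there is none.
default_gateway() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "via") { print $(i + 1); exit }
        }
    }'
}

# Usage, on each server:
#   gw="$(ip route | default_gateway)"
#   [ "$gw" = "10.0.0.138" ] || echo "unexpected default gateway: $gw"
#   ip route get 10.0.0.222   # shows how this host would reach the virtual IP
```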
4. Keepalived State Transitions & preempt_delay:
- The Culprit: The preempt_delay parameter sets how many seconds a server waits before reclaiming the MASTER role once it has the higher priority again. Too short a delay can cause flapping (frequent state changes), which interrupts SSH sessions.
- The Fix: Increase preempt_delay to a value that allows sufficient time for the virtual IP address to be fully assigned and for network services to stabilize before the server takes over. Experiment with values between 10 and 30 seconds. Note that Keepalived only honors preempt_delay when the instance's configured initial state is BACKUP and nopreempt is not set.
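To judge whether flapping is actually happening, it can help to count state transitions in the Keepalived log. The sketch below assumes the log lines contain "Entering MASTER STATE", "Entering BACKUP STATE" or "Entering FAULT STATE", which is how Keepalived commonly reports transitions (exact wording may vary between versions), and that the service runs as a systemd unit named keepalived. The helper name count_transitions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count VRRP state transitions in log text on stdin.
count_transitions() {
    # -c prints the number of matching lines; -E enables the alternation.
    grep -cE 'Entering (MASTER|BACKUP|FAULT) STATE'
}

# Usage (the unit name "keepalived" is an assumption):
#   journalctl -u keepalived --since "10 minutes ago" | count_transitions
```

More than a handful of transitions in a short window, with no real outage, suggests the delay or the health checks need tuning.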
5. Docker Integration & Check Scripts:
- The Culprit: Your check_docker script may be intermittently failing or taking too long to execute, forcing Keepalived to switch states unnecessarily.
- The Fix: Thoroughly test your check_docker script independently. Ensure it's resilient to temporary Docker daemon hiccups. Log the script's output for debugging. Consider adding a timeout mechanism to prevent the script from hanging indefinitely.
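A timeout and basic logging could be added along these lines. This is a minimal sketch, not your actual check_docker script: run_check is a hypothetical helper, the 5-second limit is an arbitrary assumption, and it relies on the timeout utility from GNU coreutils, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report the outcome
# on stderr. The command's own output is discarded.
run_check() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "check timed out after ${limit}s: $*" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "check failed (exit $rc): $*" >&2
    fi
    return "$rc"
}

# A check_docker script could then end with:
#   run_check 5 docker info
```

Because the helper returns the command's exit status, Keepalived still sees a non-zero result on failure, while the stderr message records whether the cause was a timeout or an error.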
6. Authentication Issues:
- The Culprit: When using auth_type PASS, be aware that the password is sent in cleartext inside VRRP advertisements, so it can be read by anyone able to sniff that network segment.
- The Fix: In production environments, consider migrating to a more secure authentication option such as AH.
Configuration review
Review the following points in your configuration:
- Unicast Peers: Confirm that connectivity exists between the unicast peers, and ensure the addresses are correct.
- Priority Configuration: VM-00 has a priority of 150 and VM-01 has a priority of 100, so VM-00 should be the MASTER under normal conditions.
- Script Execution: Check that the check script has the appropriate permissions to be executed.
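The script permission check can be done by hand with ls -l, or with a small test like the sketch below. The helper name script_is_runnable and the path in the usage comment are hypothetical; substitute the path your configuration actually references.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the path is a regular file that the
# current user is allowed to execute.
script_is_runnable() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Usage (run as the user Keepalived executes scripts as; the path is a placeholder):
#   script_is_runnable /path/to/check_docker.sh || echo "script missing or not executable"
```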
Avoid Common Pitfalls
- Rule out network segmentation issues that prevent the two servers from reaching each other.
- Make sure the Keepalived daemon has the required permissions.
- Check for other daemons that might interfere with the Keepalived instances.
By systematically investigating these areas, you can diagnose and resolve the SSH connectivity problems caused by Keepalived, ensuring your high availability setup functions smoothly and your servers remain accessible. Remember to test your configurations thoroughly in a staging environment before deploying to production.