Dual-Hub SD-WAN
Dual-hub SD-WAN provides resilience against both underlay (Internet) and overlay (VPN) failures. A branch router maintains two WAN connections (primary and backup) and a VPN tunnel to each of two geographically separated hub sites. If the primary WAN link fails, the VPN tunnels automatically shift to the backup link. If the primary hub becomes unreachable, traffic automatically routes through the backup hub, preserving connectivity to the data center.
Overview
How It Works
Dual-hub SD-WAN architecture combines two resilience mechanisms:
- Dual WAN (underlay) — The branch router has two Internet connections to choose from
- Dual VPN (overlay) — The branch router maintains two separate VPN tunnels, one to each hub site
During normal operation, traffic prefers the primary path: primary WAN link + primary VPN tunnel to DC-1. When either the primary WAN or primary hub fails, the router automatically shifts to alternate paths without dropping application traffic.
Note
The Internet connections at DC-1 and DC-2 can come from any ISP (typically directly from the DC hosting provider), as long as both hubs are reachable from the branch router through both ISP1 and ISP2.
Use Cases
| Sector | Scenario | Benefit |
|---|---|---|
| Enterprise | Multi-location offices with redundant HQ connectivity | Continuous access to centralized services even during ISP outages |
| Financial Services | Branch banking centers requiring zero downtime | Automatic failover with no manual intervention or traffic loss |
| Retail | Multi-store chain with DC backup | Point-of-sale systems remain connected during primary link failure |
| Healthcare | Remote clinic with critical patient data access | Instant backup routing to secondary medical records data center |
| Manufacturing | Factory floor with dual-hub ERP access | Production lines continue without interruption during hub/ISP failure |
Requirements
Infrastructure
- Primary Hub (DC-1): RansNet gateway running SD-WAN with direct Internet access and static public IP
- Backup Hub (DC-2): RansNet gateway running SD-WAN with direct Internet access and unique static public IP
- Branch Router: RansNet branch router with dual WAN interfaces (Broadband, LTE, 5G, PPPoE, or mixed)
- NNI Between ISPs: Both ISPs must have network-to-network interconnection (NNI) paths to both hubs, so the branch router can reach either hub's public IP via any available path. For example, traffic can route through Primary WAN → ISP1 → ISP2 → DC-2, or Backup WAN → ISP2 → ISP1 → DC-1.
Network Configuration
- Common LAN networks — Both hubs must advertise the same internal networks (e.g., 10.0.0.0/16 for DC resources)
- Unique hub IPs — Each hub gateway has its own public IP address for IPsec termination
- Branch site identity — Branch network (e.g., 192.168.1.0/24) must be unique and advertised to both hubs
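The addressing rules above can be checked before deployment. The following is a minimal sketch, not a RansNet tool: the function name `check_addressing`, the branch names, and the second branch prefix are hypothetical, and treating "unique" as "does not overlap any other branch or the DC networks" is an assumption.

```python
import ipaddress


def check_addressing(dc_networks, branch_networks):
    """Return a list of problems found in a hypothetical addressing plan.

    dc_networks: list of CIDR strings advertised by both hubs.
    branch_networks: dict mapping a branch name to its LAN CIDR string.
    """
    problems = []
    dc = [ipaddress.ip_network(n) for n in dc_networks]
    branches = {name: ipaddress.ip_network(n) for name, n in branch_networks.items()}
    names = sorted(branches)
    for i, name in enumerate(names):
        net = branches[name]
        # A branch LAN that overlaps a DC prefix would make routing ambiguous.
        for d in dc:
            if net.overlaps(d):
                problems.append(f"{name} ({net}) overlaps DC network {d}")
        # Each branch LAN must also be unique among the branches.
        for other in names[i + 1:]:
            if net.overlaps(branches[other]):
                problems.append(f"{name} ({net}) overlaps {other} ({branches[other]})")
    return problems


# Example plan: prefixes from the text plus one hypothetical second branch.
plan = {"branch-1": "192.168.1.0/24", "branch-2": "192.168.2.0/24"}
print(check_addressing(["10.0.0.0/16"], plan))
```

An empty list means the plan passed these two checks; it says nothing about whether the prefixes are actually advertised by the devices.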
Network Behavior
Steady-State (Normal Operation)
When both WAN links and both hub gateways are healthy:
- Branch router maintains two live default routes (one per ISP)
- The router automatically elects the primary WAN link (typically the first to come up or the one with the lowest metric, e.g., fiber as primary and 5G as backup)
- Both VPN tunnels are established via the primary WAN link, but traffic prefers the primary tunnel:
    - Primary VPN tunnel (to DC-1) is active
    - Backup VPN tunnel (to DC-2) remains standby
- All application traffic flows to DC-1
Tip
The branch router always uses the lowest metric default route to build VPN tunnels. When the primary WAN link is available, both VPN tunnels use the primary path. When the primary WAN link fails, both tunnels automatically shift to the backup WAN link (the only available default route).
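The underlay rule in this tip can be modelled in a few lines. This is an illustrative sketch only, not the router's implementation: the `DefaultRoute` type and the metric values are hypothetical, while `eth0` and `wwan0` are the interface names used in the verification section of this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultRoute:
    interface: str
    metric: int
    up: bool


def select_underlay(routes):
    """Pick the live default route with the lowest metric, or None if all are down."""
    live = [r for r in routes if r.up]
    return min(live, key=lambda r: r.metric) if live else None


# Hypothetical metrics: fiber on eth0 is preferred over cellular on wwan0.
routes = [DefaultRoute("eth0", 10, True), DefaultRoute("wwan0", 20, True)]
print(select_underlay(routes).interface)  # eth0

# Primary WAN fails: the only remaining default route carries both tunnels.
routes = [DefaultRoute("eth0", 10, False), DefaultRoute("wwan0", 20, True)]
print(select_underlay(routes).interface)  # wwan0
```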
Failover Scenario 1: Primary WAN Link Fails
When the branch's primary Internet connection drops:
- Primary WAN link becomes unavailable
- Both VPN tunnels automatically shift to the backup WAN link (ISP-2)
- Application traffic continues through primary VPN tunnel (to DC-1), now overlaid on backup WAN link
- Failover time is typically 2–5 seconds (depending on link detection timer)
- When primary WAN recovers, both tunnels shift back to primary link
Failover Scenario 2: Primary Hub (DC-1) Unreachable
When DC-1 gateway becomes unreachable (link down, gateway outage, or upstream routing failure):
- Primary VPN tunnel to DC-1 fails (no response to keepalive probes)
- The BGP route via the primary VPN tunnel is withdrawn, and the route via the backup tunnel becomes active
- Application traffic automatically routes through the backup VPN tunnel to DC-2
- Users connect to the same corporate network resources, now via DC-2
- When DC-1 recovers, traffic shifts back to primary tunnel
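The steady state and the two failover scenarios combine into a small decision table, because the underlay choice and the overlay choice are made independently. The sketch below is a simplified model with a hypothetical function name; it assumes that either hub is reachable over either WAN link, as the NNI requirement intends.

```python
def active_path(primary_wan_up, backup_wan_up, dc1_up, dc2_up):
    """Return (wan, hub) for the path carrying traffic, or None if no path exists."""
    # Underlay: prefer the primary WAN link, fall back to the backup link.
    wan = "primary WAN" if primary_wan_up else "backup WAN" if backup_wan_up else None
    # Overlay: prefer the tunnel to DC-1, fall back to the tunnel to DC-2.
    hub = "DC-1" if dc1_up else "DC-2" if dc2_up else None
    if wan is None or hub is None:
        return None
    return (wan, hub)


print(active_path(True, True, True, True))    # steady state
print(active_path(False, True, True, True))   # scenario 1: primary WAN fails
print(active_path(True, True, False, True))   # scenario 2: DC-1 unreachable
```

The model ignores timing; in practice each transition takes as long as the relevant failure detection needs.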
Deployment
Architecture Principles
Before configuring, understand these core concepts:
| Concept | Description |
|---|---|
| Spoke-to-Spoke (L3) | Branch router learns routes from both hubs and makes forwarding decisions based on BGP weight and cost. Requires BGP enabled on all devices. |
| BGP Weight | Routes with higher weight are preferred. Primary tunnel uses weight 100; backup uses a lower weight (e.g., 0, the minimum value). |
| Overlay Independence | The VPN overlay (which hub you reach) is independent from the underlay (which ISP you use). |
| Transparent Failover | Applications don't detect failover because the VPN tunnels remain up; only the active path changes. |
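To illustrate the BGP weight row, here is a minimal sketch of weight-based selection between two routes to the same prefix. It models only the weight comparison; a real BGP implementation continues with further tie-breakers when weights are equal. The `BgpRoute` type and the tunnel names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpRoute:
    prefix: str
    next_hop: str  # hypothetical tunnel name
    weight: int


def best_route(routes, prefix):
    """Return the route to prefix with the highest weight, or None if there is none."""
    candidates = [r for r in routes if r.prefix == prefix]
    return max(candidates, key=lambda r: r.weight) if candidates else None


table = [
    BgpRoute("10.0.0.0/16", "tunnel-dc1", 100),  # primary hub
    BgpRoute("10.0.0.0/16", "tunnel-dc2", 0),    # backup hub
]
print(best_route(table, "10.0.0.0/16").next_hop)  # tunnel-dc1

# DC-1 becomes unreachable: its route is withdrawn and the backup takes over.
table = [r for r in table if r.next_hop != "tunnel-dc1"]
print(best_route(table, "10.0.0.0/16").next_hop)  # tunnel-dc2
```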
Configuration Steps
Step 1: Enable WAN Failover
Configure automatic failover between the branch's two WAN links so that VPN tunnels can seamlessly shift paths. See WAN Failover Configuration for detailed setup.
Key requirement: Use Option 2 (PBR with Tracking) or Option 3 (Multi-WAN) to detect upstream failures, not just physical link down.
Step 2: Configure Hub Gateways
(Config details to be added later)
Each hub must:
- Configure two SD-WAN VPN instances (one per gateway public IP)
- Advertise common DC networks (10.0.0.0/16) in both instances
- Set different BGP weights: primary = 100, backup = 0
Step 3: Configure Branch Router
(Config details to be added later)
The branch router must:
- Join both VPN instances (one for each hub)
- Advertise its local network (192.168.1.0/24) to both instances
- Enable dynamic route selection (BGP)
Verification
Use these commands to verify dual-hub configuration and diagnose issues:
| What to Check | Command | Expected Output |
|---|---|---|
| Active WAN links | show interface all | Both eth0 and wwan0 show UP with IP addresses assigned |
| Primary WAN selection | show ip route \| include 0.0.0.0 | One default route marked as primary (lower metric or higher preference) |
| VPN tunnel status | show vpn tunnel-state | Both tunnels show established with active keepalive |
| BGP routes learned | show ip bgp summary | Both hub routes present; primary route has higher weight |
| Traffic flow path | show ip route 10.0.0.0 | Shows next hop via primary tunnel endpoint |
| WAN failover timer | show tracking | Probes running at configured interval; displays status (UP/DOWN) |
Example: Verify both VPN tunnels are active:
RansNet# show vpn tunnel-state
Tunnel 1 (to DC-1): ESTABLISHED (via ISP-1)
Tunnel 2 (to DC-2): ESTABLISHED (via ISP-1)
Troubleshooting
Common Issues
| Symptom | Likely Cause | Solution |
|---|---|---|
| Only one VPN tunnel shows UP | Second hub gateway not reachable | Verify both hub gateways have Internet connectivity and firewall permits IPsec (UDP 500, 4500) |
| Traffic goes to backup hub during normal operation | BGP weight not set correctly on primary tunnel | Verify primary tunnel has weight 100 and backup tunnel has a lower weight (e.g., 0) |
| Failover doesn't trigger when primary WAN fails | WAN failover not configured | Enable WAN failover using Option 2 or 3 per WAN Failover guide |
| Branch can't reach DC networks | Branch router not advertising routes to both hubs | Verify branch network is advertised in both VPN instances |
| Failover takes too long (>30 seconds) | Tracking probe interval too long or retries too high | Reduce probe interval (e.g., timer 5 5 for 25-second failover) |
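The 25-second figure in the last row is consistent with detection time being the probe interval multiplied by the number of failed probes needed. The arithmetic below rests on that assumption, namely that `timer 5 5` means a 5-second interval and 5 retries; confirm the argument order against the WAN Failover guide before relying on it. Both function names are hypothetical.

```python
def detection_time(interval_s, retries):
    """Approximate seconds before a dead link is declared down."""
    if interval_s <= 0 or retries <= 0:
        raise ValueError("interval and retries must be positive")
    return interval_s * retries


def max_retries(target_s, interval_s):
    """Largest retry count that keeps detection within target_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return target_s // interval_s


print(detection_time(5, 5))  # 25
print(max_retries(10, 2))    # 5
```

Shorter intervals detect failures faster but generate more probe traffic and are more likely to flap on a lossy link, so the retry count is usually kept above one.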
Debugging Commands
RansNet# show ip route
RansNet# show ip bgp
RansNet# show ipsec status
RansNet# show interface eth0
RansNet# show interface wwan0
Best Practices
Resilience
- Diverse ISPs — Use different ISP providers for primary and backup WAN links to avoid shared dependencies (e.g., fiber + LTE, or different fiber providers)
- Geographic separation — Place backup hub (DC-2) in a different data center or geographic region from primary hub
- Test failover regularly — Simulate failures and verify traffic switches as expected; document failover times
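One way to document failover times during a test is to run a continuous probe (for example, a ping to a DC host) while simulating the failure, then compute the longest gap from the results. The sketch below assumes the probe results have already been collected as `(timestamp_seconds, success)` pairs sorted by time; that format and the function name are hypothetical.

```python
def longest_outage(samples):
    """Return the longest run of failed probes, in seconds.

    An outage is measured from the first failed probe to the next
    successful one. An outage still open at the end is not counted.
    """
    longest = 0.0
    outage_start = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts          # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None        # outage ends
    return longest


# One probe per second; the link was pulled around t=2 and recovered by t=5.
samples = [(0, True), (1, True), (2, False), (3, False), (4, False), (5, True), (6, True)]
print(longest_outage(samples))  # 3
```

The resolution of the measurement is limited by the probe interval, so a 1-second probe cannot distinguish a 2.1-second outage from a 2.9-second one.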
Performance
- Appropriate probing — Use reliable public DNS servers (8.8.8.8, 1.1.1.1) for health probes; avoid probing directly to ISP gateways which may be unreachable from some locations
- WAN link metrics — Explicitly set primary WAN with lower metric than backup to avoid unintended primary/backup flipping
- Monitor tunnel latency — Backup hubs may have higher latency; ensure applications can tolerate variance during failover
Security
- Encryption on all tunnels — Both primary and backup VPN tunnels must use identical encryption policies; mismatched policies prevent backup failover
- Access control — Apply firewalls on both hubs with identical security policies; asymmetric policies cause traffic asymmetry and dropped return packets
- BGP authentication — Use BGP MD5 or other authentication to prevent unauthorized route injection
- Hub route filtering — Apply route-maps on hubs to only advertise intended DC networks; prevents accidental leakage of hub-internal routes
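The intent of hub route filtering can be expressed as a simple allow-list check. This is a conceptual sketch, not route-map syntax: it permits any prefix that falls inside an allowed DC network, whereas a real prefix list may match exact prefixes or length ranges. The function name and the extra candidate prefixes are hypothetical.

```python
import ipaddress


def filter_advertised(candidates, allowed):
    """Keep only the candidate prefixes that fall inside an allowed DC network."""
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    permitted = []
    for c in candidates:
        net = ipaddress.ip_network(c)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if any(net.version == a.version and net.subnet_of(a) for a in allowed_nets):
            permitted.append(c)
    return permitted


candidates = ["10.0.0.0/16", "10.0.5.0/24", "172.16.0.0/24"]
print(filter_advertised(candidates, ["10.0.0.0/16"]))
# ['10.0.0.0/16', '10.0.5.0/24']
```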



