SD-WAN Failover (WAN Resiliency)

WAN resiliency ensures that network connectivity is maintained when a primary WAN link fails.

graph TD
    ISP1["☁ ISP1-Fixed"]
    ISP2["📶 ISP2-4/5G"]
    R["RansNet Router"]

    ISP1 -->|Primary| R
    ISP2 -. Backup .-> R

RansNet devices support three distinct failover approaches, each with different levels of detection capability and configuration complexity.

| Option | Method | Detects Interface Down | Detects Upstream Failure |
|---|---|---|---|
| 1 — Route Metric | Kernel default route failover | Yes | No |
| 2 — PBR with Tracking | Policy-based routing + ICMP probe | Yes | Yes |
| 3 — Multi-WAN (MWAN) | MWAN engine + ICMP probe | Yes | Yes |

Option 1 — Kernel Default Route Failover

The simplest failover method. Each WAN interface is assigned a route metric, and the interface with the lowest metric becomes the primary default gateway. When the primary interface goes down, the kernel withdraws its route and traffic shifts automatically to the interface with the next-lowest metric.

Limitation: This method only detects physical link failure (interface UP/DOWN). It does not verify upstream reachability — if the WAN port stays physically up but the ISP connection drops, failover will not trigger.

Failover time: Typically 2–3 seconds for physical link failure detection.
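
The limitation above can be illustrated with a small Python model. This is only a sketch of the selection logic as described in this section, not the router's implementation; the `Wan` class, its `upstream_ok` field, and the `kernel_default` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Wan:
    name: str
    metric: int        # value set with route-metric
    link_up: bool      # physical interface state (UP/DOWN)
    upstream_ok: bool  # whether the ISP path actually works


def kernel_default(wans: list[Wan]) -> Optional[Wan]:
    """Pick the default route as Option 1 is described: the lowest metric
    among interfaces whose link is up. upstream_ok is deliberately ignored."""
    candidates = [w for w in wans if w.link_up]
    return min(candidates, key=lambda w: w.metric, default=None)


wans = [Wan("eth0", 21, True, True), Wan("wwan0", 20, True, True)]
print(kernel_default(wans).name)   # wwan0 (lowest metric)

wans[1].link_up = False            # physical failure on wwan0
print(kernel_default(wans).name)   # eth0 takes over

wans[1].link_up = True
wans[1].upstream_ok = False        # link is up, but the ISP path is dead
print(kernel_default(wans).name)   # still wwan0: no failover is triggered
```

The last case is the one Options 2 and 3 address by probing an upstream target.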

GUI Configuration

The following example sets the route metric using the GUI.

Navigate to Device Settings → Network → Interfaces, select the WAN interface, and click the "Route Metric" option to set the desired value.

[Image: Failover]

To set the route metric for a WWAN interface, use the option below:

[Image: Failover]

CLI Configuration

interface eth0
  description "ISP1 - Fixed line"
  enable
  route-metric 21

interface wwan0
  description "ISP2 - LTE backup"
  enable
  route-metric 20

In this example, wwan0 is intentionally given the lower metric (20), which makes it the preferred (primary) path. eth0 becomes active only if wwan0 goes down.

Note

Route metrics are assigned automatically based on the order in which interfaces load at boot. By default, eth0 comes up before wwan0 and is therefore the primary path. Setting route-metric explicitly ensures predictable primary/backup behaviour regardless of boot sequence.


Option 2 — PBR with Upstream Tracking

Policy-Based Routing (PBR) combined with ICMP tracking provides upstream-aware failover. Selected traffic is matched by a PBR rule and sent via a WAN gateway, while a continuous ICMP probe monitors end-to-end reachability. When the probe fails, the PBR rule is withdrawn and traffic falls through to alternative paths.

Advantage over Option 1: Detects upstream failures (e.g., ISP routing issues) even when the WAN interface remains physically UP.

Failover time: Depends on the tracking probe interval and retry count.

Use cases:

  • All-traffic failover: Route all LAN traffic via primary WAN with failover to secondary
  • Selective failover: Route only specific traffic (e.g., ransnet.com) to secondary WAN when primary fails; all other traffic drops if primary is unavailable

GUI Configuration

Navigate to Device Settings → Network → Interfaces and configure the primary WAN interface with DHCP and Ignore Default Route enabled:

[Image: Failover]

Navigate to Device Settings → SD-WAN → Traffic Steering to create PBR rules:

[Image: Failover]

Refer to Tracking Configuration for detailed SLA thresholds.

CLI Configuration

All-Traffic Failover

Route all LAN traffic via primary WAN; if primary fails, all traffic falls through to secondary:

interface eth0
  description "Connection to WAN"
  enable
  ip address dhcp nodefault       ! No kernel default route; PBR controls path

interface wwan0
  description "LTE backup"
  enable
  ip address dhcp                 ! Provides fallback default route

interface vlan 1 1
  description "LAN"
  enable
  ip address 192.168.8.1/22
  dhcp-server
    router 192.168.8.1
    dns 8.8.8.8 8.8.4.4
    range 192.168.8.10 192.168.11.254
    enable

! Primary PBR rule: match all LAN traffic via eth0's gateway with upstream tracking
ip pbr policy 100 src 192.168.8.0/22 remark "All-LAN-traffic"
ip pbr 100 nexthop 192.168.98.1 track icmp 1.1.1.1 15

firewall-access 100 permit outbound eth0
firewall-access 101 permit outbound wwan+
firewall-snat 100 overload outbound eth0
firewall-snat 101 overload outbound wwan+

How it works:

  • eth0 has nodefault → no kernel default route on eth0
  • PBR rule routes all traffic via explicit gateway IP (192.168.98.1) with tracking
  • When eth0 fails: tracking probe fails → PBR rule withdrawn → traffic has no path via eth0 → falls through to wwan0's kernel default route

Warning

If eth0 had a kernel default route, traffic would keep matching that route after the PBR rule is withdrawn and would be blackholed on the dead link. The nodefault flag is critical.

Key points:

  • Primary WAN: ip address dhcp nodefault → NO kernel default route (PBR is the only path)
  • Secondary WAN: ip address dhcp → provides fallback default route
  • PBR nexthop uses explicit gateway IP (nexthop 192.168.98.1), not interface name (because eth0 has no default route to resolve it from)
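
Why nodefault matters can be sketched as a tiny decision function in Python. It models the case where eth0 stays physically up while its upstream fails, and it assumes that eth0's default route would be preferred over wwan0's whenever both exist (as in the default boot order described under Option 1). The function name `egress` and its return strings are hypothetical.

```python
def egress(pbr_probe_ok: bool, eth0_has_default: bool) -> str:
    """Return the path LAN traffic takes in the all-traffic failover setup."""
    if pbr_probe_ok:
        return "eth0"              # PBR rule 100 is installed and matches
    # Probe failed: rule 100 is withdrawn, so lookup falls to the kernel table.
    if eth0_has_default:
        return "eth0 (blackhole)"  # the dead default route still wins
    return "wwan0"                 # only wwan0 offers a default route


for probe_ok in (True, False):
    for has_default in (False, True):
        print(f"probe_ok={probe_ok} eth0_default={has_default} -> "
              f"{egress(probe_ok, has_default)}")
```

Only the combination of a failed probe and no default route on eth0 moves traffic to wwan0.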

Selective Traffic Failover

Route only specific traffic (e.g., ransnet.com) to secondary WAN when primary fails. All other traffic uses primary; if primary fails, other traffic is dropped:

interface eth0
  description "Primary WAN - Fiber"
  enable
  ip address dhcp                ! Installs default route (all unmatched traffic)

interface wwan0
  description "Secondary WAN - 5G (for ransnet.com only)"
  enable
  ip address dhcp nodefault      ! KEY: no default route via wwan0

interface vlan 1 1
  description "LAN"
  enable
  ip address 192.168.8.1/22
  dhcp-server
    router 192.168.8.1
    dns 8.8.8.8 8.8.4.4
    range 192.168.8.10 192.168.11.254
    enable

! Define firewall object for ransnet.com traffic
object-group ransnet_destinations
  fqdn ransnet.com
  fqdn www.ransnet.com

! Mark ransnet.com traffic with fwmark 100
firewall-set 100 mark 100 inbound vlan1 ip dst_object ransnet_destinations

! PBR rules: route ransnet traffic via eth0 (primary), fallback to wwan0
ip pbr policy 100 fwmark 100 remark "ransnet.com"
ip pbr policy 101 fwmark 100 remark "ransnet.com"
ip pbr 100 nexthop eth0 track icmp 1.1.1.1 15 remark "ransnet via primary"
ip pbr 101 nexthop wwan0 remark "ransnet via secondary (fallback)"

firewall-access 100 permit outbound eth0
firewall-access 101 permit outbound wwan+
!
firewall-snat 100 overload outbound eth0
firewall-snat 101 overload outbound wwan+

How it works:

  • Normal (eth0 up): All unmatched traffic uses eth0 default route; marked traffic (ransnet.com) also matches PBR rule 100 (same result)
  • eth0 down: PBR rule 100 withdrawn → marked traffic falls through to PBR rule 101 → routes via wwan0; all other traffic has no path (dropped)
  • eth0 recovers: PBR rule 100 re-installed → marked traffic returns to eth0

Key points:

  • Primary WAN: ip address dhcp → installs default route (used by all unmatched traffic)
  • Secondary WAN: ip address dhcp nodefault → NO default route (only reached by explicit PBR rules)
  • Firewall object marks only the traffic you want to failover (e.g., ransnet.com by FQDN)
  • PBR rule 100 routes marked traffic via eth0 with tracking (uses nexthop eth0 because eth0 has default route)
  • PBR rule 101 provides fallback to wwan0 for marked traffic only (uses nexthop wwan0 because it's point-to-point)
  • Unmatched traffic drops if primary fails (no unwanted failover to secondary)
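
The selective behaviour can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: rules are evaluated in ascending ID order, a tracked rule is skipped while its probe fails, and a single flag `eth0_ok` stands for both the tracking result and the availability of eth0's default route. `PbrRule` and `route` are hypothetical names, not part of the RansNet CLI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PbrRule:
    rule_id: int
    fwmark: int
    nexthop: str
    tracked: bool = False


def route(fwmark: Optional[int], rules: list[PbrRule], eth0_ok: bool) -> str:
    """Return the egress interface, or "drop" when no path exists."""
    for rule in sorted(rules, key=lambda r: r.rule_id):  # lower ID first
        if rule.fwmark != fwmark:
            continue
        if rule.tracked and not eth0_ok:
            continue  # tracked rule is withdrawn while its probe fails
        return rule.nexthop
    # No PBR match: only eth0 installs a default route (wwan0 is nodefault).
    return "eth0" if eth0_ok else "drop"


rules = [
    PbrRule(100, fwmark=100, nexthop="eth0", tracked=True),
    PbrRule(101, fwmark=100, nexthop="wwan0"),
]

print(route(100, rules, eth0_ok=True))    # eth0  (marked traffic, primary)
print(route(100, rules, eth0_ok=False))   # wwan0 (marked traffic, fallback)
print(route(None, rules, eth0_ok=True))   # eth0  (unmarked, default route)
print(route(None, rules, eth0_ok=False))  # drop  (unmarked, no path)
```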

Tracking and Nexthop Rules

Tracking parameters:

  • track icmp 1.1.1.1 15 — probe 1.1.1.1 every 15 seconds via the nexthop interface. If probe fails, the PBR rule is withdrawn.
  • For links with higher or more variable latency (e.g., 5G), use a longer probe interval, such as track icmp 8.8.8.8 30, to reduce false failovers.

Nexthop selection:

  • Static routes on Ethernet: Always use IP address (nexthop 61.13.198.165). Never use interface name.
  • PBR on Ethernet with nodefault: Use explicit IP address (e.g., nexthop 192.168.98.1) because there's no default route to resolve the interface name from.
  • PBR on Ethernet with default route: Can use interface name (nexthop eth0) because the system learns the gateway IP from the default route.
  • PPPoE / WWAN (static or PBR): Can use interface name or IP address (both work for point-to-point links).
  • See Nexthop: IP Address vs Interface for detailed rules.
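
The nexthop selection rules above can be restated as a lookup function. This Python sketch only encodes the bullets in this section; the function name `nexthop_form` and its string arguments are hypothetical.

```python
def nexthop_form(link: str, route_type: str, has_default_route: bool = True) -> str:
    """Return the allowed nexthop form.

    link: "ethernet", "pppoe" or "wwan"
    route_type: "static" or "pbr"
    has_default_route: False when the interface uses nodefault
    """
    if link in ("pppoe", "wwan"):
        return "interface name or IP address"   # point-to-point links
    if route_type == "static":
        return "IP address"                     # static routes on Ethernet
    # PBR on Ethernet: the interface name resolves only via a default route.
    return "interface name or IP address" if has_default_route else "IP address"


print(nexthop_form("ethernet", "pbr", has_default_route=False))  # IP address
print(nexthop_form("wwan", "pbr"))  # interface name or IP address
```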

Option 3 — Multi-WAN (MWAN)

Multi-WAN is the most capable and flexible option. It supports both active/standby (failover) and active/active (load balancing) configurations. Each WAN interface independently tracks upstream reachability via ICMP probes. Routing decisions are made based on per-interface metric and weight values, and traffic is distributed across healthy interfaces according to those parameters.

Advantages over Options 1 and 2:

  • Upstream-aware failover per interface
  • Active/active load balancing with configurable traffic weighting
  • Supports multiple WAN links simultaneously

Failover time: Configurable via tracking timer and retry count (e.g., timer 5 5 = probe every 5 seconds, fail after 5 consecutive missed probes = ~25 seconds).

CLI Configuration

Active/Standby (Failover)

interface eth0
  description "ISP1 connection via fixed line"
  enable
  ip address dhcp
  mwan-group 99
  track 8.8.8.8 timer 5 5
  metric 1
  weight 1

interface wwan0
  description "ISP2 connection via LTE"
  enable
  mwan-group 99
  track 8.8.4.4 timer 10 10
  metric 2
  weight 1

mwan-rule 99 ip dst 0.0.0.0/0 group 99

Both interfaces belong to mwan-group 99. eth0 has metric 1 (primary) and wwan0 has metric 2 (standby). Traffic flows through eth0 as long as its probe to 8.8.8.8 succeeds. On probe failure, MWAN routes traffic through wwan0.
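
The active/standby behaviour can be sketched as "healthy members with the lowest metric carry the traffic". The Python model below is an illustration based on the description in this section, not the MWAN engine itself; `Member` and `active_members` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    metric: int
    weight: int
    healthy: bool = True   # result of the interface's own ICMP tracking


def active_members(group: list[Member]) -> list[Member]:
    """Return the healthy members that share the lowest metric."""
    healthy = [m for m in group if m.healthy]
    if not healthy:
        return []
    best = min(m.metric for m in healthy)
    return [m for m in healthy if m.metric == best]


group = [Member("eth0", metric=1, weight=1), Member("wwan0", metric=2, weight=1)]
print([m.name for m in active_members(group)])   # ['eth0']

group[0].healthy = False                         # probe to 8.8.8.8 fails
print([m.name for m in active_members(group)])   # ['wwan0']
```

With equal metrics, the same function returns both members, which corresponds to the active/active case.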

Active/Active (Load Balancing)

To balance traffic across both links simultaneously, set equal metrics and adjust weights to control the traffic ratio:

interface eth0
  description "ISP1 - 100 Mbps fibre"
  enable
  ip address dhcp
  mwan-group 99
  track 8.8.8.8 timer 5 5
  metric 1
  weight 2

interface wwan0
  description "ISP2 - LTE backup"
  enable
  mwan-group 99
  track 8.8.4.4 timer 5 5
  metric 1
  weight 1

mwan-rule 99 ip dst 0.0.0.0/0 group 99

With equal metrics, both interfaces are active. The weight ratio (2:1) distributes approximately two-thirds of traffic through eth0 and one-third through wwan0.
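
The expected split for a given set of weights is each weight divided by the sum of all weights. A minimal Python sketch of that arithmetic follows; `traffic_shares` is a hypothetical helper, and actual distribution depends on how the router assigns flows.

```python
from fractions import Fraction


def traffic_shares(weights: dict[str, int]) -> dict[str, Fraction]:
    """Map each interface to weight / sum(weights)."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


shares = traffic_shares({"eth0": 2, "wwan0": 1})
for name, share in shares.items():
    print(f"{name}: {share} ({float(share):.0%})")
# eth0: 2/3 (67%)
# wwan0: 1/3 (33%)
```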

Tracking Parameters

The track command syntax is:

track <probe-ip> timer <interval> <retries>

| Parameter | Description |
|---|---|
| probe-ip | IP address to probe (use a reliable public IP, e.g., 8.8.8.8 or 1.1.1.1) |
| interval | Probe interval in seconds |
| retries | Number of consecutive failed probes before the interface is marked down |

Failover time = interval × retries. For example, timer 5 5 triggers failover after ~25 seconds.
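
The interval and retry logic can be sketched as a consecutive-failure counter. This Python model is an assumption-laden illustration: `ProbeTracker` is a hypothetical class, and it assumes a single successful probe restores the interface, which this section does not state.

```python
class ProbeTracker:
    """Marks a link down after `retries` consecutive failed probes."""

    def __init__(self, interval: int, retries: int) -> None:
        self.interval = interval
        self.retries = retries
        self.failures = 0
        self.up = True

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the resulting link state."""
        if probe_ok:
            self.failures = 0
            self.up = True   # assumption: one success restores the link
        else:
            self.failures += 1
            if self.failures >= self.retries:
                self.up = False
        return self.up

    @property
    def failover_seconds(self) -> int:
        """Approximate detection time: interval x retries."""
        return self.interval * self.retries


tracker = ProbeTracker(interval=5, retries=5)    # timer 5 5
states = [tracker.record(False) for _ in range(5)]
print(states)                    # [True, True, True, True, False]
print(tracker.failover_seconds)  # 25
```

For timer 10 10, the same calculation gives roughly 100 seconds before the interface is marked down.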

Tip

Use a longer interval and retry count on WWAN (SIM) interfaces, e.g., timer 10 10, to avoid false failovers; WWAN links usually have higher latency and are less reliable.


Verification and Troubleshooting

Use these commands to verify failover configuration and diagnose issues:

| What to Check | Command | Expected Output |
|---|---|---|
| Active routes | show ip route | Primary WAN has default route (0.0.0.0/0); secondary (if using PBR) may not |
| PBR policies and rules | show ip pbr | All configured policies and their current status (active * or withdrawn) |
| Firewall marking | show firewall set-list | Firewall rules with marks (e.g., mark 100 for ransnet.com) |
| Tracking probe status | show logging system include track | Enable log in the tracking config to check tracking status |
| Interface state | show interface all | WAN interfaces show UP and IP addresses are assigned |
| DHCP-learned gateway | show ip dhcp-lease | Shows DHCP lease, assigned IP, and gateway (especially for nodefault validation) |
| FQDN object resolution | show object-list <name> | Shows resolved IP addresses for FQDN objects (e.g., ransnet.com IPs) |

Example: Verify selective failover configuration (ransnet.com):

! Step 1: Check routes
router# show ip route
...
K>* 0.0.0.0/0 [0/0] via 203.0.113.1, eth0  ← primary default route
...

! Step 2: Check PBR rules
router# show ip pbr
ID  src        dst        fwmark  priority action         tracked nexthop
100 -          -          100     -        MATCH ransnet  yes     eth0       (UP)
101 -          -          100     -        MATCH ransnet  no      wwan0      (fallback)

! Step 3: Verify ransnet.com traffic is marked
router# show firewall set-list
ID   Rule                                   Mark  Target
100  inbound vlan1 ip dst_object ransnet   100   MARK

! Step 4: When eth0 fails, verify PBR rule 100 is withdrawn
router# show ip pbr  ← Rule 100 disappears; rule 101 now active for ransnet traffic
ID  src        dst        fwmark  priority action         tracked nexthop
101 -          -          100     -        MATCH ransnet  no      wwan0      (active - fallback)
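
When checking many devices, the comparison in steps 2 and 4 can be scripted. The Python sketch below assumes the output of show ip pbr has already been captured as text and that it follows the illustrative layout shown above (rule ID in the first column); real output may differ. The function names are hypothetical.

```python
def installed_rule_ids(show_ip_pbr: str) -> set[int]:
    """Collect rule IDs from lines whose first field is numeric."""
    ids = set()
    for line in show_ip_pbr.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.add(int(fields[0]))
    return ids


def ransnet_path(show_ip_pbr: str) -> str:
    """Report which rule currently carries the marked traffic."""
    ids = installed_rule_ids(show_ip_pbr)
    if 100 in ids:
        return "primary (eth0)"
    if 101 in ids:
        return "fallback (wwan0)"
    return "no PBR rule installed"


normal = """\
ID  src        dst        fwmark  priority action         tracked nexthop
100 -          -          100     -        MATCH ransnet  yes     eth0       (UP)
101 -          -          100     -        MATCH ransnet  no      wwan0      (fallback)
"""
failed = """\
ID  src        dst        fwmark  priority action         tracked nexthop
101 -          -          100     -        MATCH ransnet  no      wwan0      (active - fallback)
"""

print(ransnet_path(normal))  # primary (eth0)
print(ransnet_path(failed))  # fallback (wwan0)
```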

Common issues:

| Issue | Likely Cause | Diagnosis |
|---|---|---|
| PBR rules don't appear in show ip pbr | Firewall marking not matching traffic | Verify show firewall set-list shows the rule; ensure FQDN object is resolved via show object-list |
| Tracking shows UP but PBR still fails over | Tracking probe target unreachable via the nexthop interface | Verify show track probe uses correct interface; change probe target (e.g., from 1.1.1.1 to 8.8.8.8) |
| Secondary WAN traffic missing after primary fails (selective failover) | Secondary WAN has nodefault but PBR fallback rule missing | Check show ip pbr: rule 101 must reference the secondary WAN interface; ensure lower ID = higher priority |
| All traffic drops on primary failure (no failover to secondary) | Secondary WAN not configured or missing default route | Check show ip route and show interface all for secondary WAN; remove nodefault if secondary should catch unmatched traffic |

Choosing the Right Option

| Scenario | Recommended Option |
|---|---|
| Single WAN with WWAN backup, simple setup | Option 1 — Route Metric |
| Dual WAN, need upstream failure detection, no load balancing required | Option 2 — PBR with Tracking |
| Dual or multi-WAN, need upstream detection + load balancing | Option 3 — MWAN |