WAN resiliency is one of the most important considerations for any critical enterprise network.
There are several configuration options available to achieve the final objective, depending on specific business requirements.
1. Use kernel default route metrics for failover
We will use the below topology for our discussion here, using one HSA-500L2 with two SIMs from two different providers.
This method uses the kernel routing table to perform failover. We simply assign a lower route metric to SIM1 (wwan0) and a higher one to SIM2 (wwan1), so that HSA/UA holds two default routes.
Since the default route for wwan0 has the lower metric (preferred), traffic is routed out of wwan0 by default. If SIM1 fails (e.g. connection loss or SIM card failure), the SIM1 default route is withdrawn, the SIM2 default route takes over, and traffic fails over to SIM2 immediately.
Set the lower route-metric on the preferred link
Navigate to ‘SD-Branch>Network>Wireless-WAN’ to configure the settings below.
CLI configuration for fail-over:
!
interface wwan0
 enable
 apn isp1apn
 route-metric 20
!
interface wwan1
 enable
 apn isp2apn
 route-metric 21
!
The default routes can be displayed with the command ‘show ip route’. In the output below, wwan0 has the lower route metric and is the preferred next hop, so traffic primarily routes out of wwan0 (SIM1).
K>* 0.0.0.0/0 [0/20] via 10.64.189.185, wwan0, src 10.64.189.184, 02:21:25
K>* 0.0.0.0/0 [0/21] via 10.65.10.164, wwan1, src 10.65.10.165, 02:21:25
Once the kernel detects that the link is down, it quickly withdraws all routes associated with that link/interface. The second kernel route (using wwan1 as next hop) then takes over and traffic routes out of wwan1. The advantage is that failover is very fast. However, it is purely active/standby, i.e. traffic can only use one of the links at a time.
By default, the HSA/UA system auto-assigns a route-metric to each interface at bootup, in order of the bootup sequence. In other words, the interface loaded first (at the top of the CLI configuration) gets the lower metric, e.g. wwan0 by default has a lower metric than wwan1. We simply need to insert the primary SIM into slot M2 (wwan0) and the secondary SIM into slot M1 (wwan1).
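The selection logic described above can be sketched in a few lines of Python. This is an illustrative model only, not HSA/UA code: the `Route` class and `best_route` function are hypothetical names, and the interface names and metrics are taken from the example output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    """A default route as modelled for this sketch (hypothetical type)."""
    interface: str
    metric: int
    link_up: bool = True


def best_route(routes: List[Route]) -> Optional[Route]:
    """Return the default route with the lowest metric among links that are up."""
    usable = [r for r in routes if r.link_up]
    return min(usable, key=lambda r: r.metric, default=None)


# Both SIMs are up: the lower metric (wwan0) wins.
routes = [Route("wwan0", 20), Route("wwan1", 21)]
print(best_route(routes).interface)  # wwan0

# SIM1 fails: its route is withdrawn and wwan1 takes over.
routes = [Route("wwan0", 20, link_up=False), Route("wwan1", 21)]
print(best_route(routes).interface)  # wwan1
```

Only one route is ever selected, which is why this method is active/standby rather than load sharing.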
2. Use default routes with upstream host tracking
Kernel-route failover (method #1 above) has one major disadvantage. When we combine WAN/eth0 with a SIM and there is an upstream router on the eth0 connection, HSA/UA is unable to detect a failure upstream of that router.
For example, if the link between the upstream router and its ISP fails but the connection between the upstream router and HSA/UA is still up, the default kernel route (which uses the upstream router IP as next hop) remains installed, so no failover occurs.
To address such failure/failover scenarios, we can do upstream tracking to determine if the end-to-end connection is indeed up. The below topology illustrates the scenario.
- Disable default kernel route for each link
- Configure default route using CLI
- Set higher administrative distance for the backup link (less preferred)
- Set route tracking for the primary default route
!
interface eth0
 enable
 ip address dhcp nodefault
!
interface wwan0
 enable
 apn isp2apn
 ip address mobile nodefault
!
ip route 0.0.0.0/0 nexthop 172.16.1.1 track-host 220.127.116.11 15
ip route 0.0.0.0/0 nexthop wwan0 distance 200
In the above configuration:
- We used the “nodefault” option so that HSA/UA does not install a kernel default route for each link and uses the CLI-configured default routes instead.
- 172.16.1.1 is the upstream router LAN IP address.
- The “track-host” option, with its trailing “15”, pings the tracked host across the upstream links (and can therefore detect upstream link failure) at 15s intervals.
HSA/UA pings the tracked host over the eth0 link every 15 seconds. It declares a ping failure on the second failed attempt and withdraws this default route, so the maximum failover time is double the configured interval (30s in this case).
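The timing above can be checked with a small Python sketch. It is a simplified model, not the device's implementation: `withdrawal_time_s` is a hypothetical helper, and it assumes that the two failed attempts must be consecutive and that the upstream failure happens just after a successful probe.

```python
from typing import Iterable, Optional


def withdrawal_time_s(probe_results: Iterable[bool], interval_s: int,
                      failures_needed: int = 2) -> Optional[int]:
    """Seconds until the tracked route is withdrawn, or None if it never is.

    probe_results holds the outcome of each ping, one per interval,
    starting one interval after the last successful probe.
    """
    consecutive_failures = 0
    for probe_number, ok in enumerate(probe_results, start=1):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failures_needed:
            return probe_number * interval_s
    return None


# Upstream link dies and stays down: withdrawn after two failed probes.
print(withdrawal_time_s([False, False], interval_s=15))  # 30

# A single lost ping followed by a success does not trigger failover.
print(withdrawal_time_s([False, True, False, False], interval_s=15))  # 60
```

Under these assumptions, a shorter tracking interval reduces the worst-case failover time proportionally.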
3. Use Multi-WAN Link Balancing
Multi-WAN link balancing is a more advanced traffic steering approach to provide link aggregation and failover between links. Refer to the link MWAN.
MWAN also uses the ping approach to detect upstream end-to-end link availability. It combines both routing metric and ping tracking to make failover decisions.
!
interface eth0
 description "ISP1 connection via fixed line"
 enable
 ip address dhcp
 mwan-group 10 track 188.8.131.52 timer 5 5 metric 1 weight 1
!
interface wwan0
 description "ISP2 connection via LTE"
 enable
 mwan-group 10 track 184.108.40.206 timer 10 10 metric 2 weight 1
!
In the above configuration:
- eth0 (ISP1) is the primary link (with a lower metric) and wwan0 (ISP2) is the backup link with a higher metric.
- The weight setting is for link balancing when both links have the same metric (active/active), so it is not used in an active/standby scenario.
- The “timer x y” setting controls how upstream link failure or availability is detected. x configures the interval in seconds (default 5s) between ping tests; y configures the number of consecutive test attempts before the link is declared UP/DOWN.
Using the above configuration example,
- If eth0 is disconnected (or the upstream router is powered off), the eth0 link goes down and HSA/UA immediately withdraws the default kernel route for eth0 and fails over to wwan0. Failover is fast in this situation, typically 2-3 seconds.
- If the eth0 upstream link is down (eth0 itself is still connected), tracking of the eth0 track host will fail. After a maximum of 25s (5s x 5 attempts), tracking declares eth0 unusable and traffic fails over to wwan0, so the failover is much slower.
- Fallback to eth0 is determined by tracking confirmation (5s x 5 attempts), so fallback is typically much slower as well.
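The “timer x y” behaviour can be modelled as a small state machine. The Python sketch below is illustrative only: `LinkTracker` is a hypothetical class, and it assumes the link state flips after y consecutive probe results that disagree with the current state, which is one reading of the description above.

```python
class LinkTracker:
    """Model of 'timer x y': x = probe interval (s), y = consecutive attempts."""

    def __init__(self, interval_s: int, attempts: int, up: bool = True) -> None:
        self.interval_s = interval_s
        self.attempts = attempts
        self.up = up
        self._streak = 0  # consecutive probes that disagree with current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the resulting link state."""
        if probe_ok == self.up:
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.attempts:
                self.up = probe_ok
                self._streak = 0
        return self.up

    @property
    def max_detection_s(self) -> int:
        """Worst-case time to declare a state change."""
        return self.interval_s * self.attempts


eth0 = LinkTracker(interval_s=5, attempts=5)
print([eth0.record(False) for _ in range(5)])  # [True, True, True, True, False]
print(eth0.max_detection_s)  # 25
print(LinkTracker(interval_s=10, attempts=10).max_detection_s)  # 100
```

The same counter governs fallback, which is why recovery to eth0 takes as long as failure detection in this model.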
* To speed up failover/fallback, you can set shorter intervals and fewer attempts. However, for slow or unreliable links (especially mobile/LTE links), setting them too short is not recommended, as this may cause flapping (false failover/fallback).
* The biggest benefit of MWAN is its link balancing capability (same metric for both links, optionally with weights), which lets us balance traffic across both active links and aggregate the total upstream link capacity. If you just want a simple active/standby setup, option #1 or #2 is recommended instead.
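How metric and weight interact can be illustrated with a short Python sketch. This is a conceptual model, not the MWAN implementation: `traffic_shares` is a hypothetical function, and it assumes that only the usable links with the lowest metric carry traffic and that their shares are proportional to their (positive) weights.

```python
from typing import Dict, List, Tuple

# (name, metric, weight, up) - a hypothetical representation of an MWAN link
Link = Tuple[str, int, int, bool]


def traffic_shares(links: List[Link]) -> Dict[str, float]:
    """Return the fraction of traffic each active link carries."""
    usable = [link for link in links if link[3]]
    if not usable:
        return {}
    best_metric = min(link[1] for link in usable)
    active = [link for link in usable if link[1] == best_metric]
    total_weight = sum(link[2] for link in active)  # weights assumed > 0
    return {name: weight / total_weight for name, _, weight, _ in active}


# Active/standby, as in the example configuration (metric 1 vs metric 2).
print(traffic_shares([("eth0", 1, 1, True), ("wwan0", 2, 1, True)]))
# {'eth0': 1.0}

# Active/active: same metric, weights 3 and 1.
print(traffic_shares([("eth0", 1, 3, True), ("wwan0", 1, 1, True)]))
# {'eth0': 0.75, 'wwan0': 0.25}

# eth0 declared down by tracking: wwan0 takes all traffic.
print(traffic_shares([("eth0", 1, 1, False), ("wwan0", 2, 1, True)]))
# {'wwan0': 1.0}
```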