SD-WAN over SSLVPN bonding (OSPF)

VPN “bonding” is part of our SD-WAN deployment technique for securely connecting multiple remote sites to HQ/DC over redundant WAN connections at the remote sites. It uses our CMG as the VPN concentrator at HQ/DC, and an HSA/UA at each remote location.

There are three main options for achieving VPN “bonding”, depending on the exact requirements.

  1. Use Multi-WAN (MWAN) to achieve WAN link redundancy, then build a single VPN tunnel across MWAN (see more details on MWAN and SSLVPN).
    • This approach achieves link redundancy, but because it builds only a single VPN tunnel, failover can take quite long: the VPN tunnel is persistent and can’t be “balanced” across multiple links at the same time.
    • At any point in time, the VPN tunnel can use only one of the active links. After MWAN detects a link failure, the VPN tunnel must be re-established across the failover/new link. The total failover delay = MWAN link detection delay + VPN re-establishment delay.
    • This option is more suitable for deployments with a small number of remote sites, where each site’s network route can be learned from the SSLVPN configuration (e.g. from the client net command).

NOTE

MWAN doesn’t work well with dynamic routing: whenever routes are learned dynamically (via OSPF or BGP), MWAN is unaware of the newly learned routes and will not deny traffic passing to new routes. For large deployments, we recommend using option #2 below.

  2. Use OSPF load balancing/failover across dual/multiple VPN tunnels. This feature combines our Multi-VPN tunneling and dynamic routing capabilities.
    • We build dual/multiple tunnels, one VPN tunnel per WAN link, then run the OSPF dynamic routing protocol across the tunnels.
    • OSPF dynamically learns the routes for each remote site, load-balances traffic across multiple paths (VPN tunnels), and fails over automatically between paths/tunnels.
    • To switch between active/active and active/standby mode, we simply tweak the OSPF link costs (e.g. set “tap ospf cost xx” higher for the backup tap).
    • This design is the most scalable and is recommended for large deployments.
    • Another advantage of this approach is that the VPN gateways can be split across two physical CMGs, for gateway redundancy.
    • We use PBR to map each tunnel/tap’s traffic to the respective physical/LTE interface.

  3. Use the layer-2 LACP protocol to “bond” multiple VPN tunnels. This feature utilizes our Multi-VPN tunneling and layer-2 bonding capabilities.
    • The HSA builds dual/multiple tunnels to the CMG, one VPN tunnel per WAN link, then uses LACP to “bond” these tunnels into one logical link.
    • Traffic is load-balanced across the tunnels as if they were one logical link with aggregated bandwidth.
    • Note, however, that if any of the links/tunnels has lower speed (or inconsistent performance), it will impact the overall performance of the bonded link.
    • Unlike approach #2, where the hub-end SSLVPN server can use one VPN instance to terminate many remote tunnels, this approach requires a dedicated VPN instance per remote end, because LACP bonding is “point-to-point”. So if you have many remote sites/tunnels, you need to run many VPN instances on the server.
    • This approach is good for smaller deployments that require large throughput (aggregated bandwidth) between sites and whose WAN links have consistent, identical performance.

NOTE

Unlike approach #2, LACP bonding requires both VPN tunnels to terminate on the same VPN gateway (CMG).

In this section, we focus on VPN bonding using OSPF, for large deployment scenarios. VPN bonding with LACP is covered in a separate topic.
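To make the cost-based behaviour of the OSPF approach concrete, here is a minimal Python sketch of how the set of active tunnels follows from the tap costs. It is only an illustration of lowest-cost path selection, not the CMG/HSA implementation: the tap names (tap0, tap1) and the backup cost of 100 are assumptions, and the default cost of 10 is the value mentioned in the key points of this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelPath:
    tap: str          # hypothetical tap interface name
    cost: int         # OSPF cost configured on that tap
    up: bool = True   # whether the tunnel is currently established


def active_paths(paths):
    """Return the usable paths that share the lowest OSPF cost."""
    usable = [p for p in paths if p.up]
    if not usable:
        return []
    best = min(p.cost for p in usable)
    return [p for p in usable if p.cost == best]


# Active/active: equal costs, so both tunnels carry traffic.
active_active = [TunnelPath("tap0", 10), TunnelPath("tap1", 10)]
print([p.tap for p in active_paths(active_active)])

# Active/standby: the backup tap has a higher cost (100 is an assumed value).
active_standby = [TunnelPath("tap0", 10), TunnelPath("tap1", 100)]
print([p.tap for p in active_paths(active_standby)])

# Failover: the primary tunnel is down, so the backup becomes the active path.
after_failure = [TunnelPath("tap0", 10, up=False), TunnelPath("tap1", 100)]
print([p.tap for p in active_paths(after_failure)])
```

With equal costs both taps are selected; with a higher cost on tap1 only tap0 is used until it goes down.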

WAN Bonding Demo Scenario:

  1. Dual 4G SIM (active/active)
  2. Branch secure network (Branch_WiFi → VLAN10)
  3. Bonding VPN tunnel to aggregate bandwidth

Sim1: 111.65.34.187
Speed: 18Mbps/19Mbps

Sim2: 111.65.45.212
Speed: 10Mbps/15Mbps

MWAN: 30Mbps/37Mbps

In this design, we’re using an HSA/UA with dual LTE/SIM to provide multiple WAN connections to the hub CMG, then building a VPN tunnel across each LTE connection. In a real-life deployment, different WAN connections (e.g. MPLS, Fiber, PPPoE) can also be connected to the HSA/UA WAN port.

A few key points to note:

1. SSLVPN must run in tap mode (layer-2 tunnel) to support OSPF.

2. On CMG (SSLVPN server)

* Configure two VPN server instances (if both tunnels share the same gateway). Under “security sslvpn-server x”, configure a unique port number for each instance, so that the remote client (HSA) imports two profiles and builds a separate tunnel to each instance. When configuring “server address xxx”, we recommend using a DNS name instead of the public IP, so that the client profile uses the DNS name to connect to the SSLVPN server. This allows the server’s public IP to change without clients re-importing VPN profiles.

* Assign a tunnel-pool for the tap interfaces, to advertise OSPF routes.

* Use “tap ospf priority 255” to make sure the CMG is ALWAYS in the OSPF DR state, so that it can receive/push routes from/to all remote ends.

* Use “tap ospf cost xx” to tweak the tap cost for each tunnel if you want to run active/standby taps; otherwise both taps have an OSPF cost of 10 and traffic is load-shared (load-balanced) across both taps. NOTE: If you change “tap ospf cost xx”, do it on both ends (CMG and HSA) for the same VPN instance, to avoid asymmetric routing problems.

* Configure firewall-input and firewall-access rules to permit VPN tunnel and internal traffic to pass through.
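The effect of the OSPF priority setting can be illustrated with a simplified Python sketch of DR election. It assumes the standard OSPF rule that a router with priority 0 is never elected, that the highest priority wins, and that ties go to the highest router ID. It ignores the BDR and the fact that a real election does not preempt an existing DR. The router IDs are hypothetical.

```python
def elect_dr(routers):
    """Return the router ID elected as DR, or None if nobody is eligible.

    routers: iterable of (router_id, priority) pairs.
    Simplified model: priority 0 means "never DR"; the highest priority
    wins and ties go to the numerically highest router ID.
    """
    eligible = [
        (priority, tuple(int(octet) for octet in router_id.split(".")), router_id)
        for router_id, priority in routers
        if priority > 0
    ]
    if not eligible:
        return None
    return max(eligible)[2]


# Hypothetical router IDs: one CMG hub and two HSA remote ends.
segment = [
    ("10.255.0.1", 255),   # CMG, "tap ospf priority 255"
    ("10.255.0.11", 0),    # HSA, priority 0
    ("10.255.0.12", 0),    # HSA, priority 0
]
print(elect_dr(segment))
```

In this model the CMG is the only eligible router on the tap segment, so it is always the one elected.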


3. On HSA

* Set the OSPF priority to 0 to make sure the HSA is NEVER in the DR state.

* Use the GUI (or mfusion) to configure the tap interfaces, and put them into the correct “lan” firewall zone.

* Use PBR to map each tunnel to the respective WAN interface, e.g. lte0, lte1, WAN/DSL/MPLS, etc.

1. We must not have a default route on the HSA. Otherwise, if the PBR fails (e.g. the physical interface mapped to a tunnel goes down), the tunnel will try to form through the default route. The tunnel then stays up (but goes through the default route, which may be routed through other interfaces), and the failover doesn’t really happen.

2. So, if any WAN interface gets DHCP from the ISP (especially LTE/DSL), we must configure the HSA not to accept the default route from the ISP. In the GUI, go to Network –> Interfaces, “Edit” the target interface, go to “Advanced Settings”, and uncheck “Use default gateway”. Do this for all WAN interfaces.

3. However, if there’s no default route in the main routing table, PBR will not kick in (and therefore no tunnels will form). So we need to define a dummy default route to kick-start PBR, but make sure the dummy default route has a higher distance than OSPF, so that the default route injected from OSPF takes precedence, e.g. “ip route 0.0.0.0/0 nexthop lo distance 200”.
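The reason the dummy route needs a higher distance can be sketched in a few lines of Python: among routes to the same prefix, the one with the lowest administrative distance is installed. This is only an illustration. The OSPF distance of 110 is the default in many routing stacks and is assumed here rather than confirmed for the HSA, and the tap0 next hop is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    nexthop: str
    distance: int   # administrative distance; lower is preferred
    source: str


def best_route(candidates):
    """Pick the route with the lowest administrative distance, if any."""
    return min(candidates, key=lambda r: r.distance) if candidates else None


# Dummy default route from "ip route 0.0.0.0/0 nexthop lo distance 200".
dummy = Route("0.0.0.0/0", "lo", 200, "static")
# Default route injected by OSPF (distance 110 and tap0 are assumptions).
ospf_default = Route("0.0.0.0/0", "tap0", 110, "ospf")

# Before OSPF converges, only the dummy route exists and kick-starts PBR.
print(best_route([dummy]).source)

# Once OSPF injects its default route, it wins on distance.
print(best_route([dummy, ospf_default]).source)
```

If the dummy route were given a distance below that of OSPF, it would keep winning and traffic would be sent to lo.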

NOTE

1. We can use OSPF load balancing (equal paths, equal costs) across dual/multiple tunnels, but if one of the links is slow or performs poorly, it will impact overall performance. You can use “tap ospf cost” to change the link cost and switch between active/active and active/standby mode. The tap with the lower cost is the active path.

2. Each CMG VPN instance can support hundreds of remote tunnels (remote OSPF neighbors). But since each tunnel is in layer-2 mode, we recommend no more than 200 remote peers per instance, to limit the size of the broadcast domain.

3. If you have a large number of remote sites, we recommend putting each tap into a different OSPF area ID, to minimize OSPF topology update overhead.
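As a rough planning aid for point 2, the number of VPN instances needed per WAN link can be estimated by dividing the number of remote sites by the recommended ceiling of 200 peers and rounding up. The Python sketch below assumes each remote site builds exactly one tunnel to a given instance. The function name and the sample site counts are hypothetical.

```python
import math

MAX_PEERS_PER_INSTANCE = 200  # recommended ceiling from point 2 above


def instances_needed(remote_sites, max_peers=MAX_PEERS_PER_INSTANCE):
    """Estimate VPN instances needed per WAN link (one tunnel per site)."""
    if remote_sites < 0:
        raise ValueError("remote_sites must not be negative")
    return math.ceil(remote_sites / max_peers)


# Sample site counts (hypothetical deployments).
for sites in (150, 200, 450):
    print(sites, "sites ->", instances_needed(sites), "instance(s) per WAN link")
```

In a dual-tunnel design the total is twice this figure, since each WAN link terminates on its own instance.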