Advanced Routing scenarios with AWS Direct Connect SiteLink
SiteLink, a new feature of AWS Direct Connect (DX), makes it easy to send data from one Direct Connect location to another, bypassing AWS Regions. Once you have connections at two or more Direct Connect locations, you can turn the SiteLink feature on (or off) on Private/Transit VIFs, and within minutes a global, reliable, and private network is ready for use.
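If you manage Direct Connect programmatically, toggling SiteLink on an existing Private/Transit VIF is a single API call. The following is a minimal boto3 sketch; the VIF ID is a placeholder, and it assumes the UpdateVirtualInterfaceAttributes action that carries the enableSiteLink flag at the time of writing:

```python
import boto3

# Minimal sketch: turn SiteLink on (or off) for an existing Private/Transit VIF.
# The virtual interface ID below is a placeholder.
dx = boto3.client("directconnect", region_name="us-east-1")

response = dx.update_virtual_interface_attributes(
    virtualInterfaceId="dxvif-xxxxxxxx",  # replace with your VIF ID
    enableSiteLink=True,                  # set to False to turn SiteLink off
)
print(response.get("siteLinkEnabled"))
```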
In our first blog post, Introducing Direct Connect SiteLink, we explained the feature, use cases, and key considerations for using SiteLink. If SiteLink is new to you, that might be the best place to start. Keep reading this post if you’re familiar with SiteLink, as we focus on how routing works with SiteLink. With the help of advanced scenarios, we explain the default traffic forwarding behavior of SiteLink and ways to influence this behavior using BGP path attributes.
Traffic Flow between Direct Connect Locations with SiteLink enabled
To route traffic between your data centers, you must first connect them to a Direct Connect location. All Direct Connect locations, except for the ones associated with AWS China and AWS GovCloud, are interconnected using the purpose-built AWS Global Cloud Infrastructure, which is designed for high availability and low latency. Direct Connect locations use this infrastructure to connect with AWS Regions globally. When using SiteLink, traffic between Direct Connect locations takes the shortest path on the AWS backbone without entering the AWS Regions. This is depicted in the diagram that follows (figure 1).
Prior to the availability of SiteLink, you had to use AWS Transit Gateway (TGW) with inter-Region peering enabled to establish connectivity between Direct Connect locations. As shown in the following figure 2, this traffic flow traverses AWS Regions, adding latency and cost.
Routing and forwarding behavior when using SiteLink
When you create a VIF (with or without SiteLink enabled), your router establishes an external Border Gateway Protocol (eBGP) neighbor peering relationship with an AWS Direct Connect logical device (ALD). The ALDs form part of the control and data plane of Direct Connect Gateway (DXGW). ALDs advertise the prefixes learned from your routers to the DXGW control plane. The DXGW then advertises these prefixes to other ALDs with SiteLink enabled VIFs, and to the associated Virtual Private Gateways (VGW) or Transit Gateways (TGW). Consistent with the Direct Connect resiliency recommendations, you can establish multiple connections at redundant Direct Connect locations. You might choose to advertise the same IPv4/IPv6 prefix over these DX connections to achieve load balancing and/or high availability. For each prefix, you can use the options defined in the following table to achieve a predictable traffic path to/from your on-premises location. These options are not mutually exclusive and can be used together.
| Traffic Path | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Ingress | Splitting the prefix to influence longest prefix match | Local preference BGP communities | AS path prepend |
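As a quick illustration of Option 1, you can keep advertising the aggregate from every location while advertising more-specific halves only from the preferred location, so that longest prefix match pulls ingress traffic there. A short sketch with Python's ipaddress module, using the example prefix from this post:

```python
import ipaddress

# Option 1 sketch: split the aggregate into two more-specific prefixes.
# Advertise the /24 from all locations (as a backup) and the /25s only
# from the preferred location; longest prefix match attracts ingress
# traffic to the more-specific advertisements.
aggregate = ipaddress.ip_network("10.0.0.0/24")
more_specifics = list(aggregate.subnets(prefixlen_diff=1))

print("Advertise from all locations:", aggregate)
print("Advertise from the preferred location:", more_specifics)
# -> [IPv4Network('10.0.0.0/25'), IPv4Network('10.0.0.128/25')]
```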
If we receive the same prefix from multiple SiteLink enabled VIFs attached to the same DXGW, the ALD follows the rules below to select the best path. Note that we use simplified diagrams to illustrate the rules; these diagrams are not architectural guidance for connecting your on-premises location to Direct Connect, and we strongly recommend using the Direct Connect Resiliency Toolkit to establish connectivity between your location and AWS.
Rule 1—An ALD prefers the path from your directly connected router (if it exists) over a path from a remote ALD. This is because of the shorter AS path length via the directly connected customer edge (CE) router; refer to the section Routing with Direct Connect SiteLink for more details. Note that if the AS path length is identical for the directly connected CE and for the remote ALD, the directly connected path is still preferred over the remote ALD. In figure 3 that follows, ALD1 has three BGP paths to reach the prefix 10.0.0.0/24:
- ALD1 → CE2
- ALD1 → ALD2 → CE3
- ALD1 → ALD3 → CE4
As per Rule 1, it chooses the first path as the BGP best path and installs it in the forwarding tables.
Rule 2—If there isn’t a BGP path via a directly connected CE for a prefix, the ALD installs paths from all remote ALDs advertising the same IPv4/IPv6 prefix (because of BGP multipath). The ALD load balances your traffic on a per-flow (five-tuple hash) basis across all ALDs globally (part of the same DXGW) that advertise the same prefix, irrespective of their geographical location.
In the following diagram (figure 4), ALD1 has two BGP paths to reach the prefix 10.0.0.0/24:
- ALD1 → ALD2 → CE2
- ALD1 → ALD3 → CE3
As per Rule 2, ALD1 installs both paths in its BGP/forwarding table and uses ECMP (Equal-Cost Multi-Path) to spread traffic destined to 10.0.0.0/24 across these two paths.
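To make the per-flow behavior concrete, here is a small sketch of how a five-tuple hash pins every packet of a flow to one path while spreading different flows across the equal-cost paths. The hash function and field order are illustrative only, not the actual ALD implementation:

```python
import hashlib

# Illustrative per-flow ECMP: hash the five tuple and pick one of the
# equal-cost next hops. All packets of the same flow map to the same path.
def pick_path(src_ip, dst_ip, protocol, src_port, dst_port, paths):
    five_tuple = f"{src_ip},{dst_ip},{protocol},{src_port},{dst_port}".encode()
    digest = hashlib.sha256(five_tuple).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]

paths = ["ALD1 -> ALD2 -> CE2", "ALD1 -> ALD3 -> CE3"]
print(pick_path("192.168.1.10", "10.0.0.5", "tcp", 49152, 443, paths))
print(pick_path("192.168.1.11", "10.0.0.5", "tcp", 49153, 443, paths))
```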
Rule 3 – If your on-premises router (CE) influences the ingress path for a prefix by setting a Local preference BGP community and/or using BGP AS path prepend, then all ALDs use the signaled preferred path (after the BGP best-path algorithm runs).
If you want to influence the traffic destined for prefix 10.0.0.0/24 so that it is received through the Sydney Data Center 2 location, as shown in the following figure 5, you tag the prefixes advertised from CE2 with the 7224:7300 standard BGP community. This directs all ALDs (part of the same DXGW) to prefer this path.
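The reserved Direct Connect local preference communities are 7224:7100 (low), 7224:7200 (medium), and 7224:7300 (high). You can think of them as a ranking that is evaluated before AS path length; the ranking model in the sketch below is ours, only the community values come from the Direct Connect documentation:

```python
# Reserved Direct Connect local preference communities; a higher rank is
# preferred before AS path length is compared.
COMMUNITY_RANK = {"7224:7100": 1, "7224:7200": 2, "7224:7300": 3}

def community_rank(communities):
    """Return the highest local preference rank carried by a route."""
    return max((COMMUNITY_RANK.get(c, 0) for c in communities), default=0)

# A route tagged 7224:7300 (for example, from CE2 in figure 5) wins over a
# route that carries no local preference community.
print(community_rank(["7224:7300"]) > community_rank([]))  # True
```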
Rule Priority
Rule 3 takes precedence over Rule 1, and Rule 1 takes precedence over Rule 2. So, if you have the setup described in figure 3 and add a BGP community to influence the traffic path, as shown in figure 6, the path with the tagged community is preferred.
Alternatively, you can prepend the AS path on the routes advertised from CE1 and CE2 to achieve the same routing behavior as figure 6, also shown in the following figure 7. Note that if CE2 advertises the prefix 10.0.0.0/24 to ALD1 while prepending its ASN twice, ALD1 still prefers the path through CE2 over the path received from ALD3 (as described in Rule 1).
In both the preceding examples, ALD1 prefers the path to the New York data center over both the directly connected path to CE2 and the path through the second Direct Connect location in Sydney.
Note that if you prepend the AS path and tag your BGP prefixes with Local preference BGP communities, the ALD evaluates the reserved community tags before AS path length. In the following diagram (figure 8), both the paths ALD1 → CE2 and ALD1 → ALD2 → CE3 are tagged with the same BGP community, so ALD1 prefers the ALD1 → CE2 path because it carries the BGP community and has the shorter AS path length. If that path fails, the ALD1 → ALD2 → CE3 path is preferred over the ALD1 → ALD3 → CE4 path.
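Putting the rule priority together, the selection can be modeled as an ordered comparison: local preference communities first, then AS path length, with a directly connected CE winning an AS path length tie, and ECMP across whatever remains equal. This is a simplified mental model for the figures above, not the actual ALD implementation:

```python
from dataclasses import dataclass, field
from typing import List

# Rank of the reserved local preference communities (higher wins).
COMMUNITY_RANK = {"7224:7100": 1, "7224:7200": 2, "7224:7300": 3}

@dataclass
class Path:
    description: str                      # for example, "ALD1 -> CE2"
    as_path_len: int                      # length as seen by the evaluating ALD
    direct: bool                          # learned from a directly connected CE
    communities: List[str] = field(default_factory=list)

def best_paths(paths: List[Path]) -> List[Path]:
    """Simplified model: communities, then AS path length (a direct CE wins
    the tie), then ECMP across the remaining equal-cost paths."""
    def rank(p: Path):
        pref = max((COMMUNITY_RANK.get(c, 0) for c in p.communities), default=0)
        return (-pref, p.as_path_len, 0 if p.direct else 1)
    best = min(paths, key=rank)
    return [p for p in paths if rank(p) == rank(best)]

# Figure 8 example: ALD1 -> CE2 and ALD1 -> ALD2 -> CE3 both carry 7224:7300,
# so the community tie is broken by the shorter (direct) AS path.
paths = [
    Path("ALD1 -> CE2", 1, True, ["7224:7300"]),
    Path("ALD1 -> ALD2 -> CE3", 2, False, ["7224:7300"]),
    Path("ALD1 -> ALD3 -> CE4", 2, False),
]
print([p.description for p in best_paths(paths)])  # ['ALD1 -> CE2']
```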
Effect of AS path prepend on VPC to on-premises traffic when using SiteLink
The routing behavior for traffic from AWS Regions (VGW/TGW) to on-premises locations over a SiteLink enabled VIF varies slightly from the default Direct Connect VIF behavior with AS path prepend. With SiteLink enabled VIFs, traffic from an AWS Region now prefers the BGP path with the lower AS path length from a DX location, irrespective of the Associated AWS Region. Let us unpack this with an example. We advertise an Associated AWS Region for each AWS DX location. With SiteLink disabled, traffic coming from a VGW/TGW by default prefers a Direct Connect location that is associated with that AWS Region: even if your router advertises a shorter AS path from DX locations associated with different AWS Regions, the VGW/TGW still prefers the path from DX locations local to the Associated AWS Region. We show this in the diagram (figure 9) that follows. It is also illustrated in the Private Transit VIF example page of our documentation.
Note: You can override this default behavior using BGP communities.
With SiteLink enabled, in the same scenario (discussed in figure 9) the Sydney Region VPC traffic now prefers the New York data center. We show this in the following diagram (figure 10).
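The difference can be summarized in a small sketch: without SiteLink the Associated AWS Region is considered before AS path length, while with SiteLink the shortest AS path wins outright. The path lengths below are placeholders that mirror the figure 9/10 scenario; this is a reasoning aid, not the actual VGW/TGW implementation:

```python
from typing import List, Tuple

# Each candidate is (DX location, in the Associated AWS Region?, AS path length).
Candidate = Tuple[str, bool, int]

def region_egress_choice(candidates: List[Candidate], sitelink_enabled: bool) -> str:
    if sitelink_enabled:
        # SiteLink enabled VIFs: lowest AS path length wins, irrespective of
        # the Associated AWS Region.
        return min(candidates, key=lambda c: c[2])[0]
    # Default behavior: locations associated with the Region are preferred,
    # even when a remote location advertises a shorter AS path.
    local = [c for c in candidates if c[1]] or candidates
    return min(local, key=lambda c: c[2])[0]

candidates = [("Sydney DX location", True, 3), ("New York DX location", False, 1)]
print(region_egress_choice(candidates, sitelink_enabled=False))  # Sydney DX location
print(region_egress_choice(candidates, sitelink_enabled=True))   # New York DX location
```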
Global connectivity with multiple sites advertising the same prefix
Let’s consider the following sample architecture (figure 11). You have four geographically distributed on-premises data centers (Sydney, Melbourne, New York, and Las Vegas) connected to Direct Connect locations, with redundant connections at each location. By creating SiteLink enabled VIFs and associating them with the same DXGW, you enable global connectivity between your data centers. You also have resources hosted inside VPCs connected via TGWs in the Sydney, Northern Virginia, and Oregon AWS Regions.
In this scenario, the same prefix, 10.0.0.0/24, is advertised by Sydney and New York Data Centers without modifying any BGP path attributes.
- ALD1 receives this prefix with the BGP next hop of ALD2, ALD5, ALD6, CE1, and CE2. After the BGP best-path algorithm runs, ALD1 selects and installs the advertisements from CE1 and CE2 (as detailed in Rule 1) in its BGP/forwarding table and advertises the prefix to the DXGW Control Plane to be propagated to all remote ALDs. We observe similar behavior on ALD2.
- ALD5 receives this prefix with the BGP next hop of ALD1, ALD2, ALD6, and CE5. After the BGP best-path algorithm runs, ALD5 installs the advertisement from CE5 (as detailed in Rule 1) in its BGP/forwarding table and advertises the prefix to the DXGW Control Plane to be propagated to all remote ALDs. Similar behavior is observed for ALD6.
- The DXGW Control Plane propagates the 10.0.0.0/24 prefix to the:
  - TGW in the Oregon AWS Region with the BGP next hop of ALD1, ALD2, ALD5, and ALD6.
  - TGW in the Sydney AWS Region with the BGP next hop of ALD1 and ALD2 (Associated AWS Region).
  - TGW in the N. Virginia AWS Region with the BGP next hop of ALD5 and ALD6 (Associated AWS Region).
Each ALD propagates the prefix to its locally connected CE routers, as shown in the following diagram (figure 12).
Traffic Flow from VPCs, Melbourne, and Las Vegas to 10.0.0.0/24 prefix
ALD3, ALD4, ALD7, and ALD8 receive the 10.0.0.0/24 prefix with the BGP next hop of ALD1, ALD2, ALD5, and ALD6 and install all paths in their BGP/forwarding tables (as detailed in Rule 2). As shown in figure 12, we see a single BGP prefix on CE3, CE4, CE7, and CE8. The AS path contained in this BGP update can be either 65001 65001 65401 or 65001 65001 65403 (depending on the path selected as best path in the ALD BGP table). Note that the ALDs in the Melbourne DX location do not differentiate between the prefix learned from the Sydney ALDs and the prefix learned from the New York ALDs.
CE3, CE4, CE7, and CE8 send traffic destined to 10.0.0.0/24 to their locally connected ALDs. The ALDs in Melbourne and Las Vegas ECMP the traffic across ALD1, ALD2, ALD5, and ALD6.
We summarize the forwarding behavior in both the table that follows and figure 13.
| From | First hop inside AWS Cloud | Second hop inside AWS Cloud |
| --- | --- | --- |
| CEn sending traffic to ALDn, where n = 3, 4, 7, 8 | ECMP to ALD1, ALD2, ALD5, and ALD6 based on Rule 2 | ALD1/ALD2 ECMP across CE1 and CE2 based on Rule 1; ALD5 sends traffic to CE5 based on Rule 1; ALD6 sends traffic to CE6 based on Rule 1 |
| TGW-Sydney Region | ECMP to ALD1 and ALD2 | ALD1/ALD2 ECMP across CE1 and CE2 based on Rule 1 |
| TGW-N. Virginia Region | ECMP to ALD5 and ALD6 | ALD5 sends traffic to CE5 based on Rule 1; ALD6 sends traffic to CE6 based on Rule 1 |
| TGW-Oregon Region | ECMP to ALD1, ALD2, ALD5, and ALD6 | ALD1/ALD2 ECMP across CE1 and CE2 based on Rule 1; ALD5 sends traffic to CE5 based on Rule 1; ALD6 sends traffic to CE6 based on Rule 1 |
In this scenario, traffic to 10.0.0.0/24 crosses intercontinental links: traffic from Melbourne/Las Vegas is load balanced to New York and Sydney. You can use unique region-specific prefixes (10.0.x.0/24, where x is unique per region) to advertise common services from each region. However, if you want to build a global anycast design, where 10.0.0.0/24 is advertised from more than one region and traffic should reach the closest geographic data center (that is, Asia Pacific locations use Sydney, and North American locations use New York), refer to the next section, Segmenting Global and Regional Networks.
Segmenting Global and Regional Networks
While SiteLink builds a full mesh global network, to meet security and routing requirements you can add segmentation within your global network. We covered some segmentation use cases in the Introducing Direct Connect SiteLink blog post.
One segmentation use case is to create Regional backbone networks that connect data centers within an isolated geography (a continent, for example) and an additional global backbone network that connects the isolated Regional backbones through your Regional headquarters data centers. This network architecture helps you establish Regional administrative domains, build network hierarchy, create smaller fault domains, summarize routes, simplify bandwidth planning for each Regional and global network, and optionally enforce inspection on inter-Regional traffic. With this design, Regional data centers in one continent use the headquarters data center within their Region to access services in a remote Region's data center. We illustrate this in figure 14 that follows, where 10.0.0.0/8 is advertised only to the customer Regional backbone networks from the Regional headquarters. This architecture also allows an anycast design (shown with 10.0.0.0/24), where your Regional data centers use their Regional headquarters to access common services behind the anycast prefix range.

Note: With this architecture, any traffic flow between Regional-only data centers, such as from Asia DC1 to Europe DC2 as shown in the following figure 14, incurs multiple AWS Data Transfer Out (DTO) charges (twice over the Regional networks and once over the global network), billed to the AWS account that owns the SiteLink enabled VIFs/DXGWs.
This requirement can be met by deploying three DXGWs, as shown in figure 15.
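For reference, here is a hedged boto3 sketch of the plumbing behind figure 15: one DXGW per routing domain, with each SiteLink enabled VIF attached to the gateway of its segment. The gateway names, Amazon-side ASN, connection ID, VLAN, customer ASN, and VIF name are placeholders, and the enableSiteLink flag is the one documented for VIF creation at the time of writing:

```python
import boto3

dx = boto3.client("directconnect", region_name="us-east-1")

# One DXGW per routing domain (names and the Amazon-side ASN are placeholders).
gateways = {}
for name in ["global-backbone", "regional-backbone-a", "regional-backbone-b"]:
    resp = dx.create_direct_connect_gateway(
        directConnectGatewayName=name,
        amazonSideAsn=64512,
    )
    gateways[name] = resp["directConnectGateway"]["directConnectGatewayId"]

# Attach a SiteLink enabled transit VIF to the DXGW of its segment
# (connection ID, VLAN, ASN, and VIF name are placeholders).
dx.create_transit_virtual_interface(
    connectionId="dxcon-xxxxxxxx",
    newTransitVirtualInterface={
        "virtualInterfaceName": "regional-dc1-vif",
        "vlan": 100,
        "asn": 65401,
        "addressFamily": "ipv4",
        "directConnectGatewayId": gateways["regional-backbone-a"],
        "enableSiteLink": True,
    },
)
```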
Overlay architecture without using BGP between DXGW and your device
We recommend using BGP between your router and DXGW; however, there are use cases where you cannot run BGP with DXGW, such as a lack of BGP support on your routers/appliances, or a strict requirement to use overlay networks for on-premises to on-premises connectivity without running BGP with AWS. For such situations, create SiteLink enabled VIFs with the required BGP parameters and attach them to the same DXGW. The DXGW advertises the connected subnet configured on each SiteLink enabled VIF, containing the Amazon router peer IP and your router peer IP, to all ALDs (part of the same DXGW). This allows you to build an overlay network between your routers/SD-WAN appliances by simply configuring a static route (covering all connected subnets) that points to your directly connected ALD, without running BGP with DXGW/ALD. As elaborated in figure 16, you configure each SiteLink enabled VIF with a 169.254.2.x/31 IPv4 subnet and configure a static route (169.254.2.0/24) on your appliances pointing to the ALD (as the next hop) to establish the underlay connectivity required by your overlay network. This enables you to run an overlay network using SiteLink as the underlay without establishing a BGP session with DXGW. Note that with this setup, DXGW can only route traffic between the connected subnets, as shown with the outer IPv4 header in figure 16.
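As a quick check on the addressing in figure 16, every /31 point-to-point subnet assigned to a SiteLink enabled VIF falls inside the single 169.254.2.0/24 summary that your appliances carry as one static route toward the local ALD. A small sketch with Python's ipaddress module (which peer gets which address in each /31 is your choice):

```python
import ipaddress

# Underlay addressing sketch for the overlay design in figure 16.
summary = ipaddress.ip_network("169.254.2.0/24")        # one static route per appliance
vif_p2p_subnets = list(summary.subnets(new_prefix=31))  # one /31 per SiteLink enabled VIF

# Each /31 holds the Amazon router peer IP and your router peer IP.
for subnet in vif_p2p_subnets[:4]:
    amazon_peer, customer_peer = list(subnet)  # which side gets which address is up to you
    print(subnet, "amazon:", amazon_peer, "customer:", customer_peer)

# Every /31 is covered by the single summary static route.
assert all(s.subnet_of(summary) for s in vif_p2p_subnets)
```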
Key Considerations
Here are some additional considerations when enabling SiteLink.
- We strongly recommend using the Direct Connect Resiliency Toolkit to establish connectivity between your on-premises location and AWS when using SiteLink.
- If you advertise the same BGP prefix (for example, 10.0.0.0/16) to the same DXGW from an AWS Region (from TGW or VGW) and from your on-premises routers connected using SiteLink, then we always prefer the AWS Region. This behavior cannot be changed by setting BGP communities/BGP path attributes from your router. You can overcome this behavior by advertising more specific prefixes (for example, 10.0.0.0/17 and 10.0.128.0/17) from your on-premises router.
- Within a Direct Connect location, if you have two or more CE routers connected to the same or different ALDs (part of the same DXGW), and all of your CE routers advertise unique prefixes that are not also advertised from CEs in other remote locations, then traffic between your CE routers is routed without leaving the Direct Connect location. The SiteLink rates still apply.
- The connected subnet configured on a SiteLink enabled VIF, containing the Amazon router peer IP and your router peer IP, is advertised by DXGW to all remote SiteLink enabled VIFs. These connected subnets are not propagated to AWS Transit Gateway (TGW) or to AWS Virtual Private Gateway (VGW), and we do not count these prefixes towards your service quota.
- There is no enforced quota on prefixes advertised from AWS DXGW to on-premises. The maximum number of prefixes (per address family) that can be advertised from AWS DXGW to a SiteLink enabled VIF is the sum of the following (2,989 based on the current quotas; note that the supported value will change if we change any of the quotas; see the worked calculation after this list):
  - Maximum number of supported VIFs per DXGW, minus one
  - (Maximum number of supported VIFs per DXGW, minus one) multiplied by (maximum number of routes that each SiteLink enabled VIF can advertise to AWS)
  - Maximum number of routes that can be advertised from the AWS Region to the DXGW
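A worked version of that sum follows. The quota values below are assumptions chosen to match the 2,989 figure quoted above at the time of writing; always check the Direct Connect quotas page for the current numbers:

```python
# Assumed quota values (check the Direct Connect quotas page for current numbers).
vifs_per_dxgw = 30                 # maximum supported VIFs per DXGW
routes_per_vif_to_aws = 100        # routes each SiteLink enabled VIF can advertise to AWS
routes_from_regions_to_dxgw = 60   # routes advertised from AWS Regions to the DXGW

max_prefixes_to_one_sitelink_vif = (
    (vifs_per_dxgw - 1)                            # connected subnets of the other VIFs
    + (vifs_per_dxgw - 1) * routes_per_vif_to_aws  # routes learned from the other VIFs
    + routes_from_regions_to_dxgw                  # routes from associated VGWs/TGWs
)
print(max_prefixes_to_one_sitelink_vif)  # 2989
```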
Conclusion
Direct Connect SiteLink allows you to connect your data centers using Direct Connect, following the shortest path on the AWS global network. You can connect multiple data centers globally to AWS and implement different routing strategies to meet your traffic flow requirements. While the AWS global network defaults to load balancing all traffic across redundant paths, you have the flexibility to influence traffic flow using BGP communities and AS path prepending. You can also use SiteLink, along with multiple DXGWs, for advanced use cases that require the additional security and traffic control that comes with network segmentation. Overall, the Direct Connect SiteLink feature makes creating global networks easier while providing the routing and segmentation flexibility you need to meet advanced enterprise requirements. For more details on how to get started, visit our SiteLink documentation pages.