Networking & Content Delivery

Creating active/passive BGP connections over AWS Direct Connect

There are many ways to connect your data centers to Amazon Web Services. This blog post answers a few common questions that customers ask us when trying to build a communications path over AWS Direct Connect (DX). In particular, it shows how to create active/passive Border Gateway Protocol (BGP) connections with AWS over DX.

To achieve redundancy and provide higher capacity, many customers deploy two or more DX connections. Two common patterns involve using a combination of Private Virtual Interfaces (PrivateVIF) and Direct Connect Gateway (DXGW) or Transit Virtual Interfaces (TransitVIF) and DXGW. Let’s look at an illustration of these architectures:

[Figure: PrivateVIF + DXGW and TransitVIF + DXGW connectivity patterns]

Both configurations use multiple DX locations (generally considered a good practice), and the two locations are connected to each other by some means outside of the cloud connectivity. In both cases, the customer is connected to a DXGW that is in turn either directly connected to Virtual Private Gateways (VGW) or to an AWS Transit Gateway (TGW).

Whilst both of the preceding scenarios provide higher capacity and increased resiliency, there is more you can do. I’d encourage you to read the DX SLA details at https://aws.amazon.com/directconnect/sla/ to learn more.

Provisioning multiple DX connections affords the customer some unique opportunities for route tuning, and poses some equally interesting questions around optimization.

Some of the common questions customers ask when creating multiple DX connections are:

  • How can they dedicate a single DX connection as the primary path for AWS destinations that are classified as ‘production’ networks, with the other connection dedicated to networks that are classified as ‘development’?
  • How can they avoid suboptimal routing across their private estate, and avoid asymmetric routing, by influencing which DX connection carries traffic sent towards their end of the connections?
  • In both cases, high availability (HA) is a requirement: traffic must continue to flow if a DX connection or a customer routing device is lost or needs maintenance.

 

BGP Routing Overview

I remember studying for my Cisco CCNP R&S; the content was a huge step-up from the CCNA material. Border Gateway Protocol (BGP) was a core CCNP topic and I can still recall one of the mnemonics that I used in my attempts to solidify the BGP best-path algorithm:

We Love Oranges AS Oranges Mean Pure Refreshment

This translates to:

Weight, Local Pref, Originate, AS_Path, Origin, MED, Paths, RouterID

This isn’t a complete list of all the BGP steps, but these will suffice for the purposes of this post. In fact, we are going to be talking about Local Pref, AS_Path and MED, specifically.

If you’ve not heard of the best-path algorithm before, then fear not, we are going to cover this briefly now. I assume that we can generally agree on oranges being refreshing, however!

Border Gateway Protocol (BGP) is an Exterior Gateway Protocol (EGP). An EGP is concerned with advertising address information between Autonomous Systems (AS). You can think of an AS as a wholly controlled administrative unit that is responsible for the address space within it; an AS is identified by a special 16-bit or 32-bit number. Unlike Interior Gateway Protocols (IGP), which work within an AS and concern themselves with link states or interface costs, EGPs are primarily focused on the paths to destination autonomous systems outside of the local AS. This is why they are sometimes called Path Vector routing protocols.

When BGP speakers, or peers, advertise Network Layer Reachability Information (NLRI) to each other, they also advertise a series of constructs called Path Attributes (PA). These are sent in a special type of message called a BGP Update message. The best-path algorithm that runs as part of BGP considers all routes it receives and tries to select the best ones. It uses configured policies and received path attributes when stepping through the logic. The process finishes when an appropriate route or routes are found.

A detailed explanation of how this works is beyond the scope of this post, but you can find a world of information here: https://tools.ietf.org/html/rfc4271

Let’s explore three BGP path attributes a little more closely. When configured, these attributes can materially affect routing behavior, in both directions, over a DX connection.

Local Pref – This path attribute is considered right at the start of the best-path algorithm, and as such, is an optimal tuning parameter! This is used for both Inbound and Outbound tuning – higher values are preferred.

AS_Path – This path attribute is a concatenation of all the AS numbers the advertisement has passed through. It is used as a loop avoidance mechanism on the one hand and as an indication of distance on the other. This is used for both Inbound and Outbound tuning – shorter AS_Path lengths are preferred.

MED – This path attribute uses a metric as well. MED is typically used by an AS that is multi-homed to instruct an external AS (that it is peered with) that it has a preferred entry point for a particular network address block. This can be used for inbound tuning – lower metric values are preferred.
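
To make these attributes a little more concrete, here is a minimal sketch, in Cisco IOS-style syntax, of how each one is typically set in a routing policy. This is purely illustrative: the route-map names, the values, and the vendor syntax are assumptions on my part, not configuration taken from the environments in this post.

! Illustrative route-map clauses only; names and values are examples
! Higher Local Pref wins (step 2 of the mnemonic)
route-map SET-LOCAL-PREF permit 10
 set local-preference 200
!
! A longer AS_Path loses (step 4)
route-map PREPEND-AS-PATH permit 10
 set as-path prepend 65000 65000
!
! A lower MED wins (step 6)
route-map SET-MED permit 10
 set metric 100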

 

Prerequisites

When using DX to connect your on-premises locations to AWS, BGP is a requirement. Because BGP is required, you have the opportunity to engineer your traffic flows between your on-premises locations and AWS using policies and specific processing logic.

BGP peering is configured between opposite ends of AWS Virtual Interfaces. The following table shows the types of Virtual Interfaces that are possible over DX and the nature of the BGP connection:

Type | PrivateVIF | TransitVIF | PublicVIF*
Use Case | Private Addressing | Private Addressing | Public Addressing
Attaches to | Virtual Private Gateway / Direct Connect Gateway | Direct Connect Gateway | Account Construct
Peer IP Support | Private | Private | Public
BGP PA Support | Local Pref, AS_Path, MED | Local Pref, AS_Path, MED | AS_Path**

 * When using a PublicVIF, the home Region for DX connections must be the same if you want to use load balancing.

**When using a PublicVIF, AS_Path prepending is only supported when using a Public AS number.

 

Aside from BGP path attributes, note that more specific prefixes are generally preferred above all others! If you advertise a more specific prefix on one of your links, you attract traffic from AWS across that link, irrespective of what you configure with BGP. For example, a 172.16.1.0/25 advertised over one connection wins over a 172.16.1.0/24 advertised over the other, regardless of any Local Pref, AS_Path, or MED values attached to the routes.

 For further information on Virtual Interface types, see the following link: https://docs.aws.amazon.com/directconnect/latest/UserGuide/WorkingWithVirtualInterfaces.html 

For further information on DX Connection types, see the following link: https://docs.aws.amazon.com/directconnect/latest/UserGuide/WorkingWithConnections.html 

 

Walkthrough

You may have noticed that in both patterns depicted above, there are BGP AS numbers beyond the DXGW. These AS numbers are not present in the AS_Path list as viewed from the perspective of DC1/DC2. The DXGW is, in essence, acting as a BGP reflector for the autonomous systems that sit behind it on the AWS side.

The following table lists the connection specifics for both scenarios:

Network Space | On-Premises | AWS
BGP ASN | 65000 | 65001
Production | 172.16.1.0/24 | 10.0.1.0/24
Development | 172.16.2.0/24 | 10.0.2.0/24

For both patterns:

  • DC1 and DC2 routers advertise all on-premises network ranges towards AWS
  • DC1 and DC2 routers receive all AWS network route prefixes from DXGW

 

Let’s talk datacenter routing tables!

Without any tuning whatsoever, this is what the route tables would look like:

  • DC1/DC2 would have routes to 10.0.1.0/24 and 10.0.2.0/24; the AS_Path for these routes would have 65001 in the AS_Path list, and the outgoing interface would be the eBGP peering interface for the PrivateVIF or TransitVIF.
  • DXGW would have routes to 172.16.1.0/24 and 172.16.2.0/24, the AS_Path for these routes would have 65000 in the list – the outgoing interfaces could be either of the associated PrivateVIFs or TransitVIFs from DC1 or DC2.

Now then, I’m sure that you can see some of the considerations here:

  • Without any tuning, we can’t influence outbound traffic towards AWS for the production or development networks from the perspective of DC1/DC2.
  • We can’t influence return traffic from AWS via DXGW so that traffic doesn’t cross our inter-dc link unnecessarily.
  • We introduce asymmetric routing, which in itself isn’t necessarily an issue but it could be if there were security devices in the path.

 

For both the PrivateVIF and TransitVIF scenarios, additional configuration is required beyond the DX BGP configuration in order for traffic to flow end to end. If you are new to DXGW or TGW, check out these great blogs by Jeff Barr:

https://aws.amazon.com/blogs/aws/new-aws-direct-connect-gateway-inter-region-vpc-access/

https://aws.amazon.com/blogs/aws/new-use-an-aws-transit-gateway-to-simplify-your-network-architecture/

 

What’s our Vector? 

The first ask was to influence outbound connections from DC1/DC2 towards AWS. How can we tell DC1/DC2 to agree on a common egress point for the production or development networks? Remember Local Pref? This path attribute is designed for exactly this type of work – we must create a policy definition and apply it to the BGP neighbor configuration between DC1/DC2.

Let’s have a look at the BGP route table on each of the enterprise routers: DC1/DC2, before we make any changes:

As we can see, in both cases, DC1 and DC2 prefer their eBGP peers for AWS network destinations. This isn’t surprising as eBGP paths are preferred over iBGP paths – this is step 7 (Pure/Paths) in full effect. (Refer back to the mnemonic from earlier in this post!)

 

Tuning outbound with Local Pref

Let’s influence the routing decision made locally by creating a policy that matches the relevant prefixes and advertises an increased Local Pref value for them. We do this between our BGP peers within AS 65000. Local Pref is only ever advertised between iBGP peers, or in other words, peers that share a common AS.

Here’s the workflow we must follow:

[Figure: workflow for applying the Local Pref policy on DC1 and DC2]
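
As a concrete example of that workflow, the inbound policy on DC1 could look something like the following sketch in Cisco IOS-style syntax. The prefix-list and route-map names and the Local Pref values are illustrative assumptions; only the prefixes, AS numbers, and the peer address 169.254.254.41 come from the scenario. DC2 would apply the mirror-image policy, raising Local Pref for 10.0.2.0/24 instead.

! DC1: raise Local Pref for the production prefix learned from the AWS eBGP peer
ip prefix-list AWS-PROD seq 5 permit 10.0.1.0/24
!
route-map AWS-IN-LOCALPREF permit 10
 match ip address prefix-list AWS-PROD
 set local-preference 200
! Everything else keeps a lower preference
route-map AWS-IN-LOCALPREF permit 20
 set local-preference 100
!
router bgp 65000
 neighbor 169.254.254.41 remote-as 65001
 neighbor 169.254.254.41 route-map AWS-IN-LOCALPREF in

Because Local Pref is carried over the iBGP session between DC1 and DC2, DC2 also learns that the path via DC1 is preferred for 10.0.1.0/24.
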
Let’s check those BGP tables again!

[Figure: DC1/DC2 BGP tables after the Local Pref change]

Exactly what we were after! Now, as packets arrive on DC1 destined for the production network ‘10.0.1.0/24’, the exit interface is that of the eBGP peer ‘169.254.254.41’. Packets destined toward the development network ‘10.0.2.0/24’ traverse the inter-dc link before making their way toward AWS. When looking at DC2, we can see that the inverse is also true.

 

Tuning outbound with AS_Path

So what about AS_Path? Considered after Local Pref, it is another tuning parameter that we can use. The config here is similar, but we must increase the AS_Path length for the networks that are advertised to DC1/DC2, respectively, from their eBGP peers in AWS. The result? Each router views the specified eBGP-learned route as ‘longer’ and then prefers its iBGP peer connection (something that wouldn’t normally happen due to step 7!). But remember one thing: the matching logic is reversed here, since we increase the length of the path we don’t want to use!

 

Here’s the workflow we must follow:

[Figure: workflow for applying the AS_Path prepend policy on DC1 and DC2]
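
Again, as an illustrative sketch in Cisco IOS-style syntax (the policy names and prepend counts are assumptions; the prefixes, AS numbers, and peer address come from the scenario), DC1 could lengthen the AS_Path of the development prefix it learns directly from AWS, so that the shorter iBGP path via DC2 wins at step 4. DC2 applies the inverse, prepending for the production prefix.

! DC1: inflate the AS_Path of the development prefix learned directly from AWS,
! so the shorter iBGP path via DC2 wins the AS_Path comparison
ip prefix-list AWS-DEV seq 5 permit 10.0.2.0/24
!
route-map AWS-IN-PREPEND permit 10
 match ip address prefix-list AWS-DEV
 set as-path prepend 65001 65001 65001
! Leave all other routes untouched
route-map AWS-IN-PREPEND permit 20
!
router bgp 65000
 neighbor 169.254.254.41 remote-as 65001
 neighbor 169.254.254.41 route-map AWS-IN-PREPEND in
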
Let’s check those BGP tables.

[Figure: DC1/DC2 BGP tables after the AS_Path prepend change]

Great! Our tuning had the desired effect. We influenced an outbound path choice by increasing the path length.

Outbound tuning summary

You can use either of these methods, or indeed others (perhaps local weight on your device). But regarding BGP, consider that Local Pref, as well as being slightly more intuitive, is considered before AS_Path as part of the BGP best-path algorithm. So any locally configured Local Pref values may interfere with your AS_Path configuration!

Here’s an illustration of what we have achieved so far, optimized outbound routing using either Local Pref or AS_Path:

[Figure: optimized outbound routing from DC1/DC2 toward AWS using Local Pref or AS_Path]

Tuning inbound with Local Pref

So far I have covered outbound tuning; let’s now turn our attention to inbound tuning.

For inbound tuning we start again with Local Pref. However, recall that I highlighted that Local Pref isn’t exchanged between eBGP peers? This is broadly true, so how do we ask the AWS AS 65001 to use specific Local Pref values when considering which routes to install in its BGP table?

To do this, we use a BGP community. A BGP community is a construct that allows additional information to be sent between BGP speakers. You can read more about BGP Communities, here: https://tools.ietf.org/html/rfc1997.

AWS supports several community values; the following link takes you to more information: https://docs.aws.amazon.com/directconnect/latest/UserGuide/routing-and-bgp.html

For each prefix that you advertise over a BGP session, you apply a community tag to indicate the priority of the associated path for returning traffic. For the purposes of this post, we are interested in the following community tags:

  • 7224:7100 – Low preference
  • 7224:7200 – Medium preference
  • 7224:7300 – High preference

 

Let’s see a graphical representation of the current state, from the perspective of DXGW, and where it expects to route packets.

[Figure: DXGW view of the current state, with equal-cost paths toward DC1 and DC2]

DXGW balances traffic across paths if the path costs are equal*. We can see that this is the case here, but before we make any changes, here is a traceroute from an EC2 instance that lives in the production VPC. You can see that both paths are used – this is due to Equal Cost Multipath (ECMP), which is a fancy way of saying ‘I have more than one route of equal cost in my route table; use any of them’.

* If both DX connections are within the same Region, DXGW load balances here; otherwise, DXGW uses a default Local Pref value to prefer the Region local to the source of the AWS traffic. This can be overridden through BGP community values specified by the customer.

I’m using an enhanced version of traceroute called paris-traceroute – check it out! https://paris-traceroute.net/

sh-4.2$ sudo paris-traceroute -n -a exh 172.16.1.1
traceroute [(10.0.1.10:33456) -> (172.16.1.1:33457)], protocol udp, algo exh, duration 51 s
 1  P(16, 16) 169.254.254.29:0,4,8,9  0.359/0.482/0.984/0.181 ms  169.254.254.25:1,2,3,5,6,7,10  0.352/1.242/5.367/1.710 ms
 2  P(12, 16) 169.254.254.42:0,6,8,9,10  11.271/11.488/11.979/0.234 ms  169.254.254.34:5,7  10.749/10.974/11.200/0.226 ms
sh-4.2$ sudo paris-traceroute -n -a exh 172.16.2.1
traceroute [(10.0.1.10:33456) -> (172.16.2.1:33457)], protocol udp, algo exh, duration 21 s
 1  P(16, 16) 169.254.254.29:0,2,5  0.365/0.876/2.579/0.798 ms  169.254.254.25:1,3,4,6,7,8,9,10  0.325/0.416/0.570/0.075 ms
 2  P(14, 16) 169.254.254.42:0,1,5,7,8,9  11.204/11.715/12.042/0.299 ms  169.254.254.34:2,6,10  10.675/11.104/11.336/0.303 ms

 

When you use the tool in Exhaustive mode, it tries to produce an accurate calculation of all possible paths/hops. We see in the output that some of the hops pass through the peering to DC1 and others pass through the peering to DC2. Now let’s apply some traffic engineering on both routers and see what happens.

[Figure: community-based traffic engineering applied on DC1 and DC2]
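
As a sketch of what that traffic engineering could look like on DC1 (Cisco IOS-style syntax; the policy names are illustrative assumptions, while the prefixes, AS numbers, peer address, and community values come from the scenario and the DX documentation linked above), we tag the production prefix with the high-preference community and everything else with the low-preference community. DC2 applies the mirror image, tagging 172.16.2.0/24 as high preference instead.

! DC1: ask AWS to prefer this connection for return traffic to the production network
ip prefix-list ONPREM-PROD seq 5 permit 172.16.1.0/24
!
route-map TO-AWS-COMMUNITIES permit 10
 match ip address prefix-list ONPREM-PROD
 set community 7224:7300
! Everything else (for example, development) gets the low-preference tag
route-map TO-AWS-COMMUNITIES permit 20
 set community 7224:7100
!
router bgp 65000
 neighbor 169.254.254.41 remote-as 65001
 neighbor 169.254.254.41 route-map TO-AWS-COMMUNITIES out
! Communities are not sent to the peer unless explicitly enabled
 neighbor 169.254.254.41 send-community
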
Here’s that trace again after we have asked AWS to prefer specific links towards the datacenter.

 

sh-4.2$ sudo paris-traceroute -n -a exh 172.16.1.1
traceroute [(10.0.1.10:33456) -> (172.16.1.1:33457)], protocol udp, algo exh, duration 11 s
 1  P(16, 16) 169.254.254.29:0,4,8,9  0.367/0.417/0.488/0.035 ms  169.254.254.25:1,2,3,5,6,7,10  0.379/0.435/0.495/0.047 ms
 2  P(1, 6) 169.254.254.42  11.006/11.006/11.006/0.000 ms
sh-4.2$ sudo paris-traceroute -n -a exh 172.16.2.1
traceroute [(10.0.1.10:33456) -> (172.16.2.1:33457)], protocol udp, algo exh, duration 26 s
 1  P(16, 16) 169.254.254.25:0,1,3,4,8,10  0.346/0.651/1.619/0.388 ms  169.254.254.29:2,5,6,7,9  0.375/0.418/0.449/0.033 ms
 2  P(6, 6) 169.254.254.34  9.904/10.142/10.517/0.264 ms

 

Great! Our second hop is now always the preferred route. Let’s see that illustrated.

[Figure: optimized inbound routing from AWS toward DC1 and DC2]

Tuning inbound with AS_Path and MED

I’m rapidly running out of word space for this blog, but for completeness I wanted to mention that, in addition to Local Pref, both AS_Path and MED can be used for influencing inbound pathways. I covered AS_Path earlier through the creation of a policy construct that matched route information delivered from the upstream (AWS) peering, to influence routes in our own route tables. To make this work ‘inbound’, simply flip the logic around and apply an outbound policy that increases the AS_Path length of the routes sent to AWS, specifically for the routes that you want to de-prioritize on that link.
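
A sketch of that outbound prepend on DC1, again in Cisco IOS-style syntax with illustrative names and prepend counts, might look like this; DC2 would prepend for the production prefix instead:

! DC1: prepend our own AS for the development prefix so AWS prefers DC2 for its return traffic
ip prefix-list ONPREM-DEV seq 5 permit 172.16.2.0/24
!
route-map TO-AWS-PREPEND permit 10
 match ip address prefix-list ONPREM-DEV
 set as-path prepend 65000 65000 65000
route-map TO-AWS-PREPEND permit 20
!
router bgp 65000
 neighbor 169.254.254.41 remote-as 65001
 neighbor 169.254.254.41 route-map TO-AWS-PREPEND out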

MED – step 6 (Mean/MED) is a path attribute used between eBGP speakers and can influence external ASs towards a particular AS entry point when that AS is multi-homed. MED is considered after Local Pref and AS_Path and as such is the least preferred tuning parameter of the three listed. Whilst it certainly worked in my testing – your mileage may vary.
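
For completeness, a MED-based version of the same outbound policy might look like the following sketch (Cisco IOS-style and purely illustrative; remember that MED sits at step 6, so Local Pref or AS_Path values set elsewhere can override it):

! DC1: advertise a higher (less preferred) MED for the development prefix
ip prefix-list ONPREM-DEV seq 5 permit 172.16.2.0/24
!
route-map TO-AWS-MED permit 10
 match ip address prefix-list ONPREM-DEV
 set metric 200
route-map TO-AWS-MED permit 20
 set metric 100
!
router bgp 65000
 neighbor 169.254.254.41 remote-as 65001
 neighbor 169.254.254.41 route-map TO-AWS-MED out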

Route tuning summary

Here’s what we have achieved.

  • Optimized outbound routing using either Local Pref or AS_Path
  • Optimized inbound routing using either Local Pref or AS_Path, or potentially even MED
  • Higher capacity and availability with multiple DX connections

 

 

Considerations

Something to be aware of! If you have well-defined source and destination networks (one-to-one), both on-premises and in AWS, then these configurations should work for you. If the networks were one-to-many, say, for example, if the White network needed to speak to the Development network as well, then there would be asymmetry in the flows. This could be overcome by using policy-based routing or perhaps even Network Address Translation (NAT). Your particular use case may differ here.

Summary

In this blog post I showed you how to tune the dynamic routing between your datacenters and AWS using BGP. We used BGP’s best-path algorithm to tune routing both outbound toward AWS and inbound from AWS, using several path attributes and community tags. This isn’t an exhaustive list of what is possible, as there are other ways to achieve similar outcomes.

How did you remember the BGP best path algorithm?

Author Bio

Adam Palmer

Adam Palmer is a Senior Specialist Network Solutions Architect at AWS. Prior to joining AWS, Adam worked as an Architect in the Financial Services sector, specializing in Networking, VMware, Microsoft platform, and End-User Compute solutions. In his spare time he can be found climbing mountain faces, wherever the weather is good!