BGP Path Selection

In this article, we will look at how BGP chooses the best route to use for forwarding traffic. As with other routing protocols, discriminating between paths and selecting the best ones helps ensure efficient network utilization and good performance. Other protocols, such as OSPF or the Spanning Tree Protocol, work at the link level and favor the lowest-cost, typically fastest, path. In BGP, however, the emphasis is on policy-based path selection. BGP prefers shorter paths at the autonomous system level rather than the link level, and it selects the best path according to the relationship with the neighboring autonomous system.

BGP is neither a true link-state protocol nor a distance-vector protocol; it is known as a path-vector protocol. It was designed from the ground up to offer fine-grained administrative policy control over path selection. Baked into BGP and its implementations are sensible defaults that ensure reasonably optimal path selection even without a policy being applied. Most implementations will, by default, select a single best path to a given IP destination.

We will use Juniper Junos examples throughout this article to show how the administrator can influence BGP path selection. We’ll cover some implementation details and typical use cases with policy examples. Note that we will focus on path selection between BGP routes installed in the routing table; BGP routes that are filtered out by policy or deemed invalid due to, for example, AS path loops are out of scope. We also won’t cover vendor-specific path selection features, such as Cisco’s “weight” attribute, which is not part of the BGP standard.

Executive Summary

We will explain the main factors influencing BGP path selection, which are the following.

BGP Path Selection Defaults

The default selection algorithm, as implemented in Juniper Junos.

Prefix Length

More-specific prefixes are chosen first, i.e., the longest mask wins, all else being equal.

Administrative Policy

Administrators use policy to influence which path is selected.

BGP Attributes

Router defaults and policies combine to influence path selection. Attributes are attached to each BGP path and selected for or modified by BGP policy.

Typical Use Cases

  • Default route selection with two ISPs.
  • Inbound traffic influence with two ISPs.

RPKI ROA Considerations

RPKI influences route selection, and misconfiguration carries risks.


BGP Path Selection

BGP Defaults

Each router vendor implements its own BGP path selection defaults. Notably, Cisco includes a vendor-specific parameter called weight, which is evaluated first: the path with the highest weight is chosen before any other attribute is considered.

As mentioned, in this article we will use Juniper Junos for our explanation and examples. Juniper documents its default selection process in the Junos documentation, where you can read about the process and how the various tiebreakers are evaluated.

It’s essential to note that routes are compared and selected at each step of the algorithm using sorting and tiebreaking. For example, let’s say we start with two eBGP routes to the same destination, and one has a higher BGP local preference than the other. At step three of the process, the route with the higher local preference will be chosen, and the process stops (for that destination). If, instead, these two routes had the same local preference, but one had a shorter AS path than the other, it would be chosen at step five of the process.

Quoting from Juniper’s article, and simplifying some details, the full algorithm looks essentially like this:

  1. Verify that the next hop can be resolved.
  2. Choose the path with the lowest preference value (routing protocol process preference). For example, static routes are preferred over IGP routes, IGP routes are preferred over BGP routes, and so on.
  3. Prefer the path with higher local preference.
  4. If the accumulated interior gateway protocol (AIGP) attribute is enabled, prefer the path with the lower AIGP attribute. That is, for routes internal to your BGP domain, prefer the route with the lowest IGP metric or cost.
  5. Prefer the path with the shortest autonomous system (AS) path value (skipped if the as-path-ignore statement is configured). A confederation segment (sequence or set) has a path length of 0. An AS set has a path length of 1.
  6. Prefer the route with the lower origin code. Routes learned from an IGP have a lower origin code than those learned from an exterior gateway protocol (EGP), and both have lower origin codes than incomplete routes (those whose origin is unknown).
  7. Prefer the path with the lowest multiple exit discriminator (MED) metric. Confederations are not considered when determining neighboring ASes. A missing MED metric is treated as if a MED were present but zero. By default, only the MEDs of routes that have the same peer autonomous systems (ASes) are compared.
  8. Prefer strictly internal paths, which include IGP routes and locally generated routes (static, direct, local, and so forth).
  9. Prefer strictly external BGP (EBGP) paths over external paths learned through internal BGP (IBGP) sessions.
  10. Prefer the path whose next hop is resolved through the IGP route with the lowest metric.

    NOTE: A path is considered a BGP equal-cost path (and will be used for forwarding) if a tiebreak is performed after the previous step. All paths with the same neighboring AS, learned by a multipath-enabled BGP neighbor, are considered. BGP multipath does not apply to paths that share the same MED-plus-IGP cost yet differ in IGP cost. Multipath path selection is based on the IGP cost metric, even if two paths have the same MED-plus-IGP cost.
  11. If both paths are external, prefer the currently active path to minimize route-flapping. This rule is not used if any one of the following conditions is true:
    • path-selection external-router-id is configured.
    • Both peers have the same router ID.
    • Either peer is a confederation peer.
    • Neither path is the current active path.
  12. Prefer a primary route over a secondary route. A primary route is one that belongs to the routing table, while a secondary route is added to the routing table through an export policy.
  13. Prefer the path from the peer with the lowest router ID. For any path with an originator ID attribute, substitute the originator ID for the router ID during router ID comparison.
  14. Prefer the path with the shortest cluster list length. The length is 0 for no list.
  15. Prefer the path from the peer with the lowest peer IP address.
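
The tiebreaking idea behind this list can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Junos's implementation: the Path class and its field names are hypothetical, only a subset of the steps is modeled, and MED is compared across all peers for simplicity, whereas the default described above compares it only between routes from the same peer AS.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

# Origin codes ordered as in the list above: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    """A candidate BGP path for one prefix (hypothetical model)."""
    peer: str                 # peer IP address
    router_id: str            # router ID of the advertising peer
    local_pref: int = 100
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0              # a missing MED is treated as zero
    external: bool = True     # True if learned over EBGP, False if over IBGP
    igp_metric: int = 0       # IGP cost to reach the BGP next hop


def selection_key(p: Path):
    """Build a sort key; the path with the smallest key wins.

    Simplification: MED is compared across all peers here.
    """
    return (
        -p.local_pref,                   # higher local preference wins
        len(p.as_path),                  # shorter AS path wins
        ORIGIN_RANK[p.origin],           # lower origin code wins
        p.med,                           # lower MED wins
        0 if p.external else 1,          # EBGP preferred over IBGP
        p.igp_metric,                    # lower IGP metric to next hop wins
        int(IPv4Address(p.router_id)),   # lowest router ID wins
        int(IPv4Address(p.peer)),        # lowest peer address wins
    )


def best_path(paths):
    """Return the single best path among candidates for the same prefix."""
    return min(paths, key=selection_key)


a = Path(peer="192.0.2.1", router_id="10.0.0.1", local_pref=100, as_path=(65001,))
b = Path(peer="192.0.2.5", router_id="10.0.0.2", local_pref=110,
         as_path=(65002, 65003, 65004))
c = Path(peer="192.0.2.9", router_id="10.0.0.3", local_pref=100,
         as_path=(65005, 65006))

print(best_path([a, b, c]).peer)  # 192.0.2.5: local preference decides
print(best_path([a, c]).peer)     # 192.0.2.1: equal local preference, shorter AS path
```

Because the key is a tuple, a later field is consulted only when every earlier field ties, which mirrors how the process stops at the first step that separates two routes.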

Prefix Length

The algorithm in the previous section explains how the best route is selected from a set of routes with the same prefix length, but it’s crucial to note that the most specific prefix, i.e., the longest subnet mask, will always be chosen first. Let’s consider some examples:

A. 192.168.0.0/16   via xe-0/0/0.0 <- mask is 16 bits, 255.255.0.0
B. 192.168.1.0/24   via xe-1/2/0.0 <- mask is 24 bits, 255.255.255.0
C. 192.168.1.0/24   via xe-2/2/0.0 <- mask is 24 bits, 255.255.255.0
D. 192.168.1.128/25 via ge-3/0/6.0 <- mask is 25 bits, 255.255.255.128

Routes A and D will be selected as active routes because each is the only route to its prefix, but a selection process will occur between routes B and C because they advertise the same prefix with the same prefix length.

Route A covers the range 192.168.0.0 through 192.168.255.255.

Route D covers the range 192.168.1.128 through 192.168.1.255.

Routes B and C cover the range 192.168.1.0 through 192.168.1.255.

Let’s consider the implications of the different prefix lengths when a packet forwarding decision is made:

  • When a packet is forwarded to 192.168.1.120, route B or C will be chosen.
  • When a packet is forwarded to 192.168.1.130, route D will be chosen, because it has a longer prefix.
  • When a packet is forwarded to anything inside the range 192.168.0.0 - 192.168.255.255 that does not match the more specific /25 and /24 routes, route A will be chosen.
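
The forwarding decisions above can be reproduced with Python's standard ipaddress module. This is a minimal sketch of longest-prefix matching, using the address ranges listed for routes A, B/C, and D. It assumes route B won the BGP selection over route C, and the lookup function is a hypothetical helper, not a router API.

```python
from ipaddress import ip_address, ip_network

# Routes from the example above. B and C share a prefix, so only the winner
# of the BGP selection process (assumed here to be B) is installed.
routes = {
    "A": ip_network("192.168.0.0/16"),
    "B": ip_network("192.168.1.0/24"),
    "D": ip_network("192.168.1.128/25"),
}


def lookup(destination: str) -> str:
    """Return the label of the most specific route covering the destination."""
    addr = ip_address(destination)
    matches = [(net.prefixlen, label) for label, net in routes.items() if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The longest prefix length wins, regardless of any BGP attribute.
    return max(matches)[1]


print(lookup("192.168.1.120"))  # B
print(lookup("192.168.1.130"))  # D
print(lookup("192.168.7.9"))    # A
```

A real router performs this lookup in a trie or in hardware rather than by scanning a list; the linear scan here is only meant to make the rule visible.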

Administrative Policy

Let’s move on to policy. As mentioned, BGP offers fine-grained policy controls for path selection. The ordering of the steps in the algorithm is intended to give sensible, often near-optimal, results even when no specific policy is applied.

However, it is often the case that a given BGP peering should be preferred over another for reasons of bandwidth cost per Mbps, or latency, or relationship to that peer. BGP itself is not aware of peer relationships, path latency, link or path bandwidth, or the financial cost of that bandwidth. This is where you, as the administrator, come in. A very typical arrangement is to have a tiered local preference structure based on relationship, for example:

  • Customer routes are most preferred, with a higher-than-default local preference, e.g., 110.
  • Private or internet exchange routes are preferred next, with a default local preference, e.g., 100.
  • Transit routes are the least preferred, with a lower-than-default preference, e.g., 90.

A tiered local preference structure like this is used because the best path is usually the most direct one. You wouldn’t route traffic out of your network to take a scenic route via the internet transit providers to reach your customer. Accordingly, you want to always prefer your direct links to customers to reach their prefixes, and direct or exchange links to peers to reach peers, and then use transit for everything else. On the financial side, customers pay you, peers exchange traffic settlement-free, generally speaking, and you pay your transit providers.


BGP Attributes

The main attributes worthy of consideration when creating policy and understanding how route selection will work in operation are the following:

  • Local preference
  • AS path length
  • MED (metric)

Note: As described earlier, prefix length is not a BGP attribute but supersedes BGP attribute route selection criteria.

When crafting policy adjustments to local preference or MED, also consider the role AS path length plays, and how your choices will affect the way traffic routes and reroutes as links fail, routers go down for maintenance, and so on.

Typical Use Cases

Let’s get a little more granular with the local preference policy outlined previously. If we add some distinctions, such as offering customers primary and backup connections, we might set the primary to local preference 110 and the backup to local preference 105. This ensures that no matter the value of any other route attributes, primary connections remain primary and active if available, and backup connections remain as idle backups, only becoming active if no primary is available.

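With peering connections, we might differentiate between directly connected peers, which cross-connect directly to our router in the data center, and indirectly connected peers, which meet at an internet exchange point such as a shared LAN switch. Direct peers might keep the default local preference of 100, while exchange peers receive a slightly lower value such as 95.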

If we had two transit providers and one was more expensive than the other but otherwise similarly performing, we might elect to use local preference 90 for the cheaper transit and local preference 85 for the more expensive one. Again, this would ensure that the cheaper transit is used when available, and the expensive transit is only used when the primary is down. Our local preference schema would then look like this:

Customer - primary     110
Customer - backup      105
Peering - direct       100
Peering - exchanges     95
Transit - primary       90
Transit - backup        85
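
To see how this schema behaves during a failure, here is a small Python sketch. It assumes the same prefix is heard over every session that is up and that local preference alone decides the outcome; the session names and the active_session helper are hypothetical.

```python
# Local preference per session type, taken from the schema above.
LOCAL_PREF = {
    "customer-primary": 110,
    "customer-backup": 105,
    "peering-direct": 100,
    "peering-exchange": 95,
    "transit-primary": 90,
    "transit-backup": 85,
}


def active_session(sessions):
    """Pick the session type whose routes win on local preference alone.

    `sessions` maps a session type to True (up) or False (down).
    Returns None when no session is up.
    """
    up = [name for name, is_up in sessions.items() if is_up]
    if not up:
        return None
    return max(up, key=LOCAL_PREF.__getitem__)


sessions = {"customer-primary": True, "customer-backup": True, "transit-primary": True}
print(active_session(sessions))  # customer-primary

sessions["customer-primary"] = False
print(active_session(sessions))  # customer-backup
```

Leaving gaps of five between the values, as the schema does, also leaves room to slot in new tiers later without renumbering the existing ones.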


RPKI ROA Considerations

Harking back to the introduction, which explained that filtered or otherwise invalid routes are out of scope for route selection, it’s important to note that in a global network of BGP sessions, RPKI ROA policies play an important part in route selection.

This article primarily outlines how you, as an administrator, can influence and manage route selection in your network. However, be aware that networks implementing ROA validation policies use RPKI ROA attributes to determine whether to accept a route advertisement or reject it, and if accepting it, whether to use the standard or a lowered local preference value.

A quick summary is that every holder of IP address space on the internet has the option to create ROA records detailing which origin AS numbers and prefix lengths are valid for that IP space. Networks evaluating ROAs as part of their BGP import policies will check whether a route they receive has an RPKI validation state of valid, invalid, or unknown (also called not found).

ROAs look like this (simplified):

192.168.0.0/23        AS65000
192.168.0.0/24        AS65000
192.168.1.0/24        AS65000

This means any BGP route seen on the internet for the IP range 192.168.0.0 - 192.168.1.255 must originate from AS65000 and must be in the form of a route prefix 192.168.0.0/23, 192.168.0.0/24, or 192.168.1.0/24. Any other route referencing this IP range will be deemed invalid because it clashes with this ROA specification.

Therefore, a valid route is one that matches a ROA record, while an unknown route is one that has no ROA covering the IP space to which it pertains. Most networks accept both valid and unknown routes because, at the time of writing, only one-quarter to one-third of the internet routing table is covered by ROAs. Rejecting unknown routes would mean dropping connectivity to most of the internet! RPKI ROA adoption is growing, but it will be some time before most internet prefixes are covered. Until then, the only sane policy is to accept unknowns.

As a network operator or engineer, it’s important for you to know that invalid routes are those that clash with a ROA. Many large transit providers now drop invalid routes; some will accept them, but with a lower local preference value. The RPKI ROA system is designed to catch incorrect route announcements, whether accidental or malicious, and so it helps prevent BGP route hijacking.
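
The three validation states can be illustrated with a short Python sketch that follows the outline of route origin validation. It uses the simplified ROAs shown earlier and assumes each ROA's maximum length equals its prefix length, since the example lists no separate maximum. The validation_state function is a hypothetical helper, not part of any RPKI validator.

```python
from ipaddress import ip_network

# Simplified ROAs from the example above: (prefix, maximum length, origin AS).
# Assumption: the maximum length equals the prefix length of each record.
ROAS = [
    (ip_network("192.168.0.0/23"), 23, 65000),
    (ip_network("192.168.0.0/24"), 24, 65000),
    (ip_network("192.168.1.0/24"), 24, 65000),
]


def validation_state(prefix: str, origin_as: int) -> str:
    """Classify a route as 'valid', 'invalid', or 'unknown'."""
    route = ip_network(prefix)
    # A ROA covers the route if the route's prefix falls inside the ROA prefix.
    covering = [roa for roa in ROAS if route.subnet_of(roa[0])]
    if not covering:
        return "unknown"   # no ROA covers this address space
    for _net, max_len, asn in covering:
        if asn == origin_as and route.prefixlen <= max_len:
            return "valid"
    return "invalid"       # covered by a ROA, but no ROA matches


print(validation_state("192.168.0.0/23", 65000))    # valid
print(validation_state("192.168.1.0/24", 65001))    # invalid: wrong origin AS
print(validation_state("192.168.1.128/25", 65000))  # invalid: longer than any ROA allows
print(validation_state("198.51.100.0/24", 65000))   # unknown
```

The third case is the one that tends to catch operators out: announcing a more-specific prefix than your own ROA permits makes your own route invalid.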

The takeaway is that if you or your customers create ROAs for your prefixes, you absolutely must take great care to maintain accuracy. Otherwise, you may find yourself cut off from the internet as your BGP announcements are dropped!


Conclusion

You should now understand the basic considerations for default path selection, how you can change this behavior using policy, and why you might want to do so. We also touched on RPKI ROA considerations, including the benefits and risks of deploying them. Equipped with this knowledge, you are hopefully now empowered to begin traffic engineering your own BGP sessions for best effect.
