
BGP Peer

The Border Gateway Protocol (BGP) is the de facto routing protocol used on the Internet today. Like many dynamic routing protocols, BGP works based on the creation of neighbor adjacencies between routers. In the language of BGP, this is called BGP peering, and the devices participating in such a peering are called BGP peers.

BGP peers are at the very heart of BGP's operation. The way BGP peers are established (with whom, where, and under what policies) drives latency (keeping paths short and local), traffic locality (balancing transit versus peer-to-peer or provider-to-customer policies), and cost (favoring settlement-free peers at IXPs), which is why continuous peer monitoring is critical.

BGP session states, uptime, flap detection, error codes, and security are just some elements that must be considered when monitoring BGP in the context of BGP peers. Understanding these concepts is an important part of successfully and effectively monitoring BGP on a broader scale.

This article introduces the concept of BGP peers, focusing specifically on a series of peer-related elements crucial to BGP monitoring. It then outlines specific best practices to follow when dealing with BGP peers. You will gain greater clarity into what to monitor, how to do it, and how to respond effectively to BGP peer-related issues.

Summary of key BGP peer best practices

The table below summarizes the BGP peer best practices that this article will explore in more detail.

Best practice | Description
Monitor BGP session state | Monitor the current state of the BGP session to assess peer health and acceptable transitions.
Track peer uptime and flap detection | Track how long a BGP peering session has been active and identify instability from frequent session resets.
Observe prefix advertisement and reception | Watch the number and types of prefixes exchanged between peers, including using mechanisms like RPKI to validate route origin and prevent hijacks or leaks.
Analyze BGP error codes and notifications | Capture and analyze BGP error messages for rapid fault diagnosis.
Monitor BGP peer session security | Examine BGP peering security parameters, including TCP MD5 or TCP-AO authentication and TTL security (GTSM).

BGP peer monitoring

BGP peers, also referred to as BGP neighbors, are routers configured to exchange routing information using BGP. From a monitoring standpoint, the most important BGP neighbors are border routers in different ASes that exchange routing information with each other. These are known as external BGP (eBGP) peers.

Monitoring BGP peers, and eBGP peers in particular, is an important part of any holistic monitoring solution. Understanding what these peers are and how they behave is therefore an important step toward monitoring BGP successfully.

What is a BGP peering?

BGP peers communicate over a TCP session on port 179. Over this session they exchange route advertisements that define how traffic should be routed across the Internet and within large enterprise networks.

BGP peers can be external (eBGP) or internal (iBGP), depending on whether they connect routers in different autonomous systems (ASes) or within the same AS. Having said that, the focus of BGP peer monitoring is primarily placed on eBGP peerings due to the nature of their route exchanges. It is eBGP relationships that form the foundation of global Internet routing. For this reason, we will primarily focus on eBGP peers in the remainder of the article.

The role of eBGP peers

eBGP peers shape how traffic enters and leaves a particular autonomous system. The most common business relationships behind these peerings are provider-to-customer (p2c) and peer-to-peer (p2p). In a p2c relationship, a provider carries a customer's traffic to the wider Internet, and the customer typically advertises its own prefixes and those of its customers. In a p2p relationship, networks of similar scale exchange only their own and their customers' routes, usually on a settlement-free basis. This mix influences latency, traffic locality, and cost.
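These relationships are often summarized as export rules, commonly associated with the Gao-Rexford model. Under these rules, routes learned from customers may be announced to any neighbor, while routes learned from peers or providers are passed only to customers. The Python sketch below encodes that simplified rule. The `Relationship` enum and the `may_export` function are hypothetical names. Real policies also add prefix filters, communities, and exceptions, none of which are modeled here.

```python
from enum import Enum
from typing import Optional


class Relationship(Enum):
    """Business relationship of a neighbor, seen from the local AS (hypothetical model)."""
    CUSTOMER = "customer"  # p2c: we are the provider, the neighbor is our customer
    PEER = "peer"          # p2p: settlement-free peer
    PROVIDER = "provider"  # p2c: the neighbor is our transit provider


def may_export(learned_from: Optional[Relationship], export_to: Relationship) -> bool:
    """Return True if a route may be advertised to a neighbor of type `export_to`.

    `learned_from` is None for prefixes originated by the local AS itself.
    """
    # Our own prefixes and our customers' routes can be announced to every neighbor.
    if learned_from is None or learned_from is Relationship.CUSTOMER:
        return True
    # Routes from peers or providers go only to customers; announcing them to another
    # peer or provider would offer free transit, which is a classic route leak.
    return export_to is Relationship.CUSTOMER
```

A monitoring system can apply the same rule to the announcements it sees a network make and flag violations as possible route leaks.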

Additional entities that affect BGP peers

There are two entities that play an integral role in this process:

  • Internet Exchange Points (IXPs): neutral "meet-me" facilities distributed around the world that make it easy to establish local BGP peerings, shorten paths, and reduce transit costs.
  • PeeringDB: a free, user-maintained interconnection database that serves as the industry directory for networks, facilities, policies, and contacts. It helps operators decide which IXPs to join and which peers to approach.

Monitoring prioritization

With these entities in mind, monitoring should prioritize the following metrics: session stability and flaps, prefix counts and unexpected imports, import and export policy compliance, and per-peer or per-IXP performance metrics such as latency, jitter, and packet loss. This way, leaks or misconfigurations can be identified quickly, and the peering mix can be continuously optimized.

Peering choices are easier to validate in production with an Internet-wide context. Catchpoint Internet Sonar, for example, offers a live, AI-powered view of global Internet health, answering the question: ‘is it us or something else?’ Internet Stack Map builds live dependency maps so you can see when an upstream, IXP, or DNS/CDN issue is the real root cause.

Why are BGP peers important to monitoring BGP?

Today, no organization can connect directly to every network in the world. The cost would be prohibitive, and the Internet keeps growing: every new network would require every other organization to learn about it and connect to it immediately. Instead, the Internet is a decentralized mesh of autonomous systems that must interact through BGP peering to achieve global reachability. Strategic peering is therefore not just a technical necessity but a cornerstone of scalable interconnectivity.

In this context, the integrity and health of each peering session directly affect route availability, path selection, and overall network reachability. 

Because inter-AS policy drives latency, locality, and cost, visibility at the peering layer is non-negotiable. Catchpoint BGP Monitoring provides real-time route visibility and detects attacks and misconfigurations across a global collector set, so teams can spot leaks, hijacks, and reachability regressions before customers do.

BGP peer monitoring best practices

When monitoring BGP peers, network operators must go beyond simply checking for an up or down state. Session behavior needs to be understood, route exchange patterns must be analyzed, and anomalies must be detected early. That is why the best practices outlined in this article are essential; each is explored in greater detail below.

Monitor BGP peer states

BGP sessions follow the rules laid out by the BGP finite state machine (FSM). An FSM is a mathematical model of computation with a finite set of states. It explicitly defines the transitions from one state to another and the conditions that must be met for each transition. BGP peering is best understood in these terms, since BGP peers progress through a defined sequence of states. When all conditions are met, the peers enter the Established state, at which point they exchange routing information.

If conditions are not met, the peers will fail to enter the Established state. Instead, they may transition into other intermediate states within the BGP FSM, reflecting various stages of session negotiation and retry attempts.

BGP's FSM is defined in RFC 4271, the specification of BGP-4 and the definitive reference for how the protocol operates. The following diagram shows the states and the possible transitions between them.

BGP’s finite state machine

Monitoring BGP peer states and the transitions between them is essential for diagnosing the root cause of BGP peering failures. Such monitoring can assess peer health and ensure that acceptable peer transitions are taking place.

Monitoring the state transitions involved in BGP’s FSM also helps with identifying session setup failures and intermittent connectivity issues. Furthermore, it aids in detecting security issues such as TCP RST (reset) attacks. Alerting on unexpected or untimely transitions allows operators and network admins to react before route withdrawal impacts service delivery.
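As an illustration, the sketch below encodes a simplified version of the RFC 4271 state machine and classifies each observed transition, so that a session leaving Established can raise an alert. The `State` enum, the `ALLOWED` table, and `classify_transition` are hypothetical names. The table keeps only the main transitions and assumes the monitor sees every state change. A poller that only samples the state periodically may legitimately observe apparent jumps between states.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPENSENT = "OpenSent"
    OPENCONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# Simplified view of the RFC 4271 (Section 8) transitions; self-loops mean "stayed put".
# Any state can fall back to Idle on an error or a stop event.
ALLOWED = {
    State.IDLE: {State.IDLE, State.CONNECT, State.ACTIVE},
    State.CONNECT: {State.CONNECT, State.ACTIVE, State.OPENSENT, State.IDLE},
    State.ACTIVE: {State.ACTIVE, State.CONNECT, State.OPENSENT, State.IDLE},
    State.OPENSENT: {State.OPENSENT, State.ACTIVE, State.OPENCONFIRM, State.IDLE},
    State.OPENCONFIRM: {State.OPENCONFIRM, State.ESTABLISHED, State.IDLE},
    State.ESTABLISHED: {State.ESTABLISHED, State.IDLE},
}


def classify_transition(old: State, new: State) -> str:
    """Label an observed state change for alerting purposes."""
    if new not in ALLOWED[old]:
        # Not a legal single FSM step: the monitor probably missed intermediate events.
        return "unexpected"
    if old is State.ESTABLISHED and new is not State.ESTABLISHED:
        return "session-down"  # routes learned from this peer will be withdrawn
    if new is State.ESTABLISHED and old is not State.ESTABLISHED:
        return "session-up"
    return "ok"
```

Alerting on `session-down` catches the most damaging event. Counting `unexpected` results also helps reveal gaps in the monitoring pipeline itself.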

Track peer uptime and flap detection

Let’s start by defining some relevant terminology for better understanding:

  • BGP peer flapping refers to frequent up and down transitions of a BGP session between two BGP peers. Such instability can disrupt route propagation, increase CPU load, and cause widespread network performance issues.
  • Route churn describes rapid and repeated changes in routing information within a network, such as frequent updates, withdrawals, and re-advertisements of BGP routes.

BGP is very different from interior gateway protocols (IGPs) such as OSPF and EIGRP, which are designed to converge within seconds or even in under a second. BGP converges far more slowly by design. Update pacing mechanisms such as the Minimum Route Advertisement Interval (MRAI) limit how quickly changes propagate, and global convergence can take anywhere from several minutes to tens of minutes. This deliberate slowness helps avoid routing instability on a global scale. However, it also means that a flapping BGP peer can have severe consequences, causing heavy route churn and widespread performance degradation.

Tracking BGP peer uptime and session resets allows operators to pinpoint the root causes of instability. The detection of persistent flapping should trigger appropriate alerts, leading to an investigation that could potentially reveal problems with physical links, intermediate routers, or upstream policies.
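One minimal way to turn session resets into a flap alert is a per-peer sliding window. In the sketch below, `FlapDetector` is a hypothetical helper. Its thresholds (more than three resets within 15 minutes) are arbitrary examples that should be tuned for each network. A "reset" here means any transition out of the Established state.

```python
from collections import deque
from typing import Deque, Dict


class FlapDetector:
    """Counts BGP session resets per peer inside a sliding time window."""

    def __init__(self, window_s: float = 900.0, max_resets: int = 3):
        self.window_s = window_s      # window length in seconds (example value)
        self.max_resets = max_resets  # resets tolerated inside the window (example value)
        self._resets: Dict[str, Deque[float]] = {}

    def record_reset(self, peer: str, ts: float) -> bool:
        """Record a reset for `peer` at time `ts` (seconds).

        Returns True if the peer should now be treated as flapping.
        """
        events = self._resets.setdefault(peer, deque())
        events.append(ts)
        # Forget resets that are older than the window.
        while events and ts - events[0] > self.window_s:
            events.popleft()
        return len(events) > self.max_resets
```

Because old resets age out of the window, a peer that stabilizes stops alerting on its own. Meanwhile, a burst of resets is reported as soon as it crosses the threshold.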

Observe prefix advertising and reception

Once BGP peers reach the Established state, they begin to exchange routing prefixes, which define reachability and influence path selection. Any significant drop or increase in prefix counts from a particular BGP peer should trigger an alarm. 

Monitoring the prefix exchanges between specific BGP peers helps enforce routing policies. Keeping track of the exchange patterns of advertisements for particular peers helps identify any unusual exchange of prefixes that do not conform to established patterns.
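A per-peer baseline with a tolerance band is one simple way to implement the "significant drop or increase" alarm described above. It also complements the per-neighbor maximum-prefix limits that many router platforms provide. The `PrefixBaseline` class and the `check_prefix_count` function below are hypothetical, and the 20% default tolerance is only an example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrefixBaseline:
    expected: int            # typical number of prefixes received from this peer
    tolerance: float = 0.2   # allowed relative deviation (20% is an arbitrary example)


def check_prefix_count(baseline: PrefixBaseline, received: int) -> Optional[str]:
    """Return an alert message if `received` deviates too far from the baseline, else None."""
    if baseline.expected == 0:
        return None if received == 0 else f"unexpected prefixes: {received} received, none expected"
    deviation = (received - baseline.expected) / baseline.expected
    if deviation > baseline.tolerance:
        # A sudden surge can indicate a route leak (e.g. a peer re-announcing a full table).
        return f"prefix surge: {received} vs ~{baseline.expected} (+{deviation:.0%})"
    if deviation < -baseline.tolerance:
        # A sudden drop can indicate a filtering change or an upstream outage.
        return f"prefix drop: {received} vs ~{baseline.expected} ({deviation:.0%})"
    return None
```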

Validating route origins with the Resource Public Key Infrastructure (RPKI) is also helpful alongside watching prefix exchange patterns between BGP peers, as it protects against origin hijacks and some accidental leaks.
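Route origin validation follows RFC 6811, which defines three outcomes:

  • Valid: some Validated ROA Payload (VRP) covers the route with a matching origin AS and an allowed prefix length.
  • Invalid: the route is covered by at least one VRP, but none of them match.
  • NotFound: no VRP covers the route at all.

The sketch below implements that logic with Python's standard `ipaddress` module. The `VRP` class and `validate_origin` are hypothetical names. In practice, the VRP list would come from an RPKI validator rather than being written by hand.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload: `asn` may originate `prefix` down to `max_length`."""
    prefix: Network
    max_length: int
    asn: int


def validate_origin(route: str, origin_asn: int, vrps: List[VRP]) -> str:
    """Return 'valid', 'invalid', or 'not-found' following RFC 6811 semantics."""
    net = ipaddress.ip_network(route)
    covered = False
    for vrp in vrps:
        # A VRP covers the route if the route is equal to or more specific than its prefix.
        if net.version != vrp.prefix.version or not net.subnet_of(vrp.prefix):
            continue
        covered = True
        # It matches only if the origin AS and the prefix length are both authorized.
        # VRPs for AS 0 never authorize any origin.
        if vrp.asn != 0 and vrp.asn == origin_asn and net.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"
```

For example, with a single VRP for 192.0.2.0/24 (max length 24, AS 64500), a /25 announced by AS 64500 is Invalid because it is more specific than authorized.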

Analyze BGP error codes and notifications

Within BGP's FSM, a session typically progresses through the Idle, Connect (or Active), OpenSent, and OpenConfirm states before reaching Established. When a peer detects an error, such as a malformed message or a misconfiguration, it sends a NOTIFICATION message to the other peer and closes the session.

Each NOTIFICATION message contains an error code and a subcode that identify why the session was closed, whether during session setup or after it was Established. Errors may result from malformed messages, mismatched OPEN parameters (such as an unexpected peer AS number), misconfigurations, or attempts to use unsupported BGP capabilities.

RFC 4271 defines six error codes for NOTIFICATION messages. The first three have several subcodes each; the last three define none in RFC 4271, although later RFCs added some (for example, for Cease):

Error code | Description | Number of subcodes
1 | Message Header Error | 3
2 | OPEN Message Error | 6
3 | UPDATE Message Error | 11
4 | Hold Timer Expired | 0
5 | Finite State Machine Error | 0
6 | Cease | 0

Additional RFCs extend these messages. RFC 8203 allows a Cease NOTIFICATION with the Administrative Shutdown or Administrative Reset subcode to carry a short free-form message explaining why a BGP session was shut down or reset.

RFC 9003 extends this administrative shutdown communication to up to 255 octets of UTF-8 text. This leaves room for more detailed explanations and for languages that need multibyte characters.
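For illustration, the sketch below extracts such a message from the data field of a Cease NOTIFICATION. It assumes the layout defined by RFC 8203 and RFC 9003: a one-octet length followed by that many octets of UTF-8 text, carried with the Administrative Shutdown (2) or Administrative Reset (4) subcode. The function name `shutdown_communication` is hypothetical.

```python
from typing import Optional

CEASE = 6            # NOTIFICATION error code
ADMIN_SHUTDOWN = 2   # Cease subcode
ADMIN_RESET = 4      # Cease subcode


def shutdown_communication(code: int, subcode: int, data: bytes) -> Optional[str]:
    """Return the operator's shutdown message from a Cease NOTIFICATION, if present."""
    if code != CEASE or subcode not in (ADMIN_SHUTDOWN, ADMIN_RESET) or not data:
        return None
    length = data[0]                 # one-octet length prefix
    text = data[1:1 + length]
    if len(text) != length:
        return None                  # truncated or malformed; the caller can log the raw bytes
    # errors="replace" keeps the monitor robust against invalid UTF-8 sent by a peer.
    return text.decode("utf-8", errors="replace")
```

Surfacing this text next to the session-down alert often tells the on-call engineer immediately that a reset was planned maintenance rather than a fault.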

In addition to the error code and subcode, a NOTIFICATION message carries a variable-length Data field that helps diagnose the reason for the NOTIFICATION. The contents of this field depend on the error code and subcode.
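The sketch below decodes a raw NOTIFICATION message as laid out in RFC 4271. The message starts with a 19-octet header: a 16-octet marker of all ones, a 2-octet total length, and a 1-octet type, where type 3 is NOTIFICATION. The header is followed by the error code, the subcode, and the Data field. The function name `parse_notification` and the shape of the returned dictionary are hypothetical. How the raw bytes are obtained (from a router, a monitoring feed, or a packet capture) is left out.

```python
import struct

ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}
NOTIFICATION = 3   # BGP message type code for NOTIFICATION
HEADER_LEN = 19    # marker (16) + length (2) + type (1)


def parse_notification(msg: bytes) -> dict:
    """Decode a raw BGP NOTIFICATION message, header included."""
    if len(msg) < HEADER_LEN + 2:
        raise ValueError("too short for a NOTIFICATION (minimum 21 octets)")
    marker, length, msg_type = struct.unpack("!16sHB", msg[:HEADER_LEN])
    if marker != b"\xff" * 16:
        raise ValueError("bad marker")
    if msg_type != NOTIFICATION:
        raise ValueError(f"not a NOTIFICATION (type {msg_type})")
    if length != len(msg):
        raise ValueError(f"length field {length} != {len(msg)} octets received")
    code, subcode = msg[HEADER_LEN], msg[HEADER_LEN + 1]
    return {
        "code": code,
        "name": ERROR_CODES.get(code, "Unknown"),
        "subcode": subcode,
        "data": msg[HEADER_LEN + 2:],  # interpretation depends on code and subcode
    }
```

Logging the decoded name, the subcode, and the raw data per peer makes it possible to alert on specific failure classes, such as repeated Hold Timer Expired events, rather than only on generic session drops.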

Monitoring BGP peers therefore means capturing the NOTIFICATION messages exchanged between peers and decoding them in real time, so that faults can be quickly identified, diagnosed, and localized.

Monitor BGP peer session security

One of the oldest and most effective attacks against a BGP router is spoofing a BGP peer. In such an attack, a malicious device impersonates a legitimate neighbor and establishes an unauthorized BGP session with a router on the Internet. If it reaches the Established state, and unless the victim strictly filters what it accepts, the malicious peer can advertise virtually any prefix. It can also craft the BGP attributes of those advertisements so they are preferred over those received from legitimate peers.


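Typical mitigations include authenticating the underlying TCP session with the TCP MD5 Signature Option (RFC 2385). A more secure alternative is its successor, the TCP Authentication Option (TCP-AO, RFC 5925), which supports stronger algorithms such as HMAC-SHA-1-96. Both protect the integrity of the session. A complementary mechanism is TTL security, formally the Generalized TTL Security Mechanism (GTSM, RFC 5082), which lets a router discard BGP packets whose TTL shows they originated farther away than the legitimate peer.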

Once established, these safeguards should be monitored for proper operation, and any anomalies detected should trigger alarms, thus helping to preserve the trustworthiness of the BGP session.
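To make the TTL check concrete: under GTSM, each peer sends its BGP packets with a TTL of 255, and every router on the path decrements it by one. A receiver can therefore discard packets whose TTL is lower than 255 minus the number of routers expected between the peers. The sketch below pairs that check with a small configuration audit. `gtsm_accepts` and `audit_peer` are hypothetical helpers, and the authentication labels ("tcp-md5", "tcp-ao") are illustrative. Vendor hop-count settings may be expressed differently from the `intermediate_routers` parameter used here.

```python
from typing import List, Optional

MAX_TTL = 255


def gtsm_accepts(received_ttl: int, intermediate_routers: int = 0) -> bool:
    """GTSM-style check: a legitimate peer sends with TTL 255 and each router in
    between decrements it once, so a lower TTL means the packet came from farther away."""
    if not 0 <= received_ttl <= MAX_TTL:
        raise ValueError("TTL must be in 0..255")
    return received_ttl >= MAX_TTL - intermediate_routers


def audit_peer(name: str, auth: Optional[str], gtsm_enabled: bool) -> List[str]:
    """Return findings for one peer's security settings (hypothetical input schema)."""
    findings = []
    if auth is None:
        findings.append(f"{name}: no TCP session authentication configured")
    elif auth == "tcp-md5":
        findings.append(f"{name}: TCP MD5 in use; consider TCP-AO where both sides support it")
    if not gtsm_enabled:
        findings.append(f"{name}: TTL security (GTSM) disabled")
    return findings
```

Running an audit like this on a schedule and alerting on any new finding helps catch safeguards that were silently removed, for example during a configuration change.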

Recommendations 

Based on the above analysis, we can summarize some of the best practices into specific recommendations to be adhered to for effective BGP peer monitoring:

  • Track state transitions with alerts: Configure alerting for BGP state changes, especially when peers drop out of the Established state.
  • Log and analyze peer uptime: Track how long peer sessions stay up and identify patterns of instability.
  • Set prefix thresholds: Define expected prefix ranges for each peer, and alert on significant deviations.
  • Decode and act on BGP error messages: Use monitoring tools to capture and analyze BGP NOTIFICATION messages for rapid incident response.
  • Ensure that session security is enforced: Validate that TCP MD5 or TCP-AO authentication and TTL security (GTSM) settings are correctly applied and active on critical BGP peers.

Last thoughts

The importance of BGP for the correct and reliable operation of the Internet cannot be overstated. BGP peers, as a fundamental component of BGP, can be thought of as the backbone of Internet routing, and their health is therefore vital to network stability and performance. By monitoring elements of BGP peers, such as session states, uptime, route exchanges, errors, and security mechanisms, network operators can proactively maintain robust BGP connectivity, adding significantly to the reliability and value of their networks.
