BGP Troubleshooting Cheat Sheet

Border Gateway Protocol (BGP) is the de facto standard routing protocol connecting networks together to form the internet. It is widely used in data centers and also in large on-premises networks.

In this troubleshooting article, we share practical guidance, tips, and tricks to help you quickly diagnose and fix BGP session issues. By the end, you will understand how to troubleshoot BGP sessions in a methodical and effective manner. Throughout this article, we use Juniper Junos commands and output as examples.

Regardless of the BGP use case or configured options, there is a common set of troubleshooting steps we can take to address session issues. In each section, we’ll explain the possible issues and how to verify correctness. The following table provides a quick overview:

Topic | What it covers
ACLs / Firewalls | Troubleshoot firewall issues that are preventing BGP sessions from coming up.
Authentication | Ensure that authentication parameters match and allow the session to come up.
AS Numbers | Ensure that local and peer AS numbers are properly configured on both sides.
Multihop TTL | Ensure that multihop sessions have a workable TTL value set.
BGP Notification Messages | Address BGP error messages defined by RFC 4271, pinpointing configuration problems or other issues.
Route Exchange | Learn how to verify routes sent, received, accepted, and active, as well as what these different counters refer to.
MTU Mismatch | How to deal with a common “gotcha”: an MTU (TCP MSS) mismatch that leads to unpredictable route exchange issues.
Common Commands | A list of helpful commands to verify status and troubleshoot issues.


ACLs / Firewalls

For two BGP peers to communicate and establish a session, the first requirement is IP reachability plus TCP reachability on port 179. The BGP daemon listens on TCP port 179, so when a BGP speaker is configured to establish a session with a peer, it opens a TCP connection to the peer’s IP address on port 179 and then sends a BGP OPEN message over that connection. Usually, both speakers try to connect, but only one of these “session requests” will win; the other connection is closed.
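How the winner is chosen is defined in RFC 4271 (section 6.8, connection collision detection): the speakers compare their BGP Identifiers (router IDs) as unsigned 32-bit integers, and the connection initiated by the speaker with the higher identifier is kept. The Python sketch below models only that comparison. The function name collision_winner is hypothetical, and a real implementation applies the rule inside its finite state machine only when it actually detects a collision.

```python
import ipaddress


def collision_winner(local_id: str, remote_id: str) -> str:
    """Say whose TCP connection survives a BGP connection collision.

    Models the RFC 4271 section 6.8 comparison: BGP Identifiers are compared
    as unsigned 32-bit integers, and the connection initiated by the speaker
    with the higher identifier is kept.
    """
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local == remote:
        # Equal IDs fall outside the RFC 4271 rule modeled here (RFC 6286
        # adds an AS-number tie-break), so this sketch refuses them.
        raise ValueError("local and remote BGP Identifiers must differ")
    return "local-initiated" if local > remote else "remote-initiated"


print(collision_winner("10.0.0.2", "10.0.0.1"))  # local-initiated
print(collision_winner("10.0.0.1", "9.0.0.1"))   # local-initiated (numeric, not string, order)
```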

You must permit each speaker to receive traffic from its peers on TCP port 179. Start troubleshooting with ping x.x.x.x and telnet x.x.x.x port 179 to check basic IP and TCP reachability.
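If you want to run the same TCP check from a Linux host or a script instead of the router CLI, the Python sketch below attempts a full TCP handshake to port 179, much like telnet does. The function name bgp_port_reachable and the 3-second default timeout are assumptions for illustration. Keep in mind that the peer’s own firewall filter must permit the host you test from, so a failure from an arbitrary host does not prove that the two routers cannot reach each other.

```python
import socket


def bgp_port_reachable(host: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection resolves the address and completes the TCP
        # three-way handshake; the socket is closed again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable networks all raise
        # OSError subclasses, which are reported as "not reachable".
        return False


# Example (peer address taken from this article):
# bgp_port_reachable("10.10.10.1")
```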

A good way of configuring your router’s loopback firewall filter is to use apply-path statements to ensure that all BGP peers are permitted. For example:

set policy-options prefix-list BGP_NEIGHBORS apply-path "protocols bgp group <*> neighbor <*.*>"
set policy-options prefix-list BGP_NEIGHBORS_IPV6 apply-path "protocols bgp group <*> neighbor <*:*>"

Then, in your loopback firewall filters, assuming each filter ends with a default deny term, you can add:

set firewall family inet filter LOOPBACK term BGP_ALLOW from source-prefix-list BGP_NEIGHBORS
set firewall family inet filter LOOPBACK term BGP_ALLOW from protocol tcp
set firewall family inet filter LOOPBACK term BGP_ALLOW from port bgp
set firewall family inet filter LOOPBACK term BGP_ALLOW then accept

set firewall family inet6 filter LOOPBACK_IPV6 term BGP_ALLOW from source-prefix-list BGP_NEIGHBORS_IPV6
set firewall family inet6 filter LOOPBACK_IPV6 term BGP_ALLOW from payload-protocol tcp
set firewall family inet6 filter LOOPBACK_IPV6 term BGP_ALLOW from port bgp
set firewall family inet6 filter LOOPBACK_IPV6 term BGP_ALLOW then accept
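The BGP_ALLOW term matches only when all of its from conditions are true, and port bgp matches either the source or the destination port. That is why a single term admits both the connections a peer opens to you (destination port 179) and the return traffic of connections you open to a peer (source port 179). The Python model below illustrates that logic only; it is not how Junos evaluates filters internally, and the neighbor list is a hypothetical stand-in for the prefix lists that apply-path generates.

```python
import ipaddress

# Hypothetical contents of BGP_NEIGHBORS / BGP_NEIGHBORS_IPV6: apply-path
# expands each configured neighbor into a host prefix.
NEIGHBOR_PREFIXES = [
    ipaddress.ip_network("10.10.10.1/32"),
    ipaddress.ip_network("2001:dead:beef::1/128"),
]
BGP_PORT = 179


def bgp_allow_matches(src_ip: str, protocol: str, src_port: int, dst_port: int) -> bool:
    """Model of the BGP_ALLOW term: every 'from' condition must match."""
    addr = ipaddress.ip_address(src_ip)
    # Membership tests between IPv4 and IPv6 objects simply return False.
    from_neighbor = any(addr in prefix for prefix in NEIGHBOR_PREFIXES)
    # 'port bgp' matches if EITHER the source or the destination port is 179.
    return from_neighbor and protocol == "tcp" and BGP_PORT in (src_port, dst_port)


print(bgp_allow_matches("10.10.10.1", "tcp", 52896, 179))  # True: peer connects to us
print(bgp_allow_matches("10.10.10.1", "tcp", 179, 52896))  # True: reply on our session
print(bgp_allow_matches("10.10.10.2", "tcp", 52896, 179))  # False: not a neighbor
```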

Authentication

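If the session is configured with authentication (typically a TCP MD5 signature), the key must match on both sides. Ensure that you’ve configured the same key on both ends, taking into account quotes, special characters, and whitespace. Junos displays the configured key in its encrypted $9$ form, so compare the plaintext keys you entered rather than the encrypted strings, which can differ even when the keys are identical. An example configuration is shown here: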

set protocols bgp group TRANSIT neighbor 2001:dead:beef::1 authentication-key "$9$Bfeg3gSHRs4Gsfg"

AS Numbers

When configuring a BGP session, the local router has a system AS number defined, such as AS 65100. The remote router has its own system AS number, for example AS 65200. The local system AS is known as the local AS, and the remote system AS is known as the peer AS. You must ensure that each router is configured with the correct peer AS for the session. For iBGP, the peer AS is the same as the local AS; for eBGP, the two AS numbers differ.

Note: It is possible to use multiple local AS numbers and vary them per session, but this is advanced usage beyond the scope of this article. In general, make sure that the peer AS configured on each router matches the other router’s local AS.
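The requirement above boils down to two cross-checks: each router’s configured peer AS must equal the other router’s local AS. The Python sketch below expresses that check and labels the session as iBGP or eBGP. The BgpSide class and check_as_numbers function are hypothetical names, and the sketch deliberately ignores the multiple-local-AS case mentioned in the note.

```python
from dataclasses import dataclass


@dataclass
class BgpSide:
    local_as: int  # this router's own (system) AS
    peer_as: int   # AS configured for the neighbor on this router


def check_as_numbers(a: BgpSide, b: BgpSide) -> str:
    """Return "iBGP" or "eBGP", or raise ValueError on a peer AS mismatch."""
    if a.peer_as != b.local_as:
        raise ValueError(f"router A expects peer AS {a.peer_as}, but router B is AS {b.local_as}")
    if b.peer_as != a.local_as:
        raise ValueError(f"router B expects peer AS {b.peer_as}, but router A is AS {a.local_as}")
    return "iBGP" if a.local_as == b.local_as else "eBGP"


print(check_as_numbers(BgpSide(65100, 65200), BgpSide(65200, 65100)))  # eBGP
```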

To verify the BGP session characteristics, run:

user@router> show bgp neighbor 10.10.10.1

A full output example is given on Juniper’s website, and you can study each attribute there. For troubleshooting, it’s usually best to draw attention to a few key fields, and so a more targeted command to run is:

user@router> show bgp neighbor 10.10.10.1 | match "description|state:|prefixes|flaps|NLRI"

Multihop TTL

iBGP sessions are multihop by default, but eBGP sessions use a TTL of 1 unless they are explicitly configured as multihop sessions. This is because most eBGP sessions are established between directly connected neighbors. With multihop, set the TTL high enough to reach the remote peer even if traffic is rerouted over a longer path. If an eBGP multihop session occasionally goes down without another obvious cause, the multihop TTL may be too low. You can configure multihop like so:

set protocols bgp group FOO neighbor 192.168.10.10 multihop ttl 8
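To choose a TTL, count the routers between the two peers on the longest path the session might take after a reroute. Each intermediate router decrements the TTL by one and discards the packet rather than forwarding it with a TTL of 0, so the sender needs a TTL of at least one more than the number of routers in between (a directly connected peer needs 1). The Python sketch below, with hypothetical function names, applies that arithmetic.

```python
def min_multihop_ttl(intermediate_routers: int) -> int:
    """Smallest TTL that reaches a peer behind the given number of routers."""
    if intermediate_routers < 0:
        raise ValueError("intermediate_routers must be non-negative")
    # Each router in between decrements the TTL by one and must not forward
    # a packet whose TTL would drop to 0, so one extra hop of budget is needed.
    return intermediate_routers + 1


def ttl_survives_reroutes(ttl: int, path_lengths: list[int]) -> bool:
    """True if ttl is large enough for every path length (in routers) listed."""
    return all(ttl >= min_multihop_ttl(n) for n in path_lengths)


print(min_multihop_ttl(0))               # 1: directly connected eBGP peer
print(ttl_survives_reroutes(3, [2, 5]))  # False: the 5-router backup path needs 6
print(ttl_survives_reroutes(8, [2, 5]))  # True
```

For example, if the normal path between two loopbacks crosses 2 routers but a backup path crosses 5, a TTL of 3 works only until the reroute, while the ttl 8 from the configuration above covers both paths.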

BGP Notification Messages

BGP notifications are error messages defined by the BGP protocol specification and are highly informative about session initiation and session teardown issues. These notification messages will explicitly inform you of configuration issues or configuration changes, such as mismatched AS numbers or the number of routes exchanged exceeding a preconfigured prefix-limit threshold. 

Example, in which the peer’s OPEN message claims AS 65111 but the local configuration expects AS 64512:

rpd[82483]: bgp_process_open:8311: NOTIFICATION sent to 2001:dead:beef::1 (External AS 64512): code 2 (Open Message Error) subcode 2 (bad peer AS number), Reason: peer 2001:dead:beef::1 (External AS 64512) claims 65111

To look for BGP notification errors or other useful router log messages related to a BGP session, you can issue a command like this:

user@router> show log messages | match 10.10.10.1
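NOTIFICATION messages carry a numeric error code and subcode, which Junos also prints in words. The Python sketch below extracts the code and subcode from a log line in the format of the example above and maps them to names from RFC 4271 (error codes and OPEN subcodes), RFC 5492 (Unsupported Capability), and RFC 4486 (Cease subcodes). Code 6 subcode 1 is the one associated with an exceeded prefix limit. The regular expression assumes the log format shown above, the subcode table is deliberately partial, and the function name decode_notification is hypothetical.

```python
import re

# Error codes from RFC 4271 section 4.5.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Partial subcode table: OPEN (RFC 4271, RFC 5492) and Cease (RFC 4486).
SUBCODES = {
    (2, 1): "Unsupported Version Number",
    (2, 2): "Bad Peer AS",
    (2, 3): "Bad BGP Identifier",
    (2, 4): "Unsupported Optional Parameter",
    (2, 6): "Unacceptable Hold Time",
    (2, 7): "Unsupported Capability",
    (6, 1): "Maximum Number of Prefixes Reached",
    (6, 2): "Administrative Shutdown",
    (6, 3): "Peer De-configured",
    (6, 4): "Administrative Reset",
    (6, 5): "Connection Rejected",
    (6, 6): "Other Configuration Change",
    (6, 7): "Connection Collision Resolution",
    (6, 8): "Out of Resources",
}

# Direction ("sent to" / "received from"), peer address, then code and subcode.
NOTIFICATION_RE = re.compile(r"NOTIFICATION (\w+ \w+) (\S+) .*?\bcode (\d+).*?\bsubcode (\d+)")


def decode_notification(log_line: str) -> dict:
    """Pull code/subcode out of a NOTIFICATION log line and name them."""
    m = NOTIFICATION_RE.search(log_line)
    if m is None:
        raise ValueError("no NOTIFICATION found in log line")
    code, subcode = int(m.group(3)), int(m.group(4))
    return {
        "direction": m.group(1),
        "peer": m.group(2),
        "code": code,
        "subcode": subcode,
        "error": ERROR_CODES.get(code, "Unknown"),
        "detail": SUBCODES.get((code, subcode), "Unknown"),
    }


line = ("rpd[82483]: bgp_process_open:8311: NOTIFICATION sent to 2001:dead:beef::1 "
        "(External AS 64512): code 2 (Open Message Error) subcode 2 (bad peer AS number), "
        "Reason: peer 2001:dead:beef::1 (External AS 64512) claims 65111")
info = decode_notification(line)
print(info["error"], "/", info["detail"])  # OPEN Message Error / Bad Peer AS
```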

Route Exchange

If the BGP session is up but traffic is not passing as expected, or if you fail to receive a route you expect, the following commands are useful in validating what is or is not sent, received, accepted, and active.

Verify the number of routes sent and received by running this command:

user@router> show bgp neighbor 10.10.10.1 | match prefixes
    Active prefixes:              876825
    Received prefixes:            879399
    Accepted prefixes:            877055
    Advertised prefixes:          23

Note that:

  • Advertised (sent) indicates the number of BGP routes exported or advertised to the peer (neighbor).
  • Received specifies the number of BGP routes received from the peer. These received routes are stored in a routing information base (RIB) called the Adj-RIB-In. Each BGP peering session has its own Adj-RIB-In.
  • Accepted indicates the number of received BGP routes that were permitted by the configured import policy. For example, your import policy may reject bogon routes such as RFC 1918 addresses, so this number is often lower than the received number. These routes are accepted into the main inet.0 / inet6.0 RIB.
  • Active refers to the number of routes that have been received from the peer, accepted into the main RIB, and then become active routes used for forwarding traffic. Active routes are the routes chosen by the route-selection decision process, also known as the best path.
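The differences between these counters are often more telling than the counters themselves: received minus accepted roughly corresponds to routes your import policy rejected, and accepted minus active is the number of accepted routes that lost best-path selection to another path. The Python sketch below parses the sample output shown above and computes both. The parse_prefix_counts name is hypothetical, and the regular expression assumes the field layout shown and a single routing table in the output.

```python
import re

# Sample from the 'show bgp neighbor 10.10.10.1 | match prefixes' output above.
SAMPLE = """\
    Active prefixes:              876825
    Received prefixes:            879399
    Accepted prefixes:            877055
    Advertised prefixes:          23
"""

PREFIX_RE = re.compile(r"^[ \t]*(\w+) prefixes:[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def parse_prefix_counts(output: str) -> dict[str, int]:
    """Map 'active', 'received', 'accepted' and 'advertised' to their counts.

    Assumes the output covers a single routing table; with several tables,
    later values overwrite earlier ones.
    """
    return {m.group(1).lower(): int(m.group(2)) for m in PREFIX_RE.finditer(output)}


counts = parse_prefix_counts(SAMPLE)
received_not_accepted = counts["received"] - counts["accepted"]  # 2344
accepted_not_active = counts["accepted"] - counts["active"]      # 230
```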

You can see the routes sent by running this command:

user@router> show route advertising-protocol bgp 10.10.10.1

inet.0: 881772 destinations, 4634165 routes (879214 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
* 10.100.0.0/16           Self                 0                  I
* 10.200.0.0/16           Self                 0                  I
* 172.16.31.0/24          Self                 0                  I
<etc.>

You can see the routes received from the peer (its Adj-RIB-In) by running this command. In this example, the session is a full-table transit provider feed from AS 3356:

user@router> show route receive-protocol bgp 10.10.10.1 table inet.0

inet.0: 881772 destinations, 4634165 routes (879214 active, 0 holddown, 0 hidden)
  Prefix        Nexthop      MED     Lclpref    AS path
* 1.0.0.0/24    10.10.10.1                      3356 4777 13335 I
* 1.0.4.0/22    10.10.10.1                      3356 4777 6939 4826 38803 I
* 1.0.4.0/24    10.10.10.1                      3356 4777 6939 4826 38803 I

For example, if you expect to receive the route 172.16.100.0/24 from a peer but do not see it in your main RIB with show route 172.16.100.0/24 exact, use the receive-protocol command above to check whether it was received at all. If it was received but is not in your main RIB, your import policy may be rejecting it.

Most BGP sessions negotiate the address families, or types of Network Layer Reachability Information (NLRI), that they will carry. Commonly used families include inet unicast for plain IPv4 routes and inet6 unicast for plain IPv6 routes; layer 3 VPN routes are carried by inet-vpn unicast and inet6-vpn unicast. Typical configurations might be:

set protocols bgp group SOMEGROUP family inet unicast
set protocols bgp group SOMEGROUP_IPV6 family inet6 unicast

or:

set protocols bgp group ANOTHERGROUP family inet any

It’s important to know that two BGP speakers must have at least one NLRI in common in order to establish a session. Sometimes one speaker has more families configured than the other, but this in itself is not a problem. Changing configured families will usually result in the session bouncing.
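The family rule can be checked with simple set operations: the families both speakers configured are the ones the session can carry, and families configured on only one side are simply not exchanged. In the Python sketch below, families are represented as strings using the Junos keywords, which is an assumption for illustration, and family_report and session_can_establish are hypothetical names.

```python
def family_report(local: set[str], remote: set[str]) -> dict[str, set[str]]:
    """Split configured families into shared and one-sided sets."""
    return {
        "exchanged": local & remote,    # both speakers can use these
        "local_only": local - remote,   # configured here, not on the peer
        "remote_only": remote - local,  # configured on the peer, not here
    }


def session_can_establish(local: set[str], remote: set[str]) -> bool:
    # At least one family in common is needed for the session to come up.
    return bool(local & remote)


report = family_report({"inet unicast", "inet6 unicast"}, {"inet unicast"})
print(report["exchanged"])   # {'inet unicast'}
print(report["local_only"])  # {'inet6 unicast'}
print(session_can_establish({"inet6 unicast"}, {"inet unicast"}))  # False
```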

MTU/MSS Mismatch

Note that BGP sessions rely on IP and TCP transport, so MTU is a factor. Specifically, the TCP MSS negotiated for the session determines the largest TCP segment that can carry BGP messages. Because BGP keepalives are very small (a KEEPALIVE message is just the 19-byte BGP header), a session can come up over a path with mismatched MTUs and keepalives will work fine, while the larger segments carrying route updates are dropped.

In this scenario, you sometimes see a “stuck” BGP session, where both ends show some routes sent and received, but the dynamic route exchange gets stuck due to lost update packets. In this case, a route that should be advertised or withdrawn may not be, which can cause suboptimal routing or traffic blackholing.
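The arithmetic behind a “stuck” session is straightforward. The largest TCP payload (MSS) that fits a given MTU is the MTU minus the IP header (20 bytes for IPv4, 40 for IPv6) and the TCP header (20 bytes), assuming no options. The Python sketch below, with hypothetical names, shows why a 19-byte KEEPALIVE crosses a 1500-byte link while a full-size segment sized for a 9000-byte MTU does not. It ignores TCP options such as the MD5 signature option used for BGP authentication, which reduce the usable payload further.

```python
IPV4_HEADER = 20    # bytes, no IP options
IPV6_HEADER = 40    # bytes, no extension headers
TCP_HEADER = 20     # bytes, no TCP options
KEEPALIVE_LEN = 19  # a BGP KEEPALIVE is just the 19-byte message header


def tcp_mss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER


def segment_fits(payload_len: int, path_mtu: int, ipv6: bool = False) -> bool:
    """True if a TCP segment carrying payload_len bytes fits the path MTU."""
    return payload_len <= tcp_mss(path_mtu, ipv6)


# Hypothetical stuck session: both routers have 9000-byte interface MTUs, but a
# layer-2 segment in between only carries 1500 bytes and silently drops the rest.
full_segment = tcp_mss(9000)              # 8960
print(segment_fits(KEEPALIVE_LEN, 1500))  # True: keepalives get through
print(segment_fits(full_segment, 1500))   # False: full-size UPDATE segments are lost
```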

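Ensure that your interface and path MTUs match, and enable mtu-discovery (TCP path MTU discovery) under protocols bgp, under your BGP group, or directly on the BGP neighbor. This lets TCP use the largest segment size the path actually supports (strictly speaking, it is the TCP MSS rather than the MTU that is adjusted), which avoids mismatches and speeds up route exchange: fewer, larger packets mean more efficient convergence. Path MTU discovery relies on ICMP “fragmentation needed” (IPv4) and “packet too big” (IPv6) messages, so make sure your filters do not drop them.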

Example:

set protocols bgp group TRANSIT mtu-discovery
set protocols bgp group TRANSIT_IPV6 mtu-discovery

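To double-check the actual TCP MSS for the BGP session, you can use the following command. Note that addresses appear in the format <IP address>.<port>, and depending on which router initiated the TCP connection, port 179 can appear on either the local or the foreign address: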

user@router> show system connections extensive | match "10.10.10.1.179|mss"
<snip>
tcp4       0      0  10.10.10.1.179                               10.10.10.2.52896                             ESTABLISHED
    rttmin:       1000  mss:       8192        jlocksmode:          1

Here you can see that the negotiated MSS is 8192, which is expected for Junos MTU discovery when the eBGP link MTU is jumbo-sized (IP MTU 9000).
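With the match filter above, the mss line of every TCP connection on the router is printed, not only the BGP one, so the BGP value has to be read from the detail line that follows the matching connection. The Python sketch below does that pairing on saved output. The function name bgp_mss_by_connection is hypothetical, and it assumes each connection line is followed by a detail line containing mss:, as in the sample output above.

```python
import re

# Connection line: proto, Recv-Q, Send-Q, local address, foreign address, state.
CONN_RE = re.compile(r"^tcp[46]?\s+\d+\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)")
MSS_RE = re.compile(r"\bmss:\s+(\d+)")


def bgp_mss_by_connection(output: str) -> dict[tuple[str, str], int]:
    """Map (local, foreign) address pairs of BGP connections to their MSS."""
    result: dict[tuple[str, str], int] = {}
    current = None
    for line in output.splitlines():
        conn = CONN_RE.match(line)
        if conn:
            local, foreign = conn.group(1), conn.group(2)
            # Keep only connections with port 179 on either end.
            is_bgp = local.endswith(".179") or foreign.endswith(".179")
            current = (local, foreign) if is_bgp else None
            continue
        mss = MSS_RE.search(line)
        if mss and current is not None:
            result[current] = int(mss.group(1))
            current = None
    return result


sample = (
    "tcp4       0      0  10.10.10.1.179       10.10.10.2.52896      ESTABLISHED\n"
    "    rttmin:       1000  mss:       8192        jlocksmode:          1\n"
)
print(bgp_mss_by_connection(sample))  # {('10.10.10.1.179', '10.10.10.2.52896'): 8192}
```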

BGP Command Summary

Here’s a summary of useful commands for seeing overall BGP status and digging deep.

# get basic BGP session info
show bgp summary
show bgp group <group name>
show bgp neighbor <peer IP>

# find error messages
show log messages | match bgp | last 100

# find TCP MSS size for session
show system connections extensive | match "179|mss"

# see routes sent/received
show route receive-protocol bgp <peer IP>
show route advertising-protocol bgp <peer IP>

# clear BGP sessions
clear bgp neighbor <peer IP> soft            # outbound: re-apply export policy and resend routes
clear bgp neighbor <peer IP> soft-inbound    # inbound only: re-apply import policy
clear bgp neighbor <peer IP>                 # hard reset: tear down and re-establish the session

Summary

You can see that the best way to troubleshoot a downed or problematic BGP session is to ensure that you start with the most basic and common issues first. By following these steps, you should be able to resolve most issues. 

Reachability

  • Ensure IP reachability with ICMP ping. In case of problems, check ARP: show arp no-resolve | match <peer IP>. In case of no ARP, check physical connections, VLAN ID, etc.
  • Ensure TCP reachability on port 179 with telnet. This can be useful as it doesn’t rely on the BGP session working but verifies that IP/TCP is working.
  • In case of issues, you may have a routing problem or an ACL/firewall problem. Another possibility is that the remote end does not have the BGP session configured and therefore may not be listening on TCP port 179.

Matching Configuration

Various parameters must match on both ends for proper session establishment and communication:

  • MTU between routers: mtu-discovery is your friend.
  • AS numbers on local and peer routers.
  • Authentication, if used.

Other parameters must at least be compatible on both ends for proper session establishment and communication:

  • TTL between routers, if using multihop.
  • BGP session families/NLRIs: at least one family must match.

Error Messages

  • Look in router logs for messages containing the BGP peer’s IP address.
  • BGP notification messages are particularly helpful.

Route Exchange

  • Check that the routes sent and received are as expected.
  • Comparing received vs. accepted routes may highlight a BGP policy issue to resolve.
  • In some cases, mismatched NLRIs could pose a problem: Ensure that both ends are configured for the desired and compatible set of NLRIs.
