Border Gateway Protocol (BGP) is the de facto standard routing protocol connecting networks together to form the internet. It is widely used in data centers and also in large on-premises networks.
In this troubleshooting article, we share experience-based guidance, tips, and tricks to help you quickly diagnose and fix BGP session issues. By the end, you will understand how to troubleshoot BGP sessions methodically and effectively. Throughout this article, we use Juniper Junos commands and outputs as examples.
Regardless of the BGP use case or configured options, there is a common set of troubleshooting steps we can take to address session issues. In each section, we’ll explain the possible issues and how to verify correctness. The following table provides a quick overview:
BGP Troubleshooting Cheat Sheet
ACLs / Firewalls
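For two BGP peers to communicate and establish a session, the first requirement is IP reachability and TCP reachability on port 179. The BGP daemon listens on TCP port 179. When a BGP speaker is configured to establish a session with a peer, it opens a TCP connection to port 179 on the peer’s IP address and then sends a BGP OPEN message over that connection. Often both speakers initiate a connection at the same time, but only one of these “session requests” wins. Under BGP’s collision detection rules (RFC 4271), the connection initiated by the speaker with the higher BGP Identifier (router ID) is kept, and the other is closed.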
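The collision rule can be sketched in a few lines of Python. This is a simplified model. It assumes IPv4-format BGP Identifiers and distinct router IDs, and `collision_winner` is a hypothetical helper, not a Junos feature.

```python
import ipaddress


def collision_winner(local_id: str, remote_id: str) -> str:
    """Return whose TCP connection survives a BGP connection collision.

    Follows the RFC 4271 (section 6.8) rule: BGP Identifiers are compared as
    unsigned 32-bit integers, and the connection initiated by the speaker with
    the higher identifier is kept; the other connection is closed.
    """
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local == remote:
        # Equal identifiers are outside the scope of this sketch.
        raise ValueError("BGP identifiers must differ")
    return "local" if local > remote else "remote"


# Router IDs are compared numerically, not as text:
# "10.0.0.9" sorts before "9.255.255.255" as a string but is the larger number.
print(collision_winner("10.0.0.9", "9.255.255.255"))  # local
print(collision_winner("192.0.2.1", "192.0.2.2"))     # remote
```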
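Each speaker must permit TCP traffic from its peers with port 179 as either the destination or the source port, because either side may end up initiating the connection. Start troubleshooting by checking reachability with ping x.x.x.x and telnet x.x.x.x port 179.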
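If you want to run the same TCP check from a management host, here is a minimal Python sketch of what the telnet test does. It assumes the host has a route to the peer. The helper name `probe_bgp_port` is hypothetical, and it distinguishes the outcomes that usually matter when troubleshooting.

```python
import socket


def probe_bgp_port(host: str, port: int = 179, timeout: float = 3.0) -> str:
    """Try a TCP handshake to host:port, roughly what 'telnet x.x.x.x port 179' does."""
    try:
        # create_connection performs the full TCP three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A TCP reset came back: the host is reachable, but nothing accepted the connection.
        return "refused"
    except socket.timeout:
        # No answer at all: often a filter silently dropping the packets.
        return "timeout"
    except OSError as exc:
        # Anything else, for example no route to the host.
        return f"error: {exc}"
```

For example, `probe_bgp_port("192.0.2.1")` (a hypothetical peer address) returns "open" only if the handshake completes. A "refused" result often means the remote end is not listening for this session, while a "timeout" more often points to an ACL or firewall.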
A good way of configuring your router’s loopback firewall filter is to use apply-path statements to ensure that all BGP peers are permitted. For example:
Then, in your loopback firewall filter, assuming it ends with a default deny term, you can do this:
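The combination of an apply-path prefix list and a first-match filter can be modeled in Python to show the logic. This sketch rests on simplifying assumptions. The configuration is reduced to a hypothetical dict of group names to neighbor addresses, and `bgp_neighbor_prefixes`, `Packet`, and `filter_action` are illustrative names, not Junos constructs.

```python
import ipaddress
from dataclasses import dataclass


def bgp_neighbor_prefixes(groups: dict) -> list:
    """Roughly what an apply-path prefix list over BGP neighbors yields:
    one host prefix per configured neighbor, across every group."""
    nets = {ipaddress.ip_network(addr) for peers in groups.values() for addr in peers}
    return sorted(nets, key=lambda n: (n.version, n))


@dataclass
class Packet:
    src: str
    protocol: str
    src_port: int
    dst_port: int


def filter_action(pkt: Packet, allowed: list) -> str:
    """First-match evaluation: a BGP term, then a final default-deny term."""
    src = ipaddress.ip_address(pkt.src)
    # Term 1: TCP with port 179 on either side (either peer may have
    # initiated the session), sourced from a configured neighbor.
    if (pkt.protocol == "tcp"
            and 179 in (pkt.src_port, pkt.dst_port)
            and any(src in net for net in allowed)):
        return "accept"
    # Final term: default deny.
    return "discard"


groups = {"transit": ["192.0.2.1", "2001:db8::1"], "ibgp": ["10.0.0.2"]}
allowed = bgp_neighbor_prefixes(groups)
print([str(n) for n in allowed])  # ['10.0.0.2/32', '192.0.2.1/32', '2001:db8::1/128']
print(filter_action(Packet("192.0.2.1", "tcp", 51000, 179), allowed))     # accept
print(filter_action(Packet("198.51.100.9", "tcp", 51000, 179), allowed))  # discard
```

The design point the sketch illustrates is that the permitted list is derived from the BGP configuration itself. A newly added neighbor is therefore covered without editing the filter.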
If the session is configured with authentication, typically TCP MD5 signatures (RFC 2385), the password must match on both sides. Ensure that you’ve configured the same password on both ends, taking into account quotes, special characters, and whitespace. An example configuration is shown here:
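To see why a single stray character breaks the session, here is a minimal Python sketch of the RFC 2385 digest inputs for an IPv4 segment. It is not how Junos computes the digest internally. The SYN header and the key `s3cret` are hypothetical, and for simplicity the header carries no options.

```python
import hashlib
import ipaddress
import struct


def tcp_md5_digest(src_ip: str, dst_ip: str, tcp_header: bytes,
                   payload: bytes, key: bytes, options_len: int = 0) -> bytes:
    """MD5 signature for an IPv4 TCP segment, following the RFC 2385 input order:
    pseudo-header, fixed TCP header (checksum zeroed, options excluded),
    segment data, then the shared key."""
    header = bytearray(tcp_header[:20])
    header[16:18] = b"\x00\x00"  # the checksum is treated as zero
    segment_len = 20 + options_len + len(payload)
    pseudo_header = (ipaddress.IPv4Address(src_ip).packed
                     + ipaddress.IPv4Address(dst_ip).packed
                     + struct.pack("!BBH", 0, 6, segment_len))  # zero, TCP=6, length
    return hashlib.md5(pseudo_header + bytes(header) + payload + key).digest()


# Hypothetical SYN from an ephemeral port to port 179 (data offset 5, SYN flag).
syn = struct.pack("!HHIIBBHHH", 51000, 179, 1, 0, 0x50, 0x02, 65535, 0, 0)
good = tcp_md5_digest("192.0.2.1", "192.0.2.2", syn, b"", b"s3cret")
for candidate in (b"s3cret", b"s3cret ", b'"s3cret"'):
    same = tcp_md5_digest("192.0.2.1", "192.0.2.2", syn, b"", candidate) == good
    print(repr(candidate), "matches" if same else "rejected")
```

A trailing space or literal quote characters produce a completely different digest. The receiver drops segments that fail verification, so the TCP connection never completes.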
When configuring a BGP session, the local router has a system AS number defined, such as AS 65100, and the remote router has its own system AS number, such as AS 65200. The local system AS is known as the local AS, and the remote system AS is known as the peer AS. You must ensure that each router is configured with the correct peer AS for the session. For plain iBGP, the peer AS is the same as the local AS; for eBGP, the two AS numbers differ.
Note: It is possible to use multiple local AS numbers and vary them per session, but this is advanced usage beyond the scope of this article. In general, ensure that the peer AS configured on each router matches the local AS of the router at the other end.
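The peer AS check happens when the OPEN message arrives. Here is a minimal Python sketch of that check. It assumes 2-octet AS numbers only: 4-octet ASNs put AS_TRANS (23456) in the My AS field and carry the real number in a capability, which this sketch ignores. `build_open`, `check_open`, and `session_type` are hypothetical helpers.

```python
import ipaddress
import struct

MARKER = b"\xff" * 16
OPEN = 1


def build_open(my_as: int, hold_time: int, bgp_id: str) -> bytes:
    """Minimal BGP-4 OPEN (RFC 4271) with no optional parameters: 29 bytes."""
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       ipaddress.IPv4Address(bgp_id).packed, 0)
    return MARKER + struct.pack("!HB", 19 + len(body), OPEN) + body


def check_open(msg: bytes, configured_peer_as: int):
    """Return None if the peer's AS is as expected, else the (code, subcode)
    of the NOTIFICATION to send; (2, 2) is OPEN Message Error / Bad Peer AS."""
    if msg[:16] != MARKER:
        return (1, 1)  # Message Header Error / Connection Not Synchronized
    _length, msg_type = struct.unpack("!HB", msg[16:19])
    if msg_type != OPEN:
        raise ValueError("not an OPEN message")
    _version, peer_as, _hold, _bgp_id, _opt_len = struct.unpack("!BHH4sB", msg[19:29])
    return None if peer_as == configured_peer_as else (2, 2)


def session_type(local_as: int, peer_as: int) -> str:
    return "iBGP" if local_as == peer_as else "eBGP"


# AS 65200 sends its OPEN to AS 65100, which expects peer AS 65200.
msg = build_open(65200, 90, "192.0.2.2")
print(len(msg), check_open(msg, 65200), session_type(65100, 65200))  # 29 None eBGP
print(check_open(msg, 65201))  # (2, 2)
```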
To verify the BGP session characteristics, run:
A full output example is available on Juniper’s website, where you can study each attribute. For troubleshooting, it’s usually best to focus on a few key fields, so a more targeted command is:
iBGP sessions are multihop, but eBGP sessions default to a TTL of 1 unless explicitly configured as multihop, because most eBGP sessions are between directly connected neighbors. With multihop, specify a TTL high enough to reach the remote peer even if traffic is rerouted over a longer path. If an eBGP multihop session occasionally goes down with no other obvious cause, the multihop TTL may be too low. You can configure multihop like so:
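The TTL arithmetic behind that advice can be sketched in Python. This is a simplified model. `intermediate_routers` counts only the routers between the two peers, the topology in the example is hypothetical, and `reaches_peer` and `minimum_multihop_ttl` are illustrative helpers.

```python
def reaches_peer(ttl: int, intermediate_routers: int) -> bool:
    """Each router that forwards the packet decrements the TTL and drops the
    packet instead of forwarding it once the TTL would reach zero."""
    remaining = ttl
    for _ in range(intermediate_routers):
        remaining -= 1
        if remaining <= 0:
            return False
    return True


def minimum_multihop_ttl(candidate_paths: list) -> int:
    """Smallest TTL that survives every path, given the number of
    intermediate routers on each (primary and backup) path."""
    return max(candidate_paths) + 1


# Directly connected eBGP peer: TTL 1 is enough with no routers in between.
print(reaches_peer(1, 0))                      # True
# Primary path has 1 intermediate router, backup has 3 (hypothetical topology).
print(reaches_peer(2, 1), reaches_peer(2, 3))  # True False
print(minimum_multihop_ttl([1, 3]))            # 4
```

A TTL sized for the primary path alone works until a failure pushes the session onto the longer backup path. At that point the session drops, which matches the "no other obvious cause" symptom.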
BGP Notification Messages
BGP notifications are error messages defined by the BGP protocol specification and are highly informative about session initiation and session teardown issues. These notification messages will explicitly inform you of configuration issues or configuration changes, such as mismatched AS numbers or the number of routes exchanged exceeding a preconfigured prefix-limit threshold.
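To show how a notification’s error code and subcode map to readable names, here is a minimal Python decoder. The tables are a partial subset taken from RFC 4271 (codes 1 to 6) and RFC 4486 (Cease subcodes); UPDATE error subcodes are omitted. `decode_notification` and `notification` are hypothetical helpers.

```python
import struct

ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}
SUBCODES = {
    (1, 1): "Connection Not Synchronized",
    (1, 2): "Bad Message Length",
    (1, 3): "Bad Message Type",
    (2, 1): "Unsupported Version Number",
    (2, 2): "Bad Peer AS",
    (2, 3): "Bad BGP Identifier",
    (2, 4): "Unsupported Optional Parameter",
    (2, 6): "Unacceptable Hold Time",
    (6, 1): "Maximum Number of Prefixes Reached",
    (6, 2): "Administrative Shutdown",
    (6, 3): "Peer De-configured",
    (6, 4): "Administrative Reset",
    (6, 5): "Connection Rejected",
    (6, 6): "Other Configuration Change",
    (6, 7): "Connection Collision Resolution",
    (6, 8): "Out of Resources",
}


def decode_notification(msg: bytes) -> str:
    """Decode a raw BGP NOTIFICATION (header type 3) into readable text."""
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if msg_type != 3 or length < 21:
        raise ValueError("not a NOTIFICATION message")
    code, subcode = msg[19], msg[20]
    name = ERROR_CODES.get(code, f"Unknown code {code}")
    detail = SUBCODES.get((code, subcode), f"subcode {subcode}")
    return f"{name}: {detail}"


def notification(code: int, subcode: int) -> bytes:
    """Build a minimal 21-byte NOTIFICATION with no data field."""
    return b"\xff" * 16 + struct.pack("!HBBB", 21, 3, code, subcode)


print(decode_notification(notification(6, 1)))  # Cease: Maximum Number of Prefixes Reached
print(decode_notification(notification(2, 2)))  # OPEN Message Error: Bad Peer AS
```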
To look for BGP notification errors or other useful router log messages related to a BGP session, you can issue a command like this:
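When you search logs for a peer address, watch for partial matches: a plain substring search for 10.0.0.1 also matches 10.0.0.10. Here is a minimal Python sketch of a stricter search. It assumes IPv4 peer addresses. The log lines are hypothetical and are not real Junos message formats.

```python
import re


def lines_for_peer(lines, peer_ip: str):
    """Return log lines mentioning peer_ip as a whole IPv4 address.

    The lookarounds reject a digit or dot before the address and a digit
    (or dot-digit) after it, so 10.0.0.1 does not match 10.0.0.10 or 110.0.0.1.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(peer_ip) + r"(?!\.?\d)")
    return [line for line in lines if pattern.search(line)]


# Hypothetical log lines, for illustration only.
log = [
    "bgp: peer 10.0.0.1 received NOTIFICATION code 6 subcode 1",
    "bgp: peer 10.0.0.10 session established",
    "bgp: peer 110.0.0.1 hold timer expired",
    "bgp: peer 10.0.0.1+179 state change Established -> Idle",
]
for line in lines_for_peer(log, "10.0.0.1"):
    print(line)
```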
If the BGP session is up but traffic is not passing as expected, or if you fail to receive a route you expect, the following commands are useful in validating what is or is not sent, received, accepted, and active.
Verify the number of routes sent and received by running this command:
- Sent indicates the number of BGP routes exported or advertised to the peer (neighbor).
- Received specifies the number of BGP routes received from the peer. These received routes are stored in a routing information base (RIB) called the Adj-RIB-In; each BGP peering session has its own Adj-RIB-In.
- Accepted indicates the number of received BGP routes that were permitted by the configured import policy. For example, your import policy may reject bogon routes such as RFC 1918 addresses, so this number is often lower than the received number. Accepted routes are installed in the main inet.0 / inet6.0 RIB.
- Active refers to the number of routes received from the peer and accepted into the main RIB that have also become active routes used for forwarding traffic. Active routes are the routes selected by the route-selection decision process, aka ‘best path’.
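To make the relationship between the counters concrete, here is a minimal Python model of one peer’s routes moving from the Adj-RIB-In to active. It rests on several assumptions. The import policy (reject RFC 1918 space) is hypothetical, and only received, accepted, and active are modeled. Best-path selection is reduced to "highest local preference wins", far simpler than real BGP selection.

```python
import ipaddress
from dataclasses import dataclass

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


@dataclass(frozen=True)
class Route:
    prefix: str
    peer: str
    local_pref: int = 100


def import_policy(route: Route) -> bool:
    """Hypothetical import policy: reject RFC 1918 prefixes, accept the rest."""
    net = ipaddress.ip_network(route.prefix)
    return not (net.version == 4 and any(net.subnet_of(b) for b in RFC1918))


def peer_counters(adj_rib_in: list, other_routes: list) -> dict:
    """Received/accepted/active counts for one peer.

    Simplified best path: highest local preference wins, and ties keep the
    first route seen (this peer's routes are considered first).
    """
    accepted = [r for r in adj_rib_in if import_policy(r)]
    best = {}
    for r in accepted + [r for r in other_routes if import_policy(r)]:
        if r.prefix not in best or r.local_pref > best[r.prefix].local_pref:
            best[r.prefix] = r
    active = [r for r in accepted if best[r.prefix] == r]
    return {"received": len(adj_rib_in), "accepted": len(accepted), "active": len(active)}


from_peer_a = [
    Route("203.0.113.0/24", "A"),
    Route("10.1.0.0/16", "A"),      # RFC 1918: rejected by the import policy
    Route("198.51.100.0/24", "A"),  # loses to peer B's higher local preference
]
from_peer_b = [Route("198.51.100.0/24", "B", local_pref=200)]
print(peer_counters(from_peer_a, from_peer_b))  # {'received': 3, 'accepted': 2, 'active': 1}
```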
You can see the routes sent by running this command:
You can see the routes received, that is, the Adj-RIB-In, by running this command. In this example, the peer is a transit provider, AS 3356, sending a full table:
For example, if you expect to receive a route 172.16.100.0/24 from a peer, but this is not seen in your main RIB via show route 172.16.100.0/24 exact, you can check to see if it was actually received at all. If you do see it received but not in your main RIB, it’s possible that it is being rejected by your import policy.
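The check described above follows a fixed order: received, then accepted, then active. Here is a minimal Python sketch of that triage. It assumes you have already collected each stage as a list of prefix strings, and `where_is_prefix` is a hypothetical helper.

```python
import ipaddress


def where_is_prefix(prefix: str, adj_rib_in, accepted, active) -> str:
    """Walk the same stages as the counters: received -> accepted -> active.

    Prefixes are compared exactly (same network and length), in the spirit
    of 'show route ... exact'.
    """
    target = ipaddress.ip_network(prefix)

    def contains(prefixes) -> bool:
        return target in {ipaddress.ip_network(p) for p in prefixes}

    if not contains(adj_rib_in):
        return "not received: the peer is not sending it (check the session and its export policy)"
    if not contains(accepted):
        return "received but rejected: check your import policy"
    if not contains(active):
        return "accepted but not active: another path won best-path selection"
    return "active"


received = ["172.16.100.0/24", "203.0.113.0/24"]
accepted = ["203.0.113.0/24"]
active = ["203.0.113.0/24"]
print(where_is_prefix("172.16.100.0/24", received, accepted, active))
# received but rejected: check your import policy
```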
Most BGP sessions will negotiate address families or types of Network Layer Reachability Information (NLRIs). Commonly used address families include inet-unicast for plain IPv4 routes and inet6-unicast for plain IPv6 routes; layer 3 VPN routes are carried by inet-vpn unicast and inet6-vpn unicast. Typical configurations might be:
It’s important to know that two BGP speakers must have at least one address family (NLRI type) in common to establish a session. It is not a problem in itself if one speaker has more families configured than the other. Changing the configured families will usually cause the session to bounce.
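Following the rule stated above, family negotiation reduces to a set intersection. Here is a minimal Python sketch. The family strings are plain labels chosen for illustration, and `negotiate_families` is a hypothetical helper.

```python
def negotiate_families(local: set, remote: set) -> set:
    """Return the address families both speakers will use.

    An empty intersection means no session; families configured on only
    one side are simply left unused.
    """
    common = set(local) & set(remote)
    if not common:
        raise ValueError("no address family in common; session will not establish")
    return common


local = {"inet-unicast", "inet6-unicast", "inet-vpn-unicast"}
remote = {"inet-unicast", "inet6-unicast"}
print(sorted(negotiate_families(local, remote)))  # ['inet-unicast', 'inet6-unicast']
```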
Note that BGP sessions rely on IP and TCP transport, so MTU is a factor. Specifically, the negotiated TCP MSS determines the largest TCP segment that can carry BGP messages. BGP keepalives are very small (19 bytes), so a session can come up over a path with mismatched MTUs and exchange keepalives normally. Meanwhile, the larger UPDATE packets carrying route advertisements are dropped.
In this scenario, you sometimes see a “stuck” BGP session, where both ends show some routes sent and received, but the dynamic route exchange gets stuck due to lost update packets. In this case, a route that should be advertised or withdrawn may not be, which can cause suboptimal routing or traffic blackholing.
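The "keepalives pass, updates vanish" pattern can be checked with simple arithmetic. Here is a minimal Python sketch under stated assumptions. It uses IPv4 with no TCP options and treats each message as sent on its own. It also assumes DF is set with no working path MTU discovery, and the 9000/1500 topology is hypothetical.

```python
IPV4_TCP_HEADERS = 40   # 20-byte IPv4 header + 20-byte TCP header, no options
KEEPALIVE_LEN = 19      # a KEEPALIVE is just the 19-byte BGP header
MAX_MESSAGE_LEN = 4096  # maximum BGP message size in RFC 4271


def mss_for_mtu(mtu: int) -> int:
    return mtu - IPV4_TCP_HEADERS


def lost_messages(message_sizes, sender_mss: int, path_mtu: int) -> list:
    """Messages whose largest packet would not fit through the path.

    Each message is split into segments of at most sender_mss bytes; any
    resulting packet larger than path_mtu is dropped.
    """
    lost = []
    for size in message_sizes:
        largest_packet = min(size, sender_mss) + IPV4_TCP_HEADERS
        if largest_packet > path_mtu:
            lost.append(size)
    return lost


# Both peers have 9000-byte interfaces, but a link in the path is 1500.
mss = mss_for_mtu(9000)
print(mss)                                                                        # 8960
print(lost_messages([KEEPALIVE_LEN, MAX_MESSAGE_LEN], mss, 1500))                 # [4096]
print(lost_messages([KEEPALIVE_LEN, MAX_MESSAGE_LEN], mss_for_mtu(1500), 1500))   # []
```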
Ensure that your interface and path MTUs match, and enable mtu-discovery under protocols bgp, your BGP neighbor group, or the BGP peer directly. This ensures that the largest permissible “MTU” (actually, TCP MSS) is used, eliminating mismatches and ensuring optimal route exchange: fewer, larger packets mean more efficient convergence.
To double-check the actual TCP MSS for the BGP session, you can use the following command; note the format, <peer IP>.179:
Here you can see that the negotiated MSS is 8192, which is expected for Junos MTU discovery when the eBGP link MTU is jumbo-sized (IP MTU 9000).
BGP Command Summary
Here’s a summary of useful commands for seeing overall BGP status and digging deep.
The best way to troubleshoot a down or problematic BGP session is to start with the most basic and common issues first. By following the steps below, you should be able to resolve most issues.
- Ensure IP reachability with ICMP ping. In case of problems, check ARP: show arp no-resolve | match <peer IP>. In case of no ARP, check physical connections, VLAN ID, etc.
- Ensure TCP reachability on port 179 with telnet. This can be useful as it doesn’t rely on the BGP session working but verifies that IP/TCP is working.
- If the connection fails, you may have a routing problem or an ACL/firewall problem. Another possibility is that the remote end does not have the BGP session configured and therefore is not listening on TCP port 179.
Various parameters must match on both ends for proper session establishment and communication:
- MTU between routers: mtu-discovery is your friend.
- AS numbers on local and peer routers.
- Authentication, if used.
Other parameters must at least be compatible on both ends for proper session establishment and communication:
- TTL between routers, if using multihop.
- BGP session families/NLRIs: at least one family must match.
- Look in router logs for messages containing the BGP peer’s IP address.
- BGP notification messages are particularly helpful.
- Ensure that you check that routes sent and received are as expected.
- Comparing received vs. accepted routes may highlight a BGP policy issue to resolve.
- In some cases, mismatched NLRIs could pose a problem: ensure that both ends are configured for the desired and compatible set of NLRIs.
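The parameter checks listed above, covering settings that must match and settings that must be compatible, can be condensed into one Python sketch. `SessionConfig` is a hypothetical, simplified view of one side’s settings, not a Junos data model. The TTL check reuses the "TTL must exceed the number of intermediate routers" rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionConfig:
    """Hypothetical, simplified view of one side's session settings."""
    local_as: int
    peer_as: int
    mtu: int
    auth_key: Optional[str]
    ttl: int
    families: frozenset


def session_problems(a: SessionConfig, b: SessionConfig, intermediate_routers: int = 0) -> list:
    problems = []
    # Must match on both ends.
    if a.mtu != b.mtu:
        problems.append("MTU mismatch")
    if a.peer_as != b.local_as or b.peer_as != a.local_as:
        problems.append("peer AS does not match the other side's local AS")
    if a.auth_key != b.auth_key:
        problems.append("authentication key mismatch")
    # Must be compatible.
    if min(a.ttl, b.ttl) <= intermediate_routers:
        problems.append("TTL too low for the path")
    if not (a.families & b.families):
        problems.append("no common address family")
    return problems


r1 = SessionConfig(65100, 65200, 9000, "s3cret", 1, frozenset({"inet-unicast"}))
r2 = SessionConfig(65200, 65100, 9000, "s3cret", 1, frozenset({"inet-unicast", "inet6-unicast"}))
print(session_problems(r1, r2))  # []
r2.auth_key = "s3cret "
print(session_problems(r1, r2))  # ['authentication key mismatch']
```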