Preseem is a Quality of Experience (QoE) monitoring and optimization platform that leverages highly granular and frequent network telemetry to find network problems that negatively affect the subscriber experience. This post outlines a real-world WISP network configuration problem that was detrimental to the subscriber experience and was identified by Preseem.
We are currently working with a WISP customer to roll out Preseem in their network. One of the first things we observed was that the percentage of retransmitted or out-of-order TCP segments in the network was much higher than what is typical for WISP networks.
The chart above shows the percentage of TCP segments that were retransmitted or delivered out-of-order and also shows the dramatic change once the underlying issue was fixed.
The rest of this post uses TCP to explain out-of-order delivery, but note that out-of-order packets negatively impact most other types of traffic as well. In TCP terms, the unit of transmission is called a segment.
TCP, the most common transport protocol, is a reliable byte stream protocol. That is, it delivers an ordered byte stream from one application to another. Since packet networks are inherently lossy, TCP compensates for this by adding a sequence number to every segment sent. This allows the receiving end to ask for missing segments and reconstruct the order if the segments get reordered in the network.
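To make the role of sequence numbers concrete, here is a minimal sketch of how a receiver can rebuild an ordered byte stream from segments that arrive out of order. It is an illustration only, not how any real TCP stack is implemented: the `reassemble` function and the `(seq, payload)` tuple format are assumptions made for this example, and details such as overlapping segments, window limits, and sequence number wraparound are ignored.

```python
def reassemble(segments):
    """Rebuild an ordered byte stream from (seq, payload) segments.

    seq is the byte offset of the payload within the stream (hypothetical
    format for this sketch). Segments may arrive in any order.
    """
    next_seq = 0            # next byte offset the application is waiting for
    pending = {}            # out-of-order segments held until the gap is filled
    delivered = bytearray()

    for seq, payload in segments:
        if seq < next_seq:
            continue        # already delivered, treat as a duplicate
        pending[seq] = payload
        # Deliver as much contiguous data as is now available.
        while next_seq in pending:
            data = pending.pop(next_seq)
            delivered += data
            next_seq += len(data)

    return bytes(delivered)


# The middle segment arrives last; the receiver holds b"o!" until the gap is filled.
stream = reassemble([(0, b"he"), (4, b"o!"), (2, b"ll")])
```

Note that the application cannot see the bytes of a later segment until every earlier byte has arrived, which is why reordering in the network shows up as added latency.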
Being able to quickly respond to loss is critical to TCP performance and is accomplished through techniques like Fast Retransmit. Unfortunately, these and other techniques interact poorly with a network that delivers packets out of order. A quick search finds many articles that explain the impacts of packet reordering on TCP flows. At a high level, packet reordering causes:
- Delayed delivery of data to applications
- Unnecessary retransmissions
The end result is higher application latency and lower throughput to the end subscriber – hence a poorer subscriber experience.
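The unnecessary retransmissions can be illustrated with a simplified model of Fast Retransmit. The sketch below assumes that the receiver sends a duplicate ACK for every segment that arrives ahead of the one it expects, and that the sender retransmits after three duplicate ACKs, which is the conventional threshold. The function name and the use of small integer segment numbers are assumptions made for this example; real stacks add SACK, delayed ACKs, and reordering detection that this model leaves out.

```python
DUP_ACK_THRESHOLD = 3  # conventional Fast Retransmit trigger


def count_spurious_fast_retransmits(arrival_order):
    """Count retransmissions triggered purely by reordering.

    arrival_order lists segment numbers in the order they reach the
    receiver. No segment is lost, so every retransmission counted here
    is unnecessary.
    """
    expected = 0        # next in-order segment the receiver is waiting for
    buffered = set()    # segments received ahead of the expected one
    dup_acks = 0
    retransmits = 0

    for seg in arrival_order:
        if seg == expected:
            expected += 1
            # Segments buffered earlier may now be in order.
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
            dup_acks = 0
        elif seg > expected:
            buffered.add(seg)
            dup_acks += 1       # receiver re-ACKs the segment it still needs
            if dup_acks == DUP_ACK_THRESHOLD:
                retransmits += 1

    return retransmits


in_order = count_spurious_fast_retransmits([0, 1, 2, 3, 4])
mild = count_spurious_fast_retransmits([1, 0, 2, 3, 4])
severe = count_spurious_fast_retransmits([1, 2, 3, 0, 4])
```

In this model, a segment that is overtaken by one later segment causes no retransmission, but a segment overtaken by three later segments is retransmitted even though it was never lost.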
There are two primary causes of packet reordering: route changes and flow unaware parallelism.
Many networks are built with redundant network paths, which is obviously a good thing. When there is a failure or policy change, the path used for communication between two particular end nodes may change. While the network converges on the new path, packets from the same flow can take different paths. This leads to packet reordering at the end host, but it is typically a short-lived phenomenon.
Flow unaware parallelism is typically the source of persistent packet reordering. The canonical example of this is bonded network interfaces between two switches or routers.
Bonding interfaces is often desirable when the total traffic exceeds the capacity of a single link. The problem occurs when the outgoing link of a bonded pair is chosen by a simple algorithm such as round-robin. That is, packet 1 goes to link 1, packet 2 goes to link 2, packet 3 goes to link 1, and so on. In this case, packets from a single flow take different, albeit very similar, paths to the end node. Given variations such as packet size and buffer utilization, this can result in a surprising amount of packet reordering. Even more insidiously, packet reordering can occur within a single device (router, switch) when packets are processed by different processors or NICs. In both cases, the solution is to do flow aware load balancing. That is, all packets for a given flow are sent to a single link or processor. This is often accomplished by hashing fields of the packet header.
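The difference between the two link selection strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the algorithm of any particular switch or bonding driver: the `Packet` type, the two selector functions, and the choice of CRC32 over the five header fields are all hypothetical, chosen only to show the idea.

```python
import zlib
from itertools import count
from typing import NamedTuple


class Packet(NamedTuple):
    """Header fields that identify a flow (hypothetical type for this sketch)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


def round_robin_selector(num_links):
    """Flow unaware: each packet goes to the next link in turn."""
    counter = count()

    def select(packet):
        return next(counter) % num_links

    return select


def flow_hash_selector(num_links):
    """Flow aware: hash the header fields so a flow always maps to one link."""

    def select(packet):
        key = "|".join(str(field) for field in packet).encode()
        return zlib.crc32(key) % num_links

    return select


# Four packets from the same flow crossing a two-link bond.
flow = [Packet("10.0.0.5", "203.0.113.9", 51000, 443, "tcp")] * 4

rr = round_robin_selector(2)
fh = flow_hash_selector(2)
rr_links = [rr(p) for p in flow]   # alternates between the two links
fh_links = [fh(p) for p in flow]   # the same link every time
```

With round-robin, consecutive packets of one flow are split across both links and can overtake each other. With the hash, every packet of the flow follows the same link, so ordering within the flow is preserved while different flows still spread across the bond.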
Indeed, the source of the high percentage of out-of-order TCP segments in this WISP network was bonded Ethernet interfaces that didn’t have consistent flow hashing configured. It took a bit of investigation by the customer to determine the troublesome devices, but Preseem made the problem painfully apparent within minutes of being deployed.
To find out how your network configuration impacts your subscriber quality of experience, contact us for a free 30-day trial of Preseem.