1. Latency is the time a packet takes to travel from Point A to Point B.
2. Latency is typically expressed in milliseconds (ms).
3. The contributors to network latency include:
- Propagation Time
- Transmission Media
- Processing at L3 Devices or Firewalls
- Storage Delays at Switches or Bridges
4. Usually, latency is measured using network tools such as the ping and traceroute utilities.
5. The ping command measures how long it takes for a data packet (usually 32 bytes) to leave the source computer, travel to the destination computer, and return to the source. This round-trip time (RTT) is measured in milliseconds.
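As a small illustration, the round-trip time can be pulled out of a ping reply line with standard shell tools. The reply line below is a fabricated sample in the Windows ping output format, and 10.10.10.1 is a hypothetical server address:

```shell
# Extract the round-trip time from a sample Windows-style ping reply line.
# The line is a made-up example; 10.10.10.1 is a hypothetical address.
line="Reply from 10.10.10.1: bytes=32 time=2ms TTL=64"
rtt_ms=$(printf '%s\n' "$line" | sed -n 's/.*time=\([0-9][0-9]*\)ms.*/\1/p')
echo "round-trip time: ${rtt_ms} ms"
```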
Demonstrate and troubleshoot a latency issue in a network.
| Parameter | Value |
|---|---|
| Windows Client Machine IP address | 10.10.10.100 |
| Acceptable latency between Client & Server communication | ~1-2 ms |
When we face a latency issue and suspicion points to the next L3 device, say a Firewall, we can follow the approach below to troubleshoot:
- Initiate the traffic and record the Latency on the Client Machine.
- Capture on the Ingress Interface of L3 device.
- Capture on the Egress Interface of L3 device.
- Calculate the time the L3 device took to process the packet.
- Record the time taken to receive the Server's response to a request sent out of the egress interface of the L3 device.
1) Send ping traffic from the Client machine to the Server and record the latency. Under normal conditions, the latency is around 2 ms.
2) Capture the packets on the Ingress Interface (eth0) of the Firewall using either tcpdump or fw monitor utility.
Let’s consider the highlighted ping request and reply packets. The request packet arrived on the Ingress interface at 15:21:43.429240. The reply packet left the Ingress interface towards the Client machine at 15:21:43.430064.
3) Capture the packets on the Egress Interface (eth1) of the Firewall using either tcpdump or fw monitor utility.
After Firewall processing, the request packet left the Egress interface at 15:21:43.429460. The reply packet coming from the Server hit the Egress interface at 15:21:43.429927.
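The ingress and egress captures can be taken with tcpdump; a minimal sketch, assuming the interface names and the client IP from this lab (run as root on the Firewall, one capture per terminal):

```shell
# Capture ICMP traffic to/from the client on both firewall interfaces.
# -n skips name resolution; the default timestamps carry microsecond precision.
tcpdump -ni eth0 icmp and host 10.10.10.100   # ingress capture
tcpdump -ni eth1 icmp and host 10.10.10.100   # egress capture
```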
4) Let’s do the math for the processing time on the Firewall:
- Time the Firewall took to process the ping request packet: (15:21:43.429460) - (15:21:43.429240) = 220 microseconds (0.22 ms).
- Time the Firewall took to process the ping reply packet: (15:21:43.430064) - (15:21:43.429927) = 137 microseconds (0.137 ms).
- Total Firewall processing time for a round-trip ping: 0.22 ms + 0.137 ms = 0.357 ms.
5) The total time taken by the Server (or any other intermediate device beyond the Firewall) to process and reply to the request can be calculated from the capture on the Egress Interface (eth1): the difference between the request going out and the reply coming back in.
So, in our case: 15:21:43.429927 - 15:21:43.429460 = 467 microseconds (0.467 ms).
6) In this scenario, the total time taken by the Firewall and the Server to process and reply to the Client machine is within the expected latency (0.357 ms + 0.467 ms = 0.824 ms).
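The arithmetic in steps 4-6 can be cross-checked with a small shell helper that converts the capture timestamps to microseconds (timestamps as recorded on eth0 and eth1 above):

```shell
# delta_us prints the difference between two HH:MM:SS.ffffff capture timestamps
# in microseconds (assumes both fall in the same day, 6-digit fractional part).
delta_us() {
  awk -v a="$1" -v b="$2" '
    function us(t,  p, q) {
      split(t, p, ":"); split(p[3], q, ".")
      return (p[1] * 3600 + p[2] * 60 + q[1]) * 1000000 + q[2]
    }
    BEGIN { print us(b) - us(a) }'
}
req=$(delta_us 15:21:43.429240 15:21:43.429460)  # request: in on eth0 -> out on eth1
rep=$(delta_us 15:21:43.429927 15:21:43.430064)  # reply: in on eth1 -> out on eth0
srv=$(delta_us 15:21:43.429460 15:21:43.429927)  # server: out on eth1 -> back in on eth1
echo "firewall: $((req + rep)) us, server: ${srv} us, total: $((req + rep + srv)) us"
```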
1) To simulate this scenario, I used the built-in Linux Traffic Control (tc) utility with the Network Emulator (netem) qdisc.
2) To apply a delay of 500 ms to packets leaving the Egress Interface (eth1), run the following command on the Firewall:
# tc qdisc add dev eth1 root netem delay 500ms
To revert the change (i.e., remove the delay from the interface), run:
# tc qdisc del dev eth1 root
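To verify whether the netem delay is currently applied, the qdisc configuration on the interface can be listed:

```shell
# Show the qdiscs attached to eth1; a netem entry with "delay 500ms"
# confirms the emulated latency is in place.
tc qdisc show dev eth1
```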
1) Send ping traffic from the Client machine to the Server and record the latency.
There is a latency of around 500 ms in the Client-Server communication. We need to investigate what is causing it.
2) Capture the packets on the Ingress Interface (eth0) of the Firewall using the tcpdump utility.
3) Capture the packets on the Egress Interface (eth1) of the Firewall.
4) Calculate the Firewall processing time:
- Request ping packet: (15:58:07.569066) - (15:58:07.068472) = 500594 microseconds (500.594 ms).
- Reply ping packet: (15:58:07.569355) - (15:58:07.569263) = 92 microseconds (0.092 ms).
- Total Firewall processing time for a round-trip ping: 500.594 ms + 0.092 ms = 500.686 ms.
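These deltas can be cross-checked by recomputing them directly from the capture timestamps with a small shell helper:

```shell
# delta_us prints the difference between two HH:MM:SS.ffffff capture timestamps
# in microseconds (assumes both fall in the same day, 6-digit fractional part).
delta_us() {
  awk -v a="$1" -v b="$2" '
    function us(t,  p, q) {
      split(t, p, ":"); split(p[3], q, ".")
      return (p[1] * 3600 + p[2] * 60 + q[1]) * 1000000 + q[2]
    }
    BEGIN { print us(b) - us(a) }'
}
req=$(delta_us 15:58:07.068472 15:58:07.569066)  # request processing on the firewall
rep=$(delta_us 15:58:07.569263 15:58:07.569355)  # reply processing on the firewall
echo "request: ${req} us, reply: ${rep} us, firewall total: $((req + rep)) us"
```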
Clearly, there is an issue with the Firewall, which is taking much longer than expected to process the communication between the Client and the Server. Common reasons for this behavior are:
- High utilization of resources (RAM / CPU) on the Firewall.
- Additional inspection features enabled on the Firewall.
- An undersized interface ring (buffer) size.
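For the last point, the interface's current and maximum ring (buffer) sizes can be inspected with ethtool; a minimal sketch, assuming eth1 on the Firewall:

```shell
# Show the hardware ring-buffer (RX/TX) sizes configured on eth1,
# along with the pre-set maximums supported by the NIC.
ethtool -g eth1
```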
1) To simulate this scenario, I used a Windows tool called Clumsy.
2) I configured Clumsy on the Server side to add a latency of 500 ms to any request sent to it.
1) Configure the Server machine, using the Clumsy utility, to add a delay of 500 ms to communication from the Client.
2) Send ping traffic from the Client machine to the Server and record the latency.
There is a latency of around 500 ms in the Client-Server communication. Let’s investigate what is causing it using the tcpdump utility.
3) Capture the packets on the Ingress Interface (eth0) of the Firewall using the tcpdump utility.
4) Capture the packets on the Egress Interface (eth1) of the Firewall.
5) Calculate the Firewall processing time:
- Request ping packet: (22:56:45.384798) - (22:56:45.384575) = 223 microseconds (0.223 ms).
- Reply ping packet: (22:56:45.893135) - (22:56:45.892886) = 249 microseconds (0.249 ms).
- Total Firewall processing time for a round-trip ping: 0.223 ms + 0.249 ms = 0.472 ms.
There is no issue with the Firewall’s processing. Let’s see what happens on the Server side.
6) Calculate the time taken by the Server (or any other device beyond the Firewall) to respond to a request sent out of the Firewall’s Egress Interface (eth1):
- Time between the request leaving eth1 and the reply arriving back at eth1: (22:56:45.892886) - (22:56:45.384798) = 508088 microseconds (508.088 ms).
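These numbers can be cross-checked with a small shell helper that diffs the capture timestamps (values as recorded on eth0 and eth1 above):

```shell
# delta_us prints the difference between two HH:MM:SS.ffffff capture timestamps
# in microseconds (assumes both fall in the same day, 6-digit fractional part).
delta_us() {
  awk -v a="$1" -v b="$2" '
    function us(t,  p, q) {
      split(t, p, ":"); split(p[3], q, ".")
      return (p[1] * 3600 + p[2] * 60 + q[1]) * 1000000 + q[2]
    }
    BEGIN { print us(b) - us(a) }'
}
req=$(delta_us 22:56:45.384575 22:56:45.384798)  # request: in on eth0 -> out on eth1
rep=$(delta_us 22:56:45.892886 22:56:45.893135)  # reply: in on eth1 -> out on eth0
srv=$(delta_us 22:56:45.384798 22:56:45.892886)  # server: out on eth1 -> back in on eth1
echo "firewall: $((req + rep)) us, server: ${srv} us"
```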
So, the latency observed at the Client side was due to the delayed response from the Server or from devices placed after the Firewall.
Server-side processing, or the devices sitting between the Firewall and the Server, might add latency for the following possible reasons:
- High utilization of resources (RAM / CPU) on the Server side.
- A large number of simultaneous connections being handled by the Server, etc.