Softpanorama
May the source be with you, but remember the KISS principle ;-)

Contents Bulletin Scripting in shell and Perl Network troubleshooting History Humor

TCP Performance Tuning

News

Commercial Linuxes

Recommended Books

Recommended Links

Performance tuning

Kernel parameters tuning on Linux Performance Monitoring
 tcpdump iptraf netstat ntop nfsstat lsof vmstat
Disk subsystem tuning Linux Kernel Tuning Linux Virtual Memory Subsystem Tuning TCP performance tuning NFS performance tuning strace sar

Troubleshooting Linux Performance

Linux performance bottlenecks

Linux Swap filesystem

VMware Virtualization Humor Etc

Notes:


Top updates

Bulletin Latest Past week Past month
Google Search


Old News

[Jan 26, 2011] UNIX network performance analysis

Adapted from UNIX network performance analysis by Martin Brown, published at developerWorks Sep 08, 2009
 

The performance of your network can have a significant impact on the general performance and reliability of the rest of your environment. If different applications and services are waiting for data over the network, or your clients are having trouble connecting or receiving the information, then you need to address these issues.

Performance issues can also affect the reliability of your applications and environment, and can both be triggered by network faults, and in some cases they can even be the reason for a network fault. To understand and diagnose network issues, you first need to understand the nature of the issue; usually the problem will be related either to a latency or a bandwidth issue.

In general, network performance issues are often tied to the underlying hardware; you cannot exceed the physical limits of the network environment.

This article looks at the following steps involved in identifying performance issues:

Understanding network metrics

To understand and diagnose performance issues, you first need to determine your baseline performance level. Let's first introduce two of the key concepts used in determining baseline performance: network latency and network bandwidth.

Network latency

The network latency is the time between sending a request to a destination and the destination actually receiving the sent packet. As a metric for network performance, increased latency is a good indicator of a busy network, as it either indicates that the number of packets being transmitted exceeds the capacity, or that the senders of data are having to wait before either transmission or re-transmission.

Network latency can also be introduced when the complexity of the network and the number of hosts or gateways that a packet has to travel through increases. The length of cable between points can also have an effect on the latency. For long distances, traditional copper cable will always be slower than using a fibre optic connection.

Network latency is also different from application latency. Network latency deals exclusively with the transmission of packets over the network, while application latency refers to the delay between the application receiving a request and its ability to respond.

Network bandwidth

Bandwidth is a measure of the number of packets that can be transmitted over a network during a specific period of time. The bandwidth affects how much data can be transmitted, and will either limit the transmission of data to one host to the practical maximum supported by the network connection, or will limit the aggregate transmission rate when dealing with multiple simultaneous connections.

The network bandwidth should, in theory, never change, unless you change the networking interface and hardware. The major variable within network bandwidth is in the number of hosts using the network at any given time.

For example, a 1GB Ethernet interface can talk 1GB to one other network host, 100MB to ten simultaneous hosts, or 10MB to 100 hosts. In reality, of course, the sustained bandwidth is not often required. There will be many hundreds of smaller requests from a number of different hosts over a period of time, and so the available bandwidth of a server can appear much greater than the sum of the client bandwidth.

Getting network statistics

Before you can identify whether there is a problem within your network, you first need to have a baseline performance on which to base your assumptions. To do this you must check the various parameters -- latency, performance and any tests relevant to your network application environment -- to determine the performance and then monitor and compare this over time.

When performing the baseline networking tests, you should do them under controlled conditions. Ideally, you should perform them under both isolated (meaning with no other network traffic) and with typical network traffic to give you the two baselines:

For the actual testing process, there are a number of standard tools and tests that you can perform to determine your baseline values.

Measuring latency

The ping sends an echo packet to the device, and expects the device to echo the packet contents back. During the process, ping can monitor the time it takes to send and receive the response, which can be an effective method of measuring the response time of the echo process. In the simplest form, you can send an echo request to a host and find out the response time:

$ ping example

PING example.example.pri (192.168.0.2): 56 data bytes
64 bytes from 192.168.0.2: icmp_seq=0 ttl=64 time=0.169 ms
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.167 ms
^C
--- example.example.pri ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.167/0.168/0.169/0.001 ms

You need to use Control-C to stop the ping process. On Solaris and AIX®, you need to use the -s option to send more than one echo packet and get the timing information. For getting baseline figures, you can use the -c option (on Linux®) to specify the count. On Solaris/AIX, you must specify the packet size (the default is 56 bytes), and the number of packets to send so that you do not have to manually terminate the process. You can then use this to extract the timing information automatically:

$ ping -s example 56 10
PING example: 56 data bytes
64 bytes from example.example.pri (192.168.0.2): icmp_seq=0. time=0.143 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=1. time=0.163 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=2. time=0.146 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=3. time=0.134 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=4. time=0.151 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=5. time=0.107 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=6. time=0.142 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=7. time=0.136 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=8. time=0.143 ms
64 bytes from example.example.pri (192.168.0.2): icmp_seq=9. time=0.103 ms

----example PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms)  min/avg/max/stddev = 0.103/0.137/0.163/0.019

The example above was made during a quiet period on the network. If the host being checked (or the network itself) was busy during the testing period, the ping times could be increased significantly. However, ping alone is not necessarily an indicator of a problem, but it can occasionally give you a quick idea if there is something that needs to be identified.

It is possible to switch off support for ping, and so you should ensure that you can reach the host before using it as a verification that a host is available.

Ideally, you should track the ping times between specific hosts over a period of time, and even continually, so that you can track the average response times and then identify where to start looking.

>Using sprayd

The sprayd daemon and the associated spray tool send a large stream of packets to a specified host and determine how many of those packets get a response. As a method for measuring the performance of a network, it should not be relied on as a performance metric because it uses a connectionless transport mechanism. By definition, packets sent using connectionless transport are not guaranteed to reach their destination, and so dropped packets are allowed in the communication anyway.

That said, using spray can tell you whether there is a lot of traffic on the network, because if the connectionless transport (UDP) is dropping packets, then it probably means the network (or the host) is too busy to carry the packets.

Spray is available on Solaris and AIX, and some other UNIX platforms. You may need to enable the spray daemon (usually through inetd) to use it. Once the sprayd daemon has been started, you can run spray specifying the hostname

$ spray tiger
sending 1162 packets of length 86 to tiger ...
        101 packets (8.692%) dropped by tiger
        70 packets/sec, 6078 bytes/sec

As already mentioned, the speed should not be relied upon, but the dropped packet counts can be a useful metric.

Using simple network transfer tests

The best method for determining the bandwidth performance of your network is to check the actual speed when transferring data to or from the machine. There are lots of different tools that you can use to perform the tests across a number of different applications and protocols, but usually the simplest method is the most effective one.

For example, to determine the network bandwidth when transferring a file over the network using NFS, you can time a simple file transfer test. To create a simple test, create a large file using mkfile (for example, 2GB: $ mkfile 2g 2gbfile), and then time how long it takes to transfer the file over a network to another machine:.

$ time cp /nfs/mysql-live/transient/2gbfile .

real	3m45.648s
user	0m0.010s
sys	0m9.840s
You should run the tests multiple times and then take the average of the transfer process to get an idea of the standard performance.

You can automate the copy and timing process by using a Perl script:

#!/usr/bin/perl
               
use Benchmark; 
use File::Copy;
use Data::Dumper;
                 
my $file = shift or die "Need a file to copy from\n";
my $srcdir = shift or die "Need a source directory to copy from\n";
my $count = shift || 10;
                        
my $t = timeit($count,sub {copy(sprintf("%s/%s",$srcdir,$file),$file)}); 
                 
printf("Time is %.2fs\n",($t->[0]/$count)); 

To execute, supply the name of the source file and the source directory, and an optional count of the number of copies to make. You can then execute the script and get a time:.

$ ./timexfer.pl 2gbfile /nfs/mysql-live/transient 20
Time is 28.45s

You can use this both to create a baseline figure and during normal operations to check the transfer performance.

Diagnosing a problem

Typically, you will identify a network problem only when a network-related application fails for some reason. However, it is important to identify that the problem is network related and not a problem elsewhere.

First, you should try to reach the machine using ping. If the machine does not respond to a ping request, and other network communication does not work, then your first option should be to check the physical cables and make sure everything is still connected.

If you can still connect to the machine, but the ping time is increased, then you need to determine where the problem lies. An increase in ping times can in rare cases be related to the load on the machine, but more often than not indicates an issue with the network.

Once you get a long ping time from one machine, you should run ping from another machine on the network, ideally on a different network switch, to find out if the problem is related to the specific machine or the network.

Checking network stats

If the ping times are higher than you expect, then you should start to get some basic statistics about the network interface you are using to see if the problem is related to the network interface, or a specific protocol.

Under Linux, you can get some basic network statistic information by using the ifconfig tool:

$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:1a:ee:01:01:c0  
          inet addr:192.168.0.2  Bcast:192.168.3.255  Mask:255.255.252.0
          inet6 addr: fe80::21a:eeff:fe01:1c0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7916836 errors:0 dropped:78489 overruns:0 frame:0
          TX packets:6285476 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:11675092739 (10.8 GiB)  TX bytes:581702020 (554.7 MiB)
          Interrupt:16 Base address:0x2000 

The important rows are those beginning RX and TX, which show information about the packets sent and received. The packets value is a simple count of the packets transferred. The errors, dropped, and overruns figures show how many of the packets indicated some kind of fault. A high number of dropped packets in comparison to the packets sent probably indicate that the network is busy.

You can also get extended statistic information on all platforms by using the netstat tool. Under Linux, the tool provides more specific base protocol statistics, such as the packet transmissions for TCP-IP and UDP packet types. Again, the information contains some basic statistics.

$ netstat -s
Ip:
    8437387 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    8437383 incoming packets delivered
    6820934 requests sent out
    6 reassemblies required
    3 packets reassembled ok
... ... ... 

Under Solaris and other UNIX variants, the information provided by netstat differs depending upon the platform. For example, under Solaris, you get detailed statistics for each protocol, and separate information for IPv4 and IPv6 connections (see Listing 9). The output in the listing has been truncated.

$ netstat -s

RAWIP   rawipInDatagrams    =   440     rawipInErrors       =     0
        rawipInCksumErrs    =     0     rawipOutDatagrams   =    91
        rawipOutErrors      =     0

UDP     udpInDatagrams      = 15756     udpInErrors         =     0
        udpOutDatagrams     = 16515     udpOutErrors        =     0

TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
... ... ... 
...

In all cases, you are looking for a high level of error packets, retransmissions, or dropped packet transmission, all of which indicate that the network is busy. If the error rate is excessively high compared to the packets transmitted or received, then it may indicate a fault with the network hardware.

Checking NFS stats

When checking problems related to NFS connections, and indeed most other network applications, you should first ensure that the issue is not related to a problem on the machine, such as high load (which will obviously affect the speed at which requests can be processed). A simple check using uptime and ps to identify the processes will tell you how busy the machine is.

You can also check the NFS statistics that are generated by the NFS service. The nfsstat command generates detailed stats for both the server and client side of the NFS service. For example, the statistics in Listing 10 show the detailed NFS v3 statistics for the server side of the NFS service, selected by using the -s command-line option and -v to specify the NFS version.

$ nfsstat -s -v3  

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs    
36118      0          0          0          0          410        0          
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
75         0          0          0          0          0          0          

Server NFSv3:
calls     badcalls  
35847     0         
Version 3: (35942 calls)
null        getattr     setattr     lookup      access      readlink
15 0%       190 0%      83 0%       3555 9%     21222 59%   0 0%        
read        write       create      mkdir       symlink     mknod       
9895 27%    300 0%      7 0%        0 0%        0 0%        0 0%        
remove      rmdir       rename      link        readdir     readdirplus 
0 0%        0 0%        0 0%        0 0%        37 0%       20 0%       
fsstat      fsinfo      pathconf    commit      
521 1%      2 0%        1 0%        94 0%       

Server nfs_acl:
Version 3: (0 calls)
null        getacl      setacl      getxattrdir 
0 0%        0 0%        0 0%        0 0%  

A high number of badcalls values indicate that bad requests are being sent to the server, which may indicate that a client is not functioning correctly and submitting bad requests, either due to a software problem or faulty hardware.

Ping times in larger networks

If you can ping the machine, but the network performance is still a problem, then you need to determine where in your network the performance problem is located. In a larger network where you have different segments of your network separated by routers, you can use the traceroute tool determine whether there is a specific point in the route between the two machines where there is a problem.

Related to the ping tool, the traceroute tool will normally provide you with the ping times for each router that the network packets travel through to reach their destination. In a larger network this can help you isolate where the problem is. This can also be used to identify potential problems when sending packets over the Internet, where different routers are used at different points to transmit packets between different Internet Service Providers (ISP).

For example, the trace shown in Listing 11 is between two offices in the UK that use two different ISPs. In this case, the destination machine cannot be reached due to a fault.

$ traceroute gendarme.example.com
traceroute to gendarme.example.com (82.70.138.102), 30 hops max, 40 byte packets
 1  voyager.example.pri (192.168.1.1)  14.998 ms  95.530 ms  4.922 ms
 2  dsl.vispa.net.uk (83.217.160.18)  32.251 ms  95.674 ms  30.742 ms
 3  rt-gw1.tcm.vispa.net.uk (62.24.228.1)  49.178 ms  47.718 ms  123.261 ms
 4  195.50.119.249 (195.50.119.249)  47.036 ms  50.440 ms  143.123 ms
 5  ae-11-11.car1.Manchesteruk1.Level3.net (4.69.133.97)  92.398 ms  137.382 ms  
52.780 ms
 6  PACKET-EXCH.car1.Manchester1.Level3.net (195.16.169.90)  45.791 ms  140.165 ms  
35.312 ms
 7  spinoza-ae2-0.hq.zen.net.uk (62.3.80.54)  33.034 ms  39.442 ms  33.253 ms
 8  galileo-fe-3-1-172.hq.zen.net.uk (62.3.80.174)  34.341 ms  33.684 ms  33.703 ms
 9  * * *
10  * * *
11  * * *
12  * * *

In a smaller network you are unlikely to have routers separating the networks, and so traceroute will not be of any help. Both ping and traceroute rely on being able to reach a host to determine the problem.

You are now armed with some knowledge and techniques to deal with UNIX network performance.

Identifying UNIX network performance issues is hard to determine from a single machine when the problem is usually widespread across the network. It is usually possible, though, to use ping and/or traceroute to narrow down the machine by looking at the performance from different points within your network. Once you have some starting points, you can use the other network tools to get more detailed information about the protocol or application that is causing the problem. This article looked at the basic methods to get baseline information and then the different tools that can be used to zero in on the issue.

[Nov 30, 2010] Life As A Sys Admin Best Networking Tweaks for Linux By Michael Adams

Nov 29, 2010 | Network World

A Linux system can be tweaked to a degree Windows users may envy (or fear) especially for networking. Tweaking a Linux box for networking is a bit more mundane than other platforms: there are specific driver settings one can work with but its best flexibility comes from a mix of OS-level modifications and adherence to different RFCs.

ifconfig (interface) txqueuelen #

Software buffers for network adapters on Linux start off at a conservative 1000 packets. Network researchers and scientists have mucked around with this, and figured out that we should be using 10,000 for anything decent on a LAN; more if you're running GB or 10GE stuff. Slow interfaces, such as modems and WAN links, can default to 0-100, but don't be afraid to bump it up towards 1000 and see if your performance improves. Bumping up this setting does use memory, so be careful if you're using an embedded router or something (I've used 10,000 on 16MB RAM OpenWRT units, no prob).

You can edit /etc/rc.local, add an "up" command to /etc/networking/interfaces, or whatever your distribution suggests and it's best to put a command like this at startup.

/etc/sysctl.conf

This file governs default behavior for many network and file operation settings on Linux and other *nix-based systems. If you deploy Ubuntu or Fedora systems, you'll notice they will add their own tweaks (usually security or file-oriented) to the file: don't delete those, unless you read up on them, or see any that are contradicted by the suggested additions here...

net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_workaround_signed_windows=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_fack=1
net.ipv4.tcp_low_latency=1
net.ipv4.ip_no_pmtu_disc=0
net.ipv4.tcp_mtu_probing=1
net.ipv4.tcp_frto=2
net.ipv4.tcp_frto_response=2
net.ipv4.tcp_congestion_control=illinois

1. RFC 1337, TIME-WAIT Assassination Hazards in TCP, a fix written in 1992 for some theoretically-possible failure modes for TCP connections. To this day this RFC still has people confused if it negatively impacts performance or not or is supported by any decent router. Murphy's Law is that the only router that it would even have trouble with, is most likely your own.

2. TCP window scaling tries to avoid getting the network adapter saturated with incoming packets.

3. TCP SACK and FACK refer to options found in RFC 2018 and are also documented back to Linux Kernel 2.6.17 with an experimental "TCP-Peach" set of functions. These are meant to get you your data without excessive losses.

4. The latency setting is 1 if you prefer more packets vs bandwidth, or 0 if you prefer bandwidth. More packets are ideal for things like Remote Desktop and VOIP: less for bulk downloading.

5. I found RFC 2923, which is a good review of PMTU. IPv6 uses PMTU by default to avoid segmenting packets at the router level, but its optional for IPv4. PMTU is meant to inform routers of the best packet sizes to use between links, but its a common admin practice to block ICMP ports that allow pinging, thus breaking this mechanism. Linux tries to use it, and so do I: if you have problems, you have a problem router, and can change the "no" setting to 1. "MTU probing" is also a part of this: 1 means try, and 0 means don't.

6. FRTO is a mechanism in newer Linux kernels to optimize for wireless hosts: use it if you have them; delete the setting, or set to 0, if you don't.

For further study, there's a great IBM article regarding network optimizations: it was my source for some of these settings, as well as following numerous articles on tweaking Linux networking over the years (SpeedGuide has one from 2003).

TCP Congestion Controls

Windows Vista and newer gained Compound TCP as an alternative to standard TCP Reno. Linux Kernel 2.6 has had numerous mechanisms available to it for some time: 2.6.19 defaulted to CUBIC which was supposed to work well over "long links."  My two personal favorites: TCP Westwood + and TCP Illinois. But you can dig in, look at different research papers online, and see what works best for your environment.

1. Make sure your kernel has the correct module: in my example, I use TCP Illinois, which has been compiled with any standard Ubuntu kernel since 2008, and is found as tcp_illinois.

2. Add said kernel module to /etc/modules

3. Change /etc/sysctl.conf to use the non "tcp_" part of your selection.

There you have it -- some of my favorite Linux tweaks for networking. I'm interested in hearing how these worked for you. If you have some of your own, please post a comment and share them with other readers.

How To Network - TCP - UDP Tuning

How To: Network / TCP / UDP Tuning This is a very basic step by step description of how to improve the performance networking (TCP & UDP) on Linux 2.4+ for high-bandwidth applications. These settings are especially important for GigE links. Jump to Quick Step or All The Steps.

Assumptions

This howto assumes that the machine being tuned is involved in supporting high-bandwidth applications. Making these modifications on a machine that supports multiple users and/or multiple connections is not recommended - it may cause the machine to deny connections because of a lack of memory allocation.

The Steps

  1. Make sure that you have root privleges.

     
  2. Type: sysctl -a | grep mem
    This will display your current buffer settings. Save These! You may want to roll-back these changes
     
  3. Type: sysctl -w net.core.rmem_max=8388608
    This sets the max OS receive buffer size for all types of connections.
     
  4. Type: sysctl -w net.core.wmem_max=8388608
    This sets the max OS send buffer size for all types of connections.
     
  5. Type: sysctl -w net.core.rmem_default=65536
    This sets the default OS receive buffer size for all types of connections.
     
  6. Type: sysctl -w net.core.wmem_default=65536
    This sets the default OS send buffer size for all types of connections.

     
  7. Type: sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
    TCP Autotuning setting. "The tcp_mem variable defines how the TCP stack should behave when it comes to memory usage. ... The first value specified in the tcp_mem variable tells the kernel the low threshold. Below this point, the TCP stack do not bother at all about putting any pressure on the memory usage by different TCP sockets. ... The second value tells the kernel at which point to start pressuring memory usage down. ... The final value tells the kernel how many memory pages it may use maximally. If this value is reached, TCP streams and packets start getting dropped until we reach a lower memory usage again. This value includes all TCP sockets currently in use."

     
  8. Type: sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
    TCP Autotuning setting. "The first value tells the kernel the minimum receive buffer for each TCP connection, and this buffer is always allocated to a TCP socket, even under high pressure on the system. ... The second value specified tells the kernel the default receive buffer allocated for each TCP socket. This value overrides the /proc/sys/net/core/rmem_default value used by other protocols. ... The third and last value specified in this variable specifies the maximum receive buffer that can be allocated for a TCP socket."

     
  9. Type: sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
    TCP Autotuning setting. "This variable takes 3 different values which holds information on how much TCP sendbuffer memory space each TCP socket has to use. Every TCP socket has this much buffer space to use before the buffer is filled up. Each of the three values are used under different conditions. ... The first value in this variable tells the minimum TCP send buffer space available for a single TCP socket. ... The second value in the variable tells us the default buffer space allowed for a single TCP socket to use. ... The third value tells the kernel the maximum TCP send buffer space."

     
  10. Type:sysctl -w net.ipv4.route.flush=1
    This will enusre that immediatly subsequent connections use these values.
     

Quick Step

Cut and paste the following into a linux shell with root privleges:

sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_default=65536
sysctl -w net.core.wmem_default=65536
sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
sysctl -w net.ipv4.route.flush=1
 

References

All of this information comes directly from these very reliable sources:

Feedback

Please send me some feedback on how this worked for you. I'd be happy to help you figure it out on yours. I've used these or similar settings for a number of high-bandwidth applications with great results.

TCP Tuning Guide - Linux TCP Tuning

Department of energy, office of science

There are a lot of differences between Linux version 2.4 and 2.6, so first we'll cover the tuning issues that are the same in both 2.4 and 2.6. To change TCP settings in, you add the entries below to the file /etc/sysctl.conf, and then run "sysctl -p".

Like all operating systems, the default maximum Linux TCP buffer sizes are way too small. I suggest changing them to the following settings:

  # increase TCP max buffer size setable using setsockopt()
  net.core.rmem_max = 16777216
  net.core.wmem_max = 16777216
  # increase Linux autotuning TCP buffer limits
  # min, default, and max number of bytes to use
  # set max to at least 4MB, or higher if you use very high BDP paths
 = 4096 87380 16777216 
  net.ipv4.tcp_wmem = 4096 65536 16777216

You should also verify that the following are all set to the default value of 1

  sysctl net.ipv4.tcp_window_scaling 
  sysctl net.ipv4.tcp_timestamps 
  sysctl net.ipv4.tcp_sack

Note: you should leave tcp_mem alone. The defaults are fine.

Another thing you can try that may help increase TCP throughput is to increase the size of the interface queue. To do this, do the following:

     ifconfig eth0 txqueuelen 1000

I've seen increases in bandwidth of up to 8x by doing this on some long, fast paths. This is only a good idea for Gigabit Ethernet connected hosts, and may have other side effects such as uneven sharing between multiple streams.

Also, I've been told that for some network paths, using the Linux 'tc' (traffic control) system to pace traffic out of the host can help improve total throughput.

Linux 2.6

Starting in Linux 2.6.7 (and back-ported to 2.4.27), linux includes alternative congestion control algorithms beside the traditional 'reno' algorithm. These are designed to recover quickly from packet loss on high-speed WANs.

Linux 2.6 also includes and both send and receiver-side automatic buffer tuning (up to the maximum sizes specified above). There is also a setting to fix the ssthresh caching weirdness described above.

There are a couple additional sysctl settings for 2.6:

   # don't cache ssthresh from previous connection
   net.ipv4.tcp_no_metrics_save = 1
   net.ipv4.tcp_moderate_rcvbuf = 1
   # recommended to increase this for 1000 BT or higher
   net.core.netdev_max_backlog = 2500
   # for 10 GigE, use this
   # net.core.netdev_max_backlog = 30000   

Starting with version 2.6.13, Linux supports pluggable congestion control algorithms . The congestion control algorithm used is set using the sysctl variable net.ipv4.tcp_congestion_control, which is set to cubic or reno by default, depending on which version of the 2.6 kernel you are using.

To get a list of congestion control algorithms that are available in your kernel, run:

   sysctl net.ipv4.tcp_available_congestion_control

The choice of congestion control options is selected when you build the kernel. The following are some of the options are available in the 2.6.23 kernel:

For very long fast paths, I suggest trying cubic or htcp if reno is not is not performing as desired. To set this, do the following:

 
	sysctl -w net.ipv4.tcp_congestion_control=htcp

More information on each of these algorithms and some results can be found here .

More information on tuning parameters and defaults for Linux 2.6 are available in the file ip-sysctl.txt, which is part of the 2.6 source distribution.

Warning on Large MTUs: If you have configured your Linux host to use 9K MTUs, but the connection is using 1500 byte packets, then you actually need 9/1.5 = 6 times more buffer space in order to fill the pipe. In fact some device drivers only allocate memory in power of two sizes, so you may even need 16/1.5 = 11 times more buffer space!

And finally a warning for both 2.4 and 2.6: for very large BDP paths where the TCP window is > 20 MB, you are likely to hit the Linux SACK implementation problem. If Linux has too many packets in flight when it gets a SACK event, it takes too long to located the SACKed packet, and you get a TCP timeout and CWND goes back to 1 packet. Restricting the TCP buffer size to about 12 MB seems to avoid this problem, but clearly limits your total throughput. Another solution is to disable SACK.

How to Optimize your Internet Connection using MTU and RWIN - SWiK

TCP Receive Window (RWIN)


In computer networking, RWIN (TCP Receive Window) is the maximum amount of data that a computer will accept before acknowledging the sender. In practical terms, that means when you download say a 20 MB file, the remote server does not just send you the 20 MB continuously after you request it. When your computer sends the request for the file, your computer tells the remote server what your RWIN value is; the remote server then starts streaming data at you until it reaches your RWIN value, and then the server waits until your computer acknowledges that you received that data OK. Once your computer sends the acknowledgement, then the server continues to send more data in chunks of your RWIN value, each time waiting for your acknowledgment before proceeding to send more.

Now the crux of the problem here is with what is called latency, or the amount of time that it takes to send and receive packets from the remote server. Note that latency will depend not only on how fast the connection is between you and the remote server, but it also includes all additional delays, such as the time that it takes for the server to process your request and respond. You can easily find out the latency between you and the remote server with the ping command. When you use ping, the time that ping reports is the round-trip time (RTT), or latency, between you and the remote server.

When I ping google.com, I typically get a latency of 100 msec. Now if there were no concept of RWIN, and thus my computer had to acknowledge every single packet sent between me and google, then transfer speed between me and them would be simply the (packet size)/RTT. Thus for a maximum sized packet (my MTU as we learned above), my transfer speed would be:

1492 bytes/.1 sec = 14,920 B/sec or 14.57 KiB/sec

That is pathetically slow considering that my connection is 3 Mb/sec, which is the same as 366 KiB/sec; so I would be using only about 4% of my available bandwidth. Therefore, we use the concept of RWIN so that a remote server can stream data to me without having to acknowledge every single packet and slow everything down to a crawl.

Note that the TCP receive window (RWIN) is independent of the MTU setting. RWIN is determined by the BDP (Bandwidth Delay Product) for your internet connection, and BDP can be calculated as:

BDP = max bandwidth of your internet connection (Bytes/second) * RTT (seconds)

Therefore RWIN does not depend on the TCP packet size, and TCP packet size is of course limited by the MTU (Maximum Transmission Unit).

Before we change RWIN, use the following command to get the kernel variables related to RWIN:

sysctl -a 2> /dev/null | grep -iE "_mem |_rmem|_wmem"

Note the space after the _mem is deliberate, don't remove it or add other spaces elsewhere between the quotes.

You should get the following three variables:

net.ipv4.tcp_rmem = 4096 87380 2584576
net.ipv4.tcp_wmem = 4096 16384 2584576
net.ipv4.tcp_mem = 258576 258576 258576

The variable numbers are in bytes, and they represent the minimum, default, and maximum values for each of those variables.

net.ipv4.tcp_rmem = Receive window memory vector
net.ipv4.tcp_wmem = Send window memory vector
net.ipv4.tcp_mem = TCP stack memory vector

Note that there is no exact equivalent variable in Linux that corresponds to RWIN, the closest is the net.ipv4.tcp_rmem variable. The variables above control the actual memory usage (not just the TCP window size) and include memory used by the socket data structures as well as memory wasted by short packets in large buffers. The maximum values have to be larger than the BDP (Bandwidth Delay Product) of the path by some suitable overhead.

To try and optimize RWIN, first use ping to send the maximum size packet your connection allows (MTU) to some distant server. Since my MTU is 1492, the ping command payload would be 1492-28=1464. Thus:

ping -s 1464 -c5 google.com

PING google.com (64.233.167.99) 1464(1492) bytes of data.
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=1 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=2 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=3 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=4 ttl=237 (truncated)
64 bytes from py-in-f99.google.com (64.233.167.99): icmp_seq=5 ttl=237 (truncated)

--- google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 101.411/102.699/105.723/1.637 ms

Note though that you should run the above test several times at different times during the day, and also try pinging other destinations. You'll see RTT might vary quite a bit.

But for the above example, the RTT average is about 103 msec. Now since the maximum speed of my internet connection is 3 Mbits/sec, then the BDP is:
Code:

(3,000,000 bits/sec) * (.103 sec) * (1 byte/8 bits) = 38,625 bytes

Thus I should set the default value in net.ipv4.tcp_rmem to about 39,000. For my internet connection, I've seen RTT as bad as 500 msec, which would lead to a BDP of 187,000 bytes. Therefore, I could set the max value in net.ipv4.tcp_rmem to about 187,000. The values in net.ipv4.tcp_wmem should be the same as net.ipv4.tcp_rmem since both sending and receiving use the same internet connection. And since net.ipv4.tcp_mem is the maximum total memory buffer for TCP transactions, it is usually set to the the max value used in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem.

And lastly, there are two more kernel TCP variables related to RWIN that you should set:

sysctl -a 2> /dev/null | grep -iE "rcvbuf|save"

which returns:

net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1

Note enabling net.ipv4.tcp_no_metrics_save (setting it to 1) means have Linux optimize the TCP receive window dynamically between the values in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. And enabling net.ipv4.tcp_moderate_rcvbuf removes an odd behavior in the 2.6 kernels, whereby the kernel stores the slow start threshold for a client between TCP sessions. This can cause undesired results, as a single period of congestion can affect many subsequent connections.

Before you change any of the above variables, try going to http://www.speedtest.net or a similar website and check the speed of your connection. Then temporarily change the variables by using the following command with your own computed values:

sudo sysctl -w net.ipv4.tcp_rmem="4096 39000 187000" net.ipv4.tcp_wmem="4096 39000 187000" net.ipv4.tcp_mem="187000 187000 187000" net.ipv4.tcp_no_metrics_save=1 net.ipv4.tcp_moderate_rcvbuf=1

Then retest your connection and see if your speed improved at all.

Once you tweak the values to your liking, you can make them permanent by adding them to /etc/sysctl.conf as follows:

net.ipv4.tcp_rmem=4096 39000 187000
net.ipv4.tcp_wmem=4096 39000 187000
net.ipv4.tcp_mem=187000 187000 187000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_moderate_rcvbuf=1

And then do the following command to make the changes permanent:

sudo sysctl -p

How To Tweak Linux for broadband [Archive] - Ubuntu Forums

Posh

May 24th, 2007, 02:14 PM

I don't believe this will work as intended on machines with Edgy and beyond. From what I understand if you have tcp_moderate_rcvbuf = 1 (which is default) then the receive window is adjusted automatically. Now setting the max values could help but I'm not sure what setting the defalts do when you have tcp_moderate_rcvbuf enabled. Also I believe you will probably want to use net.ipv4.tcp_no_metrics_save = 1 instead of using the route.flush=1.

Here is a website with some tuning tips (http://dsd.lbl.gov/TCP-tuning/linux.html)

 

OldGaf

September 5th, 2006, 02:55 PM

Add the following to /etc/sysctl.conf (substituting your window size in place of 524288, if necessary):

# Tweaks for faster broadband...
net.core.rmem_default = 524288
net.core.rmem_max = 524288
net.core.wmem_default = 524288
net.core.wmem_max = 524288
net.ipv4.tcp_wmem = 4096 87380 524288
net.ipv4.tcp_rmem = 4096 87380 524288
net.ipv4.tcp_mem = 524288 524288 524288
net.ipv4.tcp_rfc1337 = 1
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_ecn = 0
net.ipv4.route.flush = 1

Then to have the settings take effect immediately, run:

sysctl -p

See the whole story here. (http://www.santa-li.com/linuxonbb.html)

Made a HUGE diff for me \\:D/

 

Thank you guys. This topic solved my high response times for my router. Here is the config I use:

#net.core.rmem_default = 4194304
# default values seems to work fine with my system
net.core.rmem_max = 4194304
#net.core.wmem_default = 4194304
# default values seems to work fine with my system
net.core.wmem_max = 4194304
net.ipv4.tcp_wmem = 4096 87380 4194304
net.ipv4.tcp_rmem = 4096 87380 4194304
#net.ipv4.tcp_mem = 256960 256960 4194304
# this should be uncommented only if it's not working well
net.ipv4.tcp_rfc1337 = 1
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_ecn = 0
net.ipv4.route.flush = 1

# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
# recommended to increase this for 1000 BT or higher
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_congestion_control=cubic

This settings work very well on an 2 mbit connection

Recommended Links

 Linux Performance and Tuning Guidelines

June 05, 2007 | IBM Redbooks

Over the past few years, Linux has made its way into the data centers of many corporations all over the globe. The Linux operating system has become accepted by both the scientific and enterprise user population. Today, Linux is by far the most versatile operating system. You can find Linux on embedded devices such as firewalls and cell phones and mainframes. Naturally, performance of the Linux operating system has become a hot topic for both scientific and enterprise users. However, calculating a global weather forecast and hosting a database impose different requirements on the operating system. Linux has to accommodate all possible usage scenarios with the most optimal performance. The consequence of this challenge is that most Linux distributions contain general tuning parameters to accommodate all users.

IBM® has embraced Linux, and it is recognized as an operating system suitable for enterprise-level applications running on IBM systems. Most enterprise applications are now available on Linux, including file and print servers, database servers, Web servers, and collaboration and mail servers.

With use of Linux in an enterprise-class server comes the need to monitor performance and, when necessary, tune the server to remove bottlenecks that affect users. This IBM Redpaper describes the methods you can use to tune Linux, tools that you can use to monitor and analyze server performance, and key tuning parameters for specific server applications. The purpose of this redpaper is to understand, analyze, and tune the Linux operating system to yield superior performance for any type of application you plan to run on these systems.

The tuning parameters, benchmark results, and monitoring tools used in our test environment were executed on Red Hat and Novell SUSE Linux kernel 2.6 systems running on IBM System x servers and IBM System z servers. However, the information in this redpaper should be helpful for all Linux hardware platforms.

dkftpbench

http://www.kegel.com/

http://linuxperf.nl.linux.org/

http://www.citi.umich.edu/projects/citi-netscape/

NFS Performance Tunging

http://home.att.net/~jageorge/performance.html

http://www.linux.com/tuneup/

http://www.psc.edu/networking/perf_tune.html#Linux

Utilities


Etc

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg  : C++ Humor : ARE YOU A BBS ADDICT? : Object oriented programmers of all nationsC Humor : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humorPseudoScience Related Humor : Networking Humor  : Shell Humor: Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : The Most Comprehensive Collection of Editor-related Humor : Microsoft plans to buy Catholic Church : Education Humor : IBM HumorAssembler-related HumorVIM Humor Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor : Best Russian Programmer Humor : Russian Musical Humor : The Perl Purity TestPolitically Incorrect Humor : GPL-related Humor : OFM Humor : IDS Humor : Real Programmers Humor : Scripting Humor : Web Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor :

The Last but not Least


Copyright © 1996-2013 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine. This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting hosting of this site with different providers to distribute and speed up access. Currently there are two functional mirrors: softpanorama.info (the fastest) and softpanorama.net.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: January 20, 2013