Configuring the network subsystem using netutils

    Install

    apt install python-pip
    pip install netutils-linux
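
    On distributions that no longer ship Python 2's pip, the Python 3 equivalents are the likely substitution (an assumption, on the understanding that the package supports Python 3):

    apt install python3-pip
    pip3 install netutils-linux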
    

    network-top

    [Image: network-top output]

    This utility is used to evaluate the applied settings. It shows how evenly the load (interrupts, softirqs, packets per second per CPU core) is spread across the server's resources, along with all kinds of packet-processing errors. Values that exceed their thresholds are highlighted.
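
    A minimal way to run it while applying the settings described below; it takes no mandatory arguments and periodically refreshes its tables, similar to top:

    network-top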

    rss-ladder

    # rss-ladder eth1 0
    - distributing interrupts of eth1 (-TxRx) on socket 0:
      - eth1: irq 67 eth1-TxRx-0 -> 0
      - eth1: irq 68 eth1-TxRx-1 -> 1
      - eth1: irq 69 eth1-TxRx-2 -> 2
      - eth1: irq 70 eth1-TxRx-3 -> 3
      - eth1: irq 71 eth1-TxRx-4 -> 8
      - eth1: irq 72 eth1-TxRx-5 -> 9
      - eth1: irq 73 eth1-TxRx-6 -> 10
      - eth1: irq 74 eth1-TxRx-7 -> 11
    

    This utility distributes the network card's interrupts across the cores of the selected physical processor (socket 0 by default).
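
    The result can be verified through the standard kernel interfaces; the IRQ number 67 below is simply the first one from the example output above:

    grep eth1 /proc/interrupts          # per-CPU counters for each queue's interrupt
    cat /proc/irq/67/smp_affinity_list  # core(s) the queue's IRQ is now pinned to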

    server-info

    # server-info --rate
    cpu:
      BogoMIPS: 7
      CPU MHz: 7
      CPU(s): 1
      Core(s) per socket: 1
      L3 cache: 1
      Socket(s): 10
      Thread(s) per core: 10
      Vendor ID: 10
    disk:
      vda:
        size: 1
        type: 1
    memory:
      MemTotal: 1
      SwapTotal: 10
    net:
      eth1:
        buffers:
          cur: 5
          max: 10
        driver: 1
        queues: 1
    system:
      Hypervisor vendor: 1
      Virtualization type: 1
    

    This utility allows you to do two things:

    server-info --show: see what hardware is installed on the server. In general, it is similar to lshw, but with an emphasis on the parameters that matter for network tuning.

    server-info --rate: find bottlenecks in the server hardware. In general, it is similar to the Windows Experience Index, but again focused on the parameters that matter for network tuning. Each parameter is scored on a scale from 1 to 10.

    Other utilities

    rx-buffers-increase eth1 
    

    automatically increases the RX ring buffer of the selected network card to its optimal value.
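
    The effect can be confirmed with ethtool, which prints both the pre-set maximum and the current ring size:

    ethtool -g eth1            # RX value under "Current hardware settings" before the change
    rx-buffers-increase eth1
    ethtool -g eth1            # and after it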

    maximize-cpu-freq
    

    disables dynamic CPU frequency scaling, pinning the cores at a fixed (maximum) frequency.
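
    Whether it took effect can be checked through the cpufreq sysfs interface and /proc/cpuinfo (the sysfs path exists only on hosts with cpufreq support):

    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # governor for core 0
    grep "cpu MHz" /proc/cpuinfo                                 # per-core frequencies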

    Examples of use:

    Example 1. The simplest possible case.

    Task:

    one processor with 4 cores
    one 1 Gbps network card (eth0) with 4 combined queues
    incoming traffic of 600 Mbit/s, no outgoing traffic
    all queues are handled by CPU0: about 55,000 interrupts and 350,000 packets per second in total, of which roughly 200 packets/sec are dropped by the network card; the remaining 3 cores are idle (see the check below)
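
    Before tuning, this picture can be confirmed with network-top or directly from /proc/interrupts, where all of eth0's queue IRQs show counters only in the CPU0 column:

    grep eth0 /proc/interrupts   # one row per queue IRQ, one column per CPU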

    Solution:

    distribute the queues between the cores with the command rss-ladder eth0
    increase the buffer with the command rx-buffers-increase eth0

    Example 2

    Task:

    two processors with 8 cores each
    two NUMA nodes
    two dual-port 10 Gbps network cards (eth0, eth1, eth2, eth3), each port with 16 queues, all tied to node 0, incoming traffic: 3 Gbit/s on each port
    one 1 Gbps network card with 4 queues, tied to node 0, outgoing traffic: 100 Mbit/s

    Solution:

    1. Move one of the 10 Gbit/s network cards to another PCI slot, bound to NUMA node 1.
    2. Reduce the number of combined queues on the 10 Gigabit ports to the number of cores in one physical processor:

    for dev in eth0 eth1 eth2 eth3; do
      ethtool -L $dev combined 8
    done
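
    Whether the driver accepted the new value can be checked with ethtool's lowercase -l option, which prints the current channel configuration:

    ethtool -l eth0   # "Combined" under "Current hardware settings" should now be 8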
    

    3. Distribute the interrupts of ports eth0 and eth1 across the processor cores belonging to NUMA node 0, and those of ports eth2 and eth3 across the cores belonging to NUMA node 1:

    rss-ladder eth0 0
    rss-ladder eth1 0
    rss-ladder eth2 1
    rss-ladder eth3 1
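
    To double-check which NUMA node each port actually sits on (and therefore which socket number to pass to rss-ladder), the device's numa_node attribute in sysfs can be read:

    for dev in eth0 eth1 eth2 eth3; do
      echo -n "$dev: "; cat /sys/class/net/$dev/device/numa_node
    done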
    

    4. Increase the RX buffers of eth0, eth1, eth2 and eth3:

    for dev in eth0 eth1 eth2 eth3; do
      rx-buffers-increase $dev
    done
    

    Reminder:

    In the case of network cards with a single queue, you can use RPS to distribute the load between the cores, but this does not eliminate the overhead of copying packets into memory.
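
    RPS itself is configured per receive queue through sysfs; a minimal sketch for a hypothetical single-queue eth0, spreading its softirq processing over CPUs 0-3 (bitmask f):

    echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus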

    Packets are distributed across queues (and thus interrupts) based on a hash function, taken modulo the number of queues, computed over the following data: the protocol, the source and destination IP addresses, and the source and destination ports. This technology is called Receive Side Scaling (RSS).
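
    As a purely arithmetic illustration of the "remainder of the division" step (the hash value 0x6b4c2f31 is made up; a real card computes it in hardware over the fields listed above):

    printf 'queue %d\n' $(( 0x6b4c2f31 % 8 ))   # with 8 queues, this flow lands in queue 1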