Commit 7d0ae236 authored by Linus Torvalds
Pull networking fixes from David Miller:

 1) Fix endless loop in nf_tables, from Phil Sutter.

 2) Fix cross namespace ip6_gre tunnel hash list corruption, from
    Olivier Matz.

 3) Don't be too strict in phy_start_aneg() otherwise we might not allow
    restarting auto negotiation. From Heiner Kallweit.

 4) Fix various KMSAN uninitialized value cases in tipc, from Ying Xue.

 5) Memory leak in act_tunnel_key, from Davide Caratti.

 6) Handle chip errata of mv88e6390 PHY, from Andrew Lunn.

 7) Remove linear SKB assumption in fou/fou6, from Eric Dumazet.

 8) Missing udplite rehash callbacks, from Alexey Kodanev.

 9) Log dirty pages properly in vhost, from Jason Wang.

10) Use consume_skb() in neigh_probe() as this is a normal free not a
    drop, from Yang Wei. Likewise in macvlan_process_broadcast().

11) Missing device_del() in mdiobus_register() error paths, from Thomas
    Petazzoni.

12) Fix checksum handling of short packets in mlx5, from Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (96 commits)
  bpf: in __bpf_redirect_no_mac pull mac only if present
  virtio_net: bulk free tx skbs
  net: phy: phy driver features are mandatory
  isdn: avm: Fix string plus integer warning from Clang
  net/mlx5e: Fix cb_ident duplicate in indirect block register
  net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
  net/mlx5e: Fix wrong error code return on FEC query failure
  net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
  tools: bpftool: Cleanup license mess
  bpf: fix inner map masking to prevent oob under speculation
  bpf: pull in pkt_sched.h header for tooling to fix bpftool build
  selftests: forwarding: Add a test case for externally learned FDB entries
  selftests: mlxsw: Test FDB offload indication
  mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
  net: bridge: Mark FDB entries that were added by user as such
  mlxsw: spectrum_fid: Update dummy FID index
  mlxsw: pci: Return error on PCI reset timeout
  mlxsw: pci: Increase PCI SW reset timeout
  mlxsw: pci: Ring CQ's doorbell before RDQ's
  MAINTAINERS: update email addresses of liquidio driver maintainers
  ...
parents bb617b9b 6436408e
+13 −13
@@ -11,19 +11,19 @@ Contents:
    batman-adv
    can
    can_ucan_protocol
-   dpaa2/index
-   e100
-   e1000
-   e1000e
-   fm10k
-   igb
-   igbvf
-   ixgb
-   ixgbe
-   ixgbevf
-   i40e
-   iavf
-   ice
+   device_drivers/freescale/dpaa2/index
+   device_drivers/intel/e100
+   device_drivers/intel/e1000
+   device_drivers/intel/e1000e
+   device_drivers/intel/fm10k
+   device_drivers/intel/igb
+   device_drivers/intel/igbvf
+   device_drivers/intel/ixgb
+   device_drivers/intel/ixgbe
+   device_drivers/intel/ixgbevf
+   device_drivers/intel/i40e
+   device_drivers/intel/iavf
+   device_drivers/intel/ice
    kapi
    z8530book
    msg_zerocopy
+0 −45
@@ -1000,51 +1000,6 @@ The kernel interface functions are as follows:
      size should be set when the call is begun.  tx_total_len may not be less
      than zero.

- (*) Check to see the completion state of a call so that the caller can assess
-     whether it needs to be retried.
-
-	enum rxrpc_call_completion {
-		RXRPC_CALL_SUCCEEDED,
-		RXRPC_CALL_REMOTELY_ABORTED,
-		RXRPC_CALL_LOCALLY_ABORTED,
-		RXRPC_CALL_LOCAL_ERROR,
-		RXRPC_CALL_NETWORK_ERROR,
-	};
-
-	int rxrpc_kernel_check_call(struct socket *sock, struct rxrpc_call *call,
-				    enum rxrpc_call_completion *_compl,
-				    u32 *_abort_code);
-
-     On return, -EINPROGRESS will be returned if the call is still ongoing; if
-     it is finished, *_compl will be set to indicate the manner of completion,
-     *_abort_code will be set to any abort code that occurred.  0 will be
-     returned on a successful completion, -ECONNABORTED will be returned if the
-     client failed due to a remote abort and anything else will return an
-     appropriate error code.
-
-     The caller should look at this information to decide if it's worth
-     retrying the call.
-
- (*) Retry a client call.
-
-	int rxrpc_kernel_retry_call(struct socket *sock,
-				    struct rxrpc_call *call,
-				    struct sockaddr_rxrpc *srx,
-				    struct key *key);
-
-     This attempts to partially reinitialise a call and submit it again while
-     reusing the original call's Tx queue to avoid the need to repackage and
-     re-encrypt the data to be sent.  call indicates the call to retry, srx the
-     new address to send it to and key the encryption key to use for signing or
-     encrypting the packets.
-
-     For this to work, the first Tx data packet must still be in the transmit
-     queue, and currently this is only permitted for local and network errors
-     and the call must not have been aborted.  Any partially constructed Tx
-     packet is left as is and can continue being filled afterwards.
-
-     It returns 0 if the call was requeued and an error otherwise.
-
  (*) Get call RTT.

 	u64 rxrpc_kernel_get_rtt(struct socket *sock, struct rxrpc_call *call);
+125 −5
@@ -336,7 +336,26 @@ time client replies ACK, this socket will get another chance to move
 to the accept queue.


-TCP Fast Open
+* TcpEstabResets
+Defined in `RFC1213 tcpEstabResets`_.
+
+.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpAttemptFails
+Defined in `RFC1213 tcpAttemptFails`_.
+
+.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpOutRsts
+Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
+the 'segments sent containing the RST flag', but in the Linux kernel,
+this counter indicates the segments the kernel tried to send. Sending
+might fail due to some errors (e.g. a memory allocation failure).
+
+.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
+
+
+TCP Fast Path
 ============
 When kernel receives a TCP packet, it has two paths to handler the
 packet, one is fast path, another is slow path. The comment in kernel
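The three RFC 1213 counters documented in this hunk are exported on the Tcp lines of /proc/net/snmp as EstabResets, AttemptFails and OutRsts (nstat prints them with a Tcp prefix). As a minimal sketch, assuming the usual layout in which a "Tcp:" line of counter names is immediately followed by a "Tcp:" line of values, the hypothetical helper snmp_tcp_counter() below looks one counter up by name in a buffer holding that file's contents:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy one line of text (without the '\n') into buf, truncating if needed.
 * Returns a pointer to the start of the next line, or NULL at end of text. */
static const char *next_line(const char *p, char *buf, size_t len)
{
	size_t n = strcspn(p, "\n");
	size_t copy = n < len ? n : len - 1;

	memcpy(buf, p, copy);
	buf[copy] = '\0';
	p += n;
	return *p ? p + 1 : NULL;
}

/* Hypothetical helper: find counter `name` (e.g. "EstabResets") in the
 * contents of /proc/net/snmp.  Returns 0 and stores the value in *out on
 * success, or -1 if the Tcp lines or the counter are not found. */
int snmp_tcp_counter(const char *text, const char *name, long long *out)
{
	char hdr[1024], val[1024];
	char key[64], num[32];
	const char *p = text;
	const char *h, *v;
	int hn, vn;

	/* Locate the "Tcp:" header line; the value line must follow it. */
	for (;;) {
		if (!p || !*p)
			return -1;
		p = next_line(p, hdr, sizeof(hdr));
		if (strncmp(hdr, "Tcp:", 4) == 0)
			break;
	}
	if (!p)
		return -1;
	next_line(p, val, sizeof(val));
	if (strncmp(val, "Tcp:", 4) != 0)
		return -1;

	/* Walk names and values in lockstep: column i of one matches column i
	 * of the other. */
	h = hdr + 4;
	v = val + 4;
	while (sscanf(h, "%63s%n", key, &hn) == 1 &&
	       sscanf(v, "%31s%n", num, &vn) == 1) {
		if (strcmp(key, name) == 0) {
			*out = strtoll(num, NULL, 10);
			return 0;
		}
		h += hn;
		v += vn;
	}
	return -1;
}
```

A caller would read /proc/net/snmp into a buffer and pass a name such as "OutRsts"; bear in mind that this value counts RSTs the kernel tried to send, not necessarily ones that left the host.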
@@ -383,8 +402,6 @@ increase 1.

 TCP abort
 ========
-
-
 * TcpExtTCPAbortOnData
 It means TCP layer has data in flight, but need to close the
 connection. So TCP layer sends a RST to the other side, indicate the
@@ -545,7 +562,6 @@ packet yet, the sender would know packet 4 is out of order. The TCP
 stack of kernel will increase TcpExtTCPSACKReorder for both of the
 above scenarios.

-
 DSACK
 =====
 The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
@@ -566,13 +582,63 @@ The TCP stack receives an out of order duplicate packet, so it sends a
 DSACK to the sender.

 * TcpExtTCPDSACKRecv
-The TCP stack receives a DSACK, which indicate an acknowledged
+The TCP stack receives a DSACK, which indicates an acknowledged
 duplicate packet is received.

 * TcpExtTCPDSACKOfoRecv
 The TCP stack receives a DSACK, which indicate an out of order
 duplicate packet is received.

+invalid SACK and DSACK
+======================
+When a SACK (or DSACK) block is invalid, a corresponding counter would
+be updated. The validation method is based on the start/end sequence
+number of the SACK block. For more details, please refer to the comment
+of the function tcp_is_sackblock_valid in the kernel source code. A
+SACK option could have up to 4 blocks, and they are checked
+individually. E.g., if 3 blocks of a SACK are invalid, the
+corresponding counter would be updated 3 times. The comment of the
+`Add counters for discarded SACK blocks`_ patch has additional
+explanation:
+
+.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32
+
+* TcpExtTCPSACKDiscard
+This counter indicates how many SACK blocks are invalid. If the invalid
+SACK block is caused by ACK reordering, the TCP stack will only ignore
+it and won't update this counter.
+
+* TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo
+When a DSACK block is invalid, one of these two counters would be
+updated. Which counter will be updated depends on the undo_marker flag
+of the TCP socket. If the undo_marker is not set, the TCP stack is
+unlikely to have retransmitted any packets, so if an invalid DSACK
+block still arrives, the packet may have been duplicated somewhere in
+the network. In that scenario, TcpExtTCPDSACKIgnoredNoUndo
+will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld
+will be updated. As its name implies, it might be an old packet.
+
+SACK shift
+==========
+The Linux networking stack stores data in sk_buff structs (skb for
+short). If a SACK block spans multiple skbs, the TCP stack will try
+to rearrange data in these skbs. E.g. if a SACK block acknowledges seq
+10 to 15, skb1 has seq 10 to 13, and skb2 has seq 14 to 20, then seq 14
+and 15 in skb2 would be moved to skb1. This operation is 'shift'. If a
+SACK block acknowledges seq 10 to 20, skb1 has seq 10 to 13, and skb2
+has seq 14 to 20, all data in skb2 will be moved to skb1 and skb2 will
+be discarded; this operation is 'merge'.
+
+* TcpExtTCPSackShifted
+An skb is shifted.
+
+* TcpExtTCPSackMerged
+An skb is merged.
+
+* TcpExtTCPSackShiftFallback
+An skb should be shifted or merged, but the TCP stack doesn't do it for
+some reason.
+
 TCP out of order
 ===============
 * TcpExtTCPOFOQueue
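The shift/merge example and the counter choice for invalid SACK and DSACK blocks described in this hunk can be restated as two small functions. This is a simplified, hypothetical model, not kernel code: struct seq_range, classify_sack() and invalid_block_counter() are made-up names, ranges are inclusive as in the text, skb2 is assumed to start right after skb1, and the kernel's real shifting code (tcp_shift_skb_data() in net/ipv4/tcp_input.c) handles many details left out here.

```c
/* Inclusive sequence range, matching the "seq 10 to 13" wording in the text. */
struct seq_range {
	unsigned int start;
	unsigned int end;
};

enum sack_op { SACK_NONE, SACK_SHIFT, SACK_MERGE };

/* Hypothetical model: skb2 is assumed to start right after skb1 ends.
 * *moved receives how many sequence numbers move from skb2 into skb1. */
static enum sack_op classify_sack(struct seq_range sack, struct seq_range skb1,
				  struct seq_range skb2, unsigned int *moved)
{
	*moved = 0;
	/* Nothing to rearrange unless the block covers skb1 and reaches skb2. */
	if (sack.start > skb1.start || sack.end < skb1.end ||
	    sack.end < skb2.start)
		return SACK_NONE;
	if (sack.end >= skb2.end) {
		/* All of skb2 is SACKed: its data joins skb1, skb2 is freed. */
		*moved = skb2.end - skb2.start + 1;
		return SACK_MERGE;
	}
	/* Only the head of skb2 is SACKed: move just that part into skb1. */
	*moved = sack.end - skb2.start + 1;
	return SACK_SHIFT;
}

/* Hypothetical model of which counter one invalid block bumps; each invalid
 * block in a SACK option is counted separately.  The ACK-reordering case,
 * where an invalid SACK block is ignored without counting, is not modelled. */
static const char *invalid_block_counter(int is_dsack, int undo_marker_set)
{
	if (!is_dsack)
		return "TcpExtTCPSACKDiscard";
	return undo_marker_set ? "TcpExtTCPDSACKIgnoredOld"
			       : "TcpExtTCPDSACKIgnoredNoUndo";
}
```

With skb1 = 10..13 and skb2 = 14..20, a block for 10..15 yields SACK_SHIFT with 2 sequence numbers moved, and a block for 10..20 yields SACK_MERGE with all 7 moved.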
@@ -662,6 +728,60 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
 .. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
 .. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11

+TCP receive window
+==================
+* TcpExtTCPWantZeroWindowAdv
+Depending on current memory usage, the TCP stack tries to set the receive
+window to zero, but the receive window might still be a non-zero
+value. For example, if the previous window size is 10, and the TCP
+stack receives 3 bytes, the current window size would be 7 even if the
+window size calculated by the memory usage is zero.
+
+* TcpExtTCPToZeroWindowAdv
+The TCP receive window is set to zero from a non-zero value.
+
+* TcpExtTCPFromZeroWindowAdv
+The TCP receive window is set to a non-zero value from zero.
+
+
+Delayed ACK
+===========
+The TCP Delayed ACK is a technique which is used for reducing the
+packet count in the network. For more details, please refer to the
+`Delayed ACK wiki`_.
+
+.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
+
+* TcpExtDelayedACKs
+A delayed ACK timer expires. The TCP stack will send a pure ACK packet
+and exit the delayed ACK mode.
+
+* TcpExtDelayedACKLocked
+A delayed ACK timer expires, but the TCP stack can't send an ACK
+immediately because the socket is locked by a userspace program. The
+TCP stack will send a pure ACK later (after the userspace program
+unlocks the socket). When the TCP stack sends the pure ACK later, the
+TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
+mode.
+
+* TcpExtDelayedACKLost
+It will be updated when the TCP stack receives a packet which has been
+ACKed. A Delayed ACK loss might cause this issue, but it can also be
+triggered by other causes, such as a packet being duplicated in the
+network.
+
+Tail Loss Probe (TLP)
+=====================
+TLP is an algorithm which is used to detect TCP packet loss. For more
+details, please refer to the `TLP paper`_.
+
+.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
+
+* TcpExtTCPLossProbes
+A TLP probe packet is sent.
+
+* TcpExtTCPLossProbeRecovery
+A packet loss is detected and recovered by TLP.
+
 examples
 =======
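The TcpExtTCPWantZeroWindowAdv example above (previous window 10, 3 bytes received, memory-based window 0, advertised window still 7) follows from the rule that the advertised receive window is not shrunk. The sketch below is a simplified, hypothetical model of that rule and of the three window counters, loosely shaped after the kernel's tcp_select_window() but ignoring window scaling and everything else it does; select_window() and enum win_event are made-up names.

```c
/* Which of the window counters a single window update would bump. */
enum win_event {
	WIN_NONE,
	WIN_WANT_ZERO,	/* TcpExtTCPWantZeroWindowAdv */
	WIN_TO_ZERO,	/* TcpExtTCPToZeroWindowAdv */
	WIN_FROM_ZERO,	/* TcpExtTCPFromZeroWindowAdv */
};

/* Hypothetical model.  prev_win: last advertised window; received: bytes
 * received since then; mem_win: window allowed by current memory usage.
 * Returns the window to advertise and reports the counter event in *ev. */
static unsigned int select_window(unsigned int prev_win, unsigned int received,
				  unsigned int mem_win, enum win_event *ev)
{
	/* Space still promised to the sender by the previous advertisement. */
	unsigned int cur_win = received < prev_win ? prev_win - received : 0;
	unsigned int new_win = mem_win;

	*ev = WIN_NONE;
	if (new_win < cur_win) {
		/* Never shrink the window; record that zero was wanted. */
		if (new_win == 0)
			*ev = WIN_WANT_ZERO;
		new_win = cur_win;
	}
	if (new_win == 0 && prev_win != 0)
		*ev = WIN_TO_ZERO;
	else if (new_win != 0 && prev_win == 0)
		*ev = WIN_FROM_ZERO;
	return new_win;
}
```

With the figures from the example, select_window(10, 3, 0, &ev) returns 7 and sets ev to WIN_WANT_ZERO; once all 10 bytes have been received, select_window(10, 10, 0, &ev) returns 0 with WIN_TO_ZERO.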
+2 −2
@@ -417,7 +417,7 @@ is again deprecated and ts[2] holds a hardware timestamp if set.

 Hardware time stamping must also be initialized for each device driver
 that is expected to do hardware time stamping. The parameter is defined in
-/include/linux/net_tstamp.h as:
+include/uapi/linux/net_tstamp.h as:

 struct hwtstamp_config {
 	int flags;	/* no flags defined right now, must be zero */
@@ -487,7 +487,7 @@ enum {
 	HWTSTAMP_FILTER_PTP_V1_L4_EVENT,

 	/* for the complete list of values, please check
-	 * the include file /include/linux/net_tstamp.h
+	 * the include file include/uapi/linux/net_tstamp.h
 	 */
 };

+3 −4
@@ -3471,10 +3471,9 @@ F: drivers/i2c/busses/i2c-octeon*
 F:	drivers/i2c/busses/i2c-thunderx*

 CAVIUM LIQUIDIO NETWORK DRIVER
-M:	Derek Chickles <derek.chickles@caviumnetworks.com>
-M:	Satanand Burla <satananda.burla@caviumnetworks.com>
-M:	Felix Manlunas <felix.manlunas@caviumnetworks.com>
-M:	Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
+M:	Derek Chickles <dchickles@marvell.com>
+M:	Satanand Burla <sburla@marvell.com>
+M:	Felix Manlunas <fmanlunas@marvell.com>
 L:	netdev@vger.kernel.org
 W:	http://www.cavium.com
 S:	Supported