Commit 43d86ee8 authored by Linus Torvalds
Pull networking fixes from David Miller:
 "Several fixes here. Basically split down the line between newly
  introduced regressions and long existing problems:

   1) Double free in tipc_enable_bearer(), from Cong Wang.

   2) Many fixes to nf_conncount, from Florian Westphal.

   3) op->get_regs_len() can throw an error, check it, from Yunsheng
      Lin.

   4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
      driver, from Scott Wood.

   5) Infinite loop in fib_empty_table(), from Yue Haibing.

   6) Use after free in ax25_fillin_cb(), from Cong Wang.

   7) Fix socket locking in nr_find_socket(), also from Cong Wang.

   8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.

   9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.

  10) Fix ptr_ring wrap during queue swap, from Cong Wang.

  11) Missing shutdown callback in hinic driver, from Xue Chaojing.

  12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
      Brivio.

  13) BPF out of bounds speculation fixes from Daniel Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  ipv6: Consider sk_bound_dev_if when binding a socket to an address
  ipv6: Fix dump of specific table with strict checking
  bpf: add various test cases to selftests
  bpf: prevent out of bounds speculation on pointer arithmetic
  bpf: fix check_map_access smin_value test when pointer contains offset
  bpf: restrict unknown scalars of mixed signed bounds for unprivileged
  bpf: restrict stack pointer arithmetic for unprivileged
  bpf: restrict map value pointer arithmetic for unprivileged
  bpf: enable access to ax register also from verifier rewrite
  bpf: move tmp variable into ax register in interpreter
  bpf: move {prev_,}insn_idx into verifier env
  isdn: fix kernel-infoleak in capi_unlocked_ioctl
  ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
  net/hamradio/6pack: use mod_timer() to rearm timers
  net-next/hinic:add shutdown callback
  net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
  ip: validate header length on virtual device xmit
  tap: call skb_probe_transport_header after setting skb->dev
  ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
  net: rds: remove unnecessary NULL check
  ...
parents 645ff1e8 c5ee0663
+239 −1
@@ -571,7 +571,97 @@ duplicate packet is received.

* TcpExtTCPDSACKOfoRecv
The TCP stack receives a DSACK, which indicate an out of order
-duplciate packet is received.
+duplicate packet is received.

TCP out of order
================
* TcpExtTCPOFOQueue
The TCP layer receives an out of order packet and has enough memory
to queue it.

* TcpExtTCPOFODrop
The TCP layer receives an out of order packet but doesn't have enough
memory, so it drops the packet. Such packets are not counted in
TcpExtTCPOFOQueue.

* TcpExtTCPOFOMerge
The received out of order packet overlaps the previous packet. The
overlapping part is dropped. All TcpExtTCPOFOMerge packets are also
counted in TcpExtTCPOFOQueue.

TCP PAWS
========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
used to drop old packets. It depends on the TCP timestamps. For
detailed information, please refer to the `timestamp wiki`_ and the
`RFC of PAWS`_.

.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps

* TcpExtPAWSActive
Packets are dropped by PAWS in Syn-Sent status.

* TcpExtPAWSEstab
Packets are dropped by PAWS in any status other than Syn-Sent.
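
The core PAWS test can be sketched in C as below. This is a simplified
illustration, not the kernel's implementation: the function name
``paws_reject`` and its parameters are hypothetical, and real stacks add
further conditions (for example, an exception for connections that have
been idle for a long time). The sketch only shows the wraparound-safe
comparison of the segment timestamp against the most recent timestamp
accepted on the connection.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true when the segment's timestamp value
 * is older than the most recent timestamp accepted on the connection.
 * The subtraction is done modulo 2^32 and then interpreted as signed,
 * so the comparison stays correct when the timestamp clock wraps.
 */
bool paws_reject(uint32_t ts_recent, uint32_t seg_tsval)
{
	return (int32_t)(seg_tsval - ts_recent) < 0;
}
```

With this comparison, a timestamp of 5 is treated as newer than
0xFFFFFFF0, because the difference is a small positive number after
wraparound.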

TCP ACK skip
============
In some scenarios, the kernel avoids sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When the kernel decides to skip an
ACK due to tcp_invalid_ratelimit, it updates one of the counters below
to indicate the scenario in which the ACK was skipped. The ACK is only
skipped if the received packet is either a SYN packet or has no data.

.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

* TcpExtTCPACKSkippedSynRecv
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack has received a SYN and replied with a SYN+ACK, and is now
waiting for an ACK. Generally, the TCP stack doesn't need to send an
ACK in the Syn-Recv status, but in several scenarios it does: e.g. the
TCP stack receives the same SYN packet repeatedly, the received packet
does not pass the PAWS check, or the received packet's sequence number
is out of window. If the ACK sending frequency is higher than
tcp_invalid_ratelimit allows, the TCP stack skips sending the ACK and
increases TcpExtTCPACKSkippedSynRecv.


* TcpExtTCPACKSkippedPAWS
The ACK is skipped because the PAWS (Protection Against Wrapped
Sequence numbers) check fails. If the PAWS check fails in the Syn-Recv,
Fin-Wait-2 or Time-Wait status, the skipped ACK is counted in
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait respectively. In all other statuses, the
skipped ACK is counted in TcpExtTCPACKSkippedPAWS.

* TcpExtTCPACKSkippedSeq
The sequence number is out of window, the timestamp passes the PAWS
check, and the TCP status is not Syn-Recv, Fin-Wait-2 or Time-Wait.

* TcpExtTCPACKSkippedFinWait2
The ACK is skipped in Fin-Wait-2 status. The reason is either that the
PAWS check fails or that the received sequence number is out of window.

* TcpExtTCPACKSkippedTimeWait
The ACK is skipped in Time-Wait status. The reason is either that the
PAWS check fails or that the received sequence number is out of window.

* TcpExtTCPACKSkippedChallenge
The ACK is skipped if the ACK is a challenge ACK. RFC 5961 defines
three kinds of challenge ACK; please refer to `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, in some TCP statuses the Linux TCP stack also sends
challenge ACKs if the ACK number is before the first unacknowledged
number (stricter than `RFC 5961 section 5.2`_).

.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11


examples
========
@@ -1188,3 +1278,151 @@ Run nstat on server B::
We have deleted the default route on server B. Server B couldn't find
a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.

TcpExtTCPACKSkippedSynRecv
--------------------------
In this test, we send the same SYN packet from client to server three
times. The first SYN makes the server create a socket, set it to
Syn-Recv status, and reply with a SYN/ACK. The second SYN makes the
server reply with the SYN/ACK again and record the reply time (the
duplicate ACK reply time). The third SYN makes the server check the
previous duplicate ACK reply time and decide to skip the duplicate
ACK, then increase the TcpExtTCPACKSkippedSynRecv counter.
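
The record-then-check behaviour described above can be sketched in C as
follows. This is an illustrative model under stated assumptions, not the
kernel's code: ``struct ack_limiter``, ``ack_rate_limited`` and the tick
values are hypothetical, and time is an arbitrary 32-bit tick counter.
The first call records the send time and lets the ACK through; a later
call that falls inside the rate limit window is counted as skipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket state for duplicate ACK rate limiting. */
struct ack_limiter {
	uint32_t last_ack_time;	/* 0 means no duplicate ACK recorded yet */
	uint32_t skipped;	/* stands in for a TcpExtTCPACKSkipped* counter */
};

/*
 * Return true if the duplicate ACK should be skipped. When the ACK is
 * allowed, the current time is recorded for the next check.
 */
bool ack_rate_limited(struct ack_limiter *l, uint32_t now, uint32_t ratelimit)
{
	if (l->last_ack_time) {
		/* Signed difference keeps the test valid across tick wraparound. */
		int32_t elapsed = (int32_t)(now - l->last_ack_time);

		if (elapsed >= 0 && (uint32_t)elapsed < ratelimit) {
			l->skipped++;
			return true;
		}
	}
	l->last_ack_time = now;
	return false;
}
```

In terms of the test, the second SYN corresponds to the first call
(time recorded, SYN/ACK sent) and the third SYN to the second call
(skipped, counter increased).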

Run tcpdump to capture a SYN packet::

  nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

Open another terminal and run the nc command::

  nstatuser@nstat-a:~$ nc nstat-b 9000

As nstat-b wasn't listening on port 9000, it should reply with a RST,
and the nc command exits immediately. That is enough for the tcpdump
command to capture a SYN packet. A Linux server might use hardware
offload for the TCP checksum, so the checksum in /tmp/syn.pcap might
not be correct. We call tcprewrite to fix it::

  nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum

On nstat-b, we run nc to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, we block packets from port 9000; otherwise nstat-a would
send a RST to nstat-b::

  nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP

Send the SYN to nstat-b three times::

  nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done

Check the snmp counter on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSynRecv      1                  0.0

As we expected, TcpExtTCPACKSkippedSynRecv is 1.
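
For readers who want to read such a counter programmatically, the
sketch below looks up one value in a pair of lines shaped like the
``TcpExt:`` lines of ``/proc/net/netstat`` (a line of names followed by
a line of values in the same order). The helper names are hypothetical
and the lines are passed in as strings, so the sketch assumes the caller
has already read them from the file. Note that the name in the file
lacks the ``TcpExt`` prefix that nstat prints.

```c
#include <stdlib.h>
#include <string.h>

/* Skip leading spaces and return a pointer to the next token. */
static const char *skip_spaces(const char *p)
{
	while (*p == ' ')
		p++;
	return p;
}

/* Length of the token at p, ending at a space, newline or end of string. */
static size_t token_len(const char *p)
{
	size_t n = 0;

	while (p[n] != '\0' && p[n] != ' ' && p[n] != '\n')
		n++;
	return n;
}

/*
 * Walk the names line and the values line in parallel. On a name match,
 * store the value at the same position in *out and return 0; return -1
 * if the name is not present.
 */
int netstat_lookup(const char *names, const char *values,
		   const char *want, unsigned long long *out)
{
	size_t want_len = strlen(want);

	names = skip_spaces(names);
	values = skip_spaces(values);
	for (;;) {
		size_t nl = token_len(names);
		size_t vl = token_len(values);

		if (nl == 0 || vl == 0)
			return -1;
		if (nl == want_len && memcmp(names, want, nl) == 0) {
			*out = strtoull(values, NULL, 10);
			return 0;
		}
		names = skip_spaces(names + nl);
		values = skip_spaces(values + vl);
	}
}
```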

TcpExtTCPACKSkippedPAWS
-----------------------
To trigger PAWS, we could send an old SYN.

On nstat-b, let nc listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, run tcpdump to capture a SYN::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-a, run nc as a client to connect to nstat-b::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Now tcpdump has captured the SYN and exited. We should fix the
checksum::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum

Send the SYN packet twice::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done

On nstat-b, check the snmp counter::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedPAWS         1                  0.0

We sent two SYNs via tcpreplay, and both of them failed the PAWS
check. nstat-b replied with an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.

TcpExtTCPACKSkippedSeq
----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have a valid
timestamp (to pass the PAWS check) but an out-of-window sequence
number. The Linux TCP stack does not skip the ACK if the packet
carries data, so we need a pure ACK packet. To generate such a packet,
we create two sockets: one on port 9000, another on port 9001. Then
we capture an ACK on port 9001 and change the source/destination port
numbers to match the port 9000 socket. This packet then triggers
TcpExtTCPACKSkippedSeq.
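
What "out of window" means can be sketched with 32-bit serial number
arithmetic. This is a simplified model, not the kernel's check: the
function names and the ``win_start``/``win_len`` parameters are
hypothetical, and a real stack derives the window edges from its own
receive state. An ACK borrowed from the port 9001 connection carries a
sequence number unrelated to the port 9000 connection, so it is very
likely to fall outside the receive window.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a comes before b, modulo 2^32. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* True if sequence number a comes after b, modulo 2^32. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/*
 * Hypothetical check: a segment covering [seq, end_seq] is acceptable
 * if it does not end before the window starts and does not begin after
 * the window ends. A pure ACK has seq == end_seq.
 */
bool seq_in_window(uint32_t seq, uint32_t end_seq,
		   uint32_t win_start, uint32_t win_len)
{
	return !seq_before(end_seq, win_start) &&
	       !seq_after(seq, win_start + win_len);
}
```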

On nstat-b, open two terminals, run two nc commands to listen on both
port 9000 and port 9001::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)

On nstat-a, run two nc clients::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

  nstatuser@nstat-a:~$ nc -v nstat-b 9001
  Connection to nstat-b 9001 port [tcp/*] succeeded!

On nstat-a, run tcpdump to capture an ACK::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-b, send a packet via the port 9001 socket. E.g. we sent a
string 'foo' in our example::

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)
  Connection from nstat-a 42132 received!
  foo

On nstat-a, tcpdump should have captured the ACK. We should check
the source port numbers of the two nc clients::

  nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
  State  Recv-Q   Send-Q         Local Address:Port           Peer Address:Port
  ESTAB  0        0            192.168.122.250:50208       192.168.122.251:9000
  ESTAB  0        0            192.168.122.250:42132       192.168.122.251:9001

Run tcprewrite to change port 9001 to port 9000 and port 42132 to
port 50208::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum

Now /tmp/seq.pcap is the packet we need. Send it to nstat-b twice::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done

Check TcpExtTCPACKSkippedSeq on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSeq          1                  0.0
+2 −2
@@ -852,7 +852,7 @@ u16 capi20_get_manufacturer(u32 contr, u8 *buf)
 	u16 ret;

 	if (contr == 0) {
-		strlcpy(buf, capi_manufakturer, CAPI_MANUFACTURER_LEN);
+		strncpy(buf, capi_manufakturer, CAPI_MANUFACTURER_LEN);
 		return CAPI_NOERROR;
 	}

@@ -860,7 +860,7 @@ u16 capi20_get_manufacturer(u32 contr, u8 *buf)

 	ctr = get_capi_ctr_by_nr(contr);
 	if (ctr && ctr->state == CAPI_CTR_RUNNING) {
-		strlcpy(buf, ctr->manu, CAPI_MANUFACTURER_LEN);
+		strncpy(buf, ctr->manu, CAPI_MANUFACTURER_LEN);
 		ret = CAPI_NOERROR;
 	} else
 		ret = CAPI_REGNOTINSTALLED;
+2 −0
@@ -1169,11 +1169,13 @@ HFCPCI_l1hw(struct PStack *st, int pr, void *arg)
 		if (cs->debug & L1_DEB_LAPD)
 			debugl1(cs, "-> PH_REQUEST_PULL");
 #endif
+		spin_lock_irqsave(&cs->lock, flags);
 		if (!cs->tx_skb) {
 			test_and_clear_bit(FLG_L1_PULL_REQ, &st->l1.Flags);
 			st->l1.l1l2(st, PH_PULL | CONFIRM, NULL);
 		} else
 			test_and_set_bit(FLG_L1_PULL_REQ, &st->l1.Flags);
+		spin_unlock_irqrestore(&cs->lock, flags);
 		break;
 	case (HW_RESET | REQUEST):
 		spin_lock_irqsave(&cs->lock, flags);
+3 −4
@@ -303,11 +303,10 @@ static int bcm_sf2_sw_mdio_write(struct mii_bus *bus, int addr, int regnum,
 	 * send them to our master MDIO bus controller
 	 */
 	if (addr == BRCM_PSEUDO_PHY_ADDR && priv->indir_phy_mask & BIT(addr))
-		bcm_sf2_sw_indir_rw(priv, 0, addr, regnum, val);
+		return bcm_sf2_sw_indir_rw(priv, 0, addr, regnum, val);
 	else
-		mdiobus_write_nested(priv->master_mii_bus, addr, regnum, val);
-
-	return 0;
+		return mdiobus_write_nested(priv->master_mii_bus, addr,
+				regnum, val);
 }

 static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
+3 −1
@@ -473,7 +473,9 @@ static void atl1e_mdio_write(struct net_device *netdev, int phy_id,
 {
 	struct atl1e_adapter *adapter = netdev_priv(netdev);

-	atl1e_write_phy_reg(&adapter->hw, reg_num & MDIO_REG_ADDR_MASK, val);
+	if (atl1e_write_phy_reg(&adapter->hw,
+				reg_num & MDIO_REG_ADDR_MASK, val))
+		netdev_err(netdev, "write phy register failed\n");
 }

 static int atl1e_mii_ioctl(struct net_device *netdev,