Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit c8628155 authored by Eric Dumazet's avatar Eric Dumazet Committed by David S. Miller
Browse files

tcp: reduce out_of_order memory use



With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.

When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.

Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.

Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.

This also reduced average memory used by tcp sockets.

With help from Neal Cardwell.

Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parent e86b2919
Loading
Loading
Loading
Loading
+1 −0
Original line number Diff line number Diff line
@@ -233,6 +233,7 @@ enum
	LINUX_MIB_TCPREQQFULLDOCOOKIES,		/* TCPReqQFullDoCookies */
	LINUX_MIB_TCPREQQFULLDROP,		/* TCPReqQFullDrop */
	LINUX_MIB_TCPRETRANSFAIL,		/* TCPRetransFail */
	LINUX_MIB_TCPRCVCOALESCE,			/* TCPRcvCoalesce */
	__LINUX_MIB_MAX
};

+1 −0
Original line number Diff line number Diff line
@@ -257,6 +257,7 @@ static const struct snmp_mib snmp4_net_list[] = {
	SNMP_MIB_ITEM("TCPReqQFullDoCookies", LINUX_MIB_TCPREQQFULLDOCOOKIES),
	SNMP_MIB_ITEM("TCPReqQFullDrop", LINUX_MIB_TCPREQQFULLDROP),
	SNMP_MIB_ITEM("TCPRetransFail", LINUX_MIB_TCPRETRANSFAIL),
	SNMP_MIB_ITEM("TCPRcvCoalesce", LINUX_MIB_TCPRCVCOALESCE),
	SNMP_MIB_SENTINEL
};

+18 −1
Original line number Diff line number Diff line
@@ -4484,7 +4484,24 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
	end_seq = TCP_SKB_CB(skb)->end_seq;

	if (seq == TCP_SKB_CB(skb1)->end_seq) {
		/* Packets in ofo can stay in queue a long time.
		 * Better try to coalesce them right now
		 * to avoid future tcp_collapse_ofo_queue(),
		 * probably the most expensive function in tcp stack.
		 */
		if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) {
			NET_INC_STATS_BH(sock_net(sk),
					 LINUX_MIB_TCPRCVCOALESCE);
			BUG_ON(skb_copy_bits(skb, 0,
					     skb_put(skb1, skb->len),
					     skb->len));
			TCP_SKB_CB(skb1)->end_seq = end_seq;
			TCP_SKB_CB(skb1)->ack_seq = TCP_SKB_CB(skb)->ack_seq;
			__kfree_skb(skb);
			skb = NULL;
		} else {
			__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
		}

		if (!tp->rx_opt.num_sacks ||
		    tp->selective_acks[0].end_seq != seq)