Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit f6722583 authored by Yuchung Cheng's avatar Yuchung Cheng Committed by David S. Miller
Browse files

tcp: track min RTT using windowed min-filter



Kathleen Nichols' algorithm for tracking the minimum RTT of a
data stream over some measurement window. It uses constant space
and constant time per update. Yet it almost always delivers
the same minimum as an implementation that has to keep all
the data in the window. The measurement window is tunable via
sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.

The algorithm keeps track of the best, 2nd best & 3rd best min
values, maintaining an invariant that the measurement time of
the n'th best >= n-1'th best. It also makes sure that the three
values are widely separated in the time window since that bounds
the worse case error when that data is monotonically increasing
over the window.

Upon getting a new min, we can forget everything earlier because
it has no value - the new min is less than everything else in the
window by definition and it's the most recent. So we restart fresh
on every new min and overwrites the 2nd & 3rd choices. The same
property holds for the 2nd & 3rd best.

Therefore we have to maintain two invariants to maximize the
information in the samples, one on values (1st.v <= 2nd.v <=
3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
now). These invariants determine the structure of the code

The RTT input to the windowed filter is the minimum RTT measured
from ACK or SACK, or as the last resort from TCP timestamps.

The accessor tcp_min_rtt() returns the minimum RTT seen in the
window. ~0U indicates it is not available. The minimum is 1usec
even if the true RTT is below that.

Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parent 9e45a3e3
Loading
Loading
Loading
Loading
+8 −0
Original line number Diff line number Diff line
@@ -384,6 +384,14 @@ tcp_mem - vector of 3 INTEGERs: min, pressure, max
	Defaults are calculated at boot time from amount of available
	memory.

tcp_min_rtt_wlen - INTEGER
	The window length of the windowed min filter to track the minimum RTT.
	A shorter window lets a flow more quickly pick up new (higher)
	minimum RTT when it is moved to a longer path (e.g., due to traffic
	engineering). A longer window makes the filter more resistant to RTT
	inflations such as transient congestion. The unit is seconds.
	Default: 300

tcp_moderate_rcvbuf - BOOLEAN
	If set, TCP performs receive buffer auto-tuning, attempting to
	automatically size the buffer (no greater than tcp_rmem[2]) to
+3 −0
Original line number Diff line number Diff line
@@ -217,6 +217,9 @@ struct tcp_sock {
	u32	mdev_max_us;	/* maximal mdev for the last rtt period	*/
	u32	rttvar_us;	/* smoothed mdev_max			*/
	u32	rtt_seq;	/* sequence number to update rttvar	*/
	struct rtt_meas {
		u32 rtt, ts;	/* RTT in usec and sampling time in jiffies. */
	} rtt_min[3];

	u32	packets_out;	/* Packets which are "in flight"	*/
	u32	retrans_out;	/* Retransmitted packets out		*/
+7 −0
Original line number Diff line number Diff line
@@ -279,6 +279,7 @@ extern int sysctl_tcp_limit_output_bytes;
extern int sysctl_tcp_challenge_ack_limit;
extern unsigned int sysctl_tcp_notsent_lowat;
extern int sysctl_tcp_min_tso_segs;
extern int sysctl_tcp_min_rtt_wlen;
extern int sysctl_tcp_autocorking;
extern int sysctl_tcp_invalid_ratelimit;
extern int sysctl_tcp_pacing_ss_ratio;
@@ -671,6 +672,12 @@ static inline bool tcp_ca_dst_locked(const struct dst_entry *dst)
	return dst_metric_locked(dst, RTAX_CC_ALGO);
}

/* Minimum RTT in usec. ~0 means not available. */
static inline u32 tcp_min_rtt(const struct tcp_sock *tp)
{
	return tp->rtt_min[0].rtt;
}

/* Compute the actual receive window we are currently advertising.
 * Rcv_nxt can be after the window if our peer push more data
 * than the offered window.
+7 −0
Original line number Diff line number Diff line
@@ -576,6 +576,13 @@ static struct ctl_table ipv4_table[] = {
		.mode		= 0644,
		.proc_handler	= proc_dointvec
	},
	{
		.procname	= "tcp_min_rtt_wlen",
		.data		= &sysctl_tcp_min_rtt_wlen,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec
	},
	{
		.procname	= "tcp_low_latency",
		.data		= &sysctl_tcp_low_latency,
+1 −0
Original line number Diff line number Diff line
@@ -388,6 +388,7 @@ void tcp_init_sock(struct sock *sk)

	icsk->icsk_rto = TCP_TIMEOUT_INIT;
	tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
	tp->rtt_min[0].rtt = ~0U;

	/* So many TCP implementations out there (incorrectly) count the
	 * initial SYN frame in their delayed-ACK and congestion control
Loading