
Commit 38940042 authored by David S. Miller


David Howells says:

====================
net-next: AF_RXRPC fixes and development

Here are some AF_RXRPC fixes:

 (1) Fix to remove an incorrect checksum calculation made during recvmsg().
     It's unnecessary to do this there since we check the checksum before
     reading the RxRPC header from the packet.

 (2) Fix to prevent the sending of an ABORT packet in response to another
     ABORT packet, which could otherwise induce a storm.

 (3) Fix the UDP MTU calculation made when parsing ICMP_FRAG_NEEDED packets
     to handle the case where the ICMP packet doesn't specify an MTU size.

And development patches:

 (4) Add sysctls for configuring RxRPC parameters, specifically various delays
     pertaining to ACK generation, the time before we resend a packet for
     which we don't receive an ACK, the maximum time a call is permitted to
     live and the amount of time transport, connection and dead call
     information is cached.

 (5) Improve ACK packet production by adjusting the handling of ACK_REQUESTED
     packets, ignoring the MORE_PACKETS flag, delaying the production of
     otherwise immediate ACK_IDLE packets and delaying all ACK_IDLE production
     (other than at call termination) to half a second.

 (6) Add more sysctl parameters to expose the Rx window size, the maximum
     packet size that we're willing to receive and the number of rxrpc packets
     that we're willing to handle glued together in a single jumbo UDP packet.

 (7) Request ACKs on alternate DATA packets so that the other side doesn't
     wait until we fill up the Tx window.

 (8) Use an RCU hash table to look up the rxrpc_call for an incoming packet
     rather than stepping through a hierarchy involving several spinlocks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
parents 4caeccb4 7727640c
+81 −0
@@ -27,6 +27,8 @@ Contents of this document:

 (*) AF_RXRPC kernel interface.

 (*) Configurable parameters.


========
OVERVIEW
@@ -864,3 +866,82 @@ The kernel interface functions are as follows:

     This is used to allocate a null RxRPC key that can be used to indicate
     anonymous security for a particular domain.


=======================
CONFIGURABLE PARAMETERS
=======================

The RxRPC protocol driver has a number of configurable parameters that can be
adjusted through sysctls in /proc/sys/net/rxrpc/:

 (*) req_ack_delay

     The amount of time in milliseconds after receiving a packet with the
     request-ack flag set before we honour the flag and actually send the
     requested ack.

     Usually the other side won't stop sending packets until the advertised
     reception window is full (to a maximum of 255 packets), so delaying the
     ACK permits several packets to be ACK'd in one go.

 (*) soft_ack_delay

     The amount of time in milliseconds after receiving a new packet before we
     generate a soft-ACK to tell the sender that it doesn't need to resend.

 (*) idle_ack_delay

     The amount of time in milliseconds after all the packets currently in the
     received queue have been consumed before we generate a hard-ACK to tell
     the sender it can free its buffers, assuming no other reason arises for us
     to send an ACK.

 (*) resend_timeout

     The amount of time in milliseconds after transmitting a packet before we
     transmit it again, assuming no ACK is received from the receiver telling
     us they got it.

 (*) max_call_lifetime

     The maximum amount of time in seconds that a call may be in progress
     before we preemptively kill it.

 (*) dead_call_expiry

     The amount of time in seconds before we remove a dead call from the call
     list.  Dead calls are kept around for a little while for the purpose of
     repeating ACK and ABORT packets.

 (*) connection_expiry

     The amount of time in seconds after a connection was last used before we
     remove it from the connection list.  Whilst a connection is in existence,
     it serves as a placeholder for negotiated security; when it is deleted,
     the security must be renegotiated.

 (*) transport_expiry

     The amount of time in seconds after a transport was last used before we
     remove it from the transport list.  Whilst a transport is in existence, it
     serves to anchor the peer data and keeps the connection ID counter.

 (*) rxrpc_rx_window_size

     The size of the receive window in packets.  This is the maximum number of
     unconsumed received packets we're willing to hold in memory for any
     particular call.

 (*) rxrpc_rx_mtu

     The maximum packet MTU size that we're willing to receive in bytes.  This
     indicates to the peer whether we're willing to accept jumbo packets.

 (*) rxrpc_rx_jumbo_max

     The maximum number of packets that we're willing to accept in a jumbo
     packet.  Non-terminal packets in a jumbo packet must contain a four byte
     header plus exactly 1412 bytes of data.  The terminal packet must contain
     a four byte header plus any amount of data.  In any event, a jumbo packet
     may not exceed rxrpc_rx_mtu in size.
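
The snippet below is a small, illustrative userspace helper (not part of this
patch set) that dumps the tunables documented above.  It assumes the sysctls
appear as /proc/sys/net/rxrpc/<name> with exactly the names listed; the three
Rx-related entries are left out of the list for brevity.

/* Illustrative only: print the RxRPC sysctl values documented above.
 * Assumes they are exposed as /proc/sys/net/rxrpc/<name>. */
#include <stdio.h>

int main(void)
{
	static const char *params[] = {
		"req_ack_delay", "soft_ack_delay", "idle_ack_delay",
		"resend_timeout", "max_call_lifetime", "dead_call_expiry",
		"connection_expiry", "transport_expiry",
	};
	char path[128], value[64];

	for (unsigned int i = 0; i < sizeof(params) / sizeof(params[0]); i++) {
		FILE *f;

		snprintf(path, sizeof(path), "/proc/sys/net/rxrpc/%s",
			 params[i]);
		f = fopen(path, "r");
		if (!f) {
			printf("%-20s (not available)\n", params[i]);
			continue;
		}
		if (fgets(value, sizeof(value), f))
			printf("%-20s %s", params[i], value);
		fclose(f);
	}
	return 0;
}

Writing a new value works the same way in reverse: open the file for writing
(as root) and write the number followed by a newline.
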
+2 −3
@@ -20,9 +20,8 @@ af-rxrpc-y := \
	ar-skbuff.o \
	ar-transport.o

ifeq ($(CONFIG_PROC_FS),y)
af-rxrpc-y += ar-proc.o
endif
af-rxrpc-$(CONFIG_PROC_FS) += ar-proc.o
af-rxrpc-$(CONFIG_SYSCTL) += sysctl.o

obj-$(CONFIG_AF_RXRPC) += af-rxrpc.o

+9 −0
@@ -838,6 +838,12 @@ static int __init af_rxrpc_init(void)
		goto error_key_type_s;
	}

	ret = rxrpc_sysctl_init();
	if (ret < 0) {
		printk(KERN_CRIT "RxRPC: Cannot register sysctls\n");
		goto error_sysctls;
	}

#ifdef CONFIG_PROC_FS
	proc_create("rxrpc_calls", 0, init_net.proc_net, &rxrpc_call_seq_fops);
	proc_create("rxrpc_conns", 0, init_net.proc_net,
@@ -845,6 +851,8 @@ static int __init af_rxrpc_init(void)
#endif
	return 0;

error_sysctls:
	unregister_key_type(&key_type_rxrpc_s);
error_key_type_s:
	unregister_key_type(&key_type_rxrpc);
error_key_type:
@@ -865,6 +873,7 @@ static int __init af_rxrpc_init(void)
static void __exit af_rxrpc_exit(void)
{
	_enter("");
	rxrpc_sysctl_exit();
	unregister_key_type(&key_type_rxrpc_s);
	unregister_key_type(&key_type_rxrpc);
	sock_unregister(PF_RXRPC);
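
The new net/rxrpc/sysctl.c that defines rxrpc_sysctl_init() and
rxrpc_sysctl_exit() is not included in this excerpt.  Purely as a sketch of
what such a file plausibly contains (the entry names come from the
documentation above; everything else here is an assumption, not the actual
patch), it would register a ctl_table under "net/rxrpc" and hand the
jiffies-based variables to the standard handlers that convert to and from the
units the documentation promises:

/* Sketch only -- not the actual net/rxrpc/sysctl.c from this series.
 * Registers two of the documented tunables; the values are stored in
 * jiffies but exposed to userspace in milliseconds or seconds via the
 * standard conversion handlers.  Assumes ar-internal.h declares the
 * rxrpc_* variables defined in ar-ack.c and ar-call.c above. */
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/sysctl.h>
#include <net/net_namespace.h>
#include "ar-internal.h"

static struct ctl_table_header *rxrpc_sysctl_reg_table;

static struct ctl_table rxrpc_sysctl_table[] = {
	{
		.procname	= "req_ack_delay",
		.data		= &rxrpc_requested_ack_delay,
		.maxlen		= sizeof(unsigned),
		.mode		= 0644,
		/* milliseconds in userspace, jiffies in the kernel */
		.proc_handler	= proc_dointvec_ms_jiffies,
	},
	{
		.procname	= "max_call_lifetime",
		.data		= &rxrpc_max_call_lifetime,
		.maxlen		= sizeof(unsigned),
		.mode		= 0644,
		/* seconds in userspace, jiffies in the kernel */
		.proc_handler	= proc_dointvec_jiffies,
	},
	{ }
};

int __init rxrpc_sysctl_init(void)
{
	rxrpc_sysctl_reg_table = register_net_sysctl(&init_net, "net/rxrpc",
						     rxrpc_sysctl_table);
	if (!rxrpc_sysctl_reg_table)
		return -ENOMEM;
	return 0;
}

void rxrpc_sysctl_exit(void)
{
	if (rxrpc_sysctl_reg_table)
		unregister_net_sysctl_table(rxrpc_sysctl_reg_table);
}

A failed registration returns -ENOMEM, which matches the "ret < 0" error path
added to af_rxrpc_init() above.
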
+51 −10
@@ -19,7 +19,49 @@
#include <net/af_rxrpc.h>
#include "ar-internal.h"

static unsigned int rxrpc_ack_defer = 1;
/*
 * How long to wait before scheduling ACK generation after seeing a
 * packet with RXRPC_REQUEST_ACK set (in jiffies).
 */
unsigned rxrpc_requested_ack_delay = 1;

/*
 * How long to wait before scheduling an ACK with subtype DELAY (in jiffies).
 *
 * We use this when we've received new data packets.  If those packets aren't
 * all consumed within this time we will send a DELAY ACK if an ACK was not
 * requested to let the sender know it doesn't need to resend.
 */
unsigned rxrpc_soft_ack_delay = 1 * HZ;

/*
 * How long to wait before scheduling an ACK with subtype IDLE (in jiffies).
 *
 * We use this when we've consumed some previously soft-ACK'd packets when
 * further packets aren't immediately received to decide when to send an IDLE
 * ACK to let the other end know that it can free up its Tx buffer space.
 */
unsigned rxrpc_idle_ack_delay = 0.5 * HZ;

/*
 * Receive window size in packets.  This indicates the maximum number of
 * unconsumed received packets we're willing to retain in memory.  Once this
 * limit is hit, we should generate an EXCEEDS_WINDOW ACK and discard further
 * packets.
 */
unsigned rxrpc_rx_window_size = 32;

/*
 * Maximum Rx MTU size.  This indicates to the sender the size of jumbo packet
 * made by gluing normal packets together that we're willing to handle.
 */
unsigned rxrpc_rx_mtu = 5692;

/*
 * The maximum number of fragments in a received jumbo packet that we tell the
 * sender that we're willing to handle.
 */
unsigned rxrpc_rx_jumbo_max = 4;

static const char *rxrpc_acks(u8 reason)
{
@@ -82,24 +124,23 @@ void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
	switch (ack_reason) {
	case RXRPC_ACK_DELAY:
		_debug("run delay timer");
		call->ack_timer.expires = jiffies + rxrpc_ack_timeout * HZ;
		add_timer(&call->ack_timer);
		return;
		expiry = rxrpc_soft_ack_delay;
		goto run_timer;

	case RXRPC_ACK_IDLE:
		if (!immediate) {
			_debug("run defer timer");
			expiry = 1;
			expiry = rxrpc_idle_ack_delay;
			goto run_timer;
		}
		goto cancel_timer;

	case RXRPC_ACK_REQUESTED:
		if (!rxrpc_ack_defer)
		expiry = rxrpc_requested_ack_delay;
		if (!expiry)
			goto cancel_timer;
		if (!immediate || serial == cpu_to_be32(1)) {
			_debug("run defer timer");
			expiry = rxrpc_ack_defer;
			goto run_timer;
		}

@@ -1174,11 +1215,11 @@ void rxrpc_process_call(struct work_struct *work)
	mtu = call->conn->trans->peer->if_mtu;
	mtu -= call->conn->trans->peer->hdrsize;
	ackinfo.maxMTU	= htonl(mtu);
	ackinfo.rwind	= htonl(32);
	ackinfo.rwind	= htonl(rxrpc_rx_window_size);

	/* permit the peer to send us jumbo packets if it wants to */
	ackinfo.rxMTU	= htonl(5692);
	ackinfo.jumbo_max = htonl(4);
	ackinfo.rxMTU	= htonl(rxrpc_rx_mtu);
	ackinfo.jumbo_max = htonl(rxrpc_rx_jumbo_max);

	hdr.serial = htonl(atomic_inc_return(&call->conn->serial));
	_proto("Tx ACK %%%u { m=%hu f=#%u p=#%u s=%%%u r=%s n=%u }",
+205 −8
@@ -12,10 +12,22 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/circ_buf.h>
#include <linux/hashtable.h>
#include <linux/spinlock_types.h>
#include <net/sock.h>
#include <net/af_rxrpc.h>
#include "ar-internal.h"

/*
 * Maximum lifetime of a call (in jiffies).
 */
unsigned rxrpc_max_call_lifetime = 60 * HZ;

/*
 * Time till dead call expires after last use (in jiffies).
 */
unsigned rxrpc_dead_call_expiry = 2 * HZ;

const char *const rxrpc_call_states[] = {
	[RXRPC_CALL_CLIENT_SEND_REQUEST]	= "ClSndReq",
	[RXRPC_CALL_CLIENT_AWAIT_REPLY]		= "ClAwtRpl",
@@ -38,8 +50,6 @@ const char *const rxrpc_call_states[] = {
struct kmem_cache *rxrpc_call_jar;
LIST_HEAD(rxrpc_calls);
DEFINE_RWLOCK(rxrpc_call_lock);
static unsigned int rxrpc_call_max_lifetime = 60;
static unsigned int rxrpc_dead_call_timeout = 2;

static void rxrpc_destroy_call(struct work_struct *work);
static void rxrpc_call_life_expired(unsigned long _call);
@@ -47,6 +57,145 @@ static void rxrpc_dead_call_expired(unsigned long _call);
static void rxrpc_ack_time_expired(unsigned long _call);
static void rxrpc_resend_time_expired(unsigned long _call);

static DEFINE_SPINLOCK(rxrpc_call_hash_lock);
static DEFINE_HASHTABLE(rxrpc_call_hash, 10);

/*
 * Hash function for rxrpc_call_hash
 */
static unsigned long rxrpc_call_hashfunc(
	u8		clientflag,
	__be32		cid,
	__be32		call_id,
	__be32		epoch,
	__be16		service_id,
	sa_family_t	proto,
	void		*localptr,
	unsigned int	addr_size,
	const u8	*peer_addr)
{
	const u16 *p;
	unsigned int i;
	unsigned long key;
	u32 hcid = ntohl(cid);

	_enter("");

	key = (unsigned long)localptr;
	/* We just want to add up the __be32 values, so forcing the
	 * cast should be okay.
	 */
	key += (__force u32)epoch;
	key += (__force u16)service_id;
	key += (__force u32)call_id;
	key += (hcid & RXRPC_CIDMASK) >> RXRPC_CIDSHIFT;
	key += hcid & RXRPC_CHANNELMASK;
	key += clientflag;
	key += proto;
	/* Step through the peer address in 16-bit portions for speed */
	for (i = 0, p = (const u16 *)peer_addr; i < addr_size >> 1; i++, p++)
		key += *p;
	_leave(" key = 0x%lx", key);
	return key;
}

/*
 * Add a call to the hashtable
 */
static void rxrpc_call_hash_add(struct rxrpc_call *call)
{
	unsigned long key;
	unsigned int addr_size = 0;

	_enter("");
	switch (call->proto) {
	case AF_INET:
		addr_size = sizeof(call->peer_ip.ipv4_addr);
		break;
	case AF_INET6:
		addr_size = sizeof(call->peer_ip.ipv6_addr);
		break;
	default:
		break;
	}
	key = rxrpc_call_hashfunc(call->in_clientflag, call->cid,
				  call->call_id, call->epoch,
				  call->service_id, call->proto,
				  call->conn->trans->local, addr_size,
				  call->peer_ip.ipv6_addr);
	/* Store the full key in the call */
	call->hash_key = key;
	spin_lock(&rxrpc_call_hash_lock);
	hash_add_rcu(rxrpc_call_hash, &call->hash_node, key);
	spin_unlock(&rxrpc_call_hash_lock);
	_leave("");
}

/*
 * Remove a call from the hashtable
 */
static void rxrpc_call_hash_del(struct rxrpc_call *call)
{
	_enter("");
	spin_lock(&rxrpc_call_hash_lock);
	hash_del_rcu(&call->hash_node);
	spin_unlock(&rxrpc_call_hash_lock);
	_leave("");
}

/*
 * Find a call in the hashtable and return it, or NULL if it
 * isn't there.
 */
struct rxrpc_call *rxrpc_find_call_hash(
	u8		clientflag,
	__be32		cid,
	__be32		call_id,
	__be32		epoch,
	__be16		service_id,
	void		*localptr,
	sa_family_t	proto,
	const u8	*peer_addr)
{
	unsigned long key;
	unsigned int addr_size = 0;
	struct rxrpc_call *call = NULL;
	struct rxrpc_call *ret = NULL;

	_enter("");
	switch (proto) {
	case AF_INET:
		addr_size = sizeof(call->peer_ip.ipv4_addr);
		break;
	case AF_INET6:
		addr_size = sizeof(call->peer_ip.ipv6_addr);
		break;
	default:
		break;
	}

	key = rxrpc_call_hashfunc(clientflag, cid, call_id, epoch,
				  service_id, proto, localptr, addr_size,
				  peer_addr);
	hash_for_each_possible_rcu(rxrpc_call_hash, call, hash_node, key) {
		if (call->hash_key == key &&
		    call->call_id == call_id &&
		    call->cid == cid &&
		    call->in_clientflag == clientflag &&
		    call->service_id == service_id &&
		    call->proto == proto &&
		    call->local == localptr &&
		    memcmp(call->peer_ip.ipv6_addr, peer_addr,
			      addr_size) == 0 &&
		    call->epoch == epoch) {
			ret = call;
			break;
		}
	}
	_leave(" = %p", ret);
	return ret;
}

/*
 * allocate a new call
 */
@@ -91,7 +240,7 @@ static struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
	call->rx_data_expect = 1;
	call->rx_data_eaten = 0;
	call->rx_first_oos = 0;
	call->ackr_win_top = call->rx_data_eaten + 1 + RXRPC_MAXACKS;
	call->ackr_win_top = call->rx_data_eaten + 1 + rxrpc_rx_window_size;
	call->creation_jif = jiffies;
	return call;
}
@@ -128,11 +277,31 @@ static struct rxrpc_call *rxrpc_alloc_client_call(
		return ERR_PTR(ret);
	}

	/* Record copies of information for hashtable lookup */
	call->proto = rx->proto;
	call->local = trans->local;
	switch (call->proto) {
	case AF_INET:
		call->peer_ip.ipv4_addr =
			trans->peer->srx.transport.sin.sin_addr.s_addr;
		break;
	case AF_INET6:
		memcpy(call->peer_ip.ipv6_addr,
		       trans->peer->srx.transport.sin6.sin6_addr.in6_u.u6_addr8,
		       sizeof(call->peer_ip.ipv6_addr));
		break;
	}
	call->epoch = call->conn->epoch;
	call->service_id = call->conn->service_id;
	call->in_clientflag = call->conn->in_clientflag;
	/* Add the new call to the hashtable */
	rxrpc_call_hash_add(call);

	spin_lock(&call->conn->trans->peer->lock);
	list_add(&call->error_link, &call->conn->trans->peer->error_targets);
	spin_unlock(&call->conn->trans->peer->lock);

	call->lifetimer.expires = jiffies + rxrpc_call_max_lifetime * HZ;
	call->lifetimer.expires = jiffies + rxrpc_max_call_lifetime;
	add_timer(&call->lifetimer);

	_leave(" = %p", call);
@@ -320,9 +489,12 @@ struct rxrpc_call *rxrpc_incoming_call(struct rxrpc_sock *rx,
		parent = *p;
		call = rb_entry(parent, struct rxrpc_call, conn_node);

		if (call_id < call->call_id)
		/* The tree is sorted in order of the __be32 value without
		 * turning it into host order.
		 */
		if ((__force u32)call_id < (__force u32)call->call_id)
			p = &(*p)->rb_left;
		else if (call_id > call->call_id)
		else if ((__force u32)call_id > (__force u32)call->call_id)
			p = &(*p)->rb_right;
		else
			goto old_call;
@@ -347,9 +519,31 @@ struct rxrpc_call *rxrpc_incoming_call(struct rxrpc_sock *rx,
	list_add_tail(&call->link, &rxrpc_calls);
	write_unlock_bh(&rxrpc_call_lock);

	/* Record copies of information for hashtable lookup */
	call->proto = rx->proto;
	call->local = conn->trans->local;
	switch (call->proto) {
	case AF_INET:
		call->peer_ip.ipv4_addr =
			conn->trans->peer->srx.transport.sin.sin_addr.s_addr;
		break;
	case AF_INET6:
		memcpy(call->peer_ip.ipv6_addr,
		       conn->trans->peer->srx.transport.sin6.sin6_addr.in6_u.u6_addr8,
		       sizeof(call->peer_ip.ipv6_addr));
		break;
	default:
		break;
	}
	call->epoch = conn->epoch;
	call->service_id = conn->service_id;
	call->in_clientflag = conn->in_clientflag;
	/* Add the new call to the hashtable */
	rxrpc_call_hash_add(call);

	_net("CALL incoming %d on CONN %d", call->debug_id, call->conn->debug_id);

	call->lifetimer.expires = jiffies + rxrpc_call_max_lifetime * HZ;
	call->lifetimer.expires = jiffies + rxrpc_max_call_lifetime;
	add_timer(&call->lifetimer);
	_leave(" = %p {%d} [new]", call, call->debug_id);
	return call;
@@ -533,7 +727,7 @@ void rxrpc_release_call(struct rxrpc_call *call)
	del_timer_sync(&call->resend_timer);
	del_timer_sync(&call->ack_timer);
	del_timer_sync(&call->lifetimer);
	call->deadspan.expires = jiffies + rxrpc_dead_call_timeout * HZ;
	call->deadspan.expires = jiffies + rxrpc_dead_call_expiry;
	add_timer(&call->deadspan);

	_leave("");
@@ -665,6 +859,9 @@ static void rxrpc_cleanup_call(struct rxrpc_call *call)
		rxrpc_put_connection(call->conn);
	}

	/* Remove the call from the hash */
	rxrpc_call_hash_del(call);

	if (call->acks_window) {
		_debug("kill Tx window %d",
		       CIRC_CNT(call->acks_head, call->acks_tail,
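
The excerpt above adds the hash table, the hash function and
rxrpc_find_call_hash(), but the caller on the packet-ingress side is not
shown.  Below is a minimal sketch of how an incoming packet might be matched
to its call under RCU instead of by walking the transport/connection/call
spinlock hierarchy; the helper name, the IPv4-only handling and the
reference-count step are illustrative assumptions, not the actual ar-input.c
change:

/* Sketch only -- how an ingress path could use rxrpc_find_call_hash().
 * All header fields stay in network byte order, matching the hash above.
 * IPv4 shown for brevity; the helper name and ref-count handling are
 * assumptions for this example. */
#include <linux/rcupdate.h>
#include "ar-internal.h"

static struct rxrpc_call *rxrpc_lookup_incoming_call(
	struct rxrpc_local *local,	/* local endpoint the packet arrived on */
	struct sockaddr_rxrpc *srx,	/* peer address from the UDP layer */
	struct rxrpc_header *hdr)	/* on-the-wire header, network order */
{
	struct rxrpc_call *call;

	rcu_read_lock();
	call = rxrpc_find_call_hash(hdr->flags & RXRPC_CLIENT_INITIATED,
				    hdr->cid, hdr->callNumber, hdr->epoch,
				    hdr->serviceId, local,
				    srx->transport.family,
				    (const u8 *)&srx->transport.sin.sin_addr);

	/* Pin the call before leaving the RCU read-side critical section
	 * (assumes call structs are only freed after a grace period). */
	if (call && !atomic_inc_not_zero(&call->usage))
		call = NULL;
	rcu_read_unlock();

	return call;
}
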