Commit 8dfedc53 authored by David S. Miller

Merge branch 'udp-scalability-improvements'

Paolo Abeni says:

====================
udp: scalability improvements

This patch series implements an idea suggested by Eric Dumazet to
reduce the contention on the udp sk_receive_queue lock when the socket is
under flood.

An ancillary queue is added to the udp socket, and the socket always
tries first to read packets from that queue. If it's empty, we splice
the content from sk_receive_queue into the ancillary queue.

The first patch introduces some helpers to keep the udp code small, and the
following two implement the ancillary queue strategy. The code is split
to hopefully help the reviewing process.

The measured overall gain under udp flood is up to 30%, depending on
the numa layout and the number of ingress queues used by the relevant nic.

The performance numbers have been gathered using pktgen as sender, with
64-byte packets and random src port, on a host b2b connected via a 10Gb/s
link with the dut.

The receiver used the udp_sink program by Jesper [1] and an h/w l4 rx hash on
the ingress nic, so that the number of ingress nic rx queues hit by the udp
traffic could be controlled via ethtool -L.

The udp_sink program was bound to the first idle cpu, to get more
stable numbers.

On a single numa node receiver:

nic rx queues           vanilla                 patched kernel
1                       1820 kpps               1900 kpps
2                       1950 kpps               2500 kpps
16                      1670 kpps               2120 kpps

When using a single nic rx queue, busy polling was also enabled;
otherwise, in the above scenario, the bh processing becomes the bottleneck
and this produces large artifacts in the measured performance (e.g.
improving the udp sink run time decreases the overall tput, since more
action from the scheduler comes into play).

[1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c



v1 -> v2:
  Patches 1/3 and 2/3 are unchanged; in patch 3/3 the rx_queue_lock_held param
  of udp_rmem_release() is now a bool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
parents 9dca599b 6dfb4367
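
The ancillary queue strategy described in the cover letter can be illustrated with a small userspace sketch. This is not the kernel code: struct pkt, struct pkt_queue, struct sock_sketch, sketch_enqueue() and sketch_recv() are hypothetical names invented for the illustration, a pthread mutex stands in for the sk_receive_queue lock, and a single reader thread is assumed so that the reader queue needs no lock of its own. The point it shows is that the reader takes the contended lock once per batch, when its private queue runs empty, rather than once per packet.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical stand-in for struct sk_buff_head (singly linked FIFO). */
struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

struct sock_sketch {
	pthread_mutex_t receive_lock;   /* contended: producers and reader */
	struct pkt_queue receive_queue; /* filled by producers */
	struct pkt_queue reader_queue;  /* touched by the single reader only */
};

static void sock_sketch_init(struct sock_sketch *sk)
{
	pthread_mutex_init(&sk->receive_lock, NULL);
	sk->receive_queue.head = sk->receive_queue.tail = NULL;
	sk->reader_queue.head = sk->reader_queue.tail = NULL;
}

/* Producer side: append one packet under the receive lock. */
static void sketch_enqueue(struct sock_sketch *sk, struct pkt *p)
{
	struct pkt_queue *q = &sk->receive_queue;

	assert(p != NULL);
	p->next = NULL;
	pthread_mutex_lock(&sk->receive_lock);
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	pthread_mutex_unlock(&sk->receive_lock);
}

/* Reader side: try reader_queue first; splice receive_queue if it is empty. */
static struct pkt *sketch_recv(struct sock_sketch *sk)
{
	struct pkt_queue *rq = &sk->reader_queue;
	struct pkt_queue *sq = &sk->receive_queue;
	struct pkt *p;

	if (!rq->head) {
		/* One lock acquisition moves the whole backlog at once. */
		pthread_mutex_lock(&sk->receive_lock);
		*rq = *sq;
		sq->head = sq->tail = NULL;
		pthread_mutex_unlock(&sk->receive_lock);
	}

	p = rq->head;
	if (p) {
		rq->head = p->next;
		if (!rq->head)
			rq->tail = NULL;
		p->next = NULL;
	}
	return p; /* NULL when both queues were empty */
}
```

In this sketch, packets enqueued while the reader is still draining its private queue stay on receive_queue until the next splice, so FIFO order is preserved across batches.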
+7 −0
@@ -3056,6 +3056,13 @@ static inline void skb_frag_list_init(struct sk_buff *skb)

 int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
 				const struct sk_buff *skb);
+struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
+					  struct sk_buff_head *queue,
+					  unsigned int flags,
+					  void (*destructor)(struct sock *sk,
+							   struct sk_buff *skb),
+					  int *peeked, int *off, int *err,
+					  struct sk_buff **last);
 struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned flags,
 					void (*destructor)(struct sock *sk,
 							   struct sk_buff *skb),
+3 −0
@@ -80,6 +80,9 @@ struct udp_sock {
 						struct sk_buff *skb,
 						int nhoff);

+	/* udp_recvmsg try to use this before splicing sk_receive_queue */
+	struct sk_buff_head	reader_queue ____cacheline_aligned_in_smp;
+
 	/* This field is dirtied by udp_recvmsg() */
 	int		forward_deficit;
 };
+2 −2
@@ -2035,8 +2035,8 @@ void sk_reset_timer(struct sock *sk, struct timer_list *timer,

 void sk_stop_timer(struct sock *sk, struct timer_list *timer);

-int __sk_queue_drop_skb(struct sock *sk, struct sk_buff *skb,
-			unsigned int flags,
+int __sk_queue_drop_skb(struct sock *sk, struct sk_buff_head *sk_queue,
+			struct sk_buff *skb, unsigned int flags,
 			void (*destructor)(struct sock *sk,
 					   struct sk_buff *skb));
 int __sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+2 −7
@@ -249,13 +249,8 @@ void udp_destruct_sock(struct sock *sk);
 void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len);
 int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb);
 void udp_skb_destructor(struct sock *sk, struct sk_buff *skb);
-static inline struct sk_buff *
-__skb_recv_udp(struct sock *sk, unsigned int flags, int noblock, int *peeked,
-	       int *off, int *err)
-{
-	return __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
-				   udp_skb_destructor, peeked, off, err);
-}
+struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
+			       int noblock, int *peeked, int *off, int *err);
 static inline struct sk_buff *skb_recv_udp(struct sock *sk, unsigned int flags,
 					   int noblock, int *err)
 {
+1 −1
@@ -26,8 +26,8 @@ static __inline__ int udplite_getfrag(void *from, char *to, int offset,
 /* Designate sk as UDP-Lite socket */
 static inline int udplite_sk_init(struct sock *sk)
 {
+	udp_init_sock(sk);
 	udp_sk(sk)->pcflag = UDPLITE_BIT;
-	sk->sk_destruct = udp_destruct_sock;
 	return 0;
 }
