Merge branch 'xps-symmretric-queue-selection' (97680ade) · Commits · e / devices / android_kernel_teracube_emerald

Documentation/ABI/testing/sysfs-class-net-queues

+11 −0

Original line number	Diff line number	Diff line
		@@ -42,6 +42,17 @@ Description:
		network device transmit queue. Possible values depend on the
		number of available CPU(s) in the system.

		What: /sys/class/<iface>/queues/tx-<queue>/xps_rxqs
		Date: June 2018
		KernelVersion: 4.18.0
		Contact: netdev@vger.kernel.org
		Description:
		Mask of the receive queue(s) currently enabled to participate
		in the Transmit Packet Steering packet processing flow for this
		network device transmit queue. Possible values depend on the
		number of available receive queue(s) in the network device.
		Default is disabled.

		What: /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
		Date: November 2011
		KernelVersion: 3.3

Documentation/networking/scaling.txt

+50 −11

		@@ -366,8 +366,13 @@ XPS: Transmit Packet Steering

		Transmit Packet Steering is a mechanism for intelligently selecting
		which transmit queue to use when transmitting a packet on a multi-queue
		device. To accomplish this, a mapping from CPU to hardware queue(s) is
		recorded. The goal of this mapping is usually to assign queues
		device. This can be accomplished by recording two kinds of maps, either
		a mapping of CPU to hardware queue(s) or a mapping of receive queue(s)
		to hardware transmit queue(s).

		1. XPS using CPUs map

		The goal of this mapping is usually to assign queues
		exclusively to a subset of CPUs, where the transmit completions for
		these queues are processed on a CPU within this set. This choice
		provides two benefits. First, contention on the device queue lock is
		@@ -377,15 +382,40 @@ transmit queue). Secondly, cache miss rate on transmit completion is
		reduced, in particular for data cache lines that hold the sk_buff
		structures.

		XPS is configured per transmit queue by setting a bitmap of CPUs that
		may use that queue to transmit. The reverse mapping, from CPUs to
		transmit queues, is computed and maintained for each network device.
		When transmitting the first packet in a flow, the function
		get_xps_queue() is called to select a queue. This function uses the ID
		of the running CPU as a key into the CPU-to-queue lookup table. If the
		2. XPS using receive queues map

		This mapping is used to pick the transmit queue based on the receive
		queue(s) map configuration set by the administrator. A set of receive
		queues can be mapped to a set of transmit queues (many:many), although
		the common use case is a 1:1 mapping. This will enable sending packets
		on the same queue associations for transmit and receive. This is useful for
		busy polling multi-threaded workloads where there are challenges in
		associating a given CPU to a given application thread. The application
		threads are not pinned to CPUs and each thread handles packets
		received on a single queue. The receive queue number is cached in the
		socket for the connection. In this model, sending the packets on the same
		transmit queue corresponding to the associated receive queue has benefits
		in keeping the CPU overhead low. Transmit completion work is locked into
		the same queue-association that a given application is polling on. This
		avoids the overhead of triggering an interrupt on another CPU. When the
		application cleans up the packets during the busy poll, transmit completion
		may be processed along with it in the same thread context and so result in
		reduced latency.

		XPS is configured per transmit queue by setting a bitmap of
		CPUs/receive-queues that may use that queue to transmit. The reverse
		mapping, from CPUs to transmit queues or from receive-queues to transmit
		queues, is computed and maintained for each network device. When
		transmitting the first packet in a flow, the function get_xps_queue() is
		called to select a queue. This function uses the ID of the receive queue
		for the socket connection for a match in the receive queue-to-transmit queue
		lookup table. Alternatively, this function can also use the ID of the
		running CPU as a key into the CPU-to-queue lookup table. If the
		ID matches a single queue, that is used for transmission. If multiple
		queues match, one is selected by using the flow hash to compute an index
		into the set.
		into the set. When selecting the transmit queue based on the receive
		queue(s) map, the transmit device is not validated against the receive
		device, as that would require an expensive lookup operation in the datapath.
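
The selection order described above can be sketched in userspace C. This is only an illustration of the documented behaviour, not the kernel's get_xps_queue(): the struct xps_map_sketch type, both function names, and the fixed array size are hypothetical, and the flow hash is reduced with a plain modulo for simplicity, which is an assumption rather than the kernel's actual scaling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct xps_map: the transmit
 * queues mapped to one CPU or one receive queue. */
struct xps_map_sketch {
	unsigned int len;	/* number of valid entries in queues[] */
	uint16_t queues[8];
};

#define NO_QUEUE (-1)

/* One matching queue is used directly; several matches are
 * disambiguated with the flow hash (modulo is an assumption here). */
int pick_from_map(const struct xps_map_sketch *map, uint32_t flow_hash)
{
	if (map == NULL || map->len == 0)
		return NO_QUEUE;
	if (map->len == 1)
		return map->queues[0];
	return map->queues[flow_hash % map->len];
}

/* Try the receive-queue map first, then fall back to the CPU map.
 * A negative rx_queue means no receive queue is recorded for the socket. */
int select_tx_queue(const struct xps_map_sketch *rxq_maps, size_t nr_rxqs,
		    const struct xps_map_sketch *cpu_maps, size_t nr_cpus,
		    int rx_queue, unsigned int cpu, uint32_t flow_hash)
{
	int queue = NO_QUEUE;

	if (rxq_maps != NULL && rx_queue >= 0 && (size_t)rx_queue < nr_rxqs)
		queue = pick_from_map(&rxq_maps[rx_queue], flow_hash);
	if (queue == NO_QUEUE && cpu_maps != NULL && cpu < nr_cpus)
		queue = pick_from_map(&cpu_maps[cpu], flow_hash);
	return queue;
}
```

A caller would pass the receive queue cached in the socket and the ID of the running CPU; a result of NO_QUEUE means neither map applied and some other selection policy has to take over.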

		The queue chosen for transmitting a particular flow is saved in the
		corresponding socket structure for the flow (e.g. a TCP connection).
		@@ -404,11 +434,15 @@ acknowledged.

		XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
		default for SMP). The functionality remains disabled until explicitly
		configured. To enable XPS, the bitmap of CPUs that may use a transmit
		queue is configured using the sysfs file entry:
		configured. To enable XPS, the bitmap of CPUs/receive-queues that may
		use a transmit queue is configured using the sysfs file entry:

		For selection based on CPUs map:
		/sys/class/net/<dev>/queues/tx-<n>/xps_cpus

		For selection based on receive-queues map:
		/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
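
As a rough sketch of how a 1:1 mapping could be prepared from a userspace tool, the helpers below build the sysfs path shown above and the hexadecimal mask for a single receive queue. The function names are hypothetical, any device name passed in (such as "eth0") is only an example, and the mask helper is deliberately limited to queue indices below 32 so that a single hex word suffices.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the xps_rxqs path for transmit queue txq of device dev.
 * Returns 0 on success, -1 if the buffer is too small. */
int xps_rxqs_path(const char *dev, unsigned int txq, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/queues/tx-%u/xps_rxqs",
			 dev, txq);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Format a hex mask with only bit rxq set, i.e. a mapping from one
 * receive queue. Simplification: only indices 0..31 are handled. */
int single_queue_mask(unsigned int rxq, char *buf, size_t len)
{
	int n;

	if (rxq >= 32)
		return -1;
	n = snprintf(buf, len, "%x", 1u << rxq);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Writing the resulting mask string to the resulting path (for example with fopen() and fputs(), as root) would map that receive queue to the chosen transmit queue; the sketch stops short of the write so it has no side effects.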

		== Suggested Configuration

		For a network device with a single transmission queue, XPS configuration
		@@ -421,6 +455,11 @@ best CPUs to share a given queue are probably those that share the cache
		with the CPU that processes transmit completions for that queue
		(transmit interrupts).

		For transmit queue selection based on receive queue(s), XPS has to be
		explicitly configured by mapping receive queue(s) to transmit queue(s). If
		the user configuration for the receive-queue map does not apply, then the
		transmit queue is selected based on the CPUs map.

		Per TX Queue rate limitation:
		=============================

include/linux/cpumask.h

+8 −3

		@@ -115,12 +115,17 @@ extern struct cpumask __cpu_active_mask;
		#define cpu_active(cpu) ((cpu) == 0)
		#endif

		/* verify cpu argument to cpumask_* operators */
		static inline unsigned int cpumask_check(unsigned int cpu)
		static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits)
		{
		#ifdef CONFIG_DEBUG_PER_CPU_MAPS
		WARN_ON_ONCE(cpu >= nr_cpumask_bits);
		WARN_ON_ONCE(cpu >= bits);
		#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
		}

		/* verify cpu argument to cpumask_* operators */
		static inline unsigned int cpumask_check(unsigned int cpu)
		{
		cpu_max_bits_warn(cpu, nr_cpumask_bits);
		return cpu;
		}

include/linux/netdevice.h

+95 −3

		@@ -731,10 +731,15 @@ struct xps_map {
		*/
		struct xps_dev_maps {
		struct rcu_head rcu;
		struct xps_map __rcu *cpu_map[0];
		struct xps_map __rcu *attr_map[0]; /* Either CPUs map or RXQs map */
		};
		#define XPS_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) + \

		#define XPS_CPU_DEV_MAPS_SIZE(_tcs) (sizeof(struct xps_dev_maps) + \
		(nr_cpu_ids * (_tcs) * sizeof(struct xps_map *)))

		#define XPS_RXQ_DEV_MAPS_SIZE(_tcs, _rxqs) (sizeof(struct xps_dev_maps) +\
		(_rxqs * (_tcs) * sizeof(struct xps_map *)))

		#endif /* CONFIG_XPS */

		#define TC_MAX_QUEUE 16
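
To make the size macros concrete, the following userspace sketch mirrors XPS_RXQ_DEV_MAPS_SIZE with a function and shows one plausible way to index the flat attr_map array. The _sketch types and both function names are hypothetical, and the (attribute index * number of traffic classes + traffic class) layout is an assumption inferred from the size formula, not something this diff states.

```c
#include <stddef.h>

struct xps_map_sketch;	/* opaque stand-in for struct xps_map */

/* Hypothetical mirror of struct xps_dev_maps: a header followed by one
 * map pointer per (CPU or receive queue, traffic class) pair. */
struct xps_dev_maps_sketch {
	void *rcu_placeholder[2];		/* stands in for struct rcu_head */
	struct xps_map_sketch *attr_map[];	/* CPUs map or RXQs map */
};

/* Allocation size for a receive-queue map, following the macro's formula. */
size_t rxq_dev_maps_size(unsigned int num_tc, unsigned int nr_rxqs)
{
	return sizeof(struct xps_dev_maps_sketch) +
	       (size_t)nr_rxqs * num_tc * sizeof(struct xps_map_sketch *);
}

/* Assumed flat index of the entry for attribute attr (a CPU or receive
 * queue) in traffic class tc. */
size_t attr_map_index(unsigned int attr, unsigned int num_tc, unsigned int tc)
{
	return (size_t)attr * num_tc + tc;
}
```

The same function covers the CPU case by passing the number of possible CPUs in place of nr_rxqs, which is what lets one structure serve both kinds of map.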
		@@ -1910,7 +1915,8 @@ struct net_device {
		int watchdog_timeo;

		#ifdef CONFIG_XPS
		struct xps_dev_maps __rcu *xps_maps;
		struct xps_dev_maps __rcu *xps_cpus_map;
		struct xps_dev_maps __rcu *xps_rxqs_map;
		#endif
		#ifdef CONFIG_NET_CLS_ACT
		struct mini_Qdisc __rcu *miniq_egress;
		@@ -3259,6 +3265,92 @@ static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
		#ifdef CONFIG_XPS
		int netif_set_xps_queue(struct net_device *dev, const struct cpumask *mask,
		u16 index);
		int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
		u16 index, bool is_rxqs_map);

		/**
		* netif_attr_test_mask - Test a CPU or Rx queue set in a mask
		* @j: CPU/Rx queue index
		* @mask: bitmask of all cpus/rx queues
		* @nr_bits: number of bits in the bitmask
		*
		* Test if a CPU or Rx queue index is set in a mask of all CPU/Rx queues.
		*/
		static inline bool netif_attr_test_mask(unsigned long j,
		const unsigned long *mask,
		unsigned int nr_bits)
		{
		cpu_max_bits_warn(j, nr_bits);
		return test_bit(j, mask);
		}

		/**
		* netif_attr_test_online - Test for online CPU/Rx queue
		* @j: CPU/Rx queue index
		* @online_mask: bitmask for CPUs/Rx queues that are online
		* @nr_bits: number of bits in the bitmask
		*
		* Returns true if a CPU/Rx queue is online.
		*/
		static inline bool netif_attr_test_online(unsigned long j,
		const unsigned long *online_mask,
		unsigned int nr_bits)
		{
		cpu_max_bits_warn(j, nr_bits);

		if (online_mask)
		return test_bit(j, online_mask);

		return (j < nr_bits);
		}

		/**
		* netif_attrmask_next - get the next CPU/Rx queue in a cpu/Rx queues mask
		* @n: CPU/Rx queue index
		* @srcp: the cpumask/Rx queue mask pointer
		* @nr_bits: number of bits in the bitmask
		*
		* Returns >= nr_bits if no further CPUs/Rx queues set.
		*/
		static inline unsigned int netif_attrmask_next(int n, const unsigned long *srcp,
		unsigned int nr_bits)
		{
		/* -1 is a legal arg here. */
		if (n != -1)
		cpu_max_bits_warn(n, nr_bits);

		if (srcp)
		return find_next_bit(srcp, nr_bits, n + 1);

		return n + 1;
		}

		/**
		* netif_attrmask_next_and - get the next CPU/Rx queue in src1p & src2p
		* @n: CPU/Rx queue index
		* @src1p: the first CPUs/Rx queues mask pointer
		* @src2p: the second CPUs/Rx queues mask pointer
		* @nr_bits: number of bits in the bitmask
		*
		* Returns >= nr_bits if no further CPUs/Rx queues set in both.
		*/
		static inline int netif_attrmask_next_and(int n, const unsigned long *src1p,
		const unsigned long *src2p,
		unsigned int nr_bits)
		{
		/* -1 is a legal arg here. */
		if (n != -1)
		cpu_max_bits_warn(n, nr_bits);

		if (src1p && src2p)
		return find_next_and_bit(src1p, src2p, nr_bits, n + 1);
		else if (src1p)
		return find_next_bit(src1p, nr_bits, n + 1);
		else if (src2p)
		return find_next_bit(src2p, nr_bits, n + 1);

		return n + 1;
		}
		#else
		static inline int netif_set_xps_queue(struct net_device *dev,
		const struct cpumask *mask,
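
The iteration contract of netif_attrmask_next() can be tried out in userspace with the sketch below. It is a simplified model: find_next_bit_sketch() is a naive stand-in for the kernel's find_next_bit(), the debug warning is omitted, and all names ending in _sketch, as well as count_visited(), are hypothetical. The point it illustrates is that a NULL mask makes the loop visit every index below nr_bits, while a real mask visits only the set bits.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Naive stand-in for find_next_bit(): first set bit at or after start,
 * or nr_bits if there is none. */
unsigned int find_next_bit_sketch(const unsigned long *mask,
				  unsigned int nr_bits, unsigned int start)
{
	unsigned int i;

	for (i = start; i < nr_bits; i++)
		if (mask[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			return i;
	return nr_bits;
}

/* Model of netif_attrmask_next(): n == -1 starts the walk. */
unsigned int attrmask_next_sketch(int n, const unsigned long *srcp,
				  unsigned int nr_bits)
{
	if (srcp)
		return find_next_bit_sketch(srcp, nr_bits, (unsigned int)(n + 1));
	return (unsigned int)(n + 1);
}

/* Count how many indices the usual loop visits. */
unsigned int count_visited(const unsigned long *srcp, unsigned int nr_bits)
{
	unsigned int count = 0;
	unsigned int j;

	for (j = attrmask_next_sketch(-1, srcp, nr_bits); j < nr_bits;
	     j = attrmask_next_sketch((int)j, srcp, nr_bits))
		count++;
	return count;
}
```

The loop terminates because the helper returns a value greater than or equal to nr_bits once no further entries are set, matching the return contract in the kernel-doc comment.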

include/net/busy_poll.h

+1 −0

		@@ -151,6 +151,7 @@ static inline void sk_mark_napi_id(struct sock *sk, const struct sk_buff *skb)
		#ifdef CONFIG_NET_RX_BUSY_POLL
		sk->sk_napi_id = skb->napi_id;
		#endif
		sk_rx_queue_set(sk, skb);
		}

		/* variant used for unconnected sockets */