Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (c4cde580) · Commits · e / devices / android_kernel_fairphone_FP5

Documentation/bpf/index.rst

+1 −0

Original line number	Diff line number	Diff line
		@@ -42,6 +42,7 @@ Program types
		.. toctree::
		:maxdepth: 1

		+ prog_cgroup_sockopt
		prog_cgroup_sysctl
		prog_flow_dissector

Documentation/bpf/prog_cgroup_sockopt.rst

0 → 100644

+93 −0

		.. SPDX-License-Identifier: GPL-2.0

		============================
		BPF_PROG_TYPE_CGROUP_SOCKOPT
		============================

		The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
		cgroup hooks:

		* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
		``getsockopt`` system call.
		* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
		``setsockopt`` system call.

		The context (``struct bpf_sockopt``) has associated socket (``sk``) and
		all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

		BPF_CGROUP_SETSOCKOPT
		=====================

		``BPF_CGROUP_SETSOCKOPT`` is triggered before the kernel handling of
		sockopt and it has writable context: it can modify the supplied arguments
		before passing them down to the kernel. This hook has access to the cgroup
		and socket local storage.

		If a BPF program sets ``optlen`` to -1, control is returned to
		userspace after all other BPF programs in the cgroup chain finish
		(i.e. the kernel ``setsockopt`` handling is not executed).

		Note that ``optlen`` cannot be increased beyond the user-supplied
		value. It can only be decreased or set to -1. Any other value will
		trigger ``EFAULT``.

		Return Type
		-----------

		* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
		* ``1`` - success, continue with next BPF program in the cgroup chain.
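
The ``optlen`` and return-value rules above can be summarized in a small userspace model. This is only an illustrative sketch, not kernel code: ``model_setsockopt_result`` and the ``MODEL_*`` constants are hypothetical names introduced here, the sketch covers a single program rather than a chain, and treating an ``optlen`` of 0 as a valid "decrease" is an assumption.

```c
#include <errno.h>

/* Hypothetical outcomes of the hook; these names are illustrative only. */
enum model_action {
	MODEL_CALL_KERNEL = 0,	/* go on to the kernel's setsockopt handling */
	MODEL_SKIP_KERNEL = 1,	/* optlen == -1: return to userspace instead */
};

/*
 * Userspace model of the BPF_CGROUP_SETSOCKOPT rules described above.
 * This is NOT the kernel implementation.
 *
 * prog_ret:    value returned by the BPF program (0 or 1)
 * user_optlen: optlen supplied by userspace
 * bpf_optlen:  optlen left in the context by the BPF program
 *
 * Returns a negative errno on failure, otherwise a model_action.
 */
static int model_setsockopt_result(int prog_ret, int user_optlen, int bpf_optlen)
{
	if (prog_ret == 0)
		return -EPERM;		/* the program rejected the syscall */
	if (bpf_optlen == -1)
		return MODEL_SKIP_KERNEL;
	if (bpf_optlen < 0 || bpf_optlen > user_optlen)
		return -EFAULT;		/* neither decreased nor set to -1 */
	return MODEL_CALL_KERNEL;
}
```

For example, with a user-supplied ``optlen`` of 4, a program that leaves 8 in the context yields ``-EFAULT`` in this model, while one that sets -1 skips the kernel handler.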

		BPF_CGROUP_GETSOCKOPT
		=====================

		``BPF_CGROUP_GETSOCKOPT`` is triggered after the kernel handling of
		sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
		if it's interested in whatever the kernel has returned. The BPF hook can
		override the values above, adjust ``optlen`` and reset ``retval`` to 0. If
		``optlen`` has been increased above the initial ``getsockopt`` value (i.e.
		the userspace buffer is too small), ``EFAULT`` is returned.

		This hook has access to the cgroup and socket local storage.

		Note that the only acceptable values for ``retval`` are 0 and the
		original value that the kernel returned. Any other value will trigger
		``EFAULT``.

		Return Type
		-----------

		* ``0`` - reject the syscall, ``EPERM`` will be returned to the userspace.
		* ``1`` - success: copy ``optval`` and ``optlen`` to userspace, return
		``retval`` from the syscall (note that this can be overwritten by
		the BPF program from the parent cgroup).
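
The ``BPF_CGROUP_GETSOCKOPT`` rules above can be sketched the same way. This is a userspace model under stated assumptions, not kernel code: ``model_getsockopt_result`` is a hypothetical name, only a single program is modelled, and the order in which the checks are applied is an assumption of the sketch.

```c
#include <errno.h>

/*
 * Userspace model of the BPF_CGROUP_GETSOCKOPT rules described above.
 * This is NOT the kernel implementation; all names are hypothetical.
 *
 * prog_ret:      value returned by the BPF program (0 or 1)
 * kernel_retval: what the kernel's getsockopt handling returned
 * bpf_retval:    retval left in the context by the BPF program
 * user_optlen:   size of the buffer that userspace passed in
 * bpf_optlen:    optlen left in the context by the BPF program
 *
 * Returns the value the syscall would return in this model.
 */
static int model_getsockopt_result(int prog_ret, int kernel_retval,
				   int bpf_retval, int user_optlen,
				   int bpf_optlen)
{
	if (prog_ret == 0)
		return -EPERM;		/* the program rejected the syscall */
	if (bpf_optlen > user_optlen)
		return -EFAULT;		/* userspace buffer is too small */
	if (bpf_retval != 0 && bpf_retval != kernel_retval)
		return -EFAULT;		/* only 0 or the original value is allowed */
	return bpf_retval;
}
```

In this model, a program can turn a kernel failure such as ``-ENOPROTOOPT`` into success by resetting ``retval`` to 0, but it cannot replace it with a different error.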

		Cgroup Inheritance
		==================

		Suppose there is the following cgroup hierarchy, where each cgroup
		has a ``BPF_CGROUP_GETSOCKOPT`` program attached with the
		``BPF_F_ALLOW_MULTI`` flag::

		  A (root, parent)
		   \
		    B (child)

		When the application calls ``getsockopt`` syscall from the cgroup B,
		the programs are executed from the bottom up: B, A. First program
		(B) sees the result of kernel's ``getsockopt``. It can optionally
		adjust ``optval``, ``optlen`` and reset ``retval`` to 0. After that
		control will be passed to the second (A) program which will see the
		same context as B including any potential modifications.

		Same for ``BPF_CGROUP_SETSOCKOPT``: if the program is attached to
		A and B, the trigger order is B, then A. If B does any changes
		to the input arguments (``level``, ``optname``, ``optval``, ``optlen``),
		then the next program in the chain (A) will see those changes,
		not the original input ``setsockopt`` arguments. The potentially
		modified values will be then passed down to the kernel.
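
The bottom-up ordering and the shared context can be illustrated with a small userspace model. Everything here is hypothetical and for illustration only: ``struct model_ctx`` is a simplified stand-in rather than ``struct bpf_sockopt``, the two example programs are invented, and running every program and rejecting if any of them returned 0 is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the sockopt context. */
struct model_ctx {
	int level;
	int optname;
	int optlen;
};

/* A "program" in this model: returns 1 to allow, 0 to reject. */
typedef int (*model_prog_fn)(struct model_ctx *ctx);

/*
 * Run the programs in the order given (child first, then parent).
 * All programs share one context, so a later program observes the
 * modifications made by an earlier one. Returns 0 or -EPERM.
 */
static int model_run_chain(model_prog_fn *progs, size_t n,
			   struct model_ctx *ctx)
{
	int allow = 1;

	for (size_t i = 0; i < n; i++)
		allow &= progs[i](ctx);

	return allow ? 0 : -EPERM;
}

/* Example child program (B): shrinks optlen to 4. */
static int model_prog_b(struct model_ctx *ctx)
{
	ctx->optlen = 4;
	return 1;
}

/* Example parent program (A): allows only if it sees B's change. */
static int model_prog_a(struct model_ctx *ctx)
{
	return ctx->optlen == 4;
}
```

Running the chain as B, then A succeeds because A sees the ``optlen`` that B wrote; running A alone on an unmodified context is rejected with ``-EPERM`` in this model.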

		Example
		=======

		See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
		of BPF program that handles socket options.

Documentation/networking/af_xdp.rst

+15 −1

		@@ -220,7 +220,21 @@ Usage
		In order to use AF_XDP sockets there are two parts needed. The
		user-space application and the XDP program. For a complete setup and
		usage example, please refer to the sample application. The user-space
		- side is xdpsock_user.c and the XDP side xdpsock_kern.c.
		+ side is xdpsock_user.c and the XDP side is part of libbpf.
		+
		+ The XDP code sample included in tools/lib/bpf/xsk.c is the following::
		+
		+ SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
		+ {
		+ int index = ctx->rx_queue_index;
		+
		+ // A set entry here means that the corresponding queue_id
		+ // has an active AF_XDP socket bound to it.
		+ if (bpf_map_lookup_elem(&xsks_map, &index))
		+ return bpf_redirect_map(&xsks_map, index, 0);
		+
		+ return XDP_PASS;
		+ }

		Naive ring dequeue and enqueue could look like this::

drivers/net/ethernet/intel/i40e/i40e_xsk.c

+7 −5

		@@ -641,8 +641,8 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
		struct i40e_tx_desc *tx_desc = NULL;
		struct i40e_tx_buffer *tx_bi;
		bool work_done = true;
		+ struct xdp_desc desc;
		dma_addr_t dma;
		- u32 len;

		while (budget-- > 0) {
		if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
		@@ -651,21 +651,23 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
		break;
		}

		- if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
		+ if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
		break;

		- dma_sync_single_for_device(xdp_ring->dev, dma, len,
		+ dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
		+
		+ dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
		DMA_BIDIRECTIONAL);

		tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
		- tx_bi->bytecount = len;
		+ tx_bi->bytecount = desc.len;

		tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
		tx_desc->buffer_addr = cpu_to_le64(dma);
		tx_desc->cmd_type_offset_bsz =
		build_ctob(I40E_TX_DESC_CMD_ICRC
		| I40E_TX_DESC_CMD_EOP,
		- 0, len, 0);
		+ 0, desc.len, 0);

		xdp_ring->next_to_use++;
		if (xdp_ring->next_to_use == xdp_ring->count)

drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

+9 −6

		@@ -571,8 +571,9 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
		union ixgbe_adv_tx_desc *tx_desc = NULL;
		struct ixgbe_tx_buffer *tx_bi;
		bool work_done = true;
		- u32 len, cmd_type;
		+ struct xdp_desc desc;
		dma_addr_t dma;
		+ u32 cmd_type;

		while (budget-- > 0) {
		if (unlikely(!ixgbe_desc_unused(xdp_ring)) ||
		@@ -581,14 +582,16 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
		break;
		}

		- if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
		+ if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &desc))
		break;

		- dma_sync_single_for_device(xdp_ring->dev, dma, len,
		+ dma = xdp_umem_get_dma(xdp_ring->xsk_umem, desc.addr);
		+
		+ dma_sync_single_for_device(xdp_ring->dev, dma, desc.len,
		DMA_BIDIRECTIONAL);

		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
		- tx_bi->bytecount = len;
		+ tx_bi->bytecount = desc.len;
		tx_bi->xdpf = NULL;
		tx_bi->gso_segs = 1;

		@@ -599,10 +602,10 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
		cmd_type = IXGBE_ADVTXD_DTYP_DATA |
		IXGBE_ADVTXD_DCMD_DEXT |
		IXGBE_ADVTXD_DCMD_IFCS;
		- cmd_type |= len | IXGBE_TXD_CMD;
		+ cmd_type |= desc.len | IXGBE_TXD_CMD;
		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
		tx_desc->read.olinfo_status =
		- cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
		+ cpu_to_le32(desc.len << IXGBE_ADVTXD_PAYLEN_SHIFT);

		xdp_ring->next_to_use++;
		if (xdp_ring->next_to_use == xdp_ring->count)
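
Both driver hunks follow the same pattern: the consume call now hands back a whole descriptor (``addr`` and ``len``), and the driver translates ``desc.addr`` into a DMA address in a separate step. The following userspace sketch models that flow only. All ``model_*`` names are hypothetical, and the single contiguous DMA mapping is a simplifying assumption of the sketch, not a statement about how ``xdp_umem_get_dma`` works.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TX descriptor (address and length). */
struct model_desc {
	uint64_t addr;	/* offset of the frame inside the UMEM */
	uint32_t len;	/* frame length in bytes */
};

/* Hypothetical stand-in for the UMEM with a queue of pending TX work. */
struct model_umem {
	uint64_t dma_base;		/* assumed: one contiguous mapping */
	const struct model_desc *tx;	/* pending TX descriptors */
	size_t tx_count;
	size_t tx_next;
};

/* Models the descriptor-based consume call: fills *desc, or returns false. */
static bool model_consume_tx(struct model_umem *umem, struct model_desc *desc)
{
	if (umem->tx_next == umem->tx_count)
		return false;		/* nothing left to transmit */
	*desc = umem->tx[umem->tx_next++];
	return true;
}

/* Models the separate address translation step done by the driver. */
static uint64_t model_get_dma(const struct model_umem *umem, uint64_t addr)
{
	return umem->dma_base + addr;
}
```

A driver-style loop in this model would call ``model_consume_tx`` until it returns false, derive the DMA address with ``model_get_dma``, and use ``desc.len`` wherever the old code used a separate ``len`` variable.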