
Commit e9ec8045 authored by David S. Miller

Merge branch 'Modify-action-API-for-implementing-lockless-actions'



Vlad Buslov says:

====================
Modify action API for implementing lockless actions

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected by a single global rtnl lock, which removes any
possibility of parallelism. This patch set is a first step toward
removing the rtnl lock dependency from the TC rules update path.

Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was
added. Handlers registered with this flag are called without RTNL taken.
The end goal is to have rule update handlers (RTM_NEWTFILTER,
RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow
parallel execution. However, there is no intention to completely remove
or split the rtnl lock itself. This patch set addresses specific
problems in the action API that prevent it from being executed
concurrently. It does not completely unlock the rules or actions update
path. Additional patch sets are required to refactor individual actions
and filters update for parallel execution.

In preparation for executing TC rules update handlers without the rtnl
lock, the action API code was audited to determine areas that assume
external synchronization with the rtnl lock and must be changed to allow
safe concurrent access, with the following results:

1. Action idr is already protected with a spinlock. However, some code
   paths assume that idr state does not change between several
   consecutive tcf_idr_* function calls.
2. tc_action reference and bind counters are implemented as plain
   integers. Their purpose was to allow a single action to be shared
   between multiple filters, not to provide a means for concurrent
   modification.
3. tc_action 'cookie' pointer field is not protected against
   modification.
4. Action API functions that work with a set of actions use an intrusive
   linked list, which cannot be used concurrently without additional
   synchronization.
5. Action API functions don't take a reference to actions while using
   them, assuming external synchronization with the rtnl lock.

The following solutions to these problems are implemented:

1. To remove the assumption that idr state doesn't change between
   tcf_idr_* calls, implement new functions that atomically perform
   several operations on the idr without releasing the idr spinlock
   (a function to atomically look up and delete an action by index, a
   function to atomically check whether an action exists and allocate a
   new one if necessary, etc.).
2. Use atomic operations on counters to make them suitable for
   concurrent get/put operations.
3. Data that 'cookie' points to is never modified, so it is enough to
   refactor it to an rcu pointer to prevent concurrent de-allocation.
4. Action API doesn't actually use any linked-list-specific operations
   on the actions intrusive linked list, so it can be refactored to an
   array in a straightforward manner.
5. Always take a reference to an action while accessing it in the action
   API. The tcf_idr_search function is modified to take a reference to
   the action before returning it, so there is no way to look up an
   action without incrementing its reference counter. All users of this
   function are modified to release the reference after they are done
   using the action. With all users using reference counting, it is now
   safe to concurrently delete actions.
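
Solutions 1, 2 and 5 can be illustrated with a minimal userspace sketch.
This is not the kernel code: struct action, action_search,
action_check_alloc, action_put and MAX_ACTIONS are hypothetical names, a
pthread mutex stands in for the idr spinlock, and C11 atomics stand in
for refcount_t. The real tcf_idr_check_alloc also splits reservation and
creation into separate steps, which this sketch collapses into one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_ACTIONS 16	/* hypothetical table size */

/* Hypothetical stand-in for struct tc_action; not the kernel type. */
struct action {
	unsigned int index;
	atomic_int refcnt;
};

/* One lock guards the table, playing the role of the idr spinlock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct action *table[MAX_ACTIONS];

/* Look up an action and take a reference before the lock is dropped. */
struct action *action_search(unsigned int index)
{
	struct action *a;

	if (index >= MAX_ACTIONS)
		return NULL;

	pthread_mutex_lock(&table_lock);
	a = table[index];
	if (a)
		atomic_fetch_add(&a->refcnt, 1);
	pthread_mutex_unlock(&table_lock);

	return a;
}

/*
 * Check and allocate in a single critical section.
 * Returns 1 if the action already existed (a reference is taken),
 * 0 if a new action was inserted, and -1 on error.
 */
int action_check_alloc(unsigned int index, struct action **out)
{
	struct action *a;
	int ret = 1;

	*out = NULL;
	if (index >= MAX_ACTIONS)
		return -1;

	pthread_mutex_lock(&table_lock);
	a = table[index];
	if (a) {
		atomic_fetch_add(&a->refcnt, 1);
	} else {
		a = calloc(1, sizeof(*a));
		if (a) {
			a->index = index;
			atomic_init(&a->refcnt, 1);
			table[index] = a;
			ret = 0;
		} else {
			ret = -1;
		}
	}
	pthread_mutex_unlock(&table_lock);

	*out = a;
	return ret;
}

/* Drop one reference; the final put unlinks and frees. Returns 1 if freed. */
int action_put(struct action *a)
{
	int last;

	pthread_mutex_lock(&table_lock);
	assert(atomic_load(&a->refcnt) > 0);
	last = atomic_fetch_sub(&a->refcnt, 1) == 1;
	if (last)
		table[a->index] = NULL;
	pthread_mutex_unlock(&table_lock);

	if (last)
		free(a);
	return last;
}
```

Because every lookup returns a referenced object, a concurrent
action_put from another thread cannot free an action that a caller is
still using; the caller's own action_put is what allows the final free.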

Additionally, the actions init function signature was expanded with an
'rtnl_held' argument, which allows actions that have an internal
dependency on the rtnl lock to take/release it when necessary.
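
A rough userspace sketch of how such a flag might be consumed follows.
The names big_lock, protected_state and example_action_init are
hypothetical, and a pthread mutex stands in for the rtnl lock; the
actual callers and actions in this series may use the flag differently
(for example, to drop the lock temporarily rather than to take it).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global rtnl lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* State that must only be touched with big_lock held. */
static int protected_state;

/*
 * Init routine with an internal dependency on big_lock. The caller
 * says whether it already holds the lock, so the routine takes and
 * releases it only when needed and never locks recursively.
 */
int example_action_init(int value, bool rtnl_held)
{
	if (!rtnl_held)
		pthread_mutex_lock(&big_lock);

	protected_state = value;

	if (!rtnl_held)
		pthread_mutex_unlock(&big_lock);

	return 0;
}
```

Passing the lock state explicitly keeps the same init routine usable
from both locked and unlocked handlers.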

Since the only shared state in the action API module is the actions
themselves and the action idr, these changes are sufficient to no longer
rely on the global rtnl lock for protection of internal action API data
structures.

Changes from V5 to V6:
- Rebase on current net-next
- When action is deleted, set pointer in actions array to NULL to
  prevent double freeing.
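
The double-free hazard behind this change can be sketched in userspace
C. All names here (struct action, MAX_PRIO, actions_delete_one,
actions_destroy, freed_count) are hypothetical, and the loop shape is an
assumption rather than a copy of the kernel's cleanup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PRIO 4	/* hypothetical array size for this sketch */

/* Hypothetical stand-in for struct tc_action. */
struct action {
	int index;
};

/* Counts frees so the single-free property can be checked. */
static int freed_count;

static void action_free(struct action *a)
{
	free(a);
	freed_count++;
}

/* Delete one entry and clear its slot so later cleanup skips it. */
void actions_delete_one(struct action *actions[], size_t i)
{
	action_free(actions[i]);
	actions[i] = NULL;
}

/* Release whatever is still present; NULL slots were already deleted. */
void actions_destroy(struct action *actions[])
{
	for (size_t i = 0; i < MAX_PRIO; i++) {
		if (!actions[i])
			continue;
		action_free(actions[i]);
		actions[i] = NULL;
	}
}
```

Without the NULL assignment in actions_delete_one, an error path that
calls actions_destroy afterwards would free the same pointer twice.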

Changes from V4 to V5:
- Change action delete API to track actions that were deleted, to
  prevent releasing them on error.

Changes from V3 to V4:
- Expand cover letter.
- Reduce actions array size in tcf_action_init_1.
- Rebase on latest net-next.

Changes from V2 to V3:
- Re-send with changelog copied to individual patches.

Changes from V1 to V2:
- Removed redundant actions ops lookup during delete.
- Merge action ops delete definition and implementation.
- Assume all actions have delete implemented and don't check for it
  explicitly.
- Resplit action lookup/release code to prevent memory leaks in
  individual patches.
- Make __tcf_idr_check function static.
- Remove unique idr insertion function. Change original idr insert to do
  the same thing.
- Merge changes that take reference to action when performing lookup and
  changes that account for this additional reference when dumping action
  to user space into single patch.
- Change convoluted commit message.
- Rename "unlocked" to "rtnl_held" for clarity.
- Remove estimator lock add patch.
- Refactor action check-alloc code into standalone function.
- Rename tcf_idr_find_delete to tcf_idr_delete_index.
- Rearrange variable definitions in tc_action_delete.
- Add patch that refactors action API code to use array of pointers to
  actions instead of intrusive linked list.
- Expand cover letter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
parents b2335040 90b73b77
+17 −8
@@ -6,6 +6,7 @@
  * Public action API for classifiers/qdiscs
 */
 
+#include <linux/refcount.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
 #include <net/net_namespace.h>
@@ -26,8 +27,8 @@ struct tc_action {
 	struct tcf_idrinfo		*idrinfo;
 
 	u32				tcfa_index;
-	int				tcfa_refcnt;
-	int				tcfa_bindcnt;
+	refcount_t			tcfa_refcnt;
+	atomic_t			tcfa_bindcnt;
 	u32				tcfa_capab;
 	int				tcfa_action;
 	struct tcf_t			tcfa_tm;
@@ -37,7 +38,7 @@ struct tc_action {
 	spinlock_t			tcfa_lock;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue __percpu *cpu_qstats;
-	struct tc_cookie	*act_cookie;
+	struct tc_cookie	__rcu *act_cookie;
 	struct tcf_chain	*goto_chain;
 };
 #define tcf_index	common.tcfa_index
@@ -91,7 +92,8 @@ struct tc_action_ops {
 			  struct netlink_ext_ack *extack);
 	int     (*init)(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **act, int ovr,
-			int bind, struct netlink_ext_ack *extack);
+			int bind, bool rtnl_held,
+			struct netlink_ext_ack *extack);
 	int     (*walk)(struct net *, struct sk_buff *,
 			struct netlink_callback *, int,
 			const struct tc_action_ops *,
@@ -99,6 +101,7 @@ struct tc_action_ops {
 	void	(*stats_update)(struct tc_action *, u64, u32, u64);
 	size_t  (*get_fill_size)(const struct tc_action *act);
 	struct net_device *(*get_dev)(const struct tc_action *a);
+	int     (*delete)(struct net *net, u32 index);
 };
 
 struct tc_action_net {
@@ -151,6 +154,10 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		   int bind, bool cpustats);
 void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
 
+void tcf_idr_cleanup(struct tc_action_net *tn, u32 index);
+int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
+			struct tc_action **a, int bind);
+int tcf_idr_delete_index(struct tc_action_net *tn, u32 index);
 int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
 
 static inline int tcf_idr_release(struct tc_action *a, bool bind)
@@ -161,18 +168,20 @@ static inline int tcf_idr_release(struct tc_action *a, bool bind)
 int tcf_register_action(struct tc_action_ops *a, struct pernet_operations *ops);
 int tcf_unregister_action(struct tc_action_ops *a,
 			  struct pernet_operations *ops);
-int tcf_action_destroy(struct list_head *actions, int bind);
+int tcf_action_destroy(struct tc_action *actions[], int bind);
 int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 		    int nr_actions, struct tcf_result *res);
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
-		    struct list_head *actions, size_t *attr_size,
-		    struct netlink_ext_ack *extack);
+		    struct tc_action *actions[], size_t *attr_size,
+		    bool rtnl_held, struct netlink_ext_ack *extack);
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
+				    bool rtnl_held,
 				    struct netlink_ext_ack *extack);
-int tcf_action_dump(struct sk_buff *skb, struct list_head *, int, int);
+int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind,
+		    int ref);
 int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int);
 int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int, int);
 int tcf_action_copy_stats(struct sk_buff *, struct tc_action *, int);
+1 −0
@@ -781,6 +781,7 @@ struct tc_mqprio_qopt_offload {
 struct tc_cookie {
 	u8  *data;
 	u32 len;
+	struct rcu_head rcu;
 };
 
 struct tc_qopt_offload_stats {
+290 −125

File changed.

Preview size limit exceeded, changes collapsed.

+24 −10
@@ -141,8 +141,8 @@ static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *act,
 	struct tcf_bpf *prog = to_bpf(act);
 	struct tc_act_bpf opt = {
 		.index   = prog->tcf_index,
-		.refcnt  = prog->tcf_refcnt - ref,
-		.bindcnt = prog->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&prog->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&prog->tcf_bindcnt) - bind,
 		.action  = prog->tcf_action,
 	};
 	struct tcf_t tm;
@@ -276,7 +276,8 @@ static void tcf_bpf_prog_fill_cfg(const struct tcf_bpf *prog,
 
 static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **act,
-			int replace, int bind, struct netlink_ext_ack *extack)
+			int replace, int bind, bool rtnl_held,
+			struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, bpf_net_id);
 	struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
@@ -298,22 +299,28 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
 
-	if (!tcf_idr_check(tn, parm->index, act, bind)) {
+	ret = tcf_idr_check_alloc(tn, &parm->index, act, bind);
+	if (!ret) {
 		ret = tcf_idr_create(tn, parm->index, est, act,
 				     &act_bpf_ops, bind, true);
-		if (ret < 0)
+		if (ret < 0) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		res = ACT_P_CREATED;
-	} else {
+	} else if (ret > 0) {
 		/* Don't override defaults. */
 		if (bind)
 			return 0;
 
-		tcf_idr_release(*act, bind);
-		if (!replace)
+		if (!replace) {
+			tcf_idr_release(*act, bind);
 			return -EEXIST;
+		}
+	} else {
+		return ret;
 	}
 
 	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
 	is_ebpf = tb[TCA_ACT_BPF_FD];
@@ -355,7 +362,6 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	return res;
 out:
-	if (res == ACT_P_CREATED)
-		tcf_idr_release(*act, bind);
+	tcf_idr_release(*act, bind);
 
 	return ret;
@@ -387,6 +393,13 @@ static int tcf_bpf_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_bpf_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, bpf_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_bpf_ops __read_mostly = {
 	.kind		=	"bpf",
 	.type		=	TCA_ACT_BPF,
@@ -397,6 +410,7 @@ static struct tc_action_ops act_bpf_ops __read_mostly = {
 	.init		=	tcf_bpf_init,
 	.walk		=	tcf_bpf_walker,
 	.lookup		=	tcf_bpf_search,
+	.delete		=	tcf_bpf_delete,
 	.size		=	sizeof(struct tcf_bpf),
 };
 

+21 −8
@@ -96,7 +96,7 @@ static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = {
 
 static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 			     struct nlattr *est, struct tc_action **a,
-			     int ovr, int bind,
+			     int ovr, int bind, bool rtnl_held,
 			     struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, connmark_net_id);
@@ -118,11 +118,14 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
 
-	if (!tcf_idr_check(tn, parm->index, a, bind)) {
+	ret = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!ret) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_connmark_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		ci = to_connmark(*a);
 		ci->tcf_action = parm->action;
@@ -131,16 +134,18 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 
 		tcf_idr_insert(tn, *a);
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (ret > 0) {
 		ci = to_connmark(*a);
 		if (bind)
 			return 0;
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 		/* replacing action and zone */
 		ci->tcf_action = parm->action;
 		ci->zone = parm->zone;
+		ret = 0;
 	}
 
 	return ret;
@@ -154,8 +159,8 @@ static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
 
 	struct tc_connmark opt = {
 		.index   = ci->tcf_index,
-		.refcnt  = ci->tcf_refcnt - ref,
-		.bindcnt = ci->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
 		.action  = ci->tcf_action,
 		.zone   = ci->zone,
 	};
@@ -193,6 +198,13 @@ static int tcf_connmark_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_connmark_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, connmark_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_connmark_ops = {
 	.kind		=	"connmark",
 	.type		=	TCA_ACT_CONNMARK,
@@ -202,6 +214,7 @@ static struct tc_action_ops act_connmark_ops = {
 	.init		=	tcf_connmark_init,
 	.walk		=	tcf_connmark_walker,
 	.lookup		=	tcf_connmark_search,
+	.delete		=	tcf_connmark_delete,
 	.size		=	sizeof(struct tcf_connmark_info),
 };
 