
Commit 9ab89acc authored by David S. Miller

Merge branch 'xen-netback-netfront-multiqueue'

Wei Liu says:

====================
This is a rebased version of Andrew's V8 patch series. The original cover letter:

--------------------
xen-net{back,front}: Multiple transmit and receive queues

This patch series implements multiple transmit and receive queues (i.e.
multiple shared rings) for the xen virtual network interfaces.

The series is split up as follows:
- Patch 1 brings the 'grant_copy_op' array back into struct xenvif, in
  preparation for multi-queue support. See the patch itself for more details.
- Patches 2 and 4 factor out the queue-specific data for netback and
  netfront respectively, and modify the rest of the code to use these
  as appropriate.
- Patches 3 and 5 introduce new XenStore keys to negotiate and use
  multiple shared rings and event channels, and code to connect these
  as appropriate.
- Patch 6 documents the XenStore keys required for the new feature
  in include/xen/interface/io/netif.h

All other transmit and receive processing remains unchanged, i.e. there
is a kthread per queue and a NAPI context per queue.

The performance of these patches has been analysed in detail, with
results available at:

http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

To summarise:
  * Using multiple queues allows a VM to transmit at line rate on a 10
    Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
    with a single queue.
  * For intra-host VM--VM traffic, eight queues provide 171% of the
    throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
  * There is a corresponding increase in total CPU usage, i.e. this is a
    scaling out over available resources, not an efficiency improvement.
  * Results depend on the availability of sufficient CPUs, as well as the
    distribution of interrupts and the distribution of TCP streams across
    the queues.

Queue selection is currently achieved via an L4 hash on the packet (i.e.
TCP src/dst port, IP src/dst address) and is not negotiated between the
frontend and backend, since only one option exists. Future patches to
support other frontends (particularly Windows) will need to add some
capability to negotiate not only the hash algorithm selection, but also
allow the frontend to specify some parameters to this.

Note that queue selection is a decision by the transmitting system about
which queue to use for a particular packet. In general, the algorithm
may differ between the frontend and the backend with no adverse effects.
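
As a rough illustration of the transmit-side selection described above, the
following user-space sketch maps an already computed L4 flow hash to a queue
index with a plain modulo, as in the "(hash % num_queues)" simplification
listed in the V7 changelog. The function name select_queue() is hypothetical
and the hash itself is assumed to be supplied by the caller; this is not the
code from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a function from the patches): map a
 * precomputed L4 flow hash to a queue index with a plain modulo.
 */
static unsigned int select_queue(uint32_t flow_hash, unsigned int num_queues)
{
	/* Assumption: the caller always has at least one queue. */
	assert(num_queues > 0);

	return flow_hash % num_queues;
}
```

Because the choice is made entirely by the transmitting side, the peer does
not need to compute the same hash for this to work.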

Queue-specific XenStore entries for ring references and event channels
are stored hierarchically, i.e. under .../queue-N/... where N varies
from 0 to one less than the requested number of queues (inclusive). If
only one queue is requested, it falls back to the flat structure where
the ring references and event channels are written at the same level as
other vif information.
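
The layout rule above can be sketched as a small path builder. This is a
user-space illustration only: queue_xs_path() is a hypothetical name, and
vif_path stands for whatever directory already holds the other vif
information.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write into buf the XenStore directory that holds
 * the ring references and event channels for queue `id`.
 * Returns 0 on success, -1 if `id` is out of range or buf is too small.
 */
static int queue_xs_path(char *buf, size_t len, const char *vif_path,
			 unsigned int id, unsigned int num_queues)
{
	int n;

	/* NULL arguments are a programming error in this sketch. */
	assert(buf != NULL && vif_path != NULL);

	if (id >= num_queues)
		return -1;

	if (num_queues == 1)
		/* Flat layout: keys sit next to the other vif entries. */
		n = snprintf(buf, len, "%s", vif_path);
	else
		/* Hierarchical layout: one queue-N directory per queue. */
		n = snprintf(buf, len, "%s/queue-%u", vif_path, id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With four queues, queue 2 resolves to "<vif_path>/queue-2"; with a single
queue the function returns vif_path unchanged.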

V8:
- Squash the queue error handling code into patch 3.
- Update the documentation (patch 6) according to comments on the
  equivalent patch to Xen.

V7:
- Rebase on latest net-next, which includes the netback grant mapping
  patch series from Zoltan Kiss
- Reduce QUEUE_NAME_SIZE by 1 to avoid double-counting the trailing '\0'
- Simplify the queue hashing by using (hash % num_queues) instead of
  multiply & shift.
- Add ratelimited warning for invalid queue selection.
- Fix error handling to correctly tear down already setup queues.
- Use dev->real_num_tx_queues instead of separately maintaining a
  count of the number of queues.

V6:
- Use 'max_queues' as the module param. name for both netback and netfront.

V5:
- Fix bug in xenvif_free() that could lead to an attempt to transmit an
  skb after the queue structures had been freed.
- Improve the XenStore protocol documentation in netif.h.
- Fix IRQ_NAME_SIZE double-accounting for null terminator.
- Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
- Don't initialise a local variable that is set in both branches (xspath).

V4:
- Add MODULE_PARM_DESC() for the multi-queue parameters for netback
  and netfront modules.
- Move del_timer_sync() in netfront to after unregister_netdev, which
  restores the order in which these functions were called before applying
  these patches.

V3:
- Further indentation and style fixups.

V2:
- Rebase onto net-next.
- Change queue->number to queue->id.
- Add atomic operations around the small number of stats variables that
  are not queue-specific or per-cpu.
- Fixup formatting and style issues.
- XenStore protocol changes documented in netif.h.
- Default max. number of queues to num_online_cpus().
- Check requested number of queues does not exceed maximum.
--------------------

I rebased this on top of net-next. No functional change is introduced. The
patch that needed some extra care was "xen-netback: Factor queue-specific data
into queue struct" because it clashed with a fix introduced in net. A simple
test (creating a guest, running iperf, then shutting down the guest) worked
as expected.

The last patch fixes a minor problem where the queue name is not initialised
in xen-netfront, resulting in names like "-tx" and "-rx" in /proc/interrupts.
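
To show why an uninitialised queue name produces bare "-tx" and "-rx"
entries, here is a user-space sketch of how the names compose. It borrows
the QUEUE_NAME_SIZE and IRQ_NAME_SIZE definitions from the netback header in
this series; struct queue_names and set_queue_names() are hypothetical
stand-ins, and IFNAMSIZ is defined locally with its Linux value of 16.

```c
#include <assert.h>
#include <stdio.h>

#define IFNAMSIZ 16                          /* Linux interface name limit */
#define QUEUE_NAME_SIZE (IFNAMSIZ + 5)       /* devname + "-qNNN" */
#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)  /* queue name + "-tx"/"-rx" */

/* Hypothetical user-space stand-in for the per-queue name fields. */
struct queue_names {
	char name[QUEUE_NAME_SIZE];
	char tx_irq_name[IRQ_NAME_SIZE];
	char rx_irq_name[IRQ_NAME_SIZE];
};

static void set_queue_names(struct queue_names *q, const char *devname,
			    unsigned int id)
{
	/* The queue name must be filled in first ... */
	int n = snprintf(q->name, sizeof(q->name), "%s-q%u", devname, id);

	/* Sketch-level check that the queue name was not truncated. */
	assert(n > 0 && (size_t)n < sizeof(q->name));

	/* ... because the IRQ names are derived from it. */
	snprintf(q->tx_irq_name, sizeof(q->tx_irq_name), "%s-tx", q->name);
	snprintf(q->rx_irq_name, sizeof(q->rx_irq_name), "%s-rx", q->name);
}
```

For example, set_queue_names(&q, "vif1.0", 3) gives "vif1.0-q3",
"vif1.0-q3-tx" and "vif1.0-q3-rx". If the name field were left empty, the
same format strings would reduce to "-tx" and "-rx".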

Changes since v9 (no functional change introduced):
* include commit summary in the commit message of first patch
* fold David Vrabel's Reviewed-by into last patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
parents 9bcc14d2 8b715010
+72 −35
@@ -99,22 +99,43 @@ struct xenvif_rx_meta {
  */
 #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN
 
-struct xenvif {
-	/* Unique identifier for this interface. */
-	domid_t          domid;
-	unsigned int     handle;
+/* Queue name is interface name with "-qNNN" appended */
+#define QUEUE_NAME_SIZE (IFNAMSIZ + 5)
 
-	/* Is this interface disabled? True when backend discovers
-	 * frontend is rogue.
+/* IRQ name is queue name with "-tx" or "-rx" appended */
+#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
+
+struct xenvif;
+
+struct xenvif_stats {
+	/* Stats fields to be updated per-queue.
+	 * A subset of struct net_device_stats that contains only the
+	 * fields that are updated in netback.c for each queue.
 	 */
-	bool disabled;
+	unsigned int rx_bytes;
+	unsigned int rx_packets;
+	unsigned int tx_bytes;
+	unsigned int tx_packets;
+
+	/* Additional stats used by xenvif */
+	unsigned long rx_gso_checksum_fixup;
+	unsigned long tx_zerocopy_sent;
+	unsigned long tx_zerocopy_success;
+	unsigned long tx_zerocopy_fail;
+	unsigned long tx_frag_overflow;
+};
+
+struct xenvif_queue { /* Per-queue data for xenvif */
+	unsigned int id; /* Queue ID, 0-based */
+	char name[QUEUE_NAME_SIZE]; /* DEVNAME-qN */
+	struct xenvif *vif; /* Parent VIF */
 
 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int tx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char tx_irq_name[IFNAMSIZ+4]; /* DEVNAME-tx */
+	char tx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-tx */
 	struct xen_netif_tx_back_ring tx;
 	struct sk_buff_head tx_queue;
 	struct page *mmap_pages[MAX_PENDING_REQS];
@@ -150,7 +171,7 @@ struct xenvif {
 	/* When feature-split-event-channels = 0, tx_irq = rx_irq. */
 	unsigned int rx_irq;
 	/* Only used when feature-split-event-channels = 1 */
-	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
+	char rx_irq_name[IRQ_NAME_SIZE]; /* DEVNAME-qN-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
 	RING_IDX rx_last_skb_slots;
@@ -158,14 +179,29 @@ struct xenvif {
 
 	struct timer_list wake_queue;
 
-	/* This array is allocated seperately as it is large */
-	struct gnttab_copy *grant_copy_op;
+	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
 
 	/* We create one meta structure per ring request we consume, so
 	 * the maximum number is the same as the ring size.
 	 */
 	struct xenvif_rx_meta meta[XEN_NETIF_RX_RING_SIZE];
 
+	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
+	unsigned long   credit_bytes;
+	unsigned long   credit_usec;
+	unsigned long   remaining_credit;
+	struct timer_list credit_timeout;
+	u64 credit_window_start;
+
+	/* Statistics */
+	struct xenvif_stats stats;
+};
+
+struct xenvif {
+	/* Unique identifier for this interface. */
+	domid_t          domid;
+	unsigned int     handle;
+
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
@@ -179,19 +215,13 @@ struct xenvif {
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
 
-	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
-	unsigned long   credit_bytes;
-	unsigned long   credit_usec;
-	unsigned long   remaining_credit;
-	struct timer_list credit_timeout;
-	u64 credit_window_start;
+	/* Is this interface disabled? True when backend discovers
+	 * frontend is rogue.
+	 */
+	bool disabled;
 
-	/* Statistics */
-	unsigned long rx_gso_checksum_fixup;
-	unsigned long tx_zerocopy_sent;
-	unsigned long tx_zerocopy_success;
-	unsigned long tx_zerocopy_fail;
-	unsigned long tx_frag_overflow;
+	/* Queues */
+	struct xenvif_queue *queues;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
@@ -206,7 +236,10 @@ struct xenvif *xenvif_alloc(struct device *parent,
 			    domid_t domid,
 			    unsigned int handle);
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
+int xenvif_init_queue(struct xenvif_queue *queue);
+void xenvif_deinit_queue(struct xenvif_queue *queue);
+
+int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
 		   unsigned int rx_evtchn);
 void xenvif_disconnect(struct xenvif *vif);
@@ -217,44 +250,47 @@ void xenvif_xenbus_fini(void);
 
 int xenvif_schedulable(struct xenvif *vif);
 
-int xenvif_must_stop_queue(struct xenvif *vif);
+int xenvif_must_stop_queue(struct xenvif_queue *queue);
+
+int xenvif_queue_stopped(struct xenvif_queue *queue);
+void xenvif_wake_queue(struct xenvif_queue *queue);
 
 /* (Un)Map communication rings. */
-void xenvif_unmap_frontend_rings(struct xenvif *vif);
-int xenvif_map_frontend_rings(struct xenvif *vif,
+void xenvif_unmap_frontend_rings(struct xenvif_queue *queue);
+int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 			      grant_ref_t tx_ring_ref,
 			      grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
-void xenvif_napi_schedule_or_enable_events(struct xenvif *vif);
+void xenvif_napi_schedule_or_enable_events(struct xenvif_queue *queue);
 
 /* Prevent the device from generating any further traffic. */
 void xenvif_carrier_off(struct xenvif *vif);
 
-int xenvif_tx_action(struct xenvif *vif, int budget);
+int xenvif_tx_action(struct xenvif_queue *queue, int budget);
 
 int xenvif_kthread_guest_rx(void *data);
-void xenvif_kick_thread(struct xenvif *vif);
+void xenvif_kick_thread(struct xenvif_queue *queue);
 
 int xenvif_dealloc_kthread(void *data);
 
 /* Determine whether the needed number of slots (req) are available,
  * and set req_event if not.
  */
-bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed);
+bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed);
 
-void xenvif_stop_queue(struct xenvif *vif);
+void xenvif_carrier_on(struct xenvif *vif);
 
 /* Callback from stack when TX packet can be released */
 void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
 
 /* Unmap a pending page and release it back to the guest */
-void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
+void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
 
-static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
+static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
 	return MAX_PENDING_REQS -
-		vif->pending_prod + vif->pending_cons;
+		queue->pending_prod + queue->pending_cons;
 }
 
 /* Callback from stack when TX packet can be released */
@@ -264,5 +300,6 @@ extern bool separate_tx_rx_irq;
 
 extern unsigned int rx_drain_timeout_msecs;
 extern unsigned int rx_drain_timeout_jiffies;
+extern unsigned int xenvif_max_queues;
 
 #endif /* __XEN_NETBACK__COMMON_H__ */