Documentation/networking/multiqueue.txt +1 −78
@@ -3,19 +3,11 @@
 ===========================================
 
 Section 1: Base driver requirements for implementing multiqueue support
-Section 2: Qdisc support for multiqueue devices
-Section 3: Brief howto using PRIO or RR for multiqueue devices
 
-
 Intro: Kernel support for multiqueue devices
 ---------------------------------------------------------
 
-Kernel support for multiqueue devices is only an API that is presented to the
-netdevice layer for base drivers to implement.  This feature is part of the
-core networking stack, and all network devices will be running on the
-multiqueue-aware stack.  If a base driver only has one queue, then these
-changes are transparent to that driver.
-
+Kernel support for multiqueue devices is always present.
 
 Section 1: Base driver requirements for implementing multiqueue support
 -----------------------------------------------------------------------
@@ -43,73 +35,4 @@ bitmap on device initialization. Below is an example from e1000:
 		netdev->features |= NETIF_F_MULTI_QUEUE;
 #endif
 
-
-Section 2: Qdisc support for multiqueue devices
------------------------------------------------
-
-Currently two qdiscs support multiqueue devices.  A new round-robin qdisc,
-sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to
-bands and queues, and will store the queue mapping into skb->queue_mapping.
-Use this field in the base driver to determine which queue to send the skb
-to.
-
-sch_rr has been added for hardware that doesn't want scheduling policies from
-software, so it's a straight round-robin qdisc.  It uses the same syntax and
-classification priomap that sch_prio uses, so it should be intuitive to
-configure for people who've used sch_prio.
-
-In order to utilitize the multiqueue features of the qdiscs, the network
-device layer needs to enable multiple queue support.  This can be done by
-selecting NETDEVICES_MULTIQUEUE under Drivers.
-
-The PRIO qdisc naturally plugs into a multiqueue device.  If
-NETDEVICES_MULTIQUEUE is selected, then on qdisc load, the number of
-bands requested is compared to the number of queues on the hardware.  If they
-are equal, it sets a one-to-one mapping up between the queues and bands.  If
-they're not equal, it will not load the qdisc.  This is the same behavior
-for RR.  Once the association is made, any skb that is classified will have
-skb->queue_mapping set, which will allow the driver to properly queue skb's
-to multiple queues.
-
-
-Section 3: Brief howto using PRIO and RR for multiqueue devices
----------------------------------------------------------------
-
-The userspace command 'tc,' part of the iproute2 package, is used to configure
-qdiscs.  To add the PRIO qdisc to your network device, assuming the device is
-called eth0, run the following command:
-
-# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue
-
-This will create 4 bands, 0 being highest priority, and associate those bands
-to the queues on your NIC.  Assuming eth0 has 4 Tx queues, the band mapping
-would look like:
-
-band 0 => queue 0
-band 1 => queue 1
-band 2 => queue 2
-band 3 => queue 3
-
-Traffic will begin flowing through each queue if your TOS values are assigning
-traffic across the various bands.  For example, ssh traffic will always try to
-go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
-so it will be sent out queue 0.  ICMP traffic (pings) fall into the "normal"
-traffic classification, which is band 1.  Therefore pings will be send out
-queue 1 on the NIC.
-
-Note the use of the multiqueue keyword.  This is only in versions of iproute2
-that support multiqueue networking devices; if this is omitted when loading
-a qdisc onto a multiqueue device, the qdisc will load and operate the same
-if it were loaded onto a single-queue device (i.e. - sends all traffic to
-queue 0).
-
-Another alternative to multiqueue band allocation can be done by using the
-multiqueue option and specify 0 bands.  If this is the case, the qdisc will
-allocate the number of bands to equal the number of queues that the device
-reports, and bring the qdisc online.
-
-The behavior of tc filters remains the same, where it will override TOS priority
-classification.
-
-
 Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com>
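Note on the removed Section 2 and Section 3 text: it describes how PRIO and RR tied bands to hardware queues. A band count equal to the queue count gives a one-to-one mapping, a mismatch makes the qdisc refuse to load, and a request for 0 bands with the multiqueue option sizes the bands to whatever the device reports. The user-space sketch below only models that rule as the text states it. The names model_mq_bands and model_band_to_queue are hypothetical and are not kernel or iproute2 functions.

```c
#include <assert.h>

/*
 * Hypothetical model of the band sizing rule described in the removed text.
 * Returns the number of bands that would be configured, or -1 when the
 * qdisc would refuse to load.
 */
static int model_mq_bands(unsigned int requested_bands, unsigned int hw_queues)
{
	assert(hw_queues > 0);

	/* "multiqueue" with 0 bands: take the queue count from the device. */
	if (requested_bands == 0)
		return (int)hw_queues;

	/* Bands and queues must match exactly, otherwise loading fails. */
	if (requested_bands != hw_queues)
		return -1;

	return (int)requested_bands;
}

/* Once loaded, the association is one-to-one: band n feeds queue n. */
static unsigned int model_band_to_queue(unsigned int band)
{
	return band;
}
```

With 4 Tx queues, a request for 4 bands or for 0 bands both yield 4 bands in this model, while a request for 3 bands is refused.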
drivers/net/Kconfig +0 −8
@@ -26,14 +26,6 @@ menuconfig NETDEVICES
 # that for each of the symbols.
 if NETDEVICES
 
-config NETDEVICES_MULTIQUEUE
-	bool "Netdevice multiple hardware queue support"
-	---help---
-	  Say Y here if you want to allow the network stack to use multiple
-	  hardware TX queues on an ethernet device.
-
-	  Most people will say N here.
-
 config IFB
 	tristate "Intermediate Functional Block support"
 	depends on NET_CLS_ACT
drivers/net/cpmac.c +0 −14
@@ -569,11 +569,7 @@ static int cpmac_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	len = max(skb->len, ETH_ZLEN);
 	queue = skb_get_queue_mapping(skb);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	netif_stop_subqueue(dev, queue);
-#else
-	netif_stop_queue(dev);
-#endif
 
 	desc = &priv->desc_ring[queue];
 	if (unlikely(desc->dataflags & CPMAC_OWN)) {
@@ -626,24 +622,14 @@ static void cpmac_end_xmit(struct net_device *dev, int queue)
 
 		dev_kfree_skb_irq(desc->skb);
 		desc->skb = NULL;
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		if (netif_subqueue_stopped(dev, queue))
 			netif_wake_subqueue(dev, queue);
-#else
-		if (netif_queue_stopped(dev))
-			netif_wake_queue(dev);
-#endif
 	} else {
 		if (netif_msg_tx_err(priv) && net_ratelimit())
 			printk(KERN_WARNING
 			       "%s: end_xmit: spurious interrupt\n", dev->name);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		if (netif_subqueue_stopped(dev, queue))
 			netif_wake_subqueue(dev, queue);
-#else
-		if (netif_queue_stopped(dev))
-			netif_wake_queue(dev);
-#endif
 	}
 }
 
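Note on the cpmac.c hunks: the driver now always stops and wakes the individual subqueue that a packet was mapped to, rather than falling back to the whole-device calls. As a rough illustration of why that matters, the user-space sketch below keeps one "stopped" bit per queue, so stopping queue 2 leaves queue 0 untouched. The model_* names and the 32-queue limit are assumptions made for this sketch; this is not how the kernel stores subqueue state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_QUEUES 32u	/* arbitrary limit for this sketch */

/* Hypothetical device model: bit n set means Tx queue n is stopped. */
struct model_dev {
	uint32_t stopped;
};

static void model_stop_subqueue(struct model_dev *dev, unsigned int q)
{
	assert(q < MODEL_MAX_QUEUES);
	dev->stopped |= UINT32_C(1) << q;
}

static bool model_subqueue_stopped(const struct model_dev *dev, unsigned int q)
{
	assert(q < MODEL_MAX_QUEUES);
	return (dev->stopped & (UINT32_C(1) << q)) != 0;
}

/* Clears the stopped bit; returns true only if the queue had been stopped. */
static bool model_wake_subqueue(struct model_dev *dev, unsigned int q)
{
	bool was_stopped = model_subqueue_stopped(dev, q);

	dev->stopped &= ~(UINT32_C(1) << q);
	return was_stopped;
}
```

The check-then-wake pair mirrors the shape of the `netif_subqueue_stopped()` / `netif_wake_subqueue()` sequence kept in `cpmac_end_xmit()`, but only as an analogy.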
drivers/net/ixgbe/ixgbe_ethtool.c +0 −6
@@ -252,21 +252,15 @@ static int ixgbe_set_tso(struct net_device *netdev, u32 data)
 		netdev->features |= NETIF_F_TSO;
 		netdev->features |= NETIF_F_TSO6;
 	} else {
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		struct ixgbe_adapter *adapter = netdev_priv(netdev);
 		int i;
-#endif
 		netif_stop_queue(netdev);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			netif_stop_subqueue(netdev, i);
-#endif
 		netdev->features &= ~NETIF_F_TSO;
 		netdev->features &= ~NETIF_F_TSO6;
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			netif_start_subqueue(netdev, i);
-#endif
 		netif_start_queue(netdev);
 	}
 	return 0;
drivers/net/ixgbe/ixgbe_main.c +0 −40
@@ -266,28 +266,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_adapter *adapter,
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		if (__netif_subqueue_stopped(netdev, tx_ring->queue_index) &&
 		    !test_bit(__IXGBE_DOWN, &adapter->state)) {
 			netif_wake_subqueue(netdev, tx_ring->queue_index);
 			adapter->restart_queue++;
 		}
-#else
-		if (netif_queue_stopped(netdev) &&
-		    !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_queue(netdev);
-			adapter->restart_queue++;
-		}
-#endif
 	}
 
 	if (adapter->detect_tx_hung)
 		if (ixgbe_check_tx_hang(adapter, tx_ring, eop, eop_desc))
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 			netif_stop_subqueue(netdev, tx_ring->queue_index);
-#else
-			netif_stop_queue(netdev);
-#endif
 
 	if (total_tx_packets >= tx_ring->work_limit)
 		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, tx_ring->eims_value);
@@ -2192,11 +2180,7 @@ static void __devinit ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 	case (IXGBE_FLAG_RSS_ENABLED):
 		rss_m = 0xF;
 		nrq = rss_i;
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 		ntq = rss_i;
-#else
-		ntq = 1;
-#endif
 		break;
 	case 0:
 	default:
@@ -2370,10 +2354,8 @@ try_msi:
 	}
 
 out:
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	/* Notify the stack of the (possibly) reduced Tx Queue count. */
 	adapter->netdev->egress_subqueue_count = adapter->num_tx_queues;
-#endif
 
 	return err;
 }
@@ -2910,9 +2892,7 @@ static void ixgbe_watchdog(unsigned long data)
 	struct net_device *netdev = adapter->netdev;
 	bool link_up;
 	u32 link_speed = 0;
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	int i;
-#endif
 
 	adapter->hw.mac.ops.check_link(&adapter->hw, &(link_speed), &link_up);
 
@@ -2934,10 +2914,8 @@ static void ixgbe_watchdog(unsigned long data)
 
 			netif_carrier_on(netdev);
 			netif_wake_queue(netdev);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 			for (i = 0; i < adapter->num_tx_queues; i++)
 				netif_wake_subqueue(netdev, i);
-#endif
 		} else {
 			/* Force detection of hung controller */
 			adapter->detect_tx_hung = true;
@@ -3264,11 +3242,7 @@ static int __ixgbe_maybe_stop_tx(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	netif_stop_subqueue(netdev, tx_ring->queue_index);
-#else
-	netif_stop_queue(netdev);
-#endif
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -3280,11 +3254,7 @@ static int __ixgbe_maybe_stop_tx(struct net_device *netdev,
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	netif_wake_subqueue(netdev, tx_ring->queue_index);
-#else
-	netif_wake_queue(netdev);
-#endif
 	++adapter->restart_queue;
 	return 0;
 }
@@ -3312,9 +3282,7 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	unsigned int f;
 	unsigned int nr_frags = skb_shinfo(skb)->nr_frags;
 	len -= skb->data_len;
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	r_idx = (adapter->num_tx_queues - 1) & skb->queue_mapping;
-#endif
 	tx_ring = &adapter->tx_ring[r_idx];
 
 
@@ -3502,11 +3470,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 	pci_set_master(pdev);
 	pci_save_state(pdev);
 
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	netdev = alloc_etherdev_mq(sizeof(struct ixgbe_adapter), MAX_TX_QUEUES);
-#else
-	netdev = alloc_etherdev(sizeof(struct ixgbe_adapter));
-#endif
 	if (!netdev) {
 		err = -ENOMEM;
 		goto err_alloc_etherdev;
@@ -3598,9 +3562,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 	if (pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	netdev->features |= NETIF_F_MULTI_QUEUE;
-#endif
 
 	/* make sure the EEPROM is good */
 	if (ixgbe_validate_eeprom_checksum(hw, NULL) < 0) {
@@ -3668,10 +3630,8 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 
 	netif_carrier_off(netdev);
 	netif_stop_queue(netdev);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		netif_stop_subqueue(netdev, i);
-#endif
 
 	ixgbe_napi_add_all(adapter);
 
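Note on the ixgbe_xmit_frame() hunk: the ring index is now always derived as `(adapter->num_tx_queues - 1) & skb->queue_mapping`. An AND with "count minus one" behaves like a modulo only when the count is a power of two; whether ixgbe guarantees that for num_tx_queues is not shown in this diff. The stand-alone sketch below isolates the arithmetic. The function name pick_tx_ring is hypothetical, and the power-of-two assertion is an assumption added for the sketch, not a check the driver performs here.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the ring lookup in ixgbe_xmit_frame().
 * Folds an arbitrary queue_mapping value into [0, num_tx_queues).
 */
static unsigned int pick_tx_ring(unsigned int num_tx_queues,
				 unsigned int queue_mapping)
{
	/* The mask is an exact modulo only for a power-of-two queue count. */
	assert(num_tx_queues != 0 &&
	       (num_tx_queues & (num_tx_queues - 1)) == 0);

	return (num_tx_queues - 1) & queue_mapping;
}
```

For example, with 4 Tx queues a queue_mapping of 5 folds to ring 1, and with a single queue every mapping folds to ring 0.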