Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 8199d3a7 authored by Christoph Lameter's avatar Christoph Lameter Committed by Jeff Garzik
Browse files

[PATCH] A new 10GB Ethernet Driver by Chelsio Communications

A Linux driver for the Chelsio 10Gb Ethernet Network Controller by Chelsio
(http://www.chelsio.com

).  This driver supports the Chelsio N210 NIC and is
backward compatible with the Chelsio N110 model 10Gb NICs.  It supports
AMD64, EM64T and x86 systems.

Signed-off-by: default avatarTina Yang <tinay@chelsio.com>
Signed-off-by: default avatarScott Bardone <sbardone@chelsio.com>
Signed-off-by: default avatarChristoph Lameter <christoph@lameter.com>

Adrian said:

- my3126.c is unused (because t1_my3126_ops isn't used anywhere)
- what are the EXTRA_CFLAGS in drivers/net/chelsio/Makefile for?
- $(cxgb-y) in drivers/net/chelsio/Makefile seems to be unneeded
- completely unused global functions:
  - espi.c: t1_espi_get_intr_counts
  - sge.c: t1_sge_get_intr_counts
- the following functions can be made static:
  - sge.c: t1_espi_workaround
  - sge.c: t1_sge_tx
  - subr.c: __t1_tpi_read
  - subr.c: __t1_tpi_write
  - subr.c: t1_wait_op_done

shemminger said:

The performance recommendations in cxgb.txt are common to all fast devices,
and should be in one file rather than just for this device. I would rather
see ip-sysctl.txt updated or a new file on tuning recommendations started.
Some of them have consequences that aren't documented well.
For example, turning off TCP timestamps risks data corruption from sequence wrap.

A new driver shouldn't need so may #ifdef's unless you want to putit on older
vendor versions of 2.4

Some accessor and wrapper functions like:
        t1_pci_read_config_4
        adapter_name
        t1_malloc
are just annoying noise.

Why have useless dead code like:

/* Interrupt handler */
+static int pm3393_interrupt_handler(struct cmac *cmac)
+{
+       u32 master_intr_status;
+/*
+    1. Read master interrupt register.
+    2. Read BLOCK's interrupt status registers.
+    3. Handle BLOCK interrupts.
+*/

Jeff said:

step 1:  kill all the OS wrappers.

 And do you really need hooks for multiple MACs, when only one MAC is
 really supported?  Typically these hooks are at a higher level anyway --
 struct net_device.

From: Christoph Lameter <christoph@lameter

Driver modified as suggested by Pekka Enberg, Stephen Hemminger and Andrian
Bunk.  Reduces the size of the driver to ~260k.

- clean up tabs
- removed my3126.c
- removed 85% of suni1x10gexp_regs.h
- removed 80% of regs.h
- removed various calls, renamed variables/functions.
- removed system specific and other wrappers (usleep, msleep)
- removed dead code
- dropped redundant casts in osdep.h
- dropped redundant check of kfree
- dropped weird code (MODVERSIONS stuff)
- reduced number of #ifdefs
- use kcalloc now instead of kmalloc
- Add information about known issues with the driver
- Add information about authors

Signed-off-by: default avatarScott Bardone <sbardone@chelsio.com>
Signed-off-by: default avatarChristoph Lameter <christoph@lameter.com>
Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>

diff -puN /dev/null Documentation/networking/cxgb.txt
parent 88d7bd8c
Loading
Loading
Loading
Loading
+322 −0
Original line number Diff line number Diff line
                 Chelsio N210 10Gb Ethernet Network Controller

                         Driver Release Notes for Linux

                                 Version 2.1.0

                                 March 8, 2005

CONTENTS
========
 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


INTRODUCTION
============

 This document describes the Linux driver for Chelsio 10Gb Ethernet Network
 Controller. This driver supports the Chelsio N210 NIC and is backward
 compatible with the Chelsio N110 model 10Gb NICs. This driver supports AMD64
 and EM64T, and x86 systems.


FEATURES
========

 Adaptive Interrupts (adaptive-rx)
 ---------------------------------

  This feature provides an adaptive algorithm that adjusts the interrupt
  coalescing parameters, allowing the driver to dynamically adapt the latency
  settings to achieve the highest performance during various types of network
  load.

  The interface used to control this feature is ethtool. Please see the
  ethtool manpage for additional usage information.

  By default, adaptive-rx is disabled.
  To enable adaptive-rx:

      ethtool -C <interface> adaptive-rx on

  To disable adaptive-rx, use ethtool:

      ethtool -C <interface> adaptive-rx off

  After disabling adaptive-rx, the timer latency value will be set to 50us.
  You may set the timer latency after disabling adaptive-rx:

      ethtool -C <interface> rx-usecs <microseconds>

  An example to set the timer latency value to 100us on eth0:

      ethtool -C eth0 rx-usecs 100

  You may also provide a timer latency value while disabling adpative-rx:

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

  If adaptive-rx is disabled and a timer latency value is specified, the timer
  will be set to the specified value until changed by the user or until
  adaptive-rx is enabled.

  To view the status of the adaptive-rx and timer latency values:

      ethtool -c <interface>


 TCP Segmentation Offloading (TSO) Support
 -----------------------------------------

  This feature, also known as "large send", enables a system's protocol stack
  to offload portions of outbound TCP processing to a network interface card
  thereby reducing system CPU utilization and enhancing performance.

  The interface used to control this feature is ethtool version 1.8 or higher.
  Please see the ethtool manpage for additional usage information.

  By default, TSO is enabled.
  To disable TSO:

      ethtool -K <interface> tso off

  To enable TSO:

      ethtool -K <interface> tso on

  To view the status of TSO:

      ethtool -k <interface>


PERFORMANCE
===========

 The following information is provided as an example of how to change system
 parameters for "performance tuning" an what value to use. You may or may not
 want to change these system parameters, depending on your server/workstation
 application. Doing so is not warranted in any way by Chelsio Communications,
 and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
 of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may prefer
 a different method. These commands are shown only to provide an example of
 what to do and are by no means definitive.

 Making any of the following system changes will only last until you reboot
 your system. You may want to write a script that runs at boot-up which
 includes the optimal settings for your system.

  Setting PCI Latency Timer:
      setpci -d 1425:* 0x0c.l=0x0000F800

  Disabling TCP timestamp:
      sysctl -w net.ipv4.tcp_timestamps=0

  Disabling SACK:
      sysctl -w net.ipv4.tcp_sack=0

  Setting TCP read buffers (min/default/max):
      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

  Setting TCP write buffers (min/pressure/max):
      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

  Setting TCP buffer space (min/pressure/max):
      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"

  Setting large number of incoming connection requests (2.6.x only):
      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size:
      sysctl -w net.core.rmem_max=524287

  Setting maximum send socket buffer size:
      sysctl -w net.core.wmem_max=524287

  Setting default receive socket buffer size:
      sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size:
      sysctl -w net.core.wmem_default=524287

  Setting maximum option memory buffers:
      sysctl -w net.core.optmem_max=524287

  Setting maximum backlog (# of unprocessed packets before kernel drops):
      sysctl -w net.core.netdev_max_backlog=300000

  Set smp_affinity (on a multiprocessor system) to a single CPU:
      echo 00000001 > /proc/irq/<interrupt_number>/smp_affinity

  TCP window size for single connections:
   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Due to the variations of RTT, you may want to increase the buffer
   size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
   "TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.
   At 10Gb speeds, use the following formula:
       RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
       Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000
   RX_WINDOW sizes of 256KB - 512KB should be sufficient.
   Setting the min, max, and default receive buffer (RX_WINDOW) size:
       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"

  TCP window size for multiple connections:
   The receive buffer (RX_WINDOW) size may be calculated the same as single
   connections, but should be divided by the number of connections. The
   smaller window prevents congestion and facilitates better pacing,
   especially if/when MAC level flow control does not work well or when it is
   not supported on the machine. Experimentation may be necessary to attain
   the correct value. This method is provided as a starting point fot the
   correct receive buffer size.
   Setting the min, max, and default receive buffer (RX_WINDOW) size is
   performed in the same manner as single connection.


DRIVER MESSAGES
===============

 The following messages are the most common messages logged by syslog. These
 may be found in /var/log/messages.

  Driver up:
     Chelsio Network Driver - version 2.1.0

  NIC detected:
     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up:
     eth#: link is up at 10 Gbps, full duplex

  Link down:
     eth#: link is down


KNOWN ISSUES
============

 These issues have been identified during testing. The following information
 is provided as a workaround to the problem. In some cases, this problem is
 inherent to Linux or to a particular Linux Distribution and/or hardware
 platform.

  1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This will cause TCP
      retransmits if the packet data were to be split across different CPUs
      and re-assembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used on
      the N110/N210 by using ifconfig:
          ifconfig <dev_name> | grep Interrupt
      Set the smp_affinity to a single CPU:
          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is highly suggested that you do not run the irqbalance daemon on your
      system, as this will change any smp_affinity setting you have applied.
      The irqbalance daemon runs on a 10 second interval and binds interrupts
      to the least loaded CPU determined by the daemon. To disable this daemon:
          chkconfig --level 2345 irqbalance off

      By default, some Linux distributions enable the kernel feature,
      irqbalance, which performs the same function as the daemon. To disable
      this feature, add the following line to your bootloader:
          noirqbalance

          Example using the Grub bootloader:
              title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
              root (hd0,0)
              kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
              initrd /initrd-2.4.21-27.ELsmp.img

  2. After running insmod, the driver is loaded and the incorrect network
     interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      automatically bring up USB devices when they are plugged in, however,
      the script also attempts to automatically bring up a network interface
      after loading the kernel module. The hotplug script does this by scanning
      the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
      for HWADDR=<mac_address>.

      If the hotplug script does not find the HWADDRR within any of the
      ifcfg-eth# files, it will bring up the device with the next available
      interface name. If this interface is already configured for a different
      network card, your new interface will have incorrect IP address and
      network settings.

      To solve this issue, you can add the HWADDR=<mac_address> key to the
      interface config file of your network controller.

      To disable this "hotplug" feature, you may add the driver (module name)
      to the "blacklist" file located in /etc/hotplug. It has been noted that
      this does not work for network devices because the net.agent script
      does not use the blacklist file. Simply remove, or rename, the net.agent
      script located in /etc/hotplug to disable this feature.

  3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
     on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
      chipset, you may experience the "133-Mhz Mode Split Completion Data
      Corruption" bug identified by AMD while using a 133Mhz PCI-X card on the
      bus PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
      can provide stale data via split completion cycles to a PCI-X card that
      is operating at 133 Mhz", causing data corruption.

      AMD's provides three workarounds for this problem, however, Chelsio
      recommends the first option for best performance with this bug:

        For 133Mhz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

           Data Length (bytes): 2k
           Total allowed outstanding transactions: 1

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details with this bug and workarounds suggested by AMD.


SUPPORT
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

===============================================================================

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

 Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.

===============================================================================
+19 −0
Original line number Diff line number Diff line
@@ -2080,6 +2080,25 @@ endmenu
menu "Ethernet (10000 Mbit)"
	depends on NETDEVICES && !UML

config CHELSIO_T1
        tristate "Chelsio 10Gb Ethernet support"
        depends on PCI
        help
          This driver supports Chelsio N110 and N210 models 10Gb Ethernet
          cards. More information about adapter features and performance
          tuning is in <file:Documentation/networking/cxgb.txt>.

          For general information about Chelsio and our products, visit
          our website at <http://www.chelsio.com>.

          For customer support, please visit our customer support page at
          <http://www.chelsio.com/support.htm>.

          Please send feedback to <linux-bugs@chelsio.com>.

          To compile this driver as a module, choose M here: the module
          will be called cxgb.

config IXGB
	tristate "Intel(R) PRO/10GbE support"
	depends on PCI
+1 −0
Original line number Diff line number Diff line
@@ -9,6 +9,7 @@ endif
obj-$(CONFIG_E1000) += e1000/
obj-$(CONFIG_IBM_EMAC) += ibm_emac/
obj-$(CONFIG_IXGB) += ixgb/
obj-$(CONFIG_CHELSIO_T1) += chelsio/
obj-$(CONFIG_BONDING) += bonding/
obj-$(CONFIG_GIANFAR) += gianfar_driver.o

+12 −0
Original line number Diff line number Diff line
#
# Chelsio 10Gb NIC driver for Linux.
#

obj-$(CONFIG_CHELSIO_T1) += cxgb.o

EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/chelsio $(DEBUG_FLAGS)


cxgb-objs := cxgb2.o espi.o tp.o pm3393.o sge.o subr.o mv88x201x.o

+102 −0
Original line number Diff line number Diff line
/*****************************************************************************
 *                                                                           *
 * File: ch_ethtool.h                                                        *
 * $Revision: 1.5 $                                                          *
 * $Date: 2005/03/23 07:15:58 $                                              *
 * Description:                                                              *
 *  part of the Chelsio 10Gb Ethernet Driver.                                *
 *                                                                           *
 * This program is free software; you can redistribute it and/or modify      *
 * it under the terms of the GNU General Public License, version 2, as       *
 * published by the Free Software Foundation.                                *
 *                                                                           *
 * You should have received a copy of the GNU General Public License along   *
 * with this program; if not, write to the Free Software Foundation, Inc.,   *
 * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.                 *
 *                                                                           *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED    *
 * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF      *
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.                     *
 *                                                                           *
 * http://www.chelsio.com                                                    *
 *                                                                           *
 * Copyright (c) 2003 - 2005 Chelsio Communications, Inc.                    *
 * All rights reserved.                                                      *
 *                                                                           *
 * Maintainers: maintainers@chelsio.com                                      *
 *                                                                           *
 * Authors: Dimitrios Michailidis   <dm@chelsio.com>                         *
 *          Tina Yang               <tainay@chelsio.com>                     *
 *          Felix Marti             <felix@chelsio.com>                      *
 *          Scott Bardone           <sbardone@chelsio.com>                   *
 *          Kurt Ottaway            <kottaway@chelsio.com>                   *
 *          Frank DiMambro          <frank@chelsio.com>                      *
 *                                                                           *
 * History:                                                                  *
 *                                                                           *
 ****************************************************************************/

#ifndef __CHETHTOOL_LINUX_H__
#define __CHETHTOOL_LINUX_H__

/* TCB size in 32-bit words */
#define TCB_WORDS (TCB_SIZE / 4)

enum {
	ETHTOOL_SETREG,
	ETHTOOL_GETREG,
	ETHTOOL_SETTPI,
	ETHTOOL_GETTPI,
	ETHTOOL_DEVUP,
	ETHTOOL_GETMTUTAB,
	ETHTOOL_SETMTUTAB,
	ETHTOOL_GETMTU,
	ETHTOOL_SET_PM,
	ETHTOOL_GET_PM,
	ETHTOOL_GET_TCAM,
	ETHTOOL_SET_TCAM,
	ETHTOOL_GET_TCB,
	ETHTOOL_READ_TCAM_WORD,
};

struct ethtool_reg {
	uint32_t cmd;
	uint32_t addr;
	uint32_t val;
};

struct ethtool_mtus {
	uint32_t cmd;
	uint16_t mtus[NMTUS];
};

struct ethtool_pm {
	uint32_t cmd;
	uint32_t tx_pg_sz;
	uint32_t tx_num_pg;
	uint32_t rx_pg_sz;
	uint32_t rx_num_pg;
	uint32_t pm_total;
};

struct ethtool_tcam {
	uint32_t cmd;
	uint32_t tcam_size;
	uint32_t nservers;
	uint32_t nroutes;
};

struct ethtool_tcb {
	uint32_t cmd;
	uint32_t tcb_index;
	uint32_t tcb_data[TCB_WORDS];
};

struct ethtool_tcam_word {
	uint32_t cmd;
	uint32_t addr;
	uint32_t buf[3];
};

#define SIOCCHETHTOOL SIOCDEVPRIVATE
#endif
Loading