
Commit 2ad7bf36 authored by Mahesh Bandewar, committed by David S. Miller

ipvlan: Initial check-in of the IPVLAN driver.



This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.

This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.

The device operates in two different modes, and the difference
between these two modes is primarily on the TX side.

(a) L2 mode : In this mode, the device behaves as an L2 device.
TX processing up to L2 happens on the stack of the namespace
associated with the virtual device. Packets are then switched
into the main device (default-ns) and queued for xmit.

RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.

(b) L3 mode : In this mode, the device behaves like an L3 device.
TX processing up to L3 happens on the stack of the namespace
associated with the virtual device. Packets are then switched to
the main-device (default-ns) for the L2 processing. Hence the
routing table of the default-ns will be used in this mode.

RX processing is somewhat similar to the L2 mode except that in
this mode only unicast packets are delivered to the virtual device
while main-dev will handle all other packets.

The devices can be added using the "ip" command from the iproute2
package:

	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 2bbea0a8
+107 −0

                            IPVLAN Driver HOWTO

Initial Release:
	Mahesh Bandewar <maheshb AT google.com>

1. Introduction:
	This is conceptually very similar to the macvlan driver with one major
exception of using L3 for mux-ing / demux-ing among slaves. This property makes
the master device share the L2 with its slave devices. I have developed this
driver in conjunction with network namespaces and am not sure if there is a use
case outside of it.


2. Building and Installation:
	In order to build the driver, please select the config item CONFIG_IPVLAN.
The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
(CONFIG_IPVLAN=m).


3. Configuration:
	There are no module parameters for this driver and it can be configured
using the iproute2 "ip" utility.

	ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | l3 }

	e.g. ip link add link eth0 ipvl0 type ipvlan mode l2


4. Operating modes:
	IPvlan has two modes of operation - L2 and L3. For a given master device,
you can select one of these two modes and all slaves on that master will
operate in the same (selected) mode. The RX mode is almost identical except
that in L3 mode the slaves won't receive any multicast / broadcast traffic.
L3 mode is more restrictive since routing is controlled from the other (mostly
default) namespace.

4.1 L2 mode:
	In this mode TX processing happens on the stack instance attached to the
slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.

4.2 L3 mode:
	In this mode TX processing up to L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
master device for the L2 processing; routing from that instance is used
before packets are queued on the outbound device. In this mode the slaves
can neither receive nor send multicast / broadcast traffic.


5. What to choose (macvlan vs. ipvlan)?
	These two devices are very similar in many regards and the specific use
case could very well define which device to choose. If one of the following
situations defines your use case then you can choose to use ipvlan -
	(a) The Linux host that is connected to the external switch / router has
policy configured that allows only one mac per port.
	(b) The number of virtual devices created on a master exceeds the mac
capacity and puts the NIC in promiscuous mode, and degraded performance is a
concern.
	(c) The slave device is to be put into a hostile / untrusted network
namespace where L2 on the slave could be changed / misused.
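
	Cases (a) and (b) rest on the slaves sharing the master's L2 address. A
minimal sketch of how one might check that from the shell is given below. It
assumes both devices are still in the same namespace (that is, it is run before
a slave is moved into another namespace) and that the kernel exposes MAC
addresses under /sys/class/net/<dev>/address. The helper name same_l2 and the
SYSFS_NET override are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: report whether two interfaces expose the same MAC address.
# SYSFS_NET is a hypothetical override (handy for testing); by default the
# addresses are read from /sys/class/net/<dev>/address.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Usage: same_l2 <dev-a> <dev-b>
# Returns 0 if the addresses match, 1 if they differ, 2 if one is unreadable.
same_l2() {
	a=$(cat "$SYSFS_NET/$1/address") || return 2
	b=$(cat "$SYSFS_NET/$2/address") || return 2
	[ "$a" = "$b" ]
}

# Example (assumes eth0 is the master and ipvl0 is a slave created on it):
#   same_l2 eth0 ipvl0 && echo "ipvl0 shares L2 with eth0"
```

	With a macvlan slave the same check would be expected to fail, since each
macvlan device carries its own MAC address.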


6. Example configuration:

  +=============================================================+
  |  Host: host1                                                |
  |                                                             |
  |   +----------------------+      +----------------------+    |
  |   |   NS:ns0             |      |  NS:ns1              |    |
  |   |                      |      |                      |    |
  |   |                      |      |                      |    |
  |   |        ipvl0         |      |         ipvl1        |    |
  |   +----------#-----------+      +-----------#----------+    |
  |              #                              #               |
  |              ################################               |
  |                              # eth0                         |
  +==============================#==============================+


	(a) Create two network namespaces - ns0, ns1
		ip netns add ns0
		ip netns add ns1

	(b) Create two ipvlan slaves on eth0 (master device)
		ip link add link eth0 ipvl0 type ipvlan mode l2
		ip link add link eth0 ipvl1 type ipvlan mode l2

	(c) Assign slaves to the respective network namespaces
		ip link set dev ipvl0 netns ns0
		ip link set dev ipvl1 netns ns1

	(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
		- For ns0
			(1) ip netns exec ns0 bash
			(2) ip link set dev ipvl0 up
			(3) ip link set dev lo up
			(4) ip -4 addr add 127.0.0.1 dev lo
			(5) ip -4 addr add $IPADDR dev ipvl0
			(6) ip -4 route add default via $ROUTER dev ipvl0
		- For ns1
			(1) ip netns exec ns1 bash
			(2) ip link set dev ipvl1 up
			(3) ip link set dev lo up
			(4) ip -4 addr add 127.0.0.1 dev lo
			(5) ip -4 addr add $IPADDR dev ipvl1
			(6) ip -4 route add default via $ROUTER dev ipvl1
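
	The interactive steps above can also be written without opening a shell in
each namespace, by prefixing every command with "ip netns exec <ns>". The
sketch below only prints the resulting commands so they can be reviewed first.
The function name ipvlan_slave_cmds is hypothetical, and the 192.0.2.0/24
addresses are documentation-range placeholders standing in for $IPADDR and
$ROUTER. Commands are grouped per namespace rather than per step, which should
be equivalent for this setup.

```shell
#!/bin/sh
# Sketch only: prints the commands from the example instead of running them.
# MASTER, MODE, and the 192.0.2.0/24 addresses are placeholder assumptions.
MASTER=${MASTER:-eth0}
MODE=${MODE:-l2}

# Print the commands that create and configure one slave.
# Arguments: <namespace> <slave-dev> <address/prefix> <router>
ipvlan_slave_cmds() {
	ns=$1; dev=$2; addr=$3; router=$4
	echo "ip netns add $ns"
	echo "ip link add link $MASTER $dev type ipvlan mode $MODE"
	echo "ip link set dev $dev netns $ns"
	echo "ip netns exec $ns ip link set dev $dev up"
	echo "ip netns exec $ns ip link set dev lo up"
	echo "ip netns exec $ns ip -4 addr add 127.0.0.1 dev lo"
	echo "ip netns exec $ns ip -4 addr add $addr dev $dev"
	echo "ip netns exec $ns ip -4 route add default via $router dev $dev"
}

ipvlan_slave_cmds ns0 ipvl0 192.0.2.10/24 192.0.2.1
ipvlan_slave_cmds ns1 ipvl1 192.0.2.11/24 192.0.2.1
```

	Once the printed commands look right for the host, the output can be piped
to "sh" as root to apply them.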
+18 −0
@@ -145,6 +145,24 @@ config MACVTAP
	  To compile this driver as a module, choose M here: the module
	  will be called macvtap.


config IPVLAN
    tristate "IP-VLAN support"
    ---help---
      This allows one to create virtual devices off of a main interface
      and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
      on packets. All interfaces (including the main interface) share L2
      making it transparent to the connected L2 switch.

      Ipvlan devices can be added using the "ip" command from the
      iproute2 package starting with the iproute2-X.Y.ZZ release:

      "ip link add link <main-dev> [ NAME ] type ipvlan"

      To compile this driver as a module, choose M here: the module
      will be called ipvlan.


config VXLAN
       tristate "Virtual eXtensible Local Area Network (VXLAN)"
       depends on INET
+1 −0
@@ -6,6 +6,7 @@
# Networking Core Drivers
#
obj-$(CONFIG_BONDING) += bonding/
obj-$(CONFIG_IPVLAN) += ipvlan/
obj-$(CONFIG_DUMMY) += dummy.o
obj-$(CONFIG_EQUALIZER) += eql.o
obj-$(CONFIG_IFB) += ifb.o
+7 −0
#
# Makefile for the Ethernet Ipvlan driver
#

obj-$(CONFIG_IPVLAN) += ipvlan.o

ipvlan-objs := ipvlan_core.o ipvlan_main.o
+130 −0
/*
 * Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2 of
 * the License, or (at your option) any later version.
 *
 */
#ifndef __IPVLAN_H
#define __IPVLAN_H

#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/rculist.h>
#include <linux/notifier.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/if_arp.h>
#include <linux/if_link.h>
#include <linux/if_vlan.h>
#include <linux/ip.h>
#include <linux/inetdevice.h>
#include <net/rtnetlink.h>
#include <net/gre.h>
#include <net/route.h>
#include <net/addrconf.h>

#define IPVLAN_DRV	"ipvlan"
#define IPV_DRV_VER	"0.1"

#define IPVLAN_HASH_SIZE	(1 << BITS_PER_BYTE)
#define IPVLAN_HASH_MASK	(IPVLAN_HASH_SIZE - 1)

#define IPVLAN_MAC_FILTER_BITS	8
#define IPVLAN_MAC_FILTER_SIZE	(1 << IPVLAN_MAC_FILTER_BITS)
#define IPVLAN_MAC_FILTER_MASK	(IPVLAN_MAC_FILTER_SIZE - 1)

typedef enum {
	IPVL_IPV6 = 0,
	IPVL_ICMPV6,
	IPVL_IPV4,
	IPVL_ARP,
} ipvl_hdr_type;

struct ipvl_pcpu_stats {
	u64			rx_pkts;
	u64			rx_bytes;
	u64			rx_mcast;
	u64			tx_pkts;
	u64			tx_bytes;
	struct u64_stats_sync	syncp;
	u32			rx_errs;
	u32			tx_drps;
};

struct ipvl_port;

struct ipvl_dev {
	struct net_device	*dev;
	struct list_head	pnode;
	struct ipvl_port	*port;
	struct net_device	*phy_dev;
	struct list_head	addrs;
	int			ipv4cnt;
	int			ipv6cnt;
	struct ipvl_pcpu_stats	*pcpu_stats;
	DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
	netdev_features_t	sfeatures;
	u32			msg_enable;
	u16			mtu_adj;
};

struct ipvl_addr {
	struct ipvl_dev		*master; /* Back pointer to master */
	union {
		struct in6_addr	ip6;	 /* IPv6 address on logical interface */
		struct in_addr	ip4;	 /* IPv4 address on logical interface */
	} ipu;
#define ip6addr	ipu.ip6
#define ip4addr ipu.ip4
	struct hlist_node	hlnode;  /* Hash-table linkage */
	struct list_head	anode;   /* logical-interface linkage */
	struct rcu_head		rcu;
	ipvl_hdr_type		atype;
};

struct ipvl_port {
	struct net_device	*dev;
	struct hlist_head	hlhead[IPVLAN_HASH_SIZE];
	struct list_head	ipvlans;
	struct rcu_head		rcu;
	int			count;
	u16			mode;
};

static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
{
	return rcu_dereference(d->rx_handler_data);
}

static inline struct ipvl_port *ipvlan_port_get_rtnl(const struct net_device *d)
{
	return rtnl_dereference(d->rx_handler_data);
}

static inline bool ipvlan_dev_master(struct net_device *d)
{
	return d->priv_flags & IFF_IPVLAN_MASTER;
}

static inline bool ipvlan_dev_slave(struct net_device *d)
{
	return d->priv_flags & IFF_IPVLAN_SLAVE;
}

void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev);
void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval);
void ipvlan_init_secret(void);
unsigned int ipvlan_mac_hash(const unsigned char *addr);
rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb);
int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev);
void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr);
bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
					const void *iaddr, bool is_v6);
void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
#endif /* __IPVLAN_H */