Commit b68e7e95 authored by Linus Torvalds
Pull s390 updates from Martin Schwidefsky:

 - three merges for KVM/s390 with changes for vfio-ccw and cpacf. The
   patches are included in the KVM tree as well, let git sort it out.

 - add the new 'trng' random number generator

 - provide the secure key verification API for the pkey interface

 - introduce the z13 cpu counters to perf

 - add a new system call to set up the guarded storage facility

 - simplify TASK_SIZE and arch_get_unmapped_area

 - export the raw STSI data related to CPU topology to user space

 - ... and the usual churn of bug-fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (74 commits)
  s390/crypt: use the correct module alias for paes_s390.
  s390/cpacf: Introduce kma instruction
  s390/cpacf: query instructions use unique parameters for compatibility with KMA
  s390/trng: Introduce s390 TRNG device driver.
  s390/crypto: Provide s390 specific arch random functionality.
  s390/crypto: Add new subfunctions to the cpacf PRNO function.
  s390/crypto: Renaming PPNO to PRNO.
  s390/pageattr: avoid unnecessary page table splitting
  s390/mm: simplify arch_get_unmapped_area[_topdown]
  s390/mm: make TASK_SIZE independent from the number of page table levels
  s390/gs: add regset for the guarded storage broadcast control block
  s390/kvm: Add use_cmma field to mm_context_t
  s390/kvm: Add PGSTE manipulation functions
  vfio: ccw: improve error handling for vfio_ccw_mdev_remove
  vfio: ccw: remove unnecessary NULL checks of a pointer
  s390/spinlock: remove compare and delay instruction
  s390/spinlock: use atomic primitives for spinlocks
  s390/cpumf: simplify detection of guest samples
  s390/pci: remove forward declaration
  s390/pci: increase the PCI_NR_FUNCTIONS default
  ...
parents d3b5d352 d0790fb6
+2 −0
@@ -22,5 +22,7 @@ qeth.txt
	- HiperSockets Bridge Port Support.
s390dbf.txt
	- information on using the s390 debug feature.
vfio-ccw.txt
	- information on the vfio-ccw I/O subchannel driver.
zfcpdump.txt
	- information on the s390 SCSI dump tool.
+303 −0
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass subchannels through
to a virtual machine, with vfio as the means.

Unlike other hardware architectures, s390 defines a unified I/O access
method called Channel I/O. It has its own access patterns:
- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.
Thus, when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev is added to
an iommu group so that it can be managed by the vfio framework. We also
add read/write callbacks for special vfio I/O regions to pass the
channel programs from the mdev to its parent device (the real I/O
subchannel device), which does further address translation and performs
the I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information can be found here:
- A good starting point for Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing Qemu code, which implements a simple emulated channel
  subsystem, is also a good reference. It makes the flow easier to
  follow.
  qemu/hw/s390x/css.c

For the vfio mediated device framework:
- Documentation/vfio-mediated-device.txt

Motivation of vfio-ccw
----------------------

Currently, a guest virtualized via qemu/kvm on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to be able to
pass them through to a Qemu virtual machine. This includes devices
that don't have a virtio counterpart (e.g. tape drives) or that have
specific characteristics which guests want to exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. Thus, we would like to introduce vfio
support for channel devices. And we would like to name this new vfio
device "vfio-ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
system. Though the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka. DASDs), tapes,
communication controllers, etc., they can all be accessed by a
well-defined access method, and they present I/O completion in a
unified way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which specifies the format of
the CCWs and other control information. The operating system signals
the I/O channel subsystem to begin executing the channel program with
an SSCH (start subchannel) instruction. The central processor is then
free to proceed with non-I/O instructions until interrupted. The I/O
completion result is received by the interrupt handler in the form of
an interrupt response block (IRB).

Back to vfio-ccw, in short:
- ORBs and channel programs are built in guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- Host kernel translates the guest physical addresses to real addresses
  and starts the I/O by issuing a privileged Channel I/O instruction
  (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions. The
  result is copied as an IRB to user space, which passes it back to the
  guest.
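
The translation step in the list above can be pictured with a small user
space sketch. It is not the kernel's cp_ code: the struct name
sketch_ccw, the guest_window type and translate_chain() are hypothetical,
and the sketch assumes that guest memory is one contiguous window, whereas
the real driver translates and pins page by page. Only the flag values
follow the CCW format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified format-1 CCW (hypothetical name, for illustration only). */
struct sketch_ccw {
	uint8_t  cmd_code;  /* channel command */
	uint8_t  flags;     /* chaining and control flags */
	uint16_t count;     /* byte count of the data area */
	uint32_t cda;       /* data address */
};

#define CCW_FLAG_DC 0x80  /* data chaining: next CCW continues this one */
#define CCW_FLAG_CC 0x40  /* command chaining: another CCW follows */

/* Assumption: guest memory is a single contiguous window. */
struct guest_window {
	uint32_t guest_base;  /* first guest physical address */
	uint32_t host_base;   /* corresponding host address */
	uint32_t size;        /* window length in bytes */
};

/*
 * Copy a guest chain into host[] and rewrite each data address.
 * Returns the number of CCWs translated, or -1 if an address falls
 * outside the window or the chain does not end within max entries.
 */
int translate_chain(const struct sketch_ccw *guest, struct sketch_ccw *host,
		    size_t max, const struct guest_window *w)
{
	for (size_t i = 0; i < max; i++) {
		uint32_t off;

		if (guest[i].cda < w->guest_base)
			return -1;
		off = guest[i].cda - w->guest_base;
		if (off >= w->size || (uint32_t)guest[i].count > w->size - off)
			return -1;

		host[i] = guest[i];
		host[i].cda = w->host_base + off;

		/* The chain ends at the first CCW without a chaining flag. */
		if (!(guest[i].flags & (CCW_FLAG_CC | CCW_FLAG_DC)))
			return (int)(i + 1);
	}
	return -1;
}
```

A caller would pass the guest chain, an output array of the same length
and the window description, then hand the rewritten chain to the code
that starts the I/O.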

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with an mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have an IOMMU level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the interception of such an instruction, vfio-ccw polices and
translates the channel program in software before it gets sent to
hardware.

Within this implementation, we have two drivers for two types of
devices:
- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  realizes a group of callbacks and registers with the mdev framework
  as a parent (physical) device. As a consequence, mdev provides
  vfio_ccw a generic interface (sysfs) to create mdev devices. A vfio
  mdev can then be created by vfio_ccw and added to the mediated bus. It
  is the vfio device that is added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and to store the I/O interrupt result for
  user space to retrieve. To notify user space of an I/O completion, it
  offers an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that is created by vfio_ccw.
  It realizes a group of vfio device driver callbacks, adds itself to a
  vfio group, and registers itself with the mdev framework as an mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to a process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For an mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. The mdev framework only maintains a database
  of the iova<->vaddr mappings in this operation. The vfio iommu backend
  exports the vfio_pin_pages and vfio_unpin_pages interfaces, which the
  physical device driver uses to pin and unpin pages on demand.
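
The "store the translation now, resolve it later" idea described for the
vfio iommu backend can be modelled in a few lines of user space C. This
is only a sketch: mapping_db, db_map() and db_translate() are
hypothetical names, the table is a fixed-size array, and no pages are
pinned. It merely shows that the map operation records iova<->vaddr
pairs and that translation happens on demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one iova<->vaddr record. */
struct dma_mapping {
	uint64_t iova;   /* guest physical (I/O virtual) start address */
	uint64_t vaddr;  /* process virtual start address */
	uint64_t size;   /* length of the mapping in bytes */
};

#define MAX_MAPPINGS 16

struct mapping_db {
	struct dma_mapping map[MAX_MAPPINGS];
	size_t count;
};

/* Record a mapping without pinning anything; 0 on success, -1 on error. */
int db_map(struct mapping_db *db, uint64_t iova, uint64_t vaddr,
	   uint64_t size)
{
	if (db->count == MAX_MAPPINGS || size == 0)
		return -1;
	db->map[db->count++] = (struct dma_mapping){ iova, vaddr, size };
	return 0;
}

/* Resolve an iova on demand; 0 on success, -1 if it is not mapped. */
int db_translate(const struct mapping_db *db, uint64_t iova,
		 uint64_t *vaddr)
{
	for (size_t i = 0; i < db->count; i++) {
		const struct dma_mapping *m = &db->map[i];

		if (iova >= m->iova && iova - m->iova < m->size) {
			*vaddr = m->vaddr + (iova - m->iova);
			return 0;
		}
	}
	return -1;
}
```

In this model, the pin step would follow a successful db_translate()
call, which mirrors the split between recording mappings at map time and
pinning pages only when a request needs them.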

Below is a high-level block diagram.

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_device() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together:
1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) with the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks with the mdev framework. Mdev related file
   nodes are then created under the device node in sysfs for the
   subchannel device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one
   (and only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It will probe the mdev and
   add it to an iommu_group and a vfio_group. Then we can pass the mdev
   through to a guest.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and to store the I/O interrupt result for user space to retrieve.
The definition of the region is:

struct ccw_io_region {
#define ORB_AREA_SIZE 12
	__u8	orb_area[ORB_AREA_SIZE];
#define SCSW_AREA_SIZE 12
	__u8	scsw_area[SCSW_AREA_SIZE];
#define IRB_AREA_SIZE 96
	__u8	irb_area[IRB_AREA_SIZE];
	__u32	ret_code;
} __packed;

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the Virtual
Subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region.
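
As a quick check of the layout above, the following user space sketch
mirrors the region with fixed-width C types and verifies the field
offsets at compile time. The uint8_t/uint32_t types stand in for
__u8/__u32, the packed attribute is the GCC/Clang spelling of __packed,
and ccw_io_region_prepare() is a hypothetical helper that is not part of
the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* User space mirror of the region described above. */
struct ccw_io_region {
	uint8_t  orb_area[ORB_AREA_SIZE];
	uint8_t  scsw_area[SCSW_AREA_SIZE];
	uint8_t  irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
} __attribute__((packed));

/* With packing, the fields follow each other without padding. */
_Static_assert(offsetof(struct ccw_io_region, orb_area) == 0, "orb");
_Static_assert(offsetof(struct ccw_io_region, scsw_area) == 12, "scsw");
_Static_assert(offsetof(struct ccw_io_region, irb_area) == 24, "irb");
_Static_assert(offsetof(struct ccw_io_region, ret_code) == 120, "ret");
_Static_assert(sizeof(struct ccw_io_region) == 124, "size");

/* Hypothetical helper: fill a region for a new I/O request. */
void ccw_io_region_prepare(struct ccw_io_region *r,
			   const uint8_t orb[ORB_AREA_SIZE],
			   const uint8_t scsw[SCSW_AREA_SIZE])
{
	memset(r, 0, sizeof(*r));                   /* clear irb_area, ret_code */
	memcpy(r->orb_area, orb, ORB_AREA_SIZE);    /* guest ORB */
	memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE); /* virtual subchannel SCSW */
}
```

A user space program would prepare such a buffer and then write it to
the region offset it obtained during initialization.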

vfio-ccw patches overview
-------------------------

For now, our patches are rebased on the latest mdev implementation.
vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend. This is a good starting
point for the code review of vfio-ccw. Note that the implementation is
far from complete yet; but we'd like to get feedback on the general
architecture.

* CCW translation APIs
- Description:
  These introduce a group of APIs (prefixed with 'cp_') to do CCW
  translation. The CCWs passed in by a user space program are
  organized with their guest physical memory addresses. These APIs
  will copy the CCWs into the kernel space, and assemble a runnable
  kernel channel program by updating the guest physical addresses with
  their corresponding host physical addresses.
- Patches:
  vfio: ccw: introduce channel program interfaces

* vfio_ccw device driver
- Description:
  The following patches utilize the CCW translation APIs and introduce
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls:
    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS
  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, which does further CCW translation
  before issuing it to a real device.
  This also provides the SET_IRQS ioctl to set up an event notifier that
  notifies the user space program of the I/O completion
  asynchronously.
- Patches:
  vfio: ccw: basic implementation for vfio_ccw driver
  vfio: ccw: introduce ccw_io_region
  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls

The user of vfio-ccw is not limited to Qemu, but Qemu is definitely a
good example for understanding how these patches work. Here is a little
bit more detail on how an I/O request triggered by the Qemu guest will
be handled (without error handling).

Explanation:
Q1-Q7: Qemu side process.
K1-K6: Kernel side process.

Q1. Get I/O region info during initialization.
Q2. Setup event notifier and handler to handle I/O completion.

... ...

Q3. Intercept a ssch instruction.
Q4. Write the guest channel program and ORB to the I/O region.
    K1. Copy from guest to kernel.
    K2. Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3. With the necessary information contained in the orb passed in
        by Qemu, issue the ccwchain to the device.
    K4. Return the ssch CC code.
Q5. Return the CC code to the guest.

... ...

    K5. Interrupt handler gets the I/O result and writes the result to
        the I/O region.
    K6. Signal Qemu to retrieve the result.
Q6. Get the signal and event handler reads out the result from the I/O
    region.
Q7. Update the irb for the guest.
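
The ordering of the steps above can be modelled with a small user space
mock. Nothing here talks to a real device or to vfio: mock_subchannel,
mock_write_request(), mock_interrupt() and mock_read_result() are
hypothetical names, the translation and start of the channel program
(K2/K3) are reduced to a comment, and a simple counter stands in for the
eventfd. The condition codes used are 0 (started) and 2 (busy).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* Same layout idea as the I/O region, under a mock name. */
struct mock_io_region {
	uint8_t  orb_area[ORB_AREA_SIZE];
	uint8_t  scsw_area[SCSW_AREA_SIZE];
	uint8_t  irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
};

/* Hypothetical stand-in for the kernel side of one subchannel. */
struct mock_subchannel {
	struct mock_io_region region;
	int io_pending;        /* set from K3 until K5 */
	unsigned int signals;  /* models the eventfd counter */
};

/* Q4 -> K1..K4: accept a request and return the condition code. */
int mock_write_request(struct mock_subchannel *sch,
		       const uint8_t *orb, const uint8_t *scsw)
{
	if (sch->io_pending) {
		sch->region.ret_code = 2;  /* busy */
		return 2;
	}
	memcpy(sch->region.orb_area, orb, ORB_AREA_SIZE);    /* K1 */
	memcpy(sch->region.scsw_area, scsw, SCSW_AREA_SIZE);
	/* K2/K3: translation and start of the I/O would happen here. */
	sch->io_pending = 1;
	sch->region.ret_code = 0;                            /* K4 */
	return 0;
}

/* K5/K6: store the I/O result and signal user space. */
void mock_interrupt(struct mock_subchannel *sch, const uint8_t *irb)
{
	memcpy(sch->region.irb_area, irb, IRB_AREA_SIZE);
	sch->io_pending = 0;
	sch->signals++;
}

/* Q6: consume one signal and read the result; -1 if none is pending. */
int mock_read_result(struct mock_subchannel *sch, uint8_t *irb_out)
{
	if (sch->signals == 0)
		return -1;
	sch->signals--;
	memcpy(irb_out, sch->region.irb_area, IRB_AREA_SIZE);
	return 0;
}
```

The point of the mock is the asynchrony: the write returns a condition
code immediately, and the result only becomes readable after the
interrupt step has signaled it.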

Limitations
-----------

The current vfio-ccw implementation focuses on supporting basic commands
needed to implement block device functionality (read/write) of DASD/ECKD
device only. Some commands may need special handling in the future, for
example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in Qemu, we can now bring the
passed-through DASD/ECKD device online in a guest and use it as a block
device.

Reference
---------
1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.txt
5. Documentation/vfio.txt
6. Documentation/vfio-mediated-device.txt
+11 −0
@@ -7201,6 +7201,7 @@ S: Supported
F:	Documentation/s390/kvm.txt
F:	arch/s390/include/asm/kvm*
F:	arch/s390/kvm/
F:	arch/s390/mm/gmap.c

KERNEL VIRTUAL MACHINE (KVM) FOR ARM
M:	Christoffer Dall <christoffer.dall@linaro.org>
@@ -10896,6 +10897,16 @@ W: http://www.ibm.com/developerworks/linux/linux390/
S:	Supported
F:	drivers/iommu/s390-iommu.c

S390 VFIO-CCW DRIVER
M:	Cornelia Huck <cornelia.huck@de.ibm.com>
M:	Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
L:	linux-s390@vger.kernel.org
L:	kvm@vger.kernel.org
S:	Supported
F:	drivers/s390/cio/vfio_ccw*
F:	Documentation/s390/vfio-ccw.txt
F:	include/uapi/linux/vfio_ccw.h

S3C24XX SD/MMC Driver
M:	Ben Dooks <ben-linux@fluff.org>
L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+1 −1
obj-y				+= kernel/
obj-y				+= mm/
obj-$(CONFIG_KVM)		+= kvm/
-obj-$(CONFIG_CRYPTO_HW)		+= crypto/
+obj-y				+= crypto/
obj-$(CONFIG_S390_HYPFS_FS)	+= hypfs/
obj-$(CONFIG_APPLDATA_BASE)	+= appldata/
obj-y				+= net/
+37 −2
@@ -105,6 +105,7 @@ config S390
	select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
	select ARCH_SAVE_PAGE_KEYS if HIBERNATION
	select ARCH_SUPPORTS_ATOMIC_RMW
	select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT
	select ARCH_SUPPORTS_NUMA_BALANCING
	select ARCH_USE_BUILTIN_BSWAP
	select ARCH_USE_CMPXCHG_LOCKREF
@@ -123,7 +124,6 @@ config S390
	select GENERIC_TIME_VSYSCALL
	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
	select HAVE_ARCH_AUDITSYSCALL
-	select HAVE_ARCH_EARLY_PFN_TO_NID
	select HAVE_ARCH_JUMP_LABEL
	select CPU_NO_EFFICIENT_FFS if !HAVE_MARCH_Z9_109_FEATURES
	select HAVE_ARCH_SECCOMP_FILTER
@@ -506,6 +506,21 @@ source kernel/Kconfig.preempt

source kernel/Kconfig.hz

config ARCH_RANDOM
	def_bool y
	prompt "s390 architectural random number generation API"
	help
	  Enable the s390 architectural random number generation API
	  to provide random data for all consumers within the Linux
	  kernel.

	  When enabled the arch_random_* functions declared in linux/random.h
	  are implemented. The implementation is based on the s390 CPACF
	  instruction subfunction TRNG which provides a real true random
	  number generator.

	  If unsure, say Y.

endmenu

menu "Memory setup"
@@ -536,6 +551,16 @@ config FORCE_MAX_ZONEORDER

source "mm/Kconfig"

config MAX_PHYSMEM_BITS
	int "Maximum size of supported physical memory in bits (42-53)"
	range 42 53
	default "46"
	help
	  This option specifies the maximum supported size of physical memory
	  in bits. Supported is any size between 2^42 (4TB) and 2^53 (8PB).
	  Increasing the number of bits also increases the kernel image size.
	  By default 46 bits (64TB) are supported.

config PACK_STACK
	def_bool y
	prompt "Pack kernel stack"
@@ -613,7 +638,7 @@ if PCI
config PCI_NR_FUNCTIONS
	int "Maximum number of PCI functions (1-4096)"
	range 1 4096
-	default "64"
+	default "128"
	help
	  This allows you to specify the maximum number of PCI functions which
	  this kernel will support.
@@ -671,6 +696,16 @@ config EADM_SCH
	  To compile this driver as a module, choose M here: the
	  module will be called eadm_sch.

config VFIO_CCW
	def_tristate n
	prompt "Support for VFIO-CCW subchannels"
	depends on S390_CCW_IOMMU && VFIO_MDEV
	help
	  This driver allows usage of I/O subchannels via VFIO-CCW.

	  To compile this driver as a module, choose M here: the
	  module will be called vfio_ccw.

endmenu

menu "Dump support"