Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 66bb0aa0 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull second round of KVM changes from Paolo Bonzini:
 "Here are the PPC and ARM changes for KVM, which I separated because
  they had small conflicts (respectively within KVM documentation, and
  with 3.16-rc changes).  Since they were all within the subsystem, I
  took care of them.

  Stephen Rothwell reported some snags in PPC builds, but they are all
  fixed now; the latest linux-next report was clean.

  New features for ARM include:
   - KVM VGIC v2 emulation on GICv3 hardware
   - Big-Endian support for arm/arm64 (guest and host)
   - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)

  And for PPC:
   - Book3S: Good number of LE host fixes, enable HV on LE
   - Book3S HV: Add in-guest debug support

  This release drops support for KVM on the PPC440.  As a result, the
  PPC merge removes more lines than it adds.  :)

  I also included an x86 change, since Davidlohr tied it to an
  independent bug report and the reporter quickly provided a Tested-by;
  there was no reason to wait for -rc2"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
  KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
  KVM: PPC: Enable IRQFD support for the XICS interrupt controller
  KVM: Give IRQFD its own separate enabling Kconfig option
  KVM: Move irq notifier implementation into eventfd.c
  KVM: Move all accesses to kvm::irq_routing into irqchip.c
  KVM: irqchip: Provide and use accessors for irq routing table
  KVM: Don't keep reference to irq routing table in irqfd struct
  KVM: PPC: drop duplicate tracepoint
  arm64: KVM: fix 64bit CP15 VM access for 32bit guests
  KVM: arm64: GICv3: mandate page-aligned GICV region
  arm64: KVM: GICv3: move system register access to msr_s/mrs_s
  KVM: PPC: PR: Handle FSCR feature deselects
  KVM: PPC: HV: Remove generic instruction emulation
  KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
  KVM: PPC: Remove DCR handling
  KVM: PPC: Expose helper functions for data/inst faults
  KVM: PPC: Separate loadstore emulation from priv emulation
  KVM: PPC: Handle magic page in kvmppc_ld/st
  ...
parents e306e3be c77dcacb
Loading
Loading
Loading
Loading
+8 −0
Original line number Diff line number Diff line
@@ -168,6 +168,14 @@ Before jumping into the kernel, the following conditions must be met:
  the kernel image will be entered must be initialised by software at a
  higher exception level to prevent execution in an UNKNOWN state.

  For systems with a GICv3 interrupt controller:
  - If EL3 is present:
    ICC_SRE_EL3.Enable (bit 3) must be initialiased to 0b1.
    ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
  - If the kernel is entered at EL1:
    ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1
    ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs.  All CPUs must
enter the kernel in the same exception level.
+79 −0
Original line number Diff line number Diff line
* ARM Generic Interrupt Controller, version 3

AArch64 SMP cores are often associated with a GICv3, providing Private
Peripheral Interrupts (PPI), Shared Peripheral Interrupts (SPI),
Software Generated Interrupts (SGI), and Locality-specific Peripheral
Interrupts (LPI).

Main node required properties:

- compatible : should at least contain  "arm,gic-v3".
- interrupt-controller : Identifies the node as an interrupt controller
- #interrupt-cells : Specifies the number of cells needed to encode an
  interrupt source. Must be a single cell with a value of at least 3.

  The 1st cell is the interrupt type; 0 for SPI interrupts, 1 for PPI
  interrupts. Other values are reserved for future use.

  The 2nd cell contains the interrupt number for the interrupt type.
  SPI interrupts are in the range [0-987]. PPI interrupts are in the
  range [0-15].

  The 3rd cell is the flags, encoded as follows:
	bits[3:0] trigger type and level flags.
		1 = edge triggered
		4 = level triggered

  Cells 4 and beyond are reserved for future use. When the 1st cell
  has a value of 0 or 1, cells 4 and beyond act as padding, and may be
  ignored. It is recommended that padding cells have a value of 0.

- reg : Specifies base physical address(s) and size of the GIC
  registers, in the following order:
  - GIC Distributor interface (GICD)
  - GIC Redistributors (GICR), one range per redistributor region
  - GIC CPU interface (GICC)
  - GIC Hypervisor interface (GICH)
  - GIC Virtual CPU interface (GICV)

  GICC, GICH and GICV are optional.

- interrupts : Interrupt source of the VGIC maintenance interrupt.

Optional

- redistributor-stride : If using padding pages, specifies the stride
  of consecutive redistributors. Must be a multiple of 64kB.

- #redistributor-regions: The number of independent contiguous regions
  occupied by the redistributors. Required if more than one such
  region is present.

Examples:

	gic: interrupt-controller@2cf00000 {
		compatible = "arm,gic-v3";
		#interrupt-cells = <3>;
		interrupt-controller;
		reg = <0x0 0x2f000000 0 0x10000>,	// GICD
		      <0x0 0x2f100000 0 0x200000>,	// GICR
		      <0x0 0x2c000000 0 0x2000>,	// GICC
		      <0x0 0x2c010000 0 0x2000>,	// GICH
		      <0x0 0x2c020000 0 0x2000>;	// GICV
		interrupts = <1 9 4>;
	};

	gic: interrupt-controller@2c010000 {
		compatible = "arm,gic-v3";
		#interrupt-cells = <3>;
		interrupt-controller;
		redistributor-stride = <0x0 0x40000>;	// 256kB stride
		#redistributor-regions = <2>;
		reg = <0x0 0x2c010000 0 0x10000>,	// GICD
		      <0x0 0x2d000000 0 0x800000>,	// GICR 1: CPUs 0-31
		      <0x0 0x2e000000 0 0x800000>;	// GICR 2: CPUs 32-63
		      <0x0 0x2c040000 0 0x2000>,	// GICC
		      <0x0 0x2c060000 0 0x2000>,	// GICH
		      <0x0 0x2c080000 0 0x2000>;	// GICV
		interrupts = <1 9 4>;
	};
+0 −2
Original line number Diff line number Diff line
@@ -17,8 +17,6 @@ firmware-assisted-dump.txt
	- Documentation on the firmware assisted dump mechanism "fadump".
hvcs.txt
	- IBM "Hypervisor Virtual Console Server" Installation Guide
kvm_440.txt
	- Various notes on the implementation of KVM for PowerPC 440.
mpc52xx.txt
	- Linux 2.6.x on MPC52xx family
pmu-ebb.txt

Documentation/powerpc/kvm_440.txt

deleted100644 → 0
+0 −41
Original line number Diff line number Diff line
Hollis Blanchard <hollisb@us.ibm.com>
15 Apr 2008

Various notes on the implementation of KVM for PowerPC 440:

To enforce isolation, host userspace, guest kernel, and guest userspace all
run at user privilege level. Only the host kernel runs in supervisor mode.
Executing privileged instructions in the guest traps into KVM (in the host
kernel), where we decode and emulate them. Through this technique, unmodified
440 Linux kernels can be run (slowly) as guests. Future performance work will
focus on reducing the overhead and frequency of these traps.

The usual code flow is started from userspace invoking an "run" ioctl, which
causes KVM to switch into guest context. We use IVPR to hijack the host
interrupt vectors while running the guest, which allows us to direct all
interrupts to kvmppc_handle_interrupt(). At this point, we could either
- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or
- let the host interrupt handler run (e.g. when the decrementer fires), or
- return to host userspace (e.g. when the guest performs device MMIO)

Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
address space (in host or guest), which gives us virtual address space to use
for guest mappings. While the guest is running, the host kernel remains mapped
in AS=0, but the guest can only use AS=1 mappings.

TLB entries: The TLB entries covering the host linear mapping remain
present while running the guest. This reduces the overhead of lightweight
exits, which are handled by KVM running in the host kernel. We keep three
copies of the TLB:
 - guest TLB: contents of the TLB as the guest sees it
 - shadow TLB: the TLB that is actually in hardware while guest is running
 - host TLB: to restore TLB state when context switching guest -> host
When a TLB miss occurs because a mapping was not present in the shadow TLB,
but was present in the guest TLB, KVM handles the fault without invoking the
guest. Large guest pages are backed by multiple 4KB shadow pages through this
mechanism.

IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
and block IO, so those drivers must be enabled in the guest. It's possible
that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
little effort.
+52 −8
Original line number Diff line number Diff line
@@ -148,9 +148,9 @@ of banks, as set via the KVM_X86_SETUP_MCE ioctl.

4.4 KVM_CHECK_EXTENSION

Capability: basic
Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
Architectures: all
Type: system ioctl
Type: system ioctl, vm ioctl
Parameters: extension identifier (KVM_CAP_*)
Returns: 0 if unsupported; 1 (or some other positive integer) if supported

@@ -160,6 +160,9 @@ receives an integer that describes the extension availability.
Generally 0 means no and 1 means yes, but some extensions may report
additional information in the integer return value.

Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd)

4.5 KVM_GET_VCPU_MMAP_SIZE

@@ -1892,7 +1895,8 @@ registers, find a list below:
  PPC   | KVM_REG_PPC_PID               | 64
  PPC   | KVM_REG_PPC_ACOP              | 64
  PPC   | KVM_REG_PPC_VRSAVE            | 32
  PPC   | KVM_REG_PPC_LPCR              | 64
  PPC   | KVM_REG_PPC_LPCR              | 32
  PPC   | KVM_REG_PPC_LPCR_64           | 64
  PPC   | KVM_REG_PPC_PPR               | 64
  PPC   | KVM_REG_PPC_ARCH_COMPAT       | 32
  PPC   | KVM_REG_PPC_DABRX             | 32
@@ -2677,8 +2681,8 @@ The 'data' member contains, in its first 'len' bytes, the value as it would
appear if the VCPU performed a load or store of the appropriate width directly
to the byte array.

NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
      KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
      KVM_EXIT_EPR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN.  The kernel side will first finish
incomplete operations and then check for pending signals.  Userspace
@@ -2749,7 +2753,7 @@ Principles of Operation Book in the Chapter for Dynamic Address Translation
			__u8  is_write;
		} dcr;

powerpc specific.
Deprecated - was used for 440 KVM.

		/* KVM_EXIT_OSI */
		struct {
@@ -2931,8 +2935,8 @@ The fields in each entry are defined as follows:
         this function/index combination


6. Capabilities that can be enabled
-----------------------------------
6. Capabilities that can be enabled on vCPUs
--------------------------------------------

There are certain capabilities that change the behavior of the virtual CPU or
the virtual machine when enabled. To enable them, please see section 4.37.
@@ -3091,3 +3095,43 @@ Parameters: none

This capability enables the in-kernel irqchip for s390. Please refer to
"4.24 KVM_CREATE_IRQCHIP" for details.

7. Capabilities that can be enabled on VMs
------------------------------------------

There are certain capabilities that change the behavior of the virtual
machine when enabled. To enable them, please see section 4.37. Below
you can find a list of capabilities and what their effect on the VM
is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.


7.1 KVM_CAP_PPC_ENABLE_HCALL

Architectures: ppc
Parameters: args[0] is the sPAPR hcall number
	    args[1] is 0 to disable, 1 to enable in-kernel handling

This capability controls whether individual sPAPR hypercalls (hcalls)
get handled by the kernel or not.  Enabling or disabling in-kernel
handling of an hcall is effective across the VM.  On creation, an
initial set of hcalls are enabled for in-kernel handling, which
consists of those hcalls for which in-kernel handlers were implemented
before this capability was implemented.  If disabled, the kernel will
not to attempt to handle the hcall, but will always exit to userspace
to handle it.  Note that it may not make sense to enable some and
disable others of a group of related hcalls, but KVM does not prevent
userspace from doing that.

If the hcall number specified is not one that has an in-kernel
implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
error.
Loading