
Commit fe489bf4 authored by Linus Torvalds
Pull KVM fixes from Paolo Bonzini:
 "On the x86 side, there are some optimizations and documentation
  updates.  The big ARM/KVM change for 3.11, support for AArch64, will
  come through Catalin Marinas's tree.  s390 and PPC have misc cleanups
  and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
  KVM: PPC: Ignore PIR writes
  KVM: PPC: Book3S PR: Invalidate SLB entries properly
  KVM: PPC: Book3S PR: Allow guest to use 1TB segments
  KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
  KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
  KVM: PPC: Book3S PR: Fix proto-VSID calculations
  KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
  KVM: Fix RTC interrupt coalescing tracking
  kvm: Add a tracepoint write_tsc_offset
  KVM: MMU: Inform users of mmio generation wraparound
  KVM: MMU: document fast invalidate all mmio sptes
  KVM: MMU: document fast invalidate all pages
  KVM: MMU: document fast page fault
  KVM: MMU: document mmio page fault
  KVM: MMU: document write_flooding_count
  KVM: MMU: document clear_spte_count
  KVM: MMU: drop kvm_mmu_zap_mmio_sptes
  KVM: MMU: init kvm generation close to mmio wrap-around value
  KVM: MMU: add tracepoint for check_mmio_spte
  KVM: MMU: fast invalidate all mmio sptes
  ...
parents 3e34131a a3ff5fbc
+4 −4
@@ -2278,7 +2278,7 @@ return indicates the attribute is implemented. It does not necessarily
indicate that the attribute can be read or written in the device's
current state.  "addr" is ignored.

-4.77 KVM_ARM_VCPU_INIT
+4.82 KVM_ARM_VCPU_INIT

Capability: basic
Architectures: arm, arm64
@@ -2304,7 +2304,7 @@ Possible features:
	  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).


-4.78 KVM_GET_REG_LIST
+4.83 KVM_GET_REG_LIST

Capability: basic
Architectures: arm, arm64
@@ -2324,7 +2324,7 @@ This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.


-4.80 KVM_ARM_SET_DEVICE_ADDR
+4.84 KVM_ARM_SET_DEVICE_ADDR

Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
Architectures: arm, arm64
@@ -2362,7 +2362,7 @@ must be called after calling KVM_CREATE_IRQCHIP, but before calling
KVM_RUN on any of the VCPUs.  Calling this ioctl twice for any of the
base addresses will return -EEXIST.

-4.82 KVM_PPC_RTAS_DEFINE_TOKEN
+4.85 KVM_PPC_RTAS_DEFINE_TOKEN

Capability: KVM_CAP_PPC_RTAS
Architectures: ppc
+83 −8
@@ -191,12 +191,12 @@ Shadow pages contain the following information:
    A counter keeping track of how many hardware registers (guest cr3 or
    pdptrs) are now pointing at the page.  While this counter is nonzero, the
    page cannot be destroyed.  See role.invalid.
-  multimapped:
-    Whether there exist multiple sptes pointing at this page.
-  parent_pte/parent_ptes:
-    If multimapped is zero, parent_pte points at the single spte that points at
-    this page's spt.  Otherwise, parent_ptes points at a data structure
-    with a list of parent_ptes.
+  parent_ptes:
+    The reverse mapping for the pte/ptes pointing at this page's spt. If
+    parent_ptes bit 0 is zero, only one spte points at this page and
+    parent_ptes points at this single spte, otherwise, there exist multiple
+    sptes pointing at this page and (parent_ptes & ~0x1) points at a data
+    structure with a list of parent_ptes.
  unsync:
    If true, then the translations in this page may not match the guest's
    translation.  This is equivalent to the state of the tlb when a pte is
@@ -210,6 +210,24 @@ Shadow pages contain the following information:
    A bitmap indicating which sptes in spt point (directly or indirectly) at
    pages that may be unsynchronized.  Used to quickly locate all unsynchronized
    pages reachable from a given page.
  mmu_valid_gen:
    Generation number of the page.  It is compared with kvm->arch.mmu_valid_gen
    during hash table lookup, and used to skip invalidated shadow pages (see
    "Zapping all pages" below.)
  clear_spte_count:
    Only present on 32-bit hosts, where a 64-bit spte cannot be written
    atomically.  The reader uses this while running out of the MMU lock
    to detect in-progress updates and retry them until the writer has
    finished the write.
  write_flooding_count:
    A guest may write to a page table many times, causing a lot of
    emulations if the page needs to be write-protected (see "Synchronized
    and unsynchronized pages" below).  Leaf pages can be unsynchronized
    so that they do not trigger frequent emulation, but this is not
    possible for non-leafs.  This field counts the number of emulations
    since the last time the page table was actually used; if emulation
    is triggered too frequently on this page, KVM will unmap the page
    to avoid emulation in the future.
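
The clear_spte_count mechanism above is essentially a sequence-count read
loop.  A minimal, illustrative sketch (the struct and helper names here are
hypothetical, not the kernel's actual code, and a seqlock-style odd/even
counter is used for simplicity):

```c
#include <stdint.h>

/* Hypothetical shadow-page state on a 32-bit host: the 64-bit spte is
 * stored as two 32-bit halves, and clear_spte_count is bumped by the
 * writer before and after each non-atomic update. */
struct sp_example {
	volatile uint32_t spte_low;
	volatile uint32_t spte_high;
	volatile uint32_t clear_spte_count;
};

/* Lock-free read: retry while a write is in progress (odd count) or a
 * write completed between the two count samples. */
static uint64_t read_spte(const struct sp_example *sp)
{
	uint32_t count;
	uint64_t val;

	do {
		count = sp->clear_spte_count;
		val = ((uint64_t)sp->spte_high << 32) | sp->spte_low;
	} while (count != sp->clear_spte_count || (count & 1));

	return val;
}

static void write_spte(struct sp_example *sp, uint64_t val)
{
	sp->clear_spte_count++;		/* mark update in progress */
	sp->spte_low = (uint32_t)val;
	sp->spte_high = (uint32_t)(val >> 32);
	sp->clear_spte_count++;		/* mark update complete */
}
```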

Reverse map
===========
@@ -258,14 +276,26 @@ This is the most complicated event. The cause of a page fault can be:

Handling a page fault is performed as follows:

 - if the RSV bit of the error code is set, the page fault is caused by guest
   accessing MMIO and cached MMIO information is available.
   - walk shadow page table
   - check for valid generation number in the spte (see "Fast invalidation of
     MMIO sptes" below)
   - cache the information to vcpu->arch.mmio_gva, vcpu->arch.access and
     vcpu->arch.mmio_gfn, and call the emulator
 - If both P bit and R/W bit of error code are set, this could possibly
   be handled as a "fast page fault" (fixed without taking the MMU lock).  See
   the description in Documentation/virtual/kvm/locking.txt.
 - if needed, walk the guest page tables to determine the guest translation
   (gva->gpa or ngpa->gpa)
   - if permissions are insufficient, reflect the fault back to the guest
 - determine the host page
-   - if this is an mmio request, there is no host page; call the emulator
-     to emulate the instruction instead
+   - if this is an mmio request, there is no host page; cache the info to
+     vcpu->arch.mmio_gva, vcpu->arch.access and vcpu->arch.mmio_gfn
 - walk the shadow page table to find the spte for the translation,
   instantiating missing intermediate page tables as necessary
   - If this is an mmio request, cache the mmio info to the spte and set some
     reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
 - try to unsynchronize the page
   - if successful, we can let the guest continue and modify the gpte
 - emulate the instruction
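
The fast MMIO branch at the top of that list can be sketched as follows.
All names here are illustrative stand-ins (only the vcpu->arch field names
and the reserved-bit error-code flag come from the text above):

```c
#include <stdbool.h>
#include <stdint.h>

#define PFERR_RSVD_MASK (1u << 3)	/* reserved-bit set in the error code */

/* Hypothetical cache of MMIO info on the vcpu, mirroring the
 * vcpu->arch members named in the text. */
struct vcpu_ex {
	uint64_t mmio_gva;
	uint64_t mmio_gfn;
	uint32_t access;
};

/* Fast path: a reserved-bit fault whose spte carries the current
 * generation means the cached MMIO info can go straight to the emulator. */
static bool handle_fast_mmio(struct vcpu_ex *vcpu, uint32_t error_code,
			     uint32_t spte_gen, uint32_t global_gen,
			     uint64_t gva, uint64_t gfn, uint32_t access)
{
	if (!(error_code & PFERR_RSVD_MASK))
		return false;		/* not an MMIO-cached fault */
	if (spte_gen != global_gen)
		return false;		/* stale spte: take the slow path */
	vcpu->mmio_gva = gva;		/* cache info for the emulator */
	vcpu->mmio_gfn = gfn;
	vcpu->access = access;
	return true;			/* caller invokes the emulator */
}
```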
@@ -351,6 +381,51 @@ causes its write_count to be incremented, thus preventing instantiation of
a large spte.  The frames at the end of an unaligned memory slot have
artificially inflated ->write_counts so they can never be instantiated.

Zapping all pages (page generation count)
=========================================

For large memory guests, walking and zapping all pages is really slow
(because there are a lot of pages), and also blocks memory accesses of
all VCPUs because it needs to hold the MMU lock.

To make this more scalable, kvm maintains a global generation number
which is stored in kvm->arch.mmu_valid_gen.  Every shadow page stores
the current global generation-number into sp->mmu_valid_gen when it
is created.  Pages with a mismatching generation number are "obsolete".

When KVM needs to zap all shadow page sptes, it simply increases the global
generation number and then reloads the root shadow pages on all vcpus.  As the VCPUs
create new shadow page tables, the old pages are not used because of the
mismatching generation number.

KVM then walks through all pages and zaps obsolete pages.  While the zap
operation needs to take the MMU lock, the lock can be released periodically
so that the VCPUs can make progress.
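
The core of the scheme is just a generation comparison; a sketch with
illustrative stand-in types (the real kvm and shadow-page structures are
far richer than this):

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel structures referenced above. */
struct kvm_arch_ex { unsigned long mmu_valid_gen; };
struct kvm_mmu_page_ex { unsigned long mmu_valid_gen; };

/* A shadow page copies the global generation when it is created. */
static void sp_create(struct kvm_mmu_page_ex *sp, const struct kvm_arch_ex *arch)
{
	sp->mmu_valid_gen = arch->mmu_valid_gen;
}

/* Pages with a mismatching generation are "obsolete": skipped on hash
 * table lookup and zapped lazily under the MMU lock. */
static bool sp_is_obsolete(const struct kvm_mmu_page_ex *sp,
			   const struct kvm_arch_ex *arch)
{
	return sp->mmu_valid_gen != arch->mmu_valid_gen;
}

/* "Zapping all pages" then begins with a single increment. */
static void kvm_invalidate_all(struct kvm_arch_ex *arch)
{
	arch->mmu_valid_gen++;
}
```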

Fast invalidation of MMIO sptes
===============================

As mentioned in "Reaction to events" above, kvm will cache MMIO
information in leaf sptes.  When a new memslot is added or an existing
memslot is changed, this information may become stale and needs to be
invalidated.  This also needs to hold the MMU lock while walking all
shadow pages, and is made more scalable with a similar technique.

MMIO sptes have a few spare bits, which are used to store a
generation number.  The global generation number is stored in
kvm_memslots(kvm)->generation, and increased whenever guest memory info
changes.  This generation number is distinct from the one described in
the previous section.

When KVM finds an MMIO spte, it checks the generation number of the spte.
If the generation number of the spte does not equal the global generation
number, it will ignore the cached MMIO information and handle the page
fault through the slow path.

Since only 19 bits are used to store the generation number in an mmio spte,
all pages are zapped when the number overflows.
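
A sketch of the truncated generation check (macro and function names here
are illustrative, not the kernel's; only the 19-bit width comes from the
text above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Only 19 spare bits of the spte hold the generation, so the stored
 * value wraps modulo 2^19. */
#define MMIO_GEN_BITS 19u
#define MMIO_GEN_MASK ((1u << MMIO_GEN_BITS) - 1u)

static uint32_t mmio_spte_gen(uint64_t memslots_generation)
{
	return (uint32_t)(memslots_generation & MMIO_GEN_MASK);
}

/* A cached MMIO spte is usable only while its stored generation still
 * matches the truncated global one; otherwise take the slow path. */
static bool mmio_spte_is_stale(uint32_t stored_gen, uint64_t memslots_generation)
{
	return stored_gen != mmio_spte_gen(memslots_generation);
}
```

Note the last case in the test: after 2^19 increments the truncated value
wraps back and a stale spte would compare equal again, which is why all
pages are zapped before the overflow can occur.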


Further reading
===============

+2 −2
@@ -4733,10 +4733,10 @@ F: arch/s390/kvm/
F:	drivers/s390/kvm/

KERNEL VIRTUAL MACHINE (KVM) FOR ARM
-M:	Christoffer Dall <cdall@cs.columbia.edu>
+M:	Christoffer Dall <christoffer.dall@linaro.org>
L:	kvmarm@lists.cs.columbia.edu
W:	http://systems.cs.columbia.edu/projects/kvm-arm
-S:	Maintained
+S:	Supported
F:	arch/arm/include/uapi/asm/kvm*
F:	arch/arm/include/asm/kvm*
F:	arch/arm/kvm/
+0 −1
@@ -135,7 +135,6 @@
#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1ULL)
#define PTRS_PER_S2_PGD	(1ULL << (KVM_PHYS_SHIFT - 30))
#define S2_PGD_ORDER	get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
-#define S2_PGD_SIZE	(1 << S2_PGD_ORDER)

/* Virtualization Translation Control Register (VTCR) bits */
#define VTCR_SH0	(3 << 12)
+12 −12
@@ -37,16 +37,18 @@
#define c5_AIFSR	15	/* Auxiliary Instruction Fault Status R */
#define c6_DFAR		16	/* Data Fault Address Register */
#define c6_IFAR		17	/* Instruction Fault Address Register */
-#define c9_L2CTLR	18	/* Cortex A15 L2 Control Register */
-#define c10_PRRR	19	/* Primary Region Remap Register */
-#define c10_NMRR	20	/* Normal Memory Remap Register */
-#define c12_VBAR	21	/* Vector Base Address Register */
-#define c13_CID		22	/* Context ID Register */
-#define c13_TID_URW	23	/* Thread ID, User R/W */
-#define c13_TID_URO	24	/* Thread ID, User R/O */
-#define c13_TID_PRIV	25	/* Thread ID, Privileged */
-#define c14_CNTKCTL	26	/* Timer Control Register (PL1) */
-#define NR_CP15_REGS	27	/* Number of regs (incl. invalid) */
+#define c7_PAR		18	/* Physical Address Register */
+#define c7_PAR_high	19	/* PAR top 32 bits */
+#define c9_L2CTLR	20	/* Cortex A15 L2 Control Register */
+#define c10_PRRR	21	/* Primary Region Remap Register */
+#define c10_NMRR	22	/* Normal Memory Remap Register */
+#define c12_VBAR	23	/* Vector Base Address Register */
+#define c13_CID		24	/* Context ID Register */
+#define c13_TID_URW	25	/* Thread ID, User R/W */
+#define c13_TID_URO	26	/* Thread ID, User R/O */
+#define c13_TID_PRIV	27	/* Thread ID, Privileged */
+#define c14_CNTKCTL	28	/* Timer Control Register (PL1) */
+#define NR_CP15_REGS	29	/* Number of regs (incl. invalid) */

#define ARM_EXCEPTION_RESET	  0
#define ARM_EXCEPTION_UNDEFINED   1
@@ -72,8 +74,6 @@ extern char __kvm_hyp_vector[];
extern char __kvm_hyp_code_start[];
extern char __kvm_hyp_code_end[];

-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
-
extern void __kvm_flush_vm_context(void);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
