
Commit fe489bf4 authored by Linus Torvalds
Pull KVM fixes from Paolo Bonzini:
 "On the x86 side, there are some optimizations and documentation
  updates.  The big ARM/KVM change for 3.11, support for AArch64, will
  come through Catalin Marinas's tree.  s390 and PPC have misc cleanups
  and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
  KVM: PPC: Ignore PIR writes
  KVM: PPC: Book3S PR: Invalidate SLB entries properly
  KVM: PPC: Book3S PR: Allow guest to use 1TB segments
  KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
  KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
  KVM: PPC: Book3S PR: Fix proto-VSID calculations
  KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
  KVM: Fix RTC interrupt coalescing tracking
  kvm: Add a tracepoint write_tsc_offset
  KVM: MMU: Inform users of mmio generation wraparound
  KVM: MMU: document fast invalidate all mmio sptes
  KVM: MMU: document fast invalidate all pages
  KVM: MMU: document fast page fault
  KVM: MMU: document mmio page fault
  KVM: MMU: document write_flooding_count
  KVM: MMU: document clear_spte_count
  KVM: MMU: drop kvm_mmu_zap_mmio_sptes
  KVM: MMU: init kvm generation close to mmio wrap-around value
  KVM: MMU: add tracepoint for check_mmio_spte
  KVM: MMU: fast invalidate all mmio sptes
  ...
parents 3e34131a a3ff5fbc
+4 −4
@@ -2278,7 +2278,7 @@ return indicates the attribute is implemented. It does not necessarily
indicate that the attribute can be read or written in the device's
current state.  "addr" is ignored.

-4.77 KVM_ARM_VCPU_INIT
+4.82 KVM_ARM_VCPU_INIT

Capability: basic
Architectures: arm, arm64
@@ -2304,7 +2304,7 @@ Possible features:
	  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).


-4.78 KVM_GET_REG_LIST
+4.83 KVM_GET_REG_LIST

Capability: basic
Architectures: arm, arm64
@@ -2324,7 +2324,7 @@ This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.


-4.80 KVM_ARM_SET_DEVICE_ADDR
+4.84 KVM_ARM_SET_DEVICE_ADDR

Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
Architectures: arm, arm64
@@ -2362,7 +2362,7 @@ must be called after calling KVM_CREATE_IRQCHIP, but before calling
KVM_RUN on any of the VCPUs.  Calling this ioctl twice for any of the
base addresses will return -EEXIST.

-4.82 KVM_PPC_RTAS_DEFINE_TOKEN
+4.85 KVM_PPC_RTAS_DEFINE_TOKEN

Capability: KVM_CAP_PPC_RTAS
Architectures: ppc
+83 −8
@@ -191,12 +191,12 @@ Shadow pages contain the following information:
    A counter keeping track of how many hardware registers (guest cr3 or
    pdptrs) are now pointing at the page.  While this counter is nonzero, the
    page cannot be destroyed.  See role.invalid.
-  multimapped:
-    Whether there exist multiple sptes pointing at this page.
-  parent_pte/parent_ptes:
-    If multimapped is zero, parent_pte points at the single spte that points at
-    this page's spt.  Otherwise, parent_ptes points at a data structure
-    with a list of parent_ptes.
+  parent_ptes:
+    The reverse mapping for the pte/ptes pointing at this page's spt. If
+    parent_ptes bit 0 is zero, only one spte points at this page and
+    parent_ptes points at this single spte, otherwise, there exist multiple
+    sptes pointing at this page and (parent_ptes & ~0x1) points at a data
+    structure with a list of parent_ptes.
  unsync:
    If true, then the translations in this page may not match the guest's
    translation.  This is equivalent to the state of the tlb when a pte is
@@ -210,6 +210,24 @@ Shadow pages contain the following information:
    A bitmap indicating which sptes in spt point (directly or indirectly) at
    pages that may be unsynchronized.  Used to quickly locate all unsynchronized
    pages reachable from a given page.
  mmu_valid_gen:
    Generation number of the page.  It is compared with kvm->arch.mmu_valid_gen
    during hash table lookup, and used to skip invalidated shadow pages (see
    "Zapping all pages" below.)
  clear_spte_count:
    Only present on 32-bit hosts, where a 64-bit spte cannot be written
    atomically.  The reader uses this while running out of the MMU lock
    to detect in-progress updates and retry them until the writer has
    finished the write.
  write_flooding_count:
    A guest may write to a page table many times, causing a lot of
    emulations if the page needs to be write-protected (see "Synchronized
    and unsynchronized pages" below).  Leaf pages can be unsynchronized
    so that they do not trigger frequent emulation, but this is not
    possible for non-leafs.  This field counts the number of emulations
    since the last time the page table was actually used; if emulation
    is triggered too frequently on this page, KVM will unmap the page
    to avoid emulation in the future.
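
The clear_spte_count mechanism above is essentially a sequence-count read
loop.  A minimal, illustrative sketch (the struct and helper names here are
hypothetical, not the kernel's actual code, and a seqlock-style odd/even
counter is used for simplicity):

```c
#include <stdint.h>

/* Hypothetical shadow-page state on a 32-bit host: the 64-bit spte is
 * stored as two 32-bit halves, and clear_spte_count is bumped by the
 * writer before and after each non-atomic update. */
struct sp_example {
	volatile uint32_t spte_low;
	volatile uint32_t spte_high;
	volatile uint32_t clear_spte_count;
};

/* Lock-free read: retry while a write is in progress (odd count) or a
 * write completed between the two count samples. */
static uint64_t read_spte(const struct sp_example *sp)
{
	uint32_t count;
	uint64_t val;

	do {
		count = sp->clear_spte_count;
		val = ((uint64_t)sp->spte_high << 32) | sp->spte_low;
	} while (count != sp->clear_spte_count || (count & 1));

	return val;
}

static void write_spte(struct sp_example *sp, uint64_t val)
{
	sp->clear_spte_count++;		/* mark update in progress */
	sp->spte_low = (uint32_t)val;
	sp->spte_high = (uint32_t)(val >> 32);
	sp->clear_spte_count++;		/* mark update complete */
}
```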

Reverse map
===========
@@ -258,14 +276,26 @@ This is the most complicated event. The cause of a page fault can be:

Handling a page fault is performed as follows:

 - if the RSV bit of the error code is set, the page fault is caused by guest
   accessing MMIO and cached MMIO information is available.
   - walk shadow page table
   - check for valid generation number in the spte (see "Fast invalidation of
     MMIO sptes" below)
   - cache the information to vcpu->arch.mmio_gva, vcpu->arch.access and
     vcpu->arch.mmio_gfn, and call the emulator
 - If both P bit and R/W bit of error code are set, this could possibly
   be handled as a "fast page fault" (fixed without taking the MMU lock).  See
   the description in Documentation/virtual/kvm/locking.txt.
 - if needed, walk the guest page tables to determine the guest translation
   (gva->gpa or ngpa->gpa)
   - if permissions are insufficient, reflect the fault back to the guest
 - determine the host page
-   - if this is an mmio request, there is no host page; call the emulator
-     to emulate the instruction instead
+   - if this is an mmio request, there is no host page; cache the info to
+     vcpu->arch.mmio_gva, vcpu->arch.access and vcpu->arch.mmio_gfn
 - walk the shadow page table to find the spte for the translation,
   instantiating missing intermediate page tables as necessary
   - If this is an mmio request, cache the mmio info to the spte and set some
     reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
 - try to unsynchronize the page
   - if successful, we can let the guest continue and modify the gpte
 - emulate the instruction
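
The fast MMIO branch at the top of that list can be sketched as follows.
All names here are illustrative stand-ins (only the vcpu->arch field names
and the reserved-bit error-code flag come from the text above):

```c
#include <stdbool.h>
#include <stdint.h>

#define PFERR_RSVD_MASK (1u << 3)	/* reserved-bit set in the error code */

/* Hypothetical cache of MMIO info on the vcpu, mirroring the
 * vcpu->arch members named in the text. */
struct vcpu_ex {
	uint64_t mmio_gva;
	uint64_t mmio_gfn;
	uint32_t access;
};

/* Fast path: a reserved-bit fault whose spte carries the current
 * generation means the cached MMIO info can go straight to the emulator. */
static bool handle_fast_mmio(struct vcpu_ex *vcpu, uint32_t error_code,
			     uint32_t spte_gen, uint32_t global_gen,
			     uint64_t gva, uint64_t gfn, uint32_t access)
{
	if (!(error_code & PFERR_RSVD_MASK))
		return false;		/* not an MMIO-cached fault */
	if (spte_gen != global_gen)
		return false;		/* stale spte: take the slow path */
	vcpu->mmio_gva = gva;		/* cache info for the emulator */
	vcpu->mmio_gfn = gfn;
	vcpu->access = access;
	return true;			/* caller invokes the emulator */
}
```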
@@ -351,6 +381,51 @@ causes its write_count to be incremented, thus preventing instantiation of
a large spte.  The frames at the end of an unaligned memory slot have
artificially inflated ->write_counts so they can never be instantiated.

Zapping all pages (page generation count)
=========================================

For large memory guests, walking and zapping all pages is really slow
(because there are a lot of pages), and also blocks memory accesses of
all VCPUs because it needs to hold the MMU lock.

To make this more scalable, kvm maintains a global generation number
which is stored in kvm->arch.mmu_valid_gen.  Every shadow page stores
the current global generation-number into sp->mmu_valid_gen when it
is created.  Pages with a mismatching generation number are "obsolete".

When KVM needs to zap all shadow page sptes, it simply increases the global
generation number and then reloads the root shadow pages on all vcpus.  As the VCPUs
create new shadow page tables, the old pages are not used because of the
mismatching generation number.

KVM then walks through all pages and zaps obsolete pages.  While the zap
operation needs to take the MMU lock, the lock can be released periodically
so that the VCPUs can make progress.
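
The core of the scheme is just a generation comparison; a sketch with
illustrative stand-in types (the real kvm and shadow-page structures are
far richer than this):

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel structures referenced above. */
struct kvm_arch_ex { unsigned long mmu_valid_gen; };
struct kvm_mmu_page_ex { unsigned long mmu_valid_gen; };

/* A shadow page copies the global generation when it is created. */
static void sp_create(struct kvm_mmu_page_ex *sp, const struct kvm_arch_ex *arch)
{
	sp->mmu_valid_gen = arch->mmu_valid_gen;
}

/* Pages with a mismatching generation are "obsolete": skipped on hash
 * table lookup and zapped lazily under the MMU lock. */
static bool sp_is_obsolete(const struct kvm_mmu_page_ex *sp,
			   const struct kvm_arch_ex *arch)
{
	return sp->mmu_valid_gen != arch->mmu_valid_gen;
}

/* "Zapping all pages" then begins with a single increment. */
static void kvm_invalidate_all(struct kvm_arch_ex *arch)
{
	arch->mmu_valid_gen++;
}
```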

Fast invalidation of MMIO sptes
===============================

As mentioned in "Reaction to events" above, kvm will cache MMIO
information in leaf sptes.  When a new memslot is added or an existing
memslot is changed, this information may become stale and needs to be
invalidated.  This also needs to hold the MMU lock while walking all
shadow pages, and is made more scalable with a similar technique.

MMIO sptes have a few spare bits, which are used to store a
generation number.  The global generation number is stored in
kvm_memslots(kvm)->generation, and increased whenever guest memory info
changes.  This generation number is distinct from the one described in
the previous section.

When KVM finds an MMIO spte, it checks the generation number of the spte.
If the generation number of the spte does not equal the global generation
number, it will ignore the cached MMIO information and handle the page
fault through the slow path.

Since only 19 bits are used to store the generation number in an mmio spte,
all pages are zapped when the number overflows.
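
A sketch of the truncated generation check (macro and function names here
are illustrative, not the kernel's; only the 19-bit width comes from the
text above):

```c
#include <stdbool.h>
#include <stdint.h>

/* Only 19 spare bits of the spte hold the generation, so the stored
 * value wraps modulo 2^19. */
#define MMIO_GEN_BITS 19u
#define MMIO_GEN_MASK ((1u << MMIO_GEN_BITS) - 1u)

static uint32_t mmio_spte_gen(uint64_t memslots_generation)
{
	return (uint32_t)(memslots_generation & MMIO_GEN_MASK);
}

/* A cached MMIO spte is usable only while its stored generation still
 * matches the truncated global one; otherwise take the slow path. */
static bool mmio_spte_is_stale(uint32_t stored_gen, uint64_t memslots_generation)
{
	return stored_gen != mmio_spte_gen(memslots_generation);
}
```

Note the last case in the test: after 2^19 increments the truncated value
wraps back and a stale spte would compare equal again, which is why all
pages are zapped before the overflow can occur.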


Further reading
===============

+2 −2
@@ -4733,10 +4733,10 @@ F: arch/s390/kvm/
F:	drivers/s390/kvm/

KERNEL VIRTUAL MACHINE (KVM) FOR ARM
-M:	Christoffer Dall <cdall@cs.columbia.edu>
+M:	Christoffer Dall <christoffer.dall@linaro.org>
L:	kvmarm@lists.cs.columbia.edu
W:	http://systems.cs.columbia.edu/projects/kvm-arm
-S:	Maintained
+S:	Supported
F:	arch/arm/include/uapi/asm/kvm*
F:	arch/arm/include/asm/kvm*
F:	arch/arm/kvm/
+0 −1
@@ -135,7 +135,6 @@
#define KVM_PHYS_MASK	(KVM_PHYS_SIZE - 1ULL)
#define PTRS_PER_S2_PGD	(1ULL << (KVM_PHYS_SHIFT - 30))
#define S2_PGD_ORDER	get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
-#define S2_PGD_SIZE	(1 << S2_PGD_ORDER)

/* Virtualization Translation Control Register (VTCR) bits */
#define VTCR_SH0	(3 << 12)
+12 −12
@@ -37,16 +37,18 @@
#define c5_AIFSR	15	/* Auxiliary Instruction Fault Status R */
#define c6_DFAR		16	/* Data Fault Address Register */
#define c6_IFAR		17	/* Instruction Fault Address Register */
-#define c9_L2CTLR	18	/* Cortex A15 L2 Control Register */
-#define c10_PRRR	19	/* Primary Region Remap Register */
-#define c10_NMRR	20	/* Normal Memory Remap Register */
-#define c12_VBAR	21	/* Vector Base Address Register */
-#define c13_CID		22	/* Context ID Register */
-#define c13_TID_URW	23	/* Thread ID, User R/W */
-#define c13_TID_URO	24	/* Thread ID, User R/O */
-#define c13_TID_PRIV	25	/* Thread ID, Privileged */
-#define c14_CNTKCTL	26	/* Timer Control Register (PL1) */
-#define NR_CP15_REGS	27	/* Number of regs (incl. invalid) */
+#define c7_PAR		18	/* Physical Address Register */
+#define c7_PAR_high	19	/* PAR top 32 bits */
+#define c9_L2CTLR	20	/* Cortex A15 L2 Control Register */
+#define c10_PRRR	21	/* Primary Region Remap Register */
+#define c10_NMRR	22	/* Normal Memory Remap Register */
+#define c12_VBAR	23	/* Vector Base Address Register */
+#define c13_CID		24	/* Context ID Register */
+#define c13_TID_URW	25	/* Thread ID, User R/W */
+#define c13_TID_URO	26	/* Thread ID, User R/O */
+#define c13_TID_PRIV	27	/* Thread ID, Privileged */
+#define c14_CNTKCTL	28	/* Timer Control Register (PL1) */
+#define NR_CP15_REGS	29	/* Number of regs (incl. invalid) */

#define ARM_EXCEPTION_RESET	  0
#define ARM_EXCEPTION_UNDEFINED   1
@@ -72,8 +74,6 @@ extern char __kvm_hyp_vector[];
extern char __kvm_hyp_code_start[];
extern char __kvm_hyp_code_end[];

-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
-
extern void __kvm_flush_vm_context(void);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
