Commit ecefbd94 authored by Linus Torvalds
Pull KVM updates from Avi Kivity:
 "Highlights of the changes for this release include support for vfio
  level triggered interrupts, improved big real mode support on older
  Intels, a streamlined guest page table walker, guest APIC speedups,
  PIO optimizations, better overcommit handling, and read-only memory."

* tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
  KVM: s390: Fix vcpu_load handling in interrupt code
  KVM: x86: Fix guest debug across vcpu INIT reset
  KVM: Add resampling irqfds for level triggered interrupts
  KVM: optimize apic interrupt delivery
  KVM: MMU: Eliminate pointless temporary 'ac'
  KVM: MMU: Avoid access/dirty update loop if all is well
  KVM: MMU: Eliminate eperm temporary
  KVM: MMU: Optimize is_last_gpte()
  KVM: MMU: Simplify walk_addr_generic() loop
  KVM: MMU: Optimize pte permission checks
  KVM: MMU: Update accessed and dirty bits after guest pagetable walk
  KVM: MMU: Move gpte_access() out of paging_tmpl.h
  KVM: MMU: Optimize gpte_access() slightly
  KVM: MMU: Push clean gpte write protection out of gpte_access()
  KVM: clarify kvmclock documentation
  KVM: make processes waiting on vcpu mutex killable
  KVM: SVM: Make use of asm.h
  KVM: VMX: Make use of asm.h
  KVM: VMX: Make lto-friendly
  KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
  ...

Conflicts:
	arch/s390/include/asm/processor.h
	arch/x86/kvm/i8259.c
parents ce57e981 3d11df7a
+25 −8
@@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES  1UL
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)

This ioctl allows the user to create or modify a guest physical memory
slot.  When changing an existing slot, it may be moved in the guest
@@ -873,14 +874,17 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical.  This allows large pages in the guest to be backed by large
pages in the host.

The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
instructs kvm to keep track of writes to memory within the slot.  See
the KVM_GET_DIRTY_LOG ioctl.
The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
KVM_MEM_READONLY.  The former instructs kvm to keep track of writes to memory
within the slot.  See the KVM_GET_DIRTY_LOG ioctl.  The KVM_CAP_READONLY_MEM
capability indicates the availability of the KVM_MEM_READONLY flag.  When this
flag is set for a memory region, KVM only allows read accesses.  Writes will be
posted to userspace as KVM_EXIT_MMIO exits.
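
As a rough illustration of the flag bits and the alignment recommendation
above, the following C sketch builds a flags word and checks that the lower
21 bits of the two addresses agree.  The flag values are copied from the hunk
above.  The helper names (region_flags, large_page_friendly) and the
has_readonly_cap parameter are hypothetical; a real program would take the
definitions from <linux/kvm.h> and probe KVM_CAP_READONLY_MEM with
KVM_CHECK_EXTENSION instead of passing a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as shown in the hunk above; real code gets them from <linux/kvm.h>. */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)

/*
 * Hypothetical helper: compose the flags word for a memory slot.
 * Returns false when a read-only slot is requested but the caller has not
 * seen KVM_CAP_READONLY_MEM, since the flag depends on that capability.
 */
static bool region_flags(bool log_dirty, bool read_only,
			 bool has_readonly_cap, uint32_t *flags)
{
	uint32_t f = 0;

	if (read_only && !has_readonly_cap)
		return false;
	if (log_dirty)
		f |= (uint32_t)KVM_MEM_LOG_DIRTY_PAGES;
	if (read_only)
		f |= (uint32_t)KVM_MEM_READONLY;
	*flags = f;
	return true;
}

/*
 * Hypothetical helper: true when the lower 21 bits of guest_phys_addr and
 * userspace_addr are identical, as the text recommends for large pages.
 */
static bool large_page_friendly(uint64_t guest_phys_addr,
				uint64_t userspace_addr)
{
	return ((guest_phys_addr ^ userspace_addr) & ((1ULL << 21) - 1)) == 0;
}
```

The resulting flags value would go into the flags field of
struct kvm_userspace_memory_region before the ioctl is issued.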

When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
region are automatically reflected into the guest.  For example, an mmap()
that affects the region will be made visible immediately.  Another example
is madvise(MADV_DROP).
When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest.  For example, an
mmap() that affects the region will be made visible immediately.  Another
example is madvise(MADV_DROP).

It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
@@ -1946,6 +1950,19 @@ the guest using the specified gsi pin. The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
mechanism allowing emulation of level-triggered, irqfd-based
interrupts.  When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
additional eventfd in the kvm_irqfd.resamplefd field.  When operating
in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
the specified gsi in the irqchip.  When the irqchip is resampled, such
as from an EOI, the gsi is de-asserted and the user is notified via
kvm_irqfd.resamplefd.  It is the user's responsibility to re-queue
the interrupt if the device making use of it still requires service.
Note that closing the resamplefd is not sufficient to disable the
irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
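
The resample sequence above can be modelled as a small state machine.  The
following C sketch is only a toy model of the protocol, not the KVM API: the
struct and function names are hypothetical, a counter stands in for the
resamplefd eventfd, and no ioctl is issued.

```c
#include <stdbool.h>

/* Toy stand-in for one gsi that has a resampling irqfd attached. */
struct toy_gsi {
	bool asserted;			/* level currently seen by the irqchip */
	unsigned int notifications;	/* stand-in for the resamplefd counter */
};

/* Posting through the irqfd asserts the gsi. */
static void toy_irqfd_post(struct toy_gsi *g)
{
	g->asserted = true;
}

/* A resample (for example an EOI) de-asserts the gsi and notifies the user. */
static void toy_eoi(struct toy_gsi *g)
{
	g->asserted = false;
	g->notifications++;
}

/*
 * User side: consume the notification and re-queue the interrupt if the
 * device still requires service.  Returns true when it re-queued.
 */
static bool toy_user_resample(struct toy_gsi *g, bool device_still_pending)
{
	if (g->notifications == 0)
		return false;
	g->notifications = 0;	/* reading an eventfd resets its counter */
	if (!device_still_pending)
		return false;
	toy_irqfd_post(g);
	return true;
}
```

In this model the line stays low after an EOI unless the user side decides to
post again, which mirrors the responsibility described in the text.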

4.76 KVM_PPC_ALLOCATE_HTAB

Capability: KVM_CAP_PPC_ALLOC_HTAB
+66 −0
Linux KVM Hypercall:
===================
X86:
 KVM Hypercalls have a three-byte sequence of either the vmcall or the vmmcall
 instruction. The hypervisor can replace it with instructions that are
 guaranteed to be supported.

 Up to four arguments may be passed in rbx, rcx, rdx, and rsi respectively.
 The hypercall number should be placed in rax and the return value will be
 placed in rax.  No other registers will be clobbered unless explicitly stated
 by the particular hypercall.

S390:
  R2-R7 are used for parameters 1-6. In addition, R1 is used for hypercall
  number. The return value is written to R2.

  S390 uses the diagnose instruction (0x500) as the hypercall, along with the
  hypercall number in R1.

PowerPC:
  It uses R3-R10 and hypercall number in R11. R4-R11 are used as output registers.
  Return value is placed in R3.

  KVM hypercalls use a 4-byte opcode that is patched with the
  'hypercall-instructions' property inside the device tree's /hypervisor node.
  For more information refer to Documentation/virtual/kvm/ppc-pv.txt

KVM Hypercalls Documentation
===========================
The template for each hypercall is:
1. Hypercall name.
2. Architecture(s)
3. Status (deprecated, obsolete, active)
4. Purpose

1. KVM_HC_VAPIC_POLL_IRQ
------------------------
Architecture: x86
Status: active
Purpose: Trigger guest exit so that the host can check for pending
interrupts on reentry.

2. KVM_HC_MMU_OP
------------------------
Architecture: x86
Status: deprecated.
Purpose: Support MMU operations such as writing to PTE,
flushing TLB, and releasing PT.

3. KVM_HC_FEATURES
------------------------
Architecture: PPC
Status: active
Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
is used to enumerate which hypercalls are available. On PPC, either device tree
based lookup (which is also what EPAPR dictates) or the KVM-specific
enumeration mechanism (which is this hypercall) can be used.

4. KVM_HC_PPC_MAP_MAGIC_PAGE
------------------------
Architecture: PPC
Status: active
Purpose: To enable communication between the hypervisor and guest there is a
shared page that contains parts of supervisor visible register state.
The guest can map this shared page to access its supervisor register through
memory using this hypercall.
+20 −12
@@ -34,9 +34,12 @@ MSR_KVM_WALL_CLOCK_NEW: 0x4b564d00
		time information and check that they are both equal and even.
		An odd version indicates an in-progress update.

		sec: number of seconds for wallclock.
		sec: number of seconds for wallclock at time of boot.

		nsec: number of nanoseconds for wallclock.
		nsec: number of nanoseconds for wallclock at time of boot.

	In order to get the current wallclock time, the system_time from
	MSR_KVM_SYSTEM_TIME_NEW needs to be added.

	Note that although MSRs are per-CPU entities, the effect of this
	particular MSR is global.
@@ -82,20 +85,25 @@ MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01
		time at the time this structure was last updated. Unit is
		nanoseconds.

		tsc_to_system_mul: a function of the tsc frequency. One has
		to multiply any tsc-related quantity by this value to get
		a value in nanoseconds, besides dividing by 2^tsc_shift
		tsc_to_system_mul: multiplier to be used when converting
		tsc-related quantity to nanoseconds

		tsc_shift: cycle to nanosecond divider, as a power of two, to
		allow for shift rights. One has to shift right any tsc-related
		quantity by this value to get a value in nanoseconds, besides
		multiplying by tsc_to_system_mul.
		tsc_shift: shift to be used when converting tsc-related
		quantity to nanoseconds. This shift will ensure that
		multiplication with tsc_to_system_mul does not overflow.
		A positive value denotes a left shift, a negative value
		a right shift.

		With this information, guests can derive per-CPU time by
		doing:
		The conversion from tsc to nanoseconds involves an additional
		right shift by 32 bits. With this information, guests can
		derive per-CPU time by doing:

			time = (current_tsc - tsc_timestamp)
			time = (time * tsc_to_system_mul) >> tsc_shift
			if (tsc_shift >= 0)
				time <<= tsc_shift;
			else
				time >>= -tsc_shift;
			time = (time * tsc_to_system_mul) >> 32
			time = time + system_time
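
The updated steps (signed shift, multiply, right shift by 32, add system_time)
can be written out in C roughly as follows.  This is a sketch under stated
assumptions: the field names follow the text above, but struct pvclock_sample
and pvclock_ns() are hypothetical names, the struct is not the real shared
layout, and the version check described earlier is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the fields used by the conversion. */
struct pvclock_sample {
	uint64_t tsc_timestamp;		/* tsc value at the last update */
	uint64_t system_time;		/* nanoseconds at the last update */
	uint32_t tsc_to_system_mul;	/* 32-bit fixed-point multiplier */
	int8_t tsc_shift;		/* > 0: shift left, < 0: shift right */
};

static uint64_t pvclock_ns(const struct pvclock_sample *s, uint64_t current_tsc)
{
	uint64_t delta = current_tsc - s->tsc_timestamp;
	uint64_t hi, lo;

	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(s->tsc_shift > -64 && s->tsc_shift < 64);

	if (s->tsc_shift >= 0)
		delta <<= s->tsc_shift;
	else
		delta >>= -s->tsc_shift;

	/*
	 * (delta * mul) >> 32 without a 128-bit type: split delta into
	 * 32-bit halves so neither partial product loses low-order bits.
	 */
	hi = (delta >> 32) * s->tsc_to_system_mul;
	lo = (delta & 0xffffffffULL) * s->tsc_to_system_mul;

	return s->system_time + hi + (lo >> 32);
}
```

With tsc_to_system_mul = 0x80000000 (0.5 in 32-bit fixed point) and
tsc_shift = 1, one tsc tick corresponds to one nanosecond.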

		flags: bits in this field indicate extended capabilities
+22 −0
@@ -174,3 +174,25 @@ following:
That way we can inject an arbitrary amount of code as replacement for a single
instruction. This allows us to check for pending interrupts when setting EE=1
for example.

Hypercall ABIs in KVM on PowerPC
=================================
1) KVM hypercalls (ePAPR)

This is the ePAPR-compliant hypercall implementation (mentioned above). Even
generic hypercalls are implemented here, like the ePAPR idle hcall. These are
available on all targets.

2) PAPR hypercalls

PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU).
These are the same hypercalls that pHyp, the POWER hypervisor, implements. Some of
them are handled in the kernel, some are handled in user space. This is only
available on book3s_64.

3) OSI hypercalls

Mac-on-Linux is another user of KVM on PowerPC, which has its own hypercall (long
before KVM). This is supported to maintain compatibility. All these hypercalls get
forwarded to user space. This is only useful on book3s_32, but can be used with
book3s_64 as well.
+17 −24
@@ -924,6 +924,16 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
	return 0;
}

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event)
{
	if (!irqchip_in_kernel(kvm))
		return -ENXIO;

	irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
					irq_event->irq, irq_event->level);
	return 0;
}

long kvm_arch_vm_ioctl(struct file *filp,
		unsigned int ioctl, unsigned long arg)
{
@@ -963,29 +973,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
			goto out;
		}
		break;
	case KVM_IRQ_LINE_STATUS:
	case KVM_IRQ_LINE: {
		struct kvm_irq_level irq_event;

		r = -EFAULT;
		if (copy_from_user(&irq_event, argp, sizeof irq_event))
			goto out;
		r = -ENXIO;
		if (irqchip_in_kernel(kvm)) {
			__s32 status;
			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
				    irq_event.irq, irq_event.level);
			if (ioctl == KVM_IRQ_LINE_STATUS) {
				r = -EFAULT;
				irq_event.status = status;
				if (copy_to_user(argp, &irq_event,
							sizeof irq_event))
					goto out;
			}
			r = 0;
		}
		break;
		}
	case KVM_GET_IRQCHIP: {
		/* 0: PIC master, 1: PIC slave, 2: IOAPIC */
		struct kvm_irqchip chip;
@@ -1626,11 +1613,17 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
	return;
}

void kvm_arch_flush_shadow(struct kvm *kvm)
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
	kvm_flush_remote_tlbs(kvm);
}

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
				   struct kvm_memory_slot *slot)
{
	kvm_arch_flush_shadow_all(kvm);
}

long kvm_arch_dev_ioctl(struct file *filp,
			unsigned int ioctl, unsigned long arg)
{