Commit d5b798c1 authored by Paolo Bonzini
Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest.  This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree.  Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.

Other notable changes include:

* Add the ability to change the size of the hashed page table,
  from David Gibson.

* XICS (interrupt controller) emulation fixes and improvements,
  from Li Zhong.

* Bug fixes from myself and Thomas Huth.

These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
parents 55dd00a7 050f2339
+187 −7
@@ -2443,18 +2443,20 @@ are, it will do nothing and return an EBUSY error.
The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46.  On successful return from the
ioctl, it will have been updated with the order of the hash table that
was allocated.
ioctl, the value will not be changed by the kernel.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
the kernel will clear out the existing hash table (zero all HPTEs) and
return the hash table order in the parameter.  (If the guest is using
the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)
with a different order from the existing hash table, the existing hash
table will be freed and a new one allocated.  If this ioctl is
called when a hash table has already been allocated of the same order
as specified, the kernel will clear out the existing hash table (zero
all HPTEs).  In either case, if the guest is using the virtualized
real-mode area (VRMA) facility, the kernel will re-create the VRMA
HPTEs on the next KVM_RUN of any vcpu.
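
As a rough illustration of the "order" parameter described above, the
following sketch converts between an order and a table size in bytes
and checks the documented 18..46 range.  The helper names are
hypothetical and not part of the KVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds quoted in the text above. */
#define HTAB_ORDER_MIN 18u
#define HTAB_ORDER_MAX 46u

/* Returns 1 if the order lies in the documented range, else 0. */
int htab_order_valid(uint32_t order)
{
	return order >= HTAB_ORDER_MIN && order <= HTAB_ORDER_MAX;
}

/* Size in bytes of a hash table of the given order (2^order). */
uint64_t htab_size_bytes(uint32_t order)
{
	assert(htab_order_valid(order));
	return UINT64_C(1) << order;
}

/*
 * Smallest valid order whose table holds at least `bytes` bytes,
 * or 0 if no order in the documented range is large enough.
 */
uint32_t htab_order_for_size(uint64_t bytes)
{
	uint32_t order;

	for (order = HTAB_ORDER_MIN; order <= HTAB_ORDER_MAX; order++)
		if ((UINT64_C(1) << order) >= bytes)
			return order;
	return 0;
}
```

Under these definitions the default 16 MB table mentioned above
corresponds to order 24.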

4.77 KVM_S390_INTERRUPT

@@ -3177,7 +3179,7 @@ of IOMMU pages.

The rest of functionality is identical to KVM_CREATE_SPAPR_TCE.

4.98 KVM_REINJECT_CONTROL
4.99 KVM_REINJECT_CONTROL

Capability: KVM_CAP_REINJECT_CONTROL
Architectures: x86
@@ -3201,6 +3203,166 @@ struct kvm_reinject_control {
pit_reinject = 0 (!reinject mode) is recommended, unless running an old
operating system that uses the PIT for timing (e.g. Linux 2.4.x).

4.100 KVM_PPC_CONFIGURE_V3_MMU

Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_mmuv3_cfg (in)
Returns: 0 on success,
         -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
         -EINVAL if the configuration is invalid

This ioctl controls whether the guest will use radix or HPT (hashed
page table) translation, and sets the pointer to the process table for
the guest.

struct kvm_ppc_mmuv3_cfg {
	__u64	flags;
	__u64	process_table;
};

There are two bits that can be set in flags: KVM_PPC_MMUV3_RADIX and
KVM_PPC_MMUV3_GTSE.  KVM_PPC_MMUV3_RADIX, if set, configures the guest
to use radix tree translation, and if clear, to use HPT translation.
KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
to be able to use the global TLB and SLB invalidation instructions;
if clear, the guest may not use these instructions.

The process_table field specifies the address and size of the guest
process table, which is in the guest's space.  This field is formatted
as the second doubleword of the partition table entry, as defined in
the Power ISA V3.00, Book III section 5.7.6.1.
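
A minimal sketch of how userspace might fill in the configuration
before issuing the ioctl.  The structure and flag names prefixed with
ex_/EX_ are local stand-ins, and the numeric flag values are
assumptions; real code should use the definitions from the kernel
headers and then pass the structure to ioctl() on the VM file
descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel definitions; the values are assumptions. */
#define EX_MMUV3_RADIX	1u	/* set: radix tree, clear: HPT */
#define EX_MMUV3_GTSE	2u	/* allow global TLB/SLB invalidations */

/* Same layout as struct kvm_ppc_mmuv3_cfg shown above. */
struct ex_mmuv3_cfg {
	uint64_t flags;
	uint64_t process_table;
};

/* Build a configuration from two booleans and a process table word. */
struct ex_mmuv3_cfg ex_make_cfg(int radix, int gtse, uint64_t process_table)
{
	struct ex_mmuv3_cfg cfg;

	cfg.flags = 0;
	if (radix)
		cfg.flags |= EX_MMUV3_RADIX;
	if (gtse)
		cfg.flags |= EX_MMUV3_GTSE;
	cfg.process_table = process_table;
	return cfg;
}

/* Returns 1 if only the two documented flag bits are set. */
int ex_cfg_flags_valid(const struct ex_mmuv3_cfg *cfg)
{
	assert(cfg != NULL);
	return (cfg->flags & ~(uint64_t)(EX_MMUV3_RADIX | EX_MMUV3_GTSE)) == 0;
}
```

Checking the flags locally only mirrors the documented -EINVAL case
for unknown bits; the kernel may reject a configuration for other
reasons.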

4.101 KVM_PPC_GET_RMMU_INFO

Capability: KVM_CAP_PPC_RADIX_MMU
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_rmmu_info (out)
Returns: 0 on success,
	 -EFAULT if struct kvm_ppc_rmmu_info cannot be written,
	 -EINVAL if no useful information can be returned

This ioctl returns a structure containing two things: (a) a list
containing supported radix tree geometries, and (b) a list that maps
page sizes to put in the "AP" (actual page size) field for the tlbie
(TLB invalidate entry) instruction.

struct kvm_ppc_rmmu_info {
	struct kvm_ppc_radix_geom {
		__u8	page_shift;
		__u8	level_bits[4];
		__u8	pad[3];
	}	geometries[8];
	__u32	ap_encodings[8];
};

The geometries[] field gives up to 8 supported geometries for the
radix page table, in terms of the log base 2 of the smallest page
size, and the number of bits indexed at each level of the tree, from
the PTE level up to the PGD level in that order.  Any unused entries
will have 0 in the page_shift field.

The ap_encodings gives the supported page sizes and their AP field
encodings, encoded with the AP value in the top 3 bits and the log
base 2 of the page size in the bottom 6 bits.
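
The two encodings above can be unpacked as in the following sketch.
It assumes the "top 3 bits" are bits 31..29 of each 32-bit
ap_encodings entry; the ex_ structure mirrors kvm_ppc_radix_geom and
all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AP value: top 3 bits of the 32-bit encoding. */
unsigned int ap_value(uint32_t enc)
{
	return enc >> 29;
}

/* Log base 2 of the page size: bottom 6 bits of the encoding. */
unsigned int ap_page_shift(uint32_t enc)
{
	return enc & 0x3f;
}

/* Inverse of the two accessors above, handy for testing. */
uint32_t ap_encode(unsigned int ap, unsigned int shift)
{
	assert(ap < 8 && shift < 64);
	return ((uint32_t)ap << 29) | shift;
}

/* Same layout as struct kvm_ppc_radix_geom shown above. */
struct ex_radix_geom {
	uint8_t page_shift;
	uint8_t level_bits[4];
	uint8_t pad[3];
};

/* Address bits covered by a geometry: page offset plus every level. */
unsigned int geom_address_bits(const struct ex_radix_geom *g)
{
	unsigned int bits, i;

	assert(g != NULL);
	bits = g->page_shift;
	for (i = 0; i < 4; i++)
		bits += g->level_bits[i];
	return bits;
}
```

The sample values used when trying this out (AP 5 for a 64 kB page, a
4 kB geometry with 9/9/9/13 index bits) are illustrative inputs, not
values the ioctl is guaranteed to return.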

4.102 KVM_PPC_RESIZE_HPT_PREPARE

Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
	 >0 if a new HPT is being prepared, the value is an estimated
             number of milliseconds until preparation is complete
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
	 -EINVAL if the supplied shift or flags are invalid
	 -ENOMEM if unable to allocate the new HPT
	 -ENOSPC if there was a hash collision when moving existing
                  HPT entries to the new HPT
	 -EIO on other error conditions

Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT).  Specifically this starts, stops or monitors
the preparation of a new potential HPT for the guest, essentially
implementing the H_RESIZE_HPT_PREPARE hypercall.

If called with shift > 0 when there is no pending HPT for the guest,
this begins preparation of a new pending HPT of size 2^(shift) bytes.
It then returns a positive integer with the estimated number of
milliseconds until preparation is complete.

If called when there is a pending HPT whose size does not match that
requested in the parameters, discards the existing pending HPT and
creates a new one as above.

If called when there is a pending HPT of the size requested, will:
  * If preparation of the pending HPT is already complete, return 0
  * If preparation of the pending HPT has failed, return an error
    code, then discard the pending HPT.
  * If preparation of the pending HPT is still in progress, return an
    estimated number of milliseconds until preparation is complete.

If called with shift == 0, discards any currently pending HPT and
returns 0 (i.e. cancels any in-progress preparation).

flags is reserved for future expansion; currently, setting any bits in
flags will result in -EINVAL.

Normally this will be called repeatedly with the same parameters until
it returns <= 0.  The first call will initiate preparation, subsequent
ones will monitor preparation until it completes or fails.

struct kvm_ppc_resize_hpt {
	__u64 flags;
	__u32 shift;
	__u32 pad;
};
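
The "call repeatedly until it returns <= 0" pattern described above
can be sketched as a small loop.  The ioctl itself is hidden behind a
callback so the loop can be exercised without a VM; the callback is
assumed to return the ioctl result with failures already translated
to a negative errno value (ioctl(2) itself returns -1 and sets
errno).  All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Wraps one KVM_PPC_RESIZE_HPT_PREPARE call: >0 ms, 0 done, <0 error. */
typedef int (*prepare_fn)(void *ctx);
/* Waits for roughly `ms` milliseconds. */
typedef void (*sleep_ms_fn)(void *ctx, int ms);

/* Poll until preparation completes (0) or fails (<0). */
int resize_hpt_wait_prepared(prepare_fn prepare, sleep_ms_fn sleep_ms,
			     void *ctx)
{
	int ret;

	assert(prepare != NULL && sleep_ms != NULL);
	while ((ret = prepare(ctx)) > 0)
		sleep_ms(ctx, ret);	/* ret is the estimated wait */
	return ret;
}

/* Simulated backend, for trying the loop without a real VM. */
struct fake_state {
	int pending_calls;	/* calls that still report "in progress" */
	int final_result;	/* value returned once pending_calls hits 0 */
	int slept_ms;		/* total time the caller was asked to wait */
};

int fake_prepare(void *ctx)
{
	struct fake_state *s = ctx;

	if (s->pending_calls > 0) {
		s->pending_calls--;
		return 10;	/* pretend 10 ms remain */
	}
	return s->final_result;
}

void fake_sleep(void *ctx, int ms)
{
	struct fake_state *s = ctx;

	s->slept_ms += ms;	/* record instead of really sleeping */
}
```

A real caller would replace the fake backend with a wrapper around
ioctl() and a sleep primitive of its choice, and would probably want
to cap the total waiting time.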

4.103 KVM_PPC_RESIZE_HPT_COMMIT

Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
         -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
	 -EINVAL if the supplied shift or flags are invalid
	 -ENXIO if there is no pending HPT, or the pending HPT doesn't
                 have the requested size
	 -EBUSY if the pending HPT is not fully prepared
	 -ENOSPC if there was a hash collision when moving existing
                  HPT entries to the new HPT
	 -EIO on other error conditions

Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT).  Specifically this requests that the guest be
transferred to working with the new HPT, essentially implementing the
H_RESIZE_HPT_COMMIT hypercall.

This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
returned 0 with the same parameters.  In other cases
KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
-EBUSY, though others may be possible if the preparation was started,
but failed).

This will have undefined effects on the guest if it has not already
placed itself in a quiescent state where no vcpu will make MMU enabled
memory accesses.

On successful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.

On failure, the guest will still be operating on its previous HPT.

struct kvm_ppc_resize_hpt {
	__u64 flags;
	__u32 shift;
	__u32 pad;
};
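
One possible way for a caller to react to the return codes listed
above is sketched below.  The mapping from error code to action is an
assumption about a reasonable policy, not something the API mandates,
and the enum and function names are hypothetical.  As before, the
result is assumed to be already translated to a negative errno value.

```c
#include <assert.h>
#include <errno.h>

enum commit_action {
	COMMIT_DONE,		/* new HPT is now active */
	COMMIT_KEEP_POLLING,	/* pending HPT exists but is not ready */
	COMMIT_RESTART_PREPARE,	/* no matching pending HPT */
	COMMIT_FAIL		/* guest keeps running on its previous HPT */
};

/* Classify the result of a KVM_PPC_RESIZE_HPT_COMMIT call. */
enum commit_action classify_commit_result(int ret)
{
	assert(ret <= 0);
	switch (ret) {
	case 0:
		return COMMIT_DONE;
	case -EBUSY:
		return COMMIT_KEEP_POLLING;
	case -ENXIO:
		return COMMIT_RESTART_PREPARE;
	default:
		return COMMIT_FAIL;
	}
}
```

Whatever the policy, the text above guarantees that on failure the
guest is still operating on its previous HPT.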

5. The kvm_run structure
------------------------

@@ -3942,3 +4104,21 @@ In order to use SynIC, it has to be activated by setting this
capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
will disable the use of APIC hardware virtualization even if supported
by the CPU, as it's incompatible with SynIC auto-EOI behavior.

8.3 KVM_CAP_PPC_RADIX_MMU

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).

8.4 KVM_CAP_PPC_HASH_MMU_V3

Architectures: ppc

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
hashed page table MMU defined in Power ISA V3.00 (as implemented in
the POWER9 processor), including in-memory segment tables.
+17 −1
@@ -44,10 +44,20 @@ struct patb_entry {
};
extern struct patb_entry *partition_tb;

/* Bits in patb0 field */
#define PATB_HR		(1UL << 63)
#define PATB_GR		(1UL << 63)
#define RPDB_MASK	0x0ffffffffffff00fUL
#define RPDB_SHIFT	(1UL << 8)
#define RTS1_SHIFT	61		/* top 2 bits of radix tree size */
#define RTS1_MASK	(3UL << RTS1_SHIFT)
#define RTS2_SHIFT	5		/* bottom 3 bits of radix tree size */
#define RTS2_MASK	(7UL << RTS2_SHIFT)
#define RPDS_MASK	0x1f		/* root page dir. size field */

/* Bits in patb1 field */
#define PATB_GR		(1UL << 63)	/* guest uses radix; must match HR */
#define PRTS_MASK	0x1f		/* process table size field */

/*
 * Limit process table to PAGE_SIZE table. This
 * also limit the max pid we can support.
@@ -138,5 +148,11 @@ static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
extern int (*register_process_table)(unsigned long base, unsigned long page_size,
				     unsigned long tbl_size);

#ifdef CONFIG_PPC_PSERIES
extern void radix_init_pseries(void);
#else
static inline void radix_init_pseries(void) { };
#endif

#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
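
The RTS1/RTS2 comments in this hunk describe a 5-bit radix tree size
value split across two fields of patb0.  The sketch below shows how
such a value could be packed and unpacked with shifts and masks of
that shape.  The macros are re-declared locally with fixed-width
types, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the field definitions from the hunk above. */
#define EX_RTS1_SHIFT	61	/* top 2 bits of radix tree size */
#define EX_RTS1_MASK	(UINT64_C(3) << EX_RTS1_SHIFT)
#define EX_RTS2_SHIFT	5	/* bottom 3 bits of radix tree size */
#define EX_RTS2_MASK	(UINT64_C(7) << EX_RTS2_SHIFT)

/* Spread a 5-bit size value over the two patb0 fields. */
uint64_t rts_encode(unsigned int rts)
{
	assert(rts < 32);
	return ((uint64_t)(rts >> 3) << EX_RTS1_SHIFT) |
	       ((uint64_t)(rts & 7) << EX_RTS2_SHIFT);
}

/* Recombine the two fields into the 5-bit size value. */
unsigned int rts_decode(uint64_t patb0)
{
	unsigned int hi = (unsigned int)((patb0 & EX_RTS1_MASK) >> EX_RTS1_SHIFT);
	unsigned int lo = (unsigned int)((patb0 & EX_RTS2_MASK) >> EX_RTS2_SHIFT);

	return (hi << 3) | lo;
}
```

For instance, the value 21 (binary 10101) splits into 2 for the upper
field and 5 for the lower field.
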
+55 −20
@@ -97,6 +97,15 @@
	ld	reg,PACAKBASE(r13);					\
	ori	reg,reg,(ABS_ADDR(label))@l;

/*
 * Branches from unrelocated code (e.g., interrupts) to labels outside
 * head-y require >64K offsets.
 */
#define __LOAD_FAR_HANDLER(reg, label)					\
	ld	reg,PACAKBASE(r13);					\
	ori	reg,reg,(ABS_ADDR(label))@l;				\
	addis	reg,reg,(ABS_ADDR(label))@h;

/* Exception register prefixes */
#define EXC_HV	H
#define EXC_STD
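
The difference between the existing ori-only sequence and the new
__LOAD_FAR_HANDLER can be modelled in C.  This is only an arithmetic
sketch: it assumes the base loaded from PACAKBASE has its low 16 bits
clear and that the label's absolute address is below 2 GiB, so the
sign extension performed by addis does not come into play.  Function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models: ld reg,PACAKBASE(r13); ori reg,reg,label@l */
uint64_t load_near_handler(uint64_t kbase, uint32_t abs_addr)
{
	assert((kbase & 0xffff) == 0);
	return kbase | (abs_addr & 0xffff);	/* only the low 16 bits */
}

/* Models the same sequence followed by: addis reg,reg,label@h */
uint64_t load_far_handler(uint64_t kbase, uint32_t abs_addr)
{
	uint64_t reg;

	assert((kbase & 0xffff) == 0);
	assert(abs_addr < UINT32_C(0x80000000));
	reg = kbase | (abs_addr & 0xffff);	/* ori ...@l */
	reg += (uint64_t)(abs_addr >> 16) << 16;	/* addis ...@h */
	return reg;
}
```

With a label at offset 0x12345, the near variant loses the upper part
of the offset while the far variant reaches the intended address,
which is why labels beyond 64K need the extra instruction.
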
@@ -227,13 +236,41 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
	mtctr	reg;							\
	bctr

/*
 * KVM requires __LOAD_FAR_HANDLER.
 *
 * __BRANCH_TO_KVM_EXIT branches are also a special case because they
 * explicitly use r9 then reload it from PACA before branching. Hence
 * the double-underscore.
 */
#define __BRANCH_TO_KVM_EXIT(area, label)				\
	mfctr	r9;							\
	std	r9,HSTATE_SCRATCH1(r13);				\
	__LOAD_FAR_HANDLER(r9, label);					\
	mtctr	r9;							\
	ld	r9,area+EX_R9(r13);					\
	bctr

#define BRANCH_TO_KVM(reg, label)					\
	__LOAD_FAR_HANDLER(reg, label);					\
	mtctr	reg;							\
	bctr

#else
#define BRANCH_TO_COMMON(reg, label)					\
	b	label

#define BRANCH_TO_KVM(reg, label)					\
	b	label

#define __BRANCH_TO_KVM_EXIT(area, label)				\
	ld	r9,area+EX_R9(r13);					\
	b	label

#endif

#define __KVM_HANDLER_PROLOG(area, n)					\

#define __KVM_HANDLER(area, h, n)					\
	BEGIN_FTR_SECTION_NESTED(947)					\
	ld	r10,area+EX_CFAR(r13);					\
	std	r10,HSTATE_CFAR(r13);					\
@@ -243,30 +280,28 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
	std	r10,HSTATE_PPR(r13);					\
	END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);	\
	ld	r10,area+EX_R10(r13);					\
	stw	r9,HSTATE_SCRATCH1(r13);				\
	ld	r9,area+EX_R9(r13);					\
	std	r12,HSTATE_SCRATCH0(r13);				\

#define __KVM_HANDLER(area, h, n)					\
	__KVM_HANDLER_PROLOG(area, n)					\
	li	r12,n;							\
	b	kvmppc_interrupt
	sldi	r12,r9,32;						\
	ori	r12,r12,(n);						\
	/* This reloads r9 before branching to kvmppc_interrupt */	\
	__BRANCH_TO_KVM_EXIT(area, kvmppc_interrupt)

#define __KVM_HANDLER_SKIP(area, h, n)					\
	cmpwi	r10,KVM_GUEST_MODE_SKIP;				\
	ld	r10,area+EX_R10(r13);					\
	beq	89f;							\
	stw	r9,HSTATE_SCRATCH1(r13);				\
	BEGIN_FTR_SECTION_NESTED(948)					\
	ld	r9,area+EX_PPR(r13);					\
	std	r9,HSTATE_PPR(r13);					\
	ld	r10,area+EX_PPR(r13);					\
	std	r10,HSTATE_PPR(r13);					\
	END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948);	\
	ld	r9,area+EX_R9(r13);					\
	ld	r10,area+EX_R10(r13);					\
	std	r12,HSTATE_SCRATCH0(r13);				\
	li	r12,n;							\
	b	kvmppc_interrupt;					\
	sldi	r12,r9,32;						\
	ori	r12,r12,(n);						\
	/* This reloads r9 before branching to kvmppc_interrupt */	\
	__BRANCH_TO_KVM_EXIT(area, kvmppc_interrupt);			\
89:	mtocrf	0x80,r9;						\
	ld	r9,area+EX_R9(r13);					\
	ld	r10,area+EX_R10(r13);					\
	b	kvmppc_skip_##h##interrupt

#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
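
The `sldi r12,r9,32` / `ori r12,r12,(n)` pair in this hunk packs two
values into one register: whatever r9 held goes into the upper 32
bits of r12 and the vector number n into the low bits.  A C model of
that packing is shown below.  The hunk does not state what r9
contains at this point, so the model treats it as an opaque value,
and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models: sldi r12,r9,32 ; ori r12,r12,n  (ori takes a 16-bit value) */
uint64_t pack_r12(uint64_t r9, uint16_t n)
{
	return (r9 << 32) | n;
}

/* Upper half: the low 32 bits that r9 held before the shift. */
uint32_t r12_upper(uint64_t r12)
{
	return (uint32_t)(r12 >> 32);
}

/* Lower half: the vector number. */
uint32_t r12_trap(uint64_t r12)
{
	return (uint32_t)(r12 & UINT32_C(0xffffffff));
}
```

Packing both values into r12 is what allows r9 to be reused as a
scratch register by __BRANCH_TO_KVM_EXIT before it is reloaded.
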
@@ -393,12 +428,12 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
	EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_STD)

#define STD_RELON_EXCEPTION_HV(loc, vec, label)		\
	/* No guest interrupts come through here */	\
	SET_SCRATCH0(r13);	/* save r13 */		\
	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label, EXC_HV, NOTEST, vec);
	EXCEPTION_RELON_PROLOG_PSERIES(PACA_EXGEN, label,	\
				       EXC_HV, KVMTEST_HV, vec);

#define STD_RELON_EXCEPTION_HV_OOL(vec, label)			\
	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, vec);		\
	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_HV, vec);	\
	EXCEPTION_RELON_PROLOG_PSERIES_1(label, EXC_HV)

/* This associate vector numbers with bits in paca->irq_happened */
@@ -475,10 +510,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)

#define MASKABLE_RELON_EXCEPTION_HV(loc, vec, label)			\
	_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label,			\
					  EXC_HV, SOFTEN_NOTEST_HV)
					  EXC_HV, SOFTEN_TEST_HV)

#define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label)			\
	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);		\
	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);		\
	EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)

/*
+1 −1
@@ -218,7 +218,7 @@ end_##sname:

#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#define TRAMP_KVM_BEGIN(name)						\
	TRAMP_REAL_BEGIN(name)
	TRAMP_VIRT_BEGIN(name)
#else
#define TRAMP_KVM_BEGIN(name)
#endif
+11 −0
@@ -276,6 +276,7 @@
#define H_GET_MPP_X		0x314
#define H_SET_MODE		0x31C
#define H_CLEAR_HPT		0x358
#define H_REGISTER_PROC_TBL	0x37C
#define H_SIGNAL_SYS_RESET	0x380
#define MAX_HCALL_OPCODE	H_SIGNAL_SYS_RESET

@@ -313,6 +314,16 @@
#define H_SIGNAL_SYS_RESET_ALL_OTHERS		-2
/* >= 0 values are CPU number */

/* Flag values used in H_REGISTER_PROC_TBL hcall */
#define PROC_TABLE_OP_MASK	0x18
#define PROC_TABLE_DEREG	0x10
#define PROC_TABLE_NEW		0x18
#define PROC_TABLE_TYPE_MASK	0x06
#define PROC_TABLE_HPT_SLB	0x00
#define PROC_TABLE_HPT_PT	0x02
#define PROC_TABLE_RADIX	0x04
#define PROC_TABLE_GTSE		0x01
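
The flag values above fall into an operation field, a table type
field and a single GTSE bit.  The following sketch composes and
decodes a flags word from those pieces.  The macros are copied from
the hunk, while the helper names are hypothetical and the way the
hcall itself is issued is not shown.

```c
#include <assert.h>

/* Copied from the definitions above. */
#define PROC_TABLE_OP_MASK	0x18
#define PROC_TABLE_DEREG	0x10
#define PROC_TABLE_NEW		0x18
#define PROC_TABLE_TYPE_MASK	0x06
#define PROC_TABLE_HPT_SLB	0x00
#define PROC_TABLE_HPT_PT	0x02
#define PROC_TABLE_RADIX	0x04
#define PROC_TABLE_GTSE		0x01

/* Flags for registering a new process table of the given type. */
unsigned long proc_tbl_new_flags(unsigned long type, int gtse)
{
	assert((type & ~(unsigned long)PROC_TABLE_TYPE_MASK) == 0);
	return PROC_TABLE_NEW | type | (gtse ? PROC_TABLE_GTSE : 0);
}

/* Extract the operation field from a flags word. */
unsigned long proc_tbl_op(unsigned long flags)
{
	return flags & PROC_TABLE_OP_MASK;
}

/* Extract the table type field from a flags word. */
unsigned long proc_tbl_type(unsigned long flags)
{
	return flags & PROC_TABLE_TYPE_MASK;
}
```

Registering a radix table with GTSE enabled therefore yields the
flags word 0x1d under these definitions.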

#ifndef __ASSEMBLY__

/**