Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 4bb3c7a0 authored by Paul Mackerras's avatar Paul Mackerras Committed by Michael Ellerman
Browse files

KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9



POWER9 has hardware bugs relating to transactional memory and thread
reconfiguration (changes to hardware SMT mode).  Specifically, the core
does not have enough storage to store a complete checkpoint of all the
architected state for all four threads.  The DD2.2 version of POWER9
includes hardware modifications designed to allow hypervisor software
to implement workarounds for these problems.  This patch implements
those workarounds in KVM code so that KVM guests see a full, working
transactional memory implementation.

The problems center around the use of TM suspended state, where the
CPU has a checkpointed state but execution is not transactional.  The
workaround is to implement a "fake suspend" state, which looks to the
guest like suspended state but the CPU does not store a checkpoint.
In this state, any instruction that would cause a transition to
transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
checkpointed state (treclaim) causes a "soft patch" interrupt (vector
0x1500) to the hypervisor so that it can be emulated.  The trechkpt
instruction also causes a soft patch interrupt.

On POWER9 DD2.2, we avoid returning to the guest in any state which
would require a checkpoint to be present.  The trechkpt in the guest
entry path which would normally create that checkpoint is replaced by
either a transition to fake suspend state, if the guest is in suspend
state, or a rollback to the pre-transactional state if the guest is in
transactional state.  Fake suspend state is indicated by a flag in the
PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
reads back as 0.

On exit from the guest, if the guest is in fake suspend state, we still
do the treclaim instruction as we would in real suspend state, in order
to get into non-transactional state, but we do not save the resulting
register state since there was no checkpoint.

Emulation of the instructions that cause a softpatch interrupt is
handled in two paths.  If the guest is in real suspend mode, we call
kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
transitioning to transactional state.  This is called before we do the
treclaim in the guest exit path; because we haven't done treclaim, we
can get back to the guest with the transaction still active.  If the
instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
handle, or if the guest is in fake suspend state, then we proceed to
do the complete guest exit path and subsequently call
kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
all the cases including the cases that generate program interrupts
(illegal instruction or TM Bad Thing) and facility unavailable
interrupts.

The emulation is reasonably straightforward and is mostly concerned
with checking for exception conditions and updating the state of
registers such as MSR and CR0.  The treclaim emulation takes care to
ensure that the TEXASR register gets updated as if it were the guest
treclaim instruction that had done failure recording, not the treclaim
done in hypervisor state in the guest exit path.

With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
transactional memory is not available to host userspace.

Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
parent 7672691a
Loading
Loading
Loading
Loading
+2 −0
Original line number Diff line number Diff line
@@ -108,6 +108,8 @@

/* book3s_hv */

#define BOOK3S_INTERRUPT_HV_SOFTPATCH	0x1500

/*
 * Special trap used to indicate to host that this is a
 * passthrough interrupt that could not be handled
+4 −0
Original line number Diff line number Diff line
@@ -241,6 +241,10 @@ extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
			unsigned long mask);
extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);

extern int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu);
extern int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu);
extern void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu);

extern void kvmppc_entry_trampoline(void);
extern void kvmppc_hv_entry_trampoline(void);
extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
+43 −0
Original line number Diff line number Diff line
@@ -472,6 +472,49 @@ static inline void set_dirty_bits_atomic(unsigned long *map, unsigned long i,
			set_bit_le(i, map);
}

static inline u64 sanitize_msr(u64 msr)
{
	msr &= ~MSR_HV;
	msr |= MSR_ME;
	return msr;
}

#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
static inline void copy_from_checkpoint(struct kvm_vcpu *vcpu)
{
	vcpu->arch.cr  = vcpu->arch.cr_tm;
	vcpu->arch.xer = vcpu->arch.xer_tm;
	vcpu->arch.lr  = vcpu->arch.lr_tm;
	vcpu->arch.ctr = vcpu->arch.ctr_tm;
	vcpu->arch.amr = vcpu->arch.amr_tm;
	vcpu->arch.ppr = vcpu->arch.ppr_tm;
	vcpu->arch.dscr = vcpu->arch.dscr_tm;
	vcpu->arch.tar = vcpu->arch.tar_tm;
	memcpy(vcpu->arch.gpr, vcpu->arch.gpr_tm,
	       sizeof(vcpu->arch.gpr));
	vcpu->arch.fp  = vcpu->arch.fp_tm;
	vcpu->arch.vr  = vcpu->arch.vr_tm;
	vcpu->arch.vrsave = vcpu->arch.vrsave_tm;
}

static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
{
	vcpu->arch.cr_tm  = vcpu->arch.cr;
	vcpu->arch.xer_tm = vcpu->arch.xer;
	vcpu->arch.lr_tm  = vcpu->arch.lr;
	vcpu->arch.ctr_tm = vcpu->arch.ctr;
	vcpu->arch.amr_tm = vcpu->arch.amr;
	vcpu->arch.ppr_tm = vcpu->arch.ppr;
	vcpu->arch.dscr_tm = vcpu->arch.dscr;
	vcpu->arch.tar_tm = vcpu->arch.tar;
	memcpy(vcpu->arch.gpr_tm, vcpu->arch.gpr,
	       sizeof(vcpu->arch.gpr));
	vcpu->arch.fp_tm  = vcpu->arch.fp;
	vcpu->arch.vr_tm  = vcpu->arch.vr;
	vcpu->arch.vrsave_tm = vcpu->arch.vrsave;
}
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */

#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */

#endif /* __ASM_KVM_BOOK3S_64_H__ */
+1 −0
Original line number Diff line number Diff line
@@ -119,6 +119,7 @@ struct kvmppc_host_state {
	u8 host_ipi;
	u8 ptid;		/* thread number within subcore when split */
	u8 tid;			/* thread number within whole core */
	u8 fake_suspend;
	struct kvm_vcpu *kvm_vcpu;
	struct kvmppc_vcore *kvm_vcore;
	void __iomem *xics_phys;
+1 −0
Original line number Diff line number Diff line
@@ -610,6 +610,7 @@ struct kvm_vcpu_arch {
	u64 tfhar;
	u64 texasr;
	u64 tfiar;
	u64 orig_texasr;

	u32 cr_tm;
	u64 xer_tm;
Loading