Commit fd7bacbc authored by Mahesh Salgaonkar, committed by Paul Mackerras

KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt



When a guest is assigned to a core, KVM converts the host Timebase (TB)
into the guest TB by adding the guest timebase offset before entering
the guest. During guest exit, it converts the guest TB back to the host
TB. The offset is non-zero under certain conditions (e.g. after guest
migration), so the host TB and guest TB can differ.
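
The entry/exit conversion amounts to modular arithmetic on the 64-bit TB.
A minimal user-space sketch, assuming the offset is simply added on entry
and subtracted on exit modulo 2^64; host_to_guest_tb() and
guest_to_host_tb() are hypothetical names, not the kernel's entry/exit
code:

```c
#include <stdint.h>

/*
 * Hypothetical helpers modelling the TB conversion. The TB is a 64-bit
 * counter, so unsigned wraparound is intended: a "negative" offset is
 * represented modulo 2^64.
 */
static uint64_t host_to_guest_tb(uint64_t host_tb, uint64_t tb_offset)
{
	return host_tb + tb_offset;	/* applied on guest entry */
}

static uint64_t guest_to_host_tb(uint64_t guest_tb, uint64_t tb_offset)
{
	return guest_tb - tb_offset;	/* applied on guest exit */
}
```

Exit undoes entry only if nothing else has written the TB in between.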

When we get an HMI for TB-related issues, the OPAL HMI handler tries to
fix the errors and restores the correct host TB value. With no guest
running, this causes no problems. But with a guest running on the core,
we run into TB corruption.

If we get an HMI while in the guest, the current HMI handler invokes the
OPAL HMI handler before forcing the guest to exit. The guest exit path
then subtracts the guest TB offset from the current TB value, which may
already have been restored to the host value by the OPAL HMI handler.
This results in incorrect host and guest TB values.
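
The failure can be reproduced with a toy model of a single TB register.
This is a sketch under stated assumptions: time is frozen (the TB does
not advance between steps), and struct tb_model, guest_entry(),
opal_hmi_restore_host_tb() and guest_exit() are hypothetical stand-ins,
not kernel or OPAL interfaces:

```c
#include <stdint.h>

/* Hypothetical model of one core's TB register (time is frozen). */
struct tb_model {
	uint64_t tb;		/* value currently held in the TB register */
	uint64_t tb_offset;	/* guest TB offset */
};

static void guest_entry(struct tb_model *m)
{
	m->tb += m->tb_offset;	/* host TB -> guest TB */
}

/* Stand-in for the OPAL HMI handler repairing a TB error. */
static void opal_hmi_restore_host_tb(struct tb_model *m, uint64_t host_tb)
{
	m->tb = host_tb;	/* TB now holds the host value, not guest TB */
}

static void guest_exit(struct tb_model *m)
{
	m->tb -= m->tb_offset;	/* wrongly assumes TB still holds guest TB */
}
```

Calling guest_entry(), opal_hmi_restore_host_tb() and guest_exit() in that
order leaves the host TB short by exactly tb_offset.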

With split-core, things become more complex: the TB is also split, and
each subcore gets its own TB register. When an HMI handler fixes a TB
error and restores the TB value, it affects the TB values of all sibling
subcores on the same core. On a TB error, every thread in the core gets
an HMI. With the existing code, the individual threads call the OPAL HMI
handler independently, which can easily throw the TB out of sync if
guests are running on the subcores. Hence we need to coordinate all the
threads before making the OPAL HMI handler call followed by a TB resync.

This patch introduces a sibling subcore state structure, shared by all
threads in the core and referenced from each thread's paca, which records
whether each sibling subcore is in guest mode or host mode. An array
in_guest[] of size MAX_SUBCORE_PER_CORE (4) holds the state of each
subcore, indexed by subcore id. Only the primary thread entering/exiting
the guest is responsible for setting/clearing its designated array
element.
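
A minimal single-threaded sketch of the shared state and the
primary-thread updates. struct sibling_subcore_state mirrors the
structure added by this patch (with u8 spelled uint8_t), while
subcore_enter_guest(), subcore_exit_guest() and any_subcore_in_guest()
are hypothetical helpers for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SUBCORE_PER_CORE	4

/* Mirrors the patch's structure; one instance is shared per core. */
struct sibling_subcore_state {
	unsigned long	flags;
	uint8_t		in_guest[MAX_SUBCORE_PER_CORE];
};

/* Hypothetical: called only by a subcore's primary thread on guest entry. */
static void subcore_enter_guest(struct sibling_subcore_state *s, int subcore_id)
{
	assert(subcore_id >= 0 && subcore_id < MAX_SUBCORE_PER_CORE);
	s->in_guest[subcore_id] = 1;
}

/* Hypothetical: called only by a subcore's primary thread on guest exit. */
static void subcore_exit_guest(struct sibling_subcore_state *s, int subcore_id)
{
	assert(subcore_id >= 0 && subcore_id < MAX_SUBCORE_PER_CORE);
	s->in_guest[subcore_id] = 0;
}

/* True while any sibling subcore is still in guest mode. */
static int any_subcore_in_guest(const struct sibling_subcore_state *s)
{
	for (int i = 0; i < MAX_SUBCORE_PER_CORE; i++)
		if (s->in_guest[i])
			return 1;
	return 0;
}
```

Because each subcore owns exactly one element, primary threads never
write the same byte, so no read-modify-write of a shared word is needed.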

On a TB error, we get an HMI on every thread of the core. Upon HMI, this
patch now forces the guest to vacate the core/subcore. The primary thread
of each subcore then clears its in_guest[] element during the guest exit
path, just after the guest->host partition switch is complete.

All other threads, whether they have just exited the guest or were
already in the host, wait until every subcore has cleared its element.
Once all the subcores have done so, all threads call the OPAL HMI
handler.
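
The wait-then-call sequence can be sketched in user space with POSIX
threads standing in for hardware threads and C11 atomics standing in for
the shared in_guest[] array. Assumptions: two threads per subcore
(THREADS_PER_SUBCORE is hypothetical), every subcore starts in the guest,
and incrementing opal_calls stands in for the OPAL HMI handler call:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_SUBCORE_PER_CORE	4
#define THREADS_PER_SUBCORE	2	/* hypothetical split-core layout */

static atomic_bool in_guest[MAX_SUBCORE_PER_CORE];
static atomic_int opal_calls;	/* simulated OPAL HMI handler calls */

struct hmi_thread {
	int subcore_id;
	bool primary;
};

static bool any_in_guest(void)
{
	for (int i = 0; i < MAX_SUBCORE_PER_CORE; i++)
		if (atomic_load(&in_guest[i]))
			return true;
	return false;
}

static void *hmi_thread_fn(void *arg)
{
	const struct hmi_thread *t = arg;

	/* Only the primary thread clears its subcore's slot on guest exit. */
	if (t->primary)
		atomic_store(&in_guest[t->subcore_id], false);

	/* Analogue of wait_for_subcore_guest_exit(). */
	while (any_in_guest())
		sched_yield();	/* the kernel spins with cpu_relax() */

	atomic_fetch_add(&opal_calls, 1);	/* stand-in for the OPAL call */
	return NULL;
}

/* Put every subcore in the guest, deliver an HMI to every thread. */
static int simulate_core_hmi(void)
{
	enum { NTHREADS = MAX_SUBCORE_PER_CORE * THREADS_PER_SUBCORE };
	pthread_t tids[NTHREADS];
	struct hmi_thread args[NTHREADS];

	atomic_store(&opal_calls, 0);
	for (int i = 0; i < MAX_SUBCORE_PER_CORE; i++)
		atomic_store(&in_guest[i], true);

	/* Error handling omitted for brevity. */
	for (int i = 0; i < NTHREADS; i++) {
		args[i].subcore_id = i / THREADS_PER_SUBCORE;
		args[i].primary = (i % THREADS_PER_SUBCORE) == 0;
		pthread_create(&tids[i], NULL, hmi_thread_fn, &args[i]);
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tids[i], NULL);

	return atomic_load(&opal_calls);
}
```

Since no slot is set again during the simulation, a thread can only reach
the simulated OPAL call after every primary thread has cleared its slot.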

The OPAL HMI handler does not necessarily resync the TB value on every
HMI; it does so only for HMIs caused by TB errors and leaves the TB
untouched otherwise. Hence, to keep things simple, a primary thread
explicitly calls TB resync once per core immediately after the OPAL HMI
handler, instead of subtracting the guest offset from the TB. The TB
resync call restores the TB to the host value, so we can be sure of the
TB state whatever caused the HMI.

One of the primary threads exiting the guest takes on the responsibility
of calling TB resync. It uses the top bit (bit 63, CORE_TB_RESYNC_REQ_BIT)
of the subcore state flags bitmap to make the decision. The first primary
thread (among the subcores) that manages to set the bit calls the TB
resync; all other threads wait until the TB resync is complete. Once it
is complete, all threads proceed.
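
The election can be sketched with an atomic fetch-or on the flags word:
the caller that sees the bit clear in the old value wins. This sketch
assumes a 64-bit unsigned long (checked at compile time) and uses C11
atomics in place of the kernel's bitops; subcore_flags,
try_claim_tb_resync(), tb_resync_done() and tb_resync_pending() are
hypothetical names:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CORE_TB_RESYNC_REQ_BIT	63
#define RESYNC_MASK		(1UL << CORE_TB_RESYNC_REQ_BIT)

_Static_assert(sizeof(unsigned long) * CHAR_BIT > CORE_TB_RESYNC_REQ_BIT,
	       "unsigned long must be able to hold bit 63");

/* Hypothetical analogue of sibling_subcore_state.flags. */
static atomic_ulong subcore_flags;

/* Returns true for exactly one caller: the one that must do the resync. */
static bool try_claim_tb_resync(void)
{
	unsigned long old = atomic_fetch_or(&subcore_flags, RESYNC_MASK);

	return !(old & RESYNC_MASK);
}

/* Called by the winner once the TB has been resynced. */
static void tb_resync_done(void)
{
	atomic_fetch_and(&subcore_flags, ~RESYNC_MASK);
}

/* Other threads spin while this is true, like wait_for_tb_resync(). */
static bool tb_resync_pending(void)
{
	return (atomic_load(&subcore_flags) & RESYNC_MASK) != 0;
}
```

Only the first call to try_claim_tb_resync() returns true; the other
threads spin on tb_resync_pending() until the winner calls
tb_resync_done().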

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
parent 6dd06d15
+45 −0
/*
 * Hypervisor Maintenance Interrupt header file.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.
 *
 * Copyright 2015 IBM Corporation
 * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
 */

#ifndef __ASM_PPC64_HMI_H__
#define __ASM_PPC64_HMI_H__

#ifdef CONFIG_PPC_BOOK3S_64

#define	CORE_TB_RESYNC_REQ_BIT		63
#define MAX_SUBCORE_PER_CORE		4

/*
 * sibling_subcore_state structure is used to co-ordinate all threads
 * during HMI to avoid TB corruption. This structure is allocated once
 * per each core and shared by all threads on that core.
 */
struct sibling_subcore_state {
	unsigned long	flags;
	u8		in_guest[MAX_SUBCORE_PER_CORE];
};

extern void wait_for_subcore_guest_exit(void);
extern void wait_for_tb_resync(void);
#else
static inline void wait_for_subcore_guest_exit(void) { }
static inline void wait_for_tb_resync(void) { }
#endif
#endif /* __ASM_PPC64_HMI_H__ */
+6 −0
@@ -25,6 +25,7 @@
#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
#include <asm/kvm_book3s_asm.h>
#endif
+#include <asm/hmi.h>

register struct paca_struct *local_paca asm("r13");

@@ -181,6 +182,11 @@ struct paca_struct {
	 */
	u16 in_mce;
	u8 hmi_event_available;		 /* HMI event is available */
+	/*
+	 * Bitmap for sibling subcore status. See kvm/book3s_hv_ras.c for
+	 * more details
+	 */
+	struct sibling_subcore_state *sibling_subcore_state;
#endif

	/* Stuff for accurate time accounting */
+1 −1
@@ -41,7 +41,7 @@ obj-$(CONFIG_VDSO32) += vdso32/
obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_ppc970.o cpu_setup_pa6t.o
obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_power.o
-obj-$(CONFIG_PPC_BOOK3S_64)	+= mce.o mce_power.o
+obj-$(CONFIG_PPC_BOOK3S_64)	+= mce.o mce_power.o hmi.o
obj64-$(CONFIG_RELOCATABLE)	+= reloc_64.o
obj-$(CONFIG_PPC_BOOK3E_64)	+= exceptions-64e.o idle_book3e.o
obj-$(CONFIG_PPC64)		+= vdso64/
+3 −1
@@ -680,6 +680,8 @@ _GLOBAL(__replay_interrupt)
BEGIN_FTR_SECTION
	cmpwi	r3,0xe80
	beq	h_doorbell_common
+	cmpwi	r3,0xe60
+	beq	hmi_exception_common
FTR_SECTION_ELSE
	cmpwi	r3,0xa00
	beq	doorbell_super_common
@@ -1172,7 +1174,7 @@ fwnmi_data_area:

	.globl hmi_exception_early
hmi_exception_early:
-	EXCEPTION_PROLOG_1(PACA_EXGEN, NOTEST, 0xe60)
+	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, 0xe62)
	mr	r10,r1			/* Save r1			*/
	ld	r1,PACAEMERGSP(r13)	/* Use emergency stack		*/
	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
+56 −0
/*
 * Hypervisor Maintenance Interrupt (HMI) handling.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.
 *
 * Copyright 2015 IBM Corporation
 * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
 */

#undef DEBUG

#include <linux/types.h>
#include <linux/compiler.h>
#include <asm/paca.h>
#include <asm/hmi.h>

void wait_for_subcore_guest_exit(void)
{
	int i;

	/*
	 * NULL bitmap pointer indicates that KVM module hasn't
	 * been loaded yet and hence no guests are running.
	 * If no KVM is in use, no need to co-ordinate among threads
	 * as all of them will always be in host and no one is going
	 * to modify TB other than the opal hmi handler.
	 * Hence, just return from here.
	 */
	if (!local_paca->sibling_subcore_state)
		return;

	for (i = 0; i < MAX_SUBCORE_PER_CORE; i++)
		while (local_paca->sibling_subcore_state->in_guest[i])
			cpu_relax();
}

void wait_for_tb_resync(void)
{
	if (!local_paca->sibling_subcore_state)
		return;

	while (test_bit(CORE_TB_RESYNC_REQ_BIT,
				&local_paca->sibling_subcore_state->flags))
		cpu_relax();
}