Documentation/ia64/paravirt_ops.txt  (new file, 0 → 100644, +137 −0)

Paravirt_ops on IA64
====================
21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>


Introduction
------------
The aim of this documentation is to help with maintainability and to
encourage people to use paravirt_ops/IA64.

paravirt_ops (pv_ops for short) is a framework for virtualization support
in the Linux kernel on x86. Several approaches to virtualization support
were proposed, and paravirt_ops emerged as the winner. Meanwhile, several
IA64 virtualization technologies now exist, such as kvm/IA64 and xen/IA64,
along with many academic IA64 hypervisors, so it makes sense to add a
generic virtualization infrastructure to Linux/IA64.


What is paravirt_ops?
---------------------
paravirt_ops was developed on x86 as virtualization support via an API,
not an ABI. It allows each hypervisor to override, at the API level, the
operations which are important for hypervisors, and it allows a single
kernel binary to run in all supported execution environments, including
on a native machine.

Essentially, paravirt_ops is a set of function pointers which represent
operations corresponding to low-level sensitive instructions and to
high-level functionality in various areas. One significant difference
from the usual function pointer table is that it allows optimization via
binary patching: some of these operations are very performance sensitive,
and the indirect-call overhead is not negligible. With binary patching,
an indirect C function call can be transformed into a direct C function
call, or into in-place execution, to eliminate that overhead.

Thus, the operations of paravirt_ops are classified into three categories:

- simple indirect call
  These operations correspond to high-level functionality, so the
  overhead of an indirect call is not very important.

- indirect call which allows optimization with binary patch
  Usually these operations correspond to low-level instructions.
  They are called frequently and are performance critical, so the
  overhead matters a great deal.

- a set of macros for hand-written assembly code
  Hand-written assembly code (.S files) also needs paravirtualization,
  because it contains sensitive instructions, and some of its code paths
  are very performance critical.


The relation to the IA64 machine vector
---------------------------------------
Linux/IA64 has the IA64 machine vector functionality, which allows the
kernel to switch implementations (e.g. initialization, ipi, dma api...)
depending on the platform it is executing on. We can replace some
implementations very easily by defining a new machine vector, so another
approach to virtualization support would be to enhance the machine vector
functionality. The paravirt_ops approach was taken instead because:

- Virtualization support needs wider coverage than the machine vector
  provides, e.g. low-level instruction paravirtualization. It must be
  initialized very early, before platform detection.

- Virtualization support needs more functionality, such as binary
  patching. The calling overhead is probably not very large compared to
  the emulation overhead of virtualization; in the native case, however,
  the overhead should be eliminated completely. A single kernel binary
  should run in every environment, including native, and the overhead of
  paravirt_ops in the native environment should be as small as possible.

- For full virtualization technologies, e.g. KVM/IA64 or a Xen/IA64 HVM
  domain, the result would be (the emulated platform machine vector,
  probably dig) + (pv_ops). This means that the virtualization support
  layer should sit below the machine vector layer.

Possibly it might be better to move some function pointers from
paravirt_ops to the machine vector. In fact, the Xen domU case utilizes
both pv_ops and the machine vector.


IA64 paravirt_ops
-----------------
In this section, the concrete paravirt_ops will be discussed.
Because of the architectural differences between ia64 and x86, the
resulting set of functions is very different from the x86 pv_ops.

- C function pointer tables
  These are not very performance critical, so a simple C indirect
  function call is acceptable. The following structures are defined at
  this moment; for details see linux/include/asm-ia64/paravirt.h.

  - struct pv_info
    This structure describes the execution environment.
  - struct pv_init_ops
    This structure describes the various initialization hooks.
  - struct pv_iosapic_ops
    This structure describes hooks into iosapic operations.
  - struct pv_irq_ops
    This structure describes hooks into irq-related operations.
  - struct pv_time_ops
    This structure describes hooks into steal-time accounting.

- a set of indirect calls which need optimization
  Currently this class of functions corresponds to a subset of the IA64
  intrinsics. At this moment the optimization with binary patching is
  not implemented yet. struct pv_cpu_ops is defined; for details see
  linux/include/asm-ia64/paravirt_privop.h. Mostly these correspond
  1-to-1 to the ia64 intrinsics.
  Caveat: they are currently defined as C indirect function pointers,
  but in order to support binary patch optimization, they will be
  changed to use GCC extended inline assembly code.

- a set of macros for hand-written assembly code (.S files)
  For maintainability, the approach taken for .S files is to keep a
  single source file and compile it multiple times with different macro
  definitions. Each pv_ops instance must define those macros in order to
  compile. The important thing here is that sensitive but non-privileged
  instructions must be paravirtualized, and that some privileged
  instructions also need paravirtualization for reasonable performance.
  Developers who modify .S files must be aware of that. At this moment
  an easy checker is implemented to detect paravirtualization breakage,
  but it does not cover all the cases.

  Sometimes this set of macros is called pv_cpu_asm_ops, but there is no
  corresponding structure in the source code.
  Those macros mostly correspond 1:1 to a subset of the privileged
  instructions; see linux/include/asm-ia64/native/inst.h. Some functions
  written in assembly also need to be overridden, so each pv_ops
  instance has to define some additional macros. Again, see
  linux/include/asm-ia64/native/inst.h.

Those structures must be initialized very early, before start_kernel,
probably in head.S using multiple entry points or some other trick. For
the native-case implementation, see linux/arch/ia64/kernel/paravirt.c.


arch/ia64/Makefile  (+6 −0)

@@ -100,3 +100,9 @@ define archhelp
   echo '  boot		- Build vmlinux and bootloader for Ski simulator'
   echo '* unwcheck	- Check vmlinux for invalid unwind info'
 endef
+
+archprepare: make_nr_irqs_h FORCE
+PHONY += make_nr_irqs_h FORCE
+
+make_nr_irqs_h: FORCE
+	$(Q)$(MAKE) $(build)=arch/ia64/kernel include/asm-ia64/nr-irqs.h


arch/ia64/kernel/Makefile  (+44 −0)

@@ -36,6 +36,8 @@ obj-$(CONFIG_PCI_MSI)	+= msi_ia64.o
 mca_recovery-y			+= mca_drv.o mca_drv_asm.o
 obj-$(CONFIG_IA64_MC_ERR_INJECT)+= err_inject.o
 
+obj-$(CONFIG_PARAVIRT)		+= paravirt.o paravirtentry.o
+
 obj-$(CONFIG_IA64_ESI)		+= esi.o
 ifneq ($(CONFIG_IA64_ESI),)
 obj-y				+= esi_stub.o	# must be in kernel proper
@@ -70,3 +72,45 @@ $(obj)/gate-syms.o: $(obj)/gate.lds $(obj)/gate.o FORCE
 # We must build gate.so before we can assemble it.
 # Note: kbuild does not track this dependency due to usage of .incbin
 $(obj)/gate-data.o: $(obj)/gate.so
+
+# Calculate NR_IRQ = max(IA64_NATIVE_NR_IRQS, XEN_NR_IRQS, ...)
+# based on config
+define sed-y
+	"/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}"
+endef
+quiet_cmd_nr_irqs = GEN     $@
+define cmd_nr_irqs
+	(set -e; \
+	 echo "#ifndef __ASM_NR_IRQS_H__"; \
+	 echo "#define __ASM_NR_IRQS_H__"; \
+	 echo "/*"; \
+	 echo " * DO NOT MODIFY."; \
+	 echo " *"; \
+	 echo " * This file was generated by Kbuild"; \
+	 echo " *"; \
+	 echo " */"; \
+	 echo ""; \
+	 sed -ne $(sed-y) $<; \
+	 echo ""; \
+	 echo "#endif" ) > $@
+endef
+
+# We use internal kbuild rules to avoid the "is up to date" message from make
+arch/$(SRCARCH)/kernel/nr-irqs.s: $(srctree)/arch/$(SRCARCH)/kernel/nr-irqs.c \
+			$(wildcard $(srctree)/include/asm-ia64/*/irq.h)
+	$(Q)mkdir -p $(dir $@)
+	$(call if_changed_dep,cc_s_c)
+
+include/asm-ia64/nr-irqs.h: arch/$(SRCARCH)/kernel/nr-irqs.s
+	$(Q)mkdir -p $(dir $@)
+	$(call cmd,nr_irqs)
+clean-files += $(objtree)/include/asm-ia64/nr-irqs.h
+
+#
+# native ivt.S and entry.S
+#
+ASM_PARAVIRT_OBJS = ivt.o entry.o
+define paravirtualized_native
+AFLAGS_$(1) += -D__IA64_ASM_PARAVIRTUALIZED_NATIVE
+endef
+$(foreach obj,$(ASM_PARAVIRT_OBJS),$(eval $(call paravirtualized_native,$(obj))))


arch/ia64/kernel/entry.S  (+72 −43)

@@ -22,6 +22,11 @@
  *	Patrick O'Rourke	<orourke@missioncriticallinux.com>
  *	11/07/2000
  */
+/*
+ * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *                    pv_ops.
+ */
 /*
  * Global (preserved) predicate usage on syscall entry/exit path:
  *
@@ -45,6 +50,7 @@
 
 #include "minstate.h"
 
+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 	/*
 	 * execve() is special because in case of success, we need to
 	 * setup a null register window frame.
@@ -173,6 +179,7 @@ GLOBAL_ENTRY(sys_clone)
 	mov rp=loc0
 	br.ret.sptk.many rp
 END(sys_clone)
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
 
 /*
  * prev_task <- ia64_switch_to(struct task_struct *next)
@@ -180,7 +187,7 @@ END(sys_clone)
  * called.  The code starting at .map relies on this.
 * The rest of the code
 * doesn't care about the interrupt masking status.
 */
-GLOBAL_ENTRY(ia64_switch_to)
+GLOBAL_ENTRY(__paravirt_switch_to)
 	.prologue
 	alloc r16=ar.pfs,1,0,0,0
 	DO_SAVE_SWITCH_STACK
@@ -204,7 +211,7 @@ GLOBAL_ENTRY(ia64_switch_to)
 	;;
 .done:
 	ld8 sp=[r21]			// load kernel stack pointer of new task
-	mov IA64_KR(CURRENT)=in0	// update "current" application register
+	MOV_TO_KR(CURRENT, in0, r8, r9)	// update "current" application register
 	mov r8=r13			// return pointer to previously running task
 	mov r13=in0			// set "current" pointer
 	;;
@@ -216,26 +223,25 @@ GLOBAL_ENTRY(ia64_switch_to)
 	br.ret.sptk.many rp		// boogie on out in new context
 
 .map:
-	rsm psr.ic			// interrupts (psr.i) are already disabled here
+	RSM_PSR_IC(r25)			// interrupts (psr.i) are already disabled here
 	movl r25=PAGE_KERNEL
 	;;
 	srlz.d
 	or r23=r25,r20			// construct PA | page properties
 	mov r25=IA64_GRANULE_SHIFT<<2
 	;;
-	mov cr.itir=r25
-	mov cr.ifa=in0			// VA of next task...
+	MOV_TO_ITIR(p0, r25, r8)
+	MOV_TO_IFA(in0, r8)		// VA of next task...
 	;;
 	mov r25=IA64_TR_CURRENT_STACK
-	mov IA64_KR(CURRENT_STACK)=r26	// remember last page we mapped...
+	MOV_TO_KR(CURRENT_STACK, r26, r8, r9)	// remember last page we mapped...
 	;;
 	itr.d dtr[r25]=r23		// wire in new mapping...
-	ssm psr.ic			// reenable the psr.ic bit
-	;;
-	srlz.d
+	SSM_PSR_IC_AND_SRLZ_D(r8, r9)	// reenable the psr.ic bit
 	br.cond.sptk .done
-END(ia64_switch_to)
+END(__paravirt_switch_to)
 
+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 /*
  * Note that interrupts are enabled during save_switch_stack and load_switch_stack.
 * This
 * means that we may get an interrupt with "sp" pointing to the new kernel stack while
@@ -375,7 +381,7 @@ END(save_switch_stack)
  *	- b7 holds address to return to
  *	- must not touch r8-r11
  */
-ENTRY(load_switch_stack)
+GLOBAL_ENTRY(load_switch_stack)
 	.prologue
 	.altrp b7
@@ -571,7 +577,7 @@ GLOBAL_ENTRY(ia64_trace_syscall)
 .ret3:
 (pUStk)	cmp.eq.unc p6,p0=r0,r0			// p6 <- pUStk
 (pUStk)	rsm psr.i				// disable interrupts
-	br.cond.sptk .work_pending_syscall_end
+	br.cond.sptk ia64_work_pending_syscall_end
 
 strace_error:
 	ld8 r3=[r2]				// load pt_regs.r8
@@ -636,8 +642,17 @@ GLOBAL_ENTRY(ia64_ret_from_syscall)
 	adds r2=PT(R8)+16,sp			// r2 = &pt_regs.r8
 	mov r10=r0				// clear error indication in r10
 (p7)	br.cond.spnt handle_syscall_error	// handle potential syscall failure
+#ifdef CONFIG_PARAVIRT
+	;;
+	br.cond.sptk.few ia64_leave_syscall
+	;;
+#endif /* CONFIG_PARAVIRT */
 END(ia64_ret_from_syscall)
+#ifndef CONFIG_PARAVIRT
 	// fall through
+#endif
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
+
 /*
  * ia64_leave_syscall(): Same as ia64_leave_kernel, except that it doesn't
  * need to switch to bank 0 and doesn't restore the scratch registers.
@@ -682,7 +697,7 @@ END(ia64_ret_from_syscall)
  *	      ar.csd: cleared
  *	      ar.ssd: cleared
  */
-ENTRY(ia64_leave_syscall)
+GLOBAL_ENTRY(__paravirt_leave_syscall)
 	PT_REGS_UNWIND_INFO(0)
 	/*
 	 * work.need_resched etc. mustn't get changed by this CPU before it returns to
@@ -692,11 +707,11 @@ ENTRY(ia64_leave_syscall)
 	 * extra work.  We always check for extra work when returning to user-level.
 	 * With CONFIG_PREEMPT, we also check for extra work when the preempt_count
 	 * is 0.  After extra work processing has been completed, execution
-	 * resumes at .work_processed_syscall with p6 set to 1 if the extra-work-check
+	 * resumes at ia64_work_processed_syscall with p6 set to 1 if the extra-work-check
 	 * needs to be redone.
 	 */
 #ifdef CONFIG_PREEMPT
-	rsm psr.i				// disable interrupts
+	RSM_PSR_I(p0, r2, r18)			// disable interrupts
 	cmp.eq pLvSys,p0=r0,r0			// pLvSys=1: leave from syscall
 (pKStk) adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
 	;;
@@ -706,11 +721,12 @@ ENTRY(ia64_leave_syscall)
 	;;
 	cmp.eq p6,p0=r21,r0		// p6 <- pUStk || (preempt_count == 0)
 #else /* !CONFIG_PREEMPT */
-(pUStk)	rsm psr.i
+	RSM_PSR_I(pUStk, r2, r18)
 	cmp.eq pLvSys,p0=r0,r0		// pLvSys=1: leave from syscall
 (pUStk)	cmp.eq.unc p6,p0=r0,r0		// p6 <- pUStk
 #endif
-.work_processed_syscall:
+.global __paravirt_work_processed_syscall;
+__paravirt_work_processed_syscall:
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	adds r2=PT(LOADRS)+16,r12
 (pUStk)	mov.m r22=ar.itc			// fetch time at leave
@@ -744,7 +760,7 @@ ENTRY(ia64_leave_syscall)
 (pNonSys) break 0		// bug check: we shouldn't be here if pNonSys is TRUE!
 	;;
 	invala			// M0|1 invalidate ALAT
-	rsm psr.i | psr.ic	// M2   turn off interrupts and interruption collection
+	RSM_PSR_I_IC(r28, r29, r30)	// M2 turn off interrupts and interruption collection
 	cmp.eq p9,p0=r0,r0	// A    set p9 to indicate that we should restore cr.ifs
 	ld8 r29=[r2],16		// M0|1 load cr.ipsr
@@ -765,7 +781,7 @@
 	;;
 #endif
 	ld8 r26=[r2],PT(B0)-PT(AR_PFS)	// M0|1 load ar.pfs
-(pKStk)	mov r22=psr			// M2   read PSR now that interrupts are disabled
+	MOV_FROM_PSR(pKStk, r22, r21)	// M2   read PSR now that interrupts are disabled
 	nop 0
 	;;
 	ld8 r21=[r2],PT(AR_RNAT)-PT(B0)	// M0|1 load b0
@@ -798,7 +814,7 @@
 	srlz.d			// M0   ensure interruption collection is off (for cover)
 	shr.u r18=r19,16	// I0|1 get byte size of existing "dirty" partition
-	cover			// B    add current frame into dirty partition & set cr.ifs
+	COVER			// B    add current frame into dirty partition & set cr.ifs
 	;;
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	mov r19=ar.bsp			// M2   get new backing store pointer
@@ -823,8 +839,9 @@
 	mov.m ar.ssd=r0		// M2   clear ar.ssd
 	mov f11=f0		// F    clear f11
 	br.cond.sptk.many rbs_switch	// B
-END(ia64_leave_syscall)
+END(__paravirt_leave_syscall)
 
+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 #ifdef CONFIG_IA32_SUPPORT
 GLOBAL_ENTRY(ia64_ret_from_ia32_execve)
 	PT_REGS_UNWIND_INFO(0)
@@ -835,10 +852,20 @@ GLOBAL_ENTRY(ia64_ret_from_ia32_execve)
 	st8.spill [r2]=r8	// store return value in slot for r8 and set unat bit
 	.mem.offset 8,0
 	st8.spill [r3]=r0	// clear error indication in slot for r10 and set unat bit
+#ifdef CONFIG_PARAVIRT
+	;;
+	// don't fall through, ia64_leave_kernel may be #define'd
+	br.cond.sptk.few ia64_leave_kernel
+	;;
+#endif /* CONFIG_PARAVIRT */
 END(ia64_ret_from_ia32_execve)
+#ifndef CONFIG_PARAVIRT
 	// fall through
+#endif
 #endif /* CONFIG_IA32_SUPPORT */
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
 
-GLOBAL_ENTRY(ia64_leave_kernel)
+GLOBAL_ENTRY(__paravirt_leave_kernel)
 	PT_REGS_UNWIND_INFO(0)
 	/*
 	 * work.need_resched etc. mustn't get changed by this CPU before it returns to
@@ -852,7 +879,7 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	 * needs to be redone.
 	 */
 #ifdef CONFIG_PREEMPT
-	rsm psr.i		// disable interrupts
+	RSM_PSR_I(p0, r17, r31)	// disable interrupts
 	cmp.eq p0,pLvSys=r0,r0	// pLvSys=0: leave from kernel
 (pKStk)	adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
 	;;
@@ -862,7 +889,7 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	;;
 	cmp.eq p6,p0=r21,r0	// p6 <- pUStk || (preempt_count == 0)
 #else
-(pUStk)	rsm psr.i
+	RSM_PSR_I(pUStk, r17, r31)
 	cmp.eq p0,pLvSys=r0,r0	// pLvSys=0: leave from kernel
 (pUStk)	cmp.eq.unc p6,p0=r0,r0	// p6 <- pUStk
 #endif
@@ -910,7 +937,7 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	mov ar.csd=r30
 	mov ar.ssd=r31
 	;;
-	rsm psr.i | psr.ic	// initiate turning off of interrupt and interruption collection
+	RSM_PSR_I_IC(r23, r22, r25)	// initiate turning off of interrupt and interruption collection
 	invala			// invalidate ALAT
 	;;
 	ld8.fill r22=[r2],24
@@ -942,7 +969,7 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	mov ar.ccv=r15
 	;;
 	ldf.fill f11=[r2]
-	bsw.0			// switch back to bank 0 (no stop bit required beforehand...)
+	BSW_0(r2, r3, r15)	// switch back to bank 0 (no stop bit required beforehand...)
 	;;
 (pUStk)	mov r18=IA64_KR(CURRENT)// M2 (12 cycle read latency)
 	adds r16=PT(CR_IPSR)+16,r12
@@ -950,12 +977,12 @@
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	.pred.rel.mutex pUStk,pKStk
-(pKStk)	mov r22=psr		// M2 read PSR now that interrupts are disabled
+	MOV_FROM_PSR(pKStk, r22, r29)	// M2 read PSR now that interrupts are disabled
 (pUStk)	mov.m r22=ar.itc	// M  fetch time at leave
 	nop.i 0
 	;;
 #else
-(pKStk)	mov r22=psr		// M2 read PSR now that interrupts are disabled
+	MOV_FROM_PSR(pKStk, r22, r29)	// M2 read PSR now that interrupts are disabled
 	nop.i 0
 	nop.i 0
 	;;
@@ -1027,7 +1054,7 @@ GLOBAL_ENTRY(ia64_leave_kernel)
 	 * NOTE: alloc, loadrs, and cover can't be predicated.
 	 */
 (pNonSys) br.cond.dpnt dont_preserve_current_frame
-	cover			// add current frame into dirty partition and set cr.ifs
+	COVER			// add current frame into dirty partition and set cr.ifs
 	;;
 	mov r19=ar.bsp		// get new backing store pointer
 rbs_switch:
@@ -1130,16 +1157,16 @@ skip_rbs_switch:
 (pKStk)	dep r29=r22,r29,21,1	// I0 update ipsr.pp with psr.pp
 (pLvSys)mov r16=r0		// A  clear r16 for leave_syscall, no-op otherwise
 	;;
-	mov cr.ipsr=r29		// M2
+	MOV_TO_IPSR(p0, r29, r25)	// M2
 	mov ar.pfs=r26		// I0
 (pLvSys)mov r17=r0		// A  clear r17 for leave_syscall, no-op otherwise
-(p9)	mov cr.ifs=r30		// M2
+	MOV_TO_IFS(p9, r30, r25)	// M2
 	mov b0=r21		// I0
 (pLvSys)mov r18=r0		// A  clear r18 for leave_syscall, no-op otherwise
 	mov ar.fpsr=r20		// M2
-	mov cr.iip=r28		// M2
+	MOV_TO_IIP(r28, r25)	// M2
 	nop 0
 	;;
 (pUStk)	mov ar.rnat=r24		// M2 must happen with RSE in lazy mode
@@ -1148,7 +1175,7 @@ skip_rbs_switch:
 	mov ar.rsc=r27		// M2
 	mov pr=r31,-1		// I0
-	rfi			// B
+	RFI			// B
 
 	/*
 	 * On entry:
@@ -1174,35 +1201,36 @@ skip_rbs_switch:
 	;;
 (pKStk)	st4 [r20]=r21
 #endif
-	ssm psr.i		// enable interrupts
+	SSM_PSR_I(p0, p6, r2)	// enable interrupts
 	br.call.spnt.many rp=schedule
 .ret9:	cmp.eq p6,p0=r0,r0	// p6 <- 1 (re-check)
-	rsm psr.i		// disable interrupts
+	RSM_PSR_I(p0, r2, r20)	// disable interrupts
 	;;
 #ifdef CONFIG_PREEMPT
 (pKStk)	adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
 	;;
 (pKStk)	st4 [r20]=r0		// preempt_count() <- 0
 #endif
-(pLvSys)br.cond.sptk.few .work_pending_syscall_end
+(pLvSys)br.cond.sptk.few __paravirt_pending_syscall_end
 	br.cond.sptk.many .work_processed_kernel
 
 .notify:
 (pUStk)	br.call.spnt.many rp=notify_resume_user
 .ret10:	cmp.ne p6,p0=r0,r0	// p6 <- 0 (don't re-check)
-(pLvSys)br.cond.sptk.few .work_pending_syscall_end
+(pLvSys)br.cond.sptk.few __paravirt_pending_syscall_end
 	br.cond.sptk.many .work_processed_kernel
 
-.work_pending_syscall_end:
+.global __paravirt_pending_syscall_end;
+__paravirt_pending_syscall_end:
 	adds r2=PT(R8)+16,r12
 	adds r3=PT(R10)+16,r12
 	;;
 	ld8 r8=[r2]
 	ld8 r10=[r3]
-	br.cond.sptk.many .work_processed_syscall
-END(ia64_leave_kernel)
+	br.cond.sptk.many __paravirt_work_processed_syscall_target
+END(__paravirt_leave_kernel)
 
+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 ENTRY(handle_syscall_error)
 	/*
 	 * Some system calls (e.g., ptrace, mmap) can return arbitrary values which could
@@ -1244,7 +1272,7 @@ END(ia64_invoke_schedule_tail)
 	 * We declare 8 input registers so the system call args get preserved,
 	 * in case we need to restart a system call.
 	 */
-ENTRY(notify_resume_user)
+GLOBAL_ENTRY(notify_resume_user)
 	.prologue ASM_UNW_PRLG_RP|ASM_UNW_PRLG_PFS, ASM_UNW_PRLG_GRSAVE(8)
 	alloc loc1=ar.pfs,8,2,3,0 // preserve all eight input regs in case of syscall restart!
 	mov r9=ar.unat
@@ -1306,7 +1334,7 @@ ENTRY(sys_rt_sigreturn)
 	adds sp=16,sp
 	;;
 	ld8 r9=[sp]				// load new ar.unat
-	mov.sptk b7=r8,ia64_leave_kernel
+	mov.sptk b7=r8,ia64_native_leave_kernel
 	;;
 	mov ar.unat=r9
 	br.many b7
@@ -1665,3 +1693,4 @@ sys_call_table:
 	data8 sys_timerfd_gettime
 	.org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */


arch/ia64/kernel/head.S  (+41 −0)

@@ -26,11 +26,14 @@
 #include <asm/mmu_context.h>
 #include <asm/asm-offsets.h>
 #include <asm/pal.h>
+#include <asm/paravirt.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
 #include <asm/mca_asm.h>
+#include <linux/init.h>
+#include <linux/linkage.h>
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define SAL_PSR_BITS_TO_SET				\
@@ -367,6 +370,44 @@ start_ap:
 	;;
 (isBP)	st8 [r2]=r28		// save the address of the boot param area passed by the bootloader
 
+#ifdef CONFIG_PARAVIRT
+	movl r14=hypervisor_setup_hooks
+	movl r15=hypervisor_type
+	mov r16=num_hypervisor_hooks
+	;;
+	ld8 r2=[r15]
+	;;
+	cmp.ltu p7,p0=r2,r16	// array size check
+	shladd r8=r2,3,r14
+	;;
+(p7)	ld8 r9=[r8]
+	;;
+(p7)	mov b1=r9
+(p7)	cmp.ne.unc p7,p0=r9,r0	// no actual branch to NULL
+	;;
+(p7)	br.call.sptk.many rp=b1
+
+	__INITDATA
+
+default_setup_hook = 0		// Currently nothing needs to be done.
+
+	.weak xen_setup_hook
+
+	.global hypervisor_type
+hypervisor_type:
+	data8 PARAVIRT_HYPERVISOR_TYPE_DEFAULT
+
+	// must have the same order with PARAVIRT_HYPERVISOR_TYPE_xxx
+hypervisor_setup_hooks:
+	data8 default_setup_hook
+	data8 xen_setup_hook
+num_hypervisor_hooks = (. - hypervisor_setup_hooks) / 8
+	.previous
+#endif
+
 #ifdef CONFIG_SMP
 (isAP)	br.call.sptk.many rp=start_secondary
 .ret0:
Documentation/ia64/paravirt_ops.txt 0 → 100644 +137 −0 Original line number Diff line number Diff line Paravirt_ops on IA64 ==================== 21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp> Introduction ------------ The aim of this documentation is to help with maintainability and/or to encourage people to use paravirt_ops/IA64. paravirt_ops (pv_ops in short) is a way for virtualization support of Linux kernel on x86. Several ways for virtualization support were proposed, paravirt_ops is the winner. On the other hand, now there are also several IA64 virtualization technologies like kvm/IA64, xen/IA64 and many other academic IA64 hypervisors so that it is good to add generic virtualization infrastructure on Linux/IA64. What is paravirt_ops? --------------------- It has been developed on x86 as virtualization support via API, not ABI. It allows each hypervisor to override operations which are important for hypervisors at API level. And it allows a single kernel binary to run on all supported execution environments including native machine. Essentially paravirt_ops is a set of function pointers which represent operations corresponding to low level sensitive instructions and high level functionalities in various area. But one significant difference from usual function pointer table is that it allows optimization with binary patch. It is because some of these operations are very performance sensitive and indirect call overhead is not negligible. With binary patch, indirect C function call can be transformed into direct C function call or in-place execution to eliminate the overhead. Thus, operations of paravirt_ops are classified into three categories. - simple indirect call These operations correspond to high level functionality so that the overhead of indirect call isn't very important. - indirect call which allows optimization with binary patch Usually these operations correspond to low level instructions. They are called frequently and performance critical. 
So the overhead is very important. - a set of macros for hand written assembly code Hand written assembly codes (.S files) also need paravirtualization because they include sensitive instructions or some of code paths in them are very performance critical. The relation to the IA64 machine vector --------------------------------------- Linux/IA64 has the IA64 machine vector functionality which allows the kernel to switch implementations (e.g. initialization, ipi, dma api...) depending on executing platform. We can replace some implementations very easily defining a new machine vector. Thus another approach for virtualization support would be enhancing the machine vector functionality. But paravirt_ops approach was taken because - virtualization support needs wider support than machine vector does. e.g. low level instruction paravirtualization. It must be initialized very early before platform detection. - virtualization support needs more functionality like binary patch. Probably the calling overhead might not be very large compared to the emulation overhead of virtualization. However in the native case, the overhead should be eliminated completely. A single kernel binary should run on each environment including native, and the overhead of paravirt_ops on native environment should be as small as possible. - for full virtualization technology, e.g. KVM/IA64 or Xen/IA64 HVM domain, the result would be (the emulated platform machine vector. probably dig) + (pv_ops). This means that the virtualization support layer should be under the machine vector layer. Possibly it might be better to move some function pointers from paravirt_ops to machine vector. In fact, Xen domU case utilizes both pv_ops and machine vector. IA64 paravirt_ops ----------------- In this section, the concrete paravirt_ops will be discussed. Because of the architecture difference between ia64 and x86, the resulting set of functions is very different from x86 pv_ops. 
- C function pointer tables They are not very performance critical so that simple C indirect function call is acceptable. The following structures are defined at this moment. For details see linux/include/asm-ia64/paravirt.h - struct pv_info This structure describes the execution environment. - struct pv_init_ops This structure describes the various initialization hooks. - struct pv_iosapic_ops This structure describes hooks to iosapic operations. - struct pv_irq_ops This structure describes hooks to irq related operations - struct pv_time_op This structure describes hooks to steal time accounting. - a set of indirect calls which need optimization Currently this class of functions correspond to a subset of IA64 intrinsics. At this moment the optimization with binary patch isn't implemented yet. struct pv_cpu_op is defined. For details see linux/include/asm-ia64/paravirt_privop.h Mostly they correspond to ia64 intrinsics 1-to-1. Caveat: Now they are defined as C indirect function pointers, but in order to support binary patch optimization, they will be changed using GCC extended inline assembly code. - a set of macros for hand written assembly code (.S files) For maintenance purpose, the taken approach for .S files is single source code and compile multiple times with different macros definitions. Each pv_ops instance must define those macros to compile. The important thing here is that sensitive, but non-privileged instructions must be paravirtualized and that some privileged instructions also need paravirtualization for reasonable performance. Developers who modify .S files must be aware of that. At this moment an easy checker is implemented to detect paravirtualization breakage. But it doesn't cover all the cases. Sometimes this set of macros is called pv_cpu_asm_op. But there is no corresponding structure in the source code. Those macros mostly 1:1 correspond to a subset of privileged instructions. See linux/include/asm-ia64/native/inst.h. 
And some functions written in assembly also need to be overrided so that each pv_ops instance have to define some macros. Again see linux/include/asm-ia64/native/inst.h. Those structures must be initialized very early before start_kernel. Probably initialized in head.S using multi entry point or some other trick. For native case implementation see linux/arch/ia64/kernel/paravirt.c.
arch/ia64/Makefile +6 −0 Original line number Diff line number Diff line Loading @@ -100,3 +100,9 @@ define archhelp echo ' boot - Build vmlinux and bootloader for Ski simulator' echo '* unwcheck - Check vmlinux for invalid unwind info' endef archprepare: make_nr_irqs_h FORCE PHONY += make_nr_irqs_h FORCE make_nr_irqs_h: FORCE $(Q)$(MAKE) $(build)=arch/ia64/kernel include/asm-ia64/nr-irqs.h
arch/ia64/kernel/Makefile +44 −0 Original line number Diff line number Diff line Loading @@ -36,6 +36,8 @@ obj-$(CONFIG_PCI_MSI) += msi_ia64.o mca_recovery-y += mca_drv.o mca_drv_asm.o obj-$(CONFIG_IA64_MC_ERR_INJECT)+= err_inject.o obj-$(CONFIG_PARAVIRT) += paravirt.o paravirtentry.o obj-$(CONFIG_IA64_ESI) += esi.o ifneq ($(CONFIG_IA64_ESI),) obj-y += esi_stub.o # must be in kernel proper Loading Loading @@ -70,3 +72,45 @@ $(obj)/gate-syms.o: $(obj)/gate.lds $(obj)/gate.o FORCE # We must build gate.so before we can assemble it. # Note: kbuild does not track this dependency due to usage of .incbin $(obj)/gate-data.o: $(obj)/gate.so # Calculate NR_IRQ = max(IA64_NATIVE_NR_IRQS, XEN_NR_IRQS, ...) based on config define sed-y "/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}" endef quiet_cmd_nr_irqs = GEN $@ define cmd_nr_irqs (set -e; \ echo "#ifndef __ASM_NR_IRQS_H__"; \ echo "#define __ASM_NR_IRQS_H__"; \ echo "/*"; \ echo " * DO NOT MODIFY."; \ echo " *"; \ echo " * This file was generated by Kbuild"; \ echo " *"; \ echo " */"; \ echo ""; \ sed -ne $(sed-y) $<; \ echo ""; \ echo "#endif" ) > $@ endef # We use internal kbuild rules to avoid the "is up to date" message from make arch/$(SRCARCH)/kernel/nr-irqs.s: $(srctree)/arch/$(SRCARCH)/kernel/nr-irqs.c \ $(wildcard $(srctree)/include/asm-ia64/*/irq.h) $(Q)mkdir -p $(dir $@) $(call if_changed_dep,cc_s_c) include/asm-ia64/nr-irqs.h: arch/$(SRCARCH)/kernel/nr-irqs.s $(Q)mkdir -p $(dir $@) $(call cmd,nr_irqs) clean-files += $(objtree)/include/asm-ia64/nr-irqs.h # # native ivt.S and entry.S # ASM_PARAVIRT_OBJS = ivt.o entry.o define paravirtualized_native AFLAGS_$(1) += -D__IA64_ASM_PARAVIRTUALIZED_NATIVE endef $(foreach obj,$(ASM_PARAVIRT_OBJS),$(eval $(call paravirtualized_native,$(obj))))
arch/ia64/kernel/entry.S  (+72 −43)

@@ -22,6 +22,11 @@
  *	Patrick O'Rourke	<orourke@missioncriticallinux.com>
  *	11/07/2000
  */
+/*
+ * Copyright (c) 2008 Isaku Yamahata <yamahata at valinux co jp>
+ *                    VA Linux Systems Japan K.K.
+ *                    pv_ops.
+ */
 /*
  * Global (preserved) predicate usage on syscall entry/exit path:
  *
@@ -45,6 +50,7 @@
 #include "minstate.h"

+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 /*
  * execve() is special because in case of success, we need to
  * setup a null register window frame.
@@ -173,6 +179,7 @@ GLOBAL_ENTRY(sys_clone)
 	mov rp=loc0
 	br.ret.sptk.many rp
 END(sys_clone)
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */

 /*
  * prev_task <- ia64_switch_to(struct task_struct *next)
@@ -180,7 +187,7 @@ END(sys_clone)
  * called.  The code starting at .map relies on this.  The rest of the code
  * doesn't care about the interrupt masking status.
  */
-GLOBAL_ENTRY(ia64_switch_to)
+GLOBAL_ENTRY(__paravirt_switch_to)
 	.prologue
 	alloc r16=ar.pfs,1,0,0,0
 	DO_SAVE_SWITCH_STACK
@@ -204,7 +211,7 @@
 	;;
 .done:
 	ld8 sp=[r21]			// load kernel stack pointer of new task
-	mov IA64_KR(CURRENT)=in0	// update "current" application register
+	MOV_TO_KR(CURRENT, in0, r8, r9)	// update "current" application register
 	mov r8=r13			// return pointer to previously running task
 	mov r13=in0			// set "current" pointer
 	;;
@@ -216,26 +223,25 @@
 	br.ret.sptk.many rp		// boogie on out in new context

 .map:
-	rsm psr.ic			// interrupts (psr.i) are already disabled here
+	RSM_PSR_IC(r25)			// interrupts (psr.i) are already disabled here
 	movl r25=PAGE_KERNEL
 	;;
 	srlz.d
 	or r23=r25,r20			// construct PA | page properties
 	mov r25=IA64_GRANULE_SHIFT<<2
 	;;
-	mov cr.itir=r25
-	mov cr.ifa=in0			// VA of next task...
+	MOV_TO_ITIR(p0, r25, r8)
+	MOV_TO_IFA(in0, r8)		// VA of next task...
 	;;
 	mov r25=IA64_TR_CURRENT_STACK
-	mov IA64_KR(CURRENT_STACK)=r26	// remember last page we mapped...
+	MOV_TO_KR(CURRENT_STACK, r26, r8, r9)	// remember last page we mapped...
 	;;
 	itr.d dtr[r25]=r23		// wire in new mapping...
-	ssm psr.ic			// reenable the psr.ic bit
-	;;
-	srlz.d
+	SSM_PSR_IC_AND_SRLZ_D(r8, r9)	// reenable the psr.ic bit
 	br.cond.sptk .done
-END(ia64_switch_to)
+END(__paravirt_switch_to)

+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 /*
  * Note that interrupts are enabled during save_switch_stack and load_switch_stack.  This
  * means that we may get an interrupt with "sp" pointing to the new kernel stack while
@@ -375,7 +381,7 @@ END(save_switch_stack)
  *	- b7 holds address to return to
  *	- must not touch r8-r11
  */
-ENTRY(load_switch_stack)
+GLOBAL_ENTRY(load_switch_stack)
 	.prologue
 	.altrp b7
@@ -571,7 +577,7 @@ GLOBAL_ENTRY(ia64_trace_syscall)
 .ret3:
 (pUStk)	cmp.eq.unc p6,p0=r0,r0			// p6 <- pUStk
 (pUStk)	rsm psr.i				// disable interrupts
-	br.cond.sptk .work_pending_syscall_end
+	br.cond.sptk ia64_work_pending_syscall_end

 strace_error:
 	ld8 r3=[r2]				// load pt_regs.r8
@@ -636,8 +642,17 @@ GLOBAL_ENTRY(ia64_ret_from_syscall)
 	adds r2=PT(R8)+16,sp			// r2 = &pt_regs.r8
 	mov r10=r0				// clear error indication in r10
 (p7)	br.cond.spnt handle_syscall_error	// handle potential syscall failure
+#ifdef CONFIG_PARAVIRT
+	;;
+	br.cond.sptk.few ia64_leave_syscall
+	;;
+#endif /* CONFIG_PARAVIRT */
 END(ia64_ret_from_syscall)
+#ifndef CONFIG_PARAVIRT
 	// fall through
+#endif
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */

 /*
  * ia64_leave_syscall(): Same as ia64_leave_kernel, except that it doesn't
  * need to switch to bank 0 and doesn't restore the scratch registers.
@@ -682,7 +697,7 @@ END(ia64_ret_from_syscall)
  *	      ar.csd: cleared
  *	      ar.ssd: cleared
  */
-ENTRY(ia64_leave_syscall)
+GLOBAL_ENTRY(__paravirt_leave_syscall)
 	PT_REGS_UNWIND_INFO(0)
 	/*
 	 * work.need_resched etc. mustn't get changed by this CPU before it returns to
@@ -692,11 +707,11 @@
 	 * extra work.  We always check for extra work when returning to user-level.
 	 * With CONFIG_PREEMPT, we also check for extra work when the preempt_count
 	 * is 0.  After extra work processing has been completed, execution
-	 * resumes at .work_processed_syscall with p6 set to 1 if the extra-work-check
+	 * resumes at ia64_work_processed_syscall with p6 set to 1 if the extra-work-check
 	 * needs to be redone.
 	 */
 #ifdef CONFIG_PREEMPT
-	rsm psr.i				// disable interrupts
+	RSM_PSR_I(p0, r2, r18)			// disable interrupts
 	cmp.eq pLvSys,p0=r0,r0			// pLvSys=1: leave from syscall
 (pKStk)	adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
 	;;
@@ -706,11 +721,12 @@
 	;;
 	cmp.eq p6,p0=r21,r0		// p6 <- pUStk || (preempt_count == 0)
 #else /* !CONFIG_PREEMPT */
-(pUStk)	rsm psr.i
+	RSM_PSR_I(pUStk, r2, r18)
 	cmp.eq pLvSys,p0=r0,r0		// pLvSys=1: leave from syscall
 (pUStk)	cmp.eq.unc p6,p0=r0,r0		// p6 <- pUStk
 #endif
-.work_processed_syscall:
+.global __paravirt_work_processed_syscall;
+__paravirt_work_processed_syscall:
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	adds r2=PT(LOADRS)+16,r12
 (pUStk)	mov.m r22=ar.itc			// fetch time at leave
@@ -744,7 +760,7 @@
 (pNonSys) break 0		// bug check: we shouldn't be here if pNonSys is TRUE!
 	;;
 	invala			// M0|1 invalidate ALAT
-	rsm psr.i | psr.ic	// M2   turn off interrupts and interruption collection
+	RSM_PSR_I_IC(r28, r29, r30)	// M2 turn off interrupts and interruption collection
 	cmp.eq p9,p0=r0,r0	// A    set p9 to indicate that we should restore cr.ifs

 	ld8 r29=[r2],16		// M0|1 load cr.ipsr
@@ -765,7 +781,7 @@
 	;;
 #endif
 	ld8 r26=[r2],PT(B0)-PT(AR_PFS)	// M0|1 load ar.pfs
-(pKStk)	mov r22=psr			// M2   read PSR now that interrupts are disabled
+	MOV_FROM_PSR(pKStk, r22, r21)	// M2   read PSR now that interrupts are disabled
 	nop 0
 	;;
 	ld8 r21=[r2],PT(AR_RNAT)-PT(B0)	// M0|1 load b0
@@ -798,7 +814,7 @@
 	srlz.d			// M0   ensure interruption collection is off (for cover)
 	shr.u r18=r19,16	// I0|1 get byte size of existing "dirty" partition
-	cover			// B    add current frame into dirty partition & set cr.ifs
+	COVER			// B    add current frame into dirty partition & set cr.ifs
 	;;
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	mov r19=ar.bsp		// M2   get new backing store pointer
@@ -823,8 +839,9 @@
 	mov.m ar.ssd=r0		// M2   clear ar.ssd
 	mov f11=f0		// F    clear f11
 	br.cond.sptk.many rbs_switch	// B
-END(ia64_leave_syscall)
+END(__paravirt_leave_syscall)

+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 #ifdef CONFIG_IA32_SUPPORT
 GLOBAL_ENTRY(ia64_ret_from_ia32_execve)
 	PT_REGS_UNWIND_INFO(0)
@@ -835,10 +852,20 @@ GLOBAL_ENTRY(ia64_ret_from_ia32_execve)
 	st8.spill [r2]=r8	// store return value in slot for r8 and set unat bit
 	.mem.offset 8,0
 	st8.spill [r3]=r0	// clear error indication in slot for r10 and set unat bit
+#ifdef CONFIG_PARAVIRT
+	;;
+	// don't fall through, ia64_leave_kernel may be #define'd
+	br.cond.sptk.few ia64_leave_kernel
+	;;
+#endif /* CONFIG_PARAVIRT */
 END(ia64_ret_from_ia32_execve)
+#ifndef CONFIG_PARAVIRT
 	// fall through
+#endif
 #endif /* CONFIG_IA32_SUPPORT */
-GLOBAL_ENTRY(ia64_leave_kernel)
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
+
+GLOBAL_ENTRY(__paravirt_leave_kernel)
 	PT_REGS_UNWIND_INFO(0)
 	/*
 	 * work.need_resched etc. mustn't get changed by this CPU before it returns to
@@ -852,7 +879,7 @@
 	 * needs to be redone.
	 */
 #ifdef CONFIG_PREEMPT
-	rsm psr.i				// disable interrupts
+	RSM_PSR_I(p0, r17, r31)			// disable interrupts
 	cmp.eq p0,pLvSys=r0,r0			// pLvSys=0: leave from kernel
 (pKStk)	adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
 	;;
@@ -862,7 +889,7 @@
 	;;
 	cmp.eq p6,p0=r21,r0		// p6 <- pUStk || (preempt_count == 0)
 #else
-(pUStk)	rsm psr.i
+	RSM_PSR_I(pUStk, r17, r31)
 	cmp.eq p0,pLvSys=r0,r0		// pLvSys=0: leave from kernel
 (pUStk)	cmp.eq.unc p6,p0=r0,r0		// p6 <- pUStk
 #endif
@@ -910,7 +937,7 @@
 	mov ar.csd=r30
 	mov ar.ssd=r31
 	;;
-	rsm psr.i | psr.ic	// initiate turning off of interrupt and interruption collection
+	RSM_PSR_I_IC(r23, r22, r25)	// initiate turning off of interrupt and interruption collection
 	invala			// invalidate ALAT
 	;;
 	ld8.fill r22=[r2],24
@@ -942,7 +969,7 @@
 	mov ar.ccv=r15
 	;;
 	ldf.fill f11=[r2]
-	bsw.0			// switch back to bank 0 (no stop bit required beforehand...)
+	BSW_0(r2, r3, r15)	// switch back to bank 0 (no stop bit required beforehand...)
 	;;
 (pUStk)	mov r18=IA64_KR(CURRENT)// M2 (12 cycle read latency)
 	adds r16=PT(CR_IPSR)+16,r12
@@ -950,12 +977,12 @@
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
 	.pred.rel.mutex pUStk,pKStk
-(pKStk)	mov r22=psr		// M2 read PSR now that interrupts are disabled
+	MOV_FROM_PSR(pKStk, r22, r29)	// M2 read PSR now that interrupts are disabled
 (pUStk)	mov.m r22=ar.itc	// M  fetch time at leave
 	nop.i 0
 	;;
 #else
-(pKStk)	mov r22=psr		// M2 read PSR now that interrupts are disabled
+	MOV_FROM_PSR(pKStk, r22, r29)	// M2 read PSR now that interrupts are disabled
 	nop.i 0
 	nop.i 0
 	;;
@@ -1027,7 +1054,7 @@
 	 * NOTE: alloc, loadrs, and cover can't be predicated.
	 */
 (pNonSys) br.cond.dpnt dont_preserve_current_frame
-	cover			// add current frame into dirty partition and set cr.ifs
+	COVER			// add current frame into dirty partition and set cr.ifs
 	;;
 	mov r19=ar.bsp		// get new backing store pointer
 rbs_switch:
@@ -1130,16 +1157,16 @@ skip_rbs_switch:
 (pKStk)	dep r29=r22,r29,21,1	// I0 update ipsr.pp with psr.pp
 (pLvSys)mov r16=r0		// A  clear r16 for leave_syscall, no-op otherwise
 	;;
-	mov cr.ipsr=r29		// M2
+	MOV_TO_IPSR(p0, r29, r25)	// M2
 	mov ar.pfs=r26		// I0
 (pLvSys)mov r17=r0		// A  clear r17 for leave_syscall, no-op otherwise

-(p9)	mov cr.ifs=r30		// M2
+	MOV_TO_IFS(p9, r30, r25)// M2
 	mov b0=r21		// I0
 (pLvSys)mov r18=r0		// A  clear r18 for leave_syscall, no-op otherwise

 	mov ar.fpsr=r20		// M2
-	mov cr.iip=r28		// M2
+	MOV_TO_IIP(r28, r25)	// M2
 	nop 0
 	;;
 (pUStk)	mov ar.rnat=r24		// M2 must happen with RSE in lazy mode
@@ -1148,7 +1175,7 @@ skip_rbs_switch:
 	mov ar.rsc=r27		// M2
 	mov pr=r31,-1		// I0
-	rfi			// B
+	RFI			// B

 	/*
	 * On entry:
@@ -1174,35 +1201,36 @@ skip_rbs_switch:
 	;;
 (pKStk) st4 [r20]=r21
 #endif
-	ssm psr.i		// enable interrupts
+	SSM_PSR_I(p0, p6, r2)	// enable interrupts
 	br.call.spnt.many rp=schedule
 .ret9:	cmp.eq p6,p0=r0,r0	// p6 <- 1 (re-check)
-	rsm psr.i		// disable interrupts
+	RSM_PSR_I(p0, r2, r20)	// disable interrupts
 	;;
 #ifdef CONFIG_PREEMPT
 (pKStk)	adds r20=TI_PRE_COUNT+IA64_TASK_SIZE,r13
 	;;
 (pKStk)	st4 [r20]=r0		// preempt_count() <- 0
 #endif
-(pLvSys)br.cond.sptk.few .work_pending_syscall_end
+(pLvSys)br.cond.sptk.few __paravirt_pending_syscall_end
 	br.cond.sptk.many .work_processed_kernel

 .notify:
 (pUStk)	br.call.spnt.many rp=notify_resume_user
 .ret10:	cmp.ne p6,p0=r0,r0	// p6 <- 0 (don't re-check)
-(pLvSys)br.cond.sptk.few .work_pending_syscall_end
+(pLvSys)br.cond.sptk.few __paravirt_pending_syscall_end
 	br.cond.sptk.many .work_processed_kernel

-.work_pending_syscall_end:
+.global __paravirt_pending_syscall_end;
+__paravirt_pending_syscall_end:
 	adds r2=PT(R8)+16,r12
 	adds r3=PT(R10)+16,r12
 	;;
 	ld8 r8=[r2]
 	ld8 r10=[r3]
-	br.cond.sptk.many .work_processed_syscall
-END(ia64_leave_kernel)
+	br.cond.sptk.many __paravirt_work_processed_syscall_target
+END(__paravirt_leave_kernel)

+#ifdef __IA64_ASM_PARAVIRTUALIZED_NATIVE
 ENTRY(handle_syscall_error)
 	/*
	 * Some system calls (e.g., ptrace, mmap) can return arbitrary values which could
@@ -1244,7 +1272,7 @@ END(ia64_invoke_schedule_tail)
  * We declare 8 input registers so the system call args get preserved,
  * in case we need to restart a system call.
  */
-ENTRY(notify_resume_user)
+GLOBAL_ENTRY(notify_resume_user)
 	.prologue ASM_UNW_PRLG_RP|ASM_UNW_PRLG_PFS, ASM_UNW_PRLG_GRSAVE(8)
 	alloc loc1=ar.pfs,8,2,3,0 // preserve all eight input regs in case of syscall restart!
 	mov r9=ar.unat
@@ -1306,7 +1334,7 @@ ENTRY(sys_rt_sigreturn)
 	adds sp=16,sp
 	;;
 	ld8 r9=[sp]				// load new ar.unat
-	mov.sptk b7=r8,ia64_leave_kernel
+	mov.sptk b7=r8,ia64_native_leave_kernel
 	;;
 	mov ar.unat=r9
 	br.many b7
@@ -1665,3 +1693,4 @@ sys_call_table:
 	data8 sys_timerfd_gettime

 	.org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
+#endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
arch/ia64/kernel/head.S  (+41 −0)

@@ -26,11 +26,14 @@
 #include <asm/mmu_context.h>
 #include <asm/asm-offsets.h>
 #include <asm/pal.h>
+#include <asm/paravirt.h>
 #include <asm/pgtable.h>
 #include <asm/processor.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
 #include <asm/mca_asm.h>
+#include <linux/init.h>
+#include <linux/linkage.h>

 #ifdef CONFIG_HOTPLUG_CPU
 #define SAL_PSR_BITS_TO_SET \
@@ -367,6 +370,44 @@ start_ap:
 	;;
 (isBP)	st8 [r2]=r28		// save the address of the boot param area passed by the bootloader

+#ifdef CONFIG_PARAVIRT
+
+	movl r14=hypervisor_setup_hooks
+	movl r15=hypervisor_type
+	mov r16=num_hypervisor_hooks
+	;;
+	ld8 r2=[r15]
+	;;
+	cmp.ltu p7,p0=r2,r16	// array size check
+	shladd r8=r2,3,r14
+	;;
+(p7)	ld8 r9=[r8]
+	;;
+(p7)	mov b1=r9
+(p7)	cmp.ne.unc p7,p0=r9,r0	// no actual branch to NULL
+	;;
+(p7)	br.call.sptk.many rp=b1
+
+	__INITDATA
+
+default_setup_hook = 0		// Currently nothing needs to be done.
+
+	.weak xen_setup_hook
+
+	.global hypervisor_type
+hypervisor_type:
+	data8 PARAVIRT_HYPERVISOR_TYPE_DEFAULT
+
+	// must have the same order with PARAVIRT_HYPERVISOR_TYPE_xxx
+hypervisor_setup_hooks:
+	data8 default_setup_hook
+	data8 xen_setup_hook
+num_hypervisor_hooks = (. - hypervisor_setup_hooks) / 8
+
+	.previous
+#endif
+
 #ifdef CONFIG_SMP
 (isAP)	br.call.sptk.many rp=start_secondary
 .ret0: