Documentation/tracers/mmiotrace.txt (new file, 0 → 100644, +164 −0)

In-kernel memory-mapped I/O tracing

Home page and links to optional user space tools:

	http://nouveau.freedesktop.org/wiki/MmioTrace

MMIO tracing was originally developed by Intel around 2003 for their Fault
Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
project in mind. Since then many people have contributed.

Mmiotrace was built for reverse engineering any memory-mapped IO device with
the Nouveau project as the first real user. Only the x86 and x86_64
architectures are supported.

Out-of-tree mmiotrace was originally modified for mainline inclusion and the
ftrace framework by Pekka Paalanen <pq@iki.fi>.


Preparation
-----------

The mmiotrace feature is compiled in via the CONFIG_MMIOTRACE option. Tracing
is disabled by default, so it is safe to have this set to yes.

SMP systems are supported, but tracing is unreliable and may miss events if
more than one CPU is on-line, therefore mmiotrace takes all but one CPU
off-line during run-time activation. You can re-enable CPUs by hand, but you
have been warned: there is no way to automatically detect if you are losing
events due to CPUs racing.


Usage Quick Reference
---------------------

$ mount -t debugfs debugfs /debug
$ echo mmiotrace > /debug/tracing/current_tracer
$ cat /debug/tracing/trace_pipe > mydump.txt &
Start X or whatever.
$ echo "X is up" > /debug/tracing/marker
$ echo none > /debug/tracing/current_tracer
Check for lost events.


Usage
-----

Make sure debugfs is mounted to /debug. If not (requires root privileges):
$ mount -t debugfs debugfs /debug

Check that the driver you are about to trace is not loaded.
Activate mmiotrace (requires root privileges):
$ echo mmiotrace > /debug/tracing/current_tracer

Start storing the trace:
$ cat /debug/tracing/trace_pipe > mydump.txt &
The 'cat' process should stay running (sleeping) in the background.

Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
accesses to areas that are ioremapped while mmiotrace is active.

[Unimplemented feature:]
During tracing you can place comments (markers) into the trace by
$ echo "X is up" > /debug/tracing/marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do.

Shut down mmiotrace (requires root privileges):
$ echo none > /debug/tracing/current_tracer
The 'cat' process exits. If it does not, kill it by issuing the 'fg' command
and pressing ctrl+c.

Check that mmiotrace did not lose events due to a buffer filling up. Either
$ grep -i lost mydump.txt
which tells you exactly how many events were lost, or use
$ dmesg
to view your kernel log and look for the "mmiotrace has lost events" warning.
If events were lost, the trace is incomplete. You should enlarge the buffers
and try again. Buffers are enlarged by first seeing how large the current
buffers are:
$ cat /debug/tracing/trace_entries
gives you a number. Approximately double this number and write it back, for
instance:
$ echo 128000 > /debug/tracing/trace_entries
Then start again from the top.

If you are doing a trace for a driver project, e.g. Nouveau, you should also
do the following before sending your results:
$ lspci -vvv > lspci.txt
$ dmesg > dmesg.txt
$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
and then send the .tar.gz file. The trace compresses considerably. Replace
"pciid" and "nick" with the PCI ID or model name of your piece of hardware
under investigation and your nick name.
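The lost-event check and buffer-size doubling described above can also be
scripted. Below is a minimal Python sketch; the function names and the exact
doubling heuristic are this example's own choices, not part of the mmiotrace
tools, and on a real system you would read the dump file and
/debug/tracing/trace_entries yourself:

```python
"""Sketch: check an mmiotrace dump for lost events and suggest a new
buffer size. Hypothetical helpers; inputs are given as plain strings."""

def count_lost_lines(dump_text):
    # Count lines mentioning lost events, as 'grep -i lost' would.
    return sum(1 for line in dump_text.splitlines()
               if "lost" in line.lower())

def suggest_trace_entries(current_entries):
    # The documentation advises to "approximately double" the number.
    return current_entries * 2

# Example with synthetic data:
dump = "MARK 90.0 X is up\nmmiotrace has lost events\n"
if count_lost_lines(dump):
    print(suggest_trace_entries(64000))  # prints 128000
```

If the dump contains no "lost" lines, the trace is complete and no resize is
needed.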
How Mmiotrace Works
-------------------

Access to hardware IO-memory is gained by mapping addresses from the PCI bus
by calling one of the ioremap_*() functions. Mmiotrace is hooked into the
__ioremap() function and gets called whenever a mapping is created. The
mapping is an event that is recorded into the trace log. Note that ISA range
mappings are not caught, since the mapping always exists and is returned
directly.

MMIO accesses are recorded via page faults. Just before __ioremap() returns,
the mapped pages are marked as not present. Any access to the pages causes a
fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
marks the page present, sets the TF flag to achieve single stepping and
exits the fault handler. The instruction that faulted is executed and the
debug trap is entered. Here mmiotrace again marks the page as not present.
The instruction is decoded to get the type of operation (read/write), data
width and the value read or written. These are stored to the trace log.

Setting the page present in the page fault handler has a race condition on
SMP machines. During the single stepping other CPUs may run freely on that
page and events can be missed without notice. Re-enabling other CPUs during
tracing is discouraged.


Trace Log Format
----------------

The raw log is text and easily filtered with e.g. grep and awk. One record
is one line in the log. A record starts with a keyword, followed by
keyword-dependent arguments. Arguments are separated by a space, or continue
until the end of the line.
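As an illustration, such records can be split mechanically. The following is
a hedged Python sketch based only on the 20070824 format table documented
here; the helper name, its dictionary keys, and the assumption that
addresses and values appear as hex literals are this example's own, not part
of the mmiotrace user space tools:

```python
"""Sketch: parse mmiotrace records into dicts. Field order for R/W
records follows the documented table: width, timestamp, map id,
physical, value, PC, PID. Other keywords are returned raw."""

def parse_record(line):
    keyword, *args = line.split()
    if keyword in ("R", "W"):
        # Assumption: physical, value and PC are hex literals.
        width, ts, map_id, phys, value, pc, pid = args
        return {
            "type": "write" if keyword == "W" else "read",
            "width": int(width),
            "timestamp": float(ts),
            "map_id": int(map_id),
            "physical": int(phys, 16),
            "value": int(value, 16),
            "pc": int(pc, 16),
            "pid": int(pid),
        }
    # MAP, UNMAP, MARK, VERSION, LSPCI, PCIDEV, UNKNOWN: keep as-is.
    return {"type": keyword, "args": args}

rec = parse_record("W 4 2078.634 1 0xfb73ce40 0xdeadbeef 0x0 0")
assert rec["type"] == "write" and rec["physical"] == 0xfb73ce40
```

A filter equivalent to the awk one-liner shown later in this document is
then a list comprehension over parsed records, selecting on "width" and
"physical".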
The format for version 20070824 is as follows:

Explanation		Keyword	Space-separated arguments
---------------------------------------------------------------------------
read event		R	width, timestamp, map id, physical, value, PC, PID
write event		W	width, timestamp, map id, physical, value, PC, PID
ioremap event		MAP	timestamp, map id, physical, virtual, length, PC, PID
iounmap event		UNMAP	timestamp, map id, PC, PID
marker			MARK	timestamp, text
version			VERSION	the string "20070824"
info for reader		LSPCI	one line from lspci -v
PCI address map		PCIDEV	space separated /proc/bus/pci/devices data
unk. opcode		UNKNOWN	timestamp, map id, physical, data, PC, PID

Timestamp is in seconds with decimals. Physical is a PCI bus address,
virtual is a kernel virtual address. Width is the data width in bytes and
value is the data value. Map id is an arbitrary id number identifying the
mapping that was used in an operation. PC is the program counter and PID is
the process id. PC is zero if it is not recorded. PID is always zero as
tracing MMIO accesses originating in user space memory is not yet supported.

For instance, the following awk filter will pass all 32-bit writes that
target physical addresses in the range [0xfb73ce40, 0xfb800000[:

$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
adr < 0xfb800000) print; }'


Tools for Developers
--------------------

The user space tools include utilities for:
- replacing numeric addresses and values with hardware register names
- replaying MMIO logs, i.e., re-executing the recorded writes


arch/x86/Kconfig.debug (+26 −6)

@@ -172,13 +172,33 @@ config IOMMU_LEAK
 	  Add a simple leak tracer to the IOMMU code. This is useful when you
 	  are debugging a buggy device driver that leaks IOMMU mappings.
-config PAGE_FAULT_HANDLERS
-	bool "Custom page fault handlers"
-	depends on DEBUG_KERNEL
-	help
-	  Allow the use of custom page fault handlers. A kernel module may
-	  register a function that is called on every page fault. Custom
-	  handlers are used by some debugging and reverse engineering tools.
+config MMIOTRACE_HOOKS
+	bool
+
+config MMIOTRACE
+	bool "Memory mapped IO tracing"
+	depends on DEBUG_KERNEL && PCI
+	select TRACING
+	select MMIOTRACE_HOOKS
+	default y
+	help
+	  Mmiotrace traces Memory Mapped I/O access and is meant for
+	  debugging and reverse engineering. It is called from the ioremap
+	  implementation and works via page faults. Tracing is disabled by
+	  default and can be enabled at run-time.
+
+	  See Documentation/tracers/mmiotrace.txt.
+	  If you are not helping to develop drivers, say N.
+
+config MMIOTRACE_TEST
+	tristate "Test module for mmiotrace"
+	depends on MMIOTRACE && m
+	help
+	  This is a dumb module for testing mmiotrace. It is very dangerous
+	  as it will write garbage to IO memory starting at a given address.
+	  However, it should be safe to use on e.g. unused portion of VRAM.
+
+	  Say N, unless you absolutely know what you are doing.
 #
 # IO delay types:


arch/x86/mm/Makefile (+5 −0)

@@ -8,6 +8,11 @@
 obj-$(CONFIG_X86_PTDUMP)	+= dump_pagetables.o
 obj-$(CONFIG_HIGHMEM)		+= highmem_32.o
+obj-$(CONFIG_MMIOTRACE_HOOKS)	+= kmmio.o
+obj-$(CONFIG_MMIOTRACE)		+= mmiotrace.o
+mmiotrace-y			:= pf_in.o mmio-mod.o
+obj-$(CONFIG_MMIOTRACE_TEST)	+= testmmiotrace.o
 ifeq ($(CONFIG_X86_32),y)
 obj-$(CONFIG_NUMA)		+= discontig_32.o
 else


arch/x86/mm/fault.c (+7 −50)

@@ -10,6 +10,7 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <linux/ptrace.h>
+#include <linux/mmiotrace.h>
 #include <linux/mman.h>
 #include <linux/mm.h>
 #include <linux/smp.h>

@@ -49,58 +50,14 @@
 #define PF_RSVD		(1<<3)
 #define PF_INSTR	(1<<4)

-#ifdef CONFIG_PAGE_FAULT_HANDLERS
-static HLIST_HEAD(pf_handlers); /* protected by RCU */
-static DEFINE_SPINLOCK(pf_handlers_writer);
-
-void register_page_fault_handler(struct pf_handler *new_pfh)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&pf_handlers_writer, flags);
-	hlist_add_head_rcu(&new_pfh->hlist, &pf_handlers);
-	spin_unlock_irqrestore(&pf_handlers_writer, flags);
-}
-EXPORT_SYMBOL_GPL(register_page_fault_handler);
-
-/**
- * unregister_page_fault_handler:
- * The caller must ensure @old_pfh is not in use anymore before freeing it.
- * This function does not guarantee it. The list of handlers is protected by
- * RCU, so you can do this by e.g. calling synchronize_rcu().
- */
-void unregister_page_fault_handler(struct pf_handler *old_pfh)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&pf_handlers_writer, flags);
-	hlist_del_rcu(&old_pfh->hlist);
-	spin_unlock_irqrestore(&pf_handlers_writer, flags);
-}
-EXPORT_SYMBOL_GPL(unregister_page_fault_handler);
-#endif
-
-/* returns non-zero if do_page_fault() should return */
-static int handle_custom_pf(struct pt_regs *regs, unsigned long error_code,
-							unsigned long address)
-{
-#ifdef CONFIG_PAGE_FAULT_HANDLERS
-	int ret = 0;
-	struct pf_handler *cur;
-	struct hlist_node *ncur;
-	if (hlist_empty(&pf_handlers))
-		return 0;
-	rcu_read_lock();
-	hlist_for_each_entry_rcu(cur, ncur, &pf_handlers, hlist) {
-		ret = cur->handler(regs, error_code, address);
-		if (ret)
-			break;
-	}
-	rcu_read_unlock();
-	return ret;
-#else
-	return 0;
-#endif
-}
+static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
+{
+#ifdef CONFIG_MMIOTRACE_HOOKS
+	if (unlikely(is_kmmio_active()))
+		if (kmmio_handler(regs, addr) == 1)
+			return -1;
+#endif
+	return 0;
+}

 static inline int notify_page_fault(struct pt_regs *regs)

@@ -660,7 +617,7 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	if (notify_page_fault(regs))
 		return;

-	if (handle_custom_pf(regs, error_code, address))
+	if (unlikely(kmmio_fault(regs, address)))
 		return;

 	/*


arch/x86/mm/ioremap.c (+10 −1)

@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/mmiotrace.h>

 #include <asm/cacheflush.h>
 #include <asm/e820.h>

@@ -122,10 +123,13 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 {
 	unsigned long pfn, offset, vaddr;
 	resource_size_t last_addr;
+	const resource_size_t unaligned_phys_addr = phys_addr;
+	const unsigned long unaligned_size = size;
 	struct vm_struct *area;
 	unsigned long new_prot_val;
 	pgprot_t prot;
 	int retval;
+	void __iomem *ret_addr;

 	/* Don't allow wraparound or zero size */
 	last_addr = phys_addr + size - 1;

@@ -233,7 +237,10 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 		return NULL;
 	}

-	return (void __iomem *) (vaddr + offset);
+	ret_addr = (void __iomem *) (vaddr + offset);
+	mmiotrace_ioremap(unaligned_phys_addr, unaligned_size, ret_addr);
+
+	return ret_addr;
 }

 /**

@@ -325,6 +332,8 @@ void iounmap(volatile void __iomem *addr)
 	addr = (volatile void __iomem *)
 		(PAGE_MASK & (unsigned long __force)addr);

+	mmiotrace_iounmap(addr);
+
 	/* Use the vm area unlocked, assuming the caller
 	   ensures there isn't another iounmap for the same address
 	   in parallel. Reuse of the virtual address is prevented by
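As a closing illustration of the trap-and-single-step technique this patch
hooks into the fault path, here is a toy userspace model in Python. It only
mimics the control flow the documentation describes (page marked not
present, fault, page armed for one instruction, debug trap re-arms it and
logs the access); the class and method names are invented for this sketch
and nothing here corresponds to real kernel APIs:

```python
"""Toy model of the kmmio trap cycle. Every access to a traced page
faults, the 'fault handler' makes the page present and logs the access,
and the 'debug trap' immediately marks it not present again."""

class TracedPage:
    def __init__(self):
        self.present = False   # pages start marked not present
        self.log = []

    def access(self, op, value):
        if not self.present:
            self.present = True           # fault handler: mark present
            self.log.append((op, value))  # decoded in the debug trap
            self.present = False          # debug trap: re-arm the page
        return value

page = TracedPage()
page.access("W", 0xdeadbeef)
page.access("R", 0x0)
print(page.log)  # [('W', 3735928559), ('R', 0)]
```

The model also makes the documented SMP race easy to see: between the two
assignments to `present`, another CPU's access would slip through unlogged,
which is why mmiotrace takes other CPUs off-line.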