
Commit 3f8e402f authored by Ingo Molnar


Merge branches 'tracing/branch-tracer', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/tracepoints' and 'tracing/urgent' into tracing/core
+14 −0
@@ -70,6 +70,20 @@ a printk warning which identifies the inconsistency:

"Format mismatch for probe probe_name (format), marker (format)"

Another way to use markers is to simply define the marker without generating any
function call to actually call into the marker. This is useful in combination
with tracepoint probes in a scheme like this:

void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk);

DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name,
	"arg1 %u pid %d");

notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk)
{
	struct marker *marker = &GET_MARKER(marker_eventname);
	/* write data to trace buffers ... */
}

* Probe / marker example

+53 −39
@@ -3,28 +3,30 @@
			    Mathieu Desnoyers


This document introduces Linux Kernel Tracepoints and their use. It provides
examples of how to insert tracepoints in the kernel and connect probe functions
to them and provides some examples of probe functions.
This document introduces Linux Kernel Tracepoints and their use. It
provides examples of how to insert tracepoints in the kernel and
connect probe functions to them and provides some examples of probe
functions.


* Purpose of tracepoints

A tracepoint placed in code provides a hook to call a function (probe) that you
can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or
"off" (no probe is attached). When a tracepoint is "off" it has no effect,
except for adding a tiny time penalty (checking a condition for a branch) and
space penalty (adding a few bytes for the function call at the end of the
instrumented function and a data structure in a separate section).  When a
tracepoint is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function provided
ends its execution, it returns to the caller (continuing from the tracepoint
site).
A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty
(checking a condition for a branch) and space penalty (adding a few
bytes for the function call at the end of the instrumented function
and a data structure in a separate section).  When a tracepoint
is "on", the function you provide is called each time the tracepoint
is executed, in the execution context of the caller. When the function
provided ends its execution, it returns to the caller (continuing from
the tracepoint site).
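The on/off behaviour described above can be modeled in a few lines of userspace C. All names here are illustrative (the real kernel generates the site via macros and uses RCU-protected probe lists); "off" costs only a pointer test, and an attached probe runs in the caller's context:

```c
/* Hypothetical sketch of a tracepoint site: a NULL probe pointer
 * means the tracepoint is "off" and the site reduces to one branch. */
typedef void (*probe_fn)(int value);

static probe_fn attached_probe;   /* NULL => tracepoint is "off" */
static int probe_hits;

static void trace_my_event(int value)
{
	if (attached_probe)           /* the tiny time penalty: one branch */
		attached_probe(value);    /* runs in the caller's context */
}

static void my_probe(int value)
{
	probe_hits += value;
}

static void instrumented_function(int v)
{
	trace_my_event(v);            /* the tracepoint site */
}
```

When the probe returns, execution continues from the tracepoint site, exactly as the paragraph above describes.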

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters,
whose prototypes are described in a tracepoint declaration placed in a header
file.
whose prototypes are described in a tracepoint declaration placed in a
header file.

They can be used for tracing and performance accounting.

@@ -42,7 +44,7 @@ In include/trace/subsys.h :

#include <linux/tracepoint.h>

DEFINE_TRACE(subsys_eventname,
DECLARE_TRACE(subsys_eventname,
	TPPROTO(int firstarg, struct task_struct *p),
	TPARGS(firstarg, p));

@@ -50,6 +52,8 @@ In subsys/file.c (where the tracing statement must be added) :

#include <trace/subsys.h>

DEFINE_TRACE(subsys_eventname);

void somefct(void)
{
	...
@@ -61,31 +65,41 @@ Where :
- subsys_eventname is an identifier unique to your event
    - subsys is the name of your subsystem.
    - eventname is the name of the event to trace.
- TPPROTO(int firstarg, struct task_struct *p) is the prototype of the function
  called by this tracepoint.
- TPARGS(firstarg, p) are the parameter names, the same as found in the prototype.

Connecting a function (probe) to a tracepoint is done by providing a probe
(function to call) for the specific tracepoint through
register_trace_subsys_eventname().  Removing a probe is done through
unregister_trace_subsys_eventname(); it will remove the probe and make sure there is no
caller left using the probe when it returns. Probe removal is preempt-safe
because preemption is disabled around the probe call. See the "Probe example"
section below for a sample probe module.

The tracepoint mechanism supports inserting multiple instances of the same
tracepoint, but a single definition must be made of a given tracepoint name
across the whole kernel to make sure no type conflict will occur. Name mangling of the
tracepoints is done using the prototypes to make sure typing is correct.
Verification of probe type correctness is done at the registration site by the
compiler. Tracepoints can be put in inline functions, inlined static functions,
and unrolled loops as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention intended
to limit collisions. Tracepoint names are global to the kernel: they are
considered as being the same whether they are in the core kernel image or in
modules.
- TPPROTO(int firstarg, struct task_struct *p) is the prototype of the
  function called by this tracepoint.

- TPARGS(firstarg, p) are the parameter names, the same as found in the
  prototype.

Connecting a function (probe) to a tracepoint is done by providing a
probe (function to call) for the specific tracepoint through
register_trace_subsys_eventname().  Removing a probe is done through
unregister_trace_subsys_eventname(); it will remove the probe.

tracepoint_synchronize_unregister() must be called before the end of
the module exit function to make sure there is no caller left using
the probe. This, and the fact that preemption is disabled around the
probe call, make sure that probe removal and module unload are safe.
See the "Probe example" section below for a sample probe module.
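The register/unregister/synchronize sequence described above can be sketched in userspace with a plain function pointer and an in-flight counter standing in for the kernel's preempt-disable/RCU tracking. All names here are illustrative, not the macro-generated kernel symbols, and the real code chains several probes per tracepoint:

```c
/* Hypothetical single-probe model of a tracepoint's registration API. */
static void (*subsys_eventname_probe)(int firstarg);
static volatile int probe_in_flight;   /* callers currently inside a probe */

static int register_trace_subsys_eventname(void (*probe)(int))
{
	if (subsys_eventname_probe)
		return -1;                 /* real code supports multiple probes */
	subsys_eventname_probe = probe;
	return 0;
}

static int unregister_trace_subsys_eventname(void (*probe)(int))
{
	if (subsys_eventname_probe != probe)
		return -1;
	subsys_eventname_probe = 0;
	return 0;
}

/* Must run before module unload: wait until no caller that looked up
 * the old probe pointer is still executing it (the kernel uses an
 * RCU-style wait; a spin on a counter stands in for it here). */
static void tracepoint_synchronize_unregister_sketch(void)
{
	while (probe_in_flight)
		;
}

static void trace_subsys_eventname(int firstarg)
{
	void (*p)(int) = subsys_eventname_probe;
	if (p) {
		probe_in_flight++;         /* kernel: preemption disabled instead */
		p(firstarg);
		probe_in_flight--;
	}
}

static int last_arg;
static void sample_probe(int firstarg)
{
	last_arg = firstarg;
}
```

After the unregister plus synchronize pair returns, the probe code can safely be unloaded: no call site will invoke it again.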

The tracepoint mechanism supports inserting multiple instances of the
same tracepoint, but a single definition must be made of a given
tracepoint name across the whole kernel to make sure no type conflict will
occur. Name mangling of the tracepoints is done using the prototypes
to make sure typing is correct. Verification of probe type correctness
is done at the registration site by the compiler. Tracepoints can be
put in inline functions, inlined static functions, and unrolled loops
as well as regular functions.

The naming scheme "subsys_event" is suggested here as a convention
intended to limit collisions. Tracepoint names are global to the
kernel: they are considered as being the same whether they are in the
core kernel image or in modules.

If the tracepoint has to be used in kernel modules, an
EXPORT_TRACEPOINT_SYMBOL_GPL() or EXPORT_TRACEPOINT_SYMBOL() can be
used to export the defined tracepoints.

* Probe / tracepoint example

+8 −0
@@ -17,6 +17,14 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
	 */
	return addr - 1;
}

#ifdef CONFIG_DYNAMIC_FTRACE

struct dyn_arch_ftrace {
	/* No extra data needed for x86 */
};

#endif /*  CONFIG_DYNAMIC_FTRACE */
#endif /* __ASSEMBLY__ */
#endif /* CONFIG_FUNCTION_TRACER */

+11 −7
@@ -1190,7 +1190,7 @@ ENTRY(mcount)
	jnz trace
#ifdef CONFIG_FUNCTION_RET_TRACER
	cmpl $ftrace_stub, ftrace_function_return
	jnz trace_return
	jnz ftrace_return_caller
#endif
.globl ftrace_stub
ftrace_stub:
@@ -1211,9 +1211,15 @@ trace:
	popl %ecx
	popl %eax
	jmp ftrace_stub
END(mcount)
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* CONFIG_FUNCTION_TRACER */

#ifdef CONFIG_FUNCTION_RET_TRACER
trace_return:
ENTRY(ftrace_return_caller)
	cmpl $0, function_trace_stop
	jne ftrace_stub

	pushl %eax
	pushl %ecx
	pushl %edx
@@ -1223,7 +1229,8 @@ trace_return:
	popl %edx
	popl %ecx
	popl %eax
	jmp ftrace_stub
	ret
END(ftrace_return_caller)

.globl return_to_handler
return_to_handler:
@@ -1237,10 +1244,7 @@ return_to_handler:
	popl %ecx
	popl %eax
	ret
#endif /* CONFIG_FUNCTION_RET_TRACER */
END(mcount)
#endif /* CONFIG_DYNAMIC_FTRACE */
#endif /* CONFIG_FUNCTION_TRACER */
#endif

.section .rodata,"a"
#include "syscall_table_32.S"
+156 −130
@@ -24,133 +24,6 @@
#include <asm/nmi.h>



#ifdef CONFIG_FUNCTION_RET_TRACER

/*
 * These functions are adapted from those used in
 * this file for dynamic ftrace. They have been
 * simplified to ignore all traces in NMI context.
 */
static atomic_t in_nmi;

void ftrace_nmi_enter(void)
{
	atomic_inc(&in_nmi);
}

void ftrace_nmi_exit(void)
{
	atomic_dec(&in_nmi);
}

/* Add a function return address to the trace stack on thread info.*/
static int push_return_trace(unsigned long ret, unsigned long long time,
				unsigned long func)
{
	int index;
	struct thread_info *ti = current_thread_info();

	/* The return trace stack is full */
	if (ti->curr_ret_stack == FTRACE_RET_STACK_SIZE - 1)
		return -EBUSY;

	index = ++ti->curr_ret_stack;
	ti->ret_stack[index].ret = ret;
	ti->ret_stack[index].func = func;
	ti->ret_stack[index].calltime = time;

	return 0;
}

/* Retrieve a function return address from the trace stack on thread info. */
static void pop_return_trace(unsigned long *ret, unsigned long long *time,
				unsigned long *func)
{
	int index;

	struct thread_info *ti = current_thread_info();
	index = ti->curr_ret_stack;
	*ret = ti->ret_stack[index].ret;
	*func = ti->ret_stack[index].func;
	*time = ti->ret_stack[index].calltime;
	ti->curr_ret_stack--;
}

/*
 * Send the trace to the ring-buffer.
 * @return the original return address.
 */
unsigned long ftrace_return_to_handler(void)
{
	struct ftrace_retfunc trace;
	pop_return_trace(&trace.ret, &trace.calltime, &trace.func);
	trace.rettime = cpu_clock(raw_smp_processor_id());
	ftrace_function_return(&trace);

	return trace.ret;
}
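The push/pop pair above maintains a small per-thread stack of pending return addresses. A userspace sketch of the same scheme (fixed-size array, -EBUSY when full; size and field names chosen for illustration, mirroring FTRACE_RET_STACK_SIZE and struct ftrace_ret_stack conceptually):

```c
#include <errno.h>

#define RET_STACK_SIZE 4   /* stands in for FTRACE_RET_STACK_SIZE */

struct ret_entry {
	unsigned long ret;             /* original return address */
	unsigned long func;            /* traced function */
	unsigned long long calltime;   /* timestamp at entry */
};

static struct ret_entry ret_stack[RET_STACK_SIZE];
static int curr_ret_stack = -1;    /* index of the newest entry */

static int push_ret(unsigned long ret, unsigned long long time,
		    unsigned long func)
{
	if (curr_ret_stack == RET_STACK_SIZE - 1)
		return -EBUSY;             /* the return trace stack is full */

	int index = ++curr_ret_stack;
	ret_stack[index].ret = ret;
	ret_stack[index].func = func;
	ret_stack[index].calltime = time;
	return 0;
}

static void pop_ret(unsigned long *ret, unsigned long long *time,
		    unsigned long *func)
{
	int index = curr_ret_stack--;
	*ret = ret_stack[index].ret;
	*func = ret_stack[index].func;
	*time = ret_stack[index].calltime;
}
```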

/*
 * Hook the return address and push it in the stack of return addrs
 * in current thread info.
 */
void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
{
	unsigned long old;
	unsigned long long calltime;
	int faulted;
	unsigned long return_hooker = (unsigned long)
				&return_to_handler;

	/* NMIs are currently unsupported */
	if (atomic_read(&in_nmi))
		return;

	/*
	 * Protect against a fault, even though it shouldn't
	 * happen. This tool is too intrusive to
	 * skip such a protection.
	 */
	asm volatile(
		"1: movl (%[parent_old]), %[old]\n"
		"2: movl %[return_hooker], (%[parent_replaced])\n"
		"   movl $0, %[faulted]\n"

		".section .fixup, \"ax\"\n"
		"3: movl $1, %[faulted]\n"
		".previous\n"

		".section __ex_table, \"a\"\n"
		"   .long 1b, 3b\n"
		"   .long 2b, 3b\n"
		".previous\n"

		: [parent_replaced] "=r" (parent), [old] "=r" (old),
		  [faulted] "=r" (faulted)
		: [parent_old] "0" (parent), [return_hooker] "r" (return_hooker)
		: "memory"
	);

	if (WARN_ON(faulted)) {
		unregister_ftrace_return();
		return;
	}

	if (WARN_ON(!__kernel_text_address(old))) {
		unregister_ftrace_return();
		*parent = old;
		return;
	}

	calltime = cpu_clock(raw_smp_processor_id());

	if (push_return_trace(old, calltime, self_addr) == -EBUSY)
		*parent = old;
}

#endif
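Stripped of the fault-protected asm, prepare_ftrace_return() does one conceptual thing: swap the saved return address in the caller's frame for the trampoline's address, keeping the original so the trampoline can jump back to it. A hypothetical single-slot model (the real code uses the per-thread stack shown earlier):

```c
/* Conceptual model of the return-address hook; names are illustrative. */
static unsigned long hooked_original;

static void return_trampoline(void)   /* stands in for return_to_handler */
{
}

static void hook_return_address(unsigned long *parent)
{
	hooked_original = *parent;                    /* remember the caller */
	*parent = (unsigned long)&return_trampoline;  /* divert the return */
}

static unsigned long return_to_original(void)
{
	/* What ftrace_return_to_handler() hands back to the trampoline
	 * so execution resumes at the real call site. */
	return hooked_original;
}
```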

#ifdef CONFIG_DYNAMIC_FTRACE

union ftrace_code_union {
@@ -166,7 +39,7 @@ static int ftrace_calc_offset(long ip, long addr)
	return (int)(addr - ip);
}

unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
	static union ftrace_code_union calc;

@@ -311,12 +184,12 @@ do_ftrace_mod_code(unsigned long ip, void *new_code)

static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];

unsigned char *ftrace_nop_replace(void)
static unsigned char *ftrace_nop_replace(void)
{
	return ftrace_nop;
}

int
static int
ftrace_modify_code(unsigned long ip, unsigned char *old_code,
		   unsigned char *new_code)
{
@@ -349,6 +222,29 @@ ftrace_modify_code(unsigned long ip, unsigned char *old_code,
	return 0;
}

int ftrace_make_nop(struct module *mod,
		    struct dyn_ftrace *rec, unsigned long addr)
{
	unsigned char *new, *old;
	unsigned long ip = rec->ip;

	old = ftrace_call_replace(ip, addr);
	new = ftrace_nop_replace();

	return ftrace_modify_code(rec->ip, old, new);
}

int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
	unsigned char *new, *old;
	unsigned long ip = rec->ip;

	old = ftrace_nop_replace();
	new = ftrace_call_replace(ip, addr);

	return ftrace_modify_code(rec->ip, old, new);
}
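Both ftrace_make_nop() and ftrace_make_call() above funnel into the same compare-then-replace step: patch the instruction only if the bytes at ip still match what we expect to find there. A userspace sketch of that logic and of the relative-offset computation (buffer in ordinary memory instead of kernel text; error value simplified from -EINVAL):

```c
#include <string.h>

#define INSN_SIZE 5   /* like MCOUNT_INSN_SIZE on x86: e8 + 32-bit offset */

/* Patch INSN_SIZE bytes at ip, but only if they currently hold
 * old_code; refusing otherwise is what makes the patching safe. */
static int modify_code(unsigned char *ip, const unsigned char *old_code,
		       const unsigned char *new_code)
{
	if (memcmp(ip, old_code, INSN_SIZE) != 0)
		return -1;
	memcpy(ip, new_code, INSN_SIZE);
	return 0;
}

/* Relative displacement as in ftrace_calc_offset(): the target minus
 * the address of the instruction following the call. */
static int calc_offset(long ip, long addr)
{
	return (int)(addr - ip);
}
```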

int ftrace_update_ftrace_func(ftrace_func_t func)
{
	unsigned long ip = (unsigned long)(&ftrace_call);
@@ -426,3 +322,133 @@ int __init ftrace_dyn_arch_init(void *data)
	return 0;
}
#endif

#ifdef CONFIG_FUNCTION_RET_TRACER

#ifndef CONFIG_DYNAMIC_FTRACE

/*
 * These functions are adapted from those used in
 * this file for dynamic ftrace. They have been
 * simplified to ignore all traces in NMI context.
 */
static atomic_t in_nmi;

void ftrace_nmi_enter(void)
{
	atomic_inc(&in_nmi);
}

void ftrace_nmi_exit(void)
{
	atomic_dec(&in_nmi);
}
#endif /* !CONFIG_DYNAMIC_FTRACE */

/* Add a function return address to the trace stack on thread info.*/
static int push_return_trace(unsigned long ret, unsigned long long time,
				unsigned long func)
{
	int index;
	struct thread_info *ti = current_thread_info();

	/* The return trace stack is full */
	if (ti->curr_ret_stack == FTRACE_RET_STACK_SIZE - 1)
		return -EBUSY;

	index = ++ti->curr_ret_stack;
	barrier();
	ti->ret_stack[index].ret = ret;
	ti->ret_stack[index].func = func;
	ti->ret_stack[index].calltime = time;

	return 0;
}

/* Retrieve a function return address from the trace stack on thread info. */
static void pop_return_trace(unsigned long *ret, unsigned long long *time,
				unsigned long *func)
{
	int index;

	struct thread_info *ti = current_thread_info();
	index = ti->curr_ret_stack;
	*ret = ti->ret_stack[index].ret;
	*func = ti->ret_stack[index].func;
	*time = ti->ret_stack[index].calltime;
	ti->curr_ret_stack--;
}

/*
 * Send the trace to the ring-buffer.
 * @return the original return address.
 */
unsigned long ftrace_return_to_handler(void)
{
	struct ftrace_retfunc trace;
	pop_return_trace(&trace.ret, &trace.calltime, &trace.func);
	trace.rettime = cpu_clock(raw_smp_processor_id());
	ftrace_function_return(&trace);

	return trace.ret;
}

/*
 * Hook the return address and push it in the stack of return addrs
 * in current thread info.
 */
void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
{
	unsigned long old;
	unsigned long long calltime;
	int faulted;
	unsigned long return_hooker = (unsigned long)
				&return_to_handler;

	/* NMIs are currently unsupported */
	if (atomic_read(&in_nmi))
		return;

	/*
	 * Protect against a fault, even though it shouldn't
	 * happen. This tool is too intrusive to
	 * skip such a protection.
	 */
	asm volatile(
		"1: movl (%[parent_old]), %[old]\n"
		"2: movl %[return_hooker], (%[parent_replaced])\n"
		"   movl $0, %[faulted]\n"

		".section .fixup, \"ax\"\n"
		"3: movl $1, %[faulted]\n"
		".previous\n"

		".section __ex_table, \"a\"\n"
		"   .long 1b, 3b\n"
		"   .long 2b, 3b\n"
		".previous\n"

		: [parent_replaced] "=r" (parent), [old] "=r" (old),
		  [faulted] "=r" (faulted)
		: [parent_old] "0" (parent), [return_hooker] "r" (return_hooker)
		: "memory"
	);

	if (WARN_ON(faulted)) {
		unregister_ftrace_return();
		return;
	}

	if (WARN_ON(!__kernel_text_address(old))) {
		unregister_ftrace_return();
		*parent = old;
		return;
	}

	calltime = cpu_clock(raw_smp_processor_id());

	if (push_return_trace(old, calltime, self_addr) == -EBUSY)
		*parent = old;
}

#endif /* CONFIG_FUNCTION_RET_TRACER */
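The one functional change in the re-added push_return_trace() above is the barrier() between the index increment and the entry stores. barrier() is a compiler barrier; a userspace spelling of the same idiom (names hypothetical), keeping the index update and the slot store in program order so code interrupted between them sees a consistent pair:

```c
/* GCC-style compiler barrier: forbids the compiler from reordering
 * memory accesses across it. Same idiom the kernel's barrier() uses. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static unsigned long slots[8];
static int slot_top = -1;

static int publish(unsigned long value)
{
	if (slot_top == 7)
		return -1;

	int index = ++slot_top;   /* reserve the slot first */
	barrier();                /* keep reservation and store in order */
	slots[index] = value;
	return 0;
}
```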