Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 95107b30 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull tracing updates from Steven Rostedt:
 "This release cycle is rather small.  Just a few fixes to tracing.

  The big change is the addition of the hwlat tracer. It not only
  detects SMIs, but also other latency that's caused by the hardware. I
  have detected some latency from large boxes having bus contention"

* tag 'trace-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Call traceoff trigger after event is recorded
  ftrace/scripts: Add helper script to bisect function tracing problem functions
  tracing: Have max_latency be defined for HWLAT_TRACER as well
  tracing: Add NMI tracing in hwlat detector
  tracing: Have hwlat trace migrate across tracing_cpumask CPUs
  tracing: Add documentation for hwlat_detector tracer
  tracing: Added hardware latency tracer
  ftrace: Access ret_stack->subtime only in the function profiler
  function_graph: Handle TRACE_BPUTS in print_graph_comment
  tracing/uprobe: Drop isdigit() check in create_trace_uprobe
parents 541efb76 a0d0c621
Loading
Loading
Loading
Loading
+5 −5
Original line number Diff line number Diff line
@@ -858,11 +858,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
	       When enabled, it will account time the task has been
	       scheduled out as part of the function call.

  graph-time - When running function graph tracer, to include the
  	       time to call nested functions. When this is not set,
	       the time reported for the function will only include
	       the time the function itself executed for, not the time
	       for functions that it called.
  graph-time - When running function profiler with function graph tracer,
	       to include the time to call nested functions. When this is
	       not set, the time reported for the function will only
	       include the time the function itself executed for, not the
	       time for functions that it called.

  record-cmd - When any event or tracer is enabled, a hook is enabled
  	       in the sched_switch trace point to fill comm cache
+79 −0
Original line number Diff line number Diff line
Introduction:
-------------

The tracer hwlat_detector is a special purpose tracer that is used to
detect large system latencies induced by the behavior of certain underlying
hardware or firmware, independent of Linux itself. The code was developed
originally to detect SMIs (System Management Interrupts) on x86 systems,
however there is nothing x86 specific about this patchset. It was
originally written for use by the "RT" patch since the Real Time
kernel is highly latency sensitive.

SMIs are not serviced by the Linux kernel, which means that it does not
even know that they are occuring. SMIs are instead set up by BIOS code
and are serviced by BIOS code, usually for "critical" events such as
management of thermal sensors and fans. Sometimes though, SMIs are used for
other tasks and those tasks can spend an inordinate amount of time in the
handler (sometimes measured in milliseconds). Obviously this is a problem if
you are trying to keep event service latencies down in the microsecond range.

The hardware latency detector works by hogging one of the cpus for configurable
amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
for some period, then looking for gaps in the TSC data. Any gap indicates a
time when the polling was interrupted and since the interrupts are disabled,
the only thing that could do that would be an SMI or other hardware hiccup
(or an NMI, but those can be tracked).

Note that the hwlat detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine if the hardware platform has a
problem with long system firmware service routines.

Usage:
------

Write the ASCII text "hwlat" into the current_tracer file of the tracing system
(mounted at /sys/kernel/tracing or /sys/kernel/tracing). It is possible to
redefine the threshold in microseconds (us) above which latency spikes will
be taken into account.

Example:

	# echo hwlat > /sys/kernel/tracing/current_tracer
	# echo 100 > /sys/kernel/tracing/tracing_thresh

The /sys/kernel/tracing/hwlat_detector interface contains the following files:

width			- time period to sample with CPUs held (usecs)
			  must be less than the total window size (enforced)
window			- total period of sampling, width being inside (usecs)

By default the width is set to 500,000 and window to 1,000,000, meaning that
for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
(0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will
change to a default of 10 usecs. If any latencies that exceed the threshold is
observed then the data will be written to the tracing ring buffer.

The minimum sleep time between periods is 1 millisecond. Even if width
is less than 1 millisecond apart from window, to allow the system to not
be totally starved.

If tracing_thresh was zero when hwlat detector was started, it will be set
back to zero if another tracer is loaded. Note, the last value in
tracing_thresh that hwlat detector had will be saved and this value will
be restored in tracing_thresh if it is still zero when hwlat detector is
started again.

The following tracing directory files are used by the hwlat_detector:

in /sys/kernel/tracing:

 tracing_threshold	- minimum latency value to be considered (usecs)
 tracing_max_latency	- maximum hardware latency actually observed (usecs)
 tracing_cpumask	- the CPUs to move the hwlat thread across
 hwlat_detector/width	- specified amount of time to spin within window (usecs)
 hwlat_detector/window	- amount of time between (width) runs (usecs)

The hwlat detector's kernel thread will migrate across each CPU specified in
tracing_cpumask between each window. To limit the migration, either modify
tracing_cpumask, or modify the hwlat kernel thread (named [hwlatd]) CPU
affinity directly, and the migration will stop.
+2 −2
Original line number Diff line number Diff line
@@ -139,7 +139,7 @@ static void ftrace_mod_code(void)
		clear_mod_flag();
}

void ftrace_nmi_enter(void)
void arch_ftrace_nmi_enter(void)
{
	if (atomic_inc_return(&nmi_running) & MOD_CODE_WRITE_FLAG) {
		smp_rmb();
@@ -150,7 +150,7 @@ void ftrace_nmi_enter(void)
	smp_mb();
}

void ftrace_nmi_exit(void)
void arch_ftrace_nmi_exit(void)
{
	/* Finish all executions before clearing nmi_running */
	smp_mb();
+2 −0
Original line number Diff line number Diff line
@@ -794,7 +794,9 @@ struct ftrace_ret_stack {
	unsigned long ret;
	unsigned long func;
	unsigned long long calltime;
#ifdef CONFIG_FUNCTION_PROFILER
	unsigned long long subtime;
#endif
#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
	unsigned long fp;
#endif
+27 −4
Original line number Diff line number Diff line
@@ -3,11 +3,34 @@


#ifdef CONFIG_FTRACE_NMI_ENTER
extern void ftrace_nmi_enter(void);
extern void ftrace_nmi_exit(void);
extern void arch_ftrace_nmi_enter(void);
extern void arch_ftrace_nmi_exit(void);
#else
static inline void ftrace_nmi_enter(void) { }
static inline void ftrace_nmi_exit(void) { }
static inline void arch_ftrace_nmi_enter(void) { }
static inline void arch_ftrace_nmi_exit(void) { }
#endif

#ifdef CONFIG_HWLAT_TRACER
extern bool trace_hwlat_callback_enabled;
extern void trace_hwlat_callback(bool enter);
#endif

static inline void ftrace_nmi_enter(void)
{
#ifdef CONFIG_HWLAT_TRACER
	if (trace_hwlat_callback_enabled)
		trace_hwlat_callback(true);
#endif
	arch_ftrace_nmi_enter();
}

static inline void ftrace_nmi_exit(void)
{
	arch_ftrace_nmi_exit();
#ifdef CONFIG_HWLAT_TRACER
	if (trace_hwlat_callback_enabled)
		trace_hwlat_callback(false);
#endif
}

#endif /* _LINUX_FTRACE_IRQ_H */
Loading