Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 8f55cea4 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf changes from Ingo Molnar:
 "There are lots of improvements, the biggest changes are:

  Main kernel side changes:

   - Improve uprobes performance by adding 'pre-filtering' support, by
     Oleg Nesterov.

   - Make some POWER7 events available in sysfs, equivalent to what was
     done on x86, from Sukadev Bhattiprolu.

   - tracing updates by Steve Rostedt - mostly misc fixes and smaller
     improvements.

   - Use perf/event tracing to report PCI Express advanced errors, by
     Tony Luck.

   - Enable northbridge performance counters on AMD family 15h, by Jacob
     Shin.

   - This tracing commit:

        tracing: Remove the extra 4 bytes of padding in events

     changes the ABI.  All involved parties (PowerTop in particular)
     seem to agree that it's safe to do now with the introduction of
     libtraceevent, but the devil is in the details ...

  Main tooling side changes:

   - Add 'event group view', from Namyung Kim:

     To use it, 'perf record' should group events when recording.  And
     then perf report parses the saved group relation from file header
     and prints them together if --group option is provided.  You can
     use the 'perf evlist' command to see event group information:

        $ perf record -e '{ref-cycles,cycles}' noploop 1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

        $ perf evlist --group
        {ref-cycles,cycles}

     With this example, default perf report will show you each event
     separately.

     You can use --group option to enable event group view:

        $ perf report --group
        ...
        # group: {ref-cycles,cycles}
        # ========
        # Samples: 7K of event 'anon group { ref-cycles, cycles }'
        # Event count (approx.): 6876107743
        #
        #         Overhead  Command      Shared Object                      Symbol
        # ................  .......  .................  ..........................
            99.84%  99.76%  noploop  noploop            [.] main
             0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
             0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
             0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
             0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
             0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
             0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
             0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
             0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
             0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

     As you can see the Overhead column now contains both of ref-cycles
     and cycles and header line shows group information also - 'anon
     group { ref-cycles, cycles }'.  The output is sorted by period of
     group leader first.

   - Initial GTK+ annotate browser, from Namhyung Kim.

   - Add option for runtime switching perf data file in perf report,
     just press 's' and a menu with the valid files found in the current
     directory will be presented, from Feng Tang.

   - Add support to display whole group data for raw columns, from Jiri
     Olsa.

   - Add per processor socket count aggregation in perf stat, from
     Stephane Eranian.

   - Add interval printing in 'perf stat', from Stephane Eranian.

   - 'perf test' improvements

   - Add support for wildcards in tracepoint system name, from Jiri
     Olsa.

   - Add anonymous huge page recognition, from Joshua Zhu.

   - perf build-id cache now can show DSOs present in a perf.data file
     that are not in the cache, to integrate with build-id servers being
     put in place by organizations such as Fedora.

   - perf top now shares more of the evsel config/creation routines with
     'record', paving the way for further integration like 'top'
     snapshots, etc.

   - perf top now supports DWARF callchains.

   - Fix mmap limitations on 32-bit, fix from David Miller.

   - 'perf bench numa mem' NUMA performance measurement suite

   - ... and lots of fixes, performance improvements, cleanups and other
     improvements I failed to list - see the shortlog and git log for
     details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
  perf/x86/amd: Enable northbridge performance counters on AMD family 15h
  perf/hwbp: Fix cleanup in case of kzalloc failure
  perf tools: Fix build with bison 2.3 and older.
  perf tools: Limit unwind support to x86 archs
  perf annotate: Make it to be able to skip unannotatable symbols
  perf gtk/annotate: Fail early if it can't annotate
  perf gtk/annotate: Show source lines with gray color
  perf gtk/annotate: Support multiple event annotation
  perf ui/gtk: Implement basic GTK2 annotation browser
  perf annotate: Fix warning message on a missing vmlinux
  perf buildid-cache: Add --update option
  uprobes/perf: Avoid uprobe_apply() whenever possible
  uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
  uprobes/perf: Teach trace_uprobe/perf code to pre-filter
  uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
  uprobes: Introduce uprobe_apply()
  perf: Introduce hw_perf_event->tp_target and ->tp_list
  uprobes/perf: Always increment trace_uprobe->nhit
  uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
  uprobes/tracing: Introduce is_trace_uprobe_enabled()
  ...
parents b7133a9a e259514e
Loading
Loading
Loading
Loading
+62 −0
Original line number Diff line number Diff line
What:		/sys/devices/cpu/events/
		/sys/devices/cpu/events/branch-misses
		/sys/devices/cpu/events/cache-references
		/sys/devices/cpu/events/cache-misses
		/sys/devices/cpu/events/stalled-cycles-frontend
		/sys/devices/cpu/events/branch-instructions
		/sys/devices/cpu/events/stalled-cycles-backend
		/sys/devices/cpu/events/instructions
		/sys/devices/cpu/events/cpu-cycles

Date:		2013/01/08

Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>

Description:	Generic performance monitoring events

		A collection of performance monitoring events that may be
		supported by many/most CPUs. These events can be monitored
		using the 'perf(1)' tool.

		The contents of each file would look like:

			event=0xNNNN

		where 'N' is a hex digit and the number '0xNNNN' shows the
		"raw code" for the perf event identified by the file's
		"basename".


What: 		/sys/devices/cpu/events/PM_LD_MISS_L1
		/sys/devices/cpu/events/PM_LD_REF_L1
		/sys/devices/cpu/events/PM_CYC
		/sys/devices/cpu/events/PM_BRU_FIN
		/sys/devices/cpu/events/PM_GCT_NOSLOT_CYC
		/sys/devices/cpu/events/PM_BRU_MPRED
		/sys/devices/cpu/events/PM_INST_CMPL
		/sys/devices/cpu/events/PM_CMPLU_STALL

Date:		2013/01/08

Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
		Linux Powerpc mailing list <linuxppc-dev@ozlabs.org>

Description:	POWER-systems specific performance monitoring events

		A collection of performance monitoring events that may be
		supported by the POWER CPU. These events can be monitored
		using the 'perf(1)' tool.

		These events may not be supported by other CPUs.

		The contents of each file would look like:

			event=0xNNNN

		where 'N' is a hex digit and the number '0xNNNN' shows the
		"raw code" for the perf event identified by the file's
		"basename".

		Further, multiple terms like 'event=0xNNNN' can be specified
		and separated with comma. All available terms are defined in
		the /sys/bus/event_source/devices/<dev>/format file.
+83 −0
Original line number Diff line number Diff line
@@ -1842,6 +1842,89 @@ an error.
 # cat buffer_size_kb
85

Snapshot
--------
CONFIG_TRACER_SNAPSHOT makes a generic snapshot feature
available to all non latency tracers. (Latency tracers which
record max latency, such as "irqsoff" or "wakeup", can't use
this feature, since those are already using the snapshot
mechanism internally.)

Snapshot preserves a current trace buffer at a particular point
in time without stopping tracing. Ftrace swaps the current
buffer with a spare buffer, and tracing continues in the new
current (=previous spare) buffer.

The following debugfs files in "tracing" are related to this
feature:

  snapshot:

	This is used to take a snapshot and to read the output
	of the snapshot. Echo 1 into this file to allocate a
	spare buffer and to take a snapshot (swap), then read
	the snapshot from this file in the same format as
	"trace" (described above in the section "The File
	System"). Both reads snapshot and tracing are executable
	in parallel. When the spare buffer is allocated, echoing
	0 frees it, and echoing else (positive) values clear the
	snapshot contents.
	More details are shown in the table below.

	status\input  |     0      |     1      |    else    |
	--------------+------------+------------+------------+
	not allocated |(do nothing)| alloc+swap |   EINVAL   |
	--------------+------------+------------+------------+
	allocated     |    free    |    swap    |   clear    |
	--------------+------------+------------+------------+

Here is an example of using the snapshot feature.

 # echo 1 > events/sched/enable
 # echo 1 > snapshot
 # cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 71/71   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [005] d...  2440.603828: sched_switch: prev_comm=swapper/5 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2242 next_prio=120
           sleep-2242  [005] d...  2440.603846: sched_switch: prev_comm=snapshot-test-2 prev_pid=2242 prev_prio=120 prev_state=R ==> next_comm=kworker/5:1 next_pid=60 next_prio=120
[...]
          <idle>-0     [002] d...  2440.707230: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2229 next_prio=120

 # cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 77/77   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [007] d...  2440.707395: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=snapshot-test-2 next_pid=2243 next_prio=120
 snapshot-test-2-2229  [002] d...  2440.707438: sched_switch: prev_comm=snapshot-test-2 prev_pid=2229 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
[...]


If you try to use this snapshot feature when current tracer is
one of the latency tracers, you will get the following results.

 # echo wakeup > current_tracer
 # echo 1 > snapshot
bash: echo: write error: Device or resource busy
 # cat snapshot
cat: snapshot: Device or resource busy

-----------

More details can be found in the source code, in the
+12 −0
Original line number Diff line number Diff line
@@ -76,6 +76,15 @@ config OPTPROBES
	depends on KPROBES && HAVE_OPTPROBES
	depends on !PREEMPT

config KPROBES_ON_FTRACE
	def_bool y
	depends on KPROBES && HAVE_KPROBES_ON_FTRACE
	depends on DYNAMIC_FTRACE_WITH_REGS
	help
	 If function tracer is enabled and the arch supports full
	 passing of pt_regs to function tracing, then kprobes can
	 optimize on top of function tracing.

config UPROBES
	bool "Transparent user-space probes (EXPERIMENTAL)"
	depends on UPROBE_EVENT && PERF_EVENTS
@@ -158,6 +167,9 @@ config HAVE_KRETPROBES
config HAVE_OPTPROBES
	bool

config HAVE_KPROBES_ON_FTRACE
	bool

config HAVE_NMI_WATCHDOG
	bool
#
+26 −0
Original line number Diff line number Diff line
@@ -11,6 +11,7 @@

#include <linux/types.h>
#include <asm/hw_irq.h>
#include <linux/device.h>

#define MAX_HWEVENTS		8
#define MAX_EVENT_ALTERNATIVES	8
@@ -35,6 +36,7 @@ struct power_pmu {
	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
	int		(*limited_pmc_event)(u64 event_id);
	u32		flags;
	const struct attribute_group	**attr_groups;
	int		n_generic;
	int		*generic_events;
	int		(*cache_events)[PERF_COUNT_HW_CACHE_MAX]
@@ -109,3 +111,27 @@ extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
 * If an event_id is not subject to the constraint expressed by a particular
 * field, then it will have 0 in both the mask and value for that field.
 */

extern ssize_t power_events_sysfs_show(struct device *dev,
				struct device_attribute *attr, char *page);

/*
 * EVENT_VAR() is same as PMU_EVENT_VAR with a suffix.
 *
 * Having a suffix allows us to have aliases in sysfs - eg: the generic
 * event 'cpu-cycles' can have two entries in sysfs: 'cpu-cycles' and
 * 'PM_CYC' where the latter is the name by which the event is known in
 * POWER CPU specification.
 */
#define	EVENT_VAR(_id, _suffix)		event_attr_##_id##_suffix
#define	EVENT_PTR(_id, _suffix)		&EVENT_VAR(_id, _suffix).attr.attr

#define	EVENT_ATTR(_name, _id, _suffix)					\
	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id,	\
			power_events_sysfs_show)

#define	GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
#define	GENERIC_EVENT_PTR(_id)		EVENT_PTR(_id, _g)

#define	POWER_EVENT_ATTR(_name, _id)	EVENT_ATTR(PM_##_name, _id, _p)
#define	POWER_EVENT_PTR(_id)		EVENT_PTR(_id, _p)
+12 −0
Original line number Diff line number Diff line
@@ -1305,6 +1305,16 @@ static int power_pmu_event_idx(struct perf_event *event)
	return event->hw.idx;
}

ssize_t power_events_sysfs_show(struct device *dev,
				struct device_attribute *attr, char *page)
{
	struct perf_pmu_events_attr *pmu_attr;

	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);

	return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
}

struct pmu power_pmu = {
	.pmu_enable	= power_pmu_enable,
	.pmu_disable	= power_pmu_disable,
@@ -1537,6 +1547,8 @@ int __cpuinit register_power_pmu(struct power_pmu *pmu)
	pr_info("%s performance monitor hardware support registered\n",
		pmu->name);

	power_pmu.attr_groups = ppmu->attr_groups;

#ifdef MSR_HV
	/*
	 * Use FCHV to ignore kernel events if MSR.HV is set.
Loading