Commit 23acd3e1 authored by Ingo Molnar

Merge tag 'perf-core-for-mingo-4.13-20170630' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

Intel PT enhancements:

 - Support "ptwrite" instruction, a way to stuff 32 or 64 bit values into
   the Intel PT trace (Adrian Hunter)

 - Support power events in Intel PT to report changes to C-state (Adrian
   Hunter)

 - Synthesize Intel PT events as PERF_RECORD_SAMPLE records with a
   perf_event_attr.type (PERF_TYPE_SYNTH) just after the range used by the
   kernel, i.e. right after what is allocated for PMUs, at INT_MAX + 1U,
   attr.config will have the identification for the synthesized event and
   the PERF_SAMPLE_RAW payload will have its fields (Adrian Hunter)
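
As a rough illustration of the type value described in the last item, the sketch below computes INT_MAX + 1U and tests an attribute against it. The names DEMO_PERF_TYPE_SYNTH, struct demo_attr and demo_is_synth() are hypothetical stand-ins made up for this note; they are not the perf definitions, and the sketch assumes a 32-bit int.

```c
#include <limits.h>
#include <stdint.h>

/* Value described in the commit message: the first value past INT_MAX. */
#define DEMO_PERF_TYPE_SYNTH ((uint32_t)INT_MAX + 1U)

/* Hypothetical, trimmed-down stand-in for the two attr fields discussed. */
struct demo_attr {
	uint32_t type;   /* event type; synthesized events use the value above */
	uint64_t config; /* identifies which synthesized event this is */
};

/* Returns 1 if the attribute describes a synthesized event, 0 otherwise. */
static int demo_is_synth(const struct demo_attr *attr)
{
	return attr->type == DEMO_PERF_TYPE_SYNTH;
}
```

With a 32-bit int the macro evaluates to 0x80000000, which a signed int cannot represent, so it cannot collide with type values held in a non-negative int.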

Infrastructure changes:

 - Remove warning() and error(), using instead pr_warning() and
   pr_error(), consolidating error reporting (Arnaldo Carvalho de Melo)

 - Add platform dependency to 'perf test 15' (Thomas Richter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents e91c8d97 644e0840
+1 −1
@@ -1009,7 +1009,7 @@ GrpTable: Grp15
1: fxstor | RDGSBASE Ry (F3),(11B)
2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
-4: XSAVE
+4: XSAVE | ptwrite Ey (F3),(11B)
5: XRSTOR | lfence (11B)
6: XSAVEOPT | clwb (66) | mfence (11B)
7: clflush | clflushopt (66) | sfence (11B)
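
One way to read the Grp15 entry for ptwrite: the group index (4) is the reg field of the ModRM byte, "(F3)" is the mandatory prefix, and "(11B)" means the ModRM mod field is 3 (register operand). Grp15 is reached through the two-byte opcode 0F AE. The sketch below checks a byte sequence against exactly that pattern. The helper names are hypothetical, and REX or other extra prefixes are deliberately ignored, so this is not a substitute for the kernel's instruction decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* The three fields packed into a ModRM byte. */
struct demo_modrm {
	unsigned int mod; /* bits 7..6 */
	unsigned int reg; /* bits 5..3, used as the group index */
	unsigned int rm;  /* bits 2..0 */
};

static struct demo_modrm demo_split_modrm(uint8_t byte)
{
	struct demo_modrm m = {
		.mod = (byte >> 6) & 0x3,
		.reg = (byte >> 3) & 0x7,
		.rm  = byte & 0x7,
	};
	return m;
}

/*
 * Returns 1 if the bytes match "F3 0F AE /4" with mod == 11B, i.e. the
 * register form listed in the Grp15 table, and 0 otherwise.
 * Simplification: no REX or other prefixes are handled.
 */
static int demo_is_ptwrite_reg(const uint8_t *insn, size_t len)
{
	struct demo_modrm m;

	if (len < 4)
		return 0;
	if (insn[0] != 0xF3 || insn[1] != 0x0F || insn[2] != 0xAE)
		return 0;
	m = demo_split_modrm(insn[3]);
	return m.mod == 3 && m.reg == 4;
}
```

For example, the sequence F3 0F AE E0 has ModRM 0xE0 (mod 3, reg 4, rm 0) and matches, while F3 0F AE E8 has reg 5 and does not.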
+29 −6
@@ -5,6 +5,8 @@
#include <stddef.h>
#include <assert.h>
#include <linux/compiler.h>
+#include <endian.h>
+#include <byteswap.h>

#ifndef UINT_MAX
#define UINT_MAX	(~0U)
@@ -67,12 +69,33 @@
#endif
#endif

-/*
- * Both need more care to handle endianness
- * (Don't use bitmap_copy_le() for now)
- */
-#define cpu_to_le64(x)	(x)
-#define cpu_to_le32(x)	(x)
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define cpu_to_le16 bswap_16
+#define cpu_to_le32 bswap_32
+#define cpu_to_le64 bswap_64
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define cpu_to_be16
+#define cpu_to_be32
+#define cpu_to_be64
+#define be16_to_cpu
+#define be32_to_cpu
+#define be64_to_cpu
+#else
+#define cpu_to_le16
+#define cpu_to_le32
+#define cpu_to_le64
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define cpu_to_be16 bswap_16
+#define cpu_to_be32 bswap_32
+#define cpu_to_be64 bswap_64
+#define be16_to_cpu bswap_16
+#define be32_to_cpu bswap_32
+#define be64_to_cpu bswap_64
+#endif

int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
int scnprintf(char * buf, size_t size, const char * fmt, ...);
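
To show what the byte-order macros in this hunk are for, here is a small self-contained sketch that stores a 32-bit value in little-endian order and reads it back, using the same glibc headers (<endian.h>, <byteswap.h>). The demo_ prefixed names are hypothetical and exist only so the example does not clash with the real macros; only the 32-bit case is covered.

```c
#include <byteswap.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Same selection logic as the hunk above, restricted to 32-bit values. */
#if __BYTE_ORDER == __BIG_ENDIAN
#define demo_cpu_to_le32(x) bswap_32(x)
#define demo_le32_to_cpu(x) bswap_32(x)
#else
#define demo_cpu_to_le32(x) (x)
#define demo_le32_to_cpu(x) (x)
#endif

/* Write v into out[] so that out[0] holds the least significant byte. */
static void demo_put_le32(uint8_t out[4], uint32_t v)
{
	uint32_t le = demo_cpu_to_le32(v);

	memcpy(out, &le, sizeof(le));
}

/* Read back a value stored by demo_put_le32(). */
static uint32_t demo_get_le32(const uint8_t in[4])
{
	uint32_t le;

	memcpy(&le, in, sizeof(le));
	return demo_le32_to_cpu(le);
}
```

On either kind of host, storing 0x11223344 should leave 0x44 in the first byte and 0x11 in the last, which is the property the removed identity definitions could not guarantee on big-endian machines.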
+1 −1
@@ -1009,7 +1009,7 @@ GrpTable: Grp15
1: fxstor | RDGSBASE Ry (F3),(11B)
2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
-4: XSAVE
+4: XSAVE | ptwrite Ey (F3),(11B)
5: XRSTOR | lfence (11B)
6: XSAVEOPT | clwb (66) | mfence (11B)
7: clflush | clflushopt (66) | sfence (11B)
+40 −2
@@ -108,6 +108,9 @@ approach is available to export the data to a postgresql database. Refer to
script export-to-postgresql.py for more details, and to script
call-graph-from-postgresql.py for an example of using the database.

+There is also script intel-pt-events.py which provides an example of how to
+unpack the raw data for power events and PTWRITE.
+
As mentioned above, it is easy to capture too much data.  One way to limit the
data captured is to use 'snapshot' mode which is explained further below.
Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
@@ -710,13 +713,15 @@ Having no option is the same as

which, in turn, is the same as

-	--itrace=ibxe
+	--itrace=ibxwpe

The letters are:

	i	synthesize "instructions" events
	b	synthesize "branches" events
	x	synthesize "transactions" events
+	w	synthesize "ptwrite" events
+	p	synthesize "power" events
	c	synthesize branches events (calls only)
	r	synthesize branches events (returns only)
	e	synthesize tracing error events
@@ -735,7 +740,40 @@ and "r" can be combined to get calls and returns.
'flags' field can be used in perf script to determine whether the event is a
transaction start, commit or abort.

-Error events are new.  They show where the decoder lost the trace.  Error events
+Note that "instructions", "branches" and "transactions" events depend on code
+flow packets which can be disabled by using the config term "branch=0".  Refer
+to the config terms section above.
+
+"ptwrite" events record the payload of the ptwrite instruction and whether
+"fup_on_ptw" was used.  "ptwrite" events depend on PTWRITE packets which are
+recorded only if the "ptw" config term was used.  Refer to the config terms
+section above.  perf script "synth" field displays "ptwrite" information like
+this: "ip: 0 payload: 0x123456789abcdef0"  where "ip" is 1 if "fup_on_ptw" was
+used.
+
+"Power" events correspond to power event packets and CBR (core-to-bus ratio)
+packets.  While CBR packets are always recorded when tracing is enabled, power
+event packets are recorded only if the "pwr_evt" config term was used.  Refer to
+the config terms section above.  The power events record information about
+C-state changes, whereas CBR is indicative of CPU frequency.  perf script
+"event,synth" fields display information like this:
+	cbr:  cbr: 22 freq: 2189 MHz (200%)
+	mwait:  hints: 0x60 extensions: 0x1
+	pwre:  hw: 0 cstate: 2 sub-cstate: 0
+	exstop:  ip: 1
+	pwrx:  deepest cstate: 2 last cstate: 2 wake reason: 0x4
+Where:
+	"cbr" includes the frequency and the percentage of maximum non-turbo
+	"mwait" shows mwait hints and extensions
+	"pwre" shows C-state transitions (to a C-state deeper than C0) and
+	whether	initiated by hardware
+	"exstop" indicates execution stopped and whether the IP was recorded
+	exactly,
+	"pwrx" indicates return to C0
+For more details refer to the Intel 64 and IA-32 Architectures Software
+Developer Manuals.
+
+Error events show where the decoder lost the trace.  Error events
are quite important.  Users must know if what they are seeing is a complete
picture or not.
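
The "ptwrite" paragraph in this hunk quotes the text that perf script prints in the "synth" field, "ip: 0 payload: 0x123456789abcdef0". A minimal sketch of pulling the two values back out of such a string follows. It assumes the text has exactly the quoted shape; struct demo_ptwrite and demo_parse_ptwrite() are hypothetical names, and this is unrelated to how intel-pt-events.py unpacks the raw sample data.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The two values shown in the "synth" field for a ptwrite event. */
struct demo_ptwrite {
	unsigned int ip;  /* 1 if "fup_on_ptw" was used, per the text above */
	uint64_t payload; /* value written by the ptwrite instruction */
};

/*
 * Parse text of the form "ip: 0 payload: 0x123456789abcdef0".
 * Returns 0 on success, -1 if the text does not have that shape.
 */
static int demo_parse_ptwrite(const char *text, struct demo_ptwrite *out)
{
	unsigned int ip;
	uint64_t payload;

	/* %x-style conversions accept an optional "0x" prefix. */
	if (sscanf(text, " ip: %u payload: %" SCNx64, &ip, &payload) != 2)
		return -1;

	out->ip = ip;
	out->payload = payload;
	return 0;
}
```

The function leaves *out untouched on failure, so a caller can keep a previous value or a sentinel there.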

+5 −3
@@ -3,13 +3,15 @@
		c	synthesize branches events (calls only)
		r	synthesize branches events (returns only)
		x	synthesize transactions events
+		w	synthesize ptwrite events
+		p	synthesize power events
		e	synthesize error events
		d	create a debug log
		g	synthesize a call chain (use with i or x)
		l	synthesize last branch entries (use with i or x)
		s       skip initial number of events

-	The default is all events i.e. the same as --itrace=ibxe
+	The default is all events i.e. the same as --itrace=ibxwpe

	In addition, the period (default 100000) for instructions events
	can be specified in units of:
@@ -26,8 +28,8 @@
	Also the number of last branch entries (default 64, max. 1024) for
	instructions or transactions events can be specified.

-	It is also possible to skip events generated (instructions, branches, transactions)
-	at the beginning. This is useful to ignore initialization code.
+	It is also possible to skip events generated (instructions, branches, transactions,
+	ptwrite, power) at the beginning. This is useful to ignore initialization code.

	--itrace=i0nss1000000
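
To make the letter scheme concrete, the sketch below turns a string of --itrace event letters into a bitmask, with an empty string standing for the default "ibxwpe". It is a simplified illustration and not perf's option parser: the enum and demo_parse_itrace() are hypothetical, only the six event letters i, b, x, w, p and e are recognised, and anything else (including periods, units, and the c, r, d, g, l and s options) is rejected.

```c
#include <stddef.h>

/* One bit per synthesized event kind; the values are arbitrary. */
enum {
	DEMO_INSTRUCTIONS = 1 << 0, /* i */
	DEMO_BRANCHES     = 1 << 1, /* b */
	DEMO_TRANSACTIONS = 1 << 2, /* x */
	DEMO_PTWRITES     = 1 << 3, /* w */
	DEMO_PWR_EVENTS   = 1 << 4, /* p */
	DEMO_ERRORS       = 1 << 5, /* e */
};

/*
 * Returns a bitmask of DEMO_* flags, or -1 if the string contains a
 * character this sketch does not handle. NULL or "" selects the
 * documented default, "ibxwpe".
 */
static int demo_parse_itrace(const char *opts)
{
	int mask = 0;

	if (opts == NULL || *opts == '\0')
		opts = "ibxwpe";

	for (; *opts != '\0'; opts++) {
		switch (*opts) {
		case 'i': mask |= DEMO_INSTRUCTIONS; break;
		case 'b': mask |= DEMO_BRANCHES;     break;
		case 'x': mask |= DEMO_TRANSACTIONS; break;
		case 'w': mask |= DEMO_PTWRITES;     break;
		case 'p': mask |= DEMO_PWR_EVENTS;   break;
		case 'e': mask |= DEMO_ERRORS;       break;
		default:
			return -1;
		}
	}
	return mask;
}
```

Under these assumptions, "wp" selects only the two event kinds added by this change, and the full example string shown above is rejected because it carries a period and a skip count.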
