
Commit 2696ec45 authored by Ingo Molnar

Merge tag 'perf-core-for-mingo-4.18-20180606' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf stat:

 - Display user and system time for workload targets (Jiri Olsa)

perf record:

 - Enable arbitrary event names through the name= modifier (Alexey Budankov)

PowerPC:

 - Add a python script for hypervisor call statistics (Ravi Bangoria)

Intel PT: (Adrian Hunter)

 - Fix sync_switch INTEL_PT_SS_NOT_TRACING

 - Fix decoding to accept CBR between FUP and corresponding TIP

 - Fix MTC timing after overflow

 - Fix "Unexpected indirect branch" error

perf test:

 - record+probe_libc_inet_pton:

  - To get the symbol table for dynamic shared objects on Ubuntu we need
    to pass the -D/--dynamic command line option, unlike with the
    Fedora distros (Arnaldo Carvalho de Melo)

 - code-reading:

  - Fix perf_env setup for PTI entry trampolines (Adrian Hunter)

 - kmod-path:

  - Add tests for vdso32 and vdsox32 (Adrian Hunter)

 - Use header file util/debug.h (Thomas Richter)

perf annotate:

 - Make the various UI backends (stdio, TUI, gtk) use more consistently
  structs with annotation options as specified by the user (Arnaldo Carvalho de Melo)

 - Move annotation specific knobs from the symbol_conf global kitchen
  sink to the annotation option structs (Arnaldo Carvalho de Melo)

perf script:

 - Add more PMU fields to python scripts event handler dict (Jin Yao)

Core:

 - Fix misleading error for some unparsable events mentioning PMUs when
  those are not involved in the problem (Jiri Olsa)

 - Consider BSS symbols when processing /proc/kallsyms ('B' and 'b')
  (Arnaldo Carvalho de Melo)

 - Be more robust when trying to use per-symbol histograms, checking for
  unlikely but possible cases where the space for the histograms wasn't
  allocated, print a debug message for such cases (Arnaldo Carvalho de Melo)

 - Fix symbol and object code resolution for vdso32 and vdsox32 (Adrian Hunter)

 - No need to check for NULL when passing pointers to foo__get() style
  refcount grabbing helpers: just like in the kernel and with free(),
  it's safe to pass a NULL pointer, which avoids having to check it before
  each and every foo__get() call (Arnaldo Carvalho de Melo)

 - Remove some dead code (quote.[ch]) (Arnaldo Carvalho de Melo)

 - Remove some needless globals, making them local (Arnaldo Carvalho de Melo)

 - Reduce usage of symbol_conf.use_callchain, using other means of
  finding out if callchains are in use or available for specific events,
  as we evolved this codebase to allow requesting callchains for just
  a subset of the monitored events. In time it will help polish
  recording and showing mixed sets across the various tools:

    perf record -e cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions

  (Arnaldo Carvalho de Melo)

 - Consider PTI entry trampolines in map__rip_2objdump() (Adrian Hunter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents d09a8e6f ac56aa45
+5 −1
@@ -124,7 +124,11 @@ The available PMUs and their raw parameters can be listed with
For example the raw event "LSD.UOPS" core pmu event above could
be specified as

  perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=1/ ...
  perf stat -e cpu/event=0xa8,umask=0x1,name=LSD.UOPS_CYCLES,cmask=0x1/ ...

  or using extended name syntax

  perf stat -e cpu/event=0xa8,umask=0x1,cmask=0x1,name=\'LSD.UOPS_CYCLES:cmask=0x1\'/ ...

PER SOCKET PMUS
---------------
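
The extended name syntax in the hunk above can be illustrated with a short Python sketch. The helper `pmu_event()` is hypothetical (it is not part of perf); it only assembles the event string. The sketch assumes that the backslashes in the documented command are there for the shell, so the perf tool itself receives plain single quotes around the name; a caller that passes an argument list without a shell would therefore omit the backslashes.

```python
import shlex


def pmu_event(pmu, name=None, **terms):
    """Build a 'pmu/term=value,.../' event string (hypothetical helper).

    The name, if given, is wrapped in single quotes so that characters
    such as ':' or '=' are not parsed as event terms by the tool.
    """
    parts = ["%s=%s" % (key, value) for key, value in terms.items()]
    if name is not None:
        parts.append("name='%s'" % name)
    return "%s/%s/" % (pmu, ",".join(parts))


# Event string as the perf tool would receive it (no shell involved).
event = pmu_event("cpu", name="LSD.UOPS_CYCLES:cmask=0x1",
                  event="0xa8", umask="0x1", cmask="0x1")

# Argument list suitable for subprocess.run(argv) without shell=True.
argv = ["perf", "stat", "-e", event]

# The command line as written in the documentation, with shell escaping.
documented = (r"perf stat -e cpu/event=0xa8,umask=0x1,cmask=0x1,"
              r"name=\'LSD.UOPS_CYCLES:cmask=0x1\'/")

# shlex.split() applies POSIX shell quoting rules, so both forms agree.
print(shlex.split(documented) == argv)
```

The sketch does not run perf; it only shows that the shell-escaped form and the unescaped argument list describe the same event.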
+3 −0
@@ -57,6 +57,9 @@ OPTIONS
			 FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
			 "no" for disable callgraph.
	  - 'stack-size': user stack size for dwarf mode
	  - 'name' : User defined event name. Single quotes (') may be used to
		    escape symbols in the name from parsing by shell and tool
		    like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.

          See the linkperf:perf-list[1] man page for more parameters.

+26 −0
@@ -610,6 +610,32 @@ Various utility functions for use with perf script:
  nsecs_str(nsecs) - returns printable string in the form secs.nsecs
  avg(total, n) - returns average given a sum and a total number of values

SUPPORTED FIELDS
----------------

Currently supported fields:

ev_name, comm, pid, tid, cpu, ip, time, period, phys_addr, addr,
symbol, dso, time_enabled, time_running, values, callchain,
brstack, brstacksym, datasrc, datasrc_decode, iregs, uregs,
weight, transaction, raw_buf, attr.

Some fields have sub items:

brstack:
    from, to, from_dsoname, to_dsoname, mispred,
    predicted, in_tx, abort, cycles.

brstacksym:
    items: from, to, pred, in_tx, abort (converted string)

For example,
We can use this code to print brstack "from", "to", "cycles".

if 'brstack' in dict:
	for entry in dict['brstack']:
		print "from %s, to %s, cycles %s" % (entry["from"], entry["to"], entry["cycles"])

SEE ALSO
--------
linkperf:perf-script[1]
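
The brstack example in the documentation hunk above uses the Python 2 print statement. A minimal sketch of the same idea that also runs under Python 3 is shown here. It assumes the script's generic handler is `process_event(param_dict)`, receiving the event handler dict described above; `format_brstack()` and the sample values are hypothetical and exist only for illustration.

```python
def format_brstack(param_dict):
    """Return one text line per branch stack entry, or [] without 'brstack'."""
    lines = []
    for entry in param_dict.get("brstack", []):
        lines.append("from %s, to %s, cycles %s"
                     % (entry["from"], entry["to"], entry["cycles"]))
    return lines


def process_event(param_dict):
    # Assumed entry point: called once per sample with the field dict.
    for line in format_brstack(param_dict):
        print(line)


# Hypothetical sample dict; real addresses and cycle counts come from perf.
sample = {
    "ev_name": "cycles",
    "brstack": [
        {"from": 4198400, "to": 4198432, "cycles": 3},
        {"from": 4198500, "to": 4198400, "cycles": 7},
    ],
}

process_event(sample)
process_event({"ev_name": "cycles"})  # no 'brstack' key: prints nothing
```

Keeping the formatting in a separate function makes the handler easy to exercise outside of perf with hand-built dicts.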
+29 −11
@@ -310,20 +310,38 @@ Users who wants to get the actual value can apply --no-metric-only.
EXAMPLES
--------

$ perf stat -- make -j
$ perf stat -- make

 Performance counter stats for 'make -j':
   Performance counter stats for 'make':

    8117.370256  task clock ticks     #      11.281 CPU utilization factor
            678  context switches     #       0.000 M/sec
            133  CPU migrations       #       0.000 M/sec
         235724  pagefaults           #       0.029 M/sec
    24821162526  CPU cycles           #    3057.784 M/sec
    18687303457  instructions         #    2302.138 M/sec
      172158895  cache references     #      21.209 M/sec
       27075259  cache misses         #       3.335 M/sec
        83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
           3,228,188      page-faults:u             #    0.039 M/sec
     229,570,665,834      cycles:u                  #    2.742 GHz
     313,163,853,778      instructions:u            #    1.36  insn per cycle
      69,704,684,856      branches:u                #  832.559 M/sec
       2,078,861,393      branch-misses:u           #    2.98% of all branches

 Wall-clock time elapsed:   719.554352 msecs
        83.409183620 seconds time elapsed

        74.684747000 seconds user
         8.739217000 seconds sys

TIMINGS
-------
As displayed in the example above we can display 3 types of timings.
We always display the time the counters were enabled/alive:

        83.409183620 seconds time elapsed

For workload sessions we also display time the workloads spent in
user/system lands:

        74.684747000 seconds user
         8.739217000 seconds sys

Those times are the very same as displayed by the 'time' tool.

CSV FORMAT
----------
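
The derived columns in the new perf stat example above can be cross-checked by hand. The sketch below recomputes four of them from the raw counter values shown in the example. It assumes the ratios are defined as task-clock over elapsed time, cycles over task-clock, instructions over cycles, and branch-misses over branches; the variable names are illustrative only.

```python
# Raw values copied from the example output above.
task_clock_msec = 83723.452481
elapsed_sec = 83.409183620
cycles = 229570665834
instructions = 313163853778
branches = 69704684856
branch_misses = 2078861393

task_clock_sec = task_clock_msec / 1000.0

# CPUs utilized: CPU time consumed per second of wall-clock time.
cpus_utilized = task_clock_sec / elapsed_sec

# Effective clock rate: cycles per second of CPU time, in GHz.
ghz = cycles / task_clock_sec / 1e9

# Instructions retired per cycle.
insn_per_cycle = instructions / cycles

# Share of branches that were mispredicted, in percent.
branch_miss_pct = 100.0 * branch_misses / branches

print("%.3f CPUs utilized" % cpus_utilized)        # 1.004 in the example
print("%.3f GHz" % ghz)                            # 2.742 in the example
print("%.2f insn per cycle" % insn_per_cycle)      # 1.36 in the example
print("%.2f%% of all branches" % branch_miss_pct)  # 2.98% in the example
```

The recomputed values agree with the right-hand column of the example output at the precision perf prints.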
+2 −2
@@ -189,7 +189,7 @@ static int perf_env__lookup_binutils_path(struct perf_env *env,
	return -1;
}

int perf_env__lookup_objdump(struct perf_env *env)
int perf_env__lookup_objdump(struct perf_env *env, const char **path)
{
	/*
	 * For live mode, env->arch will be NULL and we can use
@@ -198,5 +198,5 @@ int perf_env__lookup_objdump(struct perf_env *env)
	if (env->arch == NULL)
		return 0;

	return perf_env__lookup_binutils_path(env, "objdump", &objdump_path);
	return perf_env__lookup_binutils_path(env, "objdump", path);
}