Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit a8944c5b authored by Ingo Molnar's avatar Ingo Molnar
Browse files

Merge tag 'perf-core-for-mingo-20160427' of...

Merge tag 'perf-core-for-mingo-20160427' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

 into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- perf trace --pf maj/min/all works with --call-graph: (Arnaldo Carvalho de Melo)

  Tracing write syscalls and major page faults with callchains while starting
  firefox, limiting the stack to 5 frames:

 # perf trace -e write --pf maj --max-stack 5 firefox
   589.549 ( 0.014 ms): firefox/15377 write(fd: 4, buf: 0x7fff80acc898, count: 151) = 151
                                       [0xfaed] (/usr/lib64/libpthread-2.22.so)
                                       fire_glxtest_process+0x5c (/usr/lib64/firefox/libxul.so)
                                       InstallGdkErrorHandler+0x41 (/usr/lib64/firefox/libxul.so)
                                       XREMain::XRE_mainInit+0x12c (/usr/lib64/firefox/libxul.so)
                                       XREMain::XRE_main+0x1e4 (/usr/lib64/firefox/libxul.so)
   760.704 ( 0.000 ms): firefox/15332 majfault [gtk_tree_view_accessible_get_type+0x0] => /usr/lib64/libgtk-3.so.0.1800.9@0xa0850 (x.)
                                       gtk_tree_view_accessible_get_type+0x0 (/usr/lib64/libgtk-3.so.0.1800.9)
                                       gtk_tree_view_class_intern_init+0x1a54 (/usr/lib64/libgtk-3.so.0.1800.9)
                                       g_type_class_ref+0x6dd (/usr/lib64/libgobject-2.0.so.0.4600.2)
                                       [0x115378] (/usr/lib64/libgnutls.so.30.6.3)

  This automagically selects "--call-graph dwarf", use "--call-graph fp" on systems
  where -fno-omit-frame-pointer was used to built the components of interest, to
  incur in less overhead, or tune "--call-graph dwarf" appropriately, see 'perf record --help'.

- Allow /proc/sys/kernel/perf_event_max_stack, that defaults to the old hard coded value
  of PERF_MAX_STACK_DEPTH (127), useful for huge callstacks for things like Groovy, Ruby, etc,
  and also to reduce overhead by limiting it to a smaller value, upcoming work will allow
  this to be done per-event (Arnaldo Carvalho de Melo)

- Make 'perf trace --min-stack' be honoured by --pf and --event (Arnaldo Carvalho de Melo)

- Make 'perf evlist -v' decode perf_event_attr->branch_sample_type (Arnaldo Carvalho de Melo)

   # perf record --call lbr usleep 1
   # perf evlist -v
   cycles:ppp: ... sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, ...
            branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
   #

- Clear dummy entry accumulated period, fixing such 'perf top/report' output
  as: (Kan Liang)

    4769.98%  0.01%  0.00%  0.01%  tchain_edit  [kernel] [k] update_fast_timekeeper

- System calls with pid_t arguments gets them augmented with the COMM event
  more thoroughly:

  # trace -e perf_event_open perf stat -e cycles -p 15608
   6.876 ( 0.014 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15608 (hexchat), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
   6.882 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15639 (gmain), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
   6.889 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15640 (gdbus), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                                            ^^^^^^^^^^^^^^^^^^
   ^C

- Fix offline module name mismatch issue in 'perf probe' (Ravi Bangoria)

- Fix module probe issue if no dwarf support in (Ravi Bangoria)

Assorted fixes:

- Fix off-by-one in write_buildid() (Andrey Ryabinin)

- Fix segfault when printing callchains in 'perf script' (Chris Phlipot)

- Replace assignment with comparison on assert check in 'perf test' entry (Colin Ian King)

- Fix off-by-one comparison in intel-pt code (Colin Ian King)

- Close target file on error path in 'perf probe' (Masami Hiramatsu)

- Set default kprobe group name if not given in 'perf probe' (Masami Hiramatsu)

- Avoid partial perf_event_header reads (Wang Nan)

Infrastructure changes:

- Update x86's syscall_64.tbl copy, adding preadv2 & pwritev2 (Arnaldo Carvalho de Melo)

- Make the x86 clean quiet wrt syscall table removal (Jiri Olsa)

Cleanups:

- Simplify wrapper for LOCK_PI in 'perf bench futex' (Davidlohr Bueso)

- Remove duplicate const qualifier (Eric Engestrom)

Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
parents 67d61296 4cb93446
Loading
Loading
Loading
Loading
+14 −0
Original line number Diff line number Diff line
@@ -60,6 +60,7 @@ show up in /proc/sys/kernel:
- panic_on_warn
- perf_cpu_time_max_percent
- perf_event_paranoid
- perf_event_max_stack
- pid_max
- powersave-nap               [ PPC only ]
- printk
@@ -654,6 +655,19 @@ users (without CAP_SYS_ADMIN). The default value is 1.

==============================================================

perf_event_max_stack:

Controls maximum number of stack frames to copy for (attr.sample_type &
PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
'perf record -g' or 'perf trace --call-graph fp'.

This can only be done when no events are in use that have callchains
enabled, otherwise writing to this file will return -EBUSY.

The default value is 127.

==============================================================

pid_max:

PID allocation wrap value.  When the kernel's next PID value
+1 −1
Original line number Diff line number Diff line
@@ -75,7 +75,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)

	tail = (struct frame_tail __user *)regs->ARM_fp - 1;

	while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
	while ((entry->nr < sysctl_perf_event_max_stack) &&
	       tail && !((unsigned long)tail & 0x3))
		tail = user_backtrace(tail, entry);
}
+2 −2
Original line number Diff line number Diff line
@@ -122,7 +122,7 @@ void perf_callchain_user(struct perf_callchain_entry *entry,

		tail = (struct frame_tail __user *)regs->regs[29];

		while (entry->nr < PERF_MAX_STACK_DEPTH &&
		while (entry->nr < sysctl_perf_event_max_stack &&
		       tail && !((unsigned long)tail & 0xf))
			tail = user_backtrace(tail, entry);
	} else {
@@ -132,7 +132,7 @@ void perf_callchain_user(struct perf_callchain_entry *entry,

		tail = (struct compat_frame_tail __user *)regs->compat_fp - 1;

		while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
		while ((entry->nr < sysctl_perf_event_max_stack) &&
			tail && !((unsigned long)tail & 0x3))
			tail = compat_user_backtrace(tail, entry);
#endif
+1 −1
Original line number Diff line number Diff line
@@ -65,7 +65,7 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)

	--frame;

	while ((entry->nr < PERF_MAX_STACK_DEPTH) && frame)
	while ((entry->nr < sysctl_perf_event_max_stack) && frame)
		frame = user_backtrace(frame, entry);
}

+2 −2
Original line number Diff line number Diff line
@@ -35,7 +35,7 @@ static void save_raw_perf_callchain(struct perf_callchain_entry *entry,
		addr = *sp++;
		if (__kernel_text_address(addr)) {
			perf_callchain_store(entry, addr);
			if (entry->nr >= PERF_MAX_STACK_DEPTH)
			if (entry->nr >= sysctl_perf_event_max_stack)
				break;
		}
	}
@@ -59,7 +59,7 @@ void perf_callchain_kernel(struct perf_callchain_entry *entry,
	}
	do {
		perf_callchain_store(entry, pc);
		if (entry->nr >= PERF_MAX_STACK_DEPTH)
		if (entry->nr >= sysctl_perf_event_max_stack)
			break;
		pc = unwind_stack(current, &sp, pc, &ra);
	} while (pc);
Loading