Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit b82a5df2 authored by jianzhou's avatar jianzhou
Browse files

Merge android-4.9.165 (72b54df9) into msm-4.9



* refs/heads/tmp-72b54df9:
  Linux 4.9.165
  KVM: X86: Fix residual mmio emulation request to userspace
  KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
  KVM: nVMX: Sign extend displacements of VMX instr's mem operands
  drm/radeon/evergreen_cs: fix missing break in switch statement
  media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
  rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
  md: Fix failed allocation of md_register_thread
  perf intel-pt: Fix divide by zero when TSC is not available
  perf intel-pt: Fix overlap calculation for padding
  perf auxtrace: Define auxtrace record alignment
  perf intel-pt: Fix CYC timestamp calculation after OVF
  bcache: never writeback a discard operation
  PM / wakeup: Rework wakeup source timer cancellation
  nfsd: fix wrong check in write_v4_end_grace()
  nfsd: fix memory corruption caused by readdir
  NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
  NFS: Fix an I/O request leakage in nfs_do_recoalesce
  NFS: Fix I/O request leakages
  dm: fix to_sector() for 32bit
  ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
  powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning
  powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
  powerpc/83xx: Also save/restore SPRG4-7 during suspend
  powerpc/powernv: Make opal log only readable by root
  powerpc/wii: properly disable use of BATs when requested.
  powerpc/32: Clear on-stack exception marker upon exception return
  jbd2: fix compile warning when using JBUFFER_TRACE
  jbd2: clear dirty flag when revoking a buffer from an older transaction
  serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
  serial: 8250_pci: Fix number of ports for ACCES serial cards
  8250: FIX Fourth port offset of Pericom PI7C9X7954 boards
  serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart
  serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO
  drm/i915: Relax mmap VMA check
  i2c: tegra: fix maximum transfer size
  parport_pc: fix find_superio io compare code, should use equal test.
  intel_th: Don't reference unassigned outputs
  device property: Fix the length used in PROPERTY_ENTRY_STRING()
  kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
  mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  mm: hwpoison: fix thp split handing in soft_offline_in_use_page()
  nfit: acpi_nfit_ctl(): Check out_obj->type in the right place
  clk: ingenic: Fix doc of ingenic_cgu_div_info
  clk: ingenic: Fix round_rate misbehaving with non-integer dividers
  clk: clk-twl6040: Fix imprecise external abort for pdmclk
  ext2: Fix underflow in ext2_max_size()
  ext4: fix crash during online resizing
  cpufreq: pxa2xx: remove incorrect __init annotation
  cpufreq: tegra124: add missing of_node_put()
  libertas_tf: don't set URB_ZERO_PACKET on IN USB transfer
  crypto: pcbc - remove bogus memcpy()s with src == dest
  Btrfs: fix corruption reading shared and compressed extents after hole punching
  btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
  m68k: Add -ffreestanding to CFLAGS
  splice: don't merge into linked buffers
  fs/devpts: always delete dcache dentry-s in dput()
  scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
  scsi: sd: Optimal I/O size should be a multiple of physical block size
  scsi: virtio_scsi: don't send sc payload with tmfs
  s390/virtio: handle find on invalid queue gracefully
  clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
  clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
  regulator: s2mpa01: Fix step values for some LDOs
  regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
  spi: pxa2xx: Setup maximum supported DMA transfer length
  spi: ti-qspi: Fix mmap read when more than one CS in use
  ACPI / device_sysfs: Avoid OF modalias creation for removed device
  tracing: Do not free iter->trace in fail path of tracing_open_pipe()
  tracing: Use strncpy instead of memcpy for string keys in hist triggers
  CIFS: Fix read after write for files with read caching
  CIFS: Do not reset lease state to NONE on lease break
  crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
  crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
  libnvdimm: Fix altmap reservation size calculation
  libnvdimm/pmem: Honor force_raw for legacy pmem regions
  libnvdimm/label: Clear 'updating' flag after label-set update
  stm class: Prevent division by zero
  tmpfs: fix uninitialized return value in shmem_link
  net: set static variable an initial value in atl2_probe()
  nfp: bpf: fix ALU32 high bits clearance bug
  nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
  net: thunderx: make CFG_DONE message to run through generic send-ack sequence
  mac80211_hwsim: propagate genlmsg_reply return code
  phonet: fix building with clang
  ARC: uacces: remove lp_start, lp_end from clobber list
  ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
  tmpfs: fix link accounting when a tmpfile is linked in
  net: marvell: mvneta: fix DMA debug warning
  arm64: Relax GIC version check during early boot
  ASoC: topology: free created components in tplg load error
  net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
  qmi_wwan: apply SET_DTR quirk to Sierra WP7607
  pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
  net: systemport: Fix reception of BPDUs
  scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
  assoc_array: Fix shortcut creation
  ARM: 8824/1: fix a migrating irq bug when hotplug cpu
  clk: sunxi: A31: Fix wrong AHB gate number
  Input: st-keyscan - fix potential zalloc NULL dereference
  i2c: cadence: Fix the hold bit setting
  net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
  mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  mm/gup: fix gup_pmd_range() for dax
  floppy: check_events callback should not return a negative number
  Input: matrix_keypad - use flush_delayed_work()
  Input: cap11xx - switch to using set_brightness_blocking()
  ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
  s390/dasd: fix using offset into zero size array error
  gpu: ipu-v3: Fix CSI offsets for imx53
  gpu: ipu-v3: Fix i.MX51 CSI control registers offset
  crypto: ahash - fix another early termination in hash walk
  crypto: caam - fixed handling of sg list
  stm class: Fix an endless loop in channel allocation
  iio: adc: exynos-adc: Fix NULL pointer exception on unbind
  ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
  9p/net: fix memory leak in p9_client_create
  9p: use inode->i_lock to protect i_size_write() under 32-bit
  media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
  FROMLIST: psi: introduce psi monitor
  FROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h
  FROMLIST: psi: track changed states
  FROMLIST: psi: split update_stats into parts
  FROMLIST: psi: rename psi fields in preparation for psi trigger addition
  FROMLIST: psi: make psi_enable static
  FROMLIST: psi: introduce state_mask to represent stalled psi states
  ANDROID: cuttlefish_defconfig: Enable CONFIG_PSI
  BACKPORT: kernel: cgroup: add poll file operation
  BACKPORT: fs: kernfs: add poll file operation
  UPSTREAM: psi: avoid divide-by-zero crash inside virtual machines
  UPSTREAM: psi: clarify the Kconfig text for the default-disable option
  UPSTREAM: psi: fix aggregation idle shut-off
  UPSTREAM: cgroup add cftype->open/release() callbacks
  UPSTREAM: kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
  UPSTREAM: kernfs: fix locking around kernfs_ops->release() callback
  UPSTREAM: kernfs: add kernfs_ops->open/release() callbacks
  UPSTREAM: psi: fix reference to kernel commandline enable
  BACKPORT: psi: make disabling/enabling easier for vendor kernels
  UPSTREAM: kernel/sched/psi.c: simplify cgroup_move_task()
  BACKPORT: psi: cgroup support
  UPSTREAM: psi: pressure stall information for CPU, memory, and IO
  BACKPORT: sched: introduce this_rq_lock_irq()
  BACKPORT: sched: sched.h: make rq locking and clock functions available in stats.h
  UPSTREAM: sched: loadavg: make calc_load_n() public
  UPSTREAM: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
  BACKPORT: delayacct: track delays from thrashing cache pages
  BACKPORT: mm: workingset: tell cache transitions from workingset thrashing
  BACKPORT: cgroup: misc changes
  UPSTREAM: sched/headers: Remove <linux/sched.h> from <linux/sched/loadavg.h>
  UPSTREAM: sched/headers: Move loadavg related definitions from <linux/sched.h> to <linux/sched/loadavg.h>
  UPSTREAM: sched/headers, delayacct: Move the 'struct task_delay_info' definition from <linux/sched.h> to <linux/delayacct.h>
  UPSTREAM: sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>
  BACKPORT: sched/core: Add wrappers for lockdep_(un)pin_lock()
  UPSTREAM: Avoid page waitqueue race leaving possible page locker waiting
  UPSTREAM: mm: add PageWaiters indicating tasks are waiting for a page bit
  UPSTREAM: workqueue: make workqueue available early during boot
  ANDROID: ion_dummy_driver: Remove SYSTEM_CONTIG heap
  ANDROID: ion_dummy_driver: Rework ion_dummy_driver to avoid direct indexing into the heaps
  ANDROID: ion_dummy_driver: Use IS_ERR_OR_NULL() before destroying heaps
  ANDROID: sched/fair: fix energy compute when a cluster is only a cpu core in multi-cluster system

Conflicts:
	arch/arm/kernel/irq.c
	drivers/scsi/sd.c
	include/linux/cgroup.h
	include/linux/sched.h
	kernel/sched/Makefile
	kernel/sched/core.c
	mm/page_alloc.c

Change-Id: I387345d8f6d2145e2456fbead266c897475dfb4c
Signed-off-by: default avatarjianzhou <jianzhou@codeaurora.org>
parents 50483ad2 72b54df9
Loading
Loading
Loading
Loading
+180 −0
Original line number Diff line number Diff line
================================
PSI - Pressure Stall Information
================================

:Date: April, 2018
:Author: Johannes Weiner <hannes@cmpxchg.org>

When CPU, memory or IO devices are contended, workloads experience
latency spikes, throughput losses, and run the risk of OOM kills.

Without an accurate measure of such contention, users are forced to
either play it safe and under-utilize their hardware resources, or
roll the dice and frequently suffer the disruptions resulting from
excessive overcommit.

The psi feature identifies and quantifies the disruptions caused by
such resource crunches and the time impact it has on complex workloads
or even entire systems.

Having an accurate measure of productivity losses caused by resource
scarcity aids users in sizing workloads to hardware--or provisioning
hardware according to workload demand.

As psi aggregates this information in realtime, systems can be managed
dynamically using techniques such as load shedding, migrating jobs to
other systems or data centers, or strategically pausing or killing low
priority or restartable batch jobs.

This allows maximizing hardware utilization without sacrificing
workload health or risking major disruptions such as OOM kills.

Pressure interface
==================

Pressure information for each resource is exported through the
respective file in /proc/pressure/ -- cpu, memory, and io.

The format for CPU is as such:

some avg10=0.00 avg60=0.00 avg300=0.00 total=0

and for memory and IO:

some avg10=0.00 avg60=0.00 avg300=0.00 total=0
full avg10=0.00 avg60=0.00 avg300=0.00 total=0

The "some" line indicates the share of time in which at least some
tasks are stalled on a given resource.

The "full" line indicates the share of time in which all non-idle
tasks are stalled on a given resource simultaneously. In this state
actual CPU cycles are going to waste, and a workload that spends
extended time in this state is considered to be thrashing. This has
severe impact on performance, and it's useful to distinguish this
situation from a state where some tasks are stalled but the CPU is
still doing productive work. As such, time spent in this subset of the
stall state is tracked separately and exported in the "full" averages.

The ratios are tracked as recent trends over ten, sixty, and three
hundred second windows, which gives insight into short term events as
well as medium and long term trends. The total absolute stall time is
tracked and exported as well, to allow detection of latency spikes
which wouldn't necessarily make a dent in the time averages, or to
average trends over custom time frames.

Monitoring for pressure thresholds
==================================

Users can register triggers and use poll() to be woken up when resource
pressure exceeds certain thresholds.

A trigger describes the maximum cumulative stall time over a specific
time window, e.g. 100ms of total stall time within any 500ms window to
generate a wakeup event.

To register a trigger user has to open psi interface file under
/proc/pressure/ representing the resource to be monitored and write the
desired threshold and time window. The open file descriptor should be
used to wait for trigger events using select(), poll() or epoll().
The following format is used:

<some|full> <stall amount in us> <time window in us>

For example writing "some 150000 1000000" into /proc/pressure/memory
would add 150ms threshold for partial memory stall measured within
1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
would add 50ms threshold for full io stall measured within 1sec time window.

Triggers can be set on more than one psi metric and more than one trigger
for the same psi metric can be specified. However for each trigger a separate
file descriptor is required to be able to poll it separately from others,
therefore for each trigger a separate open() syscall should be made even
when opening the same psi interface file.

Monitors activate only when system enters stall state for the monitored
psi metric and deactivates upon exit from the stall state. While system is
in the stall state psi signal growth is monitored at a rate of 10 times per
tracking window.

The kernel accepts window sizes ranging from 500ms to 10s, therefore min
monitoring update interval is 50ms and max is 1s. Min limit is set to
prevent overly frequent polling. Max limit is chosen as a high enough number
after which monitors are most likely not needed and psi averages can be used
instead.

When activated, psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when system is
bouncing in and out of the stall state.

Notifications to the userspace are rate-limited to one per tracking window.

The trigger will de-register when the file descriptor used to define the
trigger  is closed.

Userspace monitor usage example
===============================

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Monitor memory partial stall with 1s tracking window size
 * and 150ms threshold.
 */
int main() {
	const char trig[] = "some 150000 1000000";
	struct pollfd fds;
	int n;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0) {
		printf("/proc/pressure/memory open error: %s\n",
			strerror(errno));
		return 1;
	}
	fds.events = POLLPRI;

	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
		printf("/proc/pressure/memory write error: %s\n",
			strerror(errno));
		return 1;
	}

	printf("waiting for events...\n");
	while (1) {
		n = poll(&fds, 1, -1);
		if (n < 0) {
			printf("poll error: %s\n", strerror(errno));
			return 1;
		}
		if (fds.revents & POLLERR) {
			printf("got POLLERR, event source is gone\n");
			return 0;
		}
		if (fds.revents & POLLPRI) {
			printf("event triggered!\n");
		} else {
			printf("unknown event received: 0x%x\n", fds.revents);
			return 1;
		}
	}

	return 0;
}

Cgroup2 interface
=================

In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
mounted, pressure stall information is also tracked for tasks grouped
into cgroups. Each subdirectory in the cgroupfs mountpoint contains
cpu.pressure, memory.pressure, and io.pressure files; the format is
the same as the /proc/pressure/ files.

Per-cgroup psi monitors can be specified and used the same way as
system-wide ones.
+18 −0
Original line number Diff line number Diff line
@@ -717,6 +717,12 @@ All time durations are in microseconds.
	$PERIOD duration.  If only one number is written, $MAX is
	updated.

  cpu.pressure
	A read-only nested-key file which exists on non-root cgroups.

	Shows pressure stall information for CPU. See
	Documentation/accounting/psi.txt for details.


5-2. Memory

@@ -925,6 +931,12 @@ PAGE_SIZE multiple when read back.
	Swap usage hard limit.  If a cgroup's swap usage reaches this
	limit, anonymous meomry of the cgroup will not be swapped out.

  memory.pressure
	A read-only nested-key file which exists on non-root cgroups.

	Shows pressure stall information for memory. See
	Documentation/accounting/psi.txt for details.


5-2-2. Usage Guidelines

@@ -1055,6 +1067,12 @@ blk-mq devices.

	  8:16 rbps=2097152 wbps=max riops=max wiops=max

  io.pressure
	A read-only nested-key file which exists on non-root cgroups.

	Shows pressure stall information for IO. See
	Documentation/accounting/psi.txt for details.


5-3-2. Writeback

+4 −0
Original line number Diff line number Diff line
@@ -3409,6 +3409,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
			before loading.
			See Documentation/blockdev/ramdisk.txt.

	psi=		[KNL] Enable or disable pressure stall information
			tracking.
			Format: <bool>

	psmouse.proto=	[HW,MOUSE] Highest PS2 mouse protocol extension to
			probe for; one of (bare|imps|exps|lifebook|any).
	psmouse.rate=	[HW,MOUSE] Set desired mouse report rate, in reports
+1 −1
Original line number Diff line number Diff line
VERSION = 4
PATCHLEVEL = 9
SUBLEVEL = 164
SUBLEVEL = 165
EXTRAVERSION =
NAME = Roaring Lionus

+4 −4
Original line number Diff line number Diff line
@@ -209,7 +209,7 @@ __arc_copy_from_user(void *to, const void __user *from, unsigned long n)
		*/
		  "=&r" (tmp), "+r" (to), "+r" (from)
		:
		: "lp_count", "lp_start", "lp_end", "memory");
		: "lp_count", "memory");

		return n;
	}
@@ -438,7 +438,7 @@ __arc_copy_to_user(void __user *to, const void *from, unsigned long n)
		 */
		  "=&r" (tmp), "+r" (to), "+r" (from)
		:
		: "lp_count", "lp_start", "lp_end", "memory");
		: "lp_count", "memory");

		return n;
	}
@@ -658,7 +658,7 @@ static inline unsigned long __arc_clear_user(void __user *to, unsigned long n)
	"	.previous			\n"
	: "+r"(d_char), "+r"(res)
	: "i"(0)
	: "lp_count", "lp_start", "lp_end", "memory");
	: "lp_count", "memory");

	return res;
}
@@ -691,7 +691,7 @@ __arc_strncpy_from_user(char *dst, const char __user *src, long count)
	"	.previous			\n"
	: "+r"(res), "+r"(dst), "+r"(src), "=r"(val)
	: "g"(-EFAULT), "r"(count)
	: "lp_count", "lp_start", "lp_end", "memory");
	: "lp_count", "memory");

	return res;
}
Loading