Merge branch 'akpm' (patches from Andrew) (59d53737) · Commits · e / devices / android_kernel_samsung_universal8895

Documentation/cgroups/unified-hierarchy.txt

+79 −0

Original line number	Diff line number	Diff line
		@@ -327,6 +327,85 @@ supported and the interface files "release_agent" and
		- use_hierarchy is on by default and the cgroup file for the flag is
		not created.

		- The original lower boundary, the soft limit, is defined as a limit
		that is per default unset. As a result, the set of cgroups that
		global reclaim prefers is opt-in, rather than opt-out. The costs
		for optimizing these mostly negative lookups are so high that the
		implementation, despite its enormous size, does not even provide the
		basic desirable behavior. First off, the soft limit has no
		hierarchical meaning. All configured groups are organized in a
		global rbtree and treated like equal peers, regardless where they
		are located in the hierarchy. This makes subtree delegation
		impossible. Second, the soft limit reclaim pass is so aggressive
		that it not just introduces high allocation latencies into the
		system, but also impacts system performance due to overreclaim, to
		the point where the feature becomes self-defeating.

		The memory.low boundary on the other hand is a top-down allocated
		reserve. A cgroup enjoys reclaim protection when it and all its
		ancestors are below their low boundaries, which makes delegation of
		subtrees possible. Secondly, new cgroups have no reserve per
		default and in the common case most cgroups are eligible for the
		preferred reclaim pass. This allows the new low boundary to be
		efficiently implemented with just a minor addition to the generic
		reclaim code, without the need for out-of-band data structures and
		reclaim passes. Because the generic reclaim code considers all
		cgroups except for the ones running low in the preferred first
		reclaim pass, overreclaim of individual groups is eliminated as
		well, resulting in much better overall workload performance.

		- The original high boundary, the hard limit, is defined as a strict
		limit that can not budge, even if the OOM killer has to be called.
		But this generally goes against the goal of making the most out of
		the available memory. The memory consumption of workloads varies
		during runtime, and that requires users to overcommit. But doing
		that with a strict upper limit requires either a fairly accurate
		prediction of the working set size or adding slack to the limit.
		Since working set size estimation is hard and error prone, and
		getting it wrong results in OOM kills, most users tend to err on the
		side of a looser limit and end up wasting precious resources.

		The memory.high boundary on the other hand can be set much more
		conservatively. When hit, it throttles allocations by forcing them
		into direct reclaim to work off the excess, but it never invokes the
		OOM killer. As a result, a high boundary that is chosen too
		aggressively will not terminate the processes, but instead it will
		lead to gradual performance degradation. The user can monitor this
		and make corrections until the minimal memory footprint that still
		gives acceptable performance is found.

		In extreme cases, with many concurrent allocations and a complete
		breakdown of reclaim progress within the group, the high boundary
		can be exceeded. But even then it's mostly better to satisfy the
		allocation from the slack available in other groups or the rest of
		the system than killing the group. Otherwise, memory.max is there
		to limit this type of spillover and ultimately contain buggy or even
		malicious applications.

		- The original control file names are unwieldy and inconsistent in
		many different ways. For example, the upper boundary hit count is
		exported in the memory.failcnt file, but an OOM event count has to
		be manually counted by listening to memory.oom_control events, and
		lower boundary / soft limit events have to be counted by first
		setting a threshold for that value and then counting those events.
		Also, usage and limit files encode their units in the filename.
		That makes the filenames very long, even though this is not
		information that a user needs to be reminded of every time they type
		out those names.

		To address these naming issues, as well as to signal clearly that
		the new interface carries a new configuration model, the naming
		conventions in it necessarily differ from the old interface.

		- The original limit files indicate the state of an unset limit with a
		Very High Number, and a configured limit can be unset by echoing -1
		into those files. But that very high number is implementation and
		architecture dependent and not very descriptive. And while -1 can
		be understood as an underflow into the highest possible value, -2 or
		-10M etc. do not work, so it's not consistent.

		memory.low, memory.high, and memory.max will use the string
		"infinity" to indicate and set the highest possible value.
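
The memory.low rule described in this hunk (a cgroup is protected only while it and all of its ancestors are below their low boundaries) can be illustrated with a small toy model. This is a sketch for illustration, not the kernel's implementation: `struct toy_cgroup` and `low_protected()` are hypothetical names, "below" is assumed to mean strictly less than, and the top of the toy tree is treated like any other group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup; not the kernel's struct mem_cgroup. */
struct toy_cgroup {
    const struct toy_cgroup *parent;  /* NULL at the top of the toy tree */
    unsigned long long usage;         /* bytes currently charged */
    unsigned long long low;           /* memory.low; 0 means no reserve */
};

/* True when cg and every one of its ancestors are below their low boundaries. */
bool low_protected(const struct toy_cgroup *cg)
{
    assert(cg != NULL);
    for (; cg != NULL; cg = cg->parent) {
        /* Assumption: "below" is strict, so usage == low is not protected. */
        if (cg->usage >= cg->low)
            return false;
    }
    return true;
}
```

In this model a freshly created group (low == 0) is never protected, which matches the statement that new cgroups have no reserve by default and are therefore eligible for the preferred reclaim pass.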

		5. Planned Changes

Documentation/filesystems/proc.txt

+23 −0

		@@ -42,6 +42,7 @@ Table of Contents
		3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
		3.7 /proc/<pid>/task/<tid>/children - Information about task children
		3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
		3.9 /proc/<pid>/map_files - Information about memory mapped files

		4 Configuring procfs
		4.1 Mount options
		@@ -1763,6 +1764,28 @@ pair provide additional information particular to the objects they represent.
		with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
		still exhibits timer's remaining time.

		3.9 /proc/<pid>/map_files - Information about memory mapped files
		---------------------------------------------------------------------
		This directory contains symbolic links which represent memory mapped files
		the process is maintaining. Example output:

		| lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
		| lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
		| lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
		| ...
		| lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
		| lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls

		The name of a link represents the virtual memory bounds of a mapping, i.e.
		vm_area_struct::vm_start-vm_area_struct::vm_end.

		The main purpose of the map_files is to retrieve a set of memory mapped
		files in a fast way instead of parsing /proc/<pid>/maps or
		/proc/<pid>/smaps, both of which contain many more records. At the same
		time one can open(2) mappings from the listings of two processes and
		compare their inode numbers to figure out which anonymous memory areas
		are actually shared.
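
Since each link name encodes the bounds of a mapping as two hexadecimal addresses separated by a dash, a reader of this directory might split a name back into its start and end addresses as sketched below. `struct map_range` and `parse_map_files_name()` are hypothetical helpers written for this illustration, not part of procfs or libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Virtual memory bounds taken from a map_files link name. */
struct map_range {
    unsigned long long start;  /* corresponds to vm_start */
    unsigned long long end;    /* corresponds to vm_end */
};

/* Parses "<hex start>-<hex end>"; returns 0 on success, -1 on malformed input. */
int parse_map_files_name(const char *name, struct map_range *out)
{
    unsigned long long start, end;
    int consumed = 0;

    assert(name != NULL && out != NULL);

    /* %n records how many characters were read, to reject trailing text. */
    if (sscanf(name, "%llx-%llx%n", &start, &end, &consumed) != 2)
        return -1;
    if (name[consumed] != '\0' || start >= end)
        return -1;

    out->start = start;
    out->end = end;
    return 0;
}
```

For the last entry in the example output, "400000-41a000", this yields start 0x400000 and end 0x41a000, so the mapping covers 0x1a000 bytes.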

		------------------------------------------------------------------------------
		Configuring procfs
		------------------------------------------------------------------------------

Documentation/sysctl/vm.txt

+6 −6

		@@ -555,12 +555,12 @@ this is causing problems for your system/application.

		oom_dump_tasks

		-Enables a system-wide task dump (excluding kernel threads) to be
		-produced when the kernel performs an OOM-killing and includes such
		-information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
		-oom_score_adj score, and name. This is helpful to determine why the
		-OOM killer was invoked, to identify the rogue task that caused it,
		-and to determine why the OOM killer chose the task it did to kill.
		+Enables a system-wide task dump (excluding kernel threads) to be produced
		+when the kernel performs an OOM-killing and includes such information as
		+pid, uid, tgid, vm size, rss, nr_ptes, nr_pmds, swapents, oom_score_adj
		+score, and name. This is helpful to determine why the OOM killer was
		+invoked, to identify the rogue task that caused it, and to determine why
		+the OOM killer chose the task it did to kill.

		If this is set to zero, this information is suppressed. On very
		large systems with thousands of tasks it may not be feasible to dump

Documentation/vm/pagemap.txt

+8 −0

		@@ -62,6 +62,8 @@ There are three components to pagemap:
		20. NOPAGE
		21. KSM
		22. THP
		23. BALLOON
		24. ZERO_PAGE

		Short descriptions to the page flags:

		@@ -102,6 +104,12 @@ Short descriptions to the page flags:
		22. THP
		contiguous pages which construct transparent hugepages

		23. BALLOON
		balloon compaction page

		24. ZERO_PAGE
		zero page for pfn_zero or huge_zero page

		[IO related page flags]
		1. ERROR IO error occurred
		3. UPTODATE page has up-to-date data
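
The numbers in the flag list above are bit positions, so the two new entries correspond to bits 23 and 24 of a page's flag word. A minimal sketch of testing them follows; the enum names and `page_flag_set()` are local definitions for this example, and it assumes the flag word has already been read into a 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the numbered flag list; names are local to this sketch. */
enum {
    FLAG_THP       = 22,
    FLAG_BALLOON   = 23,
    FLAG_ZERO_PAGE = 24
};

/* Returns true when the given bit is set in a 64-bit page flag word. */
bool page_flag_set(uint64_t flags, unsigned int bit)
{
    assert(bit < 64);
    return ((flags >> bit) & 1) != 0;
}
```

A caller could, for example, check `page_flag_set(flags, FLAG_ZERO_PAGE)` to tell whether an entry refers to a zero page.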

arch/alpha/include/asm/pgtable.h

+1 −1

		@@ -45,7 +45,7 @@ struct vm_area_struct;
		#define PTRS_PER_PMD (1UL << (PAGE_SHIFT-3))
		#define PTRS_PER_PGD (1UL << (PAGE_SHIFT-3))
		#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)
		-#define FIRST_USER_ADDRESS 0
		+#define FIRST_USER_ADDRESS 0UL

		/* Number of pointers that fit on a page: this will go away. */
		#define PTRS_PER_PAGE (1UL << (PAGE_SHIFT-3))