Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit c90423d1 authored by Ingo Molnar's avatar Ingo Molnar
Browse files

Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move



Conflicts:
	kernel/Makefile

There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.

Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
parents ecf1f014 b8a21626
Loading
Loading
Loading
Loading
+76 −0
Original line number Diff line number Diff line
@@ -355,6 +355,82 @@ utilize.

==============================================================

numa_balancing

Enables/disables automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes
that access it often.

Enables/disables automatic NUMA memory balancing. On NUMA machines, there
is a performance penalty if remote memory is accessed by a CPU. When this
feature is enabled the kernel samples what task thread is accessing memory
by periodically unmapping pages and later trapping a page fault. At the
time of the page fault, it is determined if the data being accessed should
be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled. Otherwise, if the system overhead from the
feature is too high then the rate the kernel samples for NUMA hinting
faults may be controlled by the numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and
numa_balancing_migrate_deferred.

==============================================================

numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb

Automatic NUMA balancing scans tasks address space and unmaps pages to
detect if pages are properly placed or if the data should be migrated to a
memory node local to where the task is running.  Every "scan delay" the task
scans the next "scan size" number of pages in its address space. When the
end of the address space is reached the scanner restarts from the beginning.

In combination, the "scan delay" and "scan size" determine the scan rate.
When "scan delay" decreases, the scan rate increases.  The scan delay and
hence the scan rate of every task is adaptive and depends on historical
behaviour. If pages are properly placed then the scan delay increases,
otherwise the scan delay decreases.  The "scan size" is not adaptive but
the higher the "scan size", the higher the scan rate.

Higher scan rates incur higher system overhead as page faults must be
trapped and potentially data must be migrated. However, the higher the scan
rate, the more quickly a tasks memory is migrated to a local node if the
workload pattern changes and minimises performance impact due to remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.

numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
scan a tasks virtual memory. It effectively controls the maximum scanning
rate for each task.

numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
when it initially forks.

numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
scan a tasks virtual memory. It effectively controls the minimum scanning
rate for each task.

numa_balancing_scan_size_mb is how many megabytes worth of pages are
scanned for a given scan.

numa_balancing_settle_count is how many scan periods must complete before
the schedule balancer stops pushing the task towards a preferred node. This
gives the scheduler a chance to place the task on an alternative node if the
preferred node is overloaded.

numa_balancing_migrate_deferred is how many page migrations get skipped
unconditionally, after a page migration is skipped because a page is shared
with other tasks. This reduces page migration overhead, and determines
how much stronger the "move task near its memory" policy scheduler becomes,
versus the "move memory near its task" memory management policy, for workloads
with shared memory.

==============================================================

osrelease, ostype & version:

# cat osrelease
+2 −0
Original line number Diff line number Diff line
@@ -7304,6 +7304,8 @@ S: Maintained
F:	kernel/sched/
F:	include/linux/sched.h
F:	include/uapi/linux/sched.h
F:	kernel/wait.c
F:	include/linux/wait.h

SCORE ARCHITECTURE
M:	Chen Liqin <liqin.linux@gmail.com>
+1 −0
Original line number Diff line number Diff line
@@ -3,3 +3,4 @@ generic-y += clkdev.h

generic-y += exec.h
generic-y += trace_clock.h
generic-y += preempt.h
+1 −0
Original line number Diff line number Diff line
@@ -46,3 +46,4 @@ generic-y += ucontext.h
generic-y += user.h
generic-y += vga.h
generic-y += xor.h
generic-y += preempt.h
+1 −0
Original line number Diff line number Diff line
@@ -32,3 +32,4 @@ generic-y += termios.h
generic-y += timex.h
generic-y += trace_clock.h
generic-y += unaligned.h
generic-y += preempt.h
Loading