Merge branch 'sched-core-for-linus' of... (b8ae30ee) · Commits · e / devices / android_kernel_oneplus_sm7250

Documentation/RCU/torture.txt

+0 −10

Original line number	Diff line number	Diff line
		@@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following:
		sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0
		sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0
		sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0
		state: -1 / 0:0 3:0 4:0

		As before, the first four lines are similar to those for RCU.
		The last line shows the task-migration state. The first number is
		-1 if synchronize_sched_expedited() is idle, -2 if in the process of
		posting wakeups to the migration kthreads, and N when waiting on CPU N.
		Each of the colon-separated fields following the "/" is a CPU:state pair.
		Valid states are "0" for idle, "1" for waiting for quiescent state,
		"2" for passed through quiescent state, and "3" when a race with a
		CPU-hotplug event forces use of the synchronize_sched() primitive.


		USAGE

Documentation/scheduler/sched-design-CFS.txt

+3 −51

		@@ -211,7 +211,7 @@ provide fair CPU time to each such task group. For example, it may be
		desirable to first provide fair CPU time to each user on the system and then to
		each task belonging to a user.

		CONFIG_GROUP_SCHED strives to achieve exactly that. It lets tasks to be
		CONFIG_CGROUP_SCHED strives to achieve exactly that. It lets tasks to be
		grouped and divides CPU time fairly among such groups.

		CONFIG_RT_GROUP_SCHED permits to group real-time (i.e., SCHED_FIFO and
		@@ -220,38 +220,11 @@ SCHED_RR) tasks.
		CONFIG_FAIR_GROUP_SCHED permits to group CFS (i.e., SCHED_NORMAL and
		SCHED_BATCH) tasks.

		At present, there are two (mutually exclusive) mechanisms to group tasks for
		CPU bandwidth control purposes:

		- Based on user id (CONFIG_USER_SCHED)

		With this option, tasks are grouped according to their user id.

		- Based on "cgroup" pseudo filesystem (CONFIG_CGROUP_SCHED)

		This options needs CONFIG_CGROUPS to be defined, and lets the administrator
		These options need CONFIG_CGROUPS to be defined, and let the administrator
		create arbitrary groups of tasks, using the "cgroup" pseudo filesystem. See
		Documentation/cgroups/cgroups.txt for more information about this filesystem.

		Only one of these options to group tasks can be chosen and not both.

		When CONFIG_USER_SCHED is defined, a directory is created in sysfs for each new
		user and a "cpu_share" file is added in that directory.

		# cd /sys/kernel/uids
		# cat 512/cpu_share # Display user 512's CPU share
		1024
		# echo 2048 > 512/cpu_share # Modify user 512's CPU share
		# cat 512/cpu_share # Display user 512's CPU share
		2048
		#

		CPU bandwidth between two users is divided in the ratio of their CPU shares.
		For example: if you would like user "root" to get twice the bandwidth of user
		"guest," then set the cpu_share for both the users such that "root"'s cpu_share
		is twice "guest"'s cpu_share.

		When CONFIG_CGROUP_SCHED is defined, a "cpu.shares" file is created for each
		When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
		group created using the pseudo filesystem. See example steps below to create
		task groups and modify their CPU share using the "cgroups" pseudo filesystem.

		@@ -273,24 +246,3 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem.

		# #Launch gmplayer (or your favourite movie player)
		# echo <movie_player_pid> > multimedia/tasks

		8. Implementation note: user namespaces

		User namespaces are intended to be hierarchical. But they are currently
		only partially implemented. Each of those has ramifications for CFS.

		First, since user namespaces are hierarchical, the /sys/kernel/uids
		presentation is inadequate. Eventually we will likely want to use sysfs
		tagging to provide private views of /sys/kernel/uids within each user
		namespace.

		Second, the hierarchical nature is intended to support completely
		unprivileged use of user namespaces. So if using user groups, then
		we want the users in a user namespace to be children of the user
		who created it.

		That is currently unimplemented. So instead, every user in a new
		user namespace will receive 1024 shares just like any user in the
		initial user namespace. Note that at the moment creation of a new
		user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and
		CAP_SETGID.
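
The share arithmetic described in the CFS hunks above (CPU bandwidth divided in the ratio of the groups' shares) can be illustrated with a small userspace sketch. This is not kernel code: the helper name share_percent() is hypothetical, and it assumes every group is continuously runnable, which the documentation does not state explicitly.

```c
#include <stddef.h>

/*
 * Illustrative only: percentage of CPU bandwidth that group `idx` would
 * receive if all `n` groups were runnable all the time, given their
 * share weights (e.g. the values written to cpu.shares).
 * Returns -1.0 on invalid input (NULL array, bad index, or zero total).
 */
static double share_percent(const unsigned long *shares, size_t n, size_t idx)
{
	unsigned long long total = 0;
	size_t i;

	if (shares == NULL || idx >= n)
		return -1.0;

	/* Sum all weights; each group's slice is its weight over the total. */
	for (i = 0; i < n; i++)
		total += shares[i];

	if (total == 0)
		return -1.0;

	return 100.0 * (double)shares[idx] / (double)total;
}
```

With weights of 2048 and 1024, the first group's slice is twice the second's, which matches the "twice the bandwidth" example in the removed text.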

Documentation/scheduler/sched-rt-group.txt

+4 −16

		@@ -126,23 +126,12 @@ priority!
		2.3 Basis for grouping tasks
		----------------------------

		There are two compile-time settings for allocating CPU bandwidth. These are
		configured using the "Basis for grouping tasks" multiple choice menu under
		General setup > Group CPU Scheduler:

		a. CONFIG_USER_SCHED (aka "Basis for grouping tasks" = "user id")

		This lets you use the virtual files under
		"/sys/kernel/uids/<uid>/cpu_rt_runtime_us" to control the CPU time reserved for
		each user.

		The other option is:

		.o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
		Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
		CPU bandwidth to task groups.

		This uses the /cgroup virtual file system and
		"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
		control group instead.
		control group.

		For more information on working with control groups, you should read
		Documentation/cgroups/cgroups.txt as well.
		@@ -161,8 +150,7 @@ For now, this can be simplified to just the following (but see Future plans):
		===============

		There is work in progress to make the scheduling period for each group
		("/sys/kernel/uids/<uid>/cpu_rt_period_us" or
		"/cgroup/<cgroup>/cpu.rt_period_us" respectively) configurable as well.
		("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well.

		The constraint on the period is that a subgroup must have a smaller or
		equal period to its parent. But realistically it's not very useful _yet_

arch/s390/kernel/time.c

+0 −1

		@@ -391,7 +391,6 @@ static void __init time_init_wq(void)
		if (time_sync_wq)
		return;
		time_sync_wq = create_singlethread_workqueue("timesync");
		stop_machine_create();
		}

		/*

drivers/cpufreq/cpufreq_ondemand.c

+73 −2

		@@ -73,6 +73,7 @@ enum {DBS_NORMAL_SAMPLE, DBS_SUB_SAMPLE};

		struct cpu_dbs_info_s {
		cputime64_t prev_cpu_idle;
		cputime64_t prev_cpu_iowait;
		cputime64_t prev_cpu_wall;
		cputime64_t prev_cpu_nice;
		struct cpufreq_policy *cur_policy;
		@@ -108,6 +109,7 @@ static struct dbs_tuners {
		unsigned int down_differential;
		unsigned int ignore_nice;
		unsigned int powersave_bias;
		unsigned int io_is_busy;
		} dbs_tuners_ins = {
		.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
		.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
		@@ -148,6 +150,16 @@ static inline cputime64_t get_cpu_idle_time(unsigned int cpu, cputime64_t *wall)
		return idle_time;
		}

		static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wall)
		{
		u64 iowait_time = get_cpu_iowait_time_us(cpu, wall);

		if (iowait_time == -1ULL)
		return 0;

		return iowait_time;
		}

		/*
		* Find right freq to be set now with powersave_bias on.
		* Returns the freq_hi to be used right now and will set freq_hi_jiffies,
		@@ -249,6 +261,7 @@ static ssize_t show_##file_name \
		return sprintf(buf, "%u\n", dbs_tuners_ins.object); \
		}
		show_one(sampling_rate, sampling_rate);
		show_one(io_is_busy, io_is_busy);
		show_one(up_threshold, up_threshold);
		show_one(ignore_nice_load, ignore_nice);
		show_one(powersave_bias, powersave_bias);
		@@ -299,6 +312,23 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
		return count;
		}

		static ssize_t store_io_is_busy(struct kobject *a, struct attribute *b,
		const char *buf, size_t count)
		{
		unsigned int input;
		int ret;

		ret = sscanf(buf, "%u", &input);
		if (ret != 1)
		return -EINVAL;

		mutex_lock(&dbs_mutex);
		dbs_tuners_ins.io_is_busy = !!input;
		mutex_unlock(&dbs_mutex);

		return count;
		}

		static ssize_t store_up_threshold(struct kobject *a, struct attribute *b,
		const char *buf, size_t count)
		{
		@@ -381,6 +411,7 @@ static struct global_attr _name = \
		__ATTR(_name, 0644, show_##_name, store_##_name)

		define_one_rw(sampling_rate);
		define_one_rw(io_is_busy);
		define_one_rw(up_threshold);
		define_one_rw(ignore_nice_load);
		define_one_rw(powersave_bias);
		@@ -392,6 +423,7 @@ static struct attribute *dbs_attributes[] = {
		&up_threshold.attr,
		&ignore_nice_load.attr,
		&powersave_bias.attr,
		&io_is_busy.attr,
		NULL
		};

		@@ -470,14 +502,15 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)

		for_each_cpu(j, policy->cpus) {
		struct cpu_dbs_info_s *j_dbs_info;
		cputime64_t cur_wall_time, cur_idle_time;
		unsigned int idle_time, wall_time;
		cputime64_t cur_wall_time, cur_idle_time, cur_iowait_time;
		unsigned int idle_time, wall_time, iowait_time;
		unsigned int load, load_freq;
		int freq_avg;

		j_dbs_info = &per_cpu(od_cpu_dbs_info, j);

		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
		cur_iowait_time = get_cpu_iowait_time(j, &cur_wall_time);

		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
		j_dbs_info->prev_cpu_wall);
		@@ -487,6 +520,10 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
		j_dbs_info->prev_cpu_idle);
		j_dbs_info->prev_cpu_idle = cur_idle_time;

		iowait_time = (unsigned int) cputime64_sub(cur_iowait_time,
		j_dbs_info->prev_cpu_iowait);
		j_dbs_info->prev_cpu_iowait = cur_iowait_time;

		if (dbs_tuners_ins.ignore_nice) {
		cputime64_t cur_nice;
		unsigned long cur_nice_jiffies;
		@@ -504,6 +541,16 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
		idle_time += jiffies_to_usecs(cur_nice_jiffies);
		}

		/*
		* For the purpose of ondemand, waiting for disk IO is an
		* indication that you're performance critical, and not that
		* the system is actually idle. So subtract the iowait time
		* from the cpu idle time.
		*/

		if (dbs_tuners_ins.io_is_busy && idle_time >= iowait_time)
		idle_time -= iowait_time;

		if (unlikely(!wall_time || wall_time < idle_time))
		continue;

		@@ -617,6 +664,29 @@ static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
		cancel_delayed_work_sync(&dbs_info->work);
		}

		/*
		* Not all CPUs want IO time to be accounted as busy; this depends on how
		* efficient idling at a higher frequency/voltage is.
		* Pavel Machek says this is not so for various generations of AMD and old
		* Intel systems.
		* Mike Chan (android.com) claims this is also not true for ARM.
		* Because of this, whitelist specific known (series) of CPUs by default, and
		* leave all others up to the user.
		*/
		static int should_io_be_busy(void)
		{
		#if defined(CONFIG_X86)
		/*
		* For Intel, Core 2 (model 15) and later have an efficient idle.
		*/
		if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
		boot_cpu_data.x86 == 6 &&
		boot_cpu_data.x86_model >= 15)
		return 1;
		#endif
		return 0;
		}

		static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
		unsigned int event)
		{
		@@ -679,6 +749,7 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
		dbs_tuners_ins.sampling_rate =
		max(min_sampling_rate,
		latency * LATENCY_MULTIPLIER);
		dbs_tuners_ins.io_is_busy = should_io_be_busy();
		}
		mutex_unlock(&dbs_mutex);
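
The effect of io_is_busy on the governor's load estimate can be followed in a userspace sketch of the per-CPU calculation. The function name ondemand_load_percent() is hypothetical, and the final load formula (busy time as a percentage of wall time) is an assumption, since that line lies outside the hunks shown above; the iowait subtraction and the sanity check mirror the dbs_check_cpu() hunk.

```c
#include <stdbool.h>

/*
 * Illustrative only: all arguments are deltas (in microseconds) measured
 * over one sampling interval. Returns the load as a percentage, or -1 if
 * the sample would be skipped.
 */
static int ondemand_load_percent(unsigned int wall_time,
				 unsigned int idle_time,
				 unsigned int iowait_time,
				 bool io_is_busy)
{
	/* Count time spent waiting on I/O as busy rather than idle. */
	if (io_is_busy && idle_time >= iowait_time)
		idle_time -= iowait_time;

	/* Same sanity check as the diff: skip empty or inconsistent samples. */
	if (wall_time == 0 || wall_time < idle_time)
		return -1;

	/* Assumed formula: busy share of the interval, in percent. */
	return (int)(100ULL * (wall_time - idle_time) / wall_time);
}
```

For a 1000 us interval with 800 us idle, of which 600 us is iowait, the sketch reports 20% load with io_is_busy off and 80% with it on, so an I/O-bound workload is more likely to cross up_threshold when the tunable is set.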