Documentation/scheduler/sched-hmp.txt +82 −2

@@ -22,7 +22,7 @@ CONTENTS
 5.3 Scheduler Tick
 5.4 Load Balancer
 5.5 Real Time Tasks
-5.6 Stop-Class Tasks
+5.6 Task packing
 6. Frequency Guidance
 6.1 Per-CPU Window-Based Stats
 6.1 Per-task Window-Based Stats

@@ -571,15 +571,19 @@ both tasks and CPUs to aid in the placement of tasks.
 the scheduler is tracking the demand of each task it can make an
 educated guess as to whether a CPU will become idle in the near
 future.
-There are two tunable parameters which are used to determine whether
+There are three tunable parameters which are used to determine whether
 a CPU is mostly idle:

 /sys/devices/system/cpu/cpuX/sched_mostly_idle_nr_run
 /sys/devices/system/cpu/cpuX/sched_mostly_idle_load
+/sys/devices/system/cpu/cpuX/sched_mostly_idle_freq

 Note that these tunables are per-cpu. If a CPU does not have more than
 sched_mostly_idle_nr_run runnable tasks and is not more than
 sched_mostly_idle_load percent busy, it is considered mostly idle.
+Additionally, if a cpu's sched_mostly_idle_freq is non-zero and its
+current frequency is less than that threshold, the scheduler will
+attempt to pack tasks on the most power-efficient cpu in the cluster.

 - spill threshold

@@ -894,6 +898,71 @@ HMP scheduler brings in a change which avoids fast-path and always
 resorts to slow-path. Further cpu with lowest power-rating from
 candidate list of cpus is chosen as cpu for placing waking real-time
 task.

+*** 5.6 Task packing
+
+Task packing is letting one cpu take up more than one task in an
+attempt to improve power (and in some cases performance). The power
+benefit is derived by avoiding the cost of waking idle cpus from
+their deep sleep states.
+
+For example, consider a system with one cpu busy while the other cpus
+are idle and in deep sleep state. A small task in this situation
+needs to be placed on a suitable cpu. Placing the small task on the
+busy cpu will likely not hurt its performance (it is, after all, a
+low-demand task) while helping gain on power, because we avoid the
+cost associated with waking an idle cpu from deep sleep state.
+
+Task packing can have good or bad implications for power and
+performance.
+
+a. Power implications
+
+As described in the small-task wakeup example, task packing can be
+beneficial for power. However, an adverse impact on power can arise
+when packing on one cpu increases its busy time and hence results in
+a frequency increase.
+
+b. Performance implications
+
+The most obvious negative impact of packing on performance is the
+increased scheduling latency that tasks can incur. A positive impact
+on performance from packing has also been seen. This arises from the
+fact that a task woken to a busy cpu because of packing incurs very
+low latency and can run almost immediately, compared to being woken
+to an idle cpu in deep sleep state. In the latter case the task has
+to wait for the cpu to exit its sleep state, which in some cases is
+long enough to hurt performance.
+
+Packing is thus a delicate matter to play with!
+
+The following parameters control packing behavior.
+
+- sched_small_task
+
+This parameter specifies the demand threshold below which a task is
+classified as "small". As described in Sec 5.2 ("Task Wakeup and
+select_best_cpu()"), for small-task wakeups a busy cpu is preferred
+as the target rather than an idle cpu.
+
+- mostly_idle_load and mostly_idle_nr_run
+
+These are per-cpu parameters that define the mostly_idle thresholds
+for a cpu. A cpu whose load < mostly_idle_load AND whose nr_running
+< mostly_idle_nr_run is classified as mostly_idle. See the further
+description of the "mostly_idle" thresholds in Sec 5.
+
+- mostly_idle_freq
+
+This is a per-cpu parameter. If it is non-zero for a cpu that is part
+of a cluster, and the cluster's current frequency is less than this
+threshold, the scheduler will pack all tasks on a single cpu in the
+cluster. The cpu chosen is the first most power-efficient cpu found
+while scanning the cluster's online cpus.
+
+For some low band of frequency, spreading tasks over all available
+cpus can be grossly power-inefficient. As an example, consider two
+tasks that each need 500MHz. Packing them on one cpu could lead to
+1GHz. In the spread case we incur the cost of two cpus running at
+500MHz, while in the packed case we incur the cost of one cpu running
+at 1GHz. Depending on the silicon characteristics, where leakage
+power can be the dominant factor, the former can be worse for power
+than the latter. Running at a slow frequency (in the spread case) can
+actually be worse for leakage power (especially if 500MHz and 1GHz
+share the same voltage point). sched_mostly_idle_freq is set based on
+silicon characteristics and can provide a winning argument for both
+power and performance.
+
 =====================
 6. FREQUENCY GUIDANCE
 =====================

@@ -1271,6 +1340,17 @@ comparison. Scheduler will request a raise in cpu frequency when heavy
 tasks wakeup after at least one window of sleep, where window size is
 defined by sched_ravg_window. Value 0 will disable this feature.

+** 7.21 sched_mostly_idle_freq
+
+Appears at: /sys/devices/system/cpu/cpuX/sched_mostly_idle_freq
+
+Default value: 0
+
+This tunable is intended to achieve task-packing behavior based on
+cluster frequency. Hence it is strongly advised that all cpus in a
+cluster have the same value for mostly_idle_freq. For more details,
+see the section on "Task packing" (Sec 5.6).
+
 =========================
 8. HMP SCHEDULER TRACE POINTS
 =========================

drivers/base/cpu.c +41 −0

@@ -205,6 +205,42 @@ static ssize_t __ref store_sched_mostly_idle_load(struct device *dev,
 	return err;
 }

+static ssize_t show_sched_mostly_idle_freq(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	ssize_t rc;
+	int cpunum;
+	unsigned int mostly_idle_freq;
+
+	cpunum = cpu->dev.id;
+
+	mostly_idle_freq = sched_get_cpu_mostly_idle_freq(cpunum);
+
+	rc = snprintf(buf, PAGE_SIZE-2, "%u\n", mostly_idle_freq);
+
+	return rc;
+}
+
+static ssize_t __ref store_sched_mostly_idle_freq(struct device *dev,
+		struct device_attribute *attr, const char *buf,
+		size_t count)
+{
+	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	int cpuid = cpu->dev.id, err;
+	unsigned int mostly_idle_freq;
+
+	err = kstrtouint(strstrip((char *)buf), 0, &mostly_idle_freq);
+	if (err)
+		return err;
+
+	err = sched_set_cpu_mostly_idle_freq(cpuid, mostly_idle_freq);
+	if (err >= 0)
+		err = count;
+
+	return err;
+}
+
 static ssize_t show_sched_mostly_idle_nr_run(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {

@@ -241,6 +277,8 @@ static ssize_t __ref store_sched_mostly_idle_nr_run(struct device *dev,
 	return err;
 }

+static DEVICE_ATTR(sched_mostly_idle_freq, 0664,
+		show_sched_mostly_idle_freq, store_sched_mostly_idle_freq);
 static DEVICE_ATTR(sched_mostly_idle_load, 0664,
 		show_sched_mostly_idle_load, store_sched_mostly_idle_load);
 static DEVICE_ATTR(sched_mostly_idle_nr_run, 0664,

@@ -424,6 +462,9 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
 	if (!error)
 		error = device_create_file(&cpu->dev,
 				&dev_attr_sched_mostly_idle_nr_run);
+	if (!error)
+		error = device_create_file(&cpu->dev,
+				&dev_attr_sched_mostly_idle_freq);
 #endif

 	return error;

include/linux/sched.h +3 −0

@@ -1921,6 +1921,9 @@ extern int sched_set_cpu_mostly_idle_load(int cpu,
 					int mostly_idle_pct);
 extern int sched_get_cpu_mostly_idle_load(int cpu);
 extern int sched_set_cpu_mostly_idle_nr_run(int cpu, int nr_run);
 extern int sched_get_cpu_mostly_idle_nr_run(int cpu);
+extern int sched_set_cpu_mostly_idle_freq(int cpu,
+					  unsigned int mostly_idle_freq);
+extern unsigned int sched_get_cpu_mostly_idle_freq(int cpu);
 #else
 static inline int sched_set_boost(int enable)

kernel/sched/core.c +1 −0

@@ -8997,6 +8997,7 @@ void __init sched_init(void)
 		rq->hmp_flags = 0;
 		rq->mostly_idle_load = pct_to_real(20);
 		rq->mostly_idle_nr_run = 3;
+		rq->mostly_idle_freq = 0;
 #ifdef CONFIG_SCHED_FREQ_INPUT
 		rq->old_busy_time = 0;
 		rq->curr_runnable_sum = rq->prev_runnable_sum = 0;

kernel/sched/fair.c +62 −0

@@ -1389,6 +1389,25 @@ int sched_set_cpu_mostly_idle_load(int cpu, int mostly_idle_pct)
 	return 0;
 }

+int sched_set_cpu_mostly_idle_freq(int cpu, unsigned int mostly_idle_freq)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (mostly_idle_freq > rq->max_possible_freq)
+		return -EINVAL;
+
+	rq->mostly_idle_freq = mostly_idle_freq;
+
+	return 0;
+}
+
+unsigned int sched_get_cpu_mostly_idle_freq(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	return rq->mostly_idle_freq;
+}
+
 int sched_get_cpu_mostly_idle_load(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);

@@ -1795,6 +1814,42 @@ static int skip_cpu(struct task_struct *p, int cpu, int reason)
 	return skip;
 }

+/*
+ * Select a single cpu in the cluster as the target for packing, iff the
+ * cluster frequency is less than a threshold level.
+ */
+static int select_packing_target(struct task_struct *p, int best_cpu)
+{
+	struct rq *rq = cpu_rq(best_cpu);
+	struct cpumask search_cpus;
+	int i;
+	int min_cost = INT_MAX;
+	int target = best_cpu;
+
+	if (rq->cur_freq >= rq->mostly_idle_freq)
+		return best_cpu;
+
+	/* Don't pack if current freq is low because of throttling */
+	if (rq->max_freq <= rq->mostly_idle_freq)
+		return best_cpu;
+
+	cpumask_and(&search_cpus, tsk_cpus_allowed(p), cpu_online_mask);
+	cpumask_and(&search_cpus, &search_cpus, &rq->freq_domain_cpumask);
+
+	/* Pick the first lowest-power cpu as target */
+	for_each_cpu(i, &search_cpus) {
+		int cost = power_cost(p, i);
+
+		if (cost < min_cost) {
+			target = i;
+			min_cost = cost;
+		}
+	}
+
+	return target;
+}
+
 /* return cheapest cpu that can fit this task */
 static int select_best_cpu(struct task_struct *p, int target, int reason)
 {

@@ -1906,6 +1961,9 @@ done:
 		best_cpu = fallback_idle_cpu;
 	}

+	if (cpu_rq(best_cpu)->mostly_idle_freq)
+		best_cpu = select_packing_target(p, best_cpu);
+
 	return best_cpu;
 }

@@ -7212,6 +7270,10 @@ static inline int _nohz_kick_needed_hmp(struct rq *rq, int cpu, int *type)
 	struct sched_domain *sd;
 	int i;

+	if (rq->mostly_idle_freq && rq->cur_freq < rq->mostly_idle_freq &&
+			rq->max_freq > rq->mostly_idle_freq)
+		return 0;
+
 	if (rq->nr_running >= 2 && (rq->nr_running - rq->nr_small_tasks >= 2 ||
 		rq->nr_running > rq->mostly_idle_nr_run ||
 		cpu_load(cpu) > rq->mostly_idle_load)) {