
Commit 6be10b2d authored by Linux Build Service Account's avatar Linux Build Service Account Committed by Gerrit - the friendly Code Review server

Merge "sched: Packing support until a frequency threshold"

parents 0899fc9f 57da6261
+82 −2
@@ -22,7 +22,7 @@ CONTENTS
   5.3 Scheduler Tick
   5.4 Load Balancer
   5.5 Real Time Tasks
   5.6 Stop-Class Tasks
   5.6 Task packing
6. Frequency Guidance
   6.1 Per-CPU Window-Based Stats
   6.1 Per-task Window-Based Stats
@@ -571,15 +571,19 @@ both tasks and CPUs to aid in the placement of tasks.
  the scheduler is tracking the demand of each task it can make an educated
  guess as to whether a CPU will become idle in the near future.

  There are two tunable parameters which are used to determine whether
  There are three tunable parameters which are used to determine whether
  a CPU is mostly idle:

  /sys/devices/system/cpu/cpuX/sched_mostly_idle_nr_run
  /sys/devices/system/cpu/cpuX/sched_mostly_idle_load
  /sys/devices/system/cpu/cpuX/sched_mostly_idle_freq

  Note that these tunables are per-cpu. If a CPU does not have more than
  sched_mostly_idle_nr_run runnable tasks and is not more than
  sched_mostly_idle_load percent busy, it is considered mostly idle.
  Additionally, if a cpu's sched_mostly_idle_freq is non-zero and its current
  frequency is less than this threshold, the scheduler will attempt to pack
  tasks on the most power-efficient cpu in the cluster.

- spill threshold

@@ -894,6 +898,71 @@ HMP scheduler brings in a change which avoids fast-path and always resorts to
slow-path. Further cpu with lowest power-rating from candidate list of cpus is
chosen as cpu for placing waking real-time task.

*** 5.6 Task packing

Task packing lets one cpu take on more than one task in an attempt to
improve power (and in some cases performance). The power benefit comes from
avoiding the cost of waking idle cpus from their deep sleep states. For
example, consider a system with one cpu busy while the other cpus are idle
and in deep sleep. A small task in this situation needs to be placed on a
suitable cpu. Placing the small task on the busy cpu will likely not hurt
its performance (it is, after all, a low-demand task) while helping power,
because we avoid the cost of waking an idle cpu from its deep sleep state.

Task packing can have good or bad implications for power and performance.

a. Power implications

As the small task wakeup example shows, task packing can be beneficial for
power. However, packing can also hurt power when concentrating tasks on one
cpu increases its busy time enough to raise its frequency.

b. Performance implications

The most obvious negative impact of packing on performance is the increased
scheduling latency tasks can incur. A positive impact on performance has
also been seen: a task woken onto a busy cpu because of packing can start
running with very low latency, compared to being woken onto an idle cpu in a
deep sleep state. In the latter case, the task has to wait for the cpu to
exit its sleep state, which in some cases takes long enough to hurt
performance.

Packing is thus a delicate matter to play with! The following parameters
control packing behavior.

- sched_small_task
	This parameter specifies the demand threshold below which a task is
classified as "small". As described in Sec 5.2 ("Task Wakeup and
select_best_cpu()"), for small task wakeups a busy cpu is preferred as the
target rather than an idle cpu.

- mostly_idle_load and mostly_idle_nr_run

These are per-cpu parameters that define the mostly_idle thresholds for a
cpu. A cpu whose load is not more than mostly_idle_load percent AND whose
nr_running is not more than mostly_idle_nr_run is classified as mostly_idle.
See the further description of the "mostly_idle" thresholds in Sec 5.

- mostly_idle_freq

This is a per-cpu parameter. If it is non-zero for a cpu that is part of a
cluster, and the cluster's current frequency is less than this threshold,
then the scheduler will pack all tasks on a single cpu in the cluster. The
cpu chosen is the most power-efficient cpu found while scanning the
cluster's online cpus.

For some low band of frequencies, spreading tasks across all available cpus
can be grossly power-inefficient. As an example, consider two tasks that
each need 500MHz. Packing them on one cpu could drive it to 1GHz. In the
spread case we incur the cost of two cpus running at 500MHz, while in the
packed case we incur the cost of one cpu running at 1GHz. Depending on the
silicon characteristics, where leakage power can be the dominant factor, the
former can be worse for power than the latter: running at a slow frequency
(in the spread case) keeps an extra cpu awake and leaking, especially if
500MHz and 1GHz share the same voltage point. sched_mostly_idle_freq is set
based on silicon characteristics and can provide a win for both power and
performance.


=====================
6. FREQUENCY GUIDANCE
=====================
@@ -1271,6 +1340,17 @@ comparison. Scheduler will request a raise in cpu frequency when heavy tasks
wakeup after at least one window of sleep, where window size is defined by
sched_ravg_window. Value 0 will disable this feature.

** 7.21 sched_mostly_idle_freq

Appears at: /sys/devices/system/cpu/cpuX/sched_mostly_idle_freq

Default value: 0

This tunable is intended to achieve task packing behavior based on cluster
frequency. It is therefore strongly advised to set the same
mostly_idle_freq value on all cpus in a cluster. For more details, see the
section on "Task packing" (Sec 5.6).

=============================
8. HMP SCHEDULER TRACE POINTS
=============================
+41 −0
@@ -205,6 +205,42 @@ static ssize_t __ref store_sched_mostly_idle_load(struct device *dev,
	return err;
}

static ssize_t show_sched_mostly_idle_freq(struct device *dev,
		 struct device_attribute *attr, char *buf)
{
	struct cpu *cpu = container_of(dev, struct cpu, dev);
	ssize_t rc;
	int cpunum;
	unsigned int mostly_idle_freq;

	cpunum = cpu->dev.id;

	mostly_idle_freq = sched_get_cpu_mostly_idle_freq(cpunum);

	rc = snprintf(buf, PAGE_SIZE-2, "%u\n", mostly_idle_freq);

	return rc;
}

static ssize_t __ref store_sched_mostly_idle_freq(struct device *dev,
				  struct device_attribute *attr,
				  const char *buf, size_t count)
{
	struct cpu *cpu = container_of(dev, struct cpu, dev);
	int cpuid = cpu->dev.id, err;
	unsigned int mostly_idle_freq;

	err = kstrtouint(strstrip((char *)buf), 0, &mostly_idle_freq);
	if (err)
		return err;

	err = sched_set_cpu_mostly_idle_freq(cpuid, mostly_idle_freq);
	if (err >= 0)
		err = count;

	return err;
}

static ssize_t show_sched_mostly_idle_nr_run(struct device *dev,
		 struct device_attribute *attr, char *buf)
{
@@ -241,6 +277,8 @@ static ssize_t __ref store_sched_mostly_idle_nr_run(struct device *dev,
	return err;
}

static DEVICE_ATTR(sched_mostly_idle_freq, 0664, show_sched_mostly_idle_freq,
						store_sched_mostly_idle_freq);
static DEVICE_ATTR(sched_mostly_idle_load, 0664, show_sched_mostly_idle_load,
						store_sched_mostly_idle_load);
static DEVICE_ATTR(sched_mostly_idle_nr_run, 0664,
@@ -424,6 +462,9 @@ int __cpuinit register_cpu(struct cpu *cpu, int num)
	if (!error)
		error = device_create_file(&cpu->dev,
					 &dev_attr_sched_mostly_idle_nr_run);
	if (!error)
		error = device_create_file(&cpu->dev,
					 &dev_attr_sched_mostly_idle_freq);
#endif

	return error;
+3 −0
@@ -1921,6 +1921,9 @@ extern int sched_set_cpu_mostly_idle_load(int cpu, int mostly_idle_pct);
extern int sched_get_cpu_mostly_idle_load(int cpu);
extern int sched_set_cpu_mostly_idle_nr_run(int cpu, int nr_run);
extern int sched_get_cpu_mostly_idle_nr_run(int cpu);
extern int
sched_set_cpu_mostly_idle_freq(int cpu, unsigned int mostly_idle_freq);
extern unsigned int sched_get_cpu_mostly_idle_freq(int cpu);

#else
static inline int sched_set_boost(int enable)
+1 −0
@@ -8997,6 +8997,7 @@ void __init sched_init(void)
		rq->hmp_flags = 0;
		rq->mostly_idle_load = pct_to_real(20);
		rq->mostly_idle_nr_run = 3;
		rq->mostly_idle_freq = 0;
#ifdef CONFIG_SCHED_FREQ_INPUT
		rq->old_busy_time = 0;
		rq->curr_runnable_sum = rq->prev_runnable_sum = 0;
+62 −0
@@ -1389,6 +1389,25 @@ int sched_set_cpu_mostly_idle_load(int cpu, int mostly_idle_pct)
	return 0;
}

int sched_set_cpu_mostly_idle_freq(int cpu, unsigned int mostly_idle_freq)
{
	struct rq *rq = cpu_rq(cpu);

	if (mostly_idle_freq > rq->max_possible_freq)
		return -EINVAL;

	rq->mostly_idle_freq = mostly_idle_freq;

	return 0;
}

unsigned int sched_get_cpu_mostly_idle_freq(int cpu)
{
	struct rq *rq = cpu_rq(cpu);

	return rq->mostly_idle_freq;
}

int sched_get_cpu_mostly_idle_load(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
@@ -1795,6 +1814,42 @@ static int skip_cpu(struct task_struct *p, int cpu, int reason)
	return skip;
}

/*
 * Select a single cpu in cluster as target for packing, iff cluster frequency
 * is less than a threshold level
 */
static int select_packing_target(struct task_struct *p, int best_cpu)
{
	struct rq *rq = cpu_rq(best_cpu);
	struct cpumask search_cpus;
	int i;
	int min_cost = INT_MAX;
	int target = best_cpu;

	if (rq->cur_freq >= rq->mostly_idle_freq)
		return best_cpu;

	/* Don't pack if current freq is low because of throttling */
	if (rq->max_freq <= rq->mostly_idle_freq)
		return best_cpu;

	cpumask_and(&search_cpus, tsk_cpus_allowed(p), cpu_online_mask);
	cpumask_and(&search_cpus, &search_cpus, &rq->freq_domain_cpumask);

	/* Pick the first lowest power cpu as target */
	for_each_cpu(i, &search_cpus) {
		int cost = power_cost(p, i);

		if (cost < min_cost) {
			target = i;
			min_cost = cost;
		}
	}

	return target;
}


/* return cheapest cpu that can fit this task */
static int select_best_cpu(struct task_struct *p, int target, int reason)
{
@@ -1906,6 +1961,9 @@ done:
			best_cpu = fallback_idle_cpu;
	}

	if (cpu_rq(best_cpu)->mostly_idle_freq)
		best_cpu = select_packing_target(p, best_cpu);

	return best_cpu;
}

@@ -7212,6 +7270,10 @@ static inline int _nohz_kick_needed_hmp(struct rq *rq, int cpu, int *type)
	struct sched_domain *sd;
	int i;

	if (rq->mostly_idle_freq && rq->cur_freq < rq->mostly_idle_freq &&
	    rq->max_freq > rq->mostly_idle_freq)
		return 0;

	if (rq->nr_running >= 2 && (rq->nr_running - rq->nr_small_tasks >= 2 ||
	     rq->nr_running > rq->mostly_idle_nr_run ||
		cpu_load(cpu) > rq->mostly_idle_load)) {