
Commit 441e10ac authored by Dmitry Shmidt

Merge remote-tracking branch 'common/android-4.4' into android-4.4.y

parents 2a3670c6 1510827f
+360 −0
===========================================================
Energy cost bindings for Energy Aware Scheduling
===========================================================

===========================================================
1 - Introduction
===========================================================

This note specifies bindings required for energy-aware scheduling
(EAS)[1]. Historically, the scheduler's primary objective has been
performance.  EAS aims to provide an alternative objective - energy
efficiency. EAS relies on a simple platform energy cost model to
guide scheduling decisions.  The model only considers the CPU
subsystem.

This note is aligned with the definition of the layout of physical
CPUs in the system as described in the ARM topology binding
description [2]. The concept is applicable to any system so long as
the cost model data is provided for those processing elements in
that system's topology that EAS is required to service.

Processing elements refer to hardware threads, CPUs and clusters of
related CPUs in increasing order of hierarchy.

EAS requires two key cost metrics - busy costs and idle costs. Busy
costs comprise a list of compute capacities for the processing
element in question, each paired with the power consumption at that
capacity.  Idle costs comprise a list of power consumption values,
one for each idle state [C-state] that the processing element
supports. For a detailed description of these metrics, their
derivation and their use see [3].

These cost metrics are required for processing elements in all
scheduling domain levels that EAS is required to service.

===========================================================
2 - energy-costs node
===========================================================

Energy costs for the processing elements in scheduling domains that
EAS is required to service are defined in the energy-costs node
which acts as a container for the actual per processing element cost
nodes. A single energy-costs node is required for a given system.

- energy-costs node

	Usage: Required

	Description: The energy-costs node is a container node and
	its sub-nodes describe costs for each processing element at
	all scheduling domain levels that EAS is required to
	service.

	Node name must be "energy-costs".

	The energy-costs node's parent node must be the cpus node.

	The energy-costs node's child nodes can be:

	- one or more cost nodes.

	Any other configuration is considered invalid.

The energy-costs node can only contain a single type of child node
whose bindings are described in paragraph 4.

===========================================================
3 - energy-costs node child nodes naming convention
===========================================================

energy-costs child nodes must follow a naming convention where the
node name must be "thread-costN", "core-costN" or "cluster-costN",
depending on whether the costs in the node are for a thread, core or
cluster.  N (where N = {0, 1, ...}) is the node number and has no
bearing on the OS' logical thread, core or cluster index.
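
The naming rule above can be checked mechanically. The following is a
minimal sketch, not part of the binding: cost_node_level() and the
cost_level enum are hypothetical names, and N is assumed to be written
as one or more decimal digits.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Processing-element level encoded in a cost node name (hypothetical). */
enum cost_level { COST_INVALID = -1, COST_THREAD, COST_CORE, COST_CLUSTER };

/*
 * Hypothetical helper: classify a child node name of energy-costs.
 * Returns the level for "thread-costN", "core-costN" or "cluster-costN",
 * where N is assumed to be one or more decimal digits, and COST_INVALID
 * for anything else.
 */
enum cost_level cost_node_level(const char *name)
{
	static const struct {
		const char *prefix;
		enum cost_level level;
	} tbl[] = {
		{ "thread-cost",  COST_THREAD },
		{ "core-cost",    COST_CORE },
		{ "cluster-cost", COST_CLUSTER },
	};
	size_t i;

	if (!name)
		return COST_INVALID;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		size_t len = strlen(tbl[i].prefix);
		const char *p;

		if (strncmp(name, tbl[i].prefix, len) != 0)
			continue;

		p = name + len;
		if (*p == '\0')
			return COST_INVALID;	/* node number N is missing */

		for (; *p; p++)
			if (!isdigit((unsigned char)*p))
				return COST_INVALID;

		return tbl[i].level;
	}

	return COST_INVALID;
}
```

With this sketch, "core-cost0" and "cluster-cost1" are accepted, while
"cluster-costs0" or a bare "core-cost" are rejected.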

===========================================================
4 - cost node bindings
===========================================================

Bindings for cost nodes are defined as follows:

- cluster-cost node

	Description: must be declared within an energy-costs node. A
	system can contain multiple clusters and each cluster
	serviced by EAS must have a corresponding cluster-cost
	node.

	The cluster-cost node name must be "cluster-costN" as
	described in 3 above.

	A cluster-cost node must be a leaf node with no children.

	Properties for cluster-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

- core-cost node

	Description: must be declared within an energy-costs node. A
	system can contain multiple cores and each core serviced by
	EAS must have a corresponding core-cost node.

	The core-cost node name must be "core-costN" as described in
	3 above.

	A core-cost node must be a leaf node with no children.

	Properties for core-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

- thread-cost node

	Description: must be declared within an energy-costs node. A
	system can contain cores with multiple hardware threads and
	each thread serviced by EAS must have a corresponding
	thread-cost node.

	The thread-cost node name must be "thread-costN" as described
	in 3 above.

	A thread-cost node must be a leaf node with no children.

	Properties for thread-cost nodes are described in paragraph
	5 below.

	Any other configuration is considered invalid.

===========================================================
5 - Cost node properties
===========================================================

All cost node types must have only the following properties:

- busy-cost-data

	Usage: required
	Value type: An array of 2-item tuples. Each item is of type
	u32.
	Definition: The first item in the tuple is the capacity
	value as described in [3]. The second item in the tuple is
	the energy cost value as described in [3].

- idle-cost-data

	Usage: required
	Value type: An array of 1-item tuples. The item is of type
	u32.
	Definition: The item in the tuple is the energy cost value
	as described in [3].
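
As an illustration of the busy-cost-data layout, the sketch below
splits a flat array of u32 cells into (capacity, energy cost) tuples.
It is an assumption-laden example rather than the kernel's parser:
struct busy_cost and parse_busy_cost_data() are hypothetical names,
and the cells are assumed to be already converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One busy-cost-data tuple: capacity first, energy cost second. */
struct busy_cost {
	uint32_t cap;
	uint32_t power;
};

/*
 * Hypothetical helper: split a flat array of u32 cells into 2-item
 * tuples. Returns the number of tuples written to out, or -1 if the
 * cell count is zero or odd, or if out is too small to hold them.
 */
int parse_busy_cost_data(const uint32_t *cells, size_t ncells,
			 struct busy_cost *out, size_t max_out)
{
	size_t i, n;

	if (!cells || !out || ncells == 0 || ncells % 2 != 0)
		return -1;

	n = ncells / 2;
	if (n > max_out)
		return -1;

	for (i = 0; i < n; i++) {
		out[i].cap = cells[2 * i];
		out[i].power = cells[2 * i + 1];
	}

	return (int)n;
}
```

An idle-cost-data array needs no such splitting, since each cell is a
complete 1-item tuple.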

===========================================================
6 - Extensions to the cpu node
===========================================================

The cpu node is extended with a property that establishes the
connection between the processing element represented by the cpu
node and the cost-nodes associated with this processing element.

The connection is expressed in line with the topological hierarchy
that this processing element belongs to starting with the level in
the hierarchy that this processing element itself belongs to through
to the highest level that EAS is required to service.  The
connection cannot be sparse and must be contiguous from the
processing element's level through to the highest desired level. The
highest desired level must be the same for all processing elements.

Example: Given that a cpu node may represent a thread that is a part
of a core, this property may contain multiple elements which
associate the thread with cost nodes describing the costs for the
thread itself, the core the thread belongs to, the cluster the core
belongs to and so on. The elements must be ordered from the lowest
level nodes to the highest desired level that EAS must service. The
highest desired level must be the same for all cpu nodes. The
elements must not be sparse: there must be elements for the current
thread, the next level of hierarchy (core) and so on without any
'holes'.

Example: Given that a cpu node may represent a core that is a part
of a cluster of related cpus, this property may contain multiple
elements which associate the core with cost nodes describing the
costs for the core itself, the cluster the core belongs to and so
on. The elements must be ordered from the lowest level nodes to the
highest desired level that EAS must service. The highest desired
level must be the same for all cpu nodes. The elements must not be
sparse: there must be elements for the current core, the next
level of hierarchy (cluster) and so on without any 'holes'.

If the system comprises hierarchical clusters of clusters, this
property will contain multiple associations with the relevant number
of cluster elements in hierarchical order.

Property added to the cpu node:

- sched-energy-costs

	Usage: required
	Value type: List of phandles
	Definition: a list of phandles to specific cost nodes in the
	energy-costs parent node that correspond to the processing
	element represented by this cpu node in hierarchical order
	of topology.

	The order of phandles in the list is significant. The first
	phandle is to the current processing element's own cost
	node.  Subsequent phandles are to higher hierarchical level
	cost nodes up until the maximum level that EAS is to
	service.

	The highest cost node level must be the same for all cpu
	nodes.

	The phandle list must not be sparsely populated with handles
	to non-contiguous hierarchical levels. See commentary above
	for clarity.

	Any other configuration is invalid.
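
The ordering and contiguity rules for sched-energy-costs can be
expressed as a small check. This is a sketch under stated assumptions:
each phandle is taken to be already resolved to an integer hierarchy
depth (0 = thread, 1 = core, 2 = cluster, 3 = cluster of clusters, and
so on), an encoding chosen here for illustration only, and both
function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check of one cpu node's sched-energy-costs list.
 * levels[i] is the assumed hierarchy depth of the cost node behind the
 * i-th phandle; own_level is the depth of the processing element that
 * the cpu node itself represents.
 */
bool cost_list_is_contiguous(const int *levels, size_t n, int own_level)
{
	size_t i;

	if (!levels || n == 0)
		return false;

	/* The first phandle must be the element's own cost node. */
	if (levels[0] != own_level)
		return false;

	/* Each later phandle must be exactly one level higher: no holes. */
	for (i = 1; i < n; i++)
		if (levels[i] != levels[i - 1] + 1)
			return false;

	return true;
}

/*
 * Hypothetical check across cpu nodes: top_levels[i] is the depth of
 * the last phandle in cpu i's list, and all of them must be equal.
 */
bool same_top_level(const int *top_levels, size_t ncpus)
{
	size_t i;

	if (!top_levels || ncpus == 0)
		return false;

	for (i = 1; i < ncpus; i++)
		if (top_levels[i] != top_levels[0])
			return false;

	return true;
}
```

Under this encoding, a core-level cpu node listing a core cost node
followed by a cluster cost node yields the levels {1, 2} and passes,
whereas a thread-level node with {0, 2} skips the core level and is
rejected.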

===========================================================
7 - Example dts
===========================================================

Example 1 (ARM 64-bit, 6-cpu system, two clusters of cpus, one
cluster of 2 Cortex-A57 cpus, one cluster of 4 Cortex-A53 cpus):

cpus {
	#address-cells = <2>;
	#size-cells = <0>;
	.
	.
	.
	A57_0: cpu@0 {
		compatible = "arm,cortex-a57","arm,armv8";
		reg = <0x0 0x0>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A57_L2>;
		clocks = <&scpi_dvfs 0>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>;
	};

	A57_1: cpu@1 {
		compatible = "arm,cortex-a57","arm,armv8";
		reg = <0x0 0x1>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A57_L2>;
		clocks = <&scpi_dvfs 0>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>;
	};

	A53_0: cpu@100 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x100>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	A53_1: cpu@101 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x101>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	A53_2: cpu@102 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x102>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	A53_3: cpu@103 {
		compatible = "arm,cortex-a53","arm,armv8";
		reg = <0x0 0x103>;
		device_type = "cpu";
		enable-method = "psci";
		next-level-cache = <&A53_L2>;
		clocks = <&scpi_dvfs 1>;
		cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
	};

	energy-costs {
		CPU_COST_0: core-cost0 {
			busy-cost-data = <
				417   168
				579   251
				744   359
				883   479
				1024  616
			>;
			idle-cost-data = <
				15
				0
			>;
		};
		CPU_COST_1: core-cost1 {
			busy-cost-data = <
				235 33
				302 46
				368 61
				406 76
				447 93
			>;
			idle-cost-data = <
				6
				0
			>;
		};
		CLUSTER_COST_0: cluster-cost0 {
			busy-cost-data = <
				417   24
				579   32
				744   43
				883   49
				1024  64
			>;
			idle-cost-data = <
				65
				24
			>;
		};
		CLUSTER_COST_1: cluster-cost1 {
			busy-cost-data = <
				235 26
				303 30
				368 39
				406 47
				447 57
			>;
			idle-cost-data = <
				56
				17
			>;
		};
	};
};

===============================================================================
[1] https://lkml.org/lkml/2015/5/12/728
[2] Documentation/devicetree/bindings/topology.txt
[3] Documentation/scheduler/sched-energy.txt
+362 −0

File added.

Preview size limit exceeded, changes collapsed.

+366 −0

File added.

Preview size limit exceeded, changes collapsed.

+7 −0
@@ -3,6 +3,7 @@

#ifdef CONFIG_ARM_CPU_TOPOLOGY

#include <linux/cpufreq.h>
#include <linux/cpumask.h>

struct cputopo_arm {
@@ -24,6 +25,12 @@ void init_cpu_topology(void);
void store_cpu_topology(unsigned int cpuid);
const struct cpumask *cpu_coregroup_mask(int cpu);

#ifdef CONFIG_CPU_FREQ
#define arch_scale_freq_capacity cpufreq_scale_freq_capacity
#endif
#define arch_scale_cpu_capacity scale_cpu_capacity
extern unsigned long scale_cpu_capacity(struct sched_domain *sd, int cpu);

#else

static inline void init_cpu_topology(void) { }
+141 −8
@@ -42,9 +42,15 @@
 */
static DEFINE_PER_CPU(unsigned long, cpu_scale);

unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
unsigned long scale_cpu_capacity(struct sched_domain *sd, int cpu)
{
#ifdef CONFIG_CPU_FREQ
	unsigned long max_freq_scale = cpufreq_scale_max_freq_capacity(cpu);

	return per_cpu(cpu_scale, cpu) * max_freq_scale >> SCHED_CAPACITY_SHIFT;
#else
	return per_cpu(cpu_scale, cpu);
#endif
}

static void set_capacity_scale(unsigned int cpu, unsigned long capacity)
@@ -153,6 +159,8 @@ static void __init parse_dt_topology(void)

}

static const struct sched_group_energy * const cpu_core_energy(int cpu);

/*
 * Look for a customed capacity of a CPU in the cpu_capacity table during the
 * boot. The update of all CPUs is in O(n^2) for heteregeneous system but the
@@ -160,10 +168,14 @@ static void __init parse_dt_topology(void)
 */
static void update_cpu_capacity(unsigned int cpu)
{
	if (!cpu_capacity(cpu))
		return;
	unsigned long capacity = SCHED_CAPACITY_SCALE;

	set_capacity_scale(cpu, cpu_capacity(cpu) / middle_capacity);
	if (cpu_core_energy(cpu)) {
		int max_cap_idx = cpu_core_energy(cpu)->nr_cap_states - 1;
		capacity = cpu_core_energy(cpu)->cap_states[max_cap_idx].cap;
	}

	set_capacity_scale(cpu, capacity);

	pr_info("CPU%u: update cpu_capacity %lu\n",
		cpu, arch_scale_cpu_capacity(NULL, cpu));
@@ -275,17 +287,138 @@ void store_cpu_topology(unsigned int cpuid)
		cpu_topology[cpuid].socket_id, mpidr);
}

/*
 * ARM TC2 specific energy cost model data. There are no unit requirements for
 * the data. Data can be normalized to any reference point, but the
 * normalization must be consistent. That is, one bogo-joule/watt must be the
 * same quantity for all data, but we don't care what it is.
 */
static struct idle_state idle_states_cluster_a7[] = {
	 { .power = 25 }, /* arch_cpu_idle() (active idle) = WFI */
	 { .power = 25 }, /* WFI */
	 { .power = 10 }, /* cluster-sleep-l */
	};

static struct idle_state idle_states_cluster_a15[] = {
	 { .power = 70 }, /* arch_cpu_idle() (active idle) = WFI */
	 { .power = 70 }, /* WFI */
	 { .power = 25 }, /* cluster-sleep-b */
	};

static struct capacity_state cap_states_cluster_a7[] = {
	/* Cluster only power */
	 { .cap =  150, .power = 2967, }, /*  350 MHz */
	 { .cap =  172, .power = 2792, }, /*  400 MHz */
	 { .cap =  215, .power = 2810, }, /*  500 MHz */
	 { .cap =  258, .power = 2815, }, /*  600 MHz */
	 { .cap =  301, .power = 2919, }, /*  700 MHz */
	 { .cap =  344, .power = 2847, }, /*  800 MHz */
	 { .cap =  387, .power = 3917, }, /*  900 MHz */
	 { .cap =  430, .power = 4905, }, /* 1000 MHz */
	};

static struct capacity_state cap_states_cluster_a15[] = {
	/* Cluster only power */
	 { .cap =  426, .power =  7920, }, /*  500 MHz */
	 { .cap =  512, .power =  8165, }, /*  600 MHz */
	 { .cap =  597, .power =  8172, }, /*  700 MHz */
	 { .cap =  682, .power =  8195, }, /*  800 MHz */
	 { .cap =  768, .power =  8265, }, /*  900 MHz */
	 { .cap =  853, .power =  8446, }, /* 1000 MHz */
	 { .cap =  938, .power = 11426, }, /* 1100 MHz */
	 { .cap = 1024, .power = 15200, }, /* 1200 MHz */
	};

static struct sched_group_energy energy_cluster_a7 = {
	  .nr_idle_states = ARRAY_SIZE(idle_states_cluster_a7),
	  .idle_states    = idle_states_cluster_a7,
	  .nr_cap_states  = ARRAY_SIZE(cap_states_cluster_a7),
	  .cap_states     = cap_states_cluster_a7,
};

static struct sched_group_energy energy_cluster_a15 = {
	  .nr_idle_states = ARRAY_SIZE(idle_states_cluster_a15),
	  .idle_states    = idle_states_cluster_a15,
	  .nr_cap_states  = ARRAY_SIZE(cap_states_cluster_a15),
	  .cap_states     = cap_states_cluster_a15,
};

static struct idle_state idle_states_core_a7[] = {
	 { .power = 0 }, /* arch_cpu_idle (active idle) = WFI */
	 { .power = 0 }, /* WFI */
	 { .power = 0 }, /* cluster-sleep-l */
	};

static struct idle_state idle_states_core_a15[] = {
	 { .power = 0 }, /* arch_cpu_idle (active idle) = WFI */
	 { .power = 0 }, /* WFI */
	 { .power = 0 }, /* cluster-sleep-b */
	};

static struct capacity_state cap_states_core_a7[] = {
	/* Power per cpu */
	 { .cap =  150, .power =  187, }, /*  350 MHz */
	 { .cap =  172, .power =  275, }, /*  400 MHz */
	 { .cap =  215, .power =  334, }, /*  500 MHz */
	 { .cap =  258, .power =  407, }, /*  600 MHz */
	 { .cap =  301, .power =  447, }, /*  700 MHz */
	 { .cap =  344, .power =  549, }, /*  800 MHz */
	 { .cap =  387, .power =  761, }, /*  900 MHz */
	 { .cap =  430, .power = 1024, }, /* 1000 MHz */
	};

static struct capacity_state cap_states_core_a15[] = {
	/* Power per cpu */
	 { .cap =  426, .power = 2021, }, /*  500 MHz */
	 { .cap =  512, .power = 2312, }, /*  600 MHz */
	 { .cap =  597, .power = 2756, }, /*  700 MHz */
	 { .cap =  682, .power = 3125, }, /*  800 MHz */
	 { .cap =  768, .power = 3524, }, /*  900 MHz */
	 { .cap =  853, .power = 3846, }, /* 1000 MHz */
	 { .cap =  938, .power = 5177, }, /* 1100 MHz */
	 { .cap = 1024, .power = 6997, }, /* 1200 MHz */
	};

static struct sched_group_energy energy_core_a7 = {
	  .nr_idle_states = ARRAY_SIZE(idle_states_core_a7),
	  .idle_states    = idle_states_core_a7,
	  .nr_cap_states  = ARRAY_SIZE(cap_states_core_a7),
	  .cap_states     = cap_states_core_a7,
};

static struct sched_group_energy energy_core_a15 = {
	  .nr_idle_states = ARRAY_SIZE(idle_states_core_a15),
	  .idle_states    = idle_states_core_a15,
	  .nr_cap_states  = ARRAY_SIZE(cap_states_core_a15),
	  .cap_states     = cap_states_core_a15,
};

/* sd energy functions */
static inline
const struct sched_group_energy * const cpu_cluster_energy(int cpu)
{
	return cpu_topology[cpu].socket_id ? &energy_cluster_a7 :
			&energy_cluster_a15;
}

static inline
const struct sched_group_energy * const cpu_core_energy(int cpu)
{
	return cpu_topology[cpu].socket_id ? &energy_core_a7 :
			&energy_core_a15;
}

static inline int cpu_corepower_flags(void)
{
	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN | \
	       SD_SHARE_CAP_STATES;
}

static struct sched_domain_topology_level arm_topology[] = {
#ifdef CONFIG_SCHED_MC
	{ cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
	{ cpu_coregroup_mask, cpu_corepower_flags, cpu_core_energy, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ cpu_cpu_mask, NULL, cpu_cluster_energy, SD_INIT_NAME(DIE) },
	{ NULL, },
};
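
The capacity arithmetic in update_cpu_capacity() and
scale_cpu_capacity() above can be tried outside the kernel. The
following is a userspace sketch only: max_table_capacity() and
scaled_capacity() are hypothetical names, struct capacity_state is
redeclared locally with just the two fields used in the tables above,
and SCHED_CAPACITY_SHIFT is assumed to be 10 as in the kernel, so that
1024 means full scale.

```c
#include <stddef.h>

/* Assumed to mirror the kernel's fixed-point capacity scale. */
#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)

/* Local stand-in for the kernel structure used in the tables above. */
struct capacity_state {
	unsigned long cap;
	unsigned long power;
};

/*
 * Capacity of the last (highest) capacity state, falling back to full
 * scale when no table is available, as update_cpu_capacity() does when
 * no core energy data exists.
 */
unsigned long max_table_capacity(const struct capacity_state *states,
				 size_t nr_cap_states)
{
	if (!states || nr_cap_states == 0)
		return SCHED_CAPACITY_SCALE;

	return states[nr_cap_states - 1].cap;
}

/*
 * Fixed-point scaling: max_freq_scale is a fraction of 1024, so the
 * product is shifted right by SCHED_CAPACITY_SHIFT.
 */
unsigned long scaled_capacity(unsigned long cpu_scale,
			      unsigned long max_freq_scale)
{
	return (cpu_scale * max_freq_scale) >> SCHED_CAPACITY_SHIFT;
}
```

For the A7 core table the last state has a capacity of 430, so with the
maximum frequency capped to half scale (512) the reported capacity
would be 215 under these assumptions.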
