arm64: dts: lito: Optimize frequency tables and energy model
Each cluster's minimum frequency is now its most efficient frequency by
ULPMark-CM [1] score (CoreMark [2] iterations per millijoule of energy)
and the energy model has been recalculated to accommodate the
frequency changes. All measurements and tuning have been done for
the SM7250-AB (Snapdragon 765G) variant of lito.
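As an illustration of the selection rule above, the following Python
sketch picks the most efficient frequency of the little cluster from the
CPU 1 source data in this message. It assumes the second column of each
row is the CoreMark score in iterations per second, so dividing it by
the average power in mW gives iterations per millijoule. The function
names are hypothetical and are not part of freqbench.

```python
# Rows from the CPU 1 source data: (frequency MHz, CoreMark score, power mW).
# Assumption: the score column is CoreMark iterations per second.
CPU1_ROWS = [
    (300, 1109, 128),
    (576, 2134, 140),
    (614, 2276, 148),
    (864, 3203, 190),
    (1075, 3346, 182),
    (1363, 5057, 255),
    (1517, 5627, 204),
    (1651, 6126, 252),
    (1805, 6696, 254),
]


def efficiency(score, power_mw):
    """ULPMark-CM style efficiency in iterations per millijoule."""
    # (iterations / s) / (mJ / s) = iterations / mJ
    return score / power_mw


def most_efficient_freq(rows):
    """Return the frequency (MHz) with the highest iterations per mJ."""
    freq, _, _ = max(rows, key=lambda r: efficiency(r[1], r[2]))
    return freq


print(most_efficient_freq(CPU1_ROWS))
```

For CPU 1 this selects 1517 MHz, which matches the lowest frequency
listed for that cluster in the new table.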
Inefficient intermediate frequencies have been removed for performance
and power reasons. The maximum frequency for each cluster, however
inefficient, has been retained for maximum peak performance. Efficient
frequency selection has been performed based on ULPMark-CM scores (I/mJ)
and manual discretion.
Power and performance measurements were made using my freqbench [3]
benchmark, which isolates, offlines, and disables the timer tick on
test CPUs to maximize accuracy.
The energy model dynamic-power-coefficient values were calculated with
DPC = µW / MHz / V^2
for each OPP, and averaged across all OPPs within each cluster for the
final coefficient. Voltages were obtained from the qcom-cpufreq-hw
driver that reads the base voltages from the OSM LUT programmed into the
SoC.
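The calculation above can be sketched in Python as follows. The power
and frequency figures are taken from the CPU 1 rows in this message, but
the voltages are placeholders chosen only for illustration (the real
values come from the OSM LUT and are not listed here), and the function
names are hypothetical.

```python
from statistics import mean


def dynamic_power_coefficient(power_mw, freq_mhz, voltage_v):
    """DPC = uW / MHz / V^2 for a single OPP."""
    power_uw = power_mw * 1000.0  # the tables report mW, the DPC uses uW
    return power_uw / freq_mhz / voltage_v ** 2


def cluster_dpc(opps):
    """Average the per-OPP DPCs to get one coefficient for the cluster."""
    return mean(dynamic_power_coefficient(*opp) for opp in opps)


# (power mW, frequency MHz, voltage V)
# The voltages below are PLACEHOLDERS, not measured values.
example_opps = [
    (204, 1517, 0.70),
    (254, 1805, 0.80),
]

print(round(cluster_dpc(example_opps)))
```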
Normalized DMIPS/MHz capacity scale values for each CPU have also been
updated to reflect measurements. Instead of using DMIPS/MHz, we use
CoreMarks/MHz (CoreMark iterations per second per MHz), which serves the
same purpose. For each CPU, the final capacity-dmips-mhz value is the
CM/MHz value of its maximum frequency normalized to SCHED_CAPACITY_SCALE
(1024) for the fastest CPU on the system.
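A minimal sketch of this normalization, using the maximum-frequency rows
from the source data in this message. Rounding to the nearest integer is
an assumption, as are the function and variable names.

```python
SCHED_CAPACITY_SCALE = 1024

# Maximum-frequency row per cluster: (frequency MHz, CoreMark score).
MAX_FREQ_ROWS = {
    "cpu1": (1805, 6696),
    "cpu6": (2208, 17179),
    "cpu7": (2400, 18686),
}


def capacity_values(rows):
    """Scale each CPU's CM/MHz so the highest one equals 1024."""
    cm_per_mhz = {cpu: score / freq for cpu, (freq, score) in rows.items()}
    best = max(cm_per_mhz.values())
    # Assumption: round to the nearest integer for the device tree value.
    return {
        cpu: round(value / best * SCHED_CAPACITY_SCALE)
        for cpu, value in cm_per_mhz.items()
    }


print(capacity_values(MAX_FREQ_ROWS))
```

With the figures in this message the sketch yields 488, 1023, and 1024
for cpu1, cpu6, and cpu7. The values actually written to the device tree
may differ.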
As a positive side-effect, the new energy model with reduced frequencies
fits actual power usage much better than a model with all frequencies
included. Qualcomm's combination of voltages, clocks, cores, OSM config,
and other factors results in large discrepancies between frequencies,
so a cluster can't be modeled well with a single coefficient. As a
result, the standard deviation of per-frequency DPCs is up to 75% of the
final (average) DPC in the worst-case scenario: the little cluster on
SM7250-AB.
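The spread figure quoted above is the standard deviation of the per-OPP
DPCs divided by their average. A small sketch of that ratio follows. The
use of the population standard deviation is an assumption, and the input
values are made up for illustration, not measured.

```python
from statistics import mean, pstdev


def relative_dpc_spread(dpcs):
    """Standard deviation of per-OPP DPCs as a fraction of their mean."""
    # Assumption: population standard deviation (pstdev), not sample.
    return pstdev(dpcs) / mean(dpcs)


# Hypothetical per-OPP DPC values, for illustration only.
example_dpcs = [50.0, 150.0]

print(f"{relative_dpc_spread(example_dpcs):.0%}")
```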
All human-readable source data is included below. All benchmark data,
including the raw samples in machine-readable JSON format, can be found
in the freqbench repository [4].
------------------------------------------------------------------------
New efficient frequency tables with power and performance stats:
===== CPU 1 =====
Frequencies: 300 576 614 864 1075 1363 1516 1651 1804
1517: 5627 3.7 C/MHz 204 mW 9.0 J 27.6 I/mJ 44.4 s
1805: 6696 3.7 C/MHz 254 mW 9.5 J 26.4 I/mJ 37.3 s
===== CPU 6 =====
Frequencies: 652 940 1152 1478 1728 1900 2092 2208
1478: 11514 7.8 C/MHz 345 mW 7.5 J 33.4 I/mJ 21.7 s
1728: 13456 7.8 C/MHz 481 mW 8.9 J 28.0 I/mJ 18.6 s
2208: 17179 7.8 C/MHz 701 mW 10.2 J 24.5 I/mJ 14.6 s
===== CPU 7 =====
Frequencies: 806 1094 1401 1766 1996 2188 2304 2400
1766: 13749 7.8 C/MHz 482 mW 8.8 J 28.5 I/mJ 18.2 s
2189: 16911 7.7 C/MHz 722 mW 10.7 J 23.4 I/mJ 14.8 s
2304: 17942 7.8 C/MHz 813 mW 11.3 J 22.1 I/mJ 13.9 s
2400: 18686 7.8 C/MHz 895 mW 12.0 J 20.9 I/mJ 13.4 s
------------------------------------------------------------------------
Source data for SM7250-AB:
Frequency domains: cpu1 cpu6 cpu7
Offline CPUs: cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7
Baseline power usage: 614 mW
===== CPU 1 =====
Frequencies: 300 576 614 864 1075 1363 1516 1651 1804
300: 1109 3.7 C/MHz 128 mW 28.9 J 8.6 I/mJ 225.4 s
576: 2134 3.7 C/MHz 140 mW 16.5 J 15.2 I/mJ 117.2 s
614: 2276 3.7 C/MHz 148 mW 16.2 J 15.4 I/mJ 109.8 s
864: 3203 3.7 C/MHz 190 mW 14.8 J 16.9 I/mJ 78.1 s
1075: 3346 3.1 C/MHz 182 mW 13.6 J 18.4 I/mJ 74.7 s
1363: 5057 3.7 C/MHz 255 mW 12.6 J 19.9 I/mJ 49.4 s
1517: 5627 3.7 C/MHz 204 mW 9.0 J 27.6 I/mJ 44.4 s
1651: 6126 3.7 C/MHz 252 mW 10.3 J 24.3 I/mJ 40.8 s
1805: 6696 3.7 C/MHz 254 mW 9.5 J 26.4 I/mJ 37.3 s
===== CPU 6 =====
Frequencies: 652 940 1152 1478 1728 1900 2092 2208
653: 2540 3.9 C/MHz 161 mW 15.8 J 15.8 I/mJ 98.4 s
941: 7322 7.8 C/MHz 228 mW 7.8 J 32.1 I/mJ 34.2 s
1152: 8600 7.5 C/MHz 312 mW 9.1 J 27.6 I/mJ 29.1 s
1478: 11514 7.8 C/MHz 345 mW 7.5 J 33.4 I/mJ 21.7 s
1728: 13456 7.8 C/MHz 481 mW 8.9 J 28.0 I/mJ 18.6 s
1901: 14809 7.8 C/MHz 614 mW 10.4 J 24.1 I/mJ 16.9 s
2093: 16299 7.8 C/MHz 724 mW 11.1 J 22.5 I/mJ 15.4 s
2208: 17179 7.8 C/MHz 701 mW 10.2 J 24.5 I/mJ 14.6 s
===== CPU 7 =====
Frequencies: 806 1094 1401 1766 1996 2188 2304 2400
806: 6276 7.8 C/MHz 261 mW 10.4 J 24.0 I/mJ 39.8 s
1094: 8521 7.8 C/MHz 317 mW 9.3 J 26.9 I/mJ 29.4 s
1402: 10912 7.8 C/MHz 424 mW 9.7 J 25.7 I/mJ 22.9 s
1766: 13749 7.8 C/MHz 482 mW 8.8 J 28.5 I/mJ 18.2 s
1997: 15547 7.8 C/MHz 674 mW 10.9 J 23.0 I/mJ 16.1 s
2189: 16911 7.7 C/MHz 722 mW 10.7 J 23.4 I/mJ 14.8 s
2304: 17942 7.8 C/MHz 813 mW 11.3 J 22.1 I/mJ 13.9 s
2400: 18686 7.8 C/MHz 895 mW 12.0 J 20.9 I/mJ 13.4 s
------------------------------------------------------------------------
[1] https://www.eembc.org/ulpmark/#cm
[2] https://www.eembc.org/coremark/
[3] https://github.com/kdrag0n/freqbench
[4] https://github.com/kdrag0n/freqbench/tree/master/results/p5/250kiter
Signed-off-by: Danny Lin <danny@kdrag0n.dev>