Commit 769c2b3f authored by Danny Lin, committed by Akshay Kakatkar

arm64: dts: lito: Optimize frequency tables and energy model

Each cluster's minimum frequency is now its most efficient frequency by
ULPMark-CM [1] score (CoreMark [2] iterations per millijoule of energy)
and the energy model has been recalculated to accommodate the
frequency changes. All measurements and tuning have been done for
the SM7250-AB (Snapdragon 765G) variant of lito.
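A minimal sketch of the efficiency metric used to pick each cluster's new minimum frequency. Each run executes the same fixed number of iterations, so the run time cancels and iterations per millijoule reduces to iterations per second divided by milliwatts (a milliwatt is one millijoule per second). That shortcut and the helper names are illustrative assumptions, not freqbench code; the example uses the CPU 6 rows from the source data in this message.

```python
# Hypothetical helpers illustrating an iterations-per-millijoule metric;
# they are not taken from freqbench.

def iterations_per_mj(score_ips: float, power_mw: float) -> float:
    """Iterations per millijoule.

    score_ips: CoreMark iterations per second
    power_mw:  average power in milliwatts (mJ per second)
    For a fixed iteration count the run time cancels out:
    (ips * t) / (mW * t) = ips / mW.
    """
    return score_ips / power_mw


def most_efficient(rows: dict[int, tuple[float, float]]) -> int:
    """Return the frequency (MHz) with the highest iterations per mJ."""
    return max(rows, key=lambda f: iterations_per_mj(*rows[f]))


# CPU 6 rows from the source data: MHz -> (iterations/s, mW)
cpu6 = {
    653: (2540, 161), 941: (7322, 228), 1152: (8600, 312),
    1478: (11514, 345), 1728: (13456, 481), 1901: (14809, 614),
    2093: (16299, 724), 2208: (17179, 701),
}

print(most_efficient(cpu6))                           # 1478
print(round(iterations_per_mj(*cpu6[1478]), 1))       # 33.4
```

The computed 33.4 I/mJ at 1478 MHz matches the table, which suggests the shortcut is consistent with how the column was derived.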

Inefficient intermediate frequencies have been removed for performance
and power reasons. The maximum frequency for each cluster, however
inefficient, has been retained for maximum peak performance. Efficient
frequency selection has been performed based on ULPMark-CM scores (I/mJ)
and manual discretion.
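One simple rule that could automate most of this selection, offered as a hypothetical sketch rather than the author's method (the message says manual discretion was also involved): drop a frequency whenever some higher frequency is at least as efficient, because that higher frequency is faster and costs no more energy per iteration. The maximum frequency always survives, and the lowest survivor is the most efficient one. Applied to the I/mJ columns of the source data, this rule happens to reproduce all three new tables.

```python
# Hypothetical selection rule, not necessarily the author's method:
# drop any frequency for which some higher frequency is at least as
# efficient, since that higher frequency is faster and costs no more
# energy per iteration. The maximum frequency is always kept.

def efficient_frequencies(eff: dict[int, float]) -> list[int]:
    """eff maps frequency (MHz) to efficiency (iterations per mJ)."""
    kept = []
    best_above = float("-inf")  # best efficiency among higher frequencies
    for freq in sorted(eff, reverse=True):
        if eff[freq] > best_above:
            kept.append(freq)
            best_above = eff[freq]
    return sorted(kept)


# I/mJ columns from the source data in this message
cpu1 = {300: 8.6, 576: 15.2, 614: 15.4, 864: 16.9, 1075: 18.4,
        1363: 19.9, 1517: 27.6, 1651: 24.3, 1805: 26.4}
cpu6 = {653: 15.8, 941: 32.1, 1152: 27.6, 1478: 33.4, 1728: 28.0,
        1901: 24.1, 2093: 22.5, 2208: 24.5}
cpu7 = {806: 24.0, 1094: 26.9, 1402: 25.7, 1766: 28.5, 1997: 23.0,
        2189: 23.4, 2304: 22.1, 2400: 20.9}

for name, table in (("CPU 1", cpu1), ("CPU 6", cpu6), ("CPU 7", cpu7)):
    print(name, efficient_frequencies(table))
# CPU 1 [1517, 1805]
# CPU 6 [1478, 1728, 2208]
# CPU 7 [1766, 2189, 2304, 2400]
```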

Power and performance measurements were made using my freqbench [3]
benchmark, which isolates, offlines, and disables the timer tick on
test CPUs to maximize accuracy.

The energy model dynamic-power-coefficient values were calculated with
    DPC = µW / MHz / V^2
for each OPP, and averaged across all OPPs within each cluster for the
final coefficient. Voltages were obtained from the qcom-cpufreq-hw
driver that reads the base voltages from the OSM LUT programmed into the
SoC.
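A minimal sketch of that calculation. It assumes the dynamic power relation that the kernel's devicetree binding describes for dynamic-power-coefficient, P = DPC × V² × f with P in µW, V in volts, and f in MHz, and takes each OPP as a (mW, MHz, V) tuple. The per-OPP voltages are not listed in this message, so the example uses synthetic OPPs built from a known coefficient of 100 to show that the average recovers it; those numbers are not measurements.

```python
from statistics import mean


def dpc(power_mw: float, freq_mhz: float, volt_v: float) -> float:
    """Dynamic-power coefficient of one OPP, in uW / MHz / V^2."""
    power_uw = power_mw * 1000.0  # mW -> uW
    return power_uw / freq_mhz / volt_v ** 2


def cluster_dpc(opps: list[tuple[float, float, float]]) -> float:
    """Average the per-OPP coefficients; each OPP is (mW, MHz, V)."""
    return mean(dpc(p, f, v) for p, f, v in opps)


# Synthetic OPPs generated from P = 100 * V^2 * f (not measured data).
synthetic = [(64.0, 1000.0, 0.8), (200.0, 2000.0, 1.0)]
print(round(cluster_dpc(synthetic), 6))  # 100.0
```

Averaging the per-OPP values is what the text describes; with real data the per-OPP coefficients differ, and the average is only a compromise across them.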

Normalized DMIPS/MHz capacity scale values for each CPU have also been
updated to reflect measurements. Instead of using DMIPS/MHz, we use
CoreMarks/MHz (CoreMark iterations per second per MHz), which serves the
same purpose. For each CPU, the final capacity-dmips-mhz value is the
CM/MHz value at its maximum frequency, normalized so that the fastest
CPU on the system gets SCHED_CAPACITY_SCALE (1024).
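A sketch of that normalization, using the maximum-frequency rows from the source data in this message. It reads "fastest CPU" as the CPU with the highest CoreMarks/MHz at its maximum frequency and rounds to the nearest integer; both are assumptions, and the values actually written to the DTS are not shown in this message.

```python
SCHED_CAPACITY_SCALE = 1024

# CPU -> (maximum frequency in MHz, CoreMark iterations/s at that frequency),
# taken from the source data in this message
max_opps = {1: (1805, 6696), 6: (2208, 17179), 7: (2400, 18686)}


def capacity_dmips_mhz(opps: dict[int, tuple[int, int]]) -> dict[int, int]:
    """Scale each CPU's CM/MHz so the highest one maps to 1024."""
    per_mhz = {cpu: score / mhz for cpu, (mhz, score) in opps.items()}
    best = max(per_mhz.values())
    return {cpu: round(v / best * SCHED_CAPACITY_SCALE)
            for cpu, v in per_mhz.items()}


print(capacity_dmips_mhz(max_opps))  # {1: 488, 6: 1023, 7: 1024}
```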

As a positive side-effect, the new energy model with reduced frequencies
fits actual power usage much better than a model with all frequencies
included. Qualcomm's combination of voltages, clocks, cores, OSM
configuration, and other factors results in very large discrepancies
between frequencies, so power cannot be modeled well with a single
coefficient. In the worst case, the little cluster on SM7250-AB, the
standard deviation of the per-frequency DPCs is up to 75% of the final
(average) DPC.
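The 75% figure is a ratio of spread to mean (a coefficient of variation) over the per-frequency DPCs of one cluster. A minimal sketch, assuming the population standard deviation; the message does not say whether population or sample deviation was used, and the inputs below are synthetic.

```python
from statistics import mean, pstdev


def relative_spread(dpcs: list[float]) -> float:
    """Standard deviation of per-OPP DPCs as a fraction of their mean."""
    return pstdev(dpcs) / mean(dpcs)


# Synthetic values: a perfect single-coefficient fit vs. a poor one.
print(relative_spread([100.0, 100.0, 100.0]))  # 0.0
print(relative_spread([50.0, 150.0]))          # 0.5
```

A result of 0.75 would mean a typical OPP's coefficient deviates from the averaged one by three quarters of its value, which is why dropping the outlying frequencies improves the fit.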

All human-readable source data is included below. All benchmark data,
including the raw samples in machine-readable JSON format, can be found
in the freqbench repository [4].

------------------------------------------------------------------------

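New efficient frequency tables with power and performance stats
(columns: frequency in MHz, CoreMark score in iterations/s, CoreMarks per
MHz, power, energy, ULPMark-CM score in iterations/mJ, and run time):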

===== CPU 1 =====
Frequencies: 300 576 614 864 1075 1363 1516 1651 1804

1517:  5627     3.7 C/MHz    204 mW    9.0 J   27.6 I/mJ    44.4 s
1805:  6696     3.7 C/MHz    254 mW    9.5 J   26.4 I/mJ    37.3 s

===== CPU 6 =====
Frequencies: 652 940 1152 1478 1728 1900 2092 2208

1478: 11514     7.8 C/MHz    345 mW    7.5 J   33.4 I/mJ    21.7 s
1728: 13456     7.8 C/MHz    481 mW    8.9 J   28.0 I/mJ    18.6 s
2208: 17179     7.8 C/MHz    701 mW   10.2 J   24.5 I/mJ    14.6 s

===== CPU 7 =====
Frequencies: 806 1094 1401 1766 1996 2188 2304 2400

1766: 13749     7.8 C/MHz    482 mW    8.8 J   28.5 I/mJ    18.2 s
2189: 16911     7.7 C/MHz    722 mW   10.7 J   23.4 I/mJ    14.8 s
2304: 17942     7.8 C/MHz    813 mW   11.3 J   22.1 I/mJ    13.9 s
2400: 18686     7.8 C/MHz    895 mW   12.0 J   20.9 I/mJ    13.4 s

------------------------------------------------------------------------

Source data for SM7250-AB:

Frequency domains: cpu1 cpu6 cpu7
Offline CPUs: cpu1 cpu2 cpu3 cpu4 cpu5 cpu6 cpu7
Baseline power usage: 614 mW

===== CPU 1 =====
Frequencies: 300 576 614 864 1075 1363 1516 1651 1804

 300:  1109     3.7 C/MHz    128 mW   28.9 J    8.6 I/mJ   225.4 s
 576:  2134     3.7 C/MHz    140 mW   16.5 J   15.2 I/mJ   117.2 s
 614:  2276     3.7 C/MHz    148 mW   16.2 J   15.4 I/mJ   109.8 s
 864:  3203     3.7 C/MHz    190 mW   14.8 J   16.9 I/mJ    78.1 s
1075:  3346     3.1 C/MHz    182 mW   13.6 J   18.4 I/mJ    74.7 s
1363:  5057     3.7 C/MHz    255 mW   12.6 J   19.9 I/mJ    49.4 s
1517:  5627     3.7 C/MHz    204 mW    9.0 J   27.6 I/mJ    44.4 s
1651:  6126     3.7 C/MHz    252 mW   10.3 J   24.3 I/mJ    40.8 s
1805:  6696     3.7 C/MHz    254 mW    9.5 J   26.4 I/mJ    37.3 s

===== CPU 6 =====
Frequencies: 652 940 1152 1478 1728 1900 2092 2208

 653:  2540     3.9 C/MHz    161 mW   15.8 J   15.8 I/mJ    98.4 s
 941:  7322     7.8 C/MHz    228 mW    7.8 J   32.1 I/mJ    34.2 s
1152:  8600     7.5 C/MHz    312 mW    9.1 J   27.6 I/mJ    29.1 s
1478: 11514     7.8 C/MHz    345 mW    7.5 J   33.4 I/mJ    21.7 s
1728: 13456     7.8 C/MHz    481 mW    8.9 J   28.0 I/mJ    18.6 s
1901: 14809     7.8 C/MHz    614 mW   10.4 J   24.1 I/mJ    16.9 s
2093: 16299     7.8 C/MHz    724 mW   11.1 J   22.5 I/mJ    15.4 s
2208: 17179     7.8 C/MHz    701 mW   10.2 J   24.5 I/mJ    14.6 s

===== CPU 7 =====
Frequencies: 806 1094 1401 1766 1996 2188 2304 2400

 806:  6276     7.8 C/MHz    261 mW   10.4 J   24.0 I/mJ    39.8 s
1094:  8521     7.8 C/MHz    317 mW    9.3 J   26.9 I/mJ    29.4 s
1402: 10912     7.8 C/MHz    424 mW    9.7 J   25.7 I/mJ    22.9 s
1766: 13749     7.8 C/MHz    482 mW    8.8 J   28.5 I/mJ    18.2 s
1997: 15547     7.8 C/MHz    674 mW   10.9 J   23.0 I/mJ    16.1 s
2189: 16911     7.7 C/MHz    722 mW   10.7 J   23.4 I/mJ    14.8 s
2304: 17942     7.8 C/MHz    813 mW   11.3 J   22.1 I/mJ    13.9 s
2400: 18686     7.8 C/MHz    895 mW   12.0 J   20.9 I/mJ    13.4 s
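The rows above follow a fixed text layout. The sketch below parses one row and checks that its columns agree with each other: energy should be roughly power × time, and score × time should come out near 250,000 iterations, which matches the 250kiter directory name in [4]. The regular expression and helper names are hypothetical, and the column meanings are inferred from the units; the authoritative data is the JSON in the freqbench repository.

```python
import re

# Hypothetical parser for the row layout used above:
# MHz: iterations/s  CM/MHz  mW  J  I/mJ  s
ROW = re.compile(
    r"^\s*(?P<mhz>\d+):\s+(?P<score>\d+)\s+(?P<cm_mhz>[\d.]+) C/MHz\s+"
    r"(?P<mw>\d+) mW\s+(?P<joules>[\d.]+) J\s+(?P<i_mj>[\d.]+) I/mJ\s+"
    r"(?P<secs>[\d.]+) s\s*$"
)
INT_FIELDS = {"mhz", "score", "mw"}


def parse_row(line: str) -> dict[str, float]:
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"not a benchmark row: {line!r}")
    return {k: int(v) if k in INT_FIELDS else float(v)
            for k, v in m.groupdict().items()}


def check_row(row: dict[str, float]) -> tuple[float, float]:
    """Return (relative energy error, thousands of iterations run)."""
    energy_j = row["mw"] * row["secs"] / 1000.0  # mW * s -> J
    rel_err = abs(energy_j - row["joules"]) / row["joules"]
    kiterations = row["score"] * row["secs"] / 1000.0
    return rel_err, kiterations


row = parse_row("1517:  5627     3.7 C/MHz    204 mW    9.0 J   27.6 I/mJ    44.4 s")
err, kiter = check_row(row)
print(f"{err:.1%} energy mismatch, {kiter:.0f}k iterations")
# 0.6% energy mismatch, 250k iterations
```

The small energy mismatch is consistent with the columns being rounded independently.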

------------------------------------------------------------------------

[1] https://www.eembc.org/ulpmark/#cm
[2] https://www.eembc.org/coremark/
[3] https://github.com/kdrag0n/freqbench
[4] https://github.com/kdrag0n/freqbench/tree/master/results/p5/250kiter

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
parent 0d2edded