Unverified Commit 5b68ef13 authored Feb 08, 2023 by Sandeep Dhavale Committed by Michael Bestas Apr 16, 2024
BACKPORT: erofs: add per-cpu threads for decompression as an option

Using per-cpu thread pool we can reduce the scheduling latency compared
to workqueue implementation. With this patch scheduling latency and
variation is reduced as per-cpu threads are high priority kthread_workers.

The results were evaluated on arm64 Android devices running 5.10 kernel.

The table below shows resulting improvements of total scheduling latency
for the same app launch benchmark runs with 50 iterations. Scheduling
latency is the latency between when the task (workqueue kworker vs
kthread_worker) became eligible to run to when it actually started
running.
+-------------------------+-----------+----------------+---------+
|                         | workqueue | kthread_worker |  diff   |
+-------------------------+-----------+----------------+---------+
| Average (us)            |     15253 |           2914 | -80.89% |
| Median (us)             |     14001 |           2912 | -79.20% |
| Minimum (us)            |      3117 |           1027 | -67.05% |
| Maximum (us)            |     30170 |           3805 | -87.39% |
| Standard deviation (us) |      7166 |            359 |         |
+-------------------------+-----------+----------------+---------+

Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem as they directly translate to
responsiveness from user point of view. While EROFS provides
a lot of important features like space savings, we saw some
performance penalty in cold app launch benchmarks in few scenarios.
Analysis showed that the significant variance was coming from the
scheduling cost while decompression cost was more or less the same.

Having per-cpu thread pool we can see from the above table that this
variation is reduced by ~80% on average. This problem was discussed
at LPC 2022. Link to LPC 2022 slides and talk at [1]

[1] https://lpc.events/event/16/contributions/1338/

[ Gao Xiang: At least, we have to add this until WQ_UNBOUND workqueue
             issue [2] on many arm64 devices is resolved. ]
[2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com



Bug: 271636421
Bug: 278520205
Test: launch_cvd
Change-Id: I9dce2bfd6f40ec6a210161b80cee7c0417b4edb3
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com


(cherry picked from commit 3fffb589b9a6e331e39cb75373ee7691acd7b109)
[dhavale: Fixed minor conflict as upstream now has zdata.h folded in
zdata.c]
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
(cherry picked from commit 566a7f6c6b3f5f13b766fe749bbdb45918b029ac)
[dhavale: Fixed minor conflicts in Kconfig and zdata.c]
(cherry picked from commit 2de95f5d183c2174c9380a902919c8e59e380293)
parent 260d03bf
Show whitespace changes
Inline Side-by-side
Please register or to comment