init: reboot: Simplify the RebootMonitorThread()
Since its introduction in r.android.com/946516, the RebootMonitorThread() has had a bit of a complex control flow. Specifically: 1. We create it in DoReboot() but it doesn't kick off right away. It starts out paused and an odd number of semaphore signals will resume it (and even number will pause it). 2. We signal it to start via signaling the semaphore (1 semaphore signal = odd = resume) 3. Before starting an fsck(), we pause the monitor via semaphore (2 semaphore signals = even = pause). 4. After finishing fsck(), we resume the monitor via semaphore (3 semaphore signals = odd = resume). 5. Finally, before doing a few last steps we signal the thread to exit. NOTE: when the code was first added RebootMonitorThread()'s timeout was 30 seconds. As of r.android.com/1275254 it's now 300 (it can be overridden by a property, but code seems designed for it to be long). Let's take a step back and make a few observations. a) We really don't want _any_ part of the reboot/shutdown sequence to be "unguarded" by a watchdog. Any time we pause and/or turn off the watchdog it's an opportunity for the system to get stuck. When the system is running shutdown code the user has no way to get out so we need some guarantee it won't take too long. b) The cases where were pausing/unpausing the watchdog weren't really effective. It can be observed that in `UmountPartitions()` (which runs _before_ the fsck() we paused for) we finishing killing nearly all userspace processes (earlier we killed the non-shutdown-critical ones, here we kill the shutdown-critical ones). Assuming we have a userspace program patting something in /dev/watchdog* (like Pixel devices do) that watchdog program will be killed. Now it will be a matter of time before the kernel reboots anyway. Thus the pause for `fsck()` makes little sense. Similarly it doesn't make sense to bother stopping the shutdown thread before the final steps. As per the above logic, simplify things. When we start RebootMonitorThread() we'll just do a big delay. If the system is still alive after the delay finishes then that's bad and we should take evasive actions. Given the way things work now, let's also not bother accounting for the "clean" shutdown timeout. That number is designed to be much smaller than this 300 second last-resort and adding it in just increases complexity. NOTE: future changes can continue to refine the RebootMonitorThread(). For now we'll just do the simple thing and keeps all the "evasive actions" the same. ALSO NOTE: If someone made an Android phone where they have heavily tuned "ro.build.shutdown.watchdog.timeout" to be as short as possible, this could potentially mess them up. I can't quite see a reason for someone to have done that, though. A future change will enforce a minimum of 60 seconds. Bug: 409835922 Test: Hack delays in various parts of DoReboot() and see timeout. Change-Id: I4966b6987965b25ad38a63ef026277cee1e93927
Loading
Please register or sign in to comment