Use a separate hwcomposer hidl instance for vr flinger
Improve robustness of vr flinger <--> surface flinger switching by having vr flinger use a separate hardware composer hidl instance instead of sharing the instance with surface flinger. Sharing the hardware composer instance has proven to be error prone, with situations where both the vr flinger thread and surface flinger main thread would write to the composer at the same time, causing hard to diagnose crashes (b/62925812). Instead of sharing the hardware composer instance, when switching to vr flinger we now delete the existing instance, create a new instance directed to the vr hardware composer shim, and vr flinger creates its own composer instance connected to the real hardware composer. By creating a separate composer instance for vr flinger, crashes like the ones found in b/62925812 are no longer impossible. Most of the changes in this commit are related to enabling surface flinger to delete HWComposer instances cleanly. In particular: - Previously the hardware composer callbacks (which come in on a hwbinder thread) would land in HWC2::Device and bubble up to the SurfaceFlinger object. But with the new behavior the HWC2::Device might be dead or in the process of being destroyed, so instead we have SurfaceFlinger receive the composer callbacks directly, and forward them to HWComposer and HWC2::Device. We include a composer id field in the callbacks so surface flinger can ignore stale callbacks from dead composer instances. - Object ownership for HWC2::Display and HWC2::Layer was shared by passing around shared_ptrs to these objects. This was problematic because they referenced and used the HWC2::Device, which can now be destroyed when switching to vr flinger. Simplify the ownership model by having HWC2::Device own (via unique_ptr<>) instances of HWC2::Display, which owns (again via unique_ptr<>) instances of HWC2::Layer. In cases where we previously passed std::shared_ptr<> to HWC2::Display or HWC2::Layer, instead pass non-owning HWC2::Display* and HWC2::Layer* pointers. This ensures clean composer instance teardown with no stale references to the deleted HWC2::Device. - When the hardware composer instance is destroyed and the HWC2::Layers are removed, notify the android::Layer via a callback, so it can remove the HWC2::Layer from its internal table of hardware composer layers. This removes the burden to explicitly clear out all hardware composer layers when switching to vr flinger, which has been a source of bugs. - We were missing an mStateLock lock in SurfaceFlinger::setVsyncEnabled(), which was necessary to ensure we were setting vsync on the correct hardware composer instance. Once that lock was added, surface flinger would sometimes deadlock when transitioning to vr flinger, because the surface flinger main thread would acquire mStateLock and then EventControlThread::mMutex, whereas the event control thread would acquire the locks in the opposite order. The changes in EventControlThread.cpp are to ensure it doesn't hold a lock on EventControlThread::mMutex while calling setVsyncEnabled(), to avoid the deadlock. I found that without a composer callback registered in vr flinger the vsync_event file wasn't getting vsync timestamps written, so vr flinger would get stuck in an infinite loop trying to parse a vsync timestamp. Since we need to have a callback anyway I changed the code in hardware_composer.cpp to get the vsync timestamp from the callback, as surface flinger does. I confirmed the timestamps are the same with either method, and this lets us remove some extra code for extracting the vsync timestamp that (probably) wasn't compatible with all devices we want to run on anyway. I also added a timeout to the vysnc wait so we'll see an error message in the log if we fail to wait for vsync, instead of looping forever. Bug: 62925812 Test: - Confirmed surface flinger <--> vr flinger switching is robust by switching devices on and off hundreds of times and observing no hardware composer related issues, surface flinger crashes, or hardware composer service crashes. - Confirmed 2d in vr works as before by going through the OOBE flow on a standalone. This also exercises virtual display creation and usage through surface flinger. - Added logs to confirm perfect layer/display cleanup when destroying hardware composer instances. - Tested normal 2d phone usage to confirm basic layer create/destroy functionality works as before. - Monitored surface flinger file descriptor usage across dozens of surface flinger <--> vr flinger transitions and observed no file descriptor leaks. - Confirmed the HWC1 code path still compiles. - Ran the surface flinger tests and confirmed there are no new test failures. - Ran the hardware composer hidl in passthrough mode on a Marlin and confirmed it works. - Ran CTS tests for virtual displays and confirmed they all pass. - Tested Android Auto and confirmed basic graphics functionality still works. Change-Id: I17dc0e060bfb5cb447ffbaa573b279fc6d2d8bd1 Merged-In: I17dc0e060bfb5cb447ffbaa573b279fc6d2d8bd1
Loading
Please register or sign in to comment