libbinder: no temp rpc sess leak w spurious wakeup

RpcSession incoming threads continued to hold an RpcSession instance
after they set the shutdown condition (removing the associated 1:1
thread connection object from RpcSession's mConnections object). Since
the shutdown condition must include cleaning up RpcSession, we cannot
delay or remove clearing the associated connections. Instead, a new
explicit shutdown condition is added, which does not restrict the
manipulation of the session object.

Interestingly, this issue blocked several changes on and off for a few
weeks, though test hub doesn't show it failing at all. I couldn't
reproduce it locally, even after 5 days (24 hr/day) of running one of
the failing tests in a tight loop with builds running in the background
(devinmoore@ reported a local failure with a build running). I
submitted several changes to debug this, and one of them (which dumped
backtraces) should have caught it, except the race is just too rare.

When a problem that looks benign in retrospect causes a big failure,
the obvious question to ask is: is the test too brittle? No! If this is
the sensitivity at which it finds a bug, we can hardly imagine an error
going unnoticed here. Only if this situation repeated several times, or
if issues like this became too numerous to maintain, would I think that
we needed to "tone down the tests".

Finally, how did this get fixed? By dumping every incStrong backtrace
in RpcSession and investigating all the code that is responsible for
maintaining those sp<>s. Wheeee :)

Bug: 244325464
Test: binderRpcTest
Change-Id: I76ac8f21900d6ce095a1acfb246ca7acf1591e0b