RPC Binder: socket ENOMEM handling
(urgent to merge)

RPC Binder uses non-blocking sockets. This is so that when a client and
a server are both writing to the same socket and a write would block,
each side can back off to polling, read commands from the other side,
and recover.

When writing to one of these sockets, the vhost-vsock implementation we
are using returns ENOMEM in some cases. To resolve this, we add an
exponential packet size backoff and a time backoff.

There is a suggestion to use blocking sockets instead, since a blocking
write would engage direct reclaim that could potentially block. However,
I've not gone with this approach because:

- all RPC binder sockets are marked non-blocking globally. Setting
  per-syscall flags on each operation throughout the binder code base
  is a more invasive option. Specifically, it is hard to guarantee that
  we never hang indefinitely.
- the timeout needs to be configured with the global SO_SNDTIMEO socket
  option and can't be specified on sendmsg/recvmsg directly. This means
  retry would need to be added on every operation (if they now block,
  such as accept), or we would need 2 syscalls for each of the
  operations here.

RPC_FLAKE_PRONE mode now simulates ENOMEM errors as well.

Future considerations:
- separate build of binderRpcTest with flake mode

Bug: 422574189
Test: binderRpcTest passes with FLAKE_MODE
Flag: EXEMPT bug fix
Change-Id: I5f2c99d0cb6760cca53ac4839d47daa9304a955a
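
For illustration only, the packet size and time backoff described above could look roughly like the sketch below. This is not the code in this change: `SendFn`, `BackoffConfig`, `sendWithEnomemBackoff`, and all default values are hypothetical names and numbers chosen for the example. The send callback is injected so the logic can be exercised without a real vsock socket, and EAGAIN handling (the fall back to polling) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <sys/types.h>

// Hypothetical callback: behaves like send(2) on a non-blocking socket,
// returning the number of bytes written, or -1 with errno set.
using SendFn = std::function<ssize_t(const char* data, size_t len)>;

// Hypothetical tuning knobs; the values are assumptions for this sketch.
struct BackoffConfig {
    size_t minChunk = 512;                        // never shrink writes below this
    std::chrono::microseconds initialDelay{100};  // first sleep after ENOMEM
    std::chrono::microseconds maxDelay{100000};   // cap for the doubled sleep
    int maxRetries = 10;                          // consecutive ENOMEM failures allowed
};

// Writes all of `data`, halving the write size and doubling the sleep each
// time the callback fails with ENOMEM. Returns true once every byte has been
// written; returns false on any other error or when retries are exhausted.
bool sendWithEnomemBackoff(const SendFn& sendFn, const char* data, size_t len,
                           const BackoffConfig& cfg = {}) {
    assert(cfg.minChunk > 0);
    size_t chunk = len;
    size_t sent = 0;
    int retries = 0;
    auto delay = cfg.initialDelay;

    while (sent < len) {
        const size_t want = std::min(chunk, len - sent);
        const ssize_t n = sendFn(data + sent, want);
        if (n > 0) {
            // Progress: reset the time backoff but keep the reduced size.
            sent += static_cast<size_t>(n);
            retries = 0;
            delay = cfg.initialDelay;
            continue;
        }
        if (n < 0 && errno == ENOMEM) {
            if (++retries > cfg.maxRetries) return false;
            chunk = std::max(cfg.minChunk, want / 2);   // packet size backoff
            std::this_thread::sleep_for(delay);         // time backoff
            delay = std::min(delay * 2, cfg.maxDelay);
            continue;
        }
        return false;  // other errors (including EAGAIN) are out of scope here
    }
    return true;
}
```

Keeping the reduced write size after a successful write is a choice made for this sketch; growing it back over time would be an equally plausible policy. Bounding the retries is what keeps the caller from hanging indefinitely.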