Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Skip to content
Commit f9bae84e authored by Lorenzo Colitti's avatar Lorenzo Colitti
Browse files

Don't start iptables and ip6tables on parallel threads.

This still ensures that the child processes initialize in
parallel (which is the slow part of starting them, since
forkAndExec itself is sub-millisecond), but it does not fork them
in parallel.

Forking them in parallel is incorrect because forkAndExec first
creates pipes, then forks, then closes the pipes. If this is done
in parallel on two threads, the following can happen:

1. Create pipes to subprocess A.
2. Create pipes to subprocess B.
3. Fork process A.
4. Fork process B.
5. Close child end of pipes to subprocess A.
6. Close child end of pipes to subprocess B.

If this happens, then the subprocess B will inherit all the fds
of the pipes to subprocess A in addition to its own. This
includes the read end of the pipe that netd uses to send commands
to subprocess A, and the write ends of the pipes that netd uses
to receive subprocess A's stdout and stderr. Subprocess B never
reads or writes to these fds because it never dup2()s them to
stdin/stdout/stderr before calling exec(). However, it also never
closes them. Therefore, if process A dies (e.g., due to an
invalid command), then netd will never receive a POLLHUP on the
read ends or a POLLERR on the write end of these pipes.

This means that:

1. The invalid command to subprocess A will time out (imposing a
   5-second delay) instead of failing fast.
2. Even though the timeout causes us to call stop() on subprocess
   A, and stop() causes processTerminated to be set to true,
   sendCommand does not check processTerminated, it only checks
   outputReady. Therefore, when we get told to send another
   command to subprocess A, if subprocess B is still around,
   outputReady will return true. The result is that we do not
   fork a new subprocess to replace subprocess A.

These effects caused flakiness of TestRestartOnMalformedCommand
and a latency hit to the first invalid command sent after boot.

Bug: 28362720
Bug: 64687442
Test: bullhead builds, boots
Test: netd_{unit,integration}_test pass
Test: TestRestartOnMalformedCommand passes 100 times in a row
Change-Id: I117359b9ca9c82dd676608100fed6957c9a93c92
parent 43f1c3c5
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment