Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 723f5f36 authored by Tobias Thierer's avatar Tobias Thierer
Browse files

Fix state deletion for transient backup issues.

Since Android 10, backupPm() includes sendDataToTransport(), which was
not previously the case. This means that error handling logic that
deletes the backup state file (causing initialize_device() on the next
attempt, which deletes any existing backup) will now also be triggered
upon errors during sendDataToTransport(), which wasn't previously
(Android <= 9) the case.

This has the potential of making an existing temporary outage much
worse:

  1. A few devices might run into temporary issues, e.g. a B&R server
     returning HTTP 503 Service Unavailable (treated as a
     TransientHttpStatusException instanceof NetworkException, which
     is mapped to TRANSPORT_ERROR during handleTransportStatus(),
     which results in a TaskException with stateCompromised==false
     but which backupPm() wraps in another TaskException that forces
     stateCompromised=true).
  2. On their next backup attempt, those devices throw away any
     existing backup and start from scratch (initialize_device()),
     increasing the load on the server.
  3. This leads to a positive-feedback loop where more devices than
     before run into HTTP 503 Service Unavailable.
  4. As a result, masses of devices delete their backups and then
     hammer the B&R server with attempts to upload new backups.
  5. Backups are unavailable to any users who would otherwise rely
     on them during this outage.

To improve on this dangerous situation, this CL changes the code to
force stateCompromised=true only for TaskExceptions thrown
specifically during extractPmAgentData(), and (as before) for all
AgentExceptions.

Note that the code is still quite brittle. It still seems like we
are probably forcing stateCompromised=true in too many situations,
but it's hard to say so this CL is being conservative about the
changes. Changing back to the old behavior could be done through
a local change around KeyValueBackupTask.java:676; a future CL may
do this to have a safety hatch in case we want to cherry-pick this
CL into an upcoming Android release late in the release cycle.

[1] https://android.googlesource.com/platform/frameworks/base/+/refs/heads/pie-dev/services/backup/java/com/android/server/backup/internal/PerformBackupTask.java#1035
[2] https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/services/backup/java/com/android/server/backup/keyvalue/KeyValueBackupTask.java#1040
[3] https://source.corp.google.com/piper///depot/google3/java/com/google/android/gmscore/integ/modules/backup/transport/src/com/google/android/gms/backup/transport/GmsBackupTransport.java;l=770;rcl=281845876

Bug: 144030477
Test: Checked that the following passes after this CL, but
  testRunTask_whenTransportReturnsErrorForPm_updatesFilesAndCleansUp()
  fails if I revert the state of KeyValueBackupTask.java to before this CL:
    ROBOTEST_FILTER=KeyValueBackupTaskTest make \
    RunBackupFrameworksServicesRoboTests

Change-Id: I6c622c55fbd804ec0a12e0bea7ade1308f7a3877
(cherry picked from commit 819ed81f)
parent 03b0c47b
Loading
Loading
Loading
Loading
+21 −4
Original line number Diff line number Diff line
@@ -649,16 +649,33 @@ public class KeyValueBackupTask implements BackupRestoreTask, Runnable {
        mReporter.onStartPackageBackup(PM_PACKAGE);
        mCurrentPackage = new PackageInfo();
        mCurrentPackage.packageName = PM_PACKAGE;

        try {
            // If we can't even extractPmAgentData(), then we treat the local state as
            // compromised, just in case. This means that we will clear data and will
            // start from a clean slate in the next attempt. It's not clear whether that's
            // the right thing to do, but matches what we have historically done.
            try {
                extractPmAgentData(mCurrentPackage);
            } catch (TaskException e) {
                throw TaskException.stateCompromised(e); // force stateCompromised
            }
            // During sendDataToTransport, we generally trust any thrown TaskException
            // about whether stateCompromised because those are likely transient;
            // clearing state for those would have the potential to lead to cascading
            // failures, as discussed in http://b/144030477.
            // For specific status codes (e.g. TRANSPORT_NON_INCREMENTAL_BACKUP_REQUIRED),
            // cleanUpAgentForTransportStatus() or theoretically handleTransportStatus()
            // still have the opportunity to perform additional clean-up tasks.
            int status = sendDataToTransport(mCurrentPackage);
            cleanUpAgentForTransportStatus(status);
        } catch (AgentException | TaskException e) {
            mReporter.onExtractPmAgentDataError(e);
            cleanUpAgentForError(e);
            // PM agent failure is task failure.
            throw TaskException.stateCompromised(e);
            if (e instanceof TaskException) {
                throw (TaskException) e;
            } else {
                throw TaskException.stateCompromised(e); // PM agent failure is task failure.
            }
        }
    }

+51 −8
Original line number Diff line number Diff line
@@ -122,6 +122,7 @@ import com.android.server.testing.shadows.ShadowBackupDataOutput;
import com.android.server.testing.shadows.ShadowEventLog;
import com.android.server.testing.shadows.ShadowSystemServiceRegistry;

import com.google.common.base.Charsets;
import com.google.common.truth.IterableSubject;

import org.junit.After;
@@ -1910,7 +1911,8 @@ public class KeyValueBackupTaskTest {
    }

    @Test
    public void testRunTask_whenTransportReturnsError_updatesFilesAndCleansUp() throws Exception {
    public void testRunTask_whenTransportReturnsErrorForGenericPackage_updatesFilesAndCleansUp()
            throws Exception {
        TransportMock transportMock = setUpInitializedTransport(mTransport);
        when(transportMock.transport.performBackup(
                        argThat(packageInfo(PACKAGE_1)), any(), anyInt()))
@@ -1926,6 +1928,39 @@ public class KeyValueBackupTaskTest {
        assertCleansUpFilesAndAgent(mTransport, PACKAGE_1);
    }

    /**
     * Checks that TRANSPORT_ERROR during @pm@ backup keeps the state file untouched.
     * http://b/144030477
     */
    @Test
    public void testRunTask_whenTransportReturnsErrorForPm_updatesFilesAndCleansUp()
            throws Exception {
        // TODO(tobiast): Refactor this method to share code with
        //  testRunTask_whenTransportReturnsErrorForGenericPackage_updatesFilesAndCleansUp
        // See patchset 7 of http://ag/11762961
        final PackageData packageData = PM_PACKAGE;
        TransportMock transportMock = setUpInitializedTransport(mTransport);
        when(transportMock.transport.performBackup(
                argThat(packageInfo(packageData)), any(), anyInt()))
                .thenReturn(BackupTransport.TRANSPORT_ERROR);

        byte[] pmStateBytes = "fake @pm@ state for testing".getBytes(Charsets.UTF_8);

        Path pmStatePath = createPmStateFile(pmStateBytes.clone());
        PackageManagerBackupAgent pmAgent = spy(createPmAgent());
        KeyValueBackupTask task = createKeyValueBackupTask(transportMock, packageData);
        runTask(task);
        verify(pmAgent, never()).onBackup(any(), any(), any());

        assertThat(Files.readAllBytes(pmStatePath)).isEqualTo(pmStateBytes.clone());

        boolean existed = deletePmStateFile();
        assertThat(existed).isTrue();
        // unbindAgent() is skipped for @pm@. Comment in KeyValueBackupTask.java:
        // "For PM metadata (for which applicationInfo is null) there is no agent-bound state."
        assertCleansUpFiles(mTransport, packageData);
    }

    @Test
    public void testRunTask_whenTransportGetBackupQuotaThrowsForPm() throws Exception {
        TransportMock transportMock = setUpInitializedTransport(mTransport);
@@ -2707,21 +2742,29 @@ public class KeyValueBackupTaskTest {
     *       </ul>
     * </ul>
     */
    private void createPmStateFile() throws IOException {
        createPmStateFile(mTransport);
    private Path createPmStateFile() throws IOException {
        return createPmStateFile("pmState".getBytes());
    }

    private Path createPmStateFile(byte[] bytes) throws IOException {
        return createPmStateFile(bytes, mTransport);
    }

    private Path createPmStateFile(TransportData transport) throws IOException {
        return createPmStateFile("pmState".getBytes(), mTransport);
    }

    /** @see #createPmStateFile() */
    private void createPmStateFile(TransportData transport) throws IOException {
        Files.write(getStateFile(transport, PM_PACKAGE), "pmState".getBytes());
    /** @see #createPmStateFile(byte[]) */
    private Path createPmStateFile(byte[] bytes, TransportData transport) throws IOException {
        return Files.write(getStateFile(transport, PM_PACKAGE), bytes);
    }

    /**
     * Forces transport initialization and call to {@link
     * UserBackupManagerService#resetBackupState(File)}
     */
    private void deletePmStateFile() throws IOException {
        Files.deleteIfExists(getStateFile(mTransport, PM_PACKAGE));
    private boolean deletePmStateFile() throws IOException {
        return Files.deleteIfExists(getStateFile(mTransport, PM_PACKAGE));
    }

    /**