Commit 02214dc5 authored by Krzysztof Wojcik, committed by NeilBrown

FIX: md: process hangs at wait_barrier after 0->10 takeover



The following symptoms were observed:
1. After a raid0->raid10 takeover operation we have an array with 2
missing disks.
When we add a disk for rebuild, the recovery process starts as
expected but does not finish: it stops at about 90% and the
md126_resync process hangs in "D" state.
2. Similar behavior occurs when we have a mounted raid0 array and
execute a takeover to raid10. When we then try to unmount the
array, the umount process hangs in "D" state.

In both scenarios the processes hang in the same function,
wait_barrier in raid10.c.
The process waits in the "wait_event_lock_irq" macro until the
"!conf->barrier" condition becomes true.
In the scenarios above that never happens.

The reason is that at the end of level_store, after calling
pers->run, we call mddev_resume. For RAID10 this calls
pers->quiesce(mddev, 0), which calls lower_barrier.
However, raise_barrier has not been called on that 'conf' yet,
so conf->barrier becomes negative, and a negative value does not
satisfy "!conf->barrier".
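
The unbalanced lower_barrier described above can be illustrated with a
small user-space model. This is only a sketch, not the kernel code: the
struct and the model_* helpers are hypothetical stand-ins, and locking,
wait queues and the other raid10 counters are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the barrier counter kept in the raid10 conf. */
struct model_conf {
	int barrier;
};

/* Models raise_barrier(): one more holder of the barrier. */
static void model_raise_barrier(struct model_conf *conf)
{
	assert(conf != NULL);
	conf->barrier++;
}

/* Models lower_barrier(): one holder releases the barrier. */
static void model_lower_barrier(struct model_conf *conf)
{
	assert(conf != NULL);
	conf->barrier--;
}

/* Models the condition wait_barrier() sleeps on: "!conf->barrier". */
static bool model_wait_condition(const struct model_conf *conf)
{
	assert(conf != NULL);
	return !conf->barrier;
}

/* Models resume on a fresh conf: a lower with no matching raise. */
static int model_resume_fresh_conf(void)
{
	struct model_conf conf = { .barrier = 0 };

	model_lower_barrier(&conf);	/* stands in for quiesce(mddev, 0) */
	return conf.barrier;
}
```

In this model, model_resume_fresh_conf() leaves the counter below zero,
and model_wait_condition() reports false for such a value, which matches
the hang in wait_barrier described above.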

This patch sets conf->barrier = 1 after the takeover operation.
This prevents the barrier from becoming negative after the call
to lower_barrier().
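
The effect of the preset value can be sketched in user space as follows.
The sketch_* names are hypothetical and only mirror the counter
arithmetic; the real change is the one-line assignment in
raid10_takeover_raid0().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the barrier counter kept in the raid10 conf. */
struct sketch_conf {
	int barrier;
};

/* Models the end of the takeover with the patch: barrier starts at 1. */
static void sketch_takeover_done(struct sketch_conf *conf)
{
	conf->barrier = 1;
}

/* Models the resume path, which ends in one lower_barrier() call. */
static void sketch_resume(struct sketch_conf *conf)
{
	conf->barrier--;
	/* With the preset, the counter must not drop below zero. */
	assert(conf->barrier >= 0);
}

/* Models the "!conf->barrier" condition that waiters need. */
static bool sketch_can_proceed(const struct sketch_conf *conf)
{
	return conf->barrier == 0;
}
```

Calling sketch_takeover_done() and then sketch_resume() leaves the
counter at 0, so sketch_can_proceed() returns true and a waiter in this
model would no longer be stuck.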

Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
parent e91ece55
+4 −2
@@ -2463,10 +2463,12 @@ static void *raid10_takeover_raid0(mddev_t *mddev)
 	mddev->recovery_cp = MaxSector;

 	conf = setup_conf(mddev);
-	if (!IS_ERR(conf))
+	if (!IS_ERR(conf)) {
 		list_for_each_entry(rdev, &mddev->disks, same_set)
 			if (rdev->raid_disk >= 0)
 				rdev->new_raid_disk = rdev->raid_disk * 2;
+		conf->barrier = 1;
+	}

 	return conf;
 }