bcache: Documentation updates (7b41b51a)

Documentation/bcache.txt

+88 −0

		@@ -101,6 +101,94 @@ but all the cached data will be invalidated. If there was dirty data in the
		cache, don't expect the filesystem to be recoverable - you will have massive
		filesystem corruption, though ext4's fsck does work miracles.

		ERROR HANDLING:

		Bcache tries to transparently handle IO errors to/from the cache device without
		affecting normal operation; if it sees too many errors (the threshold is
		configurable, and defaults to 0) it shuts down the cache device and switches all
		the backing devices to passthrough mode.

		- For reads from the cache, if they error we just retry the read from the
		backing device.

		- For writethrough writes, if the write to the cache errors we just switch to
		invalidating the data at that lba in the cache (i.e. the same thing we do for
		a write that bypasses the cache)

		- For writeback writes, we currently pass that error back up to the
		filesystem/userspace. This could be improved - we could retry it as a write
		that skips the cache so we don't have to error the write.

		- When we detach, we first try to flush any dirty data (if we were running in
		writeback mode). It currently doesn't do anything intelligent if it fails to
		read some of the dirty data, though.
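To see the error threshold described above, you can read it from each cache set's sysfs directory. This is a minimal sketch, not part of the patch: it assumes the threshold is exposed as a file named io_error_limit under /sys/fs/bcache/<cache set>/, and the function name show_io_error_limits is made up for illustration.

```shell
# Print the configured IO error threshold for every registered cache set.
# $1: directory holding the cache sets (assumed default: /sys/fs/bcache)
# Assumption: each cache set directory contains a file named io_error_limit.
show_io_error_limits() {
    root=${1:-/sys/fs/bcache}
    for f in "$root"/*/io_error_limit; do
        # Skip the unexpanded pattern when no cache set is registered.
        [ -e "$f" ] || continue
        echo "$f: $(cat "$f")"
    done
}
```

Calling show_io_error_limits with no argument walks the live sysfs tree; passing a directory lets you try it against a copy.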

		TROUBLESHOOTING PERFORMANCE:

		Bcache has a bunch of config options and tunables. The defaults are intended to
		be reasonable for typical desktop and server workloads, but they're not what you
		want for getting the best possible numbers when benchmarking.

		- Bad write performance

		If write performance is not what you expected, you probably wanted to be
		running in writeback mode, which isn't the default (not due to a lack of
		maturity, but simply because in writeback mode you'll lose data if something
		happens to your SSD)

		# echo writeback > /sys/block/bcache0/bcache/cache_mode
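To confirm that the mode change took effect, you can read the same file back. This is a minimal sketch under one assumption: that the file lists every available mode on one line and marks the active one with square brackets (for example "writethrough [writeback] writearound none"). The helper name active_cache_mode is hypothetical.

```shell
# Print the bracketed (active) entry from a cache_mode-style line on stdin.
# Assumption: exactly one token on the line is wrapped in [ ].
active_cache_mode() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage: redirect the cache_mode file written above into the function:
#   active_cache_mode < path/to/cache_mode
```

If the line contains no bracketed entry, the function prints nothing, which is worth treating as "unknown" rather than as a mode.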

		- Bad performance, or traffic not going to the SSD that you'd expect

		By default, bcache doesn't cache everything. It tries to skip sequential IO -
		because you really want to be caching the random IO, and if you copy a 10
		gigabyte file you probably don't want that pushing 10 gigabytes of randomly
		accessed data out of your cache.

		But if you want to benchmark reads from the cache, and you start out with fio
		writing an 8 gigabyte test file, you want to disable that.

		# echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

		To set it back to the default (4 MB), do

		# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
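When several bcache devices are involved in a benchmark, the same write can be applied to each of them. This is a minimal sketch, assuming the devices appear as /sys/block/bcache*/bcache/ as in the commands above; the function name set_sequential_cutoff_all is made up for illustration.

```shell
# Write one sequential_cutoff value to every bcache device.
# $1: value to write, e.g. 0 to cache sequential IO too, or 4M for the default
# $2: directory holding the block devices (assumed default: /sys/block)
set_sequential_cutoff_all() {
    value=$1
    root=${2:-/sys/block}
    for f in "$root"/bcache*/bcache/sequential_cutoff; do
        # Skip the unexpanded pattern when no bcache device exists.
        [ -e "$f" ] || continue
        echo "$value" > "$f" || return 1
    done
}
```

Run set_sequential_cutoff_all 0 before the benchmark and set_sequential_cutoff_all 4M afterwards to return to the default.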

		- Traffic's still going to the spindle/still getting cache misses

		In the real world, SSDs don't always keep up with disks - particularly with
		slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So
		you want to avoid being bottlenecked by the SSD and having it slow everything
		down.

		To avoid that, bcache tracks latency to the cache device, and gradually
		throttles traffic if the latency exceeds a threshold (it does this by
		cranking down the sequential bypass).

		You can disable this if you need to by setting the thresholds to 0:

		# echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
		# echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us

		The default is 2000 us (2 milliseconds) for reads, and 20000 us (20
		milliseconds) for writes.
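If more than one cache set is registered, both thresholds can be cleared in a loop. This is a minimal sketch using only the two file names shown above; the function name disable_congestion_throttling is hypothetical, and it prints each previous value first so it can be restored by hand later.

```shell
# For every cache set, print the current congestion thresholds and set them to 0.
# $1: directory holding the cache sets (assumed default: /sys/fs/bcache)
disable_congestion_throttling() {
    root=${1:-/sys/fs/bcache}
    # The trailing slash restricts the glob to directories (one per cache set).
    for set_dir in "$root"/*/; do
        for name in congested_read_threshold_us congested_write_threshold_us; do
            f="$set_dir$name"
            [ -e "$f" ] || continue
            echo "$f was $(cat "$f")"
            echo 0 > "$f"
        done
    done
}
```

Keep the printed values if you intend to re-enable throttling after the benchmark.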

		- Still getting cache misses, of the same data

		One last issue that sometimes trips people up is actually an old bug, due to
		the way cache coherency is handled for cache misses. If a btree node is full,
		a cache miss won't be able to insert a key for the new data and the data
		won't be written to the cache.

		In practice this isn't an issue because as soon as a write comes along it'll
		cause the btree node to be split, and you need almost no write traffic for
		this to not show up enough to be noticeable (especially since bcache's btree
		nodes are huge and index large regions of the device). But when you're
		benchmarking, if you're trying to warm the cache by reading a bunch of data
		and there's no other traffic - that can be a problem.

		Solution: warm the cache by doing writes, or use the testing branch (there's
		a fix for the issue there).

		SYSFS - BACKING DEVICE:

		attach