Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs (968f3e37)

Documentation/filesystems/btrfs.txt

+11 −250

Original line number	Diff line number	Diff line

		BTRFS
		=====

		Btrfs is a copy on write filesystem for Linux aimed at
		implementing advanced features while focusing on fault tolerance,
		repair and easy administration. Initially developed by Oracle, Btrfs
		is licensed under the GPL and open for contribution from anyone.

		Linux has a wealth of filesystems to choose from, but we are facing a
		number of challenges with scaling to the large storage subsystems that
		are becoming common in today's data centers. Filesystems need to scale
		in their ability to address and manage large storage, and also in
		their ability to detect, repair and tolerate errors in the data stored
		on disk. Btrfs is under heavy development, and is not suitable for
		any uses other than benchmarking and review. The Btrfs disk format is
		not yet finalized.
		Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
		features while focusing on fault tolerance, repair and easy administration.
		Jointly developed by several companies, licensed under the GPL and open for
		contribution from anyone.

		The main Btrfs features include:

		@@ -28,243 +18,14 @@ The main Btrfs features include:
		* Checksums on data and metadata (multiple algorithms available)
		* Compression
		* Integrated multiple device support, with several raid algorithms
		* Online filesystem check (not yet implemented)
		* Very fast offline filesystem check
		* Efficient incremental backup and FS mirroring (not yet implemented)
		* Offline filesystem check
		* Efficient incremental backup and FS mirroring
		* Online filesystem defragmentation

		For more information please refer to the wiki

		Mount Options
		=============

		When mounting a btrfs filesystem, the following option are accepted.
		Options with (*) are default options and will not show in the mount options.

		alloc_start=<bytes>
		Debugging option to force all block allocations above a certain
		byte threshold on each block device. The value is specified in
		bytes, optionally with a K, M, or G suffix, case insensitive.
		Default is 1MB.

		noautodefrag(*)
		autodefrag
		Disable/enable auto defragmentation.
		Auto defragmentation detects small random writes into files and queue
		them up for the defrag process. Works best for small files;
		Not well suited for large database workloads.

		check_int
		check_int_data
		check_int_print_mask=<value>
		These debugging options control the behavior of the integrity checking
		module (the BTRFS_FS_CHECK_INTEGRITY config option required).

		check_int enables the integrity checker module, which examines all
		block write requests to ensure on-disk consistency, at a large
		memory and CPU cost.

		check_int_data includes extent data in the integrity checks, and
		implies the check_int option.

		check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
		as defined in fs/btrfs/check-integrity.c, to control the integrity
		checker module behavior.

		See comments at the top of fs/btrfs/check-integrity.c for more info.

		commit=<seconds>
		Set the interval of periodic commit, 30 seconds by default. Higher
		values defer data being synced to permanent storage with obvious
		consequences when the system crashes. The upper bound is not forced,
		but a warning is printed if it's more than 300 seconds (5 minutes).

		compress
		compress=<type>
		compress-force
		compress-force=<type>
		Control BTRFS file data compression. Type may be specified as "zlib"
		"lzo" or "no" (for no compression, used for remounting). If no type
		is specified, zlib is used. If compress-force is specified,
		all files will be compressed, whether or not they compress well.
		If compression is enabled, nodatacow and nodatasum are disabled.

		degraded
		Allow mounts to continue with missing devices. A read-write mount may
		fail with too many devices missing, for example if a stripe member
		is completely missing.

		device=<devicepath>
		Specify a device during mount so that ioctls on the control device
		can be avoided. Especially useful when trying to mount a multi-device
		setup as root. May be specified multiple times for multiple devices.

		nodiscard(*)
		discard
		Disable/enable discard mount option.
		Discard issues frequent commands to let the block device reclaim space
		freed by the filesystem.
		This is useful for SSD devices, thinly provisioned
		LUNs and virtual machine images, but may have a significant
		performance impact. (The fstrim command is also available to
		initiate batch trims from userspace).

		noenospc_debug(*)
		enospc_debug
		Disable/enable debugging option to be more verbose in some ENOSPC conditions.

		fatal_errors=<action>
		Action to take when encountering a fatal error:
		"bug" - BUG() on a fatal error. This is the default.
		"panic" - panic() on a fatal error.

		noflushoncommit(*)
		flushoncommit
		The 'flushoncommit' mount option forces any data dirtied by a write in a
		prior transaction to commit as part of the current commit. This makes
		the committed state a fully consistent view of the file system from the
		application's perspective (i.e., it includes all completed file system
		operations). This was previously the behavior only when a snapshot is
		created.

		inode_cache
		Enable free inode number caching. Defaults to off due to an overflow
		problem when the free space crcs don't fit inside a single page.

		max_inline=<bytes>
		Specify the maximum amount of space, in bytes, that can be inlined in
		a metadata B-tree leaf. The value is specified in bytes, optionally
		with a K, M, or G suffix, case insensitive. In practice, this value
		is limited by the root sector size, with some space unavailable due
		to leaf headers. For a 4k sector size, max inline data is ~3900 bytes.

		metadata_ratio=<value>
		Specify that 1 metadata chunk should be allocated after every <value>
		data chunks. Off by default.

		acl(*)
		noacl
		Enable/disable support for Posix Access Control Lists (ACLs). See the
		acl(5) manual page for more information about ACLs.

		barrier(*)
		nobarrier
		Enable/disable the use of block layer write barriers. Write barriers
		ensure that certain IOs make it through the device cache and are on
		persistent storage. If disabled on a device with a volatile
		(non-battery-backed) write-back cache, nobarrier option will lead to
		filesystem corruption on a system crash or power loss.

		datacow(*)
		nodatacow
		Enable/disable data copy-on-write for newly created files.
		Nodatacow implies nodatasum, and disables all compression.

		datasum(*)
		nodatasum
		Enable/disable data checksumming for newly created files.
		Datasum implies datacow.

		treelog(*)
		notreelog
		Enable/disable the tree logging used for fsync and O_SYNC writes.

		recovery
		Enable autorecovery attempts if a bad tree root is found at mount time.
		Currently this scans a list of several previous tree roots and tries to
		use the first readable.

		rescan_uuid_tree
		Force check and rebuild procedure of the UUID tree. This should not
		normally be needed.

		skip_balance
		Skip automatic resume of interrupted balance operation after mount.
		May be resumed with "btrfs balance resume."

		space_cache (*)
		Enable the on-disk freespace cache.
		nospace_cache
		Disable freespace cache loading without clearing the cache.
		clear_cache
		Force clearing and rebuilding of the disk space cache if something
		has gone wrong.

		ssd
		nossd
		ssd_spread
		Options to control ssd allocation schemes. By default, BTRFS will
		enable or disable ssd allocation heuristics depending on whether a
		rotational or non-rotational disk is in use. The ssd and nossd options
		can override this autodetection.

		The ssd_spread mount option attempts to allocate into big chunks
		of unused space, and may perform better on low-end ssds. ssd_spread
		implies ssd, enabling all other ssd heuristics as well.

		subvol=<path>
		Mount subvolume at <path> rather than the root subvolume. <path> is
		relative to the top level subvolume.

		subvolid=<ID>
		Mount subvolume specified by an ID number rather than the root subvolume.
		This allows mounting of subvolumes which are not in the root of the mounted
		filesystem.
		You can use "btrfs subvolume list" to see subvolume ID numbers.

		subvolrootid=<objectid> (deprecated)
		Mount subvolume specified by <objectid> rather than the root subvolume.
		This allows mounting of subvolumes which are not in the root of the mounted
		filesystem.
		You can use "btrfs subvolume show " to see the object ID for a subvolume.

		thread_pool=<number>
		The number of worker threads to allocate. The default number is equal
		to the number of CPUs + 2, or 8, whichever is smaller.

		user_subvol_rm_allowed
		Allow subvolumes to be deleted by a non-root user. Use with caution.

		MAILING LIST
		============

		There is a Btrfs mailing list hosted on vger.kernel.org. You can
		find details on how to subscribe here:

		http://vger.kernel.org/vger-lists.html#linux-btrfs

		Mailing list archives are available from gmane:

		http://dir.gmane.org/gmane.comp.file-systems.btrfs



		IRC
		===

		Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
		IRC network.



		UTILITIES
		=========

		Userspace tools for creating and manipulating Btrfs file systems are
		available from the git repository at the following location:

		http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
		git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

		These include the following tools:

		* mkfs.btrfs: create a filesystem

		* btrfs: a single tool to manage the filesystems, refer to the manpage for more details

		* 'btrfsck' or 'btrfs check': do a consistency check of the filesystem

		Other tools for specific tasks:

		* btrfs-convert: in-place conversion from ext2/3/4 filesystems
		https://btrfs.wiki.kernel.org

		* btrfs-image: dump filesystem metadata for debugging
		that maintains information about administration tasks, frequently asked
		questions, use cases, mount options, comprehensible changelogs, features,
		manual pages, source code repositories, contacts etc.

fs/btrfs/backref.c

+4 −8

		@@ -148,7 +148,6 @@ int __init btrfs_prelim_ref_init(void)

		void btrfs_prelim_ref_exit(void)
		{
		if (btrfs_prelim_ref_cache)
		kmem_cache_destroy(btrfs_prelim_ref_cache);
		}

		@@ -566,17 +565,14 @@ static void __merge_refs(struct list_head *head, int mode)
		struct __prelim_ref *pos2 = pos1, *tmp;

		list_for_each_entry_safe_continue(pos2, tmp, head, list) {
		struct __prelim_ref *xchg, *ref1 = pos1, *ref2 = pos2;
		struct __prelim_ref *ref1 = pos1, *ref2 = pos2;
		struct extent_inode_elem *eie;

		if (!ref_for_same_block(ref1, ref2))
		continue;
		if (mode == 1) {
		if (!ref1->parent && ref2->parent) {
		xchg = ref1;
		ref1 = ref2;
		ref2 = xchg;
		}
		if (!ref1->parent && ref2->parent)
		swap(ref1, ref2);
		} else {
		if (ref1->parent != ref2->parent)
		continue;
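
The hunk above swaps a hand-written three-statement exchange for the kernel's swap() helper. As a rough userspace illustration only (SWAP_VALUES, struct ref and prefer_ref_with_parent are hypothetical names, and __typeof__ is a GCC/Clang extension rather than standard C), the same idea might look like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's swap() macro.
 * SWAP_VALUES is a hypothetical name; __typeof__ is a GCC/Clang extension.
 */
#define SWAP_VALUES(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Hypothetical, cut-down stand-in for a preliminary ref. */
struct ref {
	unsigned long parent;	/* 0 means "no parent recorded" */
};

/*
 * Mirrors the shape of the mode == 1 branch: if only the second ref has
 * a parent, exchange the two pointers so the first one is the ref with
 * the parent.
 */
static void prefer_ref_with_parent(struct ref **r1, struct ref **r2)
{
	assert(r1 != NULL && r2 != NULL);
	if (!(*r1)->parent && (*r2)->parent)
		SWAP_VALUES(*r1, *r2);
}
```

Only the two local pointers are exchanged; the structures they point to are left untouched, which matches what the removed xchg sequence did.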

fs/btrfs/check-integrity.c

+5 −7

		@@ -95,6 +95,7 @@
		#include <linux/genhd.h>
		#include <linux/blkdev.h>
		#include <linux/vmalloc.h>
		#include <linux/string.h>
		#include "ctree.h"
		#include "disk-io.h"
		#include "hash.h"
		@@ -105,6 +106,7 @@
		#include "locking.h"
		#include "check-integrity.h"
		#include "rcu-string.h"
		#include "compression.h"

		#define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x10000
		#define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x10000
		@@ -176,7 +178,7 @@ struct btrfsic_block {
		* Elements of this type are allocated dynamically and required because
		* each block object can refer to and can be ref from multiple blocks.
		* The key to lookup them in the hashtable is the dev_bytenr of
		* the block ref to plus the one from the block refered from.
		* the block ref to plus the one from the block referred from.
		* The fact that they are searchable via a hashtable and that a
		* ref_cnt is maintained is not required for the btrfs integrity
		* check algorithm itself, it is only used to make the output more
		@@ -3076,7 +3078,7 @@ int btrfsic_mount(struct btrfs_root *root,

		list_for_each_entry(device, dev_head, dev_list) {
		struct btrfsic_dev_state *ds;
		char *p;
		const char *p;

		if (!device->bdev || !device->name)
		continue;
		@@ -3092,11 +3094,7 @@ int btrfsic_mount(struct btrfs_root *root,
		ds->state = state;
		bdevname(ds->bdev, ds->name);
		ds->name[BDEVNAME_SIZE - 1] = '\0';
		for (p = ds->name; *p != '\0'; p++);
		while (p > ds->name && *p != '/')
		p--;
		if (*p == '/')
		p++;
		p = kbasename(ds->name);
		strlcpy(ds->name, p, sizeof(ds->name));
		btrfsic_dev_state_hashtable_add(ds,
		&btrfsic_dev_state_hashtable);
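
The last hunk above replaces an open-coded backwards scan for '/' with kbasename(). A minimal userspace sketch of that behaviour, assuming it amounts to "return whatever follows the last slash, or the whole string if there is none" (path_tail and keep_tail_in_place are hypothetical names, and memmove is used here in place of strlcpy because the source and destination overlap and strlcpy is not available in every libc):

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the component after the last '/', or the whole path. */
static const char *path_tail(const char *path)
{
	const char *slash;

	assert(path != NULL);
	slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Strip any directory prefix from a writable name buffer in place.
 * memmove() is used because the tail lives inside the same buffer.
 */
static void keep_tail_in_place(char *name)
{
	const char *tail = path_tail(name);

	memmove(name, tail, strlen(tail) + 1);	/* +1 copies the NUL */
}
```

A name without any slash is returned unchanged, which is the same result the removed loop produced by walking back to the start of the buffer.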

fs/btrfs/compression.h

+9 −0

		@@ -48,6 +48,15 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
		void btrfs_clear_biovec_end(struct bio_vec *bvec, int vcnt,
		unsigned long pg_index,
		unsigned long pg_offset);

		enum btrfs_compression_type {
		BTRFS_COMPRESS_NONE = 0,
		BTRFS_COMPRESS_ZLIB = 1,
		BTRFS_COMPRESS_LZO = 2,
		BTRFS_COMPRESS_TYPES = 2,
		BTRFS_COMPRESS_LAST = 3,
		};

		struct btrfs_compress_op {
		struct list_head *(*alloc_workspace)(void);

fs/btrfs/ctree.c

+18 −18

		@@ -311,7 +311,7 @@ struct tree_mod_root {

		struct tree_mod_elem {
		struct rb_node node;
		u64 index; /* shifted logical */
		u64 logical;
		u64 seq;
		enum mod_log_op op;

		@@ -435,11 +435,11 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,

		/*
		* key order of the log:
		* index -> sequence
		* node/leaf start address -> sequence
		*
		* the index is the shifted logical of the new root node for root replace
		* operations, or the shifted logical of the affected block for all other
		* operations.
		* The 'start address' is the logical address of the new root node
		* for root replace operations, or the logical address of the affected
		* block for all other operations.
		*
		* Note: must be called with write lock (tree_mod_log_write_lock).
		*/
		@@ -460,9 +460,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
		while (*new) {
		cur = container_of(*new, struct tree_mod_elem, node);
		parent = *new;
		if (cur->index < tm->index)
		if (cur->logical < tm->logical)
		new = &((*new)->rb_left);
		else if (cur->index > tm->index)
		else if (cur->logical > tm->logical)
		new = &((*new)->rb_right);
		else if (cur->seq < tm->seq)
		new = &((*new)->rb_left);
		@@ -523,7 +523,7 @@ alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
		if (!tm)
		return NULL;

		tm->index = eb->start >> PAGE_CACHE_SHIFT;
		tm->logical = eb->start;
		if (op != MOD_LOG_KEY_ADD) {
		btrfs_node_key(eb, &tm->key, slot);
		tm->blockptr = btrfs_node_blockptr(eb, slot);
		@@ -588,7 +588,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
		goto free_tms;
		}

		tm->index = eb->start >> PAGE_CACHE_SHIFT;
		tm->logical = eb->start;
		tm->slot = src_slot;
		tm->move.dst_slot = dst_slot;
		tm->move.nr_items = nr_items;
		@@ -699,7 +699,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
		goto free_tms;
		}

		tm->index = new_root->start >> PAGE_CACHE_SHIFT;
		tm->logical = new_root->start;
		tm->old_root.logical = old_root->start;
		tm->old_root.level = btrfs_header_level(old_root);
		tm->generation = btrfs_header_generation(old_root);
		@@ -739,16 +739,15 @@ __tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq,
		struct rb_node *node;
		struct tree_mod_elem *cur = NULL;
		struct tree_mod_elem *found = NULL;
		u64 index = start >> PAGE_CACHE_SHIFT;

		tree_mod_log_read_lock(fs_info);
		tm_root = &fs_info->tree_mod_log;
		node = tm_root->rb_node;
		while (node) {
		cur = container_of(node, struct tree_mod_elem, node);
		if (cur->index < index) {
		if (cur->logical < start) {
		node = node->rb_left;
		} else if (cur->index > index) {
		} else if (cur->logical > start) {
		node = node->rb_right;
		} else if (cur->seq < min_seq) {
		node = node->rb_left;
		@@ -1230,9 +1229,10 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
		return NULL;

		/*
		* the very last operation that's logged for a root is the replacement
		* operation (if it is replaced at all). this has the index of the new
		* root, making it the very first operation that's logged for this root.
		* the very last operation that's logged for a root is the
		* replacement operation (if it is replaced at all). this has
		* the logical address of the new root, making it the very
		* first operation that's logged for this root.
		*/
		while (1) {
		tm = tree_mod_log_search_oldest(fs_info, root_logical,
		@@ -1336,7 +1336,7 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
		if (!next)
		break;
		tm = container_of(next, struct tree_mod_elem, node);
		if (tm->index != first_tm->index)
		if (tm->logical != first_tm->logical)
		break;
		}
		tree_mod_log_read_unlock(fs_info);
		@@ -5361,7 +5361,7 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
		goto out;
		}

		tmp_buf = kmalloc(left_root->nodesize, GFP_NOFS);
		tmp_buf = kmalloc(left_root->nodesize, GFP_KERNEL);
		if (!tmp_buf) {
		ret = -ENOMEM;
		goto out;
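
The ctree.c hunks change the tree mod log key from (shifted logical, seq) to (logical, seq). The sketch below shows a two-level comparison on that key and, as one plausible reading of why the shift was dropped, how two distinct block addresses can collapse to the same page index when blocks are smaller than a page. struct mod_key, mod_key_cmp and old_page_index are hypothetical names, and the addresses and shift values are illustrative rather than taken from the commit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down key: block start address first, then sequence. */
struct mod_key {
	uint64_t logical;
	uint64_t seq;
};

/* Order by logical address, and by sequence number only on a tie. */
static int mod_key_cmp(const struct mod_key *a, const struct mod_key *b)
{
	assert(a && b);
	if (a->logical != b->logical)
		return a->logical < b->logical ? -1 : 1;
	if (a->seq != b->seq)
		return a->seq < b->seq ? -1 : 1;
	return 0;
}

/* What the removed code stored: the address shifted down by the page shift. */
static uint64_t old_page_index(uint64_t logical, unsigned int page_shift)
{
	return logical >> page_shift;
}
```

With a page shift of 16 (64K pages), the addresses 0x10000 and 0x11000 both map to index 1, so the old key could not tell them apart; compared by their unshifted logical address they stay distinct.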