Documentation/filesystems/Locking +3 −3

@@ -47,8 +47,8 @@ ata *);
 	void * (*follow_link) (struct dentry *, struct nameidata *);
 	void (*put_link) (struct dentry *, struct nameidata *, void *);
 	void (*truncate) (struct inode *);
-	int (*permission) (struct inode *, int, struct nameidata *);
+	int (*permission) (struct inode *, int, unsigned int);
-	int (*check_acl)(struct inode *, int);
+	int (*check_acl)(struct inode *, int, unsigned int);
 	int (*setattr) (struct dentry *, struct iattr *);
 	int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *);
 	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
@@ -76,7 +76,7 @@ follow_link:	no
 put_link:	no
 truncate:	yes	(see below)
 setattr:	yes
-permission:	no
+permission:	no (may not block if called in rcu-walk mode)
 check_acl:	no
 getattr:	no
 setxattr:	yes
Documentation/filesystems/path-lookup.txt +41 −3

@@ -316,11 +316,9 @@ The detailed design for rcu-walk is like this:

 The cases where rcu-walk cannot continue are:
 * NULL dentry (ie. any uncached path element)
-* parent with d_inode->i_op->permission or ACLs
 * Following links

-In future patches, permission checks become rcu-walk aware. It may be possible
-eventually to make following links rcu-walk aware.
+It may be possible eventually to make following links rcu-walk aware.

 Uncached path elements will always require dropping to ref-walk mode, at the
 very least because i_mutex needs to be grabbed, and objects allocated.
@@ -336,9 +334,49 @@ or stored into. The result is massive improvements in performance and
 scalability of path resolution.

+Interesting statistics
+======================
+
+The following table gives rcu lookup statistics for a few simple workloads
+(2s12c24t Westmere, debian non-graphical system). Ungraceful are attempts to
+drop rcu that fail due to d_seq failure and requiring the entire path lookup
+again. Other cases are successful rcu-drops that are required before the final
+element, nodentry for missing dentry, revalidate for filesystem revalidate
+routine requiring rcu drop, permission for permission check requiring drop,
+and link for symlink traversal requiring drop.
+
+          rcu-lookups  restart        nodentry            link  revalidate  permission
+bootup          47121        0            4624            1010       10283        7852
+dbench       25386793        0  6778659(26.7%)              55         549        1156
+kbuild        2696672       10     64442(2.3%)    108764(4.0%)           1        1590
+git diff        39605        0              28               2           0         106
+vfstest      24185492     4945    708725(2.9%)   1076136(4.4%)           0        2651
+
+What this shows is that failed rcu-walk lookups, ie. ones that are restarted
+entirely with ref-walk, are quite rare. Even the "vfstest" case which
+specifically has concurrent renames/mkdir/rmdir/creat/unlink/etc to excercise
+such races is not showing a huge amount of restarts.
+
+Dropping from rcu-walk to ref-walk mean that we have encountered a dentry where
+the reference count needs to be taken for some reason. This is either because
+we have reached the target of the path walk, or because we have encountered a
+condition that can't be resolved in rcu-walk mode. Ideally, we drop rcu-walk
+only when we have reached the target dentry, so the other statistics show where
+this does not happen.
+
+Note that a graceful drop from rcu-walk mode due to something such as the
+dentry not existing (which can be common) is not necessarily a failure of
+rcu-walk scheme, because some elements of the path may have been walked in
+rcu-walk mode. The further we get from common path elements (such as cwd or
+root), the less contended the dentry is likely to be. The closer we are to
+common path elements, the more likely they will exist in dentry cache.
+
 Papers and other documentation on dcache locking
 ================================================

 1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124).

 2. http://lse.sourceforge.net/locking/dcache/dcache.html
Documentation/filesystems/porting +5 −0

@@ -379,4 +379,9 @@ where possible.
 the filesystem provides it), which requires dropping out of rcu-walk mode. This
 may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
 returned if the filesystem cannot handle rcu-walk. See
+Documentation/filesystems/vfs.txt for more details.
+
+	permission and check_acl are inode permission checks that are called
+on many or all directory inodes on the way down a path walk (to check for
+exec permission). These must now be rcu-walk aware (flags & IPERM_RCU). See
 Documentation/filesystems/vfs.txt for more details.
Documentation/filesystems/vfs.txt +9 −1

@@ -325,7 +325,8 @@ struct inode_operations {
 	void * (*follow_link) (struct dentry *, struct nameidata *);
 	void (*put_link) (struct dentry *, struct nameidata *, void *);
 	void (*truncate) (struct inode *);
-	int (*permission) (struct inode *, int, struct nameidata *);
+	int (*permission) (struct inode *, int, unsigned int);
+	int (*check_acl)(struct inode *, int, unsigned int);
 	int (*setattr) (struct dentry *, struct iattr *);
 	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
 	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
@@ -414,6 +415,13 @@ otherwise noted.

   permission: called by the VFS to check for access rights on a POSIX-like
 	filesystem.
+
+	May be called in rcu-walk mode (flags & IPERM_RCU). If in rcu-walk
+	mode, the filesystem must check the permission without blocking or
+	storing to the inode.
+
+	If a situation is encountered that rcu-walk cannot handle, return
+	-ECHILD and it will be called again in ref-walk mode.

   setattr: called by the VFS to set attributes for a file. This method
 	is called by chmod(2) and related system calls.
drivers/staging/smbfs/file.c +4 −1

@@ -407,11 +407,14 @@ smb_file_release(struct inode *inode, struct file * file)
  * privileges, so we need our own check for this.
  */
 static int
-smb_file_permission(struct inode *inode, int mask)
+smb_file_permission(struct inode *inode, int mask, unsigned int flags)
 {
 	int mode = inode->i_mode;
 	int error = 0;

+	if (flags & IPERM_FLAG_RCU)
+		return -ECHILD;
+
 	VERBOSE("mode=%x, mask=%x\n", mode, mask);

 	/* Look at user permissions */