From a0feff3c2cc00c665d2497dd1c4080469ec75df9 Mon Sep 17 00:00:00 2001
From: Waldemar Brodkorb <wbx@uclibc-ng.org>
Date: Thu, 22 Oct 2015 22:29:23 +0200
Subject: add aufs patch for 4.1.x

---
 target/linux/patches/4.1.10/aufs.patch | 35215 +++++++++++++++++++++++++++++++
 1 file changed, 35215 insertions(+)
 create mode 100644 target/linux/patches/4.1.10/aufs.patch

diff --git a/target/linux/patches/4.1.10/aufs.patch b/target/linux/patches/4.1.10/aufs.patch
new file mode 100644
index 000000000..749c90989
--- /dev/null
+++ b/target/linux/patches/4.1.10/aufs.patch
@@ -0,0 +1,35215 @@
+diff -Nur linux-4.1.10.orig/Documentation/ABI/testing/debugfs-aufs linux-4.1.10/Documentation/ABI/testing/debugfs-aufs
+--- linux-4.1.10.orig/Documentation/ABI/testing/debugfs-aufs	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/ABI/testing/debugfs-aufs	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,50 @@
++What:		/debug/aufs/si_<id>/
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		Under /debug/aufs, a directory named si_<id> is created
++		per aufs mount, where <id> is a unique id generated
++		internally.
++
++What:		/debug/aufs/si_<id>/plink
++Date:		Apr 2013
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It has three lines and shows the information about the
++		pseudo-link. The first line is a single number
++		representing a number of buckets. The second line is a
++		number of pseudo-links per bucket (separated by a
++		blank). The last line is a single number representing a
++		total number of pseudo-links.
++		When the aufs mount option 'noplink' is specified, it
++		will show "1\n0\n0\n".
++
++What:		/debug/aufs/si_<id>/xib
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It shows the consumed blocks by xib (External Inode Number
++		Bitmap), its block size and file size.
++		When the aufs mount option 'noxino' is specified, it
++		will be empty. About XINO files, see the aufs manual.
++
++What:		/debug/aufs/si_<id>/xino0, xino1 ... xinoN
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It shows the consumed blocks by xino (External Inode Number
++		Translation Table), its link count, block size and file
++		size.
++		When the aufs mount option 'noxino' is specified, it
++		will be empty. About XINO files, see the aufs manual.
++
++What:		/debug/aufs/si_<id>/xigen
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It shows the consumed blocks by xigen (External Inode
++		Generation Table), its block size and file size.
++		If CONFIG_AUFS_EXPORT is disabled, this entry will not
++		be created.
++		When the aufs mount option 'noxino' is specified, it
++		will be empty. About XINO files, see the aufs manual.
+diff -Nur linux-4.1.10.orig/Documentation/ABI/testing/sysfs-aufs linux-4.1.10/Documentation/ABI/testing/sysfs-aufs
+--- linux-4.1.10.orig/Documentation/ABI/testing/sysfs-aufs	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/ABI/testing/sysfs-aufs	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,31 @@
++What:		/sys/fs/aufs/si_<id>/
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		Under /sys/fs/aufs, a directory named si_<id> is created
++		per aufs mount, where <id> is a unique id generated
++		internally.
++
++What:		/sys/fs/aufs/si_<id>/br0, br1 ... brN
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It shows the absolute path of a member directory (which
++		is called branch) in aufs, and its permission.
++
++What:		/sys/fs/aufs/si_<id>/brid0, brid1 ... bridN
++Date:		July 2013
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It shows the id of a member directory (which is called
++		branch) in aufs.
++
++What:		/sys/fs/aufs/si_<id>/xi_path
++Date:		March 2009
++Contact:	J. R. Okajima <hooanon05g@gmail.com>
++Description:
++		It shows the absolute path of XINO (External Inode Number
++		Bitmap, Translation Table and Generation Table) file
++		even if it is the default path.
++		When the aufs mount option 'noxino' is specified, it
++		will be empty. About XINO files, see the aufs manual.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/01intro.txt linux-4.1.10/Documentation/filesystems/aufs/design/01intro.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/01intro.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/01intro.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,170 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Introduction
++----------------------------------------
++
++aufs [ei ju: ef es] | [a u f s]
++1. abbrev. for "advanced multi-layered unification filesystem".
++2. abbrev. for "another unionfs".
++3. abbrev. for "auf das" in German which means "on the" in English.
++   Ex. "Butter aufs Brot"(G) means "butter onto bread"(E).
++   But "Filesystem aufs Filesystem" is hard to understand.
++
++AUFS is a filesystem with features:
++- multi layered stackable unification filesystem, the member directory
++  is called a branch.
++- branch permission and attribute, 'readonly', 'real-readonly',
++  'readwrite', 'whiteout-able', 'link-able whiteout', etc. and their
++  combination.
++- internal "file copy-on-write".
++- logical deletion, whiteout.
++- dynamic branch manipulation, adding, deleting and changing permission.
++- allow bypassing aufs, user's direct branch access.
++- external inode number translation table and bitmap which maintains the
++  persistent aufs inode number.
++- seekable directory, including NFS readdir.
++- file mapping, mmap and sharing pages.
++- pseudo-link, hardlink over branches.
++- loopback mounted filesystem as a branch.
++- several policies to select one among multiple writable branches.
++- revert a single systemcall when an error occurs in aufs.
++- and more...
++
++
++Multi Layered Stackable Unification Filesystem
++----------------------------------------------------------------------
++Most people already know what it is.
++It is a filesystem which unifies several directories and provides a
++merged single directory. When users access a file, the access will be
++passed/re-directed/converted (sorry, I am not sure which English word is
++correct) to the real file on the member filesystem. The member
++filesystem is called 'lower filesystem' or 'branch' and has a mode,
++'readonly' or 'readwrite'. And the deletion of a file on the lower
++readonly branch is handled by creating 'whiteout' on the upper writable
++branch.
++
++On LKML, there have been discussions about UnionMount (Jan Blunck,
++Bharata B Rao and Valerie Aurora) and Unionfs (Erez Zadok). They took
++different approaches to implement the merged-view.
++The former tries putting it into VFS, and the latter implements it as a
++separate filesystem.
++(If I misunderstand these implementations, please let me know and
++I shall correct it, because it has been a long time since I last read
++their source files).
++
++UnionMount's approach can be kept small, but may be hard to share
++branches between several UnionMount since the whiteout in it is
++implemented in the inode on branch filesystem and always
++shared. According to Bharata's post, readdir does not seem to be
++finished yet.
++There are several missing features known in this implementation such as
++- for users, the inode number may change silently. eg. copy-up.
++- link(2) may break by copy-up.
++- read(2) may get an obsoleted filedata (fstat(2) too).
++- fcntl(F_SETLK) may be broken by copy-up.
++- unnecessary copy-up may happen, for example mmap(MAP_PRIVATE) after
++  open(O_RDWR).
++
++In linux-3.18, "overlay" filesystem (formerly known as "overlayfs") was
++merged into mainline. This is another implementation of UnionMount as a
++separate filesystem. All the limitations and known problems which
++UnionMount has are equally inherited by the "overlay" filesystem.
++
++Unionfs has a longer history. When I started implementing a stackable
++filesystem (Aug 2005), it already existed. It has virtual super_block,
++inode, dentry and file objects and they have an array pointing to lower
++objects of the same kind. After contributing many patches for Unionfs, I
++re-started my project AUFS (Jun 2006).
++
++In AUFS, the structure of the filesystem resembles Unionfs, but I
++implemented my own ideas, approaches and enhancements and it became
++a totally different one.
++
++Comparing DM snapshot and fs based implementation
++- the number of bytes to be copied between devices is much smaller.
++- the type of filesystem must be one and only.
++- the fs must be writable, no readonly fs, even for the lower original
++  device. so the compression fs will not be usable. but if we use
++  loopback mount, we may address this issue.
++  for instance,
++	mount /cdrom/squashfs.img /sq
++	losetup /sq/ext2.img
++	losetup /somewhere/cow
++	dmsetup "snapshot /dev/loop0 /dev/loop1 ..."
++- it will be difficult (or needs more operations) to extract the
++  difference between the original device and COW.
++- DM snapshot-merge may help a lot when users try merging. in the
++  fs-layer union, users will use rsync(1).
++
++You may want to read my old paper "Filesystems in LiveCD"
++(http://aufs.sourceforge.net/aufs2/report/sq/sq.pdf).
++
++
++Several characters/aspects/persona of aufs
++----------------------------------------------------------------------
++
++Aufs has several characters, aspects or persona.
++1. a filesystem, callee of VFS helper
++2. sub-VFS, caller of VFS helper for branches
++3. a virtual filesystem which maintains persistent inode number
++4. reader/writer of files on branches like an application
++
++1. Callee of VFS Helper
++As an ordinary linux filesystem, aufs is a callee of VFS. For instance,
++unlink(2) from an application reaches sys_unlink() kernel function and
++then vfs_unlink() is called. vfs_unlink() is one of the VFS helpers and it
++calls filesystem specific unlink operation. Actually aufs implements the
++unlink operation but it behaves like a redirector.
++
++2. Caller of VFS Helper for Branches
++aufs_unlink() passes the unlink request to the branch filesystem as if
++it were called from VFS. So the called unlink operation of the branch
++filesystem acts as usual. As a caller of VFS helper, aufs should handle
++every necessary pre/post operation for the branch filesystem.
++- acquire the lock for the parent dir on a branch
++- lookup in a branch
++- revalidate dentry on a branch
++- mnt_want_write() for a branch
++- vfs_unlink() for a branch
++- mnt_drop_write() for a branch
++- release the lock on a branch
++
++3. Persistent Inode Number
++One of the most important issues for a filesystem is to maintain inode
++numbers. This is particularly important to support exporting a
++filesystem via NFS. Aufs is a virtual filesystem which doesn't have a
++backend block device of its own. But some storage is necessary to
++keep and maintain the inode numbers. It may be a large space and may not
++be suitable to keep in memory. Aufs rents some space from its first
++writable branch filesystem (by default) and creates file(s) on it. These
++files are created by aufs internally and removed soon afterwards
++(currently), while being kept open.
++Note: Because these files are removed, they are totally gone after
++      unmounting aufs. It means the inode numbers are not persistent
++      across unmount or reboot. I have a plan to make them really
++      persistent which will be important for aufs on an NFS server.
++
++4. Read/Write Files Internally (copy-on-write)
++Because a branch can be readonly, when you write a file on it, aufs will
++"copy-up" it to the upper writable branch internally. And then write the
++originally requested thing to the file. Generally the kernel doesn't
++open/read/write files actively. In aufs, even a single write may cause
++an internal "file copy". This behaviour is very similar to cp(1).
++
++Some people may think it is better to pass such work to a user space
++helper, instead of doing it in kernel space. Actually I am still
++thinking about it. But currently I have implemented it in kernel space.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/02struct.txt linux-4.1.10/Documentation/filesystems/aufs/design/02struct.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/02struct.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/02struct.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,258 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Basic Aufs Internal Structure
++
++Superblock/Inode/Dentry/File Objects
++----------------------------------------------------------------------
++Like an ordinary filesystem, aufs has its own
++superblock/inode/dentry/file objects. All these objects have a
++dynamically allocated array and store the same kind of pointers to the
++lower filesystem, branch.
++For example, when you build a union with one readwrite branch and one
++readonly, mounted /au, /rw and /ro respectively.
++- /au = /rw + /ro
++- /ro/fileA exists but /rw/fileA does not
++
++Aufs lookup operation finds /ro/fileA and gets dentry for that. These
++pointers are stored in an aufs dentry. The array in aufs dentry will be,
++- [0] = NULL (because /rw/fileA doesn't exist)
++- [1] = /ro/fileA
++
++This style of an array is essentially the same for the aufs
++superblock/inode/dentry/file objects.
++
++Because aufs supports manipulating branches, ie. add/delete/change
++branches dynamically, each of these objects has its own generation. When
++branches are changed, the generation in the aufs superblock is
++incremented. The generation in another object is compared with it when
++the object is accessed. When the generation in that object is obsolete,
++aufs refreshes the internal array.
++
++
++Superblock
++----------------------------------------------------------------------
++Additionally aufs superblock has some data for policies to select one
++among multiple writable branches, XIB files, pseudo-links and kobject.
++See below in detail.
++About the policies which support copying down a directory, see
++wbr_policy.txt too.
++
++
++Branch and XINO(External Inode Number Translation Table)
++----------------------------------------------------------------------
++Every branch has its own xino (external inode number translation table)
++file. The xino file is created and unlinked by aufs internally. When two
++members of a union exist on the same filesystem, they share the single
++xino file.
++The structure of a xino file is simple, just a sequence of aufs inode
++numbers which are indexed by the lower inode number.
++In the above sample, assume the inode number of /ro/fileA is i111 and
++aufs assigns the inode number i999 for fileA. Then aufs writes 999 as
++4(8) bytes at 111 * 4(8) bytes offset in the xino file.
++
++When the inode numbers are not contiguous, the xino file will be sparse
++which has a hole in it and doesn't consume as much disk space as it
++might appear. If your branch filesystem consumes disk space for such
++holes, then you should specify 'xino=' option at mounting aufs.
++
++Aufs has a mount option to free the disk blocks for such holes in XINO
++files on tmpfs or ramdisk. But it is not so effective actually. If you
++meet a problem of disk shortage due to XINO files, then you should try
++"tmpfs-ino.patch" (and "vfs-ino.patch" too) in aufs4-standalone.git.
++The patch localizes the assignment of inumbers per tmpfs-mount and avoids
++the holes in XINO files.
++
++Also a writable branch has three kinds of "whiteout bases". All of these
++exist when the branch is joined to aufs, and their names are
++whiteout-ed doubly, so that users will never see their names in aufs
++hierarchy.
++1. a regular file which will be hardlinked to all whiteouts.
++2. a directory to store a pseudo-link.
++3. a directory to store an "orphan"-ed file temporarily.
++
++1. Whiteout Base
++   When you remove a file on a readonly branch, aufs handles it as a
++   logical deletion and creates a whiteout on the upper writable branch
++   as a hardlink of this file in order not to consume inode on the
++   writable branch.
++2. Pseudo-link Dir
++   See below, Pseudo-link.
++3. Step-Parent Dir
++   When "fileC" exists on the lower readonly branch only and it is
++   opened and removed with its parent dir, and then user writes
++   something into it, then aufs copies-up fileC to this
++   directory. Because there is no other dir to store fileC. After
++   creating a file under this dir, the file is unlinked.
++
++Because aufs supports manipulating branches, ie. add/delete/change
++dynamically, a branch has its own id. When the branch order changes,
++aufs finds the new index by searching the branch id.
++
++
++Pseudo-link
++----------------------------------------------------------------------
++Assume "fileA" exists on the lower readonly branch only and it is
++hardlinked to "fileB" on the branch. When you write something to fileA,
++aufs copies it up to the upper writable branch. Additionally aufs
++creates a hardlink under the Pseudo-link Directory of the writable
++branch. The inode of a pseudo-link is kept in aufs super_block as a
++simple list. If fileB is read after unlinking fileA, aufs returns
++filedata from the pseudo-link instead of the lower readonly
++branch. Because the pseudo-link is based upon the inode, to keep the
++inode number by xino (see above) is essentially necessary.
++
++All the hardlinks under the Pseudo-link Directory of the writable branch
++should be restored in a proper location later. Aufs provides a utility
++to do this. The userspace helper is executed at remounting and unmounting
++aufs by default.
++While this utility is running, it puts aufs into the pseudo-link
++maintenance mode. In this mode, only the process which began the
++maintenance mode (and its child processes) is allowed to operate in
++aufs. Some other processes which are not related to the pseudo-link will
++be allowed to run too, but the rest have to return an error or wait
++until the maintenance mode ends. If a process has already acquired an
++inode mutex (in VFS), it has to return an error.
++
++
++XIB(external inode number bitmap)
++----------------------------------------------------------------------
++In addition to the xino file per branch, aufs has an external inode number
++bitmap in a superblock object. It is also an internal file like a
++xino file.
++It is a simple bitmap to mark whether the aufs inode number is in-use or
++not.
++To reduce the file I/O, aufs prepares a single memory page to cache xib.
++
++As well as XINO files, aufs has a feature to truncate/refresh XIB to
++reduce the number of consumed disk blocks for these files.
++
++
++Virtual or Vertical Dir, and Readdir in Userspace
++----------------------------------------------------------------------
++In order to support multiple layers (branches), aufs readdir operation
++constructs a virtual dir block on memory. For readdir, aufs calls
++vfs_readdir() internally for each dir on branches, merges their entries
++while eliminating the whiteout-ed ones, and sets the result to the file
++(dir) object. So the file object has its entry list until it is closed.
++The entry list will be updated when the file position is zero and the
++list has become obsolete. This decision is made in aufs automatically.
++
++The dynamically allocated memory block for the name of entries has a
++unit of 512 bytes (by default) and stores the names contiguously (no
++padding). Another block for each entry is handled by kmem_cache too.
++While building dir blocks, aufs creates a hash list and judges whether
++the entry is whiteouted by its upper branch or already listed.
++The merged result is cached in the corresponding inode object and
++maintained by a customizable life-time option.
++
++Some people may say it can be a security hole or invite DoS attack
++since the opened and once readdir-ed dir (file object) holds its entry
++list and becomes a pressure for system memory. But I'd say it is similar
++to files under /proc or /sys. The virtual files in them also hold a
++memory page (generally) while they are opened. When an idea to reduce
++memory for them is introduced, it will be applied to aufs too.
++For those who really hate this situation, I've developed a readdir(3)
++library which operates this merging in userspace. You just need to set
++LD_PRELOAD environment variable, and aufs will not consume any memory in
++kernel space for readdir(3).
++
++
++Workqueue
++----------------------------------------------------------------------
++Aufs sometimes requires privileged access to a branch. For instance,
++in copy-up/down operation. A user process may be going to make changes
++to a file which exists in the lower readonly branch only, while the mode
++of one of its ancestor directories may not be writable by the user
++process. Here aufs copies up the file with its ancestors and they may
++require privilege to set their owner/group/mode/etc.
++This is a typical case of the application character of aufs (see
++Introduction).
++
++Aufs uses workqueue synchronously for this case. It creates its own
++workqueue. The workqueue is a kernel thread and has privilege. Aufs
++passes the request to call mkdir or write (for example), and waits for
++its completion. This approach solves a problem of a signal handler
++simply.
++If aufs didn't adopt the workqueue and changed the privilege of the
++process, then the process might receive the unexpected SIGXFSZ or other
++signals.
++
++Also aufs uses the system global workqueue ("events" kernel thread) too
++for asynchronous tasks, such as handling inotify/fsnotify, re-creating a
++whiteout base, etc. This is unrelated to privilege.
++Most aufs operations try acquiring a rw_semaphore for the aufs
++superblock at the beginning, and at the same time wait for the completion
++of all queued asynchronous tasks.
++
++
++Whiteout
++----------------------------------------------------------------------
++The whiteout in aufs is very similar to Unionfs's. That is represented
++by its filename. UnionMount takes an approach of a file mode, but I am
++afraid several utilities (find(1) or something) will have to support it.
++
++Basically the whiteout represents "logical deletion" which stops aufs from
++looking up further, but it also represents "dir is opaque" which also
++stops further lookup.
++
++In aufs, rmdir(2) and rename(2) for dir use whiteout alternatively.
++In order to make several functions in a single systemcall
++revertible, aufs adopts an approach to rename a directory to a temporary
++unique whiteouted name.
++For example, in rename(2) dir where the target dir already existed, aufs
++renames the target dir to a temporary unique whiteouted name before the
++actual rename on a branch, and then handles other actions (make it opaque,
++update the attributes, etc). If an error happens in these actions, aufs
++simply renames the whiteouted name back and returns an error. If all
++succeed, aufs registers a function to remove the whiteouted unique
++temporary name completely and asynchronously to the system global
++workqueue.
++
++
++Copy-up
++----------------------------------------------------------------------
++It is a well-known feature or concept.
++When a user modifies a file on a readonly branch, aufs operates "copy-up"
++internally and makes the change to the new file on the upper writable
++branch. When the trigger systemcall does not update the timestamps of
++the parent dir, aufs reverts them after copy-up.
++
++
++Move-down (aufs3.9 and later)
++----------------------------------------------------------------------
++"Copy-up" is one of the essential features in aufs. It copies a file from
++the lower readonly branch to the upper writable branch when a user
++changes something about the file.
++"Move-down" is an opposite action of copy-up. Basically this action is
++run manually instead of automatically and internally.
++For design and implementation, aufs has to consider these issues.
++- whiteout for the file may exist on the lower branch.
++- ancestor directories may not exist on the lower branch.
++- diropq for the ancestor directories may exist on the upper branch.
++- free space on the lower branch will be reduced.
++- another access to the file may happen during moving-down, including
++  UDBA (see "Revalidate Dentry and UDBA").
++- the file should not be hard-linked nor pseudo-linked. they should be
++  handled by auplink utility later.
++
++Sometimes users want to move-down a file from the upper writable branch
++to the lower readonly or writable branch. For instance,
++- the free space of the upper writable branch is going to run out.
++- create a new intermediate branch between the upper and lower branch.
++- etc.
++
++For this purpose, use "aumvdown" command in aufs-util.git.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/03atomic_open.txt linux-4.1.10/Documentation/filesystems/aufs/design/03atomic_open.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/03atomic_open.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/03atomic_open.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,85 @@
++
++# Copyright (C) 2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Support for a branch which has its own ->atomic_open()
++----------------------------------------------------------------------
++Filesystems which implement ->atomic_open() are not the majority. For
++example NFSv4 does, and aufs should call NFSv4 ->atomic_open,
++particularly for open(O_CREAT|O_EXCL, 0400) case. Other than via
++->atomic_open(), NFSv4 returns an error for this open(2). I am not
++sure whether all filesystems which have ->atomic_open() behave like
++this, but NFSv4 surely returns the error.
++
++In order to support ->atomic_open() for aufs, there are a few
++approaches.
++
++A. Introduce aufs_atomic_open()
++   - calls one of VFS:do_last(), lookup_open() or atomic_open() for
++     branch fs.
++B. Introduce aufs_atomic_open() calling create, open and chmod. this is
++   an aufs user Pip Cet's approach
++   - calls aufs_create(), VFS finish_open() and notify_change().
++   - pass fake-mode to finish_open(), and then correct the mode by
++     notify_change().
++C. Extend aufs_open() to call branch fs's ->atomic_open()
++   - no aufs_atomic_open().
++   - aufs_lookup() registers the TID to an aufs internal object.
++   - aufs_create() does nothing when the matching TID is registered, but
++     registers the mode.
++   - aufs_open() calls branch fs's ->atomic_open() when the matching
++     TID is registered.
++D. Extend aufs_open() to re-try branch fs's ->open() with superuser's
++   credential
++   - no aufs_atomic_open().
++   - aufs_create() registers the TID to an internal object. this info
++     represents "this process created this file just now."
++   - when aufs gets EACCES from branch fs's ->open(), then confirm the
++     registered TID and re-try open() with superuser's credential.
++
++Pros and cons for each approach.
++
++A.
++   - straightforward but highly depends upon VFS internal.
++   - the atomic behaviour is kept.
++   - some of parameters such as nameidata are hard to reproduce for
++     branch fs.
++   - large overhead.
++B.
++   - easy to implement.
++   - the atomic behaviour is lost.
++C.
++   - the atomic behaviour is kept.
++   - dirty and tricky.
++   - VFS checks whether the file is created correctly after calling
++     ->create(), which means this approach doesn't work.
++D.
++   - easy to implement.
++   - the atomic behaviour is lost.
++   - to open a file with superuser's credential and give it to a user
++     process is a bad idea, since the file object keeps the credential
++     in it. It may affect LSM or something. This approach doesn't work
++     either.
++
++Approach A is ideal, but it is hard to implement. So here is a
++variation of A, which is to be implemented.
++
++A-1. Introduce aufs_atomic_open()
++     - calls branch fs ->atomic_open() if exists. otherwise calls
++       vfs_create() and finish_open().
++     - the demerit is that the several checks after branch fs
++       ->atomic_open() are lost. in the ordinary case, the checks are
++       done by VFS:do_last(), lookup_open() and atomic_open(). some can
++       be implemented in aufs, but not all, I am afraid.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/03lookup.txt linux-4.1.10/Documentation/filesystems/aufs/design/03lookup.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/03lookup.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/03lookup.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,113 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Lookup in a Branch
++----------------------------------------------------------------------
++Since aufs has a character of sub-VFS (see Introduction), it operates
++lookup for branches as VFS does. It may be heavy work. But almost all
++lookup operations in aufs are the simplest case, ie. lookup only an entry
++directly connected to its parent. Digging down the directory hierarchy
++is unnecessary. VFS has a function lookup_one_len() for that use, and
++aufs calls it.
++
++When a branch is a remote filesystem, aufs basically relies upon its
++->d_revalidate(), and aufs also forces the hardest revalidate tests for
++such branches.
++For d_revalidate, aufs implements three levels of revalidate tests. See
++"Revalidate Dentry and UDBA" for details.
++
++
++Test Only the Highest One for the Directory Permission (dirperm1 option)
++----------------------------------------------------------------------
++Let's try a case study.
++- aufs has two branches, upper readwrite and lower readonly.
++  /au = /rw + /ro
++- "dirA" exists under /ro, but not under /rw, and its mode is 0700.
++- a user invokes "chmod a+rx /au/dirA"
++- the internal copy-up is activated and "/rw/dirA" is created and its
++  permission bits are set to world readable.
++- does "/au/dirA" then become world readable?
++
++In this case, /ro/dirA is still 0700 since it exists in a readonly
++branch, or it may be on a natively readonly filesystem. If aufs respects
++the lower branch, it should not respond to readdir requests from other
++users. But the user allowed it by chmod. Should aufs really refuse to
++show the entries under /ro/dirA?
++
++To be honest, I don't have a good solution for this case. So aufs
++implements the 'dirperm1' and 'nodirperm1' mount options, and leaves it
++to users.
++When dirperm1 is specified, aufs checks only the highest one for the
++directory permission, and shows the entries. Otherwise, as usual, it
++checks every dir existing on all branches and rejects the request.
++
++As a side effect, the dirperm1 option improves the performance of aufs
++because the number of permission checks is reduced when there are many
++branches.
++
++
++Revalidate Dentry and UDBA (User's Direct Branch Access)
++----------------------------------------------------------------------
++Generally VFS helpers re-validate a dentry as a part of lookup.
++0. digging down the directory hierarchy.
++1. lock the parent dir by its i_mutex.
++2. lookup the final (child) entry.
++3. revalidate it.
++4. call the actual operation (create, unlink, etc.)
++5. unlock the parent dir
++
++If the filesystem implements its ->d_revalidate() (step 3), then it is
++called. Actually aufs implements it and checks that the dentry on a
++branch is still valid.
++But it is not enough, because aufs has to release the lock for the
++parent dir on a branch at the end of ->lookup() (step 2) and
++->d_revalidate() (step 3) while the i_mutex of the aufs dir is still
++held by VFS.
++If the file on a branch is changed directly, eg. bypassing aufs, after
++aufs released the lock, then the subsequent operation may cause
++some unpleasant result.
++
++This situation is a result of the VFS architecture, in which ->lookup()
++and ->d_revalidate() are separated. But I would never say it is wrong.
++It is a good design from VFS's point of view. It is just not suitable
++for the sub-VFS character of aufs.
++
++Aufs supports such cases with three levels of revalidation, selectable
++by the user.
++1. Simple Revalidate
++   In addition to the native flow in VFS, confirm the child-parent
++   relationship on the branch just after locking the parent dir on the
++   branch in the "actual operation" (step 4). When this validation
++   fails, aufs returns EBUSY. ->d_revalidate() (step 3) in aufs still
++   checks the validity of the dentry on branches.
++2. Monitor Changes Internally by Inotify/Fsnotify
++   In addition to the above, in the "actual operation" (step 4) aufs
++   looks up the dentry on the branch again, and returns EBUSY if it
++   finds a different dentry.
++   Additionally, aufs sets an inotify/fsnotify watch on every dir on
++   branches while it is in cache. When an event is notified, aufs
++   registers a function to the kernel 'events' thread by schedule_work().
++   The function sets some special status on the cached aufs dentry and
++   inode private data. If they are not cached, then aufs has nothing to
++   do. When the same file is accessed through aufs (step 0-3) later,
++   aufs will detect the status and refresh all necessary data.
++   In this mode, aufs has to ignore the events which are fired by aufs
++   itself.
++3. No Extra Validation
++   This is the simplest mode. It doesn't add any additional revalidation
++   test, and skips the revalidation in step 4. It is useful and improves
++   aufs performance when the system surely hides the aufs branches from
++   users, by over-mounting something (or another method).
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/04branch.txt linux-4.1.10/Documentation/filesystems/aufs/design/04branch.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/04branch.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/04branch.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,74 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Branch Manipulation
++
++Since aufs supports dynamic branch manipulation, ie. add/remove a branch
++and change its permission/attribute, there is a lot of work to do.
++
++
++Add a Branch
++----------------------------------------------------------------------
++o Confirm the dir being added exists outside of aufs, including loopback
++  mount, and check its various attributes.
++o Initialize the xino file and whiteout bases if necessary.
++  See struct.txt.
++
++o Check the owner/group/mode of the directory
++  When the owner/group/mode of the directory being added differs from
++  the existing branch, aufs issues a warning because it may pose a
++  security risk.
++  For example, when an upper writable branch has a world writable empty
++  top directory, a malicious user can create any files on the writable
++  branch directly, like copy-up and modify manually. If something like
++  /etc/{passwd,shadow} exists on the lower readonly branch but not on
++  the upper writable branch, and the writable branch is world-writable,
++  then a malicious user may create /etc/passwd on the writable branch
++  directly and the infected file will be valid in aufs.
++  I am afraid it can be a security issue, but aufs can do nothing except
++  producing a warning.
++
++
++Delete a Branch
++----------------------------------------------------------------------
++o Confirm the branch being deleted is not busy
++  In general, there is one merit in adopting the "remount" interface to
++  manipulate branches: it discards caches. When deleting a branch,
++  aufs checks the still cached (and connected) dentries and inodes. If
++  there are any, then they are all in-use. An inode without its
++  corresponding dentry can be alive alone (eg. the inotify/fsnotify case).
++
++  For the cached one, aufs checks whether the same named entry exists on
++  other branches.
++  If the cached one is a directory, because aufs provides a merged view
++  to users, as long as one dir is left on any branch aufs can show the
++  dir to users. In this case, the branch can be removed from aufs.
++  Otherwise aufs rejects deleting the branch.
++
++  If any file on the branch being deleted is opened by aufs, then aufs
++  rejects deleting.
++
++
++Modify the Permission of a Branch
++----------------------------------------------------------------------
++o Re-initialize or remove the xino file and whiteout bases if necessary.
++  See struct.txt.
++
++o rw --> ro: Confirm the branch being modified is not busy
++  Aufs rejects the request if any of these conditions is true.
++  - a file on the branch is mmap-ed.
++  - a regular file on the branch is opened for write and there is no
++    same named entry on the upper branch.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/05wbr_policy.txt linux-4.1.10/Documentation/filesystems/aufs/design/05wbr_policy.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/05wbr_policy.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/05wbr_policy.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,64 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Policies to Select One among Multiple Writable Branches
++----------------------------------------------------------------------
++When there is more than one writable branch, aufs has to decide the
++target branch for file creation or copy-up. By default, the highest
++writable branch which has the parent (or ancestor) dir of the target
++file is chosen (top-down-parent policy).
++At the user's request, aufs implements some other policies to select
++the writable branch. For file creation there are round-robin,
++most-free-space, and other policies. For copy-up there are
++top-down-parent, bottom-up-parent, bottom-up and others.
++
++As expected, the round-robin policy selects the branch in circular
++order. When you have two writable branches and create 10 new files, 5
++files will be created on each branch. mkdir(2) is an exception. When
++you create 10 new directories, all will be created on the same branch.
++The most-free-space policy selects the one which has the most free
++space among the writable branches. The amount of free space is
++checked by aufs internally, and users can specify its time interval.
++
++The policies for copy-up are simpler:
++top-down-parent is equivalent to the same named one in the create policy,
++bottom-up-parent selects the writable branch where the parent dir
++exists and which is the nearest upper one from the copyup-source,
++bottom-up selects the nearest upper writable branch from the
++copyup-source, regardless of the existence of the parent dir.
++
++There are some rules or exceptions to apply these policies.
++- If there is a readonly branch above the policy-selected branch and
++  the parent dir is marked as opaque (a variation of whiteout), or the
++  target (creating) file is whiteout-ed on the upper readonly branch,
++  then the result of the policy is ignored and the target file will be
++  created on the nearest writable branch above the readonly branch.
++- If there is a writable branch above the policy-selected branch and
++  the parent dir is marked as opaque or the target file is whiteout-ed
++  on the branch, then the result of the policy is ignored and the target
++  file will be created on the highest one among the upper writable
++  branches which have diropq or whiteout. In case of whiteout, aufs
++  removes it as usual.
++- link(2) and rename(2) systemcalls are exceptions in every policy.
++  They try selecting the branch where the source exists as far as
++  possible since copying up a large file will take a long time. If that
++  is not possible, ie. the branch where the source exists is readonly,
++  then they will follow the copyup policy.
++- There is an exception for rename(2) when the target exists.
++  If the rename target exists, aufs compares the index of the branches
++  where the source and the target exist and selects the higher
++  one. If the selected branch is readonly, then aufs follows the
++  copyup policy.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/06fhsm.txt linux-4.1.10/Documentation/filesystems/aufs/design/06fhsm.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/06fhsm.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/06fhsm.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,120 @@
++
++# Copyright (C) 2011-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
++
++
++File-based Hierarchical Storage Management (FHSM)
++----------------------------------------------------------------------
++Hierarchical Storage Management (or HSM) is a well-known feature in the
++storage world. Aufs provides this feature as file-based with multiple
++writable branches, based upon the principle of "Colder, the Lower".
++Here the word "colder" means less frequently used files, and "lower"
++means the vertical position in the order of the stacked branches.
++These multiple writable branches are prioritized, ie. the topmost one
++should be the fastest drive and be used heavily.
++
++o Characters in aufs FHSM story
++- aufs itself and a new branch attribute.
++- a new ioctl interface to move-down and to establish a connection with
++  the daemon ("move-down" is the converse of "copy-up").
++- userspace tool and daemon.
++
++The userspace daemon establishes a connection with aufs and waits for
++the notification. The notified information is very similar to struct
++statfs containing the number of consumed blocks and inodes.
++When the consumed blocks/inodes of a branch exceed the user-specified
++upper watermark, the daemon activates its move-down process until the
++consumed blocks/inodes reach the user-specified lower watermark.
++
++The actual move-down is done by aufs based upon the request from
++user-space since we need to maintain the inode number and the internal
++pointer arrays in aufs.
++
++Currently aufs FHSM handles the regular files only. Additionally they
++must not be hard-linked nor pseudo-linked.
++
++
++o Cowork of aufs and the user-space daemon
++  While the userspace daemon has the connection established, aufs sends
++  a small notification to it whenever aufs writes something into the
++  writable branch. But it may be costly since aufs issues statfs(2)
++  internally. So the user can specify a new option to cache the
++  info. Actually the notification is controlled by these factors.
++  + the specified cache time.
++  + classified as "force" by aufs internally.
++  Until the specified time expires, aufs doesn't send the info
++  except the forced cases. When aufs decides to force, the info is
++  always notified to userspace.
++  For example, the number of free inodes is generally large enough and
++  a shortage of them happens rarely. So aufs doesn't force the
++  notification when creating a new file, directory and others. This is
++  the typical case in which aufs doesn't force.
++  When aufs writes the actual filedata and the file consumes any new
++  blocks, aufs forces the notification.
++
++
++o Interfaces in aufs
++- New branch attribute.
++  + fhsm
++    Specifies that the branch is managed by the FHSM feature. In other
++    words, it is a participant in FHSM.
++    When nofhsm is set to the branch, it will not be the source/target
++    branch of the move-down operation. This attribute is set
++    independently from coo and moo attributes, and if you want full
++    FHSM, you should specify them as well.
++- New mount option.
++  + fhsm_sec
++    Specifies the time in seconds during which less important info is
++    suppressed from being notified.
++- New ioctl.
++  + AUFS_CTL_FHSM_FD
++    creates a new file descriptor from which userspace can read the
++    notification (a subset of struct statfs) from aufs.
++- Module parameter 'brs'
++  It has to be set to 1. Otherwise the new mount option 'fhsm' will not
++  be set.
++- mount helpers /sbin/mount.aufs and /sbin/umount.aufs
++  When there are two or more branches with fhsm attributes,
++  /sbin/mount.aufs invokes the user-space daemon and /sbin/umount.aufs
++  terminates it. As a result of remounting and branch-manipulation, the
++  number of branches with fhsm attribute can be one. In this case,
++  /sbin/mount.aufs will terminate the user-space daemon.
++
++
++Finally the operation is done in these steps in kernel-space.
++- make sure that,
++  + no one else is using the file.
++  + the file is not hard-linked.
++  + the file is not pseudo-linked.
++  + the file is a regular file.
++  + the parent dir is not opaqued.
++- find the target writable branch.
++- make sure the file is not whiteout-ed by the upper (than the target)
++  branch.
++- make the parent dir on the target branch.
++- mutex lock the inode on the branch.
++- unlink the whiteout on the target branch (if exists).
++- lookup and create the whiteout-ed temporary name on the target branch.
++- copy the file as the whiteout-ed temporary name on the target branch.
++- rename the whiteout-ed temporary name to the original name.
++- unlink the file on the source branch.
++- maintain the internal pointer array and the external inode number
++  table (XINO).
++- maintain the timestamps and other attributes of the parent dir and the
++  file.
++
++And of course, in every step, an error may happen. So the operation
++should restore the original file state after an error happens.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/06mmap.txt linux-4.1.10/Documentation/filesystems/aufs/design/06mmap.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/06mmap.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/06mmap.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,72 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++mmap(2) -- File Memory Mapping
++----------------------------------------------------------------------
++In aufs, the file-mapped pages are handled by a branch fs directly, no
++interaction with aufs. It means aufs_mmap() calls the branch fs's
++->mmap().
++This approach is simple and good, but there is one problem.
++Under /proc, several entries show the mmapped files by their path (with
++device and inode number), and the printed path will be the path on the
++branch fs instead of the virtual aufs one.
++This is not a problem in most cases, but some utilities such as lsof(1)
++(and their users) may expect the path on aufs.
++
++To address this issue, aufs adds a new member called vm_prfile in struct
++vm_area_struct (and struct vm_region). The original vm_file points to
++the file on the branch fs in order to handle everything correctly as
++usual. The new vm_prfile points to a virtual file in aufs, and the
++show-functions in procfs refer to vm_prfile if it is set.
++Also we need to maintain several other places which touch vm_file,
++such as
++- fork()/clone() copies vma and the reference count of vm_file is
++  incremented.
++- merging vma maintains the ref count too.
++
++This is not a good approach. It just fakes the printed path. But it
++leaves all behaviour around f_mapping unchanged. This is surely an
++advantage.
++Actually aufs had adopted another complicated approach which calls
++generic_file_mmap() and handles struct vm_operations_struct. In this
++approach, aufs met a hard problem and I could not solve it without
++switching the approach.
++
++There may be one more approach, which is
++- bind-mount the branch-root onto the aufs-root internally
++- grab the new vfsmount (ie. struct mount)
++- lazy-umount the branch-root internally
++- in open(2) the aufs-file, open the branch-file with the hidden
++  vfsmount (instead of the original branch's vfsmount)
++- ideally this "bind-mount and lazy-umount" should be done atomically,
++  but it may be possible from userspace by the mount helper.
++
++Adding the internal hidden vfsmount and using it in opening a file, the
++file path under /proc will be printed correctly. This approach looks
++smarter, but is not possible, I am afraid.
++- aufs-root may be bind-mounted later. when it happens, another hidden
++  vfsmount will be required.
++- it is hard to get the chance to bind-mount and lazy-umount
++  + in kernel-space, FS can have vfsmount in open(2) via
++    file->f_path, and aufs can know its vfsmount. But several locks are
++    already acquired, and if aufs tries to bind-mount and lazy-umount
++    here, then it may cause a deadlock.
++  + in user-space, bind-mount doesn't invoke the mount helper.
++- since /proc shows dev and ino, aufs has to give vma these info. it
++  means a new member vm_prinode will be necessary. this is essentially
++  equivalent to vm_prfile described above.
++
++I have to give up this "looks-smarter" approach.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/06xattr.txt linux-4.1.10/Documentation/filesystems/aufs/design/06xattr.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/06xattr.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/06xattr.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,96 @@
++
++# Copyright (C) 2014-2015 Junjiro R. Okajima
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
++
++
++Listing XATTR/EA and getting the value
++----------------------------------------------------------------------
++For the inode standard attributes (owner, group, timestamps, etc.), aufs
++shows the values from the topmost existing file. This behaviour is good
++for the non-dir entries since the behaviour exactly matches the shown
++information. But for the directories, aufs considers all the same named
++entries on the lower branches. Which means, if one of the lower entries
++rejects a readdir call, then aufs returns an error even if the topmost
++entry allows it. This behaviour is necessary to respect the branch fs's
++security, but can make users confused since the user-visible standard
++attributes don't match the behaviour.
++To address this issue, aufs has a mount option called dirperm1 which
++checks the permission for the topmost entry only, and ignores the lower
++entry's permission.
++
++A similar issue can happen around XATTR.
++getxattr(2) and listxattr(2) families behave as if the dirperm1 option
++is always set. Otherwise these very unpleasant situations would happen.
++- listxattr(2) may return the duplicated entries.
++- users may not be able to remove or reset the XATTR forever.
++
++
++XATTR/EA support in the internal (copy,move)-(up,down)
++----------------------------------------------------------------------
++Generally the extended attributes of inode are categorized as these.
++- "security" for LSM and capability.
++- "system" for posix ACL, 'acl' mount option is required for the branch
++  fs generally.
++- "trusted" for userspace, CAP_SYS_ADMIN is required.
++- "user" for userspace, 'user_xattr' mount option is required for the
++  branch fs generally.
++
++Moreover there are some other categories. Aufs handles these rather
++unpopular categories as the ordinary ones, ie. there is no special
++condition nor exception.
++
++In copy-up, the support for XATTR on the dst branch may differ from the
++src branch. In this case, the copy-up operation will get an error and
++the original user operation which triggered the copy-up will fail. It
++can even happen that all copy-ups fail.
++When both the src and dst branches support XATTR and an error occurs
++during copying XATTR, then the copy-up should obviously fail. That is a
++good reason and aufs should return an error to userspace. But when only
++the src branch supports that XATTR, aufs should not return an error.
++For example, the src branch supports ACL but the dst branch doesn't,
++because the dst branch may natively not support it or temporarily not
++support it due to the "noacl" mount option. Of course, the dst branch fs
++may NOT return an error even if the XATTR is not supported. It is
++totally up to the branch fs.
++
++Anyway when the aufs internal copy-up gets an error from the dst branch
++fs, then aufs tries removing the just copied entry and returns the error
++to the userspace. In the worst case of this situation, all copy-ups
++will fail.
++
++For the copy-up operation, there are two basic approaches.
++- copy the specified XATTR only (by category above), and return the
++  error unconditionally if it happens.
++- copy all XATTR, and ignore the error on the specified category only.
++
++In order to support XATTR and to implement the correct behaviour, aufs
++chooses the latter approach and introduces some new branch attributes,
++"icexsec", "icexsys", "icextr", "icexusr", and "icexoth".
++They correspond to the XATTR namespaces (see above). Additionally, to be
++convenient, "icex" is also provided which means all "icex*" attributes
++are set (here the word "icex" stands for "ignore copy-error on XATTR").
++
++The meaning of these attributes is to ignore the error from setting
++XATTR on that branch.
++Note that aufs tries copying all XATTR unconditionally, and ignores the
++error from the dst branch according to the specified attributes.
++
++Some XATTR may have its default value. The default value may come from
++the parent dir or the environment. If the default value is set at
++file creation time, it will be overwritten by copy-up.
++Some contradiction may happen, I am afraid.
++Do we need another attribute to stop copying XATTR? I am unsure. For
++now, aufs implements the branch attributes to ignore the error.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/07export.txt linux-4.1.10/Documentation/filesystems/aufs/design/07export.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/07export.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/07export.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,58 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Export Aufs via NFS
++----------------------------------------------------------------------
++Here is an approach.
++- like xino/xib, add a new file 'xigen' which stores aufs inode
++  generation.
++- iget_locked(): initialize aufs inode generation for a new inode, and
++  store it in xigen file.
++- destroy_inode(): increment aufs inode generation and store it in xigen
++  file. it is necessary even if it is not unlinked, because any data of
++  inode may be changed by UDBA.
++- encode_fh(): for a root dir, simply return FILEID_ROOT. otherwise
++  build file handle by
++  + branch id (4 bytes)
++  + superblock generation (4 bytes)
++  + inode number (4 or 8 bytes)
++  + parent dir inode number (4 or 8 bytes)
++  + inode generation (4 bytes)
++  + return value of exportfs_encode_fh() for the parent on a branch (4
++    bytes)
++  + file handle for a branch (by exportfs_encode_fh())
++- fh_to_dentry():
++  + find the index of a branch from its id in handle, and check it
++    still exists in aufs.
++  + 1st level: get the inode number from handle and search it in cache.
++  + 2nd level: if not found in cache, get the parent inode number from
++    the handle and search it in cache. and then open the found parent
++    dir, find the matching inode number by vfs_readdir() and get its
++    name, and call lookup_one_len() for the target dentry.
++  + 3rd level: if the parent dir is not cached, call
++    exportfs_decode_fh() for a branch and get the parent on a branch,
++    build a pathname of it, convert it to a pathname in aufs, call
++    path_lookup(). now aufs gets a parent dir dentry, then handle it as
++    the 2nd level.
++  + to open the dir, aufs needs struct vfsmount. aufs keeps vfsmount
++    for every branch, but not itself. to get this, (currently) aufs
++    searches in current->nsproxy->mnt_ns list. it may not be a good
++    idea, but I did not find another approach.
++  + test the generation of the gotten inode.
++- every inode operation: they may get EBUSY due to UDBA. in this case,
++  convert it into ESTALE for NFSD.
++- readdir(): call lockdep_on/off() because filldir in NFSD calls
++  lookup_one_len(), vfs_getattr(), encode_fh() and others.
+diff -Nur linux-4.1.10.orig/Documentation/filesystems/aufs/design/08shwh.txt linux-4.1.10/Documentation/filesystems/aufs/design/08shwh.txt
+--- linux-4.1.10.orig/Documentation/filesystems/aufs/design/08shwh.txt	1970-01-01 01:00:00.000000000 +0100
++++ linux-4.1.10/Documentation/filesystems/aufs/design/08shwh.txt	2015-10-22 21:35:53.000000000 +0200
+@@ -0,0 +1,52 @@
++
++# Copyright (C) 2005-2015 Junjiro R. Okajima
++# 
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++# 
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++# 
++# You should have received a copy of the GNU General Public License
++# along with this program.  If not, see <http://www.gnu.org/licenses/>.
++
++Show Whiteout Mode (shwh)
++----------------------------------------------------------------------
++Generally aufs hides the name of whiteouts. But in some cases, showing
++them is very useful for users, for instance when creating a new middle
++layer (branch) by merging existing layers.
++
++(borrowing aufs1 HOW-TO from a user, Michael Towers)
++When you have three branches,
++- Bottom: 'system', squashfs (underlying base system), read-only
++- Middle: 'mods', squashfs, read-only
++- Top: 'overlay', ram (tmpfs), read-write
++
++The top layer is loaded at boot time and saved at shutdown, to preserve
++the changes made to the system during the session.
++When larger changes have been made, or smaller changes have accumulated,
++the size of the saved top layer data grows. At this point, it would be
++nice to be able to merge the two overlay branches ('mods' and 'overlay')
++and rewrite the 'mods' squashfs, clearing the top layer and thus
++restoring save