Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 315227f6 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull misc DAX updates from Vishal Verma:
 "DAX error handling for 4.7

   - Until now, dax has been disabled if media errors were found on any
     device.  This enables the use of DAX in the presence of these
     errors by making all sector-aligned zeroing go through the driver.

   - The driver (already) has the ability to clear errors on writes that
     are sent through the block layer using 'DSMs' defined in ACPI 6.1.

  Other misc changes:

   - When mounting DAX filesystems, check to make sure the partition is
     page aligned.  This is a requirement for DAX, and previously, we
     allowed such unaligned mounts to succeed, but subsequent
     reads/writes would fail.

   - Misc/cleanup fixes from Jan that remove unused code from DAX
     related to zeroing, writeback, and some size checks"

* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: fix a comment in dax_zero_page_range and dax_truncate_page
  dax: for truncate/hole-punch, do zeroing through the driver if possible
  dax: export a low-level __dax_zero_page_range helper
  dax: use sb_issue_zerout instead of calling dax_clear_sectors
  dax: enable dax in the presence of known media errors (badblocks)
  dax: fallback from pmd to pte on error
  block: Update blkdev_dax_capable() for consistency
  xfs: Add alignment check for DAX mount
  ext2: Add alignment check for DAX mount
  ext4: Add alignment check for DAX mount
  block: Add bdev_dax_supported() for dax mount checks
  block: Add vfs_msg() interface
  dax: Remove redundant inode size checks
  dax: Remove pointless writeback from dax_do_io()
  dax: Remove zeroing from dax_io()
  dax: Remove dead zeroing code from fault handlers
  ext2: Avoid DAX zeroing to corrupt data
  ext2: Fix block zeroing in ext2_get_blocks() for DAX
  dax: Remove complete_unwritten argument
  DAX: move RADIX_DAX_ definitions to dax.c
parents a10c38a4 40543f62
Loading
Loading
Loading
Loading
+32 −0
Original line number Diff line number Diff line
@@ -79,6 +79,38 @@ These filesystems may be used for inspiration:
- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt


Handling Media Errors
---------------------

The libnvdimm subsystem stores a record of known media error locations for
each pmem block device (in gendisk->badblocks). If we fault at such location,
or one with a latent error not yet discovered, the application can expect
to receive a SIGBUS. Libnvdimm also allows clearing of these errors by simply
writing the affected sectors (through the pmem driver, and if the underlying
NVDIMM supports the clear_poison DSM defined by ACPI).

Since DAX IO normally doesn't go through the driver/bio path, applications or
sysadmins have an option to restore the lost data from a prior backup/inbuilt
redundancy in the following ways:

1. Delete the affected file, and restore from a backup (sysadmin route):
   This will free the file system blocks that were being used by the file,
   and the next time they're allocated, they will be zeroed first, which
   happens through the driver, and will clear bad sectors.

2. Truncate or hole-punch the part of the file that has a bad-block (at least
   an entire aligned sector has to be hole-punched, but not necessarily an
   entire filesystem block).

These are the two basic paths that allow DAX filesystems to continue operating
in the presence of media errors. More robust error recovery mechanisms can be
built on top of this in the future, for example, involving redundancy/mirroring
provided at the block layer through DM, or additionally, at the filesystem
level. These would have to rely on the above two tenets, that error clearing
can happen either by sending an IO through the driver, or zeroing (also through
the driver).


Shortcomings
------------

+1 −1
Original line number Diff line number Diff line
@@ -143,7 +143,7 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
 */
static long
axon_ram_direct_access(struct block_device *device, sector_t sector,
		       void __pmem **kaddr, pfn_t *pfn)
		       void __pmem **kaddr, pfn_t *pfn, long size)
{
	struct axon_ram_bank *bank = device->bd_disk->private_data;
	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
+0 −1
Original line number Diff line number Diff line
@@ -4,7 +4,6 @@
#include <linux/gfp.h>
#include <linux/blkpg.h>
#include <linux/hdreg.h>
#include <linux/badblocks.h>
#include <linux/backing-dev.h>
#include <linux/fs.h>
#include <linux/blktrace_api.h>
+1 −1
Original line number Diff line number Diff line
@@ -381,7 +381,7 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,

#ifdef CONFIG_BLK_DEV_RAM_DAX
static long brd_direct_access(struct block_device *bdev, sector_t sector,
			void __pmem **kaddr, pfn_t *pfn)
			void __pmem **kaddr, pfn_t *pfn, long size)
{
	struct brd_device *brd = bdev->bd_disk->private_data;
	struct page *page;
+9 −1
Original line number Diff line number Diff line
@@ -164,14 +164,22 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
}

static long pmem_direct_access(struct block_device *bdev, sector_t sector,
		      void __pmem **kaddr, pfn_t *pfn)
		      void __pmem **kaddr, pfn_t *pfn, long size)
{
	struct pmem_device *pmem = bdev->bd_queue->queuedata;
	resource_size_t offset = sector * 512 + pmem->data_offset;

	if (unlikely(is_bad_pmem(&pmem->bb, sector, size)))
		return -EIO;
	*kaddr = pmem->virt_addr + offset;
	*pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);

	/*
	 * If badblocks are present, limit known good range to the
	 * requested range.
	 */
	if (unlikely(pmem->bb.count))
		return size;
	return pmem->size - pmem->pfn_pad - offset;
}

Loading