
Commit a24e3d41 authored by Linus Torvalds

Merge branch 'akpm' (patches from Andrew)

Merge third patch-bomb from Andrew Morton:

 - more ocfs2 changes

 - a few hotfixes

 - Andy's compat cleanups

 - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
   panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

 - many rapidio updates: fixes, new drivers.

 - kcov: kernel code coverage feature.  Like gcov, but not
   "prohibitively expensive".

 - extable code consolidation for various archs

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  ia64/extable: use generic search and sort routines
  x86/extable: use generic search and sort routines
  s390/extable: use generic search and sort routines
  alpha/extable: use generic search and sort routines
  kernel/...: convert pr_warning to pr_warn
  drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
  drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
  memremap: add MEMREMAP_WC flag
  memremap: don't modify flags
  kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
  mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
  ipc/sem: make semctl setting sempid consistent
  ubsan: fix tree-wide -Wmaybe-uninitialized false positives
  kfifo: fix sparse complaints
  scripts/gdb: account for changes in module data structure
  scripts/gdb: add cmdline reader command
  scripts/gdb: add version command
  kernel: add kcov code coverage
  profile: hide unused functions when !CONFIG_PROC_FS
  hpwdt: use nmi_panic() when kernel panics in NMI handler
  ...
parents b91d9c67 8fe9752e
+94 −0
		    OCFS2 online file check
		    -----------------------

This document describes the OCFS2 online file check feature.

Introduction
============
OCFS2 is often used in high-availability systems. However, OCFS2 usually
converts the filesystem to read-only when it encounters an error. This may not
be necessary, since turning the filesystem read-only affects other running
processes as well, decreasing availability.
For this reason, a mount option (errors=continue) was introduced, which returns
the -EIO errno to the calling process and terminates further processing so that
the filesystem is not corrupted further. The filesystem is not converted to
read-only, and the problematic file's inode number is reported in the kernel
log. The user can then try to check/fix this file via the online filecheck
feature.

Scope
=====
This effort is to check/fix small issues which may hinder day-to-day operations
of a cluster filesystem by turning the filesystem read-only. The scope of
checking/fixing is at the file level, initially for regular files and eventually
for all files (including system files) of the filesystem.

If a directory-to-file link is incorrect, the directory inode is
reported as erroneous.

This feature is not suited for extensive checks which depend on
other components of the filesystem, such as, but not limited to, checking whether
the bits for file blocks in the allocation have been set. In case of such an
error, the offline fsck is recommended.

Finally, such an operation/feature should not be automated, lest the filesystem
end up with more damage than before the repair attempt. So, this has to
be performed with user interaction and consent.

User interface
==============
When there are errors in the OCFS2 filesystem, they are usually accompanied
by the inode number which caused the error. This inode number would be the
input to check/fix the file.

There is a sysfs directory for each mounted OCFS2 file system:

  /sys/fs/ocfs2/<devname>/filecheck

Here, <devname> is the name of the OCFS2 volume device that has already been
mounted. The files in this directory are used to communicate with kernel space,
telling it which file (inode number) is to be checked or fixed. Currently,
three operations are supported: checking an inode, fixing an inode, and
setting the size of the result record history.

1. If you want to know exactly what error happened to <inode> before fixing, do

  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
  # cat /sys/fs/ocfs2/<devname>/filecheck/check

The output is like this:
  INO		DONE	ERROR
39502		1	GENERATION

<INO> lists the inode numbers.
<DONE> indicates whether the operation has been finished.
<ERROR> says what kind of error was found. For the detailed error numbers,
please refer to the file linux/fs/ocfs2/filecheck.h.

2. If you decide to fix this inode, do

  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
  # cat /sys/fs/ocfs2/<devname>/filecheck/fix

The output is like this:
  INO		DONE	ERROR
39502		1	SUCCESS

This time, the <ERROR> column indicates whether this fix is successful or not.

3. The record cache is used to store the history of check/fix results. Its
default size is 10, and it can be adjusted within the range 10 to 100. You can
adjust the size like this:

  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
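
A user-space tool driving this interface mostly needs to build the sysfs path
and read back the result table. The following is a minimal sketch of those two
steps in C, not part of OCFS2 itself: the helper names and the struct are
hypothetical, and the row format is assumed to be exactly the
whitespace-separated "INO DONE ERROR" layout shown above.

```c
#include <stdio.h>
#include <string.h>

/* One row of the "INO DONE ERROR" table read back from the sysfs file.
 * (Hypothetical type, for illustration only.)
 */
struct filecheck_row {
	unsigned long ino;	/* inode number that was checked or fixed */
	int done;		/* 1 once the operation has finished */
	char error[32];		/* result string, e.g. "GENERATION" or "SUCCESS" */
};

/* Build /sys/fs/ocfs2/<devname>/filecheck/<op>, where op is "check", "fix"
 * or "set". Returns 0 on success, -1 if the path does not fit in buf.
 */
int filecheck_path(char *buf, size_t len, const char *devname, const char *op)
{
	int n = snprintf(buf, len, "/sys/fs/ocfs2/%s/filecheck/%s", devname, op);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse one data row. Returns 0 on success, -1 for the header line or any
 * other line that does not have the three expected fields.
 */
int filecheck_parse_row(const char *line, struct filecheck_row *row)
{
	if (sscanf(line, "%lu %d %31s", &row->ino, &row->done, row->error) != 3)
		return -1;
	return 0;
}
```

The header line is rejected simply because "INO" is not a number, so a caller
can feed every line of the file to filecheck_parse_row() and skip the ones
that fail.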

Fixing stuff
============
On receiving the inode, the filesystem reads the inode and the
file metadata. In case of errors, the filesystem fixes the errors
and reports the problems it fixed in the kernel log. As a precautionary measure,
the inode must first be checked for errors before performing a final fix.

The inode and the result history are maintained temporarily in a
small linked list buffer which contains the last (N) inodes
fixed/checked. The detailed errors which were fixed/checked are printed in the
kernel log.
+4 −3
@@ -56,9 +56,10 @@ iocharset=<name> -- Character set to use for converting between the
		 you should consider the following option instead.

utf8=<bool>   -- UTF-8 is the filesystem safe version of Unicode that
-		 is used by the console.  It can be enabled for the
-		 filesystem with this option. If 'uni_xlate' gets set,
-		 UTF-8 gets disabled.
+		 is used by the console. It can be enabled or disabled
+		 for the filesystem with this option.
+		 If 'uni_xlate' gets set, UTF-8 gets disabled.
+		 By default, FAT_DEFAULT_UTF8 setting is used.

uni_xlate=<bool> -- Translate unhandled Unicode characters to special
		 escaped sequences.  This would let you backup and

Documentation/kcov.txt

0 → 100644
+111 −0
kcov: code coverage for fuzzing
===============================

kcov exposes kernel code coverage information in a form suitable for
coverage-guided fuzzing (randomized testing). Coverage data of a running kernel
is exported via the "kcov" debugfs file. Coverage collection is enabled on a
per-task basis, and thus it can capture precise coverage of a single system call.

Note that kcov does not aim to collect as much coverage as possible. It aims
to collect more or less stable coverage that is a function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard interrupts,
and instrumentation of some inherently non-deterministic parts of the kernel is
disabled (e.g. scheduler, locking).

Usage:
======

Configure kernel with:

        CONFIG_KCOV=y

CONFIG_KCOV requires gcc built on revision 231296 or later.
Profiling data will only become accessible once debugfs has been mounted:

        mount -t debugfs none /sys/kernel/debug

The following program demonstrates kcov usage from within a test program:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>

#define KCOV_INIT_TRACE			_IOR('c', 1, unsigned long)
#define KCOV_ENABLE			_IO('c', 100)
#define KCOV_DISABLE			_IO('c', 101)
#define COVER_SIZE			(64<<10)

int main(int argc, char **argv)
{
	int fd;
	unsigned long *cover, n, i;

	/* A single file descriptor allows coverage collection on a single
	 * thread.
	 */
	fd = open("/sys/kernel/debug/kcov", O_RDWR);
	if (fd == -1)
		perror("open"), exit(1);
	/* Setup trace mode and trace size. */
	if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
		perror("ioctl"), exit(1);
	/* Mmap buffer shared between kernel- and user-space. */
	cover = (unsigned long*)mmap(NULL, COVER_SIZE * sizeof(unsigned long),
				     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if ((void*)cover == MAP_FAILED)
		perror("mmap"), exit(1);
	/* Enable coverage collection on the current thread. */
	if (ioctl(fd, KCOV_ENABLE, 0))
		perror("ioctl"), exit(1);
	/* Reset coverage from the tail of the ioctl() call. */
	__atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
	/* This is the target syscall. */
	read(-1, NULL, 0);
	/* Read number of PCs collected. */
	n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
	for (i = 0; i < n; i++)
		printf("0x%lx\n", cover[i + 1]);
	/* Disable coverage collection for the current thread. After this call
	 * coverage can be enabled for a different thread.
	 */
	if (ioctl(fd, KCOV_DISABLE, 0))
		perror("ioctl"), exit(1);
	/* Free resources. */
	if (munmap(cover, COVER_SIZE * sizeof(unsigned long)))
		perror("munmap"), exit(1);
	if (close(fd))
		perror("close"), exit(1);
	return 0;
}

After piping through addr2line, the output of the program looks as follows:

SyS_read
fs/read_write.c:562
__fdget_pos
fs/file.c:774
__fget_light
fs/file.c:746
__fget_light
fs/file.c:750
__fget_light
fs/file.c:760
__fdget_pos
fs/file.c:784
SyS_read
fs/read_write.c:562
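
The listing above contains repeated locations, since the same PC can be
recorded more than once during a single syscall. A consumer that only cares
about which locations were reached can reduce the buffer to a set first. The
following is a minimal sketch of that step; the function names are
hypothetical, and it assumes the buffer layout used by the example program
(cover[0] is the count, followed by that many PCs) and that collection has
already been disabled, so the buffer is no longer changing.

```c
#include <stdlib.h>

/* qsort comparator for unsigned long PCs. */
int cmp_pc(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* Copy the cover[0] recorded PCs into out (which must have room for that
 * many entries), sort them, drop duplicates in place, and return the number
 * of unique PCs left at the start of out.
 */
size_t kcov_unique_pcs(const unsigned long *cover, unsigned long *out)
{
	size_t n = cover[0], i, uniq = 0;

	for (i = 0; i < n; i++)
		out[i] = cover[i + 1];
	qsort(out, n, sizeof(out[0]), cmp_pc);
	for (i = 0; i < n; i++)
		if (uniq == 0 || out[uniq - 1] != out[i])
			out[uniq++] = out[i];
	return uniq;
}
```

Sorting loses the order in which the PCs were hit; a caller that needs the
execution trace should keep the original buffer as well.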

If a program needs to collect coverage from several threads (independently),
it needs to open /sys/kernel/debug/kcov in each thread separately.

The interface is fine-grained to allow efficient forking of test processes.
That is, a parent process opens /sys/kernel/debug/kcov, enables trace mode,
mmaps coverage buffer and then forks child processes in a loop. Child processes
only need to enable coverage (disable happens automatically on thread end).
+104 −0
RapidIO subsystem mport character device driver (rio_mport_cdev.c)
==================================================================

Version History:
----------------
  1.0.0 - Initial driver release.

==================================================================

I. Overview

This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Freescale,
Prodrive Technologies, Nokia Networks, BAE and IDT.  Additional input was
received from other members of RapidIO.org. The objective was to create a
character mode driver interface which exposes the capabilities of RapidIO
devices directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.

This driver (MPORT_CDEV) provides access to basic RapidIO subsystem operations
for user-space applications. Most RapidIO operations are supported through
'ioctl' system calls.

When loaded, this device driver creates filesystem nodes named rio_mportX in the
/dev directory for each registered RapidIO mport device. 'X' in the node name
matches the unique port ID assigned to each local mport device.

Using the available set of ioctl commands, user-space applications can perform
the following RapidIO bus and subsystem operations:

- Reads and writes from/to configuration registers of mport devices
    (RIO_MPORT_MAINT_READ_LOCAL/RIO_MPORT_MAINT_WRITE_LOCAL)
- Reads and writes from/to configuration registers of remote RapidIO devices.
  These operations are defined as RapidIO Maintenance reads/writes in the RIO spec.
    (RIO_MPORT_MAINT_READ_REMOTE/RIO_MPORT_MAINT_WRITE_REMOTE)
- Set RapidIO Destination ID for mport devices (RIO_MPORT_MAINT_HDID_SET)
- Set RapidIO Component Tag for mport devices (RIO_MPORT_MAINT_COMPTAG_SET)
- Query logical index of mport devices (RIO_MPORT_MAINT_PORT_IDX_GET)
- Query capabilities and RapidIO link configuration of mport devices
    (RIO_MPORT_GET_PROPERTIES)
- Enable/Disable reporting of RapidIO doorbell events to user-space applications
    (RIO_ENABLE_DOORBELL_RANGE/RIO_DISABLE_DOORBELL_RANGE)
- Enable/Disable reporting of RIO port-write events to user-space applications
    (RIO_ENABLE_PORTWRITE_RANGE/RIO_DISABLE_PORTWRITE_RANGE)
- Query/Control type of events reported through this driver: doorbells,
  port-writes or both (RIO_SET_EVENT_MASK/RIO_GET_EVENT_MASK)
- Configure/Map mport's outbound requests window(s) for specific size,
  RapidIO destination ID, hopcount and request type
    (RIO_MAP_OUTBOUND/RIO_UNMAP_OUTBOUND)
- Configure/Map mport's inbound requests window(s) for specific size,
  RapidIO base address and local memory base address
    (RIO_MAP_INBOUND/RIO_UNMAP_INBOUND)
- Allocate/Free contiguous DMA coherent memory buffer for DMA data transfers
  to/from remote RapidIO devices (RIO_ALLOC_DMA/RIO_FREE_DMA)
- Initiate DMA data transfers to/from remote RapidIO devices (RIO_TRANSFER).
  Supports blocking, asynchronous and posted (a.k.a 'fire-and-forget') data
  transfer modes.
- Check/Wait for completion of asynchronous DMA data transfer
    (RIO_WAIT_FOR_ASYNC)
- Manage device objects supported by RapidIO subsystem (RIO_DEV_ADD/RIO_DEV_DEL).
  This allows implementation of various RapidIO fabric enumeration algorithms
  as user-space applications while using remaining functionality provided by
  kernel RapidIO subsystem.

II. Hardware Compatibility

This device driver uses standard interfaces defined by kernel RapidIO subsystem
and therefore it can be used with any mport device driver registered by RapidIO
subsystem with limitations set by available mport implementation.

At this moment the most common limitation is availability of RapidIO-specific
DMA engine framework for specific mport device. Users should verify available
functionality of their platform when planning to use this driver:

- IDT Tsi721 PCIe-to-RapidIO bridge device and its mport device driver are fully
  compatible with this driver.
- The Freescale SoCs 'fsl_rio' mport driver does not implement RapidIO-specific
  DMA engine support, and therefore DMA data transfers through the mport_cdev
  driver are not available.

III. Module parameters

- 'dbg_level' - This parameter controls the amount of debug information
        generated by this device driver. It is formed by a set of
        bit masks that correspond to specific functional blocks.
        For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'.
        This parameter can be changed dynamically.
        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
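
As an illustration of how a bit-mask parameter such as 'dbg_level' is
typically composed and tested, here is a minimal sketch in C. The EX_DBG_*
names and bit positions are hypothetical and chosen only for this example;
the real mask definitions are in the driver source file named above.

```c
/* Hypothetical per-block mask bits, for illustration only. */
enum {
	EX_DBG_INIT  = 1u << 0,	/* driver initialization messages */
	EX_DBG_DMA   = 1u << 1,	/* DMA transfer messages */
	EX_DBG_EVENT = 1u << 2,	/* doorbell/port-write event messages */
};

/* Returns nonzero when messages for 'block' are enabled in 'level'. */
int ex_dbg_enabled(unsigned int level, unsigned int block)
{
	return (level & block) != 0;
}
```

With these example bits, a level of (EX_DBG_INIT | EX_DBG_EVENT) enables the
first and third blocks while leaving DMA messages off.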

IV. Known problems

  None.

V. User-space Applications and API

API library and applications that use this device driver are available from
RapidIO.org.

VI. TODO List

- Add support for sending/receiving "raw" RapidIO messaging packets.
- Add memory mapped DMA data transfers as an option when RapidIO-specific DMA
  is not available.
+9 −0
@@ -16,6 +16,15 @@ For inbound messages this driver uses destination ID matching to forward message
into the corresponding message queue. Messaging callbacks are implemented to be
fully compatible with RIONET driver (Ethernet over RapidIO messaging services).

1. Module parameters:
- 'dbg_level' - This parameter controls the amount of debug information
        generated by this device driver. It is formed by a set of
        bit masks that correspond to specific functional blocks.
        For mask definitions see 'drivers/rapidio/devices/tsi721.h'.
        This parameter can be changed dynamically.
        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.

II. Known problems

  None.