mm/hwpoison: fix panic due to split huge zero page
Bug:
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:1957!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 snd_hda_codec_realtek snd_hda_codec_generic nfsv4 dns_re
  CPU: 2 PID: 2576 Comm: test_huge Not tainted 4.2.0-rc5-mm1+ #27
  Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
  task: ffff880204e3d600 ti: ffff8800db16c000 task.ti: ffff8800db16c000
  RIP: split_huge_page_to_list+0xdb/0x120
  Call Trace:
    memory_failure+0x32e/0x7c0
    madvise_hwpoison+0x8b/0x160
    SyS_madvise+0x40/0x240
    ? do_page_fault+0x37/0x90
    entry_SYSCALL_64_fastpath+0x12/0x71
  Code: ff f0 41 ff 4c 24 30 74 0d 31 c0 48 83 c4 08 5b 41 5c 41 5d c9 c3 4c 89 e7 e8 e2 58 fd ff 48 83 c4 08 31 c0
  RIP  split_huge_page_to_list+0xdb/0x120
   RSP <ffff8800db16fde8>
  ---[ end trace aee7ce0df8e44076 ]---
Testcase:
    #define _GNU_SOURCE
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <errno.h>
    #include <string.h>
    #define MB 1024*1024
    int main(void)
    {
            char *mem;
            posix_memalign((void **)&mem, 2 * MB, 200 * MB);
            madvise(mem, 200 * MB, MADV_HWPOISON);
            free(mem);
            return 0;
    }
Huge zero page is allocated if page fault w/o FAULT_FLAG_WRITE flag.
The get_user_pages_fast() which called in madvise_hwpoison() will get
huge zero page if the page is not allocated before.  Huge zero page is a
tranparent huge page, however, it is not an anonymous page.
memory_failure will split the huge zero page and trigger
BUG_ON(is_huge_zero_page(page));
After commit 98ed2b00 ("mm/memory-failure: give up error handling
for non-tail-refcounted thp"), memory_failure will not catch non anon
thp from madvise_hwpoison path and this bug occur.
Fix it by catching non anon thp in memory_failure in order to not split
huge zero page in madvise_hwpoison path.
After this patch:
  Injecting memory failure for page 0x202800 at 0x7fd8ae800000
  MCE: 0x202800: non anonymous thp
  [...]
[akpm@linux-foundation.org: remove second split, per Wanpeng]
Signed-off-by:  Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by:
Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by:  Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by:
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by:  Andrew Morton <akpm@linux-foundation.org>
Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
Signed-off-by:  Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds <torvalds@linux-foundation.org>
Loading
Please register or sign in to comment
