Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Skip to content
Commit de82bdff authored by Mitko Haralanov's avatar Mitko Haralanov Committed by Doug Ledford
Browse files

IB/hfi1: Fix deadlock caused by locking with wrong scope



The locking around the interval RB tree is designed to prevent
access to the tree while it's being modified. The locking in its
current form is too overzealous, which is causing a deadlock in
certain cases with the following backtrace:

    Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
    CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G           O 3.12.18-wfr+ #1
     0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
     ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
     ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
    Call Trace:
     <NMI>  [<ffffffff814f1caa>] dump_stack+0x45/0x56
     [<ffffffff814ecd56>] panic+0xc2/0x1cb
     [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
     [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
     [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
     [<ffffffff8110a714>] perf_event_overflow+0x14/0x20
     [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
     [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
     [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
     [<ffffffff814f8d39>] do_nmi+0x169/0x310
     [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
     [<ffffffff81272600>] ? unmap_single+0x30/0x30
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     <<EOE>>  <IRQ>  [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
     [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
     [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
     [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
     [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
     [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
     [<ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
     [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
     [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
     [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
     [<ffffffff81098017>] handle_irq_event+0x37/0x60
     [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
     [<ffffffff810044af>] handle_irq+0xbf/0x150
     [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80
     [<ffffffff8150168d>] do_IRQ+0x4d/0xc0
     [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
     <EOI>  [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0
     [<ffffffff810763a6>] __cond_resched+0x26/0x30
     [<ffffffff814f5eda>] _cond_resched+0x3a/0x50
     [<ffffffff814f4f82>] down_write+0x12/0x30
     [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
     [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
     [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
     [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
     [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
     [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
     [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
     [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230
     [<ffffffff811a9da1>] ? fsnotify+0x241/0x320
     [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff8116b795>] vfs_writev+0x35/0x60
     [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0
     [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
     [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b

As evident from the backtrace above, the process was being put to sleep
while holding the lock.

Limiting the scope of the lock only to the RB tree operation fixes the
above error allowing for proper locking and the process being put to
sleep when needed.

Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
Signed-off-by: default avatarMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
parent f19bd643
Loading
Loading
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment