scsi: ufs: fix deadlock between resume and eh_work

A deadlock can occur with the following sequence of events:
1. The SSU (START STOP UNIT) command times out in the context of
   ufshcd_resume.
2. ufshcd_abort is invoked for the timed-out SSU command on the device
   WLUN; it schedules eh_work and waits for eh_work to be flushed.
3. pm_runtime_get_sync, invoked from ufshcd_err_handling_prepare in
   eh_work, stays blocked because runtime_status is RPM_RESUMING while
   ufshcd_resume (step 1) is still waiting for the SSU command.
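To make the circular wait explicit, here is a minimal C sketch (not code from this patch) that models each context as a node with a single outgoing wait-for edge and follows the edges from a starting context. The context labels and the before_fix/after_fix tables are hypothetical; after_fix assumes only the first fix step described below, i.e. eh_work no longer waits on runtime PM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three contexts; NONE marks "waits on nobody". */
enum ctx { RESUME, ABORT, EH_WORK, NCTX, NONE = NCTX };

/* waits_for[a] == b: context a cannot progress until context b does. */
static const enum ctx before_fix[NCTX] = {
	[RESUME]  = ABORT,   /* SSU completion needs the abort to finish */
	[ABORT]   = EH_WORK, /* flush_work() waits for eh_work */
	[EH_WORK] = RESUME,  /* rpm_resume() waits while RPM_RESUMING */
};

static const enum ctx after_fix[NCTX] = {
	[RESUME]  = ABORT,
	[ABORT]   = EH_WORK,
	[EH_WORK] = NONE,    /* runtime-PM get skipped for the WLUN SSU */
};

/* Follow wait edges from start; with one edge per node, returning to
 * start within NCTX steps means start sits on a wait cycle. */
static bool in_wait_cycle(const enum ctx waits_for[NCTX], enum ctx start)
{
	enum ctx cur = start;

	assert(start < NCTX);
	for (int step = 0; step < NCTX; step++) {
		cur = waits_for[cur];
		if (cur == NONE)
			return false; /* chain ends at a context that can progress */
		if (cur == start)
			return true;
	}
	return false;
}
```

With before_fix, in_wait_cycle() is true for every context; with after_fix, the chain from ufshcd_resume ends at eh_work, which can run to completion.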

Fix this by:
1. Skipping the pm_runtime_get_sync call for the WLUN (SSU command) in
   ufshcd_err_handling_prepare, invoked as part of eh_work, and
   continuing with the error handler.
2. Checking the device and link states in ufshcd_resume after the SSU
   command returns failure, as follows:
   a. If the device and link are both active, do not return an error:
      they are already in the correct state, so proceed with the
      resume.
   b. If the device and link are not active but the error handler is
      in progress, wait for the error handler to finish and then
      proceed with the resume.
   c. If the device and link are not active and the error handler is
      not in progress either, abort the resume.

Thread1 (ufshcd_resume):
  wait_for_completion_io()
  blk_execute_rq()
  __scsi_execute()
  ufshcd_set_dev_pwr_mode()
  ufshcd_resume()
  ufshcd_runtime_resume()

Thread2 (ufshcd_abort):
  flush_work() >> eh_work
  ufshcd_eh_host_reset_handler()
  ufshcd_abort()
  scsi_try_to_abort_cmd() (inline)
  scmd_eh_abort_handler()

Thread3 (err_handler):
  rpm_resume()
  __pm_runtime_resume()
  ufshcd_err_handling_prepare()
  ufshcd_err_handler()

Change-Id: I04a3cddecad4beda957d4d4f2fa3d7096f111c6d
Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com>