Commit 90c8ff46 authored Nov 24, 2021 by Nitin Rawat

scsi: ufs: fix deadlock between resume and eh_work



A deadlock can occur through the following sequence of events:

1. An SSU command times out in the context of ufshcd_resume.

2. ufshcd_abort is invoked after the SSU command times out on the
   device WLUN. It schedules eh_work and waits for eh_work to be
   flushed.

3. pm_runtime_get_sync, invoked from ufshcd_err_handling_prepare in
   eh_work, remains pending because runtime_status is RPM_RESUMING due
   to ufshcd_resume (step 1).

Fix this by:

1. Skipping the pm_runtime_get_sync call for the device WLUN (SSU
   command) in ufshcd_err_handling_prepare, invoked as part of eh_work,
   and continuing with the err_handler.

2. Checking the device state and link state after the SSU command
   returns failure in ufshcd_resume, and handling the following cases:

	a. If the current dev and link states are active, don't return an
	   error: both are already in the correct state, so proceed with
	   the resume.

	b. If the current dev and link states are not active but the
	   err_handler is in progress, wait for the err_handler to finish
	   and then proceed with the resume.

	c. If the current dev and link states are not active and the
	   err_handler is not in progress either, let the resume abort.
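
The decision logic described above can be sketched as a small
standalone model. This is an illustration only, not the driver code:
the enum and both function names are hypothetical, and the real state
checks live in ufshcd_resume and ufshcd_err_handling_prepare.

```c
#include <stdbool.h>

/* Hypothetical outcome of the check done after SSU fails in resume. */
enum resume_action {
	RESUME_PROCEED,		/* case a: dev and link already active */
	RESUME_WAIT_FOR_EH,	/* case b: wait for err_handler, then proceed */
	RESUME_ABORT,		/* case c: nothing will recover, fail resume */
};

/*
 * Hypothetical helper modelling fix step 2: choose what resume does
 * once the SSU command has returned failure.
 */
enum resume_action resume_action_after_ssu_failure(bool dev_active,
						   bool link_active,
						   bool eh_in_progress)
{
	if (dev_active && link_active)
		return RESUME_PROCEED;
	if (eh_in_progress)
		return RESUME_WAIT_FOR_EH;
	return RESUME_ABORT;
}

/*
 * Hypothetical helper modelling fix step 1: the error handler takes a
 * runtime PM reference only when the failed command was not issued to
 * the device WLUN, because in that case resume is already in flight
 * and the get_sync call would block on RPM_RESUMING.
 */
bool eh_should_take_rpm_ref(bool cmd_on_device_wlun)
{
	return !cmd_on_device_wlun;
}
```

In this model the err_handler state only matters when the dev or link
state is not active, which matches the ordering of cases a, b and c.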

Thread1(ufshcd_resume):
 wait_for_completion_io()
 blk_execute_rq()
 __scsi_execute()
 ufshcd_set_dev_pwr_mode()
 ufshcd_resume()
 ufshcd_runtime_resume()

Thread2(ufshcd_abort):
 flush_work() >> eh_work
 ufshcd_eh_host_reset_handler()
 ufshcd_abort()
 scsi_try_to_abort_cmd(inline)
 scmd_eh_abort_handler()

Thread3(err_handler):
 rpm_resume()
 __pm_runtime_resume()
 ufshcd_err_handling_prepare()
 ufshcd_err_handler()

Change-Id: I04a3cddecad4beda957d4d4f2fa3d7096f111c6d
Signed-off-by: Nitin Rawat <quic_nitirawa@quicinc.com>

parent 862b38de
