Commit c7c4e7ff authored by Bart Van Assche, committed by Roland Dreier

IB/srp: Avoid endless SCSI error handling loop



If a SCSI command times out, it is passed to the SCSI error
handler, which tries to abort the commands that timed out. If
aborting fails, a device reset is attempted. If the device reset
also fails, a host reset is attempted. If the host reset also
fails, the whole procedure is repeated.

srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started. Hence, if the
SCSI error handler is invoked after host removal has started and
with the QP in the error state, an endless loop is triggered.
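
The escalation described above can be sketched as a small user-space
model. This is only an illustration of the pre-patch behaviour as the
commit message describes it, not kernel code: struct model_target, the
model_*() helpers and the max_passes bound are all hypothetical names
introduced here so that the "endless" case terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one error-handling step (illustrative, not the kernel's enum). */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical model of the two conditions named in the commit message. */
struct model_target {
	bool qp_in_error;     /* the QP is in the error state */
	bool removal_started; /* host removal has already started */
};

/* Pre-patch: abort and device reset fail while the QP is in error. */
static enum eh_result model_abort(const struct model_target *t)
{
	return t->qp_in_error ? EH_FAILED : EH_SUCCESS;
}

static enum eh_result model_reset_device(const struct model_target *t)
{
	return t->qp_in_error ? EH_FAILED : EH_SUCCESS;
}

/* Pre-patch: host reset fails once host removal has started. */
static enum eh_result model_reset_host(const struct model_target *t)
{
	return t->removal_started ? EH_FAILED : EH_SUCCESS;
}

/*
 * Run abort -> device reset -> host reset, repeating the whole procedure
 * when every step fails. Returns the pass on which a step succeeded, or
 * -1 if all max_passes passes failed (the endless loop, bounded here).
 */
static int model_eh_passes(const struct model_target *t, int max_passes)
{
	assert(max_passes > 0);
	for (int pass = 1; pass <= max_passes; ++pass) {
		if (model_abort(t) == EH_SUCCESS ||
		    model_reset_device(t) == EH_SUCCESS ||
		    model_reset_host(t) == EH_SUCCESS)
			return pass;
	}
	return -1;
}
```

With both flags set, no step can ever succeed, so model_eh_passes()
returns -1 for any bound. With only qp_in_error set, the host reset
succeeds on the first pass.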

Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
  has already started or if reconnecting fails.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
parent 3780d1f0
+15 −14
@@ -700,23 +700,24 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	struct Scsi_Host *shost = target->scsi_host;
 	int i, ret;
 
-	if (target->state != SRP_TARGET_LIVE)
-		return -EAGAIN;
-
 	scsi_target_block(&shost->shost_gendev);
 
 	srp_disconnect_target(target);
 	/*
-	 * Now get a new local CM ID so that we avoid confusing the
-	 * target in case things are really fouled up.
+	 * Now get a new local CM ID so that we avoid confusing the target in
+	 * case things are really fouled up. Doing so also ensures that all CM
+	 * callbacks will have finished before a new QP is allocated.
 	 */
 	ret = srp_new_cm_id(target);
-	if (ret)
-		goto unblock;
-
-	ret = srp_create_target_ib(target);
-	if (ret)
-		goto unblock;
+	/*
+	 * Whether or not creating a new CM ID succeeded, create a new
+	 * QP. This guarantees that all completion callback function
+	 * invocations have finished before request resetting starts.
+	 */
+	if (ret == 0)
+		ret = srp_create_target_ib(target);
+	else
+		srp_create_target_ib(target);
 
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
@@ -728,9 +729,9 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	for (i = 0; i < SRP_SQ_SIZE; ++i)
 		list_add(&target->tx_ring[i]->list, &target->free_tx);
 
-	ret = srp_connect_target(target);
+	if (ret == 0)
+		ret = srp_connect_target(target);
 
-unblock:
 	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
 			    SDEV_TRANSPORT_OFFLINE);
 
@@ -1739,7 +1740,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
-	if (!req || target->qp_in_error || !srp_claim_req(target, req, scmnd))
+	if (!req || !srp_claim_req(target, req, scmnd))
 		return FAILED;
 	srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			  SRP_TSK_ABORT_TASK);
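
The control-flow change in srp_reconnect_target() can be illustrated
with a stand-alone model. This is a sketch under assumptions, not the
driver code: struct reconnect_model and model_reconnect() are
hypothetical, each step is reduced to a flag saying whether it fails,
and the loop over req_ring (whose body is not shown in the diff) is
assumed to be where outstanding requests get reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the steps of the patched reconnect path. */
struct reconnect_model {
	bool new_cm_id_fails;   /* models srp_new_cm_id() failing */
	bool create_qp_fails;   /* models srp_create_target_ib() failing */
	bool connect_fails;     /* models srp_connect_target() failing */
	int  requests_reset;    /* how often the request ring was reset */
	bool connect_attempted; /* whether the connect step was reached */
};

/* Returns 0 on success and -1 on failure, mirroring the "ret" variable. */
static int model_reconnect(struct reconnect_model *m)
{
	int ret;

	assert(m != NULL);

	ret = m->new_cm_id_fails ? -1 : 0;

	/*
	 * The QP is recreated either way; its result only replaces ret
	 * when obtaining the CM ID succeeded, so the first error is kept.
	 */
	if (ret == 0)
		ret = m->create_qp_fails ? -1 : 0;

	/* No early "goto unblock" any more: requests are always reset. */
	m->requests_reset++;

	/* Connecting is only attempted when every earlier step succeeded. */
	if (ret == 0) {
		m->connect_attempted = true;
		ret = m->connect_fails ? -1 : 0;
	}

	return ret;
}
```

In this model a failing CM ID step still results in one request reset
and no connect attempt, which corresponds to the second bullet of the
commit message: requests are reset even if reconnecting fails.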