drm/amdgpu: fix race condition in amd_sched_entity_push_job (786b5219)

As soon as we leave the spinlock after the job has been added to the job queue, we can no longer rely on the job's data to be available.

I have seen a null-pointer dereference due to sched == NULL in amd_sched_wakeup via amd_sched_entity_push_job and amd_sched_ib_submit_kernel_helper. Since the latter initializes sched_job->sched with the address of the ring scheduler, which is guaranteed to be non-NULL, this race appears to be a likely culprit.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?bugid=93079
Reviewed-by: Christian König <christian.koenig@amd.com>
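The ownership rule described in the commit message can be illustrated with a small user-space sketch. This is not the kernel code: every name here (demo_sched, demo_job, demo_push, demo_consume, demo_run) is hypothetical, and the queue is a single slot. To keep the example deterministic it runs in one thread and simulates the worst-case interleaving, in which the consumer frees the job immediately after it has been queued. The producer therefore copies job->sched into a local variable first and never touches the job afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the scheduler and job types. */
struct demo_sched {
	int wakeups;			/* counts calls to demo_wakeup() */
};

struct demo_job {
	struct demo_sched *sched;
};

/* One-slot "queue": once a job is stored here, the consumer owns it. */
static struct demo_job *queue_slot;

/* Consumer side: takes the job out of the queue and frees it. */
static void demo_consume(void)
{
	struct demo_job *job = queue_slot;

	queue_slot = NULL;
	free(job);
}

static void demo_wakeup(struct demo_sched *sched)
{
	sched->wakeups++;
}

/*
 * Producer side: copy job->sched into a local before the job becomes
 * visible to the consumer, and never dereference job afterwards.
 */
static bool demo_push(struct demo_job *job)
{
	struct demo_sched *sched = job->sched;	/* read while we still own job */

	queue_slot = job;	/* publish: ownership passes to the consumer */
	demo_consume();		/* simulated worst case: job is freed at once */

	demo_wakeup(sched);	/* uses the cached pointer, not job->sched */
	return true;
}

/* Push one job and report how many wakeups the scheduler received. */
int demo_run(void)
{
	struct demo_sched sched = { .wakeups = 0 };
	struct demo_job *job = malloc(sizeof(*job));

	if (!job)
		return -1;
	job->sched = &sched;
	demo_push(job);
	assert(queue_slot == NULL);	/* the consumer has taken the job */
	return sched.wakeups;
}
```

Calling demo_run() returns 1 when the allocation succeeds. Had demo_push() called demo_wakeup(job->sched) after publishing, it would have read freed memory, which is the same kind of access the patch removes from amd_sched_entity_in.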

drivers/gpu/drm/amd/scheduler/gpu_scheduler.c

+3 −2

@@ -288,6 +288,7 @@ amd_sched_entity_pop_job(struct amd_sched_entity *entity)
  */
 static bool amd_sched_entity_in(struct amd_sched_job *sched_job)
 {
+	struct amd_gpu_scheduler *sched = sched_job->sched;
 	struct amd_sched_entity *entity = sched_job->s_entity;
 	bool added, first = false;

@@ -302,7 +303,7 @@ static bool amd_sched_entity_in(struct amd_sched_job *sched_job)

 	/* first job wakes up scheduler */
 	if (first)
-		amd_sched_wakeup(sched_job->sched);
+		amd_sched_wakeup(sched);

 	return added;
 }
@@ -318,9 +319,9 @@ void amd_sched_entity_push_job(struct amd_sched_job *sched_job)
 {
 	struct amd_sched_entity *entity = sched_job->s_entity;

+	trace_amd_sched_job(sched_job);
 	wait_event(entity->sched->job_scheduled,
 		   amd_sched_entity_in(sched_job));
-	trace_amd_sched_job(sched_job);
 }

 /**