sched/tune: Fix improper accounting of tasks

cgroup_migrate_execute() calls can_attach() and css_set_move_task()
separately without holding rq->lock.

The schedtune implementation breaks here, since can_attach() accounts
for the task move well before the group move is committed. If the task
sleeps right after can_attach(), the sleep is accounted towards the
previous group. This results in a disparity in the counts between groups.

Consider this race:
TaskA is moved from root_grp to topapp_grp. Right before the move,
root_grp.tasks = 1 and topapp_grp.tasks = 0, and TaskB is moving it.

On cpu X, TaskB runs:
	* cgroup_migrate_execute()
		schedtune_can_attach()
			root_grp.tasks--; topapp_grp.tasks++;
			(root_grp.tasks = 0 and topapp_grp.tasks = 1)

	* Right at this moment the context is switched and TaskA runs.

	* TaskA sleeps
		dequeue_task()
			schedtune_dequeue_task()
				schedtune_task_update()
					root_grp.tasks--;

	  TaskA has not really "switched" group, so the decrement is
	  applied to root_grp. However, can_attach() has already
	  accounted for the task move, and this leaves us with
		root_grp.tasks = 0 (it is protected against going negative)
		topapp_grp.tasks = 1

Now even if cpu X is idle (TaskA is long gone sleeping), its
topapp_grp.tasks stays positive and the cpu is subject to topapp's
boost unnecessarily.

An easy way to fix this is to move the group change accounting into the
attach() callback, which gets called _after_ css_set_move_task(). Also
maintain the task's current idx in struct task_struct as it moves
between groups. The task's enqueue/dequeue is accounted towards the
cached idx value. If the task dequeues just before the group changes,
it is subtracted from the old group, which is correct because the task
would have bumped up the old group's count. If the task changes group
while it is running, the attach() callback has to decrement the old
group's count and increment the new group's count so that the next
dequeue subtracts from the new group.
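
The accounting described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not the
actual patch: every name in it (cpu_counts, model_task, tasks_update(),
model_enqueue(), model_dequeue(), model_attach(),
model_buggy_can_attach()) is hypothetical, a single cpu with two groups
is assumed, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; names do not match the kernel sources. */
enum { ROOT_GRP = 0, TOPAPP_GRP = 1, NR_GROUPS = 2 };

struct cpu_counts {
	int tasks[NR_GROUPS];	/* runnable tasks per group on one cpu */
};

struct model_task {
	int idx;	/* cached group index, changed only by model_attach() */
	bool queued;	/* true while the task is on the runqueue */
};

/* Adjust one group's count, clamping at zero as the message describes. */
static void tasks_update(struct cpu_counts *c, int idx, int delta)
{
	assert(idx >= 0 && idx < NR_GROUPS);
	c->tasks[idx] += delta;
	if (c->tasks[idx] < 0)
		c->tasks[idx] = 0;
}

/* Enqueue and dequeue are accounted towards the cached idx. */
static void model_enqueue(struct cpu_counts *c, struct model_task *t)
{
	tasks_update(c, t->idx, 1);
	t->queued = true;
}

static void model_dequeue(struct cpu_counts *c, struct model_task *t)
{
	tasks_update(c, t->idx, -1);
	t->queued = false;
}

/*
 * Fixed ordering: called after the group move is committed. Counts are
 * moved only for a queued task; the cached idx is updated in all cases.
 */
static void model_attach(struct cpu_counts *c, struct model_task *t,
			 int new_idx)
{
	if (t->queued) {
		tasks_update(c, t->idx, -1);
		tasks_update(c, new_idx, 1);
	}
	t->idx = new_idx;
}

/*
 * Buggy ordering: counts are moved before the group change is
 * committed, so the task's own idx still points at the old group.
 */
static void model_buggy_can_attach(struct cpu_counts *c, int old_idx,
				   int new_idx)
{
	tasks_update(c, old_idx, -1);
	tasks_update(c, new_idx, 1);
}
```

Replaying the race with model_buggy_can_attach() followed by
model_dequeue() leaves the topapp count at 1 on an idle cpu, while the
model_attach() ordering returns both counts to 0 whether the task
sleeps before or after the move.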

IOW the attach() callback has to account only for running tasks, but
has to update the cached index for both running and sleeping tasks.

The current code uses a task->on_rq != 0 check to determine whether a
task is queued on the runqueue or not. This check is incorrect, because
task->on_rq is set to TASK_ON_RQ_MIGRATING (value = 2) during
migration. Fix this by using task_on_rq_queued() to check whether a
task is queued or not.

Change-Id: If412da5a239c18d9122cfad2be59b355c14c068f
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
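
The difference between the two on_rq checks mentioned in the message
can be illustrated with a minimal user-space sketch. The struct
rq_state_task type and the helpers old_on_rq_check() and
model_task_on_rq_queued() are hypothetical stand-ins. Only the two
constant values and the comparison against TASK_ON_RQ_QUEUED are meant
to mirror the kernel's definitions.

```c
#include <assert.h>	/* for the sanity checks suggested below */
#include <stdbool.h>

/* Values mirror the kernel's on_rq states. */
#define TASK_ON_RQ_QUEUED	1
#define TASK_ON_RQ_MIGRATING	2

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct rq_state_task {
	int on_rq;
};

/* Old check: any non-zero value counts as queued, including MIGRATING. */
static bool old_on_rq_check(const struct rq_state_task *p)
{
	return p->on_rq != 0;
}

/* Modelled on task_on_rq_queued(): true only for TASK_ON_RQ_QUEUED. */
static bool model_task_on_rq_queued(const struct rq_state_task *p)
{
	return p->on_rq == TASK_ON_RQ_QUEUED;
}
```

For a task with on_rq set to TASK_ON_RQ_MIGRATING, old_on_rq_check()
reports it as queued while model_task_on_rq_queued() does not. The two
checks agree for on_rq values of 0 and TASK_ON_RQ_QUEUED.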