Documentation/power/devices.txt  +30 −4

@@ -2,6 +2,7 @@ Device Power Management

 	Copyright (c) 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
 	Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>
+	Copyright (c) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>

 Most of the code in Linux is device drivers, so most of the Linux power

@@ -326,6 +327,20 @@ the phases are:

 	driver in some way for the upcoming system power transition, but it
 	should not put the device into a low-power state.

+	For devices supporting runtime power management, the return value of
+	the prepare callback can be used to indicate to the PM core that it
+	may safely leave the device in runtime suspend (if runtime-suspended
+	already), provided that all of the device's descendants are also
+	left in runtime suspend.  Namely, if the prepare callback returns a
+	positive number and that happens for all of the descendants of the
+	device too, and all of them (including the device itself) are
+	runtime-suspended, the PM core will skip the suspend, suspend_late
+	and suspend_noirq phases of system suspend as well as the
+	resume_noirq, resume_early and resume phases of the following
+	system resume for all of these devices.  In that case, the complete
+	callback will be called directly after the prepare callback and is
+	entirely responsible for bringing the device back to the functional
+	state as appropriate.
+
     2.	The suspend methods should quiesce the device to stop it from
 	performing I/O.  They also may save the device registers and put it
 	into the appropriate low-power state, depending on the bus type the
 	device is on, [...]

@@ -400,12 +415,23 @@ When resuming from freeze, standby or memory sleep, the phases are:

 	the resume callbacks occur; it's not necessary to wait until the
 	complete phase.
+	Moreover, if the preceding prepare callback returned a positive
+	number, the device may have been left in runtime suspend throughout
+	the whole system suspend and resume (the suspend, suspend_late and
+	suspend_noirq phases of system suspend and the resume_noirq,
+	resume_early and resume phases of system resume may have been
+	skipped for it).  In that case, the complete callback is entirely
+	responsible for bringing the device back to the functional state
+	after system suspend if necessary.  [For example, it may need to
+	queue up a runtime resume request for the device for this purpose.]
+	To check if that is the case, the complete callback can consult the
+	device's power.direct_complete flag.  Namely, if that flag is set
+	when the complete callback is being run, it has been called directly
+	after the preceding prepare and special action may be required to
+	make the device work correctly afterward.

 At the end of these phases, drivers should be as functional as they were
 before suspending: I/O can be performed using DMA and IRQs, and the
-relevant clocks are gated on.  Even if the device was in a low-power
-state before the system sleep because of runtime power management,
-afterwards it should be back in its full-power state.  There are
-multiple reasons why it's best to do this; they are discussed in more
-detail in Documentation/power/runtime_pm.txt.
+relevant clocks are gated on.

 However, the details here may again be platform-specific.  For example,
 some systems support multiple "run" states, and the mode in effect at [...]

Documentation/power/runtime_pm.txt  +17 −0

@@ -2,6 +2,7 @@ Runtime Power Management Framework for I/O Devices

 	(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
 	(C) 2010 Alan Stern <stern@rowland.harvard.edu>
+	(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>

 1. Introduction

@@ -444,6 +445,10 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:

   bool pm_runtime_status_suspended(struct device *dev);
     - return true if the device's runtime PM status is 'suspended'

+  bool pm_runtime_suspended_if_enabled(struct device *dev);
+    - return true if the device's runtime PM status is 'suspended' and its
+      'power.disable_depth' field is equal to 1
+
   void pm_runtime_allow(struct device *dev);
     - set the power.runtime_auto flag for the device and decrease its usage
       counter (used by the /sys/devices/.../power/control interface to [...])

@@ -644,6 +649,18 @@ place (in particular, if the system is not waking up from hibernation), it
 may be more efficient to leave the devices that had been suspended before
 the system suspend began in the suspended state.

+To this end, the PM core provides a mechanism allowing some coordination
+between different levels of device hierarchy.  Namely, if a system suspend
+.prepare() callback returns a positive number for a device, that indicates
+to the PM core that the device appears to be runtime-suspended and its
+state is fine, so it may be left in runtime suspend provided that all of
+its descendants are also left in runtime suspend.  If that happens, the PM
+core will not execute any system suspend and resume callbacks for all of
+those devices, except for the complete callback, which is then entirely
+responsible for handling the device as appropriate.  This only applies to
+system suspend transitions that are not related to hibernation (see
+Documentation/power/devices.txt for more information).
+
 The PM core does its best to reduce the probability of race conditions
 between the runtime PM and system suspend/resume (and hibernation)
 callbacks by carrying out the following operations:

Documentation/power/swsusp.txt  +4 −1

@@ -220,7 +220,10 @@

 Q: After resuming, system is paging heavily, leading to very bad
 interactivity.
 A: Try running

-cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
+cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read file
+do
+	test -f "$file" && cat "$file" > /dev/null
+done

 after resume. swapoff -a; swapon -a may also be useful.

drivers/base/power/main.c  +51 −15

@@ -479,7 +479,7 @@ static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)

 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);

-	if (dev->power.syscore)
+	if (dev->power.syscore || dev->power.direct_complete)
 		goto Out;

 	if (!dev->power.is_noirq_suspended)

@@ -605,7 +605,7 @@ static int device_resume_early(struct device *dev, pm_message_t state, bool async)

 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);

-	if (dev->power.syscore)
+	if (dev->power.syscore || dev->power.direct_complete)
 		goto Out;

 	if (!dev->power.is_late_suspended)

@@ -735,6 +735,12 @@ static int device_resume(struct device *dev, pm_message_t state, bool async)

 	if (dev->power.syscore)
 		goto Complete;

+	if (dev->power.direct_complete) {
+		/* Match the pm_runtime_disable() in __device_suspend(). */
+		pm_runtime_enable(dev);
+		goto Complete;
+	}
+
 	dpm_wait(dev->parent, async);
 	dpm_watchdog_set(&wd, dev);
 	device_lock(dev);

@@ -1007,7 +1013,7 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
 		goto Complete;
 	}

-	if (dev->power.syscore)
+	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;

 	dpm_wait_for_children(dev, async);

@@ -1146,7 +1152,7 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool async)
 		goto Complete;
 	}

-	if (dev->power.syscore)
+	if (dev->power.syscore || dev->power.direct_complete)
 		goto Complete;

 	dpm_wait_for_children(dev, async);

@@ -1332,6 +1338,17 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)

 	if (dev->power.syscore)
 		goto Complete;

+	if (dev->power.direct_complete) {
+		if (pm_runtime_status_suspended(dev)) {
+			pm_runtime_disable(dev);
+			if (pm_runtime_suspended_if_enabled(dev))
+				goto Complete;
+
+			pm_runtime_enable(dev);
+		}
+		dev->power.direct_complete = false;
+	}
+
 	dpm_watchdog_set(&wd, dev);
 	device_lock(dev);

@@ -1382,10 +1399,19 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)

  End:
 	if (!error) {
+		struct device *parent = dev->parent;
+
 		dev->power.is_suspended = true;
-		if (dev->power.wakeup_path
-		    && dev->parent && !dev->parent->power.ignore_children)
-			dev->parent->power.wakeup_path = true;
+		if (parent) {
+			spin_lock_irq(&parent->power.lock);
+
+			dev->parent->power.direct_complete = false;
+			if (dev->power.wakeup_path
+			    && !dev->parent->power.ignore_children)
+				dev->parent->power.wakeup_path = true;
+
+			spin_unlock_irq(&parent->power.lock);
+		}
 	}

 	device_unlock(dev);

@@ -1487,7 +1513,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
 {
 	int (*callback)(struct device *) = NULL;
 	char *info = NULL;
-	int error = 0;
+	int ret = 0;

 	if (dev->power.syscore)
 		return 0;

@@ -1523,17 +1549,27 @@ static int device_prepare(struct device *dev, pm_message_t state)
 		callback = dev->driver->pm->prepare;
 	}

-	if (callback) {
-		error = callback(dev);
-		suspend_report_result(callback, error);
-	}
+	if (callback)
+		ret = callback(dev);

 	device_unlock(dev);

-	if (error)
+	if (ret < 0) {
+		suspend_report_result(callback, ret);
 		pm_runtime_put(dev);
-	return error;
+		return ret;
+	}
+	/*
+	 * A positive return value from ->prepare() means "this device appears
+	 * to be runtime-suspended and its state is fine, so if it really is
+	 * runtime-suspended, you can leave it in that state provided that you
+	 * will do the same thing with all of its descendants".  This only
+	 * applies to suspend transitions, however.
+	 */
+	spin_lock_irq(&dev->power.lock);
+	dev->power.direct_complete = ret > 0 && state.event == PM_EVENT_SUSPEND;
+	spin_unlock_irq(&dev->power.lock);
+	return 0;
 }

 /**

drivers/cpuidle/cpuidle.c  +42 −13

@@ -32,6 +32,7 @@
 LIST_HEAD(cpuidle_detected_devices);

 static int enabled_devices;
 static int off __read_mostly;
 static int initialized __read_mostly;
+static bool use_deepest_state __read_mostly;

 int cpuidle_disabled(void)
 {

@@ -65,23 +66,42 @@ int cpuidle_play_dead(void)
 }

 /**
- * cpuidle_enabled - check if the cpuidle framework is ready
- * @dev: cpuidle device for this cpu
- * @drv: cpuidle driver for this cpu
+ * cpuidle_use_deepest_state - Enable/disable the "deepest idle" mode.
+ * @enable: Whether enable or disable the feature.
  *
- * Return 0 on success, otherwise:
- * -NODEV : the cpuidle framework is not available
- * -EBUSY : the cpuidle framework is not initialized
+ * If the "deepest idle" mode is enabled, cpuidle will ignore the governor
+ * and always use the state with the greatest exit latency (out of the
+ * states that are not disabled).
+ *
+ * This function can only be called after cpuidle_pause() to avoid races.
  */
-int cpuidle_enabled(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+void cpuidle_use_deepest_state(bool enable)
 {
-	if (off || !initialized)
-		return -ENODEV;
+	use_deepest_state = enable;
+}

-	if (!drv || !dev || !dev->enabled)
-		return -EBUSY;
+/**
+ * cpuidle_find_deepest_state - Find the state of the greatest exit latency.
+ * @drv: cpuidle driver for a given CPU.
+ * @dev: cpuidle device for a given CPU.
+ */
+static int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
+				      struct cpuidle_device *dev)
+{
+	unsigned int latency_req = 0;
+	int i, ret = CPUIDLE_DRIVER_STATE_START - 1;

-	return 0;
+	for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
+		struct cpuidle_state *s = &drv->states[i];
+		struct cpuidle_state_usage *su = &dev->states_usage[i];
+
+		if (s->disabled || su->disable || s->exit_latency <= latency_req)
+			continue;
+
+		latency_req = s->exit_latency;
+		ret = i;
+	}
+	return ret;
 }

 /**

@@ -138,6 +158,15 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
  */
 int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
+	if (off || !initialized)
+		return -ENODEV;
+
+	if (!drv || !dev || !dev->enabled)
+		return -EBUSY;
+
+	if (unlikely(use_deepest_state))
+		return cpuidle_find_deepest_state(drv, dev);
+
 	return cpuidle_curr_governor->select(drv, dev);
 }

@@ -169,7 +198,7 @@ int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev,
  */
 void cpuidle_reflect(struct cpuidle_device *dev, int index)
 {
-	if (cpuidle_curr_governor->reflect)
+	if (cpuidle_curr_governor->reflect && !unlikely(use_deepest_state))
 		cpuidle_curr_governor->reflect(dev, index);
 }