Documentation/power/suspend-and-interrupts.txt 0 → 100644 +123 −0

System Suspend and Device Interrupts

Copyright (C) 2014 Intel Corp.
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


Suspending and Resuming Device IRQs
-----------------------------------

Device interrupt request lines (IRQs) are generally disabled during system
suspend after the "late" phase of suspending devices (that is, after all of the
->prepare, ->suspend and ->suspend_late callbacks have been executed for all
devices). That is done by suspend_device_irqs().

The rationale for doing so is that after the "late" phase of device suspend
there is no legitimate reason why any interrupts from suspended devices should
trigger and if any devices have not been suspended properly yet, it is better to
block interrupts from them anyway. Also, in the past we had problems with
interrupt handlers for shared IRQs: the device drivers implementing them were
not prepared for interrupts triggering after their devices had been suspended.
In some cases they would attempt to access, for example, memory address spaces
of suspended devices and cause unpredictable behavior to ensue as a result.
Unfortunately, such problems are very difficult to debug and the introduction
of suspend_device_irqs(), along with the "noirq" phase of device suspend and
resume, was the only practical way to mitigate them.

Device IRQs are re-enabled during system resume, right before the "early" phase
of resuming devices (that is, before starting to execute ->resume_early
callbacks for devices). The function doing that is resume_device_irqs().


The IRQF_NO_SUSPEND Flag
------------------------

There are interrupts that can legitimately trigger during the entire system
suspend-resume cycle, including the "noirq" phases of suspending and resuming
devices as well as during the time when nonboot CPUs are taken offline and
brought back online.
That applies to timer interrupts in the first place, but also to IPIs and to
some other special-purpose interrupts.

The IRQF_NO_SUSPEND flag is used to indicate that to the IRQ subsystem when
requesting a special-purpose interrupt. It causes suspend_device_irqs() to
leave the corresponding IRQ enabled so as to allow the interrupt to work all
the time as expected.

Note that the IRQF_NO_SUSPEND flag affects the entire IRQ and not just one
user of it. Thus, if the IRQ is shared, all of the interrupt handlers installed
for it will be executed as usual after suspend_device_irqs(), even if the
IRQF_NO_SUSPEND flag was not passed to request_irq() (or equivalent) by some of
the IRQ's users. For this reason, using IRQF_NO_SUSPEND and IRQF_SHARED at the
same time should be avoided.


System Wakeup Interrupts, enable_irq_wake() and disable_irq_wake()
------------------------------------------------------------------

System wakeup interrupts generally need to be configured to wake up the system
from sleep states, especially if they are used for different purposes (e.g. as
I/O interrupts) in the working state.

That may involve turning on a special signal handling logic within the platform
(such as an SoC) so that signals from a given line are routed in a different way
during system sleep so as to trigger a system wakeup when needed. For example,
the platform may include a dedicated interrupt controller used specifically for
handling system wakeup events. Then, if a given interrupt line is supposed to
wake up the system from sleep states, the corresponding input of that interrupt
controller needs to be enabled to receive signals from the line in question.
After wakeup, it generally is better to disable that input to prevent the
dedicated controller from triggering interrupts unnecessarily.

The IRQ subsystem provides two helper functions to be used by device drivers for
those purposes.
Namely, enable_irq_wake() turns on the platform's logic for handling the given
IRQ as a system wakeup interrupt line and disable_irq_wake() turns that logic
off.

Calling enable_irq_wake() causes suspend_device_irqs() to treat the given IRQ
in a special way. Namely, the IRQ remains enabled, but on the first interrupt
it will be disabled, marked as pending and "suspended" so that it will be
re-enabled by resume_device_irqs() during the subsequent system resume. Also,
the PM core is notified about the event, which causes the system suspend in
progress to be aborted (that doesn't have to happen immediately, but at one
of the points where the suspend thread looks for pending wakeup events).

This way every interrupt from a wakeup interrupt source will either cause the
system suspend currently in progress to be aborted or wake up the system if
already suspended. However, after suspend_device_irqs() interrupt handlers are
not executed for system wakeup IRQs. They are only executed for IRQF_NO_SUSPEND
IRQs at that time, but those IRQs should not be configured for system wakeup
using enable_irq_wake().


Interrupts and Suspend-to-Idle
------------------------------

Suspend-to-idle (also known as the "freeze" sleep state) is a relatively new
system sleep state that works by idling all of the processors and waiting for
interrupts right after the "noirq" phase of suspending devices.

Of course, this means that all of the interrupts with the IRQF_NO_SUSPEND flag
set will bring CPUs out of idle while in that state, but they will not cause the
IRQ subsystem to trigger a system wakeup.

System wakeup interrupts, in turn, will trigger wakeup from suspend-to-idle in
analogy with what they do in the full system suspend case. The only difference
is that the wakeup from suspend-to-idle is signaled using the usual working
state interrupt delivery mechanisms and doesn't require the platform to use
any special interrupt handling logic for it to work.
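
The three ways an IRQ can be treated after suspend_device_irqs(), as described
in the sections above, can be sketched with a small user-space model. This is
only an illustration under stated assumptions: struct toy_irq and every toy_*
function are hypothetical names made up for this sketch, not kernel APIs, and
the real IRQ core tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-IRQ state; not the kernel's struct irq_desc. */
struct toy_irq {
	bool no_suspend;	/* requested with IRQF_NO_SUSPEND */
	bool wake_armed;	/* enable_irq_wake() was called for it */
	bool enabled;
	bool pending;
	bool suspended;		/* to be re-enabled on resume */
	int handler_runs;	/* how many times the handler was invoked */
};

static bool toy_noirq_phase;		/* device IRQs currently suspended */
static bool toy_abort_requested;	/* stands in for notifying the PM core */

/* Model of the suspend step: ordinary IRQs are disabled, others are left on. */
void toy_suspend_device_irqs(struct toy_irq *irqs, size_t n)
{
	assert(irqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!irqs[i].enabled)
			continue;
		/* IRQF_NO_SUSPEND and wakeup IRQs stay enabled. */
		if (irqs[i].no_suspend || irqs[i].wake_armed)
			continue;
		irqs[i].enabled = false;
		irqs[i].suspended = true;
	}
	toy_noirq_phase = true;
}

/* Model of one interrupt arriving on the given IRQ. */
void toy_raise(struct toy_irq *irq)
{
	if (!irq->enabled)
		return;
	if (toy_noirq_phase && irq->wake_armed) {
		/* First wakeup interrupt: no handler, disable and report. */
		irq->enabled = false;
		irq->pending = true;
		irq->suspended = true;
		toy_abort_requested = true;
		return;
	}
	irq->handler_runs++;
}

/* Model of the resume step: re-enable everything marked "suspended". */
void toy_resume_device_irqs(struct toy_irq *irqs, size_t n)
{
	assert(irqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (irqs[i].suspended) {
			irqs[i].suspended = false;
			irqs[i].enabled = true;
		}
	}
	toy_noirq_phase = false;
}

bool toy_suspend_aborted(void)
{
	return toy_abort_requested;
}
```

In this model only the IRQF_NO_SUSPEND line has its handler run during the
"noirq" window, while the wakeup line reports the event instead of running its
handler, which is the distinction the text draws.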


IRQF_NO_SUSPEND and enable_irq_wake()
-------------------------------------

There are no valid reasons to use both enable_irq_wake() and the IRQF_NO_SUSPEND
flag on the same IRQ.

First of all, if the IRQ is not shared, the rules for handling IRQF_NO_SUSPEND
interrupts (interrupt handlers are invoked after suspend_device_irqs()) are
directly at odds with the rules for handling system wakeup interrupts (interrupt
handlers are not invoked after suspend_device_irqs()).

Second, both enable_irq_wake() and IRQF_NO_SUSPEND apply to entire IRQs and not
to individual interrupt handlers, so sharing an IRQ between a system wakeup
interrupt source and an IRQF_NO_SUSPEND interrupt source does not make sense.
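
Because both properties apply to the whole IRQ, the rules above can be checked
by combining the flags of every user of a line. The following is a minimal
sketch of such a check, not anything the kernel provides: the TOY_* constants,
enum toy_verdict and toy_check_irq() are hypothetical, and the bit values are
arbitrary rather than the real IRQF_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; not the kernel's IRQF_* values. */
#define TOY_SHARED	0x1u
#define TOY_NO_SUSPEND	0x2u

enum toy_verdict {
	TOY_OK,
	TOY_AVOID_SHARED_NO_SUSPEND,	/* "should be avoided" */
	TOY_INVALID_WAKE_NO_SUSPEND,	/* "no valid reasons" */
};

/*
 * user_flags holds the flags each user passed when requesting the IRQ.
 * wake_enabled says whether any user called enable_irq_wake() on it.
 */
enum toy_verdict toy_check_irq(const unsigned int *user_flags, size_t n_users,
			       bool wake_enabled)
{
	unsigned int combined = 0;

	assert(user_flags != NULL || n_users == 0);

	/* The flag affects the entire IRQ, so one user is enough to set it. */
	for (size_t i = 0; i < n_users; i++)
		combined |= user_flags[i];

	if ((combined & TOY_NO_SUSPEND) && wake_enabled)
		return TOY_INVALID_WAKE_NO_SUSPEND;
	if ((combined & TOY_NO_SUSPEND) && (combined & TOY_SHARED))
		return TOY_AVOID_SHARED_NO_SUSPEND;
	return TOY_OK;
}
```

Note that the second user of a shared line trips the check even though it never
passed the no-suspend flag itself, which mirrors the point that the flag is a
property of the IRQ and not of one handler.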

arch/x86/kernel/apic/io_apic.c +5 −0

@@ -2623,6 +2623,7 @@ static struct irq_chip ioapic_chip __read_mostly = {
 	.irq_eoi = ack_apic_level,
 	.irq_set_affinity = native_ioapic_set_affinity,
 	.irq_retrigger = ioapic_retrigger_irq,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
 };

 static inline void init_IO_APIC_traps(void)
@@ -3173,6 +3174,7 @@ static struct irq_chip msi_chip = {
 	.irq_ack = ack_apic_edge,
 	.irq_set_affinity = msi_set_affinity,
 	.irq_retrigger = ioapic_retrigger_irq,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
 };

 int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
@@ -3271,6 +3273,7 @@ static struct irq_chip dmar_msi_type = {
 	.irq_ack = ack_apic_edge,
 	.irq_set_affinity = dmar_msi_set_affinity,
 	.irq_retrigger = ioapic_retrigger_irq,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
 };

 int arch_setup_dmar_msi(unsigned int irq)
@@ -3321,6 +3324,7 @@ static struct irq_chip hpet_msi_type = {
 	.irq_ack = ack_apic_edge,
 	.irq_set_affinity = hpet_msi_set_affinity,
 	.irq_retrigger = ioapic_retrigger_irq,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
 };

 int default_setup_hpet_msi(unsigned int irq, unsigned int id)
@@ -3384,6 +3388,7 @@ static struct irq_chip ht_irq_chip = {
 	.irq_ack = ack_apic_edge,
 	.irq_set_affinity = ht_set_affinity,
 	.irq_retrigger = ioapic_retrigger_irq,
+	.flags = IRQCHIP_SKIP_SET_WAKE,
 };

 int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
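
The hunks above do not say what IRQCHIP_SKIP_SET_WAKE changes. The sketch below
models the behaviour as I understand it from the generic IRQ core, which is an
assumption and not something this patch shows: a chip with the flag set lets
enable_irq_wake() succeed without a chip-level wake callback, whereas a chip
with neither the flag nor a callback makes the request fail. All toy_* names
and the flag value are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch only. */
#define TOY_SKIP_SET_WAKE 0x1u

/* Hypothetical, reduced stand-in for struct irq_chip. */
struct toy_chip {
	int (*set_wake)(unsigned int irq, unsigned int on);
	unsigned int flags;
};

/* Decide how a wake request on this chip is resolved. */
int toy_set_irq_wake(const struct toy_chip *chip, unsigned int irq,
		     unsigned int on)
{
	assert(chip != NULL);

	/* The chip declares that no hardware setup is needed: succeed. */
	if (chip->flags & TOY_SKIP_SET_WAKE)
		return 0;

	/* Otherwise defer to the chip's own callback, if it has one. */
	if (chip->set_wake)
		return chip->set_wake(irq, on);

	/* No flag and no callback: the request cannot be honoured. */
	return -ENXIO;
}
```

Under that assumption, setting the flag on the x86 chips is what would allow a
driver on those platforms to call enable_irq_wake() without getting an error
back.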

drivers/base/power/wakeup.c +15 −1

@@ -24,6 +24,9 @@
  */
 bool events_check_enabled __read_mostly;

+/* If set and the system is suspending, terminate the suspend. */
+static bool pm_abort_suspend __read_mostly;
+
 /*
  * Combined counters of registered wakeup events and wakeup events in progress.
  * They need to be modified together atomically, so it's better to use one
@@ -719,7 +722,18 @@ bool pm_wakeup_pending(void)
 		pm_print_active_wakeup_sources();
 	}

-	return ret;
+	return ret || pm_abort_suspend;
+}
+
+void pm_system_wakeup(void)
+{
+	pm_abort_suspend = true;
+	freeze_wake();
+}
+
+void pm_wakeup_clear(void)
+{
+	pm_abort_suspend = false;
 }

 /**

drivers/base/syscore.c +3 −4

@@ -9,7 +9,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/mutex.h>
 #include <linux/module.h>
-#include <linux/interrupt.h>
+#include <linux/suspend.h>
 #include <trace/events/power.h>

 static LIST_HEAD(syscore_ops_list);
@@ -54,9 +54,8 @@ int syscore_suspend(void)
 	pr_debug("Checking wakeup interrupts\n");

 	/* Return error code if there are any wakeup interrupts pending. */
-	ret = check_wakeup_irqs();
-	if (ret)
-		return ret;
+	if (pm_wakeup_pending())
+		return -EBUSY;

 	WARN_ONCE(!irqs_disabled(),
 		"Interrupts enabled before system core suspend.\n");
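
Taken together, the wakeup.c and syscore.c hunks above make a single sticky
flag able to veto the final suspend step. A minimal user-space sketch of that
interaction follows. The toy_* names are hypothetical, the in-progress counter
is a crude stand-in for the existing wakeup-event accounting that produces
`ret` in pm_wakeup_pending(), and the freeze_wake() call is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool toy_abort_suspend;			/* models pm_abort_suspend */
static unsigned int toy_events_in_progress;	/* stand-in for event counters */

void toy_event_start(void)
{
	toy_events_in_progress++;
}

void toy_event_done(void)
{
	assert(toy_events_in_progress > 0);
	toy_events_in_progress--;
}

/* Models pm_wakeup_pending(): the old result OR-ed with the new flag. */
bool toy_wakeup_pending(void)
{
	bool ret = toy_events_in_progress > 0;

	return ret || toy_abort_suspend;
}

/* Models pm_system_wakeup() without the suspend-to-idle wake call. */
void toy_system_wakeup(void)
{
	toy_abort_suspend = true;
}

/* Models pm_wakeup_clear(). */
void toy_wakeup_clear(void)
{
	toy_abort_suspend = false;
}

/* Models the check at the top of syscore_suspend(). */
int toy_syscore_suspend(void)
{
	if (toy_wakeup_pending())
		return -EBUSY;
	return 0;
}
```

The flag stays set until it is cleared explicitly, so in this model a wakeup
reported at any earlier point is still seen by the late check.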

drivers/pci/pcie/pme.c +51 −10

@@ -41,11 +41,17 @@ static int __init pcie_pme_setup(char *str)
 }
 __setup("pcie_pme=", pcie_pme_setup);

+enum pme_suspend_level {
+	PME_SUSPEND_NONE = 0,
+	PME_SUSPEND_WAKEUP,
+	PME_SUSPEND_NOIRQ,
+};
+
 struct pcie_pme_service_data {
 	spinlock_t lock;
 	struct pcie_device *srv;
 	struct work_struct work;
-	bool noirq; /* Don't enable the PME interrupt used by this service. */
+	enum pme_suspend_level suspend_level;
 };

 /**
@@ -223,7 +229,7 @@ static void pcie_pme_work_fn(struct work_struct *work)
 	spin_lock_irq(&data->lock);

 	for (;;) {
-		if (data->noirq)
+		if (data->suspend_level != PME_SUSPEND_NONE)
 			break;

 		pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta);
@@ -250,7 +256,7 @@ static void pcie_pme_work_fn(struct work_struct *work)
 		spin_lock_irq(&data->lock);
 	}

-	if (!data->noirq)
+	if (data->suspend_level == PME_SUSPEND_NONE)
 		pcie_pme_interrupt_enable(port, true);

 	spin_unlock_irq(&data->lock);
@@ -367,6 +373,21 @@ static int pcie_pme_probe(struct pcie_device *srv)
 	return ret;
 }

+static bool pcie_pme_check_wakeup(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+
+	if (!bus)
+		return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+		if (device_may_wakeup(&dev->dev)
+		    || pcie_pme_check_wakeup(dev->subordinate))
+			return true;
+
+	return false;
+}
+
 /**
  * pcie_pme_suspend - Suspend PCIe PME service device.
  * @srv: PCIe service device to suspend.
@@ -375,11 +396,26 @@ static int pcie_pme_suspend(struct pcie_device *srv)
 {
 	struct pcie_pme_service_data *data = get_service_data(srv);
 	struct pci_dev *port = srv->port;
+	bool wakeup;

+	if (device_may_wakeup(&port->dev)) {
+		wakeup = true;
+	} else {
+		down_read(&pci_bus_sem);
+		wakeup = pcie_pme_check_wakeup(port->subordinate);
+		up_read(&pci_bus_sem);
+	}
 	spin_lock_irq(&data->lock);
+	if (wakeup) {
+		enable_irq_wake(srv->irq);
+		data->suspend_level = PME_SUSPEND_WAKEUP;
+	} else {
+		struct pci_dev *port = srv->port;
+
 		pcie_pme_interrupt_enable(port, false);
 		pcie_clear_root_pme_status(port);
-	data->noirq = true;
+		data->suspend_level = PME_SUSPEND_NOIRQ;
+	}
 	spin_unlock_irq(&data->lock);

 	synchronize_irq(srv->irq);
@@ -394,12 +430,17 @@ static int pcie_pme_suspend(struct pcie_device *srv)
 static int pcie_pme_resume(struct pcie_device *srv)
 {
 	struct pcie_pme_service_data *data = get_service_data(srv);
-	struct pci_dev *port = srv->port;

 	spin_lock_irq(&data->lock);
-	data->noirq = false;
+	if (data->suspend_level == PME_SUSPEND_NOIRQ) {
+		struct pci_dev *port = srv->port;
+
 		pcie_clear_root_pme_status(port);
 		pcie_pme_interrupt_enable(port, true);
+	} else {
+		disable_irq_wake(srv->irq);
+	}
+	data->suspend_level = PME_SUSPEND_NONE;
 	spin_unlock_irq(&data->lock);

 	return 0;
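
The suspend and resume paths in the pme.c hunks above amount to a small state
machine driven by a recursive walk of the bus hierarchy below the port. The
user-space sketch below shows that shape only. The toy_* types and functions
are hypothetical, a plain linked list replaces the kernel's list helpers, and
locking, PME status clearing and synchronize_irq() are all omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_suspend_level {
	TOY_SUSPEND_NONE = 0,
	TOY_SUSPEND_WAKEUP,
	TOY_SUSPEND_NOIRQ,
};

struct toy_bus;

/* Hypothetical, reduced stand-in for struct pci_dev. */
struct toy_dev {
	bool may_wakeup;
	struct toy_bus *subordinate;	/* NULL unless the device is a bridge */
	struct toy_dev *next;		/* next device on the same bus */
};

struct toy_bus {
	struct toy_dev *devices;
};

/* True if any device on this bus, or on any bus below it, may wake up. */
bool toy_check_wakeup(const struct toy_bus *bus)
{
	if (!bus)
		return false;

	for (const struct toy_dev *d = bus->devices; d; d = d->next)
		if (d->may_wakeup || toy_check_wakeup(d->subordinate))
			return true;

	return false;
}

struct toy_port {
	struct toy_dev dev;
	enum toy_suspend_level level;
	bool pme_irq_enabled;	/* PME interrupt generation at the port */
	bool irq_wake;		/* models enable_irq_wake()/disable_irq_wake() */
};

void toy_pme_suspend(struct toy_port *p)
{
	assert(p->level == TOY_SUSPEND_NONE);

	if (p->dev.may_wakeup || toy_check_wakeup(p->dev.subordinate)) {
		/* Something may wake the system: keep PME on, arm wakeup. */
		p->irq_wake = true;
		p->level = TOY_SUSPEND_WAKEUP;
	} else {
		/* Nothing can wake the system: turn the PME interrupt off. */
		p->pme_irq_enabled = false;
		p->level = TOY_SUSPEND_NOIRQ;
	}
}

void toy_pme_resume(struct toy_port *p)
{
	if (p->level == TOY_SUSPEND_NOIRQ)
		p->pme_irq_enabled = true;
	else
		p->irq_wake = false;
	p->level = TOY_SUSPEND_NONE;
}
```

Recording which branch was taken at suspend time is what lets the resume path
undo exactly that branch, which a single boolean could not express once a
second suspended state was added.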