Documentation/DocBook/genericirq.tmpl (+52 −32)

@@ -28,7 +28,7 @@
 </authorgroup>
 <copyright>
-	<year>2005-2006</year>
+	<year>2005-2010</year>
 	<holder>Thomas Gleixner</holder>
 </copyright>
 <copyright>

@@ -100,6 +100,10 @@
 	<listitem><para>Edge type</para></listitem>
 	<listitem><para>Simple type</para></listitem>
 	</itemizedlist>
+	During the implementation we identified another type:
+	<itemizedlist>
+	<listitem><para>Fast EOI type</para></listitem>
+	</itemizedlist>
 	In the SMP world of the __do_IRQ() super-handler another type
 	was identified:
 	<itemizedlist>

@@ -153,6 +157,7 @@ is still available. This leads to a kind of duality for the time
 	being. Over time the new model should be used in more and more
 	architectures, as it enables smaller and cleaner IRQ subsystems.
+	It's deprecated for three years now and about to be removed.
 	</para>
 </chapter>
 <chapter id="bugs">

@@ -217,6 +222,7 @@
 	<itemizedlist>
 	<listitem><para>handle_level_irq</para></listitem>
 	<listitem><para>handle_edge_irq</para></listitem>
+	<listitem><para>handle_fasteoi_irq</para></listitem>
 	<listitem><para>handle_simple_irq</para></listitem>
 	<listitem><para>handle_percpu_irq</para></listitem>
 	</itemizedlist>

@@ -233,33 +239,33 @@ are used by the default flow implementations.
 	The following helper functions are implemented (simplified excerpt):
 	<programlisting>
-default_enable(irq)
+default_enable(struct irq_data *data)
 {
-	desc->chip->unmask(irq);
+	desc->chip->irq_unmask(data);
 }

-default_disable(irq)
+default_disable(struct irq_data *data)
 {
-	if (!delay_disable(irq))
-		desc->chip->mask(irq);
+	if (!delay_disable(data))
+		desc->chip->irq_mask(data);
 }

-default_ack(irq)
+default_ack(struct irq_data *data)
 {
-	chip->ack(irq);
+	chip->irq_ack(data);
 }

-default_mask_ack(irq)
+default_mask_ack(struct irq_data *data)
 {
-	if (chip->mask_ack) {
-		chip->mask_ack(irq);
+	if (chip->irq_mask_ack) {
+		chip->irq_mask_ack(data);
 	} else {
-		chip->mask(irq);
-		chip->ack(irq);
+		chip->irq_mask(data);
+		chip->irq_ack(data);
 	}
 }

-noop(irq)
+noop(struct irq_data *data)
 {
 }

@@ -278,9 +284,24 @@ noop(irq)
 	<para>
 	The following control flow is implemented (simplified excerpt):
 	<programlisting>
-desc->chip->start();
+desc->chip->irq_mask();
 handle_IRQ_event(desc->action);
-desc->chip->end();
+desc->chip->irq_unmask();
 	</programlisting>
 	</para>
 </sect3>
+<sect3 id="Default_FASTEOI_IRQ_flow_handler">
+	<title>Default Fast EOI IRQ flow handler</title>
+	<para>
+	handle_fasteoi_irq provides a generic implementation for
+	interrupts, which only need an EOI at the end of the handler.
+	</para>
+	<para>
+	The following control flow is implemented (simplified excerpt):
+	<programlisting>
+handle_IRQ_event(desc->action);
+desc->chip->irq_eoi();
+	</programlisting>
+	</para>
+</sect3>

@@ -294,20 +315,19 @@ desc->chip->end();
 	The following control flow is implemented (simplified excerpt):
 	<programlisting>
 if (desc->status & running) {
-	desc->chip->hold();
+	desc->chip->irq_mask();
 	desc->status |= pending | masked;
 	return;
 }
-desc->chip->start();
+desc->chip->irq_ack();
 desc->status |= running;
 do {
 	if (desc->status & masked)
-		desc->chip->enable();
+		desc->chip->irq_unmask();
 	desc->status &= ~pending;
 	handle_IRQ_event(desc->action);
 } while (status & pending);
 desc->status &= ~running;
-desc->chip->end();
 	</programlisting>
 	</para>
 </sect3>

@@ -342,9 +362,9 @@ handle_IRQ_event(desc->action);
 	<para>
 	The following control flow is implemented (simplified excerpt):
 	<programlisting>
-desc->chip->start();
 handle_IRQ_event(desc->action);
-desc->chip->end();
+if (desc->chip->irq_eoi)
+	desc->chip->irq_eoi();
 	</programlisting>
 	</para>
 </sect3>

@@ -375,8 +395,7 @@ desc->chip->end();
 	mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND
 	when you want to use the delayed interrupt disable feature and
 	your hardware is not capable of retriggering an interrupt.)
-	The delayed interrupt disable can be runtime enabled, per
-	interrupt, by setting the IRQ_DELAYED_DISABLE flag in the
-	irq_desc status field.
+	The delayed interrupt disable is not configurable.
 	</para>
 </sect2>
 </sect1>

@@ -387,13 +406,13 @@ desc->chip->end();
 	contains all the direct chip relevant functions, which can be
 	utilized by the irq flow implementations.
 	<itemizedlist>
-	<listitem><para>ack()</para></listitem>
-	<listitem><para>mask_ack() - Optional, recommended for performance</para></listitem>
-	<listitem><para>mask()</para></listitem>
-	<listitem><para>unmask()</para></listitem>
-	<listitem><para>retrigger() - Optional</para></listitem>
-	<listitem><para>set_type() - Optional</para></listitem>
-	<listitem><para>set_wake() - Optional</para></listitem>
+	<listitem><para>irq_ack()</para></listitem>
+	<listitem><para>irq_mask_ack() - Optional, recommended for performance</para></listitem>
+	<listitem><para>irq_mask()</para></listitem>
+	<listitem><para>irq_unmask()</para></listitem>
+	<listitem><para>irq_retrigger() - Optional</para></listitem>
+	<listitem><para>irq_set_type() - Optional</para></listitem>
+	<listitem><para>irq_set_wake() - Optional</para></listitem>
 	</itemizedlist>
 	These primitives are strictly intended to mean what they say:
 	ack means ACK, masking means masking of an IRQ line, etc.
 	It is up to the flow handler(s) to use these basic units of
 	low-level functionality.

@@ -458,6 +477,7 @@ desc->chip->end();
 	<para>
 	This chapter contains the autogenerated documentation of the
 	internal functions.
 	</para>
+!Ikernel/irq/irqdesc.c
 !Ikernel/irq/handle.c
 !Ikernel/irq/chip.c
 </chapter>

Documentation/DocBook/kernel-locking.tmpl (+4 −10)

@@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
 	all the readers who were traversing the list when we deleted the
 	element are finished.  We use <function>call_rcu()</function> to
 	register a callback which will actually destroy the object once
-	the readers are finished.
+	all pre-existing readers are finished.  Alternatively,
+	<function>synchronize_rcu()</function> may be used to block
+	until all pre-existing readers are finished.
 	</para>
 	<para>
 	But how does Read Copy Update know when the readers are

@@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
 -	object_put(obj);
 +	list_del_rcu(&obj->list);
 	cache_num--;
-+	call_rcu(&obj->rcu, cache_delete_rcu, obj);
++	call_rcu(&obj->rcu, cache_delete_rcu);
 }

 /* Must be holding cache_lock */

@@ -1725,14 +1727,6 @@ the amount of locking which needs to be done.
 	if (++cache_num > MAX_CACHE_SIZE) {
 		struct object *i, *outcast = NULL;
 		list_for_each_entry(i, &cache, list) {
 @@ -85,6 +94,7 @@
 	obj->popularity = 0;
 	atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
 	spin_lock_init(&obj->lock);
+	INIT_RCU_HEAD(&obj->rcu);

 	spin_lock_irqsave(&cache_lock, flags);
 	__cache_add(obj);
 @@ -104,12 +114,11 @@
 struct object *cache_find(int id)
 {

Documentation/RCU/checklist.txt (+39 −7)

@@ -218,13 +218,22 @@ over a rather long period of time, but improvements are always welcome!
 	include:

 	a.	Keeping a count of the number of data-structure elements
 		used by the RCU-protected data structure, including
 		those waiting for a grace period to elapse.
 		Enforce a limit on this number, stalling updates as
 		needed to allow previously deferred frees to complete.
 		Alternatively, limit only the number awaiting deferred
 		free rather than the total number of elements.
+
+		One way to stall the updates is to acquire the
+		update-side mutex.  (Don't try this with a spinlock --
+		other CPUs spinning on the lock could prevent the
+		grace period from ever ending.)  Another way to stall
+		the updates is for the updates to use a wrapper
+		function around the memory allocator, so that this
+		wrapper function simulates OOM when there is too much
+		memory awaiting an RCU grace period.  There are of
+		course many other variations on this theme.

 	b.	Limiting update rate.  For example, if updates occur
 		only once per hour, then no explicit rate limiting is
 		required,

@@ -365,3 +374,26 @@ over a rather long period of time, but improvements are always welcome!
 	and the compiler to freely reorder code into and out of RCU
 	read-side critical sections.  It is the responsibility of the
 	RCU update-side primitives to deal with this.
+
+17.	Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
+	the __rcu sparse checks to validate your RCU code.  These
+	can help find problems as follows:
+
+	CONFIG_PROVE_RCU: check that accesses to RCU-protected data
+		structures are carried out under the proper RCU
+		read-side critical section, while holding the right
+		combination of locks, or whatever other conditions
+		are appropriate.
+
+	CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
+		same object to call_rcu() (or friends) before an RCU
+		grace period has elapsed since the last time that you
+		passed that same object to call_rcu() (or friends).
+
+	__rcu sparse checks: tag the pointer to the RCU-protected
+		data structure with __rcu, and sparse will warn you
+		if you access that pointer without the services of
+		one of the variants of rcu_dereference().
+
+	These debugging aids can help you find problems that are
+	otherwise extremely difficult to spot.

Documentation/RCU/stallwarn.txt (+18 −0)

@@ -80,6 +80,24 @@ o	A CPU looping with bottom halves disabled.  This condition can

 o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
 	kernel without invoking schedule().

+o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which
+	might happen to preempt a low-priority task in the middle of
+	an RCU read-side critical section.  This is especially
+	damaging if that low-priority task is not permitted to run
+	on any other CPU, in which case the next RCU grace period
+	can never complete, which will eventually cause the system
+	to run out of memory and hang.  While the system is in the
+	process of running itself out of memory, you might see
+	stall-warning messages.
+
+o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel
+	that is running at a higher priority than the RCU softirq
+	threads.  This will prevent RCU callbacks from ever being
+	invoked, and in a CONFIG_TREE_PREEMPT_RCU kernel will
+	further prevent RCU grace periods from ever completing.
+	Either way, the system will eventually run out of memory
+	and hang.  In the CONFIG_TREE_PREEMPT_RCU case, you might
+	see stall-warning messages.
+
 o	A bug in the RCU implementation.

 o	A hardware failure.  This is quite unlikely, but has occurred

Documentation/RCU/trace.txt (+12 −1)

@@ -125,6 +125,17 @@ o	"b" is the batch limit for this CPU.  If more than this number
 	of RCU callbacks is ready to invoke, then the remainder will
 	be deferred.

 o	"ci" is the number of RCU callbacks that have been invoked
 	for this CPU.
+	Note that ci+ql is the number of callbacks that have been
+	registered in absence of CPU-hotplug activity.
+
+o	"co" is the number of RCU callbacks that have been orphaned
+	due to this CPU going offline.
+
+o	"ca" is the number of RCU callbacks that have been adopted
+	due to other CPUs going offline.  Note that ci+co-ca+ql is
+	the number of RCU callbacks registered on this CPU.

 There is also an rcu/rcudata.csv file with the same information
 in comma-separated-variable spreadsheet format.

@@ -180,7 +191,7 @@ o	"s" is the "signaled" state that drives force_quiescent_state()'s

 o	"jfq" is the number of jiffies remaining for this grace period
 	before force_quiescent_state() is invoked to help push things
-	along.  Note that CPUs in dyntick-idle mode thoughout the grace
+	along.  Note that CPUs in dyntick-idle mode throughout the grace
 	period will not report on their own, but rather must be checked
 	by some other CPU via force_quiescent_state().