Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 9b462d02 authored by Eric Dumazet's avatar Eric Dumazet Committed by David S. Miller
Browse files

tcp: TCP Small Queues and strange attractors

TCP Small queues tries to keep number of packets in qdisc
as small as possible, and depends on a tasklet to feed following
packets at TX completion time.
Choice of tasklet was driven by latencies requirements.

Then, TCP stack tries to avoid reorders, by locking flows with
outstanding packets in qdisc in a given TX queue.

What can happen is that many flows get attracted by a low performing
TX queue, and cpu servicing TX completion has to feed packets for all of
them, making this cpu 100% busy in softirq mode.

This became particularly visible with latest skb->xmit_more support

Strategy adopted in this patch is to detect when tcp_wfree() is called
from ksoftirqd and let the outstanding queue for this flow being drained
before feeding additional packets, so that skb->ooo_okay can be set
to allow select_queue() to select the optimal queue :

Incoming ACKS are normally handled by different cpus, so this patch
gives more chance for these cpus to take over the burd...
parent 82b009f9
Loading
Loading
Loading
Loading
+19 −7
Original line number Diff line number Diff line
@@ -839,26 +839,38 @@ void tcp_wfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;
	struct tcp_sock *tp = tcp_sk(sk);
	int wmem;

	/* Keep one reference on sk_wmem_alloc.
	 * Will be released by sk_free() from here or tcp_tasklet_func()
	 */
	wmem = atomic_sub_return(skb->truesize - 1, &sk->sk_wmem_alloc);

	/* If this softirq is serviced by ksoftirqd, we are likely under stress.
	 * Wait until our queues (qdisc + devices) are drained.
	 * This gives :
	 * - less callbacks to tcp_write_xmit(), reducing stress (batches)
	 * - chance for incoming ACK (processed by another cpu maybe)
	 *   to migrate this flow (skb->ooo_okay will be eventually set)
	 */
	if (wmem >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
		goto out;

	if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) &&
	    !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) {
		unsigned long flags;
		struct tsq_tasklet *tsq;

		/* Keep a ref on socket.
		 * This last ref will be released in tcp_tasklet_func()
		 */
		atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc);

		/* queue this socket to tasklet queue */
		local_irq_save(flags);
		tsq = &__get_cpu_var(tsq_tasklet);
		list_add(&tp->tsq_node, &tsq->head);
		tasklet_schedule(&tsq->tasklet);
		local_irq_restore(flags);
	} else {
		sock_wfree(skb);
		return;
	}
out:
	sk_free(sk);
}

/* This routine actually transmits TCP packets queued in by