Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPP-1923] TCP timer race #696

Open
vvalderrv opened this issue Jan 7, 2025 · 0 comments
Open

[VPP-1923] TCP timer race #696

vvalderrv opened this issue Jan 7, 2025 · 0 comments

Comments

@vvalderrv
Copy link
Contributor

The issue was reproduced using a test for Travelping UPF VPP plugin https://github.com/travelping/vpp

The test creates 40K TCP connections using wrk tool and also uses netem to simulate suboptimal network conditions.

The work is presently based on commit afc233aa93c3f23b30b756cb4ae2967f968bbbb1 and from May 21 2020 and wan't rebased for a while, but as far as I can see there were no changes in the workings of TCP timers in master since that time.

The stack trace looks like this:

/src/vpp/src/vnet/tcp/tcp_output.c:1268 (tcp_prepare_retransmit_segment) assertion `max_deq_bytes != 0' fails

Program received signal SIGABRT, Aborted.
0x00007ffff4615f47 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#0 0x00007ffff4615f47 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff46178b1 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00000000004071d3 in os_panic () at /src/vpp/src/vpp/vnet/main.c:366
#3 0x00007ffff54f6609 in debugger () at /src/vpp/src/vppinfra/error.c:84
#4 0x00007ffff54f6387 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff76b8570 "%s:%d (%s) assertion `%s' fails") at /src/vpp/src/vppinfra/error.c:143
#5 0x00007ffff7058893 in tcp_prepare_retransmit_segment (wrk=0x7fffb540c900, tc=0x7fffd86ccac0, offset=0, max_deq_bytes=0, b=0x7fffb4cd0190) at /src/vpp/src/vnet/tcp/tcp_output.c:1268
#6 0x00007ffff7057a83 in tcp_timer_retransmit_handler (tc=0x7fffd86ccac0) at /src/vpp/src/vnet/tcp/tcp_output.c:1419
#7 0x00007ffff70a0e5e in tcp_dispatch_pending_timers (wrk=0x7fffb540c900) at /src/vpp/src/vnet/tcp/tcp.c:1107
#8 0x00007ffff709eded in tcp_update_time (now=50.494292161111133, thread_index=0 '\000') at /src/vpp/src/vnet/tcp/tcp.c:1143
#9 0x00007ffff7418742 in transport_update_time (time_now=50.494292161111133, thread_index=0 '\000') at /src/vpp/src/vnet/session/transport.c:779
#10 0x00007ffff73fb63e in session_queue_node_fn (vm=0x7ffff63849c0 <vlib_global_main>, node=0x7fffb48ac640, frame=0x0) at /src/vpp/src/vnet/session/session_node.c:1377
#11 0x00007ffff60cc4d5 in dispatch_node (vm=0x7ffff63849c0 <vlib_global_main>, node=0x7fffb48ac640, type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=8435112566995202) at /src/vpp/src/vlib/main.c:1235
#12 0x00007ffff60c6f75 in vlib_main_or_worker_loop (vm=0x7ffff63849c0 <vlib_global_main>, is_main=1) at /src/vpp/src/vlib/main.c:1815
#13 0x00007ffff60c8f4a in vlib_main_loop (vm=0x7ffff63849c0 <vlib_global_main>) at /src/vpp/src/vlib/main.c:1990
#14 0x00007ffff60c8d1f in vlib_main (vm=0x7ffff63849c0 <vlib_global_main>, input=0x7fffb4cd0fa8) at /src/vpp/src/vlib/main.c:2236
#15 0x00007ffff614f425 in thread0 (arg=140737324272064) at /src/vpp/src/vlib/unix/main.c:658
#16 0x00007ffff551a794 in clib_calljmp () at /src/vpp/src/vppinfra/longjmp.S:123
#17 0x00007fffffffd6b0 in ?? ()
#18 0x00007ffff614efb7 in vlib_unix_main (argc=61, argv=0x71d4d0) at /src/vpp/src/vlib/unix/main.c:730
#19 0x00000000004068b8 in main (argc=61, argv=0x71d4d0) at /src/vpp/src/vpp/vnet/main.c:291

The line numbers may be slightly off due to added debug print when this was captured, sorry about that, but I will describe the details of the issue in a moment.

At the end of this report, there's elog captured for that connection using aforementioned debug print added at the point of assertion.

What happens here:

  1. SYN is received for the connection which is created in SYN_RCVD state
  2. SYN-ACK is sent back to the client and retransmit timer is set / updated for the connection
  3. Client does not reply in time (due to netem dropping packets) and the retransmit timer is enqueued into wrk->pending_timers via tcp_expired_timers_dispatch():
    clib_fifo_add (wrk->pending_timers, expired_timers, n_expired);
  4. The timer handler is called
    tcp_timer_retransmit_handler (tcp_connection_t * tc)
  5. As the connection is in SYN_RCVD state, SYN+ACK is retransmitted

    vpp/src/vnet/tcp/tcp_output.c

    Lines 1410 to 1443 in 94dbf95

    TCP_EVT (TCP_EVT_CC_EVT, tc, 2);
    tc->rtt_ts = 0;
    /* Passive open establish timeout */
    if (tc->rto > TCP_ESTABLISH_TIME >> 1)
    {
    tcp_connection_set_state (tc, TCP_STATE_CLOSED);
    tcp_connection_timers_reset (tc);
    tcp_program_cleanup (wrk, tc);
    tcp_worker_stats_inc (wrk, tr_abort, 1);
    return;
    }
    if (PREDICT_FALSE (!vlib_buffer_alloc (vm, &bi, 1)))
    {
    tcp_timer_update (&wrk->timer_wheel, tc, TCP_TIMER_RETRANSMIT, 1);
    tcp_worker_stats_inc (wrk, no_buffer, 1);
    return;
    }
    tc->rto_boff += 1;
    if (tc->rto_boff > TCP_RTO_SYN_RETRIES)
    tc->rto = clib_min (tc->rto << 1, TCP_RTO_MAX);
    tcp_retransmit_timer_force_update (&wrk->timer_wheel, tc);
    b = vlib_get_buffer (vm, bi);
    tcp_init_buffer (vm, b);
    tcp_make_synack (tc, b);
    TCP_EVT (TCP_EVT_SYN_RXT, tc, 1);
    /* Retransmit timer already updated, just enqueue to output */
    tcp_enqueue_to_output (wrk, b, bi, tc->c_is_ip4);
  6. The timer is rearmed
  7. For several times, the process repeats itself (go to p.3)

Then, at some point, an ACK is received from the server after p.3 but before p.4. So the timer handler is already enqueued for retransmit, but the connection goes into ESTABLISHED state before this happens. This results in a retransmit attempt with no data enqueued here

n_bytes = tcp_prepare_retransmit_segment (wrk, tc, 0, n_bytes, &b);
 , n_bytes being 0, after which an assertion failure happens and VPP dies.

The problem here is that tcp_timer_reset() doesn't remove pending entries from the queue

tcp_timer_reset (tcp_timer_wheel_t * tw, tcp_connection_t * tc, u8 timer_id)
 and doing so is not actually easy. As a band-aid solution, I made the retransmit handler tolerate the zero-retransmit:

diff --git a/src/vnet/tcp/tcp_output.c b/src/vnet/tcp/tcp_output.c
index afc652029..5cbe36494 100644
— a/src/vnet/tcp/tcp_output.c
+++ b/src/vnet/tcp/tcp_output.c
@@ -1383,6 +1383,11 @@ tcp_timer_retransmit_handler (tcp_connection_t * tc)
/* Send the first unacked segment. If we're short on buffers, return

  • as soon as possible */
    n_bytes = clib_min (tc->snd_mss, tc->snd_nxt - tc->snd_una);
    + if (n_bytes == 0)
    + { + // late SYN+ACK retransmit timer firing + return; + }
    n_bytes = tcp_prepare_retransmit_segment (wrk, tc, 0, n_bytes, &b);
    if (!n_bytes)
    {

but that doesn't seem to be a right solution, as there may be other issues lurking due to assumption that the timers which are reset don't fire after that.

With this fix, TCP crashes for me later in this test with different diagnostics, which may or may not be related to timers. Maybe it's actually a bug in SVM FIFO, didn't investigate that enough yet (f->start_chunk being zero when f_update_ooo_deq()) is called:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff5e5d24f in f_update_ooo_deq (f=0x7fff0fb35880, start_pos=0,
end_pos=45) at /src/vpp/src/svm/svm_fifo.c:660
660 /src/vpp/src/svm/svm_fifo.c: No such file or directory.
#0 0x00007ffff5e5d24f in f_update_ooo_deq (f=0x7fff0fb35880, start_pos=0,
end_pos=45) at /src/vpp/src/svm/svm_fifo.c:660
#1 0x00007ffff5e5d186 in svm_fifo_peek (f=0x7fff0fb35880, offset=0, len=45,
Type <return> to continue, or q <return> to quit

d: Wed, 15 Jul 2020 14:37:38 GMT\r\nConnection: keep-alive\r\nETag: \"5f0f14b"...)
at /src/vpp/src/svm/svm_fifo.c:1054
#2 0x00007ffff740605e in session_tx_fill_buffer (vm=0x7ffff63a59c0 <vlib_global_main>,
ctx=0x7fffb4e1ebc0, b=0x1001d94000, n_bufs=0x7fffb56981ce, peek_data=1 '\001')
at /src/vpp/src/vnet/session/session_node.c:667
#3 0x00007ffff740156c in session_tx_fifo_read_and_snd_i (wrk=0x7fffb4e1eb80, node=0x7fffb4922080,
elt=0x7fffd7650a1c, n_tx_packets=0x7fffb56987d8, peek_data=1 '\001')
at /src/vpp/src/vnet/session/session_node.c:1064
#4 0x00007ffff7400703 in session_tx_fifo_peek_and_snd (wrk=0x7fffb4e1eb80, node=0x7fffb4922080,
e=0x7fffd7650a1c, n_tx_packets=0x7fffb56987d8) at /src/vpp/src/vnet/session/session_node.c:1112
#5 0x00007ffff740a2ac in session_event_dispatch_io (wrk=0x7fffb4e1eb80, node=0x7fffb4922080,
elt=0x7fffd7650a1c, thread_index=0, n_tx_packets=0x7fffb56987d8)
at /src/vpp/src/vnet/session/session_node.c:1272
#6 0x00007ffff7403fa2 in session_queue_node_fn (vm=0x7ffff63a59c0 <vlib_global_main>,
node=0x7fffb4922080, frame=0x0) at /src/vpp/src/vnet/session/session_node.c:1432
#7 0x00007ffff60ee175 in dispatch_node (vm=0x7ffff63a59c0 <vlib_global_main>, node=0x7fffb4922080,
type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
last_time_stamp=8465767685985762) at /src/vpp/src/vlib/main.c:1235
#8 0x00007ffff60e8c15 in vlib_main_or_worker_loop (vm=0x7ffff63a59c0 <vlib_global_main>, is_main=1)
at /src/vpp/src/vlib/main.c:1815
#9 0x00007ffff60eabea in vlib_main_loop (vm=0x7ffff63a59c0 <vlib_global_main>)
at /src/vpp/src/vlib/main.c:1990
#10 0x00007ffff60ea9bf in vlib_main (vm=0x7ffff63a59c0 <vlib_global_main>, input=0x7fffb5698fa8)
at /src/vpp/src/vlib/main.c:2236
#11 0x00007ffff6170fa5 in thread0 (arg=140737324407232) at /src/vpp/src/vlib/unix/main.c:658
#12 0x00007ffff553d744 in clib_calljmp () at /src/vpp/src/vppinfra/longjmp.S:123
#13 0x00007fffffffd6b0 in ?? ()
#14 0x00007ffff6170b37 in vlib_unix_main (argc=61, argv=0x71d4d0) at /src/vpp/src/vlib/unix/main.c:730
#15 0x00000000004068b8 in main (argc=61, argv=0x71d4d0) at /src/vpp/src/vpp/vnet/main.c:291

Additional info 

elog for the TCP connection:

22.828418901: lcl: 198.19.0.2:80
22.828419081: rmt: 198.18.0.2:52752
22.828419141: syn-rx: cidx 9356 sidx 9358 irs 4158682863
22.828419197: state: syn-rcvd cidx 9356 sidx 9358
22.828437277: synack-tx: cidx 9356 sidx 9358 snd_una 0 snd_nxt 1 rcv_nxt 1
22.857900159: out: flags 12, bytes 40
22.957510075: timer-pop: retransmit cidx 9356 sidx 9358
23.145276195: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
23.145292312: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
23.145445507: out: flags 12, bytes 40
23.384408719: timer-pop: retransmit cidx 9356 sidx 9358
23.607211673: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
23.607237791: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
23.608412308: out: flags 12, bytes 40
23.851170613: timer-pop: retransmit cidx 9356 sidx 9358
23.999213139: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
23.999226549: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
24.002257791: out: flags 12, bytes 40
24.214047256: timer-pop: retransmit cidx 9356 sidx 9358
24.371062133: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
24.371074807: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
24.371108788: out: flags 12, bytes 40
24.864990968: timer-pop: retransmit cidx 9356 sidx 9358
24.901257625: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
24.901269878: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
24.901411387: out: flags 12, bytes 40
25.765769822: timer-pop: retransmit cidx 9356 sidx 9358
25.882664424: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
25.882679897: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
25.882852569: out: flags 12, bytes 40
27.529723082: timer-pop: retransmit cidx 9356 sidx 9358
27.586602849: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
27.586618753: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
27.586855949: out: flags 12, bytes 40
30.820255771: timer-pop: retransmit cidx 9356 sidx 9358
30.941896684: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
30.941908639: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
30.941976769: out: flags 12, bytes 40
37.476458664: timer-pop: retransmit cidx 9356 sidx 9358
37.548429419: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
37.548438776: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
37.548603744: out: flags 12, bytes 40
50.387905019: timer-pop: retransmit cidx 9356 sidx 9358
50.469505878: state: established cidx 9356 sidx 9358
50.469517778: in: order len 45 written 45 rcv_nxt 46 rcv_wnd(o) 4096
50.472787714: enq: length 45
50.476222077: ack-tx: acked 45 rcv_nxt 46 rcv_wnd 65536 snd_nxt 1 snd_wnd 62720
50.476222412: huh?: rcv_wnd 65536 obsd 65536 av 65491 rcv_nxt 46 rcv_las 46
50.476521576: out: flags 10, bytes 32
50.494423659: cc_stat: cwnd 4020 flight 0 space 4014 ssthresh 2147483647 snd_wnd 62720
50.494424496: rcv_stat: rto 1112 srtt 212 mrtt-us 1000000 rttvar 225
50.494424548: snd_stat: cc_space 4020 sacked 0 lost 0 out 0 rxt 0
50.494424771: cc: rxt-timeout snd_space 4020 snd_una 1 out 0 flight 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant