Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 3 features #1

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Add 3 features #1

wants to merge 5 commits into from

Conversation

jiho-jung
Copy link
Collaborator

  1. http-request options
    set-proxy-v2-header
    set-proxy-v2-tlv-header

  2. server option
    set-proxy-v2-tlv

  3. done memory leakage detection under valgrind

do nothing
6205 ==16466== LEAK SUMMARY:
6206 ==16466== definitely lost: 0 bytes in 0 blocks
6207 ==16466== indirectly lost: 0 bytes in 0 blocks
6208 ==16466== possibly lost: 512,353 bytes in 70 blocks
6209 ==16466== still reachable: 239,043,476 bytes in 4,231 blocks
6210 ==16466== suppressed: 0 bytes in 0 blocks

only http
6600 ==18082== LEAK SUMMARY:
6601 ==18082== definitely lost: 0 bytes in 0 blocks
6602 ==18082== indirectly lost: 0 bytes in 0 blocks
6603 ==18082== possibly lost: 622,001 bytes in 146 blocks
6604 ==18082== still reachable: 239,043,880 bytes in 4,232 blocks
6605 ==18082== suppressed: 0 bytes in 0 blocks
6606 ==18082==
6607 ==18082== ERROR SUMMARY: 58 errors from 58 contexts (suppressed: 0 from 0)
6608 ==18082== could not unlink /tmp/vgdb-pipe-from-vgdb-to-18082-by-root-on-sun
6609 ==18082== could not unlink /tmp/vgdb-pipe-to-vgdb-from-18082-by-root-on-sun
6610 ==18082== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-18082-by-root-on-sun

only accept-proxy and send-proxy-v2
6895 ==19881== LEAK SUMMARY:
6896 ==19881== definitely lost: 0 bytes in 0 blocks
6897 ==19881== indirectly lost: 0 bytes in 0 blocks
6898 ==19881== possibly lost: 740,801 bytes in 286 blocks
6899 ==19881== still reachable: 239,043,476 bytes in 4,231 blocks
6900 ==19881== suppressed: 0 bytes in 0 blocks
6901 ==19881==
6902 ==19881== ERROR SUMMARY: 77 errors from 77 contexts (suppressed: 0 from 0)
6903 ==19881== could not unlink /tmp/vgdb-pipe-from-vgdb-to-19881-by-root-on-sun
6904 ==19881== could not unlink /tmp/vgdb-pipe-to-vgdb-from-19881-by-root-on-sun
6905 ==19881== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-19881-by-root-on-sun

my patch
6921 ==21025== LEAK SUMMARY:
6922 ==21025== definitely lost: 0 bytes in 0 blocks
6923 ==21025== indirectly lost: 0 bytes in 0 blocks
6924 ==21025== possibly lost: 698,705 bytes in 222 blocks
6925 ==21025== still reachable: 239,048,325 bytes in 4,269 blocks
6926 ==21025== suppressed: 0 bytes in 0 blocks
6927 ==21025==
6928 ==21025== ERROR SUMMARY: 69 errors from 69 contexts (suppressed: 0 from 0)
6929 ==21025== could not unlink /tmp/vgdb-pipe-from-vgdb-to-21025-by-root-on-sun
6930 ==21025== could not unlink /tmp/vgdb-pipe-to-vgdb-from-21025-by-root-on-sun
6931 ==21025== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-21025-by-root-on-sun

 1) http-request options
    set-proxy-v2-header
    set-proxy-v2-tlv-header
 2) server option
    set-proxy-v2-tlv
@jiho-jung jiho-jung requested a review from fratj September 6, 2023 02:34
@jiho-jung jiho-jung self-assigned this Sep 6, 2023
@jiho-jung jiho-jung added the enhancement New feature or request label Sep 6, 2023
fratj pushed a commit to fratj/haproxy that referenced this pull request Sep 7, 2023
If a stream is interrupted during its initialization by a panic signal
and tries to dump itself, it may cause a crash during the dump due to
scf and/or scb not being fully initialized. This may also happen while
releasing an endpoint to attach a new one. The effect is that instead
of dying on an abort, the process dies on a segv. This race is ultra-
rare but totally possible. E.g:

  #0  se_fl_test (test=1, se=0x0) at include/haproxy/stconn.h:98
  joyent#1  sc_ep_test (test=1, sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:148
  joyent#2  sc_conn (sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:223
  haproxy#3  stream_dump (buf=buf@entry=0x7ff9507e7678, s=0x7ff4c40c8800, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>, eol=eol@entry=10 '\n') at src/stream.c:2840
  haproxy#4  0x000055996c493b42 in ha_task_dump (buf=buf@entry=0x7ff9507e7678, task=<optimized out>, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>) at src/debug.c:328
  haproxy#5  0x000055996c493edb in ha_thread_dump_one (thr=thr@entry=18, from_signal=from_signal@entry=0) at src/debug.c:227
  haproxy#6  0x000055996c493ff1 in ha_thread_dump (buf=buf@entry=0x7ff9507e7678, thr=thr@entry=18) at src/debug.c:270
  haproxy#7  0x000055996c494257 in ha_panic () at src/debug.c:430
  haproxy#8  ha_panic () at src/debug.c:411
  (...)
  haproxy#23 0x000055996c341fe8 in ssl_sock_close (conn=<optimized out>, xprt_ctx=0x7ff8dcae3880) at src/ssl_sock.c:6699
  haproxy#24 0x000055996c397648 in conn_xprt_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:148
  haproxy#25 conn_full_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:192
  haproxy#26 h1_release (h1c=0x7ff8c297b3c0) at src/mux_h1.c:1074
  haproxy#27 0x000055996c39c9f0 in h1_detach (sd=<optimized out>) at src/mux_h1.c:3502
  haproxy#28 0x000055996c474de4 in sc_detach_endp (scp=scp@entry=0x7ff9507e3148) at src/stconn.c:375
  haproxy#29 0x000055996c4752a5 in sc_reset_endp (sc=<optimized out>, sc@entry=0x7ff8d5cbd560) at src/stconn.c:475

Note that this cannot happen on "show sess" since a stream never leaves
process_stream in such an uninitialized state, thus it's really only the
crash dump that may cause this.

It should be backported to 2.8.
@fratj
Copy link

fratj commented Sep 13, 2023

@jiho-jung please review #2

* Make parsing safer
* Allow non-ending ; TLV values
  i.e 0xe0=this-is-tlv-value;0x30=tlv-123456
* Allow max length TLV values of 2048
  Enough to accomodate vpcid (max length 1024)
* Causes the server to drop connections with a CO_ER_PRX_BAD_HDR error message in conn_recv_proxy()
  Replaced put_tlv() by haproxy make_tlv()
* (minor) Add unlikely branch predict macro
fratj pushed a commit that referenced this pull request Sep 20, 2023
When using the listener socket as file descriptor, qc->fd value is -1.
In this case one must not access fdtab[qc->fd] element to change its value.

This bug could have been detected by asan with such a backtrace:

=================================================================
==402222==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fa8ecf417ex7fa8e915cf90 sp 0x7fa8e915cf88
WRITE of size 8 at 0x7fa8ecf417e8 thread T6
    #0 0x55707a0bf18a in qc_new_cc_conn src/quic_conn.c:838
    #1 0x55707a0c6dc0 in quic_conn_release src/quic_conn.c:1408
    #2 0x55707a10916f in quic_close src/xprt_quic.c:35
    haproxy#3 0x55707a0cec77 in conn_xprt_close include/haproxy/connection.h:153
    haproxy#4 0x55707a0ceed0 in conn_full_close include/haproxy/connection.h:197
    haproxy#5 0x55707a0ec253 in qcc_release src/mux_quic.c:2412
    haproxy#6 0x55707a0ec7d0 in qcc_io_cb src/mux_quic.c:2443
    haproxy#7 0x55707a63ff2a in run_tasks_from_lists src/task.c:596
    haproxy#8 0x55707a641cc9 in process_runnable_tasks src/task.c:876
    haproxy#9 0x55707a56f7b2 in run_poll_loop src/haproxy.c:2954
    haproxy#10 0x55707a5705fd in run_thread_poll_loop src/haproxy.c:3153
    haproxy#11 0x7fa8f9450ea6 in start_thread nptl/pthread_create.c:477
    haproxy#12 0x7fa8f936ea2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x7fa8ecf417e8 is located 24 bytes to the left of 134217728-byte region [0x7fa8e
allocated by thread T0 here:
    #0 0x7fa8f9a37037 in __interceptor_calloc ../../../../src/libsanitizer/asan/
    #1 0x55707a71a61d in init_pollers src/fd.c:1161
    #2 0x55707a56cdf1 in init src/haproxy.c:2672
    haproxy#3 0x55707a5714c2 in main src/haproxy.c:3298
    haproxy#4 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

Thread T6 created by T0 here:
    #0 0x7fa8f99e22a2 in __interceptor_pthread_create ../../../../src/libsanitizpp:214
    #1 0x55707a748a21 in setup_extra_threads src/thread.c:252
    #2 0x55707a5735c9 in main src/haproxy.c:3844
    haproxy#3 0x7fa8f9296d09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow src/quic_conn.c:838 in qc_new_cc
Shadow bytes around the buggy address:
  0x0ff59d9e02a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0ff59d9e02e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0ff59d9e02f0: fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]fa fa
  0x0ff59d9e0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0ff59d9e0340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==402222==ABORTING
Aborted

Thank you to @Tristan971 for having reported this bug in GH haproxy#2247.

No need to backport.
fratj pushed a commit to fratj/haproxy that referenced this pull request Oct 25, 2023
Since the following patch :
  commit 33c49ce
  MINOR: quic: Make qc_dgrams_retransmit() return a status.
retransmission process is interrupted as soon as a fatal send error has
been encounted. However, this may leave frames in local list. This cause
several issues : a memory leak and a potential crash.

The crash happens because leaked frames are duplicated of an origin
frame via qc_dup_pkt_frms(). If an ACK arrives later for the origin
frame, all duplicated frames are also freed. During qc_frm_free(),
LIST_DEL_INIT() operation is invalid as it still references the local
list used inside qc_dgrams_retransmit().

This bug was reproduced using the following injection from another
machine :
  $ h2load --npn-list h3 -t 8 -c 10000 -m 1 -n 2000000000 \
      https://<host>:<port>/?s=4m

Haproxy was compiled using ASAN. The crash resulted in the following
trace :
==332748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff82bf9d78 at pc 0x556facd3b95a bp 0x7fff82bf8b20 sp 0x7fff82bf8b10
WRITE of size 8 at 0x7fff82bf9d78 thread T0
    #0 0x556facd3b959 in qc_frm_free include/haproxy/quic_frame.h:273
    joyent#1 0x556facd59501 in qc_release_frm src/quic_conn.c:1724
    joyent#2 0x556facd5a07f in quic_stream_try_to_consume src/quic_conn.c:1803
    haproxy#3 0x556facd5abe9 in qc_treat_acked_tx_frm src/quic_conn.c:1866
    haproxy#4 0x556facd5b3d8 in qc_ackrng_pkts src/quic_conn.c:1928
    haproxy#5 0x556facd60187 in qc_parse_ack_frm src/quic_conn.c:2354
    haproxy#6 0x556facd693a1 in qc_parse_pkt_frms src/quic_conn.c:3203
    haproxy#7 0x556facd7531a in qc_treat_rx_pkts src/quic_conn.c:4606
    haproxy#8 0x556facd7a528 in quic_conn_app_io_cb src/quic_conn.c:5059
    haproxy#9 0x556fad3284be in run_tasks_from_lists src/task.c:596
    haproxy#10 0x556fad32a3fa in process_runnable_tasks src/task.c:876
    haproxy#11 0x556fad24a676 in run_poll_loop src/haproxy.c:2968
    haproxy#12 0x556fad24b510 in run_thread_poll_loop src/haproxy.c:3167
    haproxy#13 0x556fad24e7ff in main src/haproxy.c:3857
    haproxy#14 0x7fae30ddd0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    haproxy#15 0x556facc9375d in _start (/opt/haproxy-quic-2.8/haproxy+0x1ea75d)

Address 0x7fff82bf9d78 is located in stack of thread T0 at offset 40 in frame
    #0 0x556facd74ede in qc_treat_rx_pkts src/quic_conn.c:4580

This must be backported up to 2.7.
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 11, 2023
If a stream is interrupted during its initialization by a panic signal
and tries to dump itself, it may cause a crash during the dump due to
scf and/or scb not being fully initialized. This may also happen while
releasing an endpoint to attach a new one. The effect is that instead
of dying on an abort, the process dies on a segv. This race is ultra-
rare but totally possible. E.g:

  #0  se_fl_test (test=1, se=0x0) at include/haproxy/stconn.h:98
  joyent#1  sc_ep_test (test=1, sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:148
  joyent#2  sc_conn (sc=0x7ff8d5cbd560) at include/haproxy/stconn.h:223
  haproxy#3  stream_dump (buf=buf@entry=0x7ff9507e7678, s=0x7ff4c40c8800, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>, eol=eol@entry=10 '\n') at src/stream.c:2840
  haproxy#4  0x000055996c493b42 in ha_task_dump (buf=buf@entry=0x7ff9507e7678, task=<optimized out>, pfx=pfx@entry=0x55996c558cb3 ' ' <repeats 13 times>) at src/debug.c:328
  haproxy#5  0x000055996c493edb in ha_thread_dump_one (thr=thr@entry=18, from_signal=from_signal@entry=0) at src/debug.c:227
  haproxy#6  0x000055996c493ff1 in ha_thread_dump (buf=buf@entry=0x7ff9507e7678, thr=thr@entry=18) at src/debug.c:270
  haproxy#7  0x000055996c494257 in ha_panic () at src/debug.c:430
  haproxy#8  ha_panic () at src/debug.c:411
  (...)
  haproxy#23 0x000055996c341fe8 in ssl_sock_close (conn=<optimized out>, xprt_ctx=0x7ff8dcae3880) at src/ssl_sock.c:6699
  haproxy#24 0x000055996c397648 in conn_xprt_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:148
  haproxy#25 conn_full_close (conn=0x7ff8c297b0c0) at include/haproxy/connection.h:192
  haproxy#26 h1_release (h1c=0x7ff8c297b3c0) at src/mux_h1.c:1074
  haproxy#27 0x000055996c39c9f0 in h1_detach (sd=<optimized out>) at src/mux_h1.c:3502
  haproxy#28 0x000055996c474de4 in sc_detach_endp (scp=scp@entry=0x7ff9507e3148) at src/stconn.c:375
  haproxy#29 0x000055996c4752a5 in sc_reset_endp (sc=<optimized out>, sc@entry=0x7ff8d5cbd560) at src/stconn.c:475

Note that this cannot happen on "show sess" since a stream never leaves
process_stream in such an uninitialized state, thus it's really only the
crash dump that may cause this.

It should be backported to 2.8.

(cherry picked from commit e64bcca)
Signed-off-by: Christopher Faulet <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 11, 2023
Since the following patch :
  commit 33c49ce
  MINOR: quic: Make qc_dgrams_retransmit() return a status.
retransmission process is interrupted as soon as a fatal send error has
been encounted. However, this may leave frames in local list. This cause
several issues : a memory leak and a potential crash.

The crash happens because leaked frames are duplicated of an origin
frame via qc_dup_pkt_frms(). If an ACK arrives later for the origin
frame, all duplicated frames are also freed. During qc_frm_free(),
LIST_DEL_INIT() operation is invalid as it still references the local
list used inside qc_dgrams_retransmit().

This bug was reproduced using the following injection from another
machine :
  $ h2load --npn-list h3 -t 8 -c 10000 -m 1 -n 2000000000 \
      https://<host>:<port>/?s=4m

Haproxy was compiled using ASAN. The crash resulted in the following
trace :
==332748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff82bf9d78 at pc 0x556facd3b95a bp 0x7fff82bf8b20 sp 0x7fff82bf8b10
WRITE of size 8 at 0x7fff82bf9d78 thread T0
    #0 0x556facd3b959 in qc_frm_free include/haproxy/quic_frame.h:273
    joyent#1 0x556facd59501 in qc_release_frm src/quic_conn.c:1724
    joyent#2 0x556facd5a07f in quic_stream_try_to_consume src/quic_conn.c:1803
    haproxy#3 0x556facd5abe9 in qc_treat_acked_tx_frm src/quic_conn.c:1866
    haproxy#4 0x556facd5b3d8 in qc_ackrng_pkts src/quic_conn.c:1928
    haproxy#5 0x556facd60187 in qc_parse_ack_frm src/quic_conn.c:2354
    haproxy#6 0x556facd693a1 in qc_parse_pkt_frms src/quic_conn.c:3203
    haproxy#7 0x556facd7531a in qc_treat_rx_pkts src/quic_conn.c:4606
    haproxy#8 0x556facd7a528 in quic_conn_app_io_cb src/quic_conn.c:5059
    haproxy#9 0x556fad3284be in run_tasks_from_lists src/task.c:596
    haproxy#10 0x556fad32a3fa in process_runnable_tasks src/task.c:876
    haproxy#11 0x556fad24a676 in run_poll_loop src/haproxy.c:2968
    haproxy#12 0x556fad24b510 in run_thread_poll_loop src/haproxy.c:3167
    haproxy#13 0x556fad24e7ff in main src/haproxy.c:3857
    haproxy#14 0x7fae30ddd0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
    haproxy#15 0x556facc9375d in _start (/opt/haproxy-quic-2.8/haproxy+0x1ea75d)

Address 0x7fff82bf9d78 is located in stack of thread T0 at offset 40 in frame
    #0 0x556facd74ede in qc_treat_rx_pkts src/quic_conn.c:4580

This must be backported up to 2.7.

(cherry picked from commit 89d685f)
 [ad: major quic_conn.c split in 2.9]
Signed-off-by: Amaury Denoyelle <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 11, 2023
A quic_conn is instantiated and tied on the first thread which has
received the first INITIAL packet. After handshake completion,
listener_accept() is called. For each quic_conn, a new thread is
selected among the least loaded ones Note that this occurs earlier if
handling 0-RTT data.

This thread connection migration is done in two steps :
* inside listener_accept(), on the origin thread, quic_conn
  tasks/tasklet are killed. After this, no quic_conn related processing
  will occur on this thread. The connection is flagged with
  QUIC_FL_CONN_AFFINITY_CHANGED.
* as soon as the first quic_conn related processing occurs on the new
  thread, the migration is finalized. This allows to allocate the new
  tasks/tasklet directly on the destination thread.

This last step on the new thread must be done prior to other quic_conn
access. There is two events which may trigger it :
* a packet is received on the new thread. In this case,
  qc_finalize_affinity_rebind() is called from quic_dgram_parse().
* the recently accepted connection is popped from accept_queue_ring via
  accept_queue_process(). This will called session_accept_fd() as
  listener.bind_conf.accept callback. This instantiates a new session
  and start connection stack via conn_xprt_start(), which itself calls
  qc_xprt_start() where qc_finalize_affinity_rebind() is used.

A condition was recently found which could cause a closing to be used
with qc_finalize_affinity_rebind() which is forbidden with a BUG_ON().

This lat step was not compatible with layer 4 rule such as "tcp-request
connection reject" which closes the connection early. In this case, most
of the body of session_accept_fd() is skipped, including
qc_xprt_start(), so thread migration is not finalized. At the end of the
function, conn_xprt_close() is then called which flags the connection as
CLOSING.

If a datagram is received for this connection before it is released,
this will call qc_finalize_affinity_rebind() which triggers its BUG_ON()
to prevent thread migration for CLOSING quic_conn.

FATAL: bug condition "qc->flags & ((1U << 29)|(1U << 30))" matched at src/quic_conn.c:2036
Thread 3 "haproxy" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffff794f700 (LWP 2973030)]
0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
2036            BUG_ON(qc->flags & (QUIC_FL_CONN_CLOSING|QUIC_FL_CONN_DRAINING));
(gdb) bt
 #0  0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
 joyent#1  0x0000555555682463 in quic_dgram_parse (dgram=0x7fff5003ef10, from_qc=0x0, li=0x555555f38670) at src/quic_rx.c:2602
 joyent#2  0x0000555555651aae in quic_lstnr_dghdlr (t=0x555555fc4440, ctx=0x555555fc3f78, state=32832) at src/quic_sock.c:189
 haproxy#3  0x00005555558c9393 in run_tasks_from_lists (budgets=0x7ffff7944c90) at src/task.c:596
 haproxy#4  0x00005555558c9e8e in process_runnable_tasks () at src/task.c:876
 haproxy#5  0x000055555586b7b2 in run_poll_loop () at src/haproxy.c:2966
 haproxy#6  0x000055555586be87 in run_thread_poll_loop (data=0x555555d3d340 <ha_thread_info+64>) at src/haproxy.c:3165
 haproxy#7  0x00007ffff7b59609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
 haproxy#8  0x00007ffff7a7e133 in clone () from /lib/x86_64-linux-gnu/libc.so.6

To fix this issue, ensure quic_conn migration is completed earlier
inside session_accept_fd(), before any tcp rules processing. This is
done by moving qc_finalize_affinity_rebind() invocation from
qc_xprt_start() to qc_conn_init().

This must be backported up to 2.7.

(cherry picked from commit a896870)
[ad: move comment from quic_rx.c to quic_conn.c]
Signed-off-by: Amaury Denoyelle <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 11, 2023
The connections are flagged as "to be killed" asap when the peer has left
(detected by sendto() "Connection refused" errno) by qc_kill_conn(). This
function has to wakeup the idle timer task to release the connection (and the idle
timer  and the idle timer task itself). Then if in the meantime the connection
was flagged as having to process some retransmissions, some packet could lead
to sendto() errors again with a call to qc_kill_conn(), this time with a released
idle timer task.

This bug could be detected by libasan as follows:

.AddressSanitizer:DEADLYSIGNAL
=================================================================
==21018==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x    560b5d898717 bp 0x7f9aaac30000 sp 0x7f9aaac2ff80 T3)
==21018==The signal is caused by a READ memory access.
==21018==Hint: address points to the zero page.
.    #0 0x560b5d898717 in _task_wakeup include/haproxy/task.h:209
    joyent#1 0x560b5d8a563c in qc_kill_conn src/quic_conn.c:171
    joyent#2 0x560b5d97f832 in qc_send_ppkts src/quic_tx.c:636
    haproxy#3 0x560b5d981b53 in qc_send_app_pkts src/quic_tx.c:876
    haproxy#4 0x560b5d987122 in qc_send_app_probing src/quic_tx.c:910
    haproxy#5 0x560b5d987122 in qc_dgrams_retransmit src/quic_tx.c:1397
    haproxy#6 0x560b5d8ab250 in quic_conn_app_io_cb src/quic_conn.c:712
    haproxy#7 0x560b5de41593 in run_tasks_from_lists src/task.c:596
    haproxy#8 0x560b5de4333c in process_runnable_tasks src/task.c:876
    haproxy#9 0x560b5dd6f2b0 in run_poll_loop src/haproxy.c:2966
    haproxy#10 0x560b5dd700fb in run_thread_poll_loop src/haproxy.c:3165
    haproxy#11 0x7f9ab9188ea6 in start_thread nptl/pthread_create.c:477
    haproxy#12 0x7f9ab90a8a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV include/haproxy/task.h:209 in _task_wakeup
Thread T3 created by T0 here:
    #0 0x7f9ab97ac2a2 in __interceptor_pthread_create ../../../../src/libsaniti    zer/asan/asan_interceptors.cpp:214
    joyent#1 0x560b5df4f3ef in setup_extra_threads src/thread.c:252                  o
    joyent#2 0x560b5dd730c7 in main src/haproxy.c:3856
    haproxy#3 0x7f9ab8fd0d09 in __libc_start_main ../csu/libc-start.c:308             i

==21018==ABORTING
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

To fix, simply reset the connection flag QUIC_FL_CONN_RETRANS_NEEDED to cancel
the retransmission when qc_kill_conn is called.

Note that this new bug arrived with this fix which is correct and flagged as to be
backported as far as 2.6.
      BUG/MINOR: quic: idle timer task requeued in the past

Must be backported as far as 2.6.

(cherry picked from commit 756b3c5)
Signed-off-by: Christopher Faulet <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 11, 2023
This bug could be reproduced with -dMfail and dectected by libasan as follows:

$ ASAN_OPTIONS=disable_coredump=0:unmap_shadow_on_exit=1:abort_on_error=f quic-freeze.cfg -dMfail -dMno-cache -dM0x55
=================================================================
==82989==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc 0x560790cc4749 bp 0x7fff8e0e8e30 sp 0x7fff8e0e8e28
WRITE of size 8 at 0x7fff8e0ea338 thread T0
    #0 0x560790cc4748 in qc_frm_free src/quic_frame.c:1222
    joyent#1 0x560790cc5260 in qc_release_frm src/quic_frame.c:1261
    joyent#2 0x560790d1de99 in qc_treat_acked_tx_frm src/quic_rx.c:312
    haproxy#3 0x560790d1e708 in qc_ackrng_pkts src/quic_rx.c:370
    haproxy#4 0x560790d22a1d in qc_parse_ack_frm src/quic_rx.c:694
    haproxy#5 0x560790d25daa in qc_parse_pkt_frms src/quic_rx.c:988
    haproxy#6 0x560790d2a509 in qc_treat_rx_pkts src/quic_rx.c:1373
    haproxy#7 0x560790c72d45 in quic_conn_io_cb src/quic_conn.c:906
    haproxy#8 0x560791207847 in run_tasks_from_lists src/task.c:596
    haproxy#9 0x5607912095f0 in process_runnable_tasks src/task.c:876
    haproxy#10 0x560791135564 in run_poll_loop src/haproxy.c:2966
    haproxy#11 0x5607911363af in run_thread_poll_loop src/haproxy.c:3165
    haproxy#12 0x56079113938c in main src/haproxy.c:3862
    haproxy#13 0x7f92606edd09 in __libc_start_main ../csu/libc-start.c:308
    haproxy#14 0x560790bcd529 in _start (/home/flecaille/src/haproxy/haproxy+0x

Address 0x7fff8e0ea338 is located in stack of thread T0 at offset 1032 i
    #0 0x560790d29b52 in qc_treat_rx_pkts src/quic_rx.c:1341

  This frame has 2 object(s):
    [32, 48) 'ar' (line 1380)
    [64, 1088) '_msg' (line 1368) <== Memory access at offset 1032 is inable
HINT: this may be a false positive if your program uses some custom stacnism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope src/quic_frame.c:1222 i
Shadow bytes around the buggy address:
  0x100071c15410: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15420: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15430: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15440: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15450: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
=>0x100071c15460: f8 f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f8 f8 f8 f3 f3
  0x100071c15470: f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00
  0x100071c15480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c15490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154b0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f3 f3 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==82989==ABORTING
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

Note that a coredump could not always be produced with all compilers. This was
always the case with clang 11.

When allocating frames to be retransmitted from qc_dgrams_retransmit(), if they
could not be sent for any reason, they could remain attached to a local list to
qc_dgrams_retransmit() and trigger a crash with libasan when releasing the
original frames they were duplicated from.

To fix this, always release the frames which could not be sent during
retransmissions calling qc_free_frm_list() where needed.

Must be backported as far as 2.6.

(cherry picked from commit dc8a20b)
[cf: Adapted with Fred's help]
Signed-off-by: Christopher Faulet <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 11, 2023
This bug could be reproduced loading several certificated from "bind" line:
with "server_ocsp.pem" as argument to "crt" setting and updating
the CDSA certificate with the RSA as follows:

echo -e "set ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa \
	     <<\n$(cat reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.rsa)\n" | socat - /tmp/stats
followed by an "commit ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa"
command. This could be detected by libasan as follows:

=================================================================
==507223==ERROR: AddressSanitizer: attempting double-free on 0x60200007afb0 in thread T3:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    joyent#1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)
    joyent#2 0x7fabc6af54e9 in ossl_asn1_primitive_free (/opt/quictls/lib/libcrypto.so.81.3+0xe14e9)
    haproxy#3 0x7fabc6af5960 in ossl_asn1_template_free (/opt/quictls/lib/libcrypto.so.81.3+0xe1960)
    haproxy#4 0x7fabc6af569f in ossl_asn1_item_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xe169f)
    haproxy#5 0x7fabc6af58a4 in ASN1_item_free (/opt/quictls/lib/libcrypto.so.81.3+0xe18a4)
    haproxy#6 0x46a159 in ssl_sock_free_cert_key_and_chain_contents src/ssl_ckch.c:723
    haproxy#7 0x46aa92 in ckch_store_free src/ssl_ckch.c:869
    haproxy#8 0x4704ad in cli_release_commit_cert src/ssl_ckch.c:1981
    haproxy#9 0x962e83 in cli_io_handler src/cli.c:1140
    haproxy#10 0xc1edff in task_run_applet src/applet.c:454
    haproxy#11 0xaf8be9 in run_tasks_from_lists src/task.c:634
    haproxy#12 0xafa2ed in process_runnable_tasks src/task.c:876
    haproxy#13 0xa23c72 in run_poll_loop src/haproxy.c:3024
    haproxy#14 0xa24aa3 in run_thread_poll_loop src/haproxy.c:3226
    haproxy#15 0x7fabc69e7ea6 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7ea6)
    haproxy#16 0x7fabc6907a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x60200007afb0 is located 0 bytes inside of 3-byte region [0x60200007afb0,0x60200007afb3)
freed by thread T3 here:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    joyent#1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)

previously allocated by thread T2 here:
    #0 0x7fabc6fb573f in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5473f)
    joyent#1 0x7fabc6ae8d77 in ASN1_STRING_set (/opt/quictls/lib/libcrypto.so.81.3+0xd4d77)

Thread T3 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    joyent#1 0xc04f36 in setup_extra_threads src/thread.c:252
    joyent#2 0xa2761f in main src/haproxy.c:3917
    haproxy#3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

Thread T2 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    joyent#1 0xc04f36 in setup_extra_threads src/thread.c:252
    joyent#2 0xa2761f in main src/haproxy.c:3917
    haproxy#3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

SUMMARY: AddressSanitizer: double-free ??:0 __interceptor_free
==507223==ABORTING
Aborted

The OCSP CID stored in the impacted ckch data were freed but not reset to NULL,
leading to a subsequent double free.

Must be backported to 2.8.

(cherry picked from commit 7dab3e8)
Signed-off-by: Christopher Faulet <[email protected]>
(cherry picked from commit 5c82bd9)
Signed-off-by: Christopher Faulet <[email protected]>
fratj pushed a commit that referenced this pull request Dec 11, 2023
A quic_conn is instantiated and tied on the first thread which has
received the first INITIAL packet. After handshake completion,
listener_accept() is called. For each quic_conn, a new thread is
selected among the least loaded ones Note that this occurs earlier if
handling 0-RTT data.

This thread connection migration is done in two steps :
* inside listener_accept(), on the origin thread, quic_conn
  tasks/tasklet are killed. After this, no quic_conn related processing
  will occur on this thread. The connection is flagged with
  QUIC_FL_CONN_AFFINITY_CHANGED.
* as soon as the first quic_conn related processing occurs on the new
  thread, the migration is finalized. This allows to allocate the new
  tasks/tasklet directly on the destination thread.

This last step on the new thread must be done prior to other quic_conn
access. There is two events which may trigger it :
* a packet is received on the new thread. In this case,
  qc_finalize_affinity_rebind() is called from quic_dgram_parse().
* the recently accepted connection is popped from accept_queue_ring via
  accept_queue_process(). This will called session_accept_fd() as
  listener.bind_conf.accept callback. This instantiates a new session
  and start connection stack via conn_xprt_start(), which itself calls
  qc_xprt_start() where qc_finalize_affinity_rebind() is used.

A condition was recently found which could cause a closing to be used
with qc_finalize_affinity_rebind() which is forbidden with a BUG_ON().

This lat step was not compatible with layer 4 rule such as "tcp-request
connection reject" which closes the connection early. In this case, most
of the body of session_accept_fd() is skipped, including
qc_xprt_start(), so thread migration is not finalized. At the end of the
function, conn_xprt_close() is then called which flags the connection as
CLOSING.

If a datagram is received for this connection before it is released,
this will call qc_finalize_affinity_rebind() which triggers its BUG_ON()
to prevent thread migration for CLOSING quic_conn.

FATAL: bug condition "qc->flags & ((1U << 29)|(1U << 30))" matched at src/quic_conn.c:2036
Thread 3 "haproxy" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffff794f700 (LWP 2973030)]
0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
2036            BUG_ON(qc->flags & (QUIC_FL_CONN_CLOSING|QUIC_FL_CONN_DRAINING));
(gdb) bt
 #0  0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
 #1  0x0000555555682463 in quic_dgram_parse (dgram=0x7fff5003ef10, from_qc=0x0, li=0x555555f38670) at src/quic_rx.c:2602
 #2  0x0000555555651aae in quic_lstnr_dghdlr (t=0x555555fc4440, ctx=0x555555fc3f78, state=32832) at src/quic_sock.c:189
 haproxy#3  0x00005555558c9393 in run_tasks_from_lists (budgets=0x7ffff7944c90) at src/task.c:596
 haproxy#4  0x00005555558c9e8e in process_runnable_tasks () at src/task.c:876
 haproxy#5  0x000055555586b7b2 in run_poll_loop () at src/haproxy.c:2966
 haproxy#6  0x000055555586be87 in run_thread_poll_loop (data=0x555555d3d340 <ha_thread_info+64>) at src/haproxy.c:3165
 haproxy#7  0x00007ffff7b59609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
 haproxy#8  0x00007ffff7a7e133 in clone () from /lib/x86_64-linux-gnu/libc.so.6

To fix this issue, ensure quic_conn migration is completed earlier
inside session_accept_fd(), before any tcp rules processing. This is
done by moving qc_finalize_affinity_rebind() invocation from
qc_xprt_start() to qc_conn_init().

This must be backported up to 2.7.
fratj pushed a commit that referenced this pull request Dec 11, 2023
The connections are flagged as "to be killed" asap when the peer has left
(detected by sendto() "Connection refused" errno) by qc_kill_conn(). This
function has to wakeup the idle timer task to release the connection (and the idle
timer  and the idle timer task itself). Then if in the meantime the connection
was flagged as having to process some retransmissions, some packet could lead
to sendto() errors again with a call to qc_kill_conn(), this time with a released
idle timer task.

This bug could be detected by libasan as follows:

.AddressSanitizer:DEADLYSIGNAL
=================================================================
==21018==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x    560b5d898717 bp 0x7f9aaac30000 sp 0x7f9aaac2ff80 T3)
==21018==The signal is caused by a READ memory access.
==21018==Hint: address points to the zero page.
.    #0 0x560b5d898717 in _task_wakeup include/haproxy/task.h:209
    #1 0x560b5d8a563c in qc_kill_conn src/quic_conn.c:171
    #2 0x560b5d97f832 in qc_send_ppkts src/quic_tx.c:636
    haproxy#3 0x560b5d981b53 in qc_send_app_pkts src/quic_tx.c:876
    haproxy#4 0x560b5d987122 in qc_send_app_probing src/quic_tx.c:910
    haproxy#5 0x560b5d987122 in qc_dgrams_retransmit src/quic_tx.c:1397
    haproxy#6 0x560b5d8ab250 in quic_conn_app_io_cb src/quic_conn.c:712
    haproxy#7 0x560b5de41593 in run_tasks_from_lists src/task.c:596
    haproxy#8 0x560b5de4333c in process_runnable_tasks src/task.c:876
    haproxy#9 0x560b5dd6f2b0 in run_poll_loop src/haproxy.c:2966
    haproxy#10 0x560b5dd700fb in run_thread_poll_loop src/haproxy.c:3165
    haproxy#11 0x7f9ab9188ea6 in start_thread nptl/pthread_create.c:477
    haproxy#12 0x7f9ab90a8a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV include/haproxy/task.h:209 in _task_wakeup
Thread T3 created by T0 here:
    #0 0x7f9ab97ac2a2 in __interceptor_pthread_create ../../../../src/libsaniti    zer/asan/asan_interceptors.cpp:214
    #1 0x560b5df4f3ef in setup_extra_threads src/thread.c:252                  o
    #2 0x560b5dd730c7 in main src/haproxy.c:3856
    haproxy#3 0x7f9ab8fd0d09 in __libc_start_main ../csu/libc-start.c:308             i

==21018==ABORTING
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

To fix, simply reset the connection flag QUIC_FL_CONN_RETRANS_NEEDED to cancel
the retransmission when qc_kill_conn is called.

Note that this new bug arrived with this fix which is correct and flagged as to be
backported as far as 2.6.
      BUG/MINOR: quic: idle timer task requeued in the past

Must be backported as far as 2.6.
fratj pushed a commit that referenced this pull request Dec 11, 2023
This bug could be reproduced with -dMfail and dectected by libasan as follows:

$ ASAN_OPTIONS=disable_coredump=0:unmap_shadow_on_exit=1:abort_on_error=f quic-freeze.cfg -dMfail -dMno-cache -dM0x55
=================================================================
==82989==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffc 0x560790cc4749 bp 0x7fff8e0e8e30 sp 0x7fff8e0e8e28
WRITE of size 8 at 0x7fff8e0ea338 thread T0
    #0 0x560790cc4748 in qc_frm_free src/quic_frame.c:1222
    #1 0x560790cc5260 in qc_release_frm src/quic_frame.c:1261
    #2 0x560790d1de99 in qc_treat_acked_tx_frm src/quic_rx.c:312
    haproxy#3 0x560790d1e708 in qc_ackrng_pkts src/quic_rx.c:370
    haproxy#4 0x560790d22a1d in qc_parse_ack_frm src/quic_rx.c:694
    haproxy#5 0x560790d25daa in qc_parse_pkt_frms src/quic_rx.c:988
    haproxy#6 0x560790d2a509 in qc_treat_rx_pkts src/quic_rx.c:1373
    haproxy#7 0x560790c72d45 in quic_conn_io_cb src/quic_conn.c:906
    haproxy#8 0x560791207847 in run_tasks_from_lists src/task.c:596
    haproxy#9 0x5607912095f0 in process_runnable_tasks src/task.c:876
    haproxy#10 0x560791135564 in run_poll_loop src/haproxy.c:2966
    haproxy#11 0x5607911363af in run_thread_poll_loop src/haproxy.c:3165
    haproxy#12 0x56079113938c in main src/haproxy.c:3862
    haproxy#13 0x7f92606edd09 in __libc_start_main ../csu/libc-start.c:308
    haproxy#14 0x560790bcd529 in _start (/home/flecaille/src/haproxy/haproxy+0x

Address 0x7fff8e0ea338 is located in stack of thread T0 at offset 1032 i
    #0 0x560790d29b52 in qc_treat_rx_pkts src/quic_rx.c:1341

  This frame has 2 object(s):
    [32, 48) 'ar' (line 1380)
    [64, 1088) '_msg' (line 1368) <== Memory access at offset 1032 is inable
HINT: this may be a false positive if your program uses some custom stacnism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope src/quic_frame.c:1222 i
Shadow bytes around the buggy address:
  0x100071c15410: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15420: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15430: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15440: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  0x100071c15450: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
=>0x100071c15460: f8 f8 f8 f8 f8 f8 f8[f8]f8 f8 f8 f8 f8 f8 f3 f3
  0x100071c15470: f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00
  0x100071c15480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c15490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100071c154b0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f3 f3 f3
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==82989==ABORTING
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
Aborted (core dumped)

Note that a coredump could not always be produced with all compilers. This was
always the case with clang 11.

When allocating frames to be retransmitted from qc_dgrams_retransmit(), if they
could not be sent for any reason, they could remain attached to a local list to
qc_dgrams_retransmit() and trigger a crash with libasan when releasing the
original frames they were duplicated from.

To fix this, always release the frames which could not be sent during
retransmissions calling qc_free_frm_list() where needed.

Must be backported as far as 2.6.
fratj pushed a commit that referenced this pull request Dec 11, 2023
This bug could be reproduced loading several certificated from "bind" line:
with "server_ocsp.pem" as argument to "crt" setting and updating
the CDSA certificate with the RSA as follows:

echo -e "set ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa \
	     <<\n$(cat reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.rsa)\n" | socat - /tmp/stats
followed by an "commit ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa"
command. This could be detected by libasan as follows:

=================================================================
==507223==ERROR: AddressSanitizer: attempting double-free on 0x60200007afb0 in thread T3:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)
    #2 0x7fabc6af54e9 in ossl_asn1_primitive_free (/opt/quictls/lib/libcrypto.so.81.3+0xe14e9)
    haproxy#3 0x7fabc6af5960 in ossl_asn1_template_free (/opt/quictls/lib/libcrypto.so.81.3+0xe1960)
    haproxy#4 0x7fabc6af569f in ossl_asn1_item_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xe169f)
    haproxy#5 0x7fabc6af58a4 in ASN1_item_free (/opt/quictls/lib/libcrypto.so.81.3+0xe18a4)
    haproxy#6 0x46a159 in ssl_sock_free_cert_key_and_chain_contents src/ssl_ckch.c:723
    haproxy#7 0x46aa92 in ckch_store_free src/ssl_ckch.c:869
    haproxy#8 0x4704ad in cli_release_commit_cert src/ssl_ckch.c:1981
    haproxy#9 0x962e83 in cli_io_handler src/cli.c:1140
    haproxy#10 0xc1edff in task_run_applet src/applet.c:454
    haproxy#11 0xaf8be9 in run_tasks_from_lists src/task.c:634
    haproxy#12 0xafa2ed in process_runnable_tasks src/task.c:876
    haproxy#13 0xa23c72 in run_poll_loop src/haproxy.c:3024
    haproxy#14 0xa24aa3 in run_thread_poll_loop src/haproxy.c:3226
    haproxy#15 0x7fabc69e7ea6 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7ea6)
    haproxy#16 0x7fabc6907a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x60200007afb0 is located 0 bytes inside of 3-byte region [0x60200007afb0,0x60200007afb3)
freed by thread T3 here:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)

previously allocated by thread T2 here:
    #0 0x7fabc6fb573f in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5473f)
    #1 0x7fabc6ae8d77 in ASN1_STRING_set (/opt/quictls/lib/libcrypto.so.81.3+0xd4d77)

Thread T3 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    #1 0xc04f36 in setup_extra_threads src/thread.c:252
    #2 0xa2761f in main src/haproxy.c:3917
    haproxy#3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

Thread T2 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    #1 0xc04f36 in setup_extra_threads src/thread.c:252
    #2 0xa2761f in main src/haproxy.c:3917
    haproxy#3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

SUMMARY: AddressSanitizer: double-free ??:0 __interceptor_free
==507223==ABORTING
Aborted

The OCSP CID stored in the impacted ckch data were freed but not reset to NULL,
leading to a subsequent double free.

Must be backported to 2.8.
fratj pushed a commit to fratj/haproxy that referenced this pull request Dec 18, 2023
This bug could be reproduced loading several certificated from "bind" line:
with "server_ocsp.pem" as argument to "crt" setting and updating
the CDSA certificate with the RSA as follows:

echo -e "set ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa \
	     <<\n$(cat reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.rsa)\n" | socat - /tmp/stats
followed by an "commit ssl cert reg-tests/ssl/ocsp_update/multicert/server_ocsp.pem.ecdsa"
command. This could be detected by libasan as follows:

=================================================================
==507223==ERROR: AddressSanitizer: attempting double-free on 0x60200007afb0 in thread T3:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    joyent#1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)
    joyent#2 0x7fabc6af54e9 in ossl_asn1_primitive_free (/opt/quictls/lib/libcrypto.so.81.3+0xe14e9)
    haproxy#3 0x7fabc6af5960 in ossl_asn1_template_free (/opt/quictls/lib/libcrypto.so.81.3+0xe1960)
    haproxy#4 0x7fabc6af569f in ossl_asn1_item_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xe169f)
    haproxy#5 0x7fabc6af58a4 in ASN1_item_free (/opt/quictls/lib/libcrypto.so.81.3+0xe18a4)
    haproxy#6 0x46a159 in ssl_sock_free_cert_key_and_chain_contents src/ssl_ckch.c:723
    haproxy#7 0x46aa92 in ckch_store_free src/ssl_ckch.c:869
    haproxy#8 0x4704ad in cli_release_commit_cert src/ssl_ckch.c:1981
    haproxy#9 0x962e83 in cli_io_handler src/cli.c:1140
    haproxy#10 0xc1edff in task_run_applet src/applet.c:454
    haproxy#11 0xaf8be9 in run_tasks_from_lists src/task.c:634
    haproxy#12 0xafa2ed in process_runnable_tasks src/task.c:876
    haproxy#13 0xa23c72 in run_poll_loop src/haproxy.c:3024
    haproxy#14 0xa24aa3 in run_thread_poll_loop src/haproxy.c:3226
    haproxy#15 0x7fabc69e7ea6 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7ea6)
    haproxy#16 0x7fabc6907a2e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xfba2e)

0x60200007afb0 is located 0 bytes inside of 3-byte region [0x60200007afb0,0x60200007afb3)
freed by thread T3 here:
    #0 0x7fabc6fb5527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    joyent#1 0x7fabc6ae8f8c in ossl_asn1_string_embed_free (/opt/quictls/lib/libcrypto.so.81.3+0xd4f8c)

previously allocated by thread T2 here:
    #0 0x7fabc6fb573f in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5473f)
    joyent#1 0x7fabc6ae8d77 in ASN1_STRING_set (/opt/quictls/lib/libcrypto.so.81.3+0xd4d77)

Thread T3 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    joyent#1 0xc04f36 in setup_extra_threads src/thread.c:252
    joyent#2 0xa2761f in main src/haproxy.c:3917
    haproxy#3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

Thread T2 created by T0 here:
    #0 0x7fabc6f84bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    joyent#1 0xc04f36 in setup_extra_threads src/thread.c:252
    joyent#2 0xa2761f in main src/haproxy.c:3917
    haproxy#3 0x7fabc682fd09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)

SUMMARY: AddressSanitizer: double-free ??:0 __interceptor_free
==507223==ABORTING
Aborted

The OCSP CID stored in the impacted ckch data were freed but not reset to NULL,
leading to a subsequent double free.

Must be backported to 2.8.

(cherry picked from commit 7dab3e8)
Signed-off-by: Christopher Faulet <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Jan 17, 2024
h3_resp_data_send() is used to transcode HTX data into H3 data frames.
If QCS Tx buffer is not aligned when first invoked, two separate frames
may be built, first until buffer end, then with remaining space in
front.

If buffer space is not enough for at least the H3 frame header, -1 is
returned with the flag QC_SF_BLK_MROOM set to await for more room. An
issue arises if this occurs for the second frame : -1 is returned even
though HTX data were properly transcoded and removed on the first step.
This causes snd_buf callback to return an incorrect value to the stream
layer, which in the end will corrupt the channel output buffer.

To fix this, stop considering that not enough remaining space is an
error case. Instead, return 0 if this is encountered for the first frame
or the HTX removed block size for the second one. As QC_SF_BLK_MROOM is
set, this will correctly interrupt H3 encoding. Label err is thus only
properly limited to fatal error which should cause a connection closure.
A new BUG_ON() has been added which should prevent similar issues in the
future.

This issue was detected using the following client :
 $ ngtcp2-client --no-quic-dump --no-http-dump --exit-on-all-streams-close \
   127.0.0.1 20443 -n2 "http://127.0.0.1:20443/?s=50k"

This triggers the following CHECK_IF statement. Note that it may be
necessary to disable fast forwarding to enforce snd_buf usage.

Thread 1 "haproxy" received signal SIGILL, Illegal instruction.
0x00005555558bc48a in co_data (c=0x5555561ed428) at include/haproxy/channel.h:130
130             CHECK_IF_HOT(c->output > c_data(c));
[ ## gdb ## ] bt
 #0  0x00005555558bc48a in co_data (c=0x5555561ed428) at include/haproxy/channel.h:130
 joyent#1  0x00005555558c1d69 in sc_conn_send (sc=0x5555561f92d0) at src/stconn.c:1637
 joyent#2  0x00005555558c2683 in sc_conn_io_cb (t=0x5555561f7f10, ctx=0x5555561f92d0, state=32832) at src/stconn.c:1824
 haproxy#3  0x000055555590c48f in run_tasks_from_lists (budgets=0x7fffffffdaa0) at src/task.c:596
 haproxy#4  0x000055555590cf88 in process_runnable_tasks () at src/task.c:876
 haproxy#5  0x00005555558aae3b in run_poll_loop () at src/haproxy.c:3049
 haproxy#6  0x00005555558ab57e in run_thread_poll_loop (data=0x555555d9fa00 <ha_thread_info>) at src/haproxy.c:3251
 haproxy#7  0x00005555558ad053 in main (argc=6, argv=0x7fffffffddd8) at src/haproxy.c:3948

In case CHECK_IF are not activated, it may cause crash or incorrect
transfers.

This was introduced by the following commit
  commit 2144d24
  BUG/MINOR: h3: close connection on sending alloc errors

This must be backported wherever the above patch is.
fratj pushed a commit to fratj/haproxy that referenced this pull request Feb 2, 2024
h3_resp_data_send() is used to transcode HTX data into H3 data frames.
If QCS Tx buffer is not aligned when first invoked, two separate frames
may be built, first until buffer end, then with remaining space in
front.

If buffer space is not enough for at least the H3 frame header, -1 is
returned with the flag QC_SF_BLK_MROOM set to await for more room. An
issue arises if this occurs for the second frame : -1 is returned even
though HTX data were properly transcoded and removed on the first step.
This causes snd_buf callback to return an incorrect value to the stream
layer, which in the end will corrupt the channel output buffer.

To fix this, stop considering that not enough remaining space is an
error case. Instead, return 0 if this is encountered for the first frame
or the HTX removed block size for the second one. As QC_SF_BLK_MROOM is
set, this will correctly interrupt H3 encoding. Label err is thus only
properly limited to fatal error which should cause a connection closure.
A new BUG_ON() has been added which should prevent similar issues in the
future.

This issue was detected using the following client :
 $ ngtcp2-client --no-quic-dump --no-http-dump --exit-on-all-streams-close \
   127.0.0.1 20443 -n2 "http://127.0.0.1:20443/?s=50k"

This triggers the following CHECK_IF statement. Note that it may be
necessary to disable fast forwarding to enforce snd_buf usage.

Thread 1 "haproxy" received signal SIGILL, Illegal instruction.
0x00005555558bc48a in co_data (c=0x5555561ed428) at include/haproxy/channel.h:130
130             CHECK_IF_HOT(c->output > c_data(c));
[ ## gdb ## ] bt
 #0  0x00005555558bc48a in co_data (c=0x5555561ed428) at include/haproxy/channel.h:130
 joyent#1  0x00005555558c1d69 in sc_conn_send (sc=0x5555561f92d0) at src/stconn.c:1637
 joyent#2  0x00005555558c2683 in sc_conn_io_cb (t=0x5555561f7f10, ctx=0x5555561f92d0, state=32832) at src/stconn.c:1824
 haproxy#3  0x000055555590c48f in run_tasks_from_lists (budgets=0x7fffffffdaa0) at src/task.c:596
 haproxy#4  0x000055555590cf88 in process_runnable_tasks () at src/task.c:876
 haproxy#5  0x00005555558aae3b in run_poll_loop () at src/haproxy.c:3049
 haproxy#6  0x00005555558ab57e in run_thread_poll_loop (data=0x555555d9fa00 <ha_thread_info>) at src/haproxy.c:3251
 haproxy#7  0x00005555558ad053 in main (argc=6, argv=0x7fffffffddd8) at src/haproxy.c:3948

In case CHECK_IF are not activated, it may cause crash or incorrect
transfers.

This was introduced by the following commit
  commit 2144d24
  BUG/MINOR: h3: close connection on sending alloc errors

This must be backported wherever the above patch is.

(cherry picked from commit 14673fe)
[wt: adjusted context in h3_resp_data_send]
Signed-off-by: Willy Tarreau <[email protected]>
fratj pushed a commit to fratj/haproxy that referenced this pull request Mar 25, 2024
Before a dynamic server can be deleted, a set of preconditions must be
validated to ensure it is not referenced naymore by a stream or a
connection. This is implemented in srv_check_for_deletion().

The various criteria specified were incomplete. This allows a server
instance to be deleted while still be referenced by a stream and a
connection.

This bug was reproduced by using ASAN compilation. A script was used to
add and delete a server every second, while using h2load to generate
traffic with download of 1k objects. Here is the ASAN error.

==140916==ERROR: AddressSanitizer: heap-use-after-free on address 0x520000020080 at pc 0x63cb25679537 bp 0x701529ff5070 sp 0x701529ff5060
READ of size 1 at 0x520000020080 thread T7
    #0 0x63cb25679536 in objt_server include/haproxy/obj_type.h:99
    joyent#1 0x63cb2568f465 in process_stream src/stream.c:1823
    joyent#2 0x63cb25a4a4a2 in run_tasks_from_lists src/task.c:632
    haproxy#3 0x63cb25a4bf62 in process_runnable_tasks src/task.c:876
    haproxy#4 0x63cb2596a220 in run_poll_loop src/haproxy.c:3050
    haproxy#5 0x63cb2596b192 in run_thread_poll_loop src/haproxy.c:3252
    haproxy#6 0x701539aa9559  (/usr/lib/libc.so.6+0x8b559) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
    haproxy#7 0x701539b26a3b  (/usr/lib/libc.so.6+0x108a3b) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)

To fix this, add <curr_used_conns> to the counters checked in
srv_check_for_deletion().

Outside of this bug, one case which remains sensible is for SF_DIRECT
streams which referenced a server instance early in process_stream()
before connect_server(). This occurs with use-server directive,
force-persist rule or cookie persistence. However, after code
reexamination, the code is considered reliable as process_stream() is
not rescheduled before connect_server() invocation. These observations
have been saved in sess_change_server() documentation to ensure it
remains valid in the future.

This must be backported up to 2.6.
fratj pushed a commit to fratj/haproxy that referenced this pull request Mar 25, 2024
Backend connections can be marked as private to prevent their sharing by
multiple clients. Now, this has become an exception as only two reasons
for data traffic can trigger this (checks are ignored here) :
* http-reuse never
* HTTP response with NTLM header

The first case is easy to manage as the connection is flagged as private
since its inception. However, the second case is dynamic as the
connection can be flagged anytime during its lifetime. When using a
backend protocol such as HTTP/2 with reuse mode aggressive or always, we
face a design issue as the connection would be marked as private,
despite potentially being shared by several clients at the same time.

This is conceptually invalid, but worst it can trigger crashes on MUX
stream detach callback depending on the order of release of the streams,
by calling session_check_idle_conn() with a NULL session. It could also
be possible to have several NTLM responses on a single connection for
different sessions. In this case, connection owner is still being
updated without attaching the connection to its correct session, which
ultimately would cause a crash on session_check_idle_conn with an
invalid session.

Here are two backtrace examples from GDB for such cases :

Thread 1 (Thread 0x7ff73e9fc700 (LWP 648859)):
 #0  session_check_idle_conn (conn=0x7ff72f597800, sess=0x0) at include/haproxy/session.h:209
 joyent#1  h2_detach (sd=<optimized out>) at src/mux_h2.c:4520
 joyent#2  0x000056151742be24 in sc_detach_endp (scp=scp@entry=0x7ff73e9f0f18) at src/stconn.c:376
 haproxy#3  0x000056151742c208 in sc_destroy (sc=<optimized out>) at src/stconn.c:444
 haproxy#4  0x0000561517370871 in stream_free (s=s@entry=0x7ff72a2dbd80) at src/stream.c:728
 haproxy#5  0x000056151737541f in process_stream (t=t@entry=0x7ff72d5e2620, context=0x7ff72a2dbd80, state=<optimized out>) at src/stream.c:2645
 haproxy#6  0x0000561517456cbb in run_tasks_from_lists (budgets=budgets@entry=0x7ff73e9f10d0) at src/task.c:632
 haproxy#7  0x00005615174576b9 in process_runnable_tasks () at src/task.c:876
 haproxy#8  0x000056151742275a in run_poll_loop () at src/haproxy.c:2996
 haproxy#9  0x0000561517422db1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3195
 haproxy#10 0x00007ff789e081ca in start_thread () from /lib64/libpthread.so.0
 haproxy#11 0x00007ff789a39e73 in clone () from /lib64/libc.so.6

(gdb)
Thread 1 (Thread 0x7ff52e7fc700 (LWP 681458)):
 #0  0x0000556ebd6e7e69 in session_check_idle_conn (conn=0x7ff5787ff100, sess=0x7ff51d2539a0) at include/haproxy/session.h:209
 joyent#1  h2_detach (sd=<optimized out>) at src/mux_h2.c:4520
 joyent#2  0x0000556ebd7f3e24 in sc_detach_endp (scp=scp@entry=0x7ff52e7f0f18) at src/stconn.c:376
 haproxy#3  0x0000556ebd7f4208 in sc_destroy (sc=<optimized out>) at src/stconn.c:444
 haproxy#4  0x0000556ebd738871 in stream_free (s=s@entry=0x7ff520e28200) at src/stream.c:728
 haproxy#5  0x0000556ebd73d41f in process_stream (t=t@entry=0x7ff565783700, context=0x7ff520e28200, state=<optimized out>) at src/stream.c:2645
 haproxy#6  0x0000556ebd81ecbb in run_tasks_from_lists (budgets=budgets@entry=0x7ff52e7f10d0) at src/task.c:632
 haproxy#7  0x0000556ebd81f6b9 in process_runnable_tasks () at src/task.c:876
 haproxy#8  0x0000556ebd7ea75a in run_poll_loop () at src/haproxy.c:2996
 haproxy#9  0x0000556ebd7eadb1 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3195
 haproxy#10 0x00007ff5752081ca in start_thread () from /lib64/libpthread.so.0
 haproxy#11 0x00007ff574e39e73 in clone () from /lib64/libc.so.6
(gdb)

To solve this issue, simply ignore NTLM responses when using a
multiplexer with streams support and the connection is not already
attached to the session. The connection is not marked as private and
will continue to be shared freely accross clients. This is considered
conceptually valid as NTLM usage (rfc 4559) with HTTP is broken and was
designed only with HTTP/1.1 in mind. A side-effect of the change is that
SESS_FL_PREFER_LAST is also not set anymore on NTLM detection, which
allows following requests to be load-balanced accross several server
instances.

The original behavior is kept for HTTP/1 or if the connection is already
attached to the session. This last case happens when using HTTP/2 with
default http-reuse safe mode since the following patch :

  0d21dea
  MEDIUM: backend: add reused conn to sess if mux marked as HOL blocking

This should be backported up to all stable releases. Up until 2.4, it
can be taken as-is. For lesser versions, above patch is not present. In
this case the condition should be restricted only to HTTP/1 usage :

  if (srv_conn && strcmp(srv_conn->mux->name, "H1") == 0) {
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants