## The problem
I had to migrate an old logging system to a new logging system, where one logging source was emitting syslog over UDP. During the migration we needed to keep the old system in place, and in parallel set up the new system for validation and UAT before finally cutting over. The simplest solution I could think of was to duplicate the UDP syslog packets and send them to both logging systems simultaneously. In other words, “UDP tee”.
It was a simple concept, but the implementation was trickier. This being 2020, the whole system (both the log source and the sinks) was running on Kubernetes. We did not have access to dedicated network hardware, or a service mesh, to do any kind of port or packet mirroring.
Ideally we would be able to run a regular Kubernetes service which would take a single source of UDP packets and duplicate them to multiple sinks. So we needed to run this in a container, which means userspace only: no kernel-level netfilter tricks.
## A poor solution
Google doesn’t have great results for “udp tee”, and the top few userspace solutions seem to use combinations of `tee` and `socat`, similar to this:

```sh
socat - udp-recvfrom:$port,fork | tee >(socat - udp-sendto:$dest1) >(socat - udp-sendto:$dest2)
```
I created a `logs-tee` container with this solution for testing; however, it had significant problems.
### Stream conversion
By piping the UDP data from one `socat` process to another, the incoming UDP datagrams are converted to a stream (the pipe) before being converted back into datagrams to be emitted to the two destinations.
The problem with doing this is that datagrams have boundaries which are lost on the conversion to a stream.
`socat` will buffer up to 8192 bytes by default, so that will be the maximum size of the UDP datagrams emitted, as per the man page:

> … At most `<size>` bytes are transferred per step. Default is 8192 bytes.
This is a problem for RFC 5426 § 3.1, which states:

> Each syslog UDP datagram MUST contain only one syslog message, which MAY be complete or truncated. … Additional data MUST NOT be present in the datagram payload.
In this pcap of packets egressing `logs-tee`, the first packet to each destination is the `goben` control packet, and the following packets are throughput testing data. By default `goben` uses 64000-byte datagrams, so you can see how badly chopped up (and irregular) these are.
```
No.  Time      Source      Destination  Protocol  Length  Info
  1  0.000000  172.23.0.2  172.23.0.3   UDP          287  56565 → 8080 Len=245
  2  0.000003  172.23.0.2  172.23.0.4   UDP          287  56565 → 8080 Len=245
  3  0.000224  172.23.0.2  172.23.0.3   UDP         4138  56565 → 8080 Len=4096
  4  0.000252  172.23.0.2  172.23.0.4   UDP         8234  56565 → 8080 Len=8192
  5  0.000271  172.23.0.2  172.23.0.3   UDP         4138  56565 → 8080 Len=4096
  6  0.005674  172.23.0.2  172.23.0.4   UDP         7210  56565 → 8080 Len=7168
  7  0.005676  172.23.0.2  172.23.0.3   UDP         7210  56565 → 8080 Len=7168
  8  0.005697  172.23.0.2  172.23.0.4   UDP         1066  56565 → 8080 Len=1024
  9  0.005704  172.23.0.2  172.23.0.3   UDP         1066  56565 → 8080 Len=1024
 10  0.011586  172.23.0.2  172.23.0.4   UDP         8234  56565 → 8080 Len=8192
 11  0.011602  172.23.0.2  172.23.0.3   UDP         8234  56565 → 8080 Len=8192
 12  0.016175  172.23.0.2  172.23.0.3   UDP         5162  56565 → 8080 Len=5120
 13  0.016199  172.23.0.2  172.23.0.4   UDP         8234  56565 → 8080 Len=8192
 14  0.016214  172.23.0.2  172.23.0.3   UDP         3114  56565 → 8080 Len=3072
 15  0.021850  172.23.0.2  172.23.0.3   UDP         8234  56565 → 8080 Len=8192
 16  0.021856  172.23.0.2  172.23.0.4   UDP         8234  56565 → 8080 Len=8192
 17  0.026482  172.23.0.2  172.23.0.3   UDP         8234  56565 → 8080 Len=8192
 18  0.026502  172.23.0.2  172.23.0.4   UDP         8234  56565 → 8080 Len=8192
 19  0.032363  172.23.0.2  172.23.0.4   UDP         8234  56565 → 8080 Len=8192
 20  0.032377  172.23.0.2  172.23.0.3   UDP         8234  56565 → 8080 Len=8192
```
Indeed the logs sink did not correctly parse logs using this solution.
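
This is easy to reproduce outside of Kubernetes. A rough sketch, assuming `bash`, `socat` and `tcpdump` are available (all addresses and ports here are arbitrary):

```sh
# In one shell, run the pipe/tee forwarder with two local "destinations":
socat - udp-recvfrom:1111,fork | tee >(socat - udp-sendto:127.0.0.1:2221) >(socat - udp-sendto:127.0.0.1:2222) > /dev/null

# In a second shell, watch what actually egresses to one of the destinations:
tcpdump -ni lo udp port 2221

# In a third shell, send two small syslog-style messages in quick succession.
# Because the datagrams are squeezed through a byte stream in the middle, they
# can come out coalesced into a single datagram (or split across several),
# rather than one message per datagram as RFC 5426 requires.
printf '<14>1 2020-06-01T00:00:00Z host app - - - first message\n'  > /dev/udp/127.0.0.1/1111
printf '<14>1 2020-06-01T00:00:00Z host app - - - second message\n' > /dev/udp/127.0.0.1/1111
```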
### Buffers and packet loss
By piping to `tee` there were now multiple buffers that could fill:

- the pipe buffer from `socat` to `tee`
- the `tee` read buffer
- the two destination `socat` read buffers
If any of these buffers filled, it would stall the pipeline and cause incoming packets to be dropped. And indeed we saw significant packet loss using this solution.
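
The pipe and `tee` buffers are hard to observe directly, but the kernel’s UDP statistics at least show when the ingress socket buffer overflows. A sketch, assuming `netstat` (net-tools) is available in the container:

```sh
# "packet receive errors" / "receive buffer errors" count datagrams the kernel
# dropped because a socket receive buffer was already full, i.e. the pipeline
# was not draining the ingress socket fast enough
netstat -su | grep -i errors
```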
### Performance
Trying to benchmark this system using `goben` didn’t even work - the server just hung without printing any statistics.
## A better solution
It’s clear we need to keep datagram boundaries.
One way to do that would be to avoid transforming the flow of logs from datagrams to a stream.
If we write to the `localhost` broadcast address we can then have multiple listeners, and keep the datagram boundaries!
I created a container with two `socat` processes:
Process 1 reads datagrams off a regular UDP port, and sends them to `127.255.255.255` (`localhost` broadcast).
```sh
# the 1111 listen port and the 9999 broadcast port are both arbitrary
socat -u "udp-recvfrom:1111,fork" udp-sendto:127.255.255.255:9999,broadcast
```
Process 2 reads datagrams from the broadcast, and sends them to a UDP unicast address.
```sh
# 192.168.77.66:514 is an arbitrary destination address and port
socat -u udp-recvfrom:9999,reuseaddr,fork udp-sendto:192.168.77.66:514
```
Any number of processes can receive the broadcast and forward packets to a different address. So process 2 can be duplicated any number of times for an arbitrary number of endpoints.
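
Putting it together, the whole tee can be driven by a small entrypoint script. A sketch, assuming the listen port and a space-separated list of destinations are passed in via hypothetical `LISTEN_PORT` and `DESTINATIONS` environment variables (e.g. `DESTINATIONS="10.0.0.1:514 10.0.0.2:514"`):

```sh
#!/bin/sh
set -eu

# process 2 (xN): one unicast forwarder per destination, all reading the
# localhost broadcast on port 9999
for dest in ${DESTINATIONS}; do
  socat -u "udp-recvfrom:9999,reuseaddr,fork" "udp-sendto:${dest}" &
done

# process 1, in the foreground: public UDP port -> localhost broadcast
exec socat -u "udp-recvfrom:${LISTEN_PORT},fork" \
  "udp-sendto:127.255.255.255:9999,broadcast"
```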
### Datagram retention
The only things which could overflow here are UDP socket buffers. And the datagram boundaries are retained as expected:
```
No.  Time      Source      Destination  Protocol  Length  Info
  1  0.000000  172.23.0.2  172.23.0.3   UDP          287  56565 → 8080 Len=245
  2  0.000195  172.23.0.2  172.23.0.4   UDP          287  56565 → 8080 Len=245
  3  0.000860  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
  4  0.000913  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
  5  0.005504  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
  6  0.005515  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
  7  0.014920  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
  8  0.015292  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
  9  0.021584  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
 10  0.021698  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
 11  0.025073  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
 12  0.025485  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
 13  0.031647  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
 14  0.031682  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
 15  0.034086  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
 16  0.034146  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
 17  0.038558  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
 18  0.038708  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
 19  0.043166  172.23.0.2  172.23.0.3   UDP        64042  56565 → 8080 Len=64000
 20  0.043350  172.23.0.2  172.23.0.4   UDP        64042  56565 → 8080 Len=64000
```
### Performance
I wrapped this up in a container and ran `goben` over this system.
With one client and two servers I was able to get 100 Mbit/s to both servers without packet loss.
100 Mbit/s is plenty of performance for my use-case, plus this service is perfectly horizontally scalable.
I can just replicate the pod a few times if packet loss becomes a problem!
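
For completeness, running this as an ordinary Deployment behind a UDP Service might look roughly like the following (image name, ports and replica count are all placeholders):

```sh
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logs-tee
spec:
  replicas: 2
  selector:
    matchLabels:
      app: logs-tee
  template:
    metadata:
      labels:
        app: logs-tee
    spec:
      containers:
      - name: logs-tee
        image: registry.example.com/logs-tee:latest   # placeholder image
        ports:
        - containerPort: 1111
          protocol: UDP
---
apiVersion: v1
kind: Service
metadata:
  name: logs-tee
spec:
  selector:
    app: logs-tee
  ports:
  - name: syslog
    port: 514
    targetPort: 1111
    protocol: UDP
EOF
```

Each datagram only lands on one replica, but since every replica forwards to all sinks, that doesn’t matter for this use-case.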
`goben` client:
```
$ docker run -it --rm --name=goben-client --network iperf goben -tls=false -passiveServer -udp -hosts logs-tee -maxSpeed 100
2022/09/23 15:11:32 goben version 0.6 runtime go1.18beta1 GOMAXPROCS=8 OS=linux arch=amd64
2022/09/23 15:11:32 connections=1 defaultPort=:8080 listeners=[":8080"] hosts=["logs-tee"]
2022/09/23 15:11:32 reportInterval=2s totalDuration=10s
2022/09/23 15:11:32 client mode, udp protocol
2022/09/23 15:11:32 open: opening TLS=false udp 0/1: logs-tee:8080
2022/09/23 15:11:32 handleConnectionClient: starting TCP 0/1 172.23.0.4:8080
2022/09/23 15:11:32 handleConnectionClient: options sent: {2s 10s 1000000 1000000 64000 64000 true 100 map[]}
2022/09/23 15:11:32 clientWriter: starting: 0/1 172.23.0.4:8080
2022/09/23 15:11:32 clientReader: starting: 0/1 172.23.0.4:8080
2022/09/23 15:11:34 0/1 report clientWriter rate: 100 Mbps 195 snd/s
2022/09/23 15:11:36 0/1 report clientWriter rate: 100 Mbps 195 snd/s
2022/09/23 15:11:38 0/1 report clientWriter rate: 100 Mbps 195 snd/s
2022/09/23 15:11:40 0/1 report clientWriter rate: 100 Mbps 195 snd/s
2022/09/23 15:11:42 handleConnectionClient: 10s timer
2022/09/23 15:11:42 workLoop: 0/1 clientReader: read udp 172.23.0.5:50372->172.23.0.4:8080: use of closed network connection
2022/09/23 15:11:42 0/1 average clientReader rate: 0 Mbps 0 rcv/s
2022/09/23 15:11:42 clientReader: exiting: 0/1 172.23.0.4:8080
2022/09/23 15:11:42 workLoop: 0/1 clientWriter: write udp 172.23.0.5:50372->172.23.0.4:8080: use of closed network connection
2022/09/23 15:11:42 0/1 average clientWriter rate: 100 Mbps 195 snd/s
2022/09/23 15:11:42 clientWriter: exiting: 0/1 172.23.0.4:8080
2022/09/23 15:11:42 172.23.0.4:8080 output:
100 ┤                                                                 ╭───
100 ┤                                                             ╭───╯
100 ┤                                                        ╭────╯
100 ┤                                                   ╭────╯
100 ┤                                              ╭────╯
100 ┤                                          ╭───╯
100 ┤                                     ╭────╯
100 ┤                                 ╭───╯
100 ┤                            ╭────╯
100 ┤                       ╭────╯
100 ┼───────────────────────╯
     Output Mbps: 172.23.0.4:8080 Connection 0
2022/09/23 15:11:42 handleConnectionClient: closing: 0/1 172.23.0.4:8080
2022/09/23 15:11:42 aggregate reading: 0 Mbps 0 recv/s
2022/09/23 15:11:42 aggregate writing: 100 Mbps 195 send/s
```
`goben` server:
```
$ docker run -it --rm --name=goben-server-0 --network iperf goben -passiveServer -tls=false
2022/09/23 15:14:33 goben version 0.6 runtime go1.18beta1 GOMAXPROCS=8 OS=linux arch=amd64
2022/09/23 15:14:33 connections=1 defaultPort=:8080 listeners=[":8080"] hosts=[]
2022/09/23 15:14:33 reportInterval=2s totalDuration=10s
2022/09/23 15:14:33 server mode (use -hosts to switch to client mode)
2022/09/23 15:14:33 listenTCP: TLS=false spawning TCP listener: :8080
2022/09/23 15:14:33 serve: spawning UDP listener: :8080
2022/09/23 15:14:38 handleUDP: incoming: 172.23.0.4:56565
2022/09/23 15:14:38 handleUDP: options received: {2s 10s 1000000 1000000 64000 64000 true 100 map[]}
2022/09/23 15:14:40 0/0 report handleUDP rate: 100 Mbps 195 rcv/s
2022/09/23 15:14:42 0/0 report handleUDP rate: 100 Mbps 195 rcv/s
2022/09/23 15:14:44 0/0 report handleUDP rate: 100 Mbps 195 rcv/s
2022/09/23 15:14:46 0/0 report handleUDP rate: 100 Mbps 195 rcv/s
2022/09/23 15:14:48 handleUDP: total duration 10s timer: 172.23.0.4:56565
2022/09/23 15:14:48 0/0 average handleUDP rate: 99 Mbps 195 rcv/s
```
Here’s a trivial `Dockerfile` for `goben`, and a `Dockerfile` and `entrypoint.sh` for a `logs-tee` service.
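
For reference, the `logs-tee` image only needs `socat` and the entrypoint script; a minimal sketch (base image and paths are just one way to do it):

```dockerfile
FROM alpine:3.16
RUN apk add --no-cache socat
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```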
Note that `goben` relies on the 5-tuple for identifying a client. For that reason, when benchmarking, the `entrypoint.sh` binds to a socket using `sourceport=56565,reuseaddr` for `logs-tee` egress so that `goben` recognises `logs-tee` as a single client. For the actual logging use-case those options are not required and we can rely on ephemeral ports.
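
Concretely, the benchmarking version of the egress forwarder might look something like this (the destination host and port are placeholders):

```sh
# pin the egress source port so every forwarded datagram presents the same
# source 5-tuple to goben; 56565 matches the captures above, and reuseaddr
# lets the forked socat children share that port
socat -u udp-recvfrom:9999,reuseaddr,fork \
  udp-sendto:goben-server-0:8080,sourceport=56565,reuseaddr
```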