Re: [NLNOG] Curious problem with connections from Ziggo customers to Linode nodes in some data centers

24 Aug 2023


Munging the HTTP URLs, since the list server claims they are on a spam list.

See if this gets through.
Also switched to plain text, since the URLs were not edited properly in the HTML version.
...
On 24-08-2023 12:50 CEST, Boudewijn Visser (nlnog) <bvisser-nlnog@xs4all.nl> wrote:
Hi Stefan,
While I'm quite old skool, I never really got into IRC, so I missed the conversation.
I've had a look at your packet capture.
It doesn't seem to be an MTU issue.
Filtering for the traffic captured on the server side:
(ip.src_host == 192.46.232.6 && ip.dst_host == 84.28.119.251) || (ip.dst_host == 192.46.232.6 && ip.src_host == 84.28.119.251)
So it seems your Ziggo public IP is 84.28.119.251.
And filtering for the capture from the inside (client) side:
(ip.src_host == 192.46.232.6 && ip.dst_host == 192.168.0.107) || (ip.dst_host == 192.46.232.6 && ip.src_host == 192.168.0.107)
I see an OK session using source port 50006, and then a session with source port 50007 that seems to have severe packet loss.
See all the TCP retransmissions in the source-port 50007 session; only rarely does a packet get through.
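To make the comparison between the two sessions concrete, here is a minimal Python sketch that counts retransmitted data segments per source port. It is only an illustration: the function name `count_retransmissions` and the input format (tuples of source port, sequence number, payload length, for one direction of traffic) are assumptions, and the heuristic is much cruder than Wireshark's own TCP analysis (it ignores retransmitted SYNs and partial overlaps).

```python
from collections import defaultdict


def count_retransmissions(segments):
    """Count repeated data segments per source port.

    segments: iterable of (src_port, seq, payload_len) tuples taken from
    one direction of a capture (hypothetical input format).
    A data segment whose sequence number was already seen for the same
    source port is counted as a retransmission.
    """
    seen = defaultdict(set)      # src_port -> sequence numbers already seen
    retrans = defaultdict(int)   # src_port -> retransmission count
    for src_port, seq, payload_len in segments:
        if payload_len == 0:
            # Pure ACKs carry no data and legitimately repeat sequence numbers.
            continue
        if seq in seen[src_port]:
            retrans[src_port] += 1
        else:
            seen[src_port].add(seq)
    return dict(retrans)


# Made-up sample data, not taken from the real capture.
sample = [
    (50006, 1, 100), (50006, 101, 100),               # clean session
    (50007, 1, 100), (50007, 1, 100), (50007, 1, 100),  # same segment sent 3 times
    (50007, 1, 0),                                    # pure ACK, ignored
]
print(count_retransmissions(sample))
```

Ports without any retransmission simply do not appear in the result.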
If you can still use this client (same public IP), try:
curl --local-port 50006 http://192.46.232.six
curl --local-port 50007 http://192.46.232.six
That should replicate the problem exactly: the first one always OK, the second one always with major problems.
Note: you may see socket errors when trying multiple times shortly after each other (bind failure, socket already in use).
And the specific local port that fails or works very likely also depends on the client source IP.
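For sweeping more than two source ports, the same test can be sketched in Python. This is a rough equivalent, not a replacement for the curl commands: the function name `fetch_from_port`, the default timeout and the HTTP/1.0 request are my own assumptions. Like curl's --local-port, it binds the local port before connecting.

```python
import socket
import time


def fetch_from_port(host, local_port, dest_port=80, timeout=10.0):
    """Do one HTTP GET from a fixed local source port.

    Returns (ok, seconds). ok is True only if a response was received
    and the server closed the connection before the timeout.
    """
    start = time.monotonic()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Reduces "address already in use" bind failures on quick retries.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.settimeout(timeout)
    ok = False
    try:
        sock.bind(("", local_port))          # local_port 0 lets the OS choose
        sock.connect((host, dest_port))
        request = b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n"
        sock.sendall(request)
        received = False
        while True:
            chunk = sock.recv(4096)
            if not chunk:                    # server closed: response complete
                break
            received = True
        ok = received
    except OSError:
        # Covers timeouts, refused connections and bind failures.
        ok = False
    finally:
        sock.close()
    return ok, time.monotonic() - start


# Example sweep (commented out so nothing touches the network on import):
# for port in range(50000, 50010):
#     print(port, fetch_from_port("192.46.232.6", port))
```

A port that consistently times out while its neighbours succeed is a candidate for a flow that is hashed onto the bad path. Keep in mind that NAT in the modem may rewrite the source port.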
Sabri's suggestion to use tcp-traceroute is also valuable.
(Normally, traceroute is done using UDP (classic Unix, Cisco) or ICMP, but it can be done with TCP too.)
With some luck, tcp-traceroute may give a hint for a node or path where the failure starts.
I've done a quick test (I happen to be behind Ziggo at the moment), but a TCP traceroute isn't too conclusive.
Generally, load balancing within a network is deterministic, based on the IP/port combination for example.
IMO, the whole problem still looks like a network link that has severe issues (it probably corrupts a large number of packets, which are then dropped at the neighbor node), and traffic is load balanced over this link.
So some session flows are impacted and others are not.
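To illustrate why only some flows would be hit, here is a minimal Python sketch of hash-based load balancing over parallel links. It is a toy model under stated assumptions: real routers use vendor-specific hash functions and inputs, CRC32 is only a stand-in, and the names `pick_link` and `affected_ports` as well as the number of links are hypothetical.

```python
import zlib


def pick_link(src_ip, src_port, dst_ip, dst_port, n_links, proto=6):
    """Map a flow (5-tuple) to one of n_links parallel links.

    The choice is deterministic: the same flow always gets the same link.
    CRC32 is a stand-in for whatever hash a real device uses.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_links


def affected_ports(src_ip, dst_ip, dst_port, ports, n_links, bad_link):
    """Return the source ports whose flow is hashed onto the bad link."""
    return [
        p for p in ports
        if pick_link(src_ip, p, dst_ip, dst_port, n_links) == bad_link
    ]


# Hypothetical setup: 4 parallel links, link 2 is the faulty one.
ports = range(50000, 50016)
print(affected_ports("84.28.119.251", "192.46.232.6", 80, ports, 4, 2))
```

In this model a given source port either always works or always fails for a fixed client IP, and changing the client IP changes which ports are affected, which matches the pattern seen with ports 50006 and 50007.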
Since it seems limited to Ziggo clients, it would likely be somewhere in the Ziggo network.
Something at an exchange point is a more remote possibility; depending on what (other) destinations are impacted, it might just not have been noticed either.
(Some caveats: NAT in the Ziggo modem may change the source port, especially with repeated tests.)
I think that getting any further will need a fairly senior Ziggo network engineer to investigate.
Best regards, Boudewijn
...
On 24-08-2023 08:01 CEST, Stefan van den Oord <stefan+nlnog@medicinemen.eu> wrote:
Thanks Boudewijn!
There was a lively conversation about this on #nlnog yesterday, so I forgot to respond to you. I tried changing the MTU to 1420, but that didn’t make a difference. I did a packet capture as well. This was between server 192.46.232.6 and client 192.168.0.107. The command used on the server was:
tcpdump -Aennvvi eth0 -w server.pcap port not 22
And on the client (because I was connected through VNC):
sudo tcpdump -Aennvvi en1 -w client.pcap port not 22 and port not 5900
During this capture I did two requests (using curl) to http://192.46.232.six, the first one succeeded and the second one I aborted after half a minute. The result is here: http://192.46.232.six/client+server.pcap
I lack the experience to properly analyse this. Does this contain any clues to you?
-- 
Stefan van den Oord 
CTO @ Medicine Men B.V.
Not in the office on Wednesdays
Regulierenring 22 
3981 LB Bunnik 
The Netherlands