Notes on Troubleshooting a Redis Container's Network

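A colleague from the business team reported that after deploying our product package, the Redis component (running as a container) could not be connected to.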

1. Service status (OK)

sder@host-88:~$ docker-compose ps
Name Command State Ports
----------------------------------------------------------------------------------------------------------------------------------------------------
redis-2.5.5 docker-entrypoint.sh /etc/ ... Up 0.0.0.0:6379->6379/tcp,:::6379->6379/tcp

2. Check the service startup logs (OK)

sder@host-88:~$ docker logs -f -n 100 redis-2.5.5
1:C 17 Feb 2025 07:54:54.350 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 17 Feb 2025 07:54:54.350 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 17 Feb 2025 07:54:54.350 # Configuration loaded
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.9 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 1
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

1:M 17 Feb 2025 07:54:54.352 # Server initialized
1:M 17 Feb 2025 07:54:54.352 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 17 Feb 2025 07:54:54.352 * Loading RDB produced by version 6.0.9
1:M 17 Feb 2025 07:54:54.352 * RDB age 43 seconds
1:M 17 Feb 2025 07:54:54.352 * RDB memory usage when created 9.78 Mb
1:M 17 Feb 2025 07:54:54.410 * DB loaded from disk: 0.058 seconds
1:M 17 Feb 2025 07:54:54.410 * Ready to accept connections

3. Check the service port (OK)

sder@host-88:~$ sudo netstat -ltunp | grep 6379
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 155403/docker-proxy
tcp 0 0 0.0.0.0:16379 0.0.0.0:* LISTEN 4966/docker-proxy
tcp6 0 0 :::6379 :::* LISTEN 155419/docker-proxy
tcp6 0 0 :::16379 :::* LISTEN 4981/docker-proxy
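
Two details in this output are easy to miss: the process listening on 6379 is docker-proxy (Docker's userland proxy), not redis-server, and `grep 6379` also matches port 16379. A minimal sketch that filters captured `netstat -ltunp` lines by exact port; the helper name `listeners` is hypothetical, and the input is just the four lines shown above:

```python
# Lines captured from `sudo netstat -ltunp | grep 6379` above.
NETSTAT = """\
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 155403/docker-proxy
tcp 0 0 0.0.0.0:16379 0.0.0.0:* LISTEN 4966/docker-proxy
tcp6 0 0 :::6379 :::* LISTEN 155419/docker-proxy
tcp6 0 0 :::16379 :::* LISTEN 4981/docker-proxy
"""


def listeners(netstat_output: str, port: int):
    """Return (proto, local_addr, pid, program) for TCP sockets listening on exactly `port`."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        proto, local, state, prog = fields[0], fields[3], fields[5], fields[6]
        # rsplit on the last ":" handles both "0.0.0.0:6379" and ":::6379"
        if state == "LISTEN" and local.rsplit(":", 1)[1] == str(port):
            pid, name = prog.split("/", 1)
            found.append((proto, local, int(pid), name))
    return found


for entry in listeners(NETSTAT, 6379):
    print(entry)
```

So the host-side socket belongs to docker-proxy: a successful TCP connect to port 6379 on the host shows that the proxy is up, not that the container behind it is reachable.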

4. Connect from inside the container (OK)

sder@host-88$ docker exec -it redis-2.5.5  bash
root@redis:/data# redis-cli
127.0.0.1:6379> auth AsdA34<!
OK

5. Connect to Redis from the host (failed)

sder@host-88:~$ redis-cli -h 127.0.0.1
127.0.0.1:6379> auth AsdA34<!
Error: Connection reset by peer

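At this point it was fairly clear that this was a network problem rather than a problem with the containerized service.
On with the investigation…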
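
The failure in step 5 has a specific shape: redis-cli got a prompt, so the TCP connection was accepted (on the host, by docker-proxy), and it was reset only once a command was sent. A refused connection, or a connect that simply hangs, would point somewhere else. A minimal Python sketch that tells these cases apart using only the standard library; the function name `probe` and its result labels are hypothetical:

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection to host:port, then try one Redis PING."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"        # RST in reply to our SYN: nothing listening
    except socket.timeout:
        return "syn-timeout"    # SYNs never answered: the connect just hangs
    except OSError as exc:
        return f"error: {exc}"  # e.g. host or network unreachable
    with sock:
        try:
            # Redis replies to PING even before AUTH (with -NOAUTH if a password is set).
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
        except (ConnectionResetError, BrokenPipeError):
            return "reset"      # accepted, then reset by the peer
        except socket.timeout:
            return "no-reply"   # accepted, but nothing came back in time
        return "ok" if reply else "closed"  # empty read: peer closed cleanly
```

Under these assumptions, `probe("127.0.0.1", 6379)` on the host would most likely report `reset` for the symptom above, while a client whose SYNs are never answered gets `syn-timeout`.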

6. Capture packets on the host

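Oddly, when I repeated step 5, the capture window on the host (10.18.1.88) showed no output at all… (Possibly because, as the capture output in step 7 shows, tcpdump was listening on ens33 by default, while a connection to 127.0.0.1 goes over the loopback interface lo.)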

7. Connect to Redis from my local machine (failed)

fq@local:~$ redis-cli -h 10.18.1.88 -p 6379

This time the local client hung, and the capture window showed the following:

sder@host-88:~$ sudo tcpdump port 6379
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), snapshot length 262144 bytes

08:22:43.808158 IP 10.18.11.74.51356 > sde-vm.redis: Flags [S], seq 2987472577, win 64240, options [mss 1460,sackOK,TS val 4223095679 ecr 0,nop,wscale 7], length 0
08:22:44.887604 IP 10.18.11.74.51356 > sde-vm.redis: Flags [S], seq 2987472577, win 64240, options [mss 1460,sackOK,TS val 4223096722 ecr 0,nop,wscale 7], length 0
08:22:47.040596 IP 10.18.11.74.51356 > sde-vm.redis: Flags [S], seq 2987472577, win 64240, options [mss 1460,sackOK,TS val 4223098802 ecr 0,nop,wscale 7], length 0

08:22:51.265181 IP 10.18.11.74.51356 > sde-vm.redis: Flags [S], seq 2987472577, win 64240, options [mss 1460,sackOK,TS val 4223102882 ecr 0,nop,wscale 7], length 0
08:23:00.127597 IP 10.18.11.74.51356 > sde-vm.redis: Flags [S], seq 2987472577, win 64240, options [mss 1460,sackOK,TS val 4223111442 ecr 0,nop,wscale 7], length 0

The capture shows that the host did receive the packets (the same SYN, seq 2987472577, arriving five times) but never answered them.
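
The spacing of those SYNs is worth a second look: the gaps roughly double each time, which is consistent with the exponential backoff TCP applies to an unanswered SYN (on Linux the initial timeout is 1 s). That backoff is why redis-cli appears to hang instead of failing quickly. The arithmetic, as a small Python check on the timestamps copied from the capture above:

```python
from datetime import datetime

# Timestamps of the five SYNs (all seq 2987472577) from the capture above.
SYN_TIMES = [
    "08:22:43.808158",
    "08:22:44.887604",
    "08:22:47.040596",
    "08:22:51.265181",
    "08:23:00.127597",
]


def retransmit_gaps(stamps):
    """Seconds between consecutive timestamps, rounded to 10 ms."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 2) for a, b in zip(times, times[1:])]


gaps = retransmit_gaps(SYN_TIMES)
ratios = [round(b / a, 2) for a, b in zip(gaps, gaps[1:])]
print(gaps)    # [1.08, 2.15, 4.22, 8.86]
print(ratios)  # [1.99, 1.96, 2.1]
```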

8. Analyze the problem

Let's review the whole path:

# This points to a problem on link b (NIC, virtual device, or routing)

[ external ] <-a-> [ host ] <-b-> [ container ]

Check the host's routing table, interfaces and IP configuration:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
sder@host-88:~$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.18.1.254 0.0.0.0 UG 0 0 0 ens33
10.18.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-09d23abfe394
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ens33
192.168.0.0 0.0.0.0 255.255.0.0 U 0 0 0 ens33

sder@host-88:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:9a:94:05 brd ff:ff:ff:ff:ff:ff
altname enp2s1
inet 10.18.1.88/24 brd 10.18.1.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.11.16/16 brd 192.168.255.255 scope global ens33
valid_lft forever preferred_lft forever
inet 172.19.0.96/16 brd 172.19.255.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.11.17/16 brd 192.168.255.255 scope global secondary ens33
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe9a:9405/64 scope link
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:63:61:8f:c1 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: br-09d23abfe394: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:6c:be:cf:30 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-09d23abfe394
valid_lft forever preferred_lft forever
inet6 fe80::42:6cff:febe:cf30/64 scope link
valid_lft forever preferred_lft forever
5: br-f77400f851bf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:70:1b:6c:eb brd ff:ff:ff:ff:ff:ff
inet 172.19.0.1/16 brd 172.19.255.255 scope global br-f77400f851bf
valid_lft forever preferred_lft forever
inet6 fe80::42:70ff:fe1b:6ceb/64 scope link
valid_lft forever preferred_lft forever

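In fact, a careful look at the addresses above is enough to spot the problem immediately: 172.19.0.96/16 on ens33 and 172.19.0.1/16 on br-f77400f851bf are in the same 172.19.0.0/16 subnet. I didn't notice it at the time, though.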
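
The conflict can also be found mechanically: compare every address on the host and flag pairs on different interfaces whose subnets overlap. A minimal sketch with Python's `ipaddress` module; the address list is transcribed from the `ip addr` output above and `find_conflicts` is a hypothetical name:

```python
import ipaddress
from itertools import combinations

# (interface, address/prefix) pairs transcribed from `ip addr` above.
HOST_ADDRS = [
    ("ens33", "10.18.1.88/24"),
    ("ens33", "192.168.11.16/16"),
    ("ens33", "172.19.0.96/16"),
    ("ens33", "192.168.11.17/16"),
    ("docker0", "172.17.0.1/16"),
    ("br-09d23abfe394", "172.18.0.1/16"),
    ("br-f77400f851bf", "172.19.0.1/16"),
]


def find_conflicts(addrs):
    """Return pairs of addresses on *different* interfaces whose subnets overlap."""
    ifaces = [(dev, ipaddress.ip_interface(cidr)) for dev, cidr in addrs]
    conflicts = []
    for (dev_a, a), (dev_b, b) in combinations(ifaces, 2):
        # Two addresses in one subnet on the same interface are not a routing conflict.
        if dev_a != dev_b and a.network.overlaps(b.network):
            conflicts.append((dev_a, str(a), dev_b, str(b)))
    return conflicts


for conflict in find_conflicts(HOST_ADDRS):
    print(conflict)
```

Only the ens33 / br-f77400f851bf pair in 172.19.0.0/16 is reported; the two 192.168.11.x/16 addresses both sit on ens33, so they are not flagged.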

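At this point the fault was on the container <-> host path (link b in the diagram above), so the next step was to look at the network environment inside the container: its routes and IP addresses.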

9. Inspect the network inside the container

The Redis container ships without networking tools such as route or ping, so we use nsenter to enter the container's network namespace.

# First, find the container ID
sder@host-88:~$ docker ps | grep redis
927a7c86dcc0 registry.cn-hangzhou.aliyuncs.com/10_18_1_2_5000/redis:6.0.9 "docker-entrypoint.s…" 40 minutes ago Up 39 minutes 0.0.0.0:6379->6379/tcp, :::6379->6379/tcp redis-2.5.5

# Then, get the PID of the container's main process
sder@host-88:~$ docker inspect --format '{{.State.Pid}}' 927a7c86dcc0
155462

# Finally, enter the container's network namespace
sder@host-88:~$ sudo nsenter --target 155462 --net
root@host-88:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.19.0.1 0.0.0.0 UG 0 0 0 eth0
172.19.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
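
nsenter works well here. Another option on a Linux host, sketched below as an alternative, is to read /proc/<pid>/net/route, which shows the routing table of the network namespace that process <pid> belongs to (here 155462, from docker inspect) without entering it. The helper names are hypothetical, reading another user's process entry may need root, and SAMPLE is not captured data: it is a reconstruction, in /proc/net/route format on a little-endian (x86) host, of the container's `route -n` output above.

```python
import socket
import struct


def hex_to_ipv4(field: str) -> str:
    # The kernel prints each address as a native-endian 32-bit integer;
    # packing it back in native byte order ("=I") restores the network-order bytes.
    return socket.inet_ntoa(struct.pack("=I", int(field, 16)))


def parse_proc_route(text: str):
    """Parse /proc/net/route text into (destination, gateway, mask, iface) tuples."""
    routes = []
    for line in text.strip().splitlines()[1:]:  # first line is the header
        f = line.split()
        # Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
        routes.append((hex_to_ipv4(f[1]), hex_to_ipv4(f[2]), hex_to_ipv4(f[7]), f[0]))
    return routes


def read_netns_routes(pid: int):
    """Routes of the network namespace that process `pid` belongs to."""
    with open(f"/proc/{pid}/net/route") as fh:
        return parse_proc_route(fh.read())


# Reconstructed sample matching the container's `route -n` output (little-endian host).
SAMPLE = """\
Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT
eth0\t00000000\t010013AC\t0003\t0\t0\t0\t00000000\t0\t0\t0
eth0\t000013AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0
"""

for route in parse_proc_route(SAMPLE):
    print(route)
```

Run as root on the host, `read_netns_routes(155462)` should return the live table for the Redis container's namespace.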

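This exposes the route conflict: the container's default gateway is 172.19.0.1, the address of the Docker-created bridge br-f77400f851bf, but on the host the 172.19.0.0/16 route that should point at that bridge has been overridden by ens33.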
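
Why this breaks forwarding: after Docker rewrites the destination of an incoming SYN to the container's address, the host chooses the outgoing interface by longest-prefix match on its routing table. A simplified sketch over the host table from step 8 (it ignores metrics and policy routing, and the first entry wins on ties); the container address 172.19.0.2 is a hypothetical example, since the real one was not recorded:

```python
import ipaddress

# Host routing table from `route -n` in step 8: (destination/prefix, interface).
HOST_ROUTES = [
    ("0.0.0.0/0", "ens33"),
    ("10.18.1.0/24", "ens33"),
    ("172.17.0.0/16", "docker0"),
    ("172.18.0.0/16", "br-09d23abfe394"),
    ("172.19.0.0/16", "ens33"),  # should have pointed at br-f77400f851bf
    ("192.168.0.0/16", "ens33"),
]


def egress_interface(routes, dst: str):
    """Longest-prefix match (simplified: no metrics or policy routing)."""
    addr = ipaddress.ip_address(dst)
    best_len, best_dev = -1, None
    for cidr, dev in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and net.prefixlen > best_len:
            best_len, best_dev = net.prefixlen, dev
    return best_dev


# HYPOTHETICAL container address on the 172.19.0.0/16 bridge network.
print(egress_interface(HOST_ROUTES, "172.19.0.2"))  # ens33
```

With the 172.19.0.0/16 entry pointing at br-f77400f851bf instead, the same lookup returns the bridge, and the SYN can reach the container.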

10. Fix the problem

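# A colleague developing a networking feature had given ens33 an address in the same subnet as the Docker bridge, which overrode the bridge's route.
# So the fix is simply to delete the conflicting address from ens33.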

sder@host-88:~$ ip addr | grep 172.19
inet 172.19.0.96/16 brd 172.19.255.255 scope global ens33
inet 172.19.0.1/16 brd 172.19.255.255 scope global br-f77400f851bf
sder@host-88:~$ sudo ip addr delete 172.19.0.96/16 dev ens33
sder@host-88:~$ ip addr | grep 172.19
inet 172.19.0.1/16 brd 172.19.255.255 scope global br-f77400f851bf

After restarting the Docker service, the problem was resolved.
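
To avoid a repeat, a new address for ens33 can be checked against every subnet already in use on the host, Docker bridges included, before it is assigned. A minimal sketch using Python's `ipaddress` module; the function names, the pool 172.16.0.0/12 and the /16 size are hypothetical choices, and the in-use list is copied from the route table in step 8 (with 172.19.0.0/16 belonging to the Docker bridge). Docker can still create new networks later, so a check like this is only a snapshot.

```python
import ipaddress

# Subnets in use on the host (route table in step 8; 172.19.0.0/16 is the Docker bridge).
IN_USE = ["10.18.1.0/24", "172.17.0.0/16", "172.18.0.0/16",
          "172.19.0.0/16", "192.168.0.0/16"]


def conflicts(candidate: str, in_use):
    """Return the in-use subnets that overlap the candidate address/prefix."""
    net = ipaddress.ip_network(candidate, strict=False)  # 172.19.0.96/16 -> 172.19.0.0/16
    return [n for n in in_use if net.overlaps(ipaddress.ip_network(n))]


def pick_free_subnet(pool: str, prefix: int, in_use):
    """First subnet of `pool` with the given prefix length that overlaps nothing in use."""
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=prefix):
        if not conflicts(str(candidate), in_use):
            return candidate
    return None


print(conflicts("172.19.0.96/16", IN_USE))            # ['172.19.0.0/16']
print(pick_free_subnet("172.16.0.0/12", 16, IN_USE))  # 172.16.0.0/16
```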

Copyright © 2019-2025 Klusfq