Help needed: AWS server rebooted on its own

2026-04-11 15:12
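Problem description:

Around 2 p.m. today the website suddenly became unreachable, and SSH connections to the server timed out as well. When I logged in to the AWS console, the instance was showing "Insufficient data" and the status checks showed an alarm. In the notifications I then found a Health Event: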

One of your Amazon EC2 instances associated with your AWS account in the eu-west-1 Region was successfully recovered after a failed System status check. The Instance ID is listed in the 'Affected resources' tab.

* What do I need to do?
Your instance is running and reporting healthy. If you have startup procedures that aren't automated during your instance boot process, please remember that you need to log in and run them.

* Why did Amazon EC2 auto recover my instance?
Your instance was configured to automatically recover after a failed System status check. Your instance may have failed a System status check due to an underlying hardware failure or due to loss of network connectivity or power. Please refer to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-recover.html#auto-recovery-configuration for more information.

* How is the recovered instance different from the original instance?
The recovered instance is identical to the original instance, including the instance ID, private IP addresses, public IP address, Elastic IP addresses, attached EBS volumes and all instance metadata. The instance is rebooted as part of the automatic recovery process and the contents of the memory (RAM) are not retained.

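About ten minutes later AWS recovered the server automatically. In CloudWatch I did not see any spike in memory or CPU usage. All I could see was that alarms fired during that window and that part of the monitoring data is missing.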
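The alarm window in CloudWatch could also be checked from a script instead of the console. The following is only a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The instance ID in the usage note is a made-up placeholder, and the helper names (build_query, failed_minutes, fetch_failures) are hypothetical. It asks for the 1-minute maximum of the two EC2 status check metrics and lists the minutes in which a check was reported as failed.

```python
from datetime import datetime, timezone

# The two EC2 status check metrics published in the AWS/EC2 namespace.
STATUS_METRICS = ("StatusCheckFailed_System", "StatusCheckFailed_Instance")


def build_query(instance_id, metric_name, start, end):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": 60,                 # one datapoint per minute
        "Statistics": ["Maximum"],    # 1 means the check failed in that minute
    }


def failed_minutes(datapoints):
    """Return the sorted timestamps of datapoints that report a failure."""
    return sorted(dp["Timestamp"] for dp in datapoints if dp["Maximum"] >= 1)


def fetch_failures(instance_id, start, end, region="eu-west-1"):
    """Query both metrics and map each metric name to its failed minutes."""
    import boto3  # third-party; imported here so the helpers above work without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    result = {}
    for name in STATUS_METRICS:
        response = cloudwatch.get_metric_statistics(
            **build_query(instance_id, name, start, end)
        )
        result[name] = failed_minutes(response["Datapoints"])
    return result
```

A call such as fetch_failures("i-0123456789abcdef0", datetime(2026, 3, 27, 6, 0, tzinfo=timezone.utc), datetime(2026, 3, 27, 6, 30, tzinfo=timezone.utc)) would cover the incident window, with the placeholder ID replaced by the real one. Minutes with no datapoint at all simply do not appear in the result, which matches the missing monitoring data seen in the console.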
In CloudWatch, only StatusCheckFailed_Instance and StatusCheckFailed_System show alarms. With journalctl --list-boots I see only two boot records: the previous boot, and the one from today's incident:

-1 62ee9467615d4b228ef7d03086020a30 Fri 2025-10-31 06:16:25 UTC—Fri 2026-03-27 06:05:01 UTC
 0 0d0d2e079e104421bb88ddfbf160da05 Fri 2026-03-27 06:18:39 UTC—Fri 2026-03-27 12:17:11 UTC
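The length of the gap between the two boots listed above can be worked out from that output. This is a small sketch, not a definitive tool: the function names (parse_boots, boot_gaps) are hypothetical, and the regular expression assumes the row layout shown here (index, boot ID, first and last entry in UTC). Other systemd versions may print the columns differently.

```python
import re
from datetime import datetime

# One row of `journalctl --list-boots`: index, boot ID, first and last entry time.
BOOT_ROW = re.compile(
    r"^\s*(-?\d+)\s+([0-9a-f]{32})\s+"
    r"\w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\W+"
    r"\w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\s*$"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_boots(text):
    """Return (index, boot_id, first_entry, last_entry) tuples, oldest first."""
    boots = []
    for line in text.splitlines():
        match = BOOT_ROW.match(line)
        if match:
            index, boot_id, first, last = match.groups()
            boots.append((
                int(index),
                boot_id,
                datetime.strptime(first, TIME_FORMAT),
                datetime.strptime(last, TIME_FORMAT),
            ))
    return sorted(boots)


def boot_gaps(boots):
    """Time between the last entry of one boot and the first entry of the next."""
    return [(prev[0], nxt[0], nxt[2] - prev[3]) for prev, nxt in zip(boots, boots[1:])]


# The two rows from the post, copied as they appear.
SAMPLE = """\
-1 62ee9467615d4b228ef7d03086020a30 Fri 2025-10-31 06:16:25 UTC—Fri 2026-03-27 06:05:01 UTC
 0 0d0d2e079e104421bb88ddfbf160da05 Fri 2026-03-27 06:18:39 UTC—Fri 2026-03-27 12:17:11 UTC
"""

for older, newer, delta in boot_gaps(parse_boots(SAMPLE)):
    print(f"boot {older} -> boot {newer}: no journal entries for {delta}")
# prints: boot -1 -> boot 0: no journal entries for 0:13:38
```

The result only says that the journal holds nothing for 13 minutes and 38 seconds. That is an upper bound on the outage as seen from inside the instance and does not by itself say what caused it.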

The command sudo journalctl --since "2026-03-27 06:00:00" --until "2026-03-27 06:30:00" only shows that a stretch of the log in the middle is missing:

Mar 27 06:18:39 ip-10-0-3-133 kernel: Linux version 6.8.0-1050-aws (buildd@lcy02-amd64-098) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04.3) 12.3.0>
Mar 27 06:18:39 ip-10-0-3-133 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-1050-aws root=PARTUUID=ba765845-c4a3-4c8a-b033-8b4feb8c1849 ro console=tt>
Mar 27 06:18:39 ip-10-0-3-133 kernel: KERNEL supported cpus:
Mar 27 06:18:39 ip-10-0-3-133 kernel: Intel GenuineIntel
Mar 27 06:18:39 ip-10-0-3-133 kernel: AMD AuthenticAMD
Mar 27 06:18:39 ip-10-0-3-133 kernel: Hygon HygonGenuine
Mar 27 06:18:39 ip-10-0-3-133 kernel: Centaur CentaurHauls
Mar 27 06:18:39 ip-10-0-3-133 kernel: zhaoxin Shanghai
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-provided physical RAM map:
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bbccdfff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x00000000bbcce000-0x00000000bbf4dfff] reserved
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x00000000bbf4e000-0x00000000bbf5dfff] ACPI data
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x00000000bbf5e000-0x00000000bbfddfff] ACPI NVS
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x00000000bbfde000-0x00000000bff7bfff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x00000000bff7c000-0x00000000bfffffff] reserved
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x0000000100000000-0x00000004289fffff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: BIOS-e820: [mem 0x0000000428a00000-0x000000043fffffff] reserved
Mar 27 06:18:39 ip-10-0-3-133 kernel: NX (Execute Disable) protection: active
Mar 27 06:18:39 ip-10-0-3-133 kernel: APIC: Static calls initialized
Mar 27 06:18:39 ip-10-0-3-133 kernel: e820: update [mem 0xb9ec0018-0xb9ec8e57] usable ==> usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: e820: update [mem 0xb9ec0018-0xb9ec8e57] usable ==> usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: extended physical RAM map:
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x0000000000000000-0x000000000009ffff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x0000000000100000-0x00000000b9ec0017] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000b9ec0018-0x00000000b9ec8e57] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000b9ec8e58-0x00000000bbccdfff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000bbcce000-0x00000000bbf4dfff] reserved
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000bbf4e000-0x00000000bbf5dfff] ACPI data
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000bbf5e000-0x00000000bbfddfff] ACPI NVS
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000bbfde000-0x00000000bff7bfff] usable
Mar 27 06:18:39 ip-10-0-3-133 kernel: reserve setup_data: [mem 0x00000000bff7c000-0x00000000bfffffff] reserved

Is this a problem on AWS's side, or was it caused by the services deployed on the server? Has anyone here run into something similar?

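Replies:
--[1]--:

Has anyone run into this kind of problem? The services deployed on this server have been running for the several months since I bought it, so I don't think they are what caused this.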
