
Installing a New SSD and Dealing with the Old SSD


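Background

  • I have two SSD-related problems
  • First, the SSD in my main PC (OMEN 45L) has started showing an "Unhealthy" warning
  • Second, I am running out of SSD capacity
  • So I checked the old SSD, and installed a new SSD and checked it as well
  • I'm keeping a record of the work here for future reference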

New SSD

SSD details

What I bought:

  • Toshiba M.2 2280 NVMe SSD, 4 TB
  • Thermalright heatsink

Packages to install:

sudo apt install nvme-cli
sudo apt install smartmontools

Checking that the SSD is recognized after installation

The nvme1n1 3.6T TLD-M5B04T4 disk entry is the newly installed SSD.

lsblk -o NAME,SIZE,MODEL,TYPE,FSTYPE,MOUNTPOINTS

NAME                        SIZE MODEL                              TYPE FSTYPE      MOUNTPOINTS
...

sda                        14.6T 7TH EXTERNAL                       disk
└─sda1                     14.6T                                    part ext4        /media/mike/ac44dbae-268e-46f0-aebb-7333d3c71908
nvme0n1                     1.9T WD WD_BLACK Gen4 SDCPNRZ-2T00-1106 disk
├─nvme0n1p1                   1G                                    part vfat        /boot/efi
├─nvme0n1p2                   2G                                    part ext4        /boot
└─nvme0n1p3                 1.9T                                    part LVM2_member
  └─ubuntu--vg-ubuntu--lv   1.9T                                    lvm  ext4        /
nvme1n1                     3.6T TLD-M5B04T4                        disk

  • Device name: /dev/nvme1n1
  • Product name: TLD-M5B04T4
  • Reported capacity: 3.6T
  • Partitions: none yet
  • File system: none yet
  • Mounted: not yet
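lsblk prints sizes in binary units, so its "T" means 2^40 bytes, while a drive sold as "4TB" is counted in decimal terabytes (10^12 bytes). That is why the new drive shows up as 3.6T. A minimal Python sketch of the conversion, assuming the nominal capacity is exactly 4 × 10^12 bytes; the 2,048,408,248,320-byte figure for the old SSD is the Total NVM Capacity that smartctl reports further down.

```python
def bytes_to_tib(n_bytes: int) -> float:
    """Convert a byte count to tebibytes (2**40 bytes), the unit behind lsblk's "T"."""
    return n_bytes / 2**40


# New SSD: assumes the nominal "4TB" is exactly 4 * 10**12 bytes
print(f"new SSD: {bytes_to_tib(4 * 10**12):.2f}T")        # new SSD: 3.64T
# Old SSD: Total NVM Capacity as reported by smartctl
print(f"old SSD: {bytes_to_tib(2_048_408_248_320):.2f}T")  # old SSD: 1.86T
```

Rounded to one decimal place, these match the 3.6T and 1.9T values in the lsblk output.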

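Creating a partition and formatting it

Create a GPT partition table with a single partition and format it as ext4.

# Create a new GPT partition table (this wipes whatever is on /dev/nvme1n1)
sudo parted /dev/nvme1n1 --script mklabel gpt
# One partition covering the whole disk; on GPT, "primary" becomes the partition name
sudo parted /dev/nvme1n1 --script mkpart primary ext4 0% 100%
# Create the ext4 filesystem with the label "data4tb"
sudo mkfs.ext4 -L data4tb /dev/nvme1n1p1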

Mounting manually

sudo mkdir -p /mnt/data4tb
sudo mount /dev/nvme1n1p1 /mnt/data4tb
# Make the regular user (mike) the owner of the filesystem root
sudo chown mike:mike /mnt/data4tb

Check:

df -h /mnt/data4tb

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1p1  3.6T   28K  3.4T   1% /mnt/data4tb
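With only 28K used, Avail (3.4T) is still about 0.2T smaller than Size (3.6T). The likely reason is the ext4 reserved-blocks area: mkfs.ext4 reserves 5% of the blocks for root by default, and df does not count them as available. A rough Python check, using the rounded df figures (so the result is only approximate):

```python
def available_after_reserve(size: float, reserved_percent: float = 5.0) -> float:
    """Space left for non-root users after the ext4 root reservation (same unit as size)."""
    return size * (1 - reserved_percent / 100)


print(f"{available_after_reserve(3.6):.1f}T")  # 3.4T, matching the Avail column
```

For a data-only disk, the reservation could be lowered afterwards with sudo tune2fs -m 1 /dev/nvme1n1p1 (or -m 0). Whether that is worth it is a judgment call, since the reserve also helps limit fragmentation when the disk gets nearly full.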

Setting up automatic mounting

Adding an entry to /etc/fstab makes the partition mount at /mnt/data4tb automatically on every boot, so set that up.

Check the UUID:

sudo blkid /dev/nvme1n1p1

/dev/nvme1n1p1: LABEL="data4tb" UUID="7c391bf3-9dca-4591-82fd-9f5e5ae63661" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="primary" PARTUUID="e70d4ffb-187a-4757-9afc-4ded0b7c3c50"

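Append the following line to the end of /etc/fstab (nofail keeps the boot from failing if this disk is ever missing):

UUID=7c391bf3-9dca-4591-82fd-9f5e5ae63661 /mnt/data4tb ext4 defaults,nofail 0 2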

cd ~

sudo mkdir -p /mnt/data4tb
# Back up fstab before editing it
sudo cp /etc/fstab /etc/fstab.backup
# Add the UUID line at the end of the file
sudoedit /etc/fstab
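The six whitespace-separated fields of the fstab line follow fstab(5): the device (here selected by UUID), the mount point, the filesystem type, the mount options, the dump flag, and the fsck pass number (2 means "check after the root filesystem"). A small Python sketch that splits such a line into named fields; the field names come from the fstab(5) man page, and unlike fstab itself, which allows the last two fields to be omitted, this sketch insists on all six.

```python
# Field names as used in the fstab(5) man page
FSTAB_FIELDS = ("fs_spec", "fs_file", "fs_vfstype", "fs_mntops", "fs_freq", "fs_passno")


def parse_fstab_line(line: str) -> dict:
    """Split one non-comment fstab line into its six named fields."""
    fields = line.split()
    if len(fields) != len(FSTAB_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    entry = dict(zip(FSTAB_FIELDS, fields))
    entry["fs_mntops"] = entry["fs_mntops"].split(",")  # comma-separated mount options
    entry["fs_freq"] = int(entry["fs_freq"])            # 0 = not backed up by dump(8)
    entry["fs_passno"] = int(entry["fs_passno"])        # fsck order: 1 = root, 2 = others, 0 = skip
    return entry


line = "UUID=7c391bf3-9dca-4591-82fd-9f5e5ae63661 /mnt/data4tb ext4 defaults,nofail 0 2"
entry = parse_fstab_line(line)
print(entry["fs_file"], entry["fs_mntops"], entry["fs_passno"])
# /mnt/data4tb ['defaults', 'nofail'] 2
```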

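Test without rebooting:

sudo umount /mnt/data4tb
# Have systemd regenerate its mount units from the edited fstab
sudo systemctl daemon-reload
# Mount every fstab entry (except noauto ones) that is not mounted yet
sudo mount -a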

If sudo mount -a finishes without any errors, the setup is basically working.

Check the mount:

findmnt /mnt/data4tb
TARGET       SOURCE         FSTYPE OPTIONS
/mnt/data4tb /dev/nvme1n1p1 ext4   rw,relatime,errors=remount-ro

df -h /mnt/data4tb
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1p1  3.6T   28K  3.4T   1% /mnt/data4tb

Also check that it is writable:

touch /mnt/data4tb/mount-test
rm /mnt/data4tb/mount-test
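The same two checks (is /mnt/data4tb really a mount point, and can the current user write to it) can be scripted. A minimal Python sketch; like the touch/rm pair, it tests writability by actually creating and deleting a temporary file rather than inspecting permission bits.

```python
import os
import tempfile


def check_mount(path: str) -> tuple[bool, bool]:
    """Return (is_mount_point, is_writable) for path."""
    mounted = os.path.ismount(path)
    try:
        # The temporary file is deleted automatically when the with-block exits
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        writable = True
    except OSError:  # missing directory, permission denied, read-only filesystem, ...
        writable = False
    return mounted, writable


if __name__ == "__main__":
    # Expected to print (True, True) once the disk is mounted and chowned as above
    print(check_mount("/mnt/data4tb"))
```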

SMART

sudo nvme smart-log /dev/nvme1

Smart Log for NVME device:nvme1 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 31 °C (304 K)
available_spare                         : 100%
available_spare_threshold               : 5%
percentage_used                         : 0%
endurance group critical warning summary: 0
Data Units Read                         : 49 (25.09 MB)
Data Units Written                      : 124310 (63.65 GB)
host_read_commands                      : 2932
host_write_commands                     : 487497
controller_busy_time                    : 0
power_cycles                            : 1
power_on_hours                          : 0
unsafe_shutdowns                        : 0
media_errors                            : 0
num_err_log_entries                     : 0
Warning Temperature Time                : 0
Critical Composite Temperature Time     : 0
Temperature Sensor 1           : 31 °C (304 K)
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 0
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 0

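SMART results:

  • critical_warning : 0 → no critical warnings
  • temperature : 31°C → low enough, so the heatsink seems to be doing its job
  • available_spare : 100% → the spare area is intact
  • percentage_used : 0% → essentially no lifetime used yet
  • media_errors : 0 → no unrecovered data integrity errors inside the SSD
  • num_err_log_entries : 0 → no entries in the NVMe error log
  • unsafe_shutdowns : 0 → no unexpected power losses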
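Reading these fields by eye works for one drive, but the check can also be scripted. A Python sketch, assuming the plain-text "name : value" layout of nvme smart-log shown above (nvme-cli can also emit JSON with -o json, which is sturdier for real scripts). The rules in health_problems are simply the fields this post treats as red flags, not an official pass/fail criterion.

```python
import re

# Excerpt of the `sudo nvme smart-log /dev/nvme1` output above
SAMPLE = """\
critical_warning                        : 0
temperature                             : 31 °C (304 K)
available_spare                         : 100%
available_spare_threshold               : 5%
percentage_used                         : 0%
media_errors                            : 0
num_err_log_entries                     : 0
unsafe_shutdowns                        : 0
"""

# Number at the start of a value: hex such as 0x4, or decimal that may contain commas
NUMBER = re.compile(r"0x[0-9a-fA-F]+|\d[\d,]*")


def parse_smart_log(text: str) -> dict[str, int]:
    """Map each "name : value" line to the number at the start of its value."""
    log = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        match = NUMBER.match(value.strip()) if sep else None
        if match:
            token = match.group(0).replace(",", "")
            log[name.strip()] = int(token, 16) if token.startswith("0x") else int(token)
    return log


def health_problems(log: dict[str, int]) -> list[str]:
    """List the red flags found in a parsed SMART log (empty list = nothing flagged)."""
    problems = []
    if log.get("critical_warning", 0):
        problems.append(f"critical_warning={log['critical_warning']:#x}")
    if log.get("available_spare", 100) < log.get("available_spare_threshold", 0):
        problems.append("available_spare below threshold")
    if log.get("media_errors", 0):
        problems.append(f"media_errors={log['media_errors']}")
    return problems


print(health_problems(parse_smart_log(SAMPLE)))  # []
```

Fed the old drive's values (critical_warning 0x4, media_errors 7), the same function returns ['critical_warning=0x4', 'media_errors=7'].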

Self-test

# -s 1 starts the short self-test
sudo nvme device-self-test /dev/nvme1 -s 1
Short Device self-test started

sudo nvme self-test-log /dev/nvme1
Device Self Test Log for NVME device:nvme1
Current operation  : 0
Current Completion : 0%
Self Test Result[0]:
  Operation Result             : 0
  Self Test Code               : 1
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0
  Vendor Specific              : 0 0
Self Test Result[1]:
  Operation Result             : 0xf
Self Test Result[2]:
  Operation Result             : 0xf
Self Test Result[3]:
  Operation Result             : 0xf
Self Test Result[4]:
  Operation Result             : 0xf
Self Test Result[5]:
  Operation Result             : 0xf
Self Test Result[6]:
  Operation Result             : 0xf
Self Test Result[7]:
  Operation Result             : 0xf
Self Test Result[8]:
  Operation Result             : 0xf
Self Test Result[9]:
  Operation Result             : 0xf
Self Test Result[10]:
  Operation Result             : 0xf
Self Test Result[11]:
  Operation Result             : 0xf
Self Test Result[12]:
  Operation Result             : 0xf
Self Test Result[13]:
  Operation Result             : 0xf
Self Test Result[14]:
  Operation Result             : 0xf
Self Test Result[15]:
  Operation Result             : 0xf
Self Test Result[16]:
  Operation Result             : 0xf
Self Test Result[17]:
  Operation Result             : 0xf
Self Test Result[18]:
  Operation Result             : 0xf
Self Test Result[19]:
  Operation Result             : 0xf
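
  • Operation Result : 0 → completed without errors
  • Self Test Code : 1 → the test that ran was the short test
  • Valid Diagnostic Information : 0 → no failure location to report
  • Power on hours : 0 → powered on for less than an hour so far (the value is truncated to whole hours)
  • 0xf in Result[1] and later → empty slots with no past test history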
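Only slot 0 holds a real result here; every other slot reports 0xf. A Python sketch that keeps just the used slots, assuming the indented text layout shown above. The NVMe specification stores the newest result in slot 0, so the first element returned is the most recent test.

```python
# Excerpt of the `sudo nvme self-test-log /dev/nvme1` output above
SAMPLE = """\
Current operation  : 0
Current Completion : 0%
Self Test Result[0]:
  Operation Result             : 0
  Self Test Code               : 1
  Valid Diagnostic Information : 0
  Power on hours (POH)         : 0
Self Test Result[1]:
  Operation Result             : 0xf
Self Test Result[2]:
  Operation Result             : 0xf
"""

UNUSED_ENTRY = 0xF  # Operation Result 0xf: this slot holds no test result


def used_results(text: str) -> list[dict[str, int]]:
    """Collect the Self Test Result entries that actually contain a test, in log order."""
    entries: list[dict[str, int]] = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Self Test Result["):
            current = {}
            entries.append(current)
            continue
        name, sep, value = stripped.partition(":")
        # Lines before the first "Self Test Result[...]" header are skipped
        if current is not None and sep and value.split():
            token = value.split()[0]
            current[name.strip()] = int(token, 16) if token.startswith("0x") else int(token)
    return [e for e in entries if e.get("Operation Result") != UNUSED_ENTRY]


print(used_results(SAMPLE))
# [{'Operation Result': 0, 'Self Test Code': 1, 'Valid Diagnostic Information': 0, 'Power on hours (POH)': 0}]
```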

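Conclusion

First, the mount setup ended up as follows:

  • Device name: /dev/nvme1n1
  • Partition: /dev/nvme1n1p1
  • Mount point: /mnt/data4tb

As you would expect from a brand-new drive, the health check came out clean:

  • Critical warnings: none
  • Temperature: 31°C
  • Lifetime used: 0%
  • Media errors: 0
  • NVMe errors: 0
  • Self-test: completed normally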

The existing, failing SSD

Error message

The following warning started to appear.

SMART

sudo smartctl -x /dev/nvme0n1

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-110-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD WD_BLACK Gen4 SDCPNRZ-2T00-1106
Serial Number:                      23335U800038
Firmware Version:                   HPS2
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8224
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b4afc75de
Local Time is:                      Fri Jun 12 14:02:32 2026 JST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     88 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.25W    8.25W       -    0  0  0  0        0       0
 1 +     3.50W    3.50W       -    0  0  0  0        0       0
 2 +     2.60W    2.60W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000   10000
 4 -   0.0035W       -        -    4  4  4  4     3900   45700

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- NVM subsystem reliability has been degraded

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x04
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    1%
Data Units Read:                    125,700,274 [64.3 TB]
Data Units Written:                 78,595,056 [40.2 TB]
Host Read Commands:                 1,527,910,999
Host Write Commands:                1,158,031,701
Controller Busy Time:               8,273
Power Cycles:                       1,806
Power On Hours:                     1,657
Unsafe Shutdowns:                   1,137
Media and Data Integrity Errors:    7
Error Information Log Entries:      24
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x4002)

Checking with the NVMe-specific tool (nvme-cli)

sudo nvme smart-log /dev/nvme0n1

Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                        : 0x4
temperature                             : 33 °C (306 K)
available_spare                         : 100%
available_spare_threshold               : 5%
percentage_used                         : 1%
endurance group critical warning summary: 0x4
Data Units Read                         : 125700274 (64.36 TB)
Data Units Written                      : 78595090 (40.24 TB)
host_read_commands                      : 1527911003
host_write_commands                     : 1158032559
controller_busy_time                    : 8273
power_cycles                            : 1806
power_on_hours                          : 1657
unsafe_shutdowns                        : 1137
media_errors                            : 7
num_err_log_entries                     : 24
Warning Temperature Time                : 0
Critical Composite Temperature Time     : 0
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 0
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 0
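The TB figures in parentheses come from the NVMe definition of a Data Unit: 1,000 units of 512 bytes, i.e. 512,000 bytes. A quick Python check of the old SSD's read and write totals:

```python
NVME_DATA_UNIT_BYTES = 1000 * 512  # one Data Unit = 1,000 x 512-byte units


def data_units_to_tb(units: int) -> float:
    """Convert an NVMe "Data Units Read/Written" counter to decimal terabytes."""
    return units * NVME_DATA_UNIT_BYTES / 10**12


print(f"read:    {data_units_to_tb(125_700_274):.2f} TB")  # read:    64.36 TB
print(f"written: {data_units_to_tb(78_595_090):.2f} TB")   # written: 40.24 TB
```

The same conversion gives 49 × 512,000 = 25,088,000 bytes (25.09 MB) for the new SSD's Data Units Read.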

Checking the system logs

So far, there is no sign of an ongoing communication problem on the motherboard or PCIe link side.

sudo dmesg -T | grep -Ei 'nvme|pcie|aer|I/O error|timeout|reset|abort'

[Fri Jun 12 12:19:10 2026] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[Fri Jun 12 12:19:10 2026] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME AER PCIeCapability LTR DPC]
[Fri Jun 12 12:19:10 2026] pci 0000:00:01.0: [8086:460d] type 01 class 0x060400 PCIe Root Port
[Fri Jun 12 12:19:10 2026] pci 0000:00:0e.0: [8086:467f] type 00 class 0x010400 PCIe Root Complex Integrated Endpoint
[Fri Jun 12 12:19:10 2026] pci 0000:00:14.3: [8086:7af0] type 00 class 0x028000 PCIe Root Complex Integrated Endpoint
[Fri Jun 12 12:19:10 2026] pci 0000:00:1c.0: [8086:7abf] type 01 class 0x060400 PCIe Root Port
[Fri Jun 12 12:19:10 2026] pci 0000:01:00.0: [10de:2204] type 00 class 0x030000 PCIe Legacy Endpoint
[Fri Jun 12 12:19:10 2026] pci 0000:01:00.1: [10de:1aef] type 00 class 0x040300 PCIe Endpoint
[Fri Jun 12 12:19:10 2026] pci 0000:02:00.0: [10ec:8168] type 00 class 0x020000 PCIe Endpoint
[Fri Jun 12 12:19:11 2026] pcieport 0000:00:01.0: PME: Signaling with IRQ 121
[Fri Jun 12 12:19:11 2026] pcieport 0000:00:1c.0: PME: Signaling with IRQ 122
[Fri Jun 12 12:19:11 2026] pci 10000:e0:06.0: [8086:464d] type 01 class 0x060400 PCIe Root Port
[Fri Jun 12 12:19:11 2026] pci 10000:e0:1d.4: [8086:7ab4] type 01 class 0x060400 PCIe Root Port
[Fri Jun 12 12:19:11 2026] pci 10000:e1:00.0: [1987:5029] type 00 class 0x010802 PCIe Endpoint
[Fri Jun 12 12:19:11 2026] pci 10000:e2:00.0: [15b7:5011] type 00 class 0x010802 PCIe Endpoint
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:06.0: can't derive routing for PCI INT D
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:06.0: PCI INT D: no GSI
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:06.0: PME: Signaling with IRQ 151
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:06.0: AER: enabled with IRQ 151
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:1d.4: can't derive routing for PCI INT A
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:1d.4: PCI INT A: no GSI
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:1d.4: PME: Signaling with IRQ 152
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:1d.4: AER: enabled with IRQ 152
[Fri Jun 12 12:19:12 2026] nvme nvme0: pci function 10000:e2:00.0
[Fri Jun 12 12:19:12 2026] nvme nvme1: pci function 10000:e1:00.0
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
[Fri Jun 12 12:19:12 2026] nvme 10000:e1:00.0: PCI INT A: no GSI
[Fri Jun 12 12:19:12 2026] pcieport 10000:e0:1d.4: can't derive routing for PCI INT A
[Fri Jun 12 12:19:12 2026] nvme 10000:e2:00.0: PCI INT A: no GSI
[Fri Jun 12 12:19:12 2026] nvme nvme1: allocated 64 MiB host memory buffer.
[Fri Jun 12 12:19:12 2026] nvme nvme0: 18/0/0 default/read/poll queues
[Fri Jun 12 12:19:12 2026]  nvme0n1: p1 p2 p3
[Fri Jun 12 12:19:12 2026] nvme nvme1: 18/0/0 default/read/poll queues
[Fri Jun 12 12:19:14 2026] systemd[1]: Starting modprobe@nvme_fabrics.service - Load Kernel Module nvme_fabrics...
[Fri Jun 12 12:19:14 2026] Bluetooth: hci0: DSM reset method type: 0x00
[Fri Jun 12 12:19:16 2026] EXT4-fs (nvme0n1p2): mounted filesystem 5ea66b3a-4ed4-4cf2-a696-80fbb3cde0a0 r/w with ordered data mode. Quota mode: none.
[Fri Jun 12 12:19:17 2026] block nvme0n1: No UUID available providing old NGUID
[Fri Jun 12 12:25:31 2026]  nvme1n1:
[Fri Jun 12 12:25:36 2026]  nvme1n1: p1
[Fri Jun 12 12:25:49 2026] EXT4-fs (nvme1n1p1): mounted filesystem 7c391bf3-9dca-4591-82fd-9f5e5ae63661 r/w with ordered data mode. Quota mode: none.

Also, just in case, the kernel log from the previous boot (-b -1):

sudo journalctl -k -b -1 | grep -Ei 'nvme|pcie|aer|I/O error|timeout|reset|abort'
Jun 08 19:45:20 omen kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
Jun 08 19:45:20 omen kernel: acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME AER PCIeCapability LTR DPC]
Jun 08 19:45:20 omen kernel: pci 0000:00:01.0: [8086:460d] type 01 class 0x060400 PCIe Root Port
Jun 08 19:45:20 omen kernel: pci 0000:00:0e.0: [8086:467f] type 00 class 0x010400 PCIe Root Complex Integrated Endpoint
Jun 08 19:45:20 omen kernel: pci 0000:00:14.3: [8086:7af0] type 00 class 0x028000 PCIe Root Complex Integrated Endpoint
Jun 08 19:45:20 omen kernel: pci 0000:00:1c.0: [8086:7abf] type 01 class 0x060400 PCIe Root Port
Jun 08 19:45:20 omen kernel: pci 0000:01:00.0: [10de:2204] type 00 class 0x030000 PCIe Legacy Endpoint
Jun 08 19:45:20 omen kernel: pci 0000:01:00.1: [10de:1aef] type 00 class 0x040300 PCIe Endpoint
Jun 08 19:45:20 omen kernel: pci 0000:02:00.0: [10ec:8168] type 00 class 0x020000 PCIe Endpoint
Jun 08 19:45:20 omen kernel: pcieport 0000:00:01.0: PME: Signaling with IRQ 121
Jun 08 19:45:20 omen kernel: pcieport 0000:00:1c.0: PME: Signaling with IRQ 122
Jun 08 19:45:20 omen kernel: pci 10000:e0:1d.4: [8086:7ab4] type 01 class 0x060400 PCIe Root Port
Jun 08 19:45:20 omen kernel: pci 10000:e1:00.0: [15b7:5011] type 00 class 0x010802 PCIe Endpoint
Jun 08 19:45:20 omen kernel: pcieport 10000:e0:1d.4: can't derive routing for PCI INT A
Jun 08 19:45:20 omen kernel: pcieport 10000:e0:1d.4: PCI INT A: no GSI
Jun 08 19:45:20 omen kernel: pcieport 10000:e0:1d.4: PME: Signaling with IRQ 151
Jun 08 19:45:20 omen kernel: pcieport 10000:e0:1d.4: AER: enabled with IRQ 151
Jun 08 19:45:20 omen kernel: nvme nvme0: pci function 10000:e1:00.0
Jun 08 19:45:20 omen kernel: pcieport 10000:e0:1d.4: can't derive routing for PCI INT A
Jun 08 19:45:20 omen kernel: nvme 10000:e1:00.0: PCI INT A: no GSI
Jun 08 19:45:20 omen kernel: nvme nvme0: 18/0/0 default/read/poll queues
Jun 08 19:45:20 omen kernel:  nvme0n1: p1 p2 p3
Jun 08 19:45:20 omen systemd[1]: Starting modprobe@nvme_fabrics.service - Load Kernel Module nvme_fabrics...
Jun 08 19:45:20 omen kernel: Bluetooth: hci0: DSM reset method type: 0x00
Jun 08 19:45:22 omen kernel: EXT4-fs (nvme0n1p2): mounted filesystem 5ea66b3a-4ed4-4cf2-a696-80fbb3cde0a0 r/w with ordered data mode. Quota mode: none.
Jun 08 19:45:23 omen kernel: block nvme0n1: No UUID available providing old NGUID
Jun 08 19:45:26 omen kernel: usb 1-4.2: reset high-speed USB device number 5 using xhci_hcd
Jun 12 11:41:28 omen kernel: EXT4-fs (nvme0n1p2): unmounting filesystem 5ea66b3a-4ed4-4cf2-a696-80fbb3cde0a0.
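The grep pattern is deliberately broad, so most of what it prints in both logs is ordinary boot output: PCIe enumeration, AER being enabled, the Bluetooth DSM reset method line, a USB device reset. A Python sketch that applies the same case-insensitive pattern and then keeps only lines containing words that usually point to a real problem; that narrower word list is my own assumption, not something defined by the kernel.

```python
import re

# Same pattern as the grep -Ei commands above
BROAD = re.compile(r"nvme|pcie|aer|I/O error|timeout|reset|abort", re.IGNORECASE)
# Hypothetical narrower filter: wording that usually means an actual failure
SUSPICIOUS = re.compile(r"I/O error|timeout|abort|resetting controller", re.IGNORECASE)


def classify(lines: list[str]) -> tuple[list[str], list[str]]:
    """Return (lines matching the broad grep pattern, the subset that looks like a real error)."""
    matched = [line for line in lines if BROAD.search(line)]
    suspicious = [line for line in matched if SUSPICIOUS.search(line)]
    return matched, suspicious


# A few lines taken from the logs above
log_lines = [
    "nvme nvme0: 18/0/0 default/read/poll queues",
    "Bluetooth: hci0: DSM reset method type: 0x00",
    "usb 1-4.2: reset high-speed USB device number 5 using xhci_hcd",
]
matched, suspicious = classify(log_lines)
print(len(matched), suspicious)  # 3 []
```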

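Checking the PCIe connection

10000:e2:00.0 is the PCI address of the old SSD: the current boot's dmesg shows nvme nvme0: pci function 10000:e2:00.0.
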
sudo lspci -s 10000:e2:00.0 -vv |
        grep -E 'MSI|MSI-X|LnkCap|LnkSta|AER'
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
        Capabilities: [b0] MSI-X: Enable+ Count=65 Masked-
        Capabilities: [c0] Express (v2) Endpoint, MSI 00
                LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <8us
                LnkSta: Speed 16GT/s, Width x4
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
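LnkCap is what the device supports and LnkSta is what the link actually negotiated; if the two differ, the link has trained at a lower speed or a narrower width. Here both read 16GT/s x4, which is PCIe Gen4 x4. A Python sketch of that comparison, assuming the line format shown above (newer lspci versions may append notes such as (downgraded) to LnkSta, which this regex skips over):

```python
import re

# "Speed 16GT/s, Width x4" as printed in lspci's LnkCap and LnkSta lines
LINK = re.compile(r"Speed ([\d.]+)GT/s.*?Width x(\d+)")
PCIE_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}  # transfer rate (GT/s) -> PCIe generation


def link(line: str) -> tuple[float, int]:
    """Extract (speed in GT/s, lane count) from an LnkCap or LnkSta line."""
    match = LINK.search(line)
    if match is None:
        raise ValueError(f"no speed/width in: {line!r}")
    return float(match.group(1)), int(match.group(2))


cap = link("LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <8us")
sta = link("LnkSta: Speed 16GT/s, Width x4")
speed, width = sta
print(f"Gen{PCIE_GEN[speed]} x{width}, full capability: {cap == sta}")
# Gen4 x4, full capability: True
```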

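AER error status

In the status lines (DevSta, UESta, CESta), a + means that error bit is set, i.e. that error has been recorded. The UEMsk and CEMsk lines are masks: a + there only means that error type is masked, not that it occurred.
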
sudo lspci -s 10000:e2:00.0 -vv |
          grep -E 'DevSta|UESta|CESta|UEMsk|CEMsk'
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
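To read these lines mechanically, split each token into a flag name and its trailing + or -. A small Python sketch, assuming the format shown above. In this output DevSta has CorrErr+ and UnsupReq+ (a correctable error and an Unsupported Request were recorded at some point), CESta is all clear, and the only + in CEMsk is a mask bit.

```python
def parse_flags(line: str) -> dict[str, bool]:
    """Turn 'DevSta: CorrErr+ NonFatalErr- ...' into {'CorrErr': True, 'NonFatalErr': False, ...}."""
    _, _, body = line.partition(":")
    return {tok[:-1]: tok.endswith("+") for tok in body.split() if tok[-1] in "+-"}


def set_flags(line: str) -> list[str]:
    """Names of the flags shown with '+' (bit set), in the order lspci prints them."""
    return [name for name, on in parse_flags(line).items() if on]


devsta = "DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-"
cesta = "CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-"
cemsk = "CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+"
print(set_flags(devsta))  # ['CorrErr', 'UnsupReq']
print(set_flags(cesta))   # []
print(set_flags(cemsk))   # ['AdvNonFatalErr'] (a mask bit, not an error)
```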

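Summary of the AI-assisted diagnosis

| Item | Verdict |
|---|---|
| PCIe Gen4 x4 link | Normal |
| MSI-X interrupts | Normal |
| Fatal/Non-Fatal PCIe errors | None |
| Past correctable PCIe errors | Present |
| Past Unsupported Requests | Present |
| Kernel NVMe timeout/reset | None |
| SSD internal SMART | FAILED |
| SSD internal media errors | 7 |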

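There is some minor history on the PCIe side, but at this point nothing suggests that the SSD's SMART failure can be explained as purely a motherboard-side problem.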

Conclusion

Logs:

  • PCIe link: normal
  • MSI-X interrupts: normal
  • Gen4 x4 speed: normal
  • Kernel timeout/reset/I/O errors: none
  • SSD internal SMART: FAILED
  • Media/Data Integrity Errors: 7
  • Critical Warning: 0x04

Problems:

  • Critical Warning: 0x04
  • SMART overall-health: FAILED
  • NVM subsystem reliability has been degraded

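Details:

  • 0x04 means Reliability Degraded in the NVMe specification (bit 2 of the Critical Warning field).
  • In other words, the SSD itself has determined that its reliability has degraded because of significant media-related errors or an internal error.
  • Media and Data Integrity Errors: 7 means the SSD controller has detected an unrecovered data integrity error 7 times.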
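Critical Warning is a bit field, so 0x04 means that only bit 2 is set. A Python sketch that decodes it, covering bits 0 to 4 as described in the NVMe specification; newer revisions define further bits, which this sketch only reports by number.

```python
CRITICAL_WARNING_BITS = {
    0: "available spare capacity has fallen below the threshold",
    1: "temperature has crossed an over- or under-temperature threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return a description for every bit set in the 8-bit Critical Warning field."""
    messages = []
    for bit in range(8):
        if value & (1 << bit):
            messages.append(CRITICAL_WARNING_BITS.get(bit, f"bit {bit} (not decoded in this sketch)"))
    return messages


print(decode_critical_warning(0x04))  # ['NVM subsystem reliability has been degraded']
print(decode_critical_warning(0x00))  # []
```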

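In short, the drive has not worn out its write endurance (Percentage Used is only 1%). Despite being relatively new, it may be suffering an internal fault in the NAND, the controller, or the firmware.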

Summary

  • Replace the failing SSD too
  • Push every repository that hasn't been pushed yet