SMART Error

Post by kos »

Hello,

Recently my root server hung for no reason I could identify.
After a restart (in the end a hard reset) and an fsck run at boot, Munin now shows a change in the SMART data.
Specifically, Munin now reports the parameter smartctl_exit_status as 6 instead of the usual 0. A warning from 1 upward is configured by default, so what does this parameter mean? I read somewhere that everything is fine as long as it is 0, and that otherwise one should be careful.
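
For reference, the smartctl man page describes its exit status as a bitmask, so a non-zero value can be split into individual bits. Below is a minimal Python sketch of that decoding. It assumes Munin passes the raw smartctl exit code through unchanged, which may not hold for every plugin version; the bit descriptions are paraphrased from the man page, and `decode_exit_status` is a name made up for this illustration.

```python
# Bit meanings paraphrased from the EXIT STATUS section of the smartctl man page.
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or no IDENTIFY DEVICE structure was returned",
    2: "a SMART command failed, or a SMART data structure had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "status OK, but attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the (bit, description) pairs that are set in a smartctl exit status."""
    return [(bit, text) for bit, text in EXIT_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    # Assumption: 6 is the raw bitmask as reported by Munin.
    for bit, text in decode_exit_status(6):
        print(f"bit {bit}: {text}")
```

Read as a raw mask, 6 has bits 1 and 2 set. An error log with entries would on its own correspond to bit 6 (value 64), so it is worth checking how the plugin in use actually derives its number.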

I also ran a long test with smartctl and found a few errors that I can't really make sense of. Unfortunately I don't know much about SMART.

Here is some SMART output:

# smartctl -l error /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
Warning: ATA error count 14 inconsistent with error log pointer 5

ATA Error Count: 14 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 14 occurred at disk power-on lifetime: 27307 hours (1137 days + 19 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  01 51 08 48 8a 55 e0  Error: AMNF 8 sectors at LBA = 0x00558a48 = 5605960

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 48 8a 55 e0 08   3d+21:17:24.656  READ DMA
  ca 00 08 5f 34 30 ee 08   3d+21:17:23.200  WRITE DMA
  ca 00 10 4f 34 30 ee 08   3d+21:17:23.200  WRITE DMA
  ca 00 08 47 34 30 ee 08   3d+21:17:21.216  WRITE DMA
  ca 00 10 37 34 30 ee 08   3d+21:17:21.216  WRITE DMA

Error 13 occurred at disk power-on lifetime: 27307 hours (1137 days + 19 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f 98 30 57 e0  Error: UNC 15 sectors at LBA = 0x00573098 = 5714072

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 10 98 30 57 e0 08   3d+21:17:17.568  READ DMA
  ca 00 08 2f 34 30 ee 08   3d+21:17:15.536  WRITE DMA
  ca 00 10 1f 34 30 ee 08   3d+21:17:15.536  WRITE DMA
  ca 00 08 17 34 30 ee 08   3d+21:17:11.712  WRITE DMA
  ca 00 10 07 34 30 ee 08   3d+21:17:11.712  WRITE DMA

Error 12 occurred at disk power-on lifetime: 27306 hours (1137 days + 18 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 18 8a 4a e0  Error: UNC 8 sectors at LBA = 0x004a8a18 = 4885016

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 18 8a 4a e0 08   3d+20:31:07.120  READ DMA
  c8 00 08 b8 64 5a e0 08   3d+20:31:07.120  READ DMA
  ca 00 00 d0 eb 08 e0 08   3d+20:31:07.040  WRITE DMA
  ca 00 00 d0 ea 08 e0 08   3d+20:31:07.040  WRITE DMA
  ca 00 00 d0 e9 08 e0 08   3d+20:31:07.040  WRITE DMA

Error 11 occurred at disk power-on lifetime: 27306 hours (1137 days + 18 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  01 51 29 f0 81 55 e0  Error: AMNF 41 sectors at LBA = 0x005581f0 = 5603824

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 f0 81 55 e0 08   3d+20:16:47.184  READ DMA
  ca 00 08 50 c3 04 e0 08   3d+20:16:47.152  WRITE DMA
  ca 00 08 a8 74 06 e0 08   3d+20:16:47.152  WRITE DMA
  ca 00 48 38 be 05 e0 08   3d+20:16:47.152  WRITE DMA
  ca 00 00 38 bd 05 e0 08   3d+20:16:47.152  WRITE DMA

Error 10 occurred at disk power-on lifetime: 27306 hours (1137 days + 18 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  01 51 06 08 fc 46 e0  Error: AMNF 6 sectors at LBA = 0x0046fc08 = 4652040

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 08 fc 46 e0 08   3d+20:15:14.256  READ DMA
  c8 00 18 e8 90 ef e4 08   3d+20:15:14.144  READ DMA
  ca 00 08 ef 73 30 ee 08   3d+20:15:13.744  WRITE DMA
  ca 00 10 df 73 30 ee 08   3d+20:15:13.744  WRITE DMA
  ca 00 08 d7 73 30 ee 08   3d+20:15:09.904  WRITE DMA

Is this something serious? Should I install a new disk as quickly as possible, or can these errors be ignored?
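
To make the register dumps above easier to read: the "Error: ... at LBA = ..." summary that smartctl prints can be reproduced from the seven register bytes. The following Python sketch assumes 28-bit LBA addressing (low nibble of DH, then CH, CL, SN) and covers only a few error-register flags; `decode_registers` and `ER_FLAGS` are names invented for this example, not part of smartmontools.

```python
# Subset of ATA error register flags; only AMNF and UNC occur in the log above.
ER_FLAGS = {
    0x01: "AMNF",  # address mark not found
    0x04: "ABRT",  # command aborted
    0x10: "IDNF",  # sector ID not found
    0x40: "UNC",   # uncorrectable data error
}


def decode_registers(line):
    """Decode an 'ER ST SC SN CL CH DH' line of hex bytes from the error log."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    # Assumption: LBA28, i.e. bits 27..24 are the low nibble of Device/Head.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    flags = [name for mask, name in ER_FLAGS.items() if er & mask]
    return {"flags": flags, "sectors": sc, "lba": lba}


if __name__ == "__main__":
    # Register line of "Error 14" from the output above.
    print(decode_registers("01 51 08 48 8a 55 e0"))
```

For the Error 14 line this gives AMNF, 8 sectors and LBA 5605960, matching what smartctl printed.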

Post by franki »

What does the following report?

smartctl -A /dev/hda

I would replace the disk in any case to be on the safe side. 27,000 operating hours also seems like quite a lot to me; I just checked, and my longest-running disk is at 18,100.

Regards, Franki.

Post by kos »

# smartctl -A /dev/hda
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   202   202   063    Pre-fail  Always       -       5497
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   250   250   063    Pre-fail  Always       -       34
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   247   232   187    Pre-fail  Always       -       36439
  9 Power_On_Minutes        0x0032   169   169   000    Old_age   Always       -       791h+56m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       29
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       6227
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   237   000    Old_age   Always       -       294
202 TA_Increase_Count       0x000a   253   236   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   195   191   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

The disk has been in use since Nov. 2003.
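
As a quick cross-check of the power-on lifetime from the error log (27307 hours), here is the arithmetic as a small Python sketch. The helper name `split_hours` is made up for this example, and the year figure assumes an average year of 365.25 days.

```python
def split_hours(hours):
    """Split a power-on hour count into whole days and remaining hours."""
    return divmod(hours, 24)


if __name__ == "__main__":
    days, rest = split_hours(27307)
    years = 27307 / (24 * 365.25)  # assumption: average year length
    print(f"{days} days + {rest} hours, roughly {years:.1f} years")
```

This reproduces the "1137 days + 19 hours" that smartctl prints, a little over three years of continuous operation.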

Post by franki »

Still, that is 34 reallocated sectors. I would keep an eye on that value; if it keeps rising, replace the disk right away.
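
One way to watch that value is to parse the text output of `smartctl -A` and compare the raw value of attribute 5 with a previously noted one. The sketch below is only an illustration under assumptions: it relies on the column layout shown in this thread (ten whitespace-separated columns, raw value last), and the function names are hypothetical. It works on a pasted excerpt rather than calling smartctl itself.

```python
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   250   250   063    Pre-fail  Always       -       34
  9 Power_On_Minutes        0x0032   169   169   000    Old_age   Always       -       791h+56m
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
"""


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} from `smartctl -A` text output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], " ".join(fields[9:]))
    return attrs


def reallocated_increased(previous, text):
    """True if the raw Reallocated_Sector_Ct (ID 5) is higher than `previous`."""
    _name, raw = parse_attributes(text)[5]
    return int(raw) > previous


if __name__ == "__main__":
    print(parse_attributes(SAMPLE)[5])
    print(reallocated_increased(34, SAMPLE))
```

In practice the last known value (34 here) would be stored somewhere and the check run periodically, for example from cron.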

I replace the disks in my servers as a precaution after 4 years of runtime. In return, I don't use expensive SCSI disks or special server disks.

Regards, Franki.