EqualLogic, when a disk fails.

An email arrives with the following warning message: Warning health conditions currently exist. Correct these conditions before they affect array operation. Non-fatal RAIDset failure. While the RAID set is degraded, performance and availability might be decreased. There are 1 outstanding health conditions. Correct these conditions before they affect array operation.

It can't be a good sign, but at least it says the condition is non-fatal, so we can go back to sleep in peace.

In the morning we check the system logs (newest entries first) and confirm that the error is a damaged disk. The system tried to repair it and could not, so one of the spare disks was brought in to rebuild the array.

Severity  Date      Time         Member  Message                                                                                                                                                                                                                                                                                                                                                                                                                            
--------  --------  -----------  ------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
INFO     9/01/13   09:41:44 PM  EQL3    Reconstruction of RAID LUN 0 completed in 3815 seconds.                                                                                                                                                                                                                                                                                                                                                                            
WARNING  9/01/13   09:41:44 PM  EQL3    Warning health conditions currently exist.   Correct these conditions before they affect array operation.   More spare drives are expected.   There are 1 outstanding health conditions. Correct these conditions before they affect array operation.                                                                                                                                                                              
INFO     9/01/13   09:41:44 PM  EQL3    RAID set has recovered from a failure.                                                                                                                                                                                                                                                                                                                                                                                             
INFO     9/01/13   08:42:26 PM  EQL3    Attempt to remove drive 12 from RAID set was not successful.                                                                                                                                                                                                                                                                                                                                                                       
INFO     9/01/13   08:38:08 PM  EQL3    Reconstruction of RAID LUN 0 initiated.                                                                                                                                                                                                                                                                                                                                                                                            
WARNING  9/01/13   08:38:08 PM  EQL3    Warning health conditions currently exist.   Correct these conditions before they affect array operation.   Non-fatal RAIDset failure. While the RAID set is degraded, performance and availability might be decreased.   More spare drives are expected.   There are 2 outstanding health conditions. Correct these conditions before they affect array operation.                                                                
WARNING  9/01/13   08:38:08 PM  EQL3    Failure: HDD Drive: 12, Model: XXXXXXXXXXX     , Serial Number: XXXXXXXX                                                                                                                                                                                                                                                                                                                                                           
WARNING  9/01/13   08:37:46 PM  EQL3    Preemptive removal of Enclosure/Drive 0/12 has now been approved; proceeding with removal.                                                                                                                                                                                                                                                                                                                                         
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660619 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660616 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660614 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660608 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660603 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660601 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660599 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660597 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660595 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660593 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660591 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:26 PM  EQL3    Unable to repair bad disk sector 49658995 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:26 PM  EQL3    Unable to repair bad disk sector 49658988 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:26 PM  EQL3    Unable to repair bad disk sector 49658986 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
ERROR    9/01/13   08:37:26 PM  EQL3    Unable to repair bad disk sector 49658982 on disk drive 12 in RAID LUN 0.                                                                                                                                                                                                                                                                                                                                                          
INFO     9/01/13   08:37:26 PM  EQL3    Attempt to remove drive 12 from RAID set was not successful.                                                                                                                                                                                                                                                                                                                                                                       
WARNING  9/01/13   08:37:17 PM  EQL3    Warning health conditions currently exist.   Correct these conditions before they affect array operation.   Non-fatal RAIDset failure. While the RAID set is degraded, performance and availability might be decreased.   There are 1 outstanding health conditions. Correct these conditions before they affect array operation.                                                                                                  
ERROR    9/01/13   08:37:17 PM  EQL3    Disk drive 12 failed in RAID LUN 0.
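When the log gets long, a short script can help summarize it. The following is only a sketch: it assumes the event log was saved as plain text with the columns shown above, and that dates are day/month/year (which fits the diag run dated Thu Jan 10 2013). The names `parse_events` and `summarize` are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Columns as they appear in the event log: severity, date, time, member, message.
LINE_RE = re.compile(
    r"^(?P<severity>INFO|WARNING|ERROR)\s+"
    r"(?P<date>\d{1,2}/\d{2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2} [AP]M)\s+"
    r"(?P<member>\S+)\s+"
    r"(?P<message>.*?)\s*$"
)
SECTOR_RE = re.compile(r"Unable to repair bad disk sector (\d+) on disk drive (\d+)")


def parse_events(text):
    """Return one dict per recognised log line; header and rule lines are skipped."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        event = m.groupdict()
        # Assumption: dates are written day/month/year.
        event["when"] = datetime.strptime(
            f"{event['date']} {event['time']}", "%d/%m/%y %I:%M:%S %p"
        )
        events.append(event)
    return events


def summarize(events):
    """Count events per severity and collect unrepaired sectors per drive."""
    by_severity = Counter(e["severity"] for e in events)
    bad_sectors = {}
    for e in events:
        m = SECTOR_RE.search(e["message"])
        if m:
            bad_sectors.setdefault(int(m.group(2)), []).append(int(m.group(1)))
    return by_severity, bad_sectors


# A few lines copied from the log above.
sample = """\
INFO     9/01/13   09:41:44 PM  EQL3    Reconstruction of RAID LUN 0 completed in 3815 seconds.
INFO     9/01/13   08:38:08 PM  EQL3    Reconstruction of RAID LUN 0 initiated.
ERROR    9/01/13   08:37:46 PM  EQL3    Unable to repair bad disk sector 49660619 on disk drive 12 in RAID LUN 0.
ERROR    9/01/13   08:37:17 PM  EQL3    Disk drive 12 failed in RAID LUN 0.
"""

events = parse_events(sample)
by_severity, bad_sectors = summarize(events)
print(dict(by_severity))
print(bad_sectors)

# Seconds between "initiated" and "completed", taken from the timestamps.
rebuild_seconds = (events[0]["when"] - events[1]["when"]).total_seconds()
print(rebuild_seconds)
```

The difference between the two reconstruction timestamps comes out at 3816 seconds, which agrees with the 3815 seconds the array itself reports.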

If this equipment is under warranty, the easiest and recommended route is to call Dell technical support and have the failed disk replaced. Running the diag command is mandatory: it sends the report to the support engineers so they can review the events that occurred.

Grupo3> diag
 
The diag command will gather configuration data from this array
for support and troubleshooting purposes.  No user information will be
included in this data.
 
Results will be sent to "gabrielxx@xxxxxxxxxxxx.com.mx" through e-mail.
If this is unsuccessful, other options for retrieving the results
will be presented at the end of the procedure.
 
Finally, please remember to include your Dell Technical Support case or incident number
in the subject line of any e-mail that you send to Dell Support.  This will help
ensure that the message is routed correctly.
 
Do you wish to proceed (y/n) [y]: y
 
Starting data collection on Thu Jan 10 14:59:06 CST 2013.
 
Section 1 of 15: .
Finished in 0 seconds
Section 2 of 15: .........0.........0.......
Finished in 12 seconds
Section 3 of 15: ....
Finished in 28 seconds
Section 4 of 15: .........0......
Finished in 12 seconds
Section 5 of 15: .........0.......
Finished in 5 seconds
Section 6 of 15: .........0.........0.
Finished in 7 seconds
Section 7 of 15: .
Finished in 2 seconds
Section 8 of 15: ....
Finished in 1 seconds
Section 9 of 15: .........0.........0.........0.........0.........0.........0
Finished in 4 seconds
Section 10 of 15: ...
Finished in 1 seconds
Section 11 of 15: .........0.........0.........0..
Finished in 34 seconds
Section 12 of 15: ..
Finished in 2 seconds
Section 13 of 15: .
Finished in 4 seconds
Section 14 of 15: .........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0.........0....
Finished in 57 seconds
Section 15 of 15: .
Finished in 1 seconds
 
Sending e-mail 1 of 6.
Sending e-mail 2 of 6.
Sending e-mail 3 of 6.
Sending e-mail 4 of 6.
Sending e-mail 5 of 6.
Sending e-mail 6 of 6.
 
You have the option of retrieving the diagnostic data using FTP or SCP.
To use FTP, use the 'mget' command to retrieve all files matching
the specification "Seg_*.dgo".  You must use the "grpadmin" account
and password, and connect to one of the IP addresses from the list below.
 
To use SCP, enter the command: 'scp -r grpadmin@x.x.x.x:. destdir'
where "x.x.x.x" is one of the IP addresses from the list below.
Then, in the destination location, look for files with the name "Seg_*.dgo".
You can delete any other files retrieved by scp.
 
Here are the IP addresses you can use to retrieve files from this member:
   172.XXX.XXX.XXX
   172.XXX.XXX.XXX
   172.XXX.XXX.XXX
   172.XXX.XXX.XXX
   172.XXX.XXX.XXX
 
You also have the option to capture the output by using the "text capture"
feature of your Telnet or terminal emulator program.
 
Do you wish to do this (y/n) [n]: 
Grupo3> logout
Grupo3> Connection closed by foreign host.
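If the e-mails do not get through, the SCP route described in the diag output can be scripted. This is a rough sketch, not an official procedure: the helper names, the `192.0.2.10` placeholder address, the `diag_out` directory and the file names in the example are all assumptions. Only the `scp -r grpadmin@x.x.x.x:. destdir` form and the `Seg_*.dgo` pattern come from the output above.

```python
import fnmatch


def build_scp_command(member_ip, destdir, account="grpadmin"):
    """Build the scp invocation suggested by the diag output, as an argument list."""
    return ["scp", "-r", f"{account}@{member_ip}:.", destdir]


def split_diag_files(filenames):
    """Separate the Seg_*.dgo segments from everything else that scp copied."""
    segments = sorted(f for f in filenames if fnmatch.fnmatchcase(f, "Seg_*.dgo"))
    others = sorted(f for f in filenames if f not in segments)
    return segments, others


# Placeholder address and directory; use one of the IPs listed by diag.
cmd = build_scp_command("192.0.2.10", "diag_out")
print(" ".join(cmd))

# Hypothetical directory listing after the copy.
segments, others = split_diag_files(["Seg_2.dgo", "notes.txt", "Seg_1.dgo"])
print(segments)  # files to send to support
print(others)    # files that can be deleted
```

To actually run the copy, the list can be passed to `subprocess.run(cmd, check=True)`, and `os.listdir("diag_out")` can supply the file names.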

In less than 24 hours the support technicians are on site, ready to replace the damaged disk. In the system logs we see them remove it and insert a new one, which becomes a spare disk in place of the previous spare that joined the array as an active member.

Severity  Date      Time         Member  Message                                                                                                                                                                                                                                                                                                                                                                                                                            
--------  --------  -----------  ------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
INFO     11/01/13  10:07:35 AM  EQL3    Expected number of spare drives now present.                                                                                                                                                                                                                                                                                                                                                                                       
INFO     11/01/13  10:07:35 AM  EQL3    Creating a RAID label for uninitialized drive 12.                                                                                                                                                                                                                                                                                                                                                                                  
INFO     11/01/13  10:07:35 AM  EQL3    Disk 12 is online.                                                                                                                                                                                                                                                                                                                                                                                                                 
INFO     11/01/13  10:07:35 AM  EQL3    Found and verified new drive: enclosure 0, disk 12, Model ST3600057SS     , SN 6SL4NJ2J.                                                                                                                                                                                                                                                                                                                                           
INFO     11/01/13  10:07:13 AM  EQL3    Disk 12 has been inserted.                                                                                                                                                                                                                                                                                                                                                                                                         
WARNING  11/01/13  10:03:33 AM  EQL3    Disk 12 has been removed.
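To see how long the slot stayed empty during the swap, the timestamps can be subtracted directly. A small sketch, again assuming day/month/year dates; `elapsed_seconds` is a name invented for this example.

```python
from datetime import datetime

FMT = "%d/%m/%y %I:%M:%S %p"  # assumption: day/month/year, 12-hour clock


def elapsed_seconds(start, end):
    """Seconds between two log timestamps written as 'date time AM/PM'."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Timestamps taken from the replacement log above.
removed = "11/01/13 10:03:33 AM"    # Disk 12 has been removed.
inserted = "11/01/13 10:07:13 AM"   # Disk 12 has been inserted.
spares_ok = "11/01/13 10:07:35 AM"  # Expected number of spare drives now present.

slot_empty = elapsed_seconds(removed, inserted)
until_spare = elapsed_seconds(inserted, spares_ok)
print(slot_empty, until_spare)
```

By these timestamps the slot was empty for 220 seconds, and the new drive was accepted as a spare 22 seconds after insertion.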

As long as the disks don't all agree among themselves to fail together, the storage system will be fine.

One Comment

  1. xade, January 31, 2013

    It is very hard to repair a disk RAID, especially when the failure involves physical damage to one or more disks…

    In that case all you can do is try to recover the data through a company that specializes in that work. Onretrieval is a lab that specializes in failed RAID arrays.

    Regards.
