
bad primary superblock - bad magic number !!!

bad primary superblock - bad magic number !!!

stress_buster
My HP ProLiant DL185 server hangs or crashes, and sometimes does not boot correctly.

[root@localhost dev]# xfs_repair -n /dev/cciss/c0d0p1
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
.....................................................................................Sorry, could not find valid secondary superblock
Exiting now.

[root@localhost dev]# xfs_repair -n /dev/cciss/c0d2
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Input/output error


[root@localhost dev]# xfs_db /dev/cciss/c0d0p1
xfs_db: /dev/cciss/c0d0p1 is not a valid XFS filesystem (unexpected SB magic number 0x00000000)
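For context on the xfs_db message above: an XFS primary superblock begins with the four ASCII bytes "XFSB" (0x58465342) at offset 0 of the filesystem device, so a reported magic number of 0x00000000 means the first bytes read back as zeros. The following is a minimal Python sketch of that check, not what xfs_db itself does internally. The function names read_sb_magic and looks_like_xfs are illustrative, and the demo runs against a scratch file of zeros rather than a real device.

```python
import os
import struct
import tempfile

XFS_SB_MAGIC = 0x58465342  # ASCII "XFSB", stored big-endian at offset 0


def read_sb_magic(path):
    """Return the first 4 bytes of `path` as a big-endian integer (read-only)."""
    with open(path, "rb") as dev:
        raw = dev.read(4)
    if len(raw) < 4:
        raise IOError("short read at offset 0 of %s" % path)
    return struct.unpack(">I", raw)[0]


def looks_like_xfs(path):
    """True if the first 4 bytes match the XFS superblock magic."""
    return read_sb_magic(path) == XFS_SB_MAGIC


if __name__ == "__main__":
    # Demo on a zero-filled scratch file (hypothetical stand-in for a device).
    with tempfile.NamedTemporaryFile(delete=False) as img:
        img.write(b"\x00" * 512)
    try:
        print("magic = 0x%08x" % read_sb_magic(img.name))
        print("looks like XFS:", looks_like_xfs(img.name))
    finally:
        os.unlink(img.name)
```

Pointing read_sb_magic at a real block device would need root privileges; it only reads, so it does not modify the device.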

The next time, the server did not even boot properly.

I managed to capture the messages and traces dumped to the console. See below.


end_request: I/O error, dev cciss/c0d2, sector 0
end_request: I/O error, dev cciss/c0d2, sector 0
end_request: I/O error, dev cciss/c0d2, sector 1
cciss: cmd f6c00000 has CHECK CONDITION sense key = 0x4
end_request: I/O error, dev cciss/c0d3, sector 0
cciss: cmd f6c00000 has CHECK CONDITION sense key = 0x4
end_request: I/O error, dev cciss/c0d3, sector 0

Backtrace from SysRq-w:

SysRq : Show Blocked State
f7ad1e40 00203082 f7853b90 e54af7f0 e54af948 cba30e00 00000001 00000020
e5ad2250 00000000 000000ff e5ad2250 00000000 00000000 00000000 7fffffff
e55afe00 e55afd44 e55afe04 c05ab1c5 256e2000 00000000 e56e2000 00000000
Call Trace:
[<c05ab1c5>] schedule_timeout+0x13/0x86
[<c05ab095>] wait_for_common+0xb9/0x103
[<c021a4b6>] default_wake_function+0x0/0x8
[<c0409473>] cciss_ioctl+0x6fb/0xd1e
[<c0207852>] read_tsc+0x6/0x22
[<c02335a6>] getnstimeofday+0x4a/0xca
[<c023618a>] tick_dev_program_event+0x1e/0x8c
[<c026c316>] dput+0x31/0xf7
[<c026570c>] __link_path_walk+0x9fd/0xb2b
[<c038442f>] blkdev_driver_ioctl+0x4b/0x5b
[<c054420b>] igmp_rcv+0x38f/0x496
[<c0384ad6>] blkdev_ioctl+0x697/0x6e5
[<c054420b>] igmp_rcv+0x38f/0x496
[<c054420b>] igmp_rcv+0x38f/0x496
[<c027e02d>] do_open+0x1d9/0x258
[<c027e21a>] blkdev_open+0x0/0x4d
[<c027e23f>] blkdev_open+0x25/0x4d
[<c025c3a5>] __dentry_open+0x13b/0x212
[<c025c498>] nameidata_to_filp+0x1c/0x2c
[<c02667c3>] do_filp_open+0x350/0x64d
[<c023786c>] do_futex+0x8a/0x6ee
[<c024feaa>] handle_mm_fault+0x4e0/0x4ea
[<c054420b>] igmp_rcv+0x38f/0x496
[<c027d871>] block_ioctl+0x13/0x16
[<c027d85e>] block_ioctl+0x0/0x16
[<c026744c>] vfs_ioctl+0x1c/0x5d
[<c02676c6>] do_vfs_ioctl+0x239/0x247
[<c025c203>] do_sys_open+0xae/0xb6
[<c0267715>] sys_ioctl+0x41/0x58
[<c0203759>] sysenter_do_call+0x12/0x25
[<c054420b>] igmp_rcv+0x38f/0x496

Please help.

Thanks in advance,
David

Re: bad primary superblock - bad magic number !!!

Eric Sandeen-3
stress_buster wrote:

> my hp proliant DL185 server hangs/crashes and sometimes do not boot
> correctly...
>
> [root@localhost dev]# xfs_repair -n /dev/cciss/c0d0p1
> Phase 1 - find and verify superblock...
> bad primary superblock - bad magic number !!!
>
> attempting to find secondary superblock...
> .....................................................................................Sorry,
> could not find valid secondary superblock
> Exiting now.
>
> [root@localhost dev]# xfs_repair -n /dev/cciss/c0d2
> Phase 1 - find and verify superblock...
> superblock read failed, offset 0, size 524288, ag 0, rval -1
>
> fatal error -- Input/output error
>
>
> [root@localhost dev]# xfs_db /dev/cciss/c0d0p1
> xfs_db: /dev/cciss/c0d0p1 is not a valid XFS filesystem (unexpected SB magic
> number 0x00000000)
>
> The next time, server didnt even boot up alright
>
> i've managed to capture the msgs & traces dumped to console. See below
>
>
> end_request: I/O error, dev cciss/c0d2, sector 0
> end_request: I/O error, dev cciss/c0d2, sector 0
> end_request: I/O error, dev cciss/c0d2, sector 1
> cciss: cmd f6c00000 has CHECK CONDITION sense key = 0x4
> end_request: I/O error, dev cciss/c0d3, sector 0
> cciss: cmd f6c00000 has CHECK CONDITION sense key = 0x4
> end_request: I/O error, dev cciss/c0d3, sector 0

You seem to have serious storage problems that are not XFS related.

You'll need to get that resolved.
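To make the point about the storage layer concrete, here is a small Python sketch that tallies the kernel messages quoted above by device and decodes the SCSI sense key. Sense key 0x4 is HARDWARE ERROR in the SCSI standard; the table below lists only a few common keys, and the function name summarize and the SAMPLE text are illustrative assumptions built from the log lines in this thread.

```python
import re
from collections import Counter

# Partial table of standard SCSI sense keys (not exhaustive).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

IO_ERR = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector (\d+)")
SENSE = re.compile(r"CHECK CONDITION sense key = (0x[0-9a-fA-F]+)")


def summarize(log_text):
    """Count I/O errors per device and sense keys by name."""
    errors = Counter()
    senses = Counter()
    for line in log_text.splitlines():
        match = IO_ERR.search(line)
        if match:
            errors[match.group(1)] += 1
            continue
        match = SENSE.search(line)
        if match:
            key = int(match.group(1), 16)
            senses[SENSE_KEYS.get(key, "UNKNOWN (0x%x)" % key)] += 1
    return errors, senses


SAMPLE = """\
end_request: I/O error, dev cciss/c0d2, sector 0
end_request: I/O error, dev cciss/c0d2, sector 0
end_request: I/O error, dev cciss/c0d2, sector 1
cciss: cmd f6c00000 has CHECK CONDITION sense key = 0x4
end_request: I/O error, dev cciss/c0d3, sector 0
cciss: cmd f6c00000 has CHECK CONDITION sense key = 0x4
end_request: I/O error, dev cciss/c0d3, sector 0
"""

if __name__ == "__main__":
    errors, senses = summarize(SAMPLE)
    for dev, count in sorted(errors.items()):
        print("%s: %d I/O errors" % (dev, count))
    for name, count in sorted(senses.items()):
        print("sense key %s: %d" % (name, count))
```

The errors come from the block layer and the cciss driver, below the filesystem, which is consistent with the problem not being in XFS.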

-Eric

_______________________________________________
xfs mailing list
[hidden email]
http://oss.sgi.com/mailman/listinfo/xfs

Re: bad primary superblock - bad magic number !!!

stress_buster
Many thanks.

I agree. Each time I destroy and re-create the RAID, everything shows up as GOOD, only for it to break again.
So I was wondering whether those traces point to anything. My prime suspect is the hard drives, but those XFS messages confused me.
Apologies for posting before carrying out more tests.

Thanks,
leo


In reply to Eric Sandeen's message of Mon, May 10, 2010 4:42:13 PM.

Re: bad primary superblock - bad magic number !!!

Emmanuel Florac
On Mon, 10 May 2010 11:11:45 -0700 (PDT), you wrote:

> I agree. I destroy and re-create raid and everything would show up
> GOOD, only for it to break again. So was wondering whether those
> traces would point to anything.... my prime suspect is hard drives,
> but those xfs msgs confused me.

Check the hard drives individually with the manufacturer's utility (SeaTools,
etc.). At least one of them must be seriously ill.

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    | <[hidden email]>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


Re: bad primary superblock - bad magic number !!!

stress_buster
I haven't had much success testing the hard drives; I tried MHDD and SeaTools with no luck yet.

Meanwhile I re-created the RAID, and everything shows up OK for now.

Previously the messages shown were:

end_request: I/O error, dev cciss/c0d2, sector 0
end_request: I/O error, dev cciss/c0d2, sector 0
end_request: I/O error, dev cciss/c0d2, sector 1

That seems to indicate that the problem is with the disk or the array: it is unable to read the beginning of the device.
So if I run dd if=/dev/random of=/dev/cciss/c0d2, that should fail and thereby confirm that the drive or array has issues. Does that make sense?

Thanks




In reply to Emmanuel Florac's message of Mon, May 10, 2010 9:22:11 PM.

Re: bad primary superblock - bad magic number !!!

Emmanuel Florac
On Wed, 12 May 2010 02:19:16 -0700 (PDT),
Leo Davis <[hidden email]> wrote:

> havent had much success with testing the hard drives, tried mhdd &
> seatools with no luck yet.

What do you mean? Do the tools report any problem with the drives?

> So is I do a - dd if=/dev/random of=dev/cciss/c0d2 , that should fail
> and therby confirm that the drive or array has issues...do i make any
> sense here?

Uh, you should do it the other way around, reading from the device instead
of writing to it, to avoid destroying the filesystem:

dd if=/dev/cciss/c0d2 of=/dev/null bs=131072

If no error occurs it should be OK.
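The same read-only check can be sketched in Python. This is an illustrative equivalent of the dd command above, not a replacement for it: it reads the source sequentially in 131072-byte blocks, never writes, and counts full and partial records in the same "N+M" form that dd prints. The function name read_scan is hypothetical, and the demo uses a scratch file rather than a real device.

```python
import os
import tempfile


def read_scan(path, block_size=131072):
    """Read `path` start to end; return (full, partial, error).

    `error` is None if every read succeeded, otherwise the OSError
    raised by the first failing read.
    """
    full = partial = 0
    error = None
    with open(path, "rb", buffering=0) as src:
        while True:
            try:
                chunk = src.read(block_size)
            except OSError as exc:
                error = exc
                break
            if not chunk:
                break  # end of device or file
            if len(chunk) == block_size:
                full += 1
            else:
                partial += 1
    return full, partial, error


if __name__ == "__main__":
    # Scratch file standing in for a device: 3 full blocks plus 5 bytes.
    with tempfile.NamedTemporaryFile(delete=False) as img:
        img.write(b"\0" * (3 * 131072 + 5))
    try:
        full, partial, error = read_scan(img.name)
        print("%d+%d records in" % (full, partial))
        print("error:", error)
    finally:
        os.unlink(img.name)
```

A clean pass only shows that every block was readable at the time of the scan; an intermittent fault may still not appear.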


Re: bad primary superblock - bad magic number !!!

stress_buster
> So is I do a - dd if=/dev/random of=dev/cciss/c0d2 , that should fail
> and therby confirm that the drive or array has issues...do i make any
> sense here?

> Uh, you should try the other way around to avoid breaking the
> filesystem :
>
> dd if=/dev/cciss/c0d2 of=/dev/null bs=131072
>
> If no error occurs it should be OK.

# dd if=/dev/cciss/c0d2 of=/dev/null bs=131072
796+1 records in
796+1 records out
#

It doesn't show any errors here.

> havent had much success with testing the hard drives, tried mhdd &
> seatools with no luck yet.

> What do you mean? Does the tools report any problem with the drives?

MHDD doesn't detect the drives, probably because of an issue with the chipset. I'm still looking for tools.

thanks




In reply to Emmanuel Florac's message of Wed, May 12, 2010 1:29:02 PM.

Re: bad primary superblock - bad magic number !!!

stress_buster
In reply to this post by Emmanuel Florac

> Uh, you should try the other way around to avoid breaking the
> filesystem :
>
> dd if=/dev/cciss/c0d2 of=/dev/null bs=131072
>
> If no error occurs it should be OK.

I did that on all 4 LUNs:

# dd if=/dev/cciss/c0d2 of=/dev/null bs=131072
796+1 records in
796+1 records out

# dd if=/dev/cciss/c0d0 of=/dev/null bs=131072
796+1 records in
796+1 records out

# dd if=/dev/cciss/c0d1 of=/dev/null bs=131072
68675509+1 records in
68675509+1 records out

# dd if=/dev/cciss/c0d3 of=/dev/null bs=131072
68675509+1 records in
68675509+1 records out
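As a sanity check on those record counts, the "N+M records" figures bound how many bytes each read pass covered: N full blocks of bs bytes plus M short ones. The Python sketch below works that out, assuming each partial record holds between 1 and bs-1 bytes; the function name size_bounds is illustrative. With bs=131072 this puts c0d0 and c0d2 at roughly 104 MB each and c0d1 and c0d3 at roughly 9.0 TB each.

```python
BLOCK_SIZE = 131072  # the bs= value used in the dd runs above


def size_bounds(full_records, partial_records, block_size=BLOCK_SIZE):
    """Return (min_bytes, max_bytes) read for a dd 'N+M records in' line.

    Assumes every partial record carries at least 1 byte and at most
    block_size - 1 bytes.
    """
    base = full_records * block_size
    low = base + partial_records
    high = base + partial_records * (block_size - 1)
    return low, high


if __name__ == "__main__":
    for name, full, partial in [
        ("c0d0 / c0d2", 796, 1),
        ("c0d1 / c0d3", 68675509, 1),
    ]:
        low, high = size_bounds(full, partial)
        print("%s: between %d and %d bytes" % (name, low, high))
```

The two pairs of identical counts suggest that the four LUNs come in two sizes, and that each pass ran to the end of its device without a read error.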

I also had a serial cable attached to my P800 controller to capture any traces. This is what it picked up:

/dev/cciss/c0d0: [05/12 13:38:28]Int13 BIOS unit 0x81 = CISS LUN 0x0000004000000000
/dev/cciss/c0d0: [05/12 13:38:28]Int13 BIOS unit 0x82 = CISS LUN 0x0100004000000000
/dev/cciss/c0d0: [05/12 13:38:28]Int13 BIOS unit 0x83 = CISS LUN 0x0200004000000000
/dev/cciss/c0d0: [05/12 13:38:28]Int13 BIOS unit 0x84 = CISS LUN 0x0300004000000000
/dev/cciss/c0d0: [05/13 09:13:03]PR=030fefb8h D245 Op=1c PLErr=04 IopErr=30 S=00 STag=0x018d HashAddr=0x00e59c6c PLLog=0x31190000
/dev/cciss/c0d0: [05/13 09:21:04]Ctlr SCSI Request, Illegal CDB Opcode=0x3c
/dev/cciss/c0d0: [05/13 09:21:08]BadReq:CDB0-15=260008000000A200A000000000000000,LUN=00000000L00000000H
/dev/cciss/c0d0: [05/13 09:21:08]BadReq:CDB0-15=260009000000A200A000000000000000,LUN=00000000L00000000H
/dev/cciss/c0d0: [05/13 09:21:08]BadReq:CDB0-15=26000A000000A200A000000000000000,LUN=00000000L00000000H
/dev/cciss/c0d0: [05/13 09:21:08]BadReq:CDB0-15=26000B000000A200A000000000000000,LUN=00000000L00000000H
/dev/cciss/c0d0: [05/13 09:21:08]BadReq:CDB0-15=26000C000000A200A000000000000000
..the spew continues..


Any thoughts here?
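One way to get a handle on a long run of BadReq lines is to split each 16-byte CDB into bytes and see which positions change from one request to the next. The Python sketch below does only that, using the five CDB strings captured above. It makes no attempt to interpret the opcode, whose meaning is controller-specific, and the names parse_cdbs and varying_offsets are illustrative.

```python
import re

BADREQ = re.compile(r"BadReq:CDB0-15=([0-9A-Fa-f]{32})")


def parse_cdbs(log_text):
    """Return each BadReq CDB in `log_text` as a 16-byte bytes object."""
    return [bytes.fromhex(m.group(1)) for m in BADREQ.finditer(log_text)]


def varying_offsets(cdbs):
    """Return the byte offsets whose value differs across the CDBs."""
    return [i for i in range(16) if len({cdb[i] for cdb in cdbs}) > 1]


SAMPLE = """\
[05/13 09:21:08]BadReq:CDB0-15=260008000000A200A000000000000000
[05/13 09:21:08]BadReq:CDB0-15=260009000000A200A000000000000000
[05/13 09:21:08]BadReq:CDB0-15=26000A000000A200A000000000000000
[05/13 09:21:08]BadReq:CDB0-15=26000B000000A200A000000000000000
[05/13 09:21:08]BadReq:CDB0-15=26000C000000A200A000000000000000
"""

if __name__ == "__main__":
    cdbs = parse_cdbs(SAMPLE)
    print("opcodes:", sorted({"0x%02x" % cdb[0] for cdb in cdbs}))
    for offset in varying_offsets(cdbs):
        values = ["0x%02x" % cdb[offset] for cdb in cdbs]
        print("byte %d varies: %s" % (offset, ", ".join(values)))
```

In this capture every rejected request carries the same opcode byte and only byte 2 changes, counting up by one each time, so the spew looks like a single kind of request being repeated with an incrementing field.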




Re: bad primary superblock - bad magic number !!!

Emmanuel Florac
On Thu, 13 May 2010 01:38:25 -0700 (PDT), you wrote:

> /dev/cciss/c0d0: [05/13
> 09:21:08]BadReq:CDB0-15=26000C000000A200A000000000000000 ..the spew
> continues..
>
>
> Any thoughts here?
>

If I understand correctly, c0d0 represents a drive (the first one).
Apparently this drive is dead, or close. You should probably ditch it.


Re: bad primary superblock - bad magic number !!!

stress_buster
> If I understand correctly, c0d0 represents a drive (the first one).
> Apparently this drive is dead, or close. You should probably ditch it.

No, c0d0 means controller number 0 (c0), logical drive number 0 (d0).
So it's 12 disks in 2 partitions, c0d0 and c0d1.
c0d0 holds configuration information.
I boot from a different device; the RAID set is used only for storing data.

cheers
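The naming scheme described above (c for controller, d for logical drive, and an optional p suffix for a partition, as in /dev/cciss/c0d0p1 from the first post) can be sketched as a small Python parser. The function name parse_cciss and the returned tuple layout are illustrative assumptions.

```python
import re

# cciss/c<controller>d<logical drive>[p<partition>], with optional /dev/ prefix
CCISS_NAME = re.compile(r"^(?:/dev/)?cciss/c(\d+)d(\d+)(?:p(\d+))?$")


def parse_cciss(name):
    """Return (controller, logical_drive, partition) for a cciss device name.

    `partition` is None when the name refers to the whole logical drive.
    Raises ValueError if the name does not follow the cciss pattern.
    """
    match = CCISS_NAME.match(name)
    if match is None:
        raise ValueError("not a cciss device name: %r" % name)
    controller, drive, partition = match.groups()
    return (
        int(controller),
        int(drive),
        int(partition) if partition is not None else None,
    )


if __name__ == "__main__":
    for name in ("/dev/cciss/c0d0p1", "/dev/cciss/c0d2", "cciss/c0d3"):
        print(name, "->", parse_cciss(name))
```

So an error reported against cciss/c0d2 refers to a logical drive presented by the controller, not to one physical disk.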


In reply to Emmanuel Florac's message of Thu, May 13, 2010 10:41:51 AM.

Re: bad primary superblock - bad magic number !!!

Emmanuel Florac
On Thu, 13 May 2010 04:00:33 -0700 (PDT), you wrote:

> nope, c0d0 represents ( ControllerNumber[c0] LogicalDriveNumber[d0] )
> so its 12 disks in 2 partitions- c0d0 and c0d1
> c0d0 holds configuration information
> I boot from a different device, the raid set is used only for storing
> data.

Oh, OK. I don't understand how a whole array may generate errors. Maybe
the controller's bad then?


Re: bad primary superblock - bad magic number !!!

Stan Hoeppner
In reply to this post by stress_buster
Leo Davis put forth on 5/13/2010 6:00 AM:
>> If I understand correctly, c0d0 represents a drive (the first one).
>> Apparently this drive is dead, or close. You should probably ditch it.
>
> nope, c0d0 represents ( ControllerNumber[c0] LogicalDriveNumber[d0] )
> so its 12 disks in 2 partitions- c0d0 and c0d1
> c0d0 holds configuration information
> I boot from a different device, the raid set is used only for storing data.

Which model SmartArray controller is this?  Is it SCSI, SAS, or SATA?

If SAS or SATA, is there an expander in the enclosure?

What model is the external drive enclosure which houses the 12 drives?

--
Stan
