CodeVerge.Net Beta


Zone: > NEWSGROUP > Novell Forums > novell.support.cluster-services Tags:
cperilli <cperi
NewsGroup User
Cluster resources go comatose on node failure
1/29/2009 1:46:02 PM


We have a 7 node NW 6.5SP7 NCS cluster. If a node abends or is manually
taken out of the cluster, sometimes one or more of the cluster resources
on that node go comatose rather than failing over. Simply offlining and
onlining the resource gets it back in operation. This appears to be a
recent occurrence (maybe after SP7?) as I did not notice this happening
in the past. I have not seen the issue when manually migrating
resources, only on node failure or cluster leave. Any suggestions on
how to troubleshoot this? Thanks!

-Chuck


--
cperilli
------------------------------------------------------------------------
cperilli's Profile: http://forums.novell.com/member.php?userid=1560
View this thread: http://forums.novell.com/showthread.php?t=358487

Bking03 <Bking0
NewsGroup User
Re: Cluster resources go comatose on node failure
1/29/2009 5:36:02 PM


Before you take the node out of the cluster can you offline or migrate
the resources without a problem? If you have any problems there look at
the logger screen for errors.


--
Bking03
------------------------------------------------------------------------
Bking03's Profile: http://forums.novell.com/member.php?userid=5893
View this thread: http://forums.novell.com/showthread.php?t=358487

Craig Johnson <
NewsGroup User
Re: Cluster resources go comatose on node failure
1/29/2009 6:26:00 PM

In article <cperilli.3ms1yn@no-mx.forums.novell.com>, Cperilli wrote:
> sometimes one or more of the cluster resources
> on that node go comatose rather than failing over.
>
Sometimes?

If the cluster load or unload script has a typo and doesn't complete
normally, the resource will go comatose. (For that reason, I like to
put my commands into an NCF file and just run that as a script, as it
can be a bit more fault-tolerant to little syntax errors). Perhaps
something changed in the SP that affected a load command. For
instance, people used to use ? as a delay command, and after one of the
support packs you could no longer have a space after the ? or the
command wouldn't be seen.

Craig Johnson
Novell Support Connection SysOp

Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
1/29/2009 10:54:41 PM

Hi,

cperilli wrote:
>
> We have a 7 node NW 6.5SP7 NCS cluster. If a node abends or is manually
> taken out of the cluster, sometimes one or more of the cluster resources
> on that node go comatose rather than failing over. Simply offlining and
> onlining the resource gets it back in operation. This appears to be a
> recent occurrence (maybe after SP7?) as I did not notice this happening
> in the past. I have not seen the issue when manually migrating
> resources, only on node failure or cluster leave. Any suggestions on
> how to troubleshoot this? Thanks!

A cluster resource usually goes comatose when either the load or the
unload scripts detect an error. In case of a node failing, unload
scripts of course don't come into play, so it must be the load script
that fails. Have you checked the output of the cluster resource screen?
Also, let's see the load script that fails.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de

cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/2/2009 1:06:02 PM


Here's the load script for one of the resources (with a few IP and name
changes for security purposes):
nss /poolactivate=USER30
mount USER30 VOLID=236
CLUSTER CVSBIND ADD VOL-USER30 10.2.33.49
NUDP ADD VOL-USER30 10.2.33.49
CIFS ADD .CN=VOL-USER30.OU=HQ.O=MYORG.T=MYTREE.
add secondary ipaddress 10.3.33.49
addsec 192.168.20.79

The last line calls this NCF:
;ADDSEC.NCF
;Called by cluster load script to avoid comatose resource if backup LAN down.
add secondary ipaddress %1

Why two secondary IPs? Our enterprise backup system (EMC/Legato) runs
on an isolated LAN to keep backup traffic off the primary LAN. There is a
dedicated NIC in each server connected to the backup LAN. Each cluster
resource needs an IP address on this LAN in order to be "seen" by the
backup server.

This setup has been working fine for years. The intermittent comatose
resource issue has only popped up in the last few months.
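
For anyone following along, here is a rough Python toy model of why the backup-LAN address is added through ADDSEC.NCF instead of directly in the load script. It only mirrors the behaviour described in this thread (a failing load-script command leaves the resource comatose, while a failure inside a called NCF does not); `run_load_script` and `make_executor` are made-up names for illustration, not anything from NCS.

```python
# Toy model only. The states and the "NCF hides the failure" behaviour follow
# the description in this thread, not any NCS internals.

def run_load_script(commands, execute):
    """Run commands in order; the first failure leaves the resource comatose."""
    for command in commands:
        if not execute(command):
            return "COMATOSE"
    return "RUNNING"


def make_executor(backup_lan_up):
    """Build a fake command runner (hypothetical, for illustration)."""
    def execute(command):
        if command.startswith("addsec "):
            # Stand-in for calling ADDSEC.NCF: the inner command runs, but
            # its result is not passed back to the load script.
            execute("add secondary ipaddress " + command.split()[1])
            return True
        if command.startswith("add secondary ipaddress 192.168."):
            return backup_lan_up  # backup LAN address fails when the LAN is down
        return True
    return execute


direct = ["nss /poolactivate=USER30", "add secondary ipaddress 192.168.20.79"]
wrapped = ["nss /poolactivate=USER30", "addsec 192.168.20.79"]

lan_down = make_executor(backup_lan_up=False)
print(run_load_script(direct, lan_down))   # COMATOSE
print(run_load_script(wrapped, lan_down))  # RUNNING
```

With the backup LAN up, both variants end in the running state; the wrapper only matters when that one command fails.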

Thanks,

-Chuck



cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/2/2009 1:06:02 PM


add secondary ipaddress 10.3.33.49

This line was a typo that crept in when I changed the IPs to something bogus.
It's correct in the actual script.



cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/2/2009 4:46:03 PM


Found these entries in logger. Added a couple of comments starting with
***

CLUSTER-<WARNING>-<10310>: ***** RESOURCE USER30_SERVER HAS BEEN PUT INTO COMATOSE BECAUSE IT MAY CAUSE SERVER ABEND *****

CLUSTER-<WARNING>-<10310>: ***** RESOURCE DATA20_SERVER HAS BEEN PUT INTO COMATOSE BECAUSE IT MAY CAUSE SERVER ABEND *****

***THIS RESOURCE FAILED OVER ON ITS OWN
Activating pool "TRAIN1"...

** Pool layout v43.02

** Previous clean shutdown detected (consistency check OK)

** Loading system objects

** Processing volume purge log

** .

** Processing pool purge log

** .

Loading volume "TRAIN1"

Volume TRAIN1 set to the DEACTIVE state.

Pool TRAIN1 set to the ACTIVE state.

Activating volume "TRAIN1"...

** Volume layout v36.03

** Volume creation layout v36.01

** Processing volume purge log

** .................

The share point "TRAIN1" is now active on volume TRAIN1.

Volume TRAIN1 set to the ACTIVE state.

The share point "TRAIN1" is now active on volume TRAIN1.

CIFS: Adding server = CIFS-TRAIN1, comment = Cluster Virtual CIFS Server


TCPIP-6.81-185: Wed Jan 28 14:26:40 2009

Added secondary IP address 10.2.33.47.



TCPIP-6.81-185: Wed Jan 28 14:26:45 2009

Added secondary IP address 192.168.20.239.

***OFFLINED/ONLINED THIS RESOURCE AND IT CAME UP FINE.
Activating pool "USER30"...

** Pool layout v43.02

** Processing journal

** 1 uncommitted transaction(s)

** 25 Redo(s), 0 Undo(s), 1 Logical Undo(s)

** System verification completed

** Loading system objects

** Processing volume purge log

** ......

** Processing pool purge log

** .

Loading volume "USER30"

Volume USER30 set to the DEACTIVE state.

Pool USER30 set to the ACTIVE state.

Activating volume "USER30"...

** Volume layout v36.03

** Volume creation layout v36.01

** Processing volume purge log

** ................................

The share point "USER30" is now active on volume USER30.

Volume USER30 set to the ACTIVE state.

The share point "USER30" is now active on volume USER30.

CIFS: Adding server = CIFS-USER30, comment = Cluster Virtual CIFS Server


TCPIP-6.81-185: Wed Jan 28 14:33:59 2009

Added secondary IP address 10.2.33.49.



TCPIP-6.81-185: Wed Jan 28 14:34:04 2009

Added secondary IP address 192.168.20.79.
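
If you save the logger screen to a text file, a small script can pull out which resources were put into comatose by this warning. This is only a sketch in Python (run on a workstation, not on the server); the pattern is built from the message text quoted in this thread, and `comatose_resources` is a made-up helper name. Logger text copied from the console can be wrapped mid-word, so all whitespace is stripped before matching.

```python
import re

# Pattern based on the warning text quoted in this thread, with all
# whitespace removed so that mid-word line wraps do not matter.
COMATOSE_RE = re.compile(
    r"CLUSTER-<WARNING>-<10310>:\*+RESOURCE(.+?)HASBEENPUTINTOCOMATOSE"
)


def comatose_resources(log_text):
    """Return resource names from 10310 'put into comatose' warnings."""
    squeezed = re.sub(r"\s+", "", log_text)
    return COMATOSE_RE.findall(squeezed)


sample = """CLUSTER-<WARNING>-<10310>: ***** RESOURCE USER30_SERVER HAS BEEN PUT
INTO COMAT
OSE BECAUSE IT MAY CAUSE SERVER ABEND *****

CLUSTER-<WARNING>-<10310>: ***** RESOURCE DATA20_SERVER HAS BEEN PUT
INTO COMAT
OSE BECAUSE IT MAY CAUSE SERVER ABEND *****

Activating pool "TRAIN1"...
"""

print(comatose_resources(sample))  # ['USER30_SERVER', 'DATA20_SERVER']
```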



Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
2/3/2009 1:36:29 AM

Hi,

cperilli wrote:
>
> Found these entries in logger. Added a couple of comments starting with
> ***
>
> CLUSTER-<WARNING>-<10310>: ***** RESOURCE USER30_SERVER HAS BEEN PUT INTO COMATOSE BECAUSE IT MAY CAUSE SERVER ABEND *****

Ah. Cascading failover protection is kicking in. As for why, that is a good
question. Assuming you have your own failover scheme that prevents cascading
failures, I would disable it. See TID 3005414.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de

cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/3/2009 2:06:02 PM


The node failures were caused by an abend in COMN.NSS. I have not yet
installed N65NSS7A which I'm hoping fixes the COMN.NSS abend issue. I
plan to install on all cluster nodes this weekend.

I'm wondering if the resources on the failed node are being quarantined
because the node failure was NSS related. My plan is to proceed with
the N65NSS7A install and see what happens. I'll try some forced node
failures and see if I'm still getting comatosed resources. If not, I'll
hopefully assume the problem is fixed. I'd rather not disable cascade
prevention if I don't have to.

Thanks,

-Chuck



Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
2/3/2009 7:45:47 PM

Hi,

cperilli wrote:
>
> I'm wondering if the resources on the failed node are being quarantined
> because the node failure was NSS related.

That is very likely, yes.

> My plan is to proceed with
> the N65NSS7A install and see what happens. I'll try some forced node
> failures and see if I'm still getting comatosed resources. If not, I'll
> hopefully assume the problem is fixed. I'd rather not disable cascade
> prevention if I don't have to.

Well, cascade failure prevention is really only useful on two-node
clusters, where you can't prevent it yourself without making the whole
cluster useless. Most people disable it anyway on clusters with more
than two nodes.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de

cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/5/2009 1:56:02 PM


Maybe I don't fully understand cascade failover protection. If a
resource is "bad" and causes a node to abend, what would keep it from
abending one node after another in a multi-node (in our case 7) cluster?
Thanks!

-Chuck



Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
2/5/2009 7:19:23 PM

Hi,

cperilli wrote:
>
> Maybe I don't fully understand cascade failover protection. If a
> resource is "bad" and causes a node to abend, what would keep it from
> abending one node after another in a multi-node (in our case 7) cluster?

That you specify (hopefully you do, you really should) which cluster nodes
a given resource is allowed to run on, and that way stop the resource from
failing over indefinitely. E.g., on a 7 node cluster, you should never
need to have all nodes in the list of allowed nodes a resource can fail
over to. I usually don't do more than three per resource.
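
To put numbers on that, here is a rough simulation in Python. It assumes cascading failover protection is disabled, that the "bad" resource abends every node it loads on, and that it only fails over to the nodes in its assigned list; the node names and `simulate_cascade` are made up for illustration.

```python
def simulate_cascade(all_nodes, assigned_nodes, start_node):
    """Return the nodes abended by a resource that crashes every host it loads on.

    Assumes cascading failover protection is disabled and that the resource
    only fails over to nodes in its assigned list, tried in order.
    """
    abended = [start_node]
    for node in assigned_nodes:
        if node in all_nodes and node not in abended:
            abended.append(node)  # resource loads here and abends this node too
    return abended


nodes = [f"NODE{i}" for i in range(1, 8)]  # hypothetical names for 7 nodes

unrestricted = simulate_cascade(nodes, nodes, "NODE1")
restricted = simulate_cascade(nodes, ["NODE1", "NODE2", "NODE3"], "NODE1")

print(len(unrestricted), "nodes lost with all 7 nodes assigned")  # 7
print(len(restricted), "nodes lost with 3 nodes assigned")        # 3
```

With three assigned nodes, the resource ends up with nowhere left to go after the third abend, and the other four nodes keep serving their own resources.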

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de

cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/9/2009 12:26:02 PM


Ah...makes sense. Sounds easy enough to implement. Thanks!

-Chuck



All Times Are GMT