Newsgroup: novell.support.cluster-services

cperilli <cperi
NewsGroup User
Cluster resources go comatose on node failure
1/29/2009 1:46:02 PM


We have a 7 node NW 6.5SP7 NCS cluster. If a node abends or is manually
taken out of the cluster, sometimes one or more of the cluster resources
on that node go comatose rather than failing over. Simply offlining and
onlining the resource gets it back in operation. This appears to be a
recent occurrence (maybe after SP7?) as I did not notice this happening
in the past. I have not seen the issue when manually migrating
resources, only on node failure or cluster leave. Any suggestions on
how to troubleshoot this? Thanks!

-Chuck


--
cperilli
------------------------------------------------------------------------
cperilli's Profile: http://forums.novell.com/member.php?userid=1560
View this thread: http://forums.novell.com/showthread.php?t=358487

Bking03 <Bking0
NewsGroup User
Re: Cluster resources go comatose on node failure
1/29/2009 5:36:02 PM


Before you take the node out of the cluster, can you offline or migrate
the resources without a problem? If you have any problems there, look at
the logger screen for errors.


--
Bking03
------------------------------------------------------------------------
Bking03's Profile: http://forums.novell.com/member.php?userid=5893
View this thread: http://forums.novell.com/showthread.php?t=358487

Craig Johnson <
NewsGroup User
Re: Cluster resources go comatose on node failure
1/29/2009 6:26:00 PM

In article <cperilli.3ms1yn@no-mx.forums.novell.com>, Cperilli wrote:
> sometimes one or more of the cluster resources
> on that node go comatose rather than failing over.
>
Sometimes?

If the cluster load or unload script has a typo and doesn't complete
normally, the resource will go comatose. (For that reason, I like to
put my commands into an NCF file and just run that as a script, as it
can be a bit more fault-tolerant to little syntax errors). Perhaps
something changed in the SP that affected a load command. For
instance, people used to use ? as a delay command, and after one of the
support packs you could no longer have a space after the ? or the
command wouldn't be seen.
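
The space-after-? pitfall described above is easy to check for
mechanically. Here is a minimal sketch in Python, assuming the load
script has been pasted into a string or read from a text file. The
function name and the sample script lines are made up for illustration
and do not come from anyone's real configuration:

```python
def space_after_question_mark(script_text):
    """Return (line_number, line) pairs where '?' is followed by whitespace."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        # Flag "? command" but not "?command" and not a bare "?".
        if len(stripped) > 1 and stripped[0] == "?" and stripped[1].isspace():
            hits.append((lineno, stripped))
    return hits


# Hypothetical script fragment, for illustration only.
sample_script = """\
nss /poolactivate=USER30
? mount USER30 VOLID=236
?add secondary ipaddress 10.2.33.49
"""

if __name__ == "__main__":
    for lineno, line in space_after_question_mark(sample_script):
        print(f"line {lineno}: space after '?': {line}")
```

This only flags the one pattern mentioned in this post; it does not
validate any other part of the script.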

Craig Johnson
Novell Support Connection SysOp

Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
1/29/2009 10:54:41 PM

Hi,

cperilli wrote:
>
> We have a 7 node NW 6.5SP7 NCS cluster. If a node abends or is manually
> taken out of the cluster, sometimes one or more of the cluster resources
> on that node go comatose rather than failing over. Simply offlining and
> onlining the resource gets it back in operation. This appears to be a
> recent occurrence (maybe after SP7?) as I did not notice this happening
> in the past. I have not seen the issue when manually migrating
> resources, only on node failure or cluster leave. Any suggestions on
> how to troubleshoot this? Thanks!

A cluster resource usually goes comatose when either the load or the
unload scripts detect an error. In case of a node failing, unload
scripts of course don't come into play, so it must be the load script
that fails. Have you checked the output of the cluster resource screen?
Also, let's see the load script that fails.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de
cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/2/2009 1:06:02 PM


Here's the load script for one of the resources (with a few IP and name
changes for security purposes):
nss /poolactivate=USER30
mount USER30 VOLID=236
CLUSTER CVSBIND ADD VOL-USER30 10.2.33.49
NUDP ADD VOL-USER30 10.2.33.49
CIFS ADD .CN=VOL-USER30.OU=HQ.O=MYORG.T=MYTREE.
add secondary ipaddress 10.3.33.49
addsec 192.168.20.79

The last line calls this NCF:
;ADDSEC.NCF
;Called by cluster load script to avoid comatose resource if backup LAN down.
add secondary ipaddress %1

Why two secondary IPs? Our enterprise backup system (EMC/Legato) runs
on an isolated LAN to keep backup traffic off primary LAN. There is a
dedicated NIC in each server connected to the backup LAN. Each cluster
resource needs an IP address on this LAN in order to be "seen" by the
backup server.

This setup has been working fine for years. The intermittent comatose
resource issue has only popped up in the last few months.

Thanks,

-Chuck



cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/2/2009 1:06:02 PM


add secondary ipaddress 10.3.33.49

This was a typo as a result of me changing the IPs to something bogus.
It's correct in the actual script.
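
A slip like this can be caught with a quick consistency check. The
sketch below is Python and assumes, as the correction above implies,
that the address given to CLUSTER CVSBIND ADD and NUDP ADD is meant to
match one that an "add secondary ipaddress" line adds in the same
script. The function name is made up for illustration, and addresses
added indirectly (such as through ADDSEC.NCF) are not followed:

```python
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def unbound_ips(load_script):
    """Return IPs used by CVSBIND/NUDP lines that no
    'add secondary ipaddress' line in the same script adds."""
    advertised, added = set(), set()
    for line in load_script.splitlines():
        words = line.upper().split()
        ips = IP_RE.findall(line)
        if words[:3] == ["ADD", "SECONDARY", "IPADDRESS"]:
            added.update(ips)
        elif words[:2] in (["CLUSTER", "CVSBIND"], ["NUDP", "ADD"]):
            advertised.update(ips)
    return sorted(advertised - added)


# The script exactly as posted in this thread (with the sanitising typo).
posted_script = """\
nss /poolactivate=USER30
mount USER30 VOLID=236
CLUSTER CVSBIND ADD VOL-USER30 10.2.33.49
NUDP ADD VOL-USER30 10.2.33.49
CIFS ADD .CN=VOL-USER30.OU=HQ.O=MYORG.T=MYTREE.
add secondary ipaddress 10.3.33.49
addsec 192.168.20.79
"""

if __name__ == "__main__":
    print(unbound_ips(posted_script))
```

On the script as posted, this reports 10.2.33.49 as advertised but never
added. With the secondary address corrected to 10.2.33.49, the list
comes back empty.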



cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/2/2009 4:46:03 PM


Found these entries in logger. Added a couple of comments starting with
***

CLUSTER-<WARNING>-<10310>: ***** RESOURCE USER30_SERVER HAS BEEN PUT INTO COMATOSE BECAUSE IT MAY CAUSE SERVER ABEND *****

CLUSTER-<WARNING>-<10310>: ***** RESOURCE DATA20_SERVER HAS BEEN PUT INTO COMATOSE BECAUSE IT MAY CAUSE SERVER ABEND *****

***THIS RESOURCE FAILED OVER ON ITS OWN
Activating pool "TRAIN1"...
** Pool layout v43.02
** Previous clean shutdown detected (consistency check OK)
** Loading system objects
** Processing volume purge log
** .
** Processing pool purge log
** .
Loading volume "TRAIN1"
Volume TRAIN1 set to the DEACTIVE state.
Pool TRAIN1 set to the ACTIVE state.
Activating volume "TRAIN1"...
** Volume layout v36.03
** Volume creation layout v36.01
** Processing volume purge log
** .................
The share point "TRAIN1" is now active on volume TRAIN1.
Volume TRAIN1 set to the ACTIVE state.
The share point "TRAIN1" is now active on volume TRAIN1.
CIFS: Adding server = CIFS-TRAIN1, comment = Cluster Virtual CIFS Server

TCPIP-6.81-185: Wed Jan 28 14:26:40 2009
Added secondary IP address 10.2.33.47.

TCPIP-6.81-185: Wed Jan 28 14:26:45 2009
Added secondary IP address 192.168.20.239.

***OFFLINED/ONLINED THIS RESOURCE AND IT CAME UP FINE.
Activating pool "USER30"...
** Pool layout v43.02
** Processing journal
** 1 uncommitted transaction(s)
** 25 Redo(s), 0 Undo(s), 1 Logical Undo(s)
** System verification completed
** Loading system objects
** Processing volume purge log
** ......
** Processing pool purge log
** .
Loading volume "USER30"
Volume USER30 set to the DEACTIVE state.
Pool USER30 set to the ACTIVE state.
Activating volume "USER30"...
** Volume layout v36.03
** Volume creation layout v36.01
** Processing volume purge log
** ................................
The share point "USER30" is now active on volume USER30.
Volume USER30 set to the ACTIVE state.
The share point "USER30" is now active on volume USER30.
CIFS: Adding server = CIFS-USER30, comment = Cluster Virtual CIFS Server

TCPIP-6.81-185: Wed Jan 28 14:33:59 2009
Added secondary IP address 10.2.33.49.

TCPIP-6.81-185: Wed Jan 28 14:34:04 2009
Added secondary IP address 192.168.20.79.
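
For anyone wanting to pull the affected resource names out of a saved
logger capture, here is a minimal Python sketch. It assumes the capture
is available as plain text and that the warning uses the same wording
and message number (10310) as the entries in this post. Because a
console or mail client may hard-wrap the message, even mid-word, the
sketch strips all whitespace before matching. The function name is made
up for illustration:

```python
import re

# Whitespace-free form of the warning text quoted in this thread.
COMATOSE_RE = re.compile(
    r"CLUSTER-<WARNING>-<10310>:\*+RESOURCE(.+?)HASBEENPUTINTOCOMATOSE"
)


def quarantined_resources(logger_text):
    """Return resource names from 10310 warnings, in order of appearance."""
    # Remove every space and line break so wrapped messages still match.
    squeezed = "".join(logger_text.split())
    return COMATOSE_RE.findall(squeezed)


sample = """\
CLUSTER-<WARNING>-<10310>: ***** RESOURCE USER30_SERVER HAS BEEN PUT
INTO COMAT
OSE BECAUSE IT MAY CAUSE SERVER ABEND *****

CLUSTER-<WARNING>-<10310>: ***** RESOURCE DATA20_SERVER HAS BEEN PUT
INTO COMAT
OSE BECAUSE IT MAY CAUSE SERVER ABEND *****

Activating pool "TRAIN1"...
"""

if __name__ == "__main__":
    print(quarantined_resources(sample))  # ['USER30_SERVER', 'DATA20_SERVER']
```

Stripping whitespace is a blunt approach, but it copes with both kinds
of wrapping in one step.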



Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
2/3/2009 1:36:29 AM

Hi,

cperilli wrote:
>
> Found these entries in logger. Added a couple of comments starting with
> ***
>
> CLUSTER-<WARNING>-<10310>: ***** RESOURCE USER30_SERVER HAS BEEN PUT INTO COMATOSE BECAUSE IT MAY CAUSE SERVER ABEND *****

Ah. Cascading failover protection is kicking in. As for why, that is a
good question. Assuming you have your own failover scheme that prevents
cascading failures, I would disable it. See TID 3005414.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de
cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/3/2009 2:06:02 PM


The node failures were caused by an abend in COMN.NSS. I have not yet
installed N65NSS7A, which I'm hoping fixes the COMN.NSS abend issue. I
plan to install it on all cluster nodes this weekend.

I'm wondering if the resources on the failed node are being quarantined
because the node failure was NSS related. My plan is to proceed with
the N65NSS7A install and see what happens. I'll try some forced node
failures and see if I'm still getting comatosed resources. If not, I'll
hopefully assume the problem is fixed. I'd rather not disable cascade
prevention if I don't have to.

Thanks,

-Chuck



Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
2/3/2009 7:45:47 PM

Hi,

cperilli wrote:
>
> I'm wondering if the resources on the failed node are being quarantined
> because the node failure was NSS related.

That is very likely, yes.

> My plan is to proceed with
> the N65NSS7A install and see what happens. I'll try some forced node
> failures and see if I'm still getting comatosed resources. If not, I'll
> hopefully assume the problem is fixed. I'd rather not disable cascade
> prevention if I don't have to.

Well, cascade failure prevention is really only useful on two-node
clusters, where you can't prevent it yourself without making the whole
cluster useless. Most people disable it anyway on clusters with more
than two nodes.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de
cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/5/2009 1:56:02 PM


Maybe I don't fully understand cascade failover protection. If a
resource is "bad" and causes a node to abend, what would keep it from
abending one node after another in a multi-node (in our case 7) cluster?
Thanks!

-Chuck



Massimo Rosen <
NewsGroup User
Re: Cluster resources go comatose on node failure
2/5/2009 7:19:23 PM

Hi,

cperilli wrote:
>
> Maybe I don't fully understand cascade failover protection. If a
> resource is "bad" and causes a node to abend, what would keep it from
> abending one node after another in a multi-node (in our case 7) cluster?

You prevent that yourself: you specify (hopefully; you really should)
which cluster nodes a given resource is allowed to run on, and that way
stop the resource from failing over indefinitely. E.g., on a 7-node
cluster, you should never need to have all nodes in the list of allowed
nodes a resource can fail over to. I usually don't do more than three
per resource.
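
To put numbers on that, here is a toy model in Python. It is only a
sketch of the reasoning above, not of how NCS works internally: it
assumes a hypothetical "bad" resource that abends every node it loads
on, and it ignores the built-in quarantine entirely. The node names and
the function name are made up:

```python
def nodes_lost_to_bad_resource(cluster_nodes, allowed_nodes):
    """Simulate a resource that abends every node it is loaded on.

    cluster_nodes: all node names in the cluster.
    allowed_nodes: ordered list of nodes the resource may fail over to.
    Returns the nodes that went down before the resource ran out of
    places to go.
    """
    alive = set(cluster_nodes)
    lost = []
    for node in allowed_nodes:
        if node not in alive:
            continue
        # The bad resource loads here and takes the node down with it.
        alive.discard(node)
        lost.append(node)
    return lost


nodes = [f"NODE{i}" for i in range(1, 8)]  # hypothetical 7-node cluster

if __name__ == "__main__":
    # Resource allowed on every node: the whole cluster can be lost.
    print(len(nodes_lost_to_bad_resource(nodes, nodes)))      # 7
    # Resource restricted to three nodes: the damage stops at three.
    print(len(nodes_lost_to_bad_resource(nodes, nodes[:3])))  # 3
```

In this model, a three-node allowed list leaves four of the seven nodes
untouched however bad the resource is.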

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de
cperilli <cperi
NewsGroup User
Re: Cluster resources go comatose on node failure
2/9/2009 12:26:02 PM


Ah...makes sense. Sounds easy enough to implement. Thanks!

-Chuck



All Times Are GMT