CodeVerge.Net Beta


Newsgroup: novell.support.cluster-services
Don Horsfall <d
NewsGroup User
Master IP stops responding - OES2 SP1 Linux cluster
8/26/2009 3:48:16 PM

Hi everyone,

I just built a new 3-node OES2 SP1 Linux cluster. I'm beating it up
pretty badly building new cluster resources and testing them.

In the process, the master ip service just stops responding. I usually
notice it when iManager can't read the cluster status.

It also shows up when I run cluster status (or any other cluster
command) on the node hosting the master: the command just hangs.

Cluster commands issued from other nodes work fine.

The only fix I've found is to restart the node holding the master ip.
This, of course, forces the master ip to move to another server and
everything works fine again.

This is a relatively small cluster. I am also responsible for a dual
21-node BCCed cluster currently running beautifully on NetWare. I'm
going to have to move this to Linux (obviously). I'd prefer not to
have this kind of problem in that environment.

Has anyone seen this? Any clue what's going on?

Thanks,

Don
changju <changj
NewsGroup User
Re: Master IP stops responding - OES2 SP1 Linux cluster
8/27/2009 3:26:03 PM


I wish I could get my hands on your cluster to diagnose the problems.

If possible, please turn on NCS tracing
(echo -n "TRACE ON" > /proc/ncs/cluster) and adminfs debugging
(echo -n "debug" > /admin/adminfs.cmd), then check /var/log/messages
for clues on what might cause the problems.
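
For anyone following along, the two switches above can be wrapped in a
small helper so they are easy to re-run after a node restart. This is
only a sketch: the function names enable_tracing and recent_cluster_log
are made up for illustration, the control paths and strings are the ones
quoted in this post, and the grep keywords are a guess at what might be
relevant in /var/log/messages.

```shell
#!/bin/sh
# Hypothetical helpers; only the paths and the strings written to them
# come from the post. Everything else is an illustrative assumption.

# enable_tracing [NCS_CTL] [ADMINFS_CTL]
# Writes the trace/debug switches, refusing to run if either control
# file is missing or not writable (e.g. NCS not loaded, or not root).
enable_tracing() {
    ncs_ctl="${1:-/proc/ncs/cluster}"
    adminfs_ctl="${2:-/admin/adminfs.cmd}"
    for f in "$ncs_ctl" "$adminfs_ctl"; do
        if [ ! -w "$f" ]; then
            echo "cannot write $f" >&2
            return 1
        fi
    done
    # printf '%s' writes the string with no trailing newline, like echo -n
    printf '%s' "TRACE ON" > "$ncs_ctl"
    printf '%s' "debug" > "$adminfs_ctl"
}

# recent_cluster_log [LOGFILE] [COUNT]
# Prints the last COUNT log lines that mention one of the keywords.
# The keyword list is an assumption, not something NCS documents.
recent_cluster_log() {
    log="${1:-/var/log/messages}"
    count="${2:-50}"
    grep -iE 'ncs|adminfs|cluster' "$log" | tail -n "$count"
}
```

Typical use, as root on the node holding the master IP, would be
`enable_tracing && recent_cluster_log`.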


Regards,

Changju




--
changju
------------------------------------------------------------------------
changju's Profile: http://forums.novell.com/member.php?userid=15279
View this thread: http://forums.novell.com/showthread.php?t=384359

Don Horsfall <d
NewsGroup User
Re: Master IP stops responding - OES2 SP1 Linux cluster
8/27/2009 4:58:28 PM

Changju,

I'll turn on the traces you suggest and see what I get. This one's a
little harder because I don't know when it will happen, and I haven't
been able to track down what causes it.
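
Since the hang shows up at unpredictable times, one option is to poll a
cluster command with a time limit and log when it stops returning, so
the traces can be matched to a timestamp. The sketch below is an
assumption-laden illustration: probe is a made-up name, and it relies on
the GNU coreutils timeout utility, which may not be present on older
SLES releases.

```shell
#!/bin/sh
# probe SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND under a time limit and prints one
# timestamped line. Returns 0 if the command finished in time, 1 if it
# was cut off, 2 if the timeout utility is unavailable.
probe() {
    secs="$1"
    shift
    if ! command -v timeout >/dev/null 2>&1; then
        echo "timeout utility not available" >&2
        return 2
    fi
    timeout "$secs" "$@" >/dev/null 2>&1
    rc=$?
    # GNU timeout exits with status 124 when the time limit was reached
    if [ "$rc" -eq 124 ]; then
        echo "$(date '+%F %T') HUNG after ${secs}s: $*"
        return 1
    fi
    echo "$(date '+%F %T') ok (exit $rc): $*"
    return 0
}
```

It could be run in a loop on the master node, for example
`while :; do probe 30 cluster status >> /tmp/cluster-probe.log; sleep 60; done`
(the log path is arbitrary). One caveat: if the command is blocked in an
uninterruptible kernel call, the signal sent by timeout may not end it,
and the probe itself may then stall.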

I'll post the traces when I get them. Next best thing to being here,
right :-)

This little thing's been a challenge but better to work through it and
fully understand it before I tackle the big dog: dual 21-node BCCed
clusters hosting 60 active resources.

Thanks again.

Regards,

Don



davidkrotil <da
NewsGroup User
Re: Master IP stops responding - OES2 SP1 Linux cluster
8/31/2009 12:56:02 PM


Have you installed any patches? If you haven't, then do it. Without the
latest stability patches, NCS on OES2 is hard to run in production.


--
David Krotil, CNE
------------------------------------------------------------------------
davidkrotil's Profile: http://forums.novell.com/member.php?userid=884
View this thread: http://forums.novell.com/showthread.php?t=384359

Don Horsfall <d
NewsGroup User
Re: Master IP stops responding - OES2 SP1 Linux cluster
9/10/2009 2:23:42 PM

changju,

I didn't get the traces -- ran out of time. The problem did occur
again though.

This time, node 3 was hosting the master when it went dead. I shut
down node 3 and the master moved to node 2. When it loaded on node 2,
it was still dead. I had to wait for node 3 to come back, move
resources off of node 2, and restart node 2. Master then moved to node
3 and was fine.

Just before this happened, I had defined a new cluster resource
using node 1. I presented the lun and forced an FC scan on node 1,
followed by a PowerPath command to get it to recognize the newly
discovered lun.

I then proceeded to create the cluster resource using nssmu on node 1.

My procedure has been to follow this with the same set of commands on
the other two nodes to get them to see the newly presented lun and its
partition. It seems that while they see the lun device, they don't
resolve the partition I'd put onto it. No idea why.
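
One way to compare what each node actually sees is to list the
partition entries the kernel reports for the device. The helper below is
a hypothetical sketch (list_partitions is a made-up name) and it assumes
the partition shows up as an ordinary entry in /proc/partitions, which
may not hold for partitions managed by a volume manager layer.

```shell
#!/bin/sh
# list_partitions DEVICE [TABLE]
# Hypothetical helper: prints the partition names the kernel lists for
# DEVICE (e.g. sdb1 for sdb). TABLE defaults to /proc/partitions and is
# only a parameter so the function can be tried against a saved copy.
list_partitions() {
    dev="$1"
    table="${2:-/proc/partitions}"
    # Column 4 is the name; match DEVICE followed by an optional "p"
    # and a partition number, so "sda" does not match "sdaa1".
    awk -v d="$dev" '$4 ~ ("^" d "p?[0-9]+$") { print $4 }' "$table"
}
```

Running something like `list_partitions emcpowera` on each node (the
device name here is just an example) and getting output on node 1 but
nothing on the others would confirm that those nodes have not picked up
the new partition table.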

Somewhere during this process, the master stopped responding to
iManager queries and to local cluster commands (e.g., cluster status).

I'm still not sure of a cause/effect relationship here. I'll see if I
can chase it down when I get back on site.

If this is the source of the problem, it's a real pain. Adding or
expanding a resource requires reboot of each node in the cluster? Not
a big problem with three nodes, but a major headache with two 21-node
clusters.

Thanks and, as usual, all observations welcome.

Regards,

Don



All Times Are GMT