CodeVerge.Net Beta


Newsgroup: novell.support.cluster-services

Don Horsfall
Master IP stops responding - OES2 SP1 Linux cluster
8/26/2009 3:48:16 PM

Hi everyone,

I just built a new 3-node OES2 SP1 Linux cluster. I'm beating it up
pretty badly building new cluster resources and testing them.

In the process, the master IP service just stops responding. I usually
notice it when iManager can't read the cluster status.

It also shows up if I run cluster status (or any other cluster command)
on the node hosting the master. The command just hangs.

Cluster commands issued from other nodes work fine.

The only fix I've found is to restart the node holding the master IP.
This, of course, forces the master IP to move to another server, and
everything works fine again.
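
For anyone trying to reproduce this, one way to confirm the hang without tying up a terminal is to run the command under a deadline. This is only a sketch, not an NCS tool: probe_cmd is a hypothetical helper name, and the return code 124 is just a convention picked here to mean "deadline passed". A process stuck in uninterruptible kernel sleep may ignore the kill, which is why the function does not wait for it afterwards.

```shell
# probe_cmd SECONDS CMD [ARGS...]
# Runs CMD in the background and polls it once per second.
# Returns CMD's exit status, or 124 if it is still running after SECONDS.
probe_cmd() {
    local limit="$1"
    shift
    "$@" >/dev/null 2>&1 &
    local pid=$!
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Best effort only: a process blocked in the kernel may not die.
            kill -9 "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished on its own; hand back its exit status.
    wait "$pid"
}
```

On the node hosting the master, something like `probe_cmd 15 cluster status` would then return 124 when the command hangs, instead of blocking the shell. The 15-second limit is an arbitrary choice.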

This is a relatively small cluster. I am also responsible for a dual
21-node BCCed cluster currently running beautifully on NetWare. I'm
going to have to move this to Linux (obviously). I'd prefer not to
have this kind of problem in that environment.

Has anyone seen this? Any clue what's going on?

Thanks,

Don

changju
Re: Master IP stops responding - OES2 SP1 Linux cluster
8/27/2009 3:26:03 PM

I wish I could get my hands on your cluster to diagnose the problems.

If possible, please turn on NCS tracing
(echo -n "TRACE ON" > /proc/ncs/cluster) and adminfs debugging
(echo -n "debug" > /admin/adminfs.cmd), then check /var/log/messages
for clues about what might be causing the problem.
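
The two commands above can be wrapped so that a mistyped path does not silently create a stray file. This is a minimal sketch: enable_debug is a hypothetical helper name, and the only paths and values it is meant for are the ones quoted in this post. It uses printf '%s' in place of echo -n, which writes the same bytes with no trailing newline.

```shell
# enable_debug CONTROL_FILE VALUE
# Writes VALUE with no trailing newline to CONTROL_FILE, but only if the
# file already exists and is writable. Returns 1 otherwise.
enable_debug() {
    if [ ! -w "$1" ]; then
        echo "skipping $1: not present or not writable" >&2
        return 1
    fi
    printf '%s' "$2" > "$1"
}

# Intended use, as root, with the paths and values from the post:
#   enable_debug /proc/ncs/cluster "TRACE ON"
#   enable_debug /admin/adminfs.cmd "debug"
```

Afterwards the extra output should land in /var/log/messages, as described above.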


Regards,

Changju




--
changju
------------------------------------------------------------------------
changju's Profile: http://forums.novell.com/member.php?userid=15279
View this thread: http://forums.novell.com/showthread.php?t=384359

Don Horsfall
Re: Master IP stops responding - OES2 SP1 Linux cluster
8/27/2009 4:58:28 PM

Changju,

I'll turn on the traces you suggest and see what I get. This one's a
little harder because I don't know when it will happen. I don't have
any handle on what causes it.
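
Since the timing is unpredictable, one option is to save the tail of the log at the moment the hang is noticed, before a restart rotates or buries it. This is only a sketch under assumptions: snapshot_log is a hypothetical helper name, the 200-line default is arbitrary, and the output directory is whatever you choose.

```shell
# snapshot_log LOGFILE OUTDIR [LINES]
# Copies the last LINES lines (default 200) of LOGFILE into OUTDIR under a
# timestamped name and prints the path of the copy. Returns 1 on failure.
snapshot_log() {
    local log="$1"
    local outdir="$2"
    local lines="${3:-200}"
    local out
    [ -r "$log" ] || return 1
    mkdir -p "$outdir" || return 1
    out="$outdir/$(basename "$log").$(date +%Y%m%d-%H%M%S)"
    tail -n "$lines" "$log" > "$out" || return 1
    echo "$out"
}
```

For example, `snapshot_log /var/log/messages /root/ncs-hang` could be run by hand, or from whatever periodic check detects the hang, to keep a copy of the trace output. The directory /root/ncs-hang is a made-up example.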

I'll post the traces when I get them. Next best thing to being here,
right :-)

This little thing's been a challenge, but it's better to work through it
and fully understand it before I tackle the big dog: dual 21-node BCCed
clusters hosting 60 active resources.

Thanks again.

Regards,

Don




davidkrotil
Re: Master IP stops responding - OES2 SP1 Linux cluster
8/31/2009 12:56:02 PM

Have you installed any patches? If you haven't, then do it. Without the
latest stability patches, NCS on OES2 is hard to run in production.


--
David Krotil, CNE

Don Horsfall
Re: Master IP stops responding - OES2 SP1 Linux cluster
9/10/2009 2:23:42 PM

changju,

I didn't get the traces -- ran out of time. The problem did occur
again though.

This time, node 3 was hosting the master when it went dead. I shut
down node 3 and the master moved to node 2. When it loaded on node 2,
it was still dead. I had to wait for node 3 to come back, move
resources off of node 2, and restart node 2. Master then moved to node
3 and was fine.

Just before this happened, I had defined a new cluster resource
using node 1. I presented the LUN and forced an FC scan on node 1,
followed by a PowerPath command to get it to recognize the newly
discovered LUN.

I then proceeded to create the cluster resource using nssmu on node 1.

My procedure has been to follow this with the same set of commands on
the other two nodes to get them to see the newly presented LUN and its
partition. It seems that while they see the LUN device, they don't
resolve the partition I'd put onto it. No idea why.
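
One read-only way to check what each node's kernel actually knows is to look at /proc/partitions for entries belonging to the device. This is a sketch, not a fix: list_partitions is a hypothetical helper name, and the device names in the example (sdb, emcpowera) are assumptions for illustration, not names taken from this cluster.

```shell
# list_partitions DEVICE [PARTITIONS_FILE]
# Prints the partitions of DEVICE that the kernel knows about, reading the
# /proc/partitions layout (columns: major minor #blocks name).
# Prints nothing if the device has no known partitions.
list_partitions() {
    local dev="$1"
    local file="${2:-/proc/partitions}"
    # Match DEVICE followed by a partition number, with an optional "p"
    # separator (sdb1, emcpowera1, nvme0n1p1), but not DEVICE itself.
    awk -v dev="$dev" '$4 ~ ("^" dev "p?[0-9]+$") { print $4 }' "$file"
}
```

Running `list_partitions emcpowera` on each node would then show whether the partition created on node 1 is visible elsewhere. Empty output on nodes 2 and 3 would match the symptom described above: the device is there, but its partition table has not been re-read.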

Somewhere during this process, the master stopped responding to
iManager queries and to local cluster commands (i.e., cluster status).

I'm still not sure of a cause/effect relationship here. I'll see if I
can chase it down when I get back on site.

If this is the source of the problem, it's a real pain. Adding or
expanding a resource requires a reboot of each node in the cluster? Not
a big problem with three nodes, but a major headache with two 21-node
clusters.

Thanks and, as usual, all observations welcome.

Regards,

Don



All Times Are GMT