crivera <criver | NewsGroup User
Re: Randomly problem in 12 nodes cluster. (8/6/2008 3:46:01 PM)
Hi! Thanks, Andrew, for your reply.
We changed the values on Novell's recommendation for the old 8-node File & Print cluster running NetWare 6.0 SP5, but I will try dropping it to 16 and test it.
Each server has two dual-port NICs:
- HP NC360T PCIe DP Gigabit Server Adapter
- HP NC371i Multifunction Gigabit Server Adapter

LAN drivers and versions:
- N1000E.LAN, loaded from [C:\NWSERVER\DRIVERS\] on Aug 1, 2008 3:47:35 pm (Address Space = OS): HP NC-Series Intel N1E Ethernet driver, Version 10.38, February 4, 2007. Copyright 1998, 2007 Hewlett-Packard Development Company, L.P.
- BX2.LAN, loaded from [C:\NWSERVER\DRIVERS\] on Aug 1, 2008 3:47:35 pm (Address Space = OS): Broadcom NetXtreme II Gigabit Ethernet Driver, Version 3.41, April 30, 2007. Copyright (c) 2002 Broadcom Corporation. All rights reserved.
Yes, the resources go into a comatose state, and the node hosting them takes the poison pill and reports an abend. However, the comatose states occur randomly, with no apparent pattern.
Sorry for the mistake in my earlier post.
I checked the load and unload scripts, and there appear to be no problems with the load or unload times, the IP addresses, or protected memory.
Thanks.
-- crivera
crivera's Profile: http://forums.novell.com/member.php?userid=16009
View this thread: http://forums.novell.com/showthread.php?t=338836
Tim Heywood NSC | NewsGroup User
Re: Randomly problem in 12 nodes cluster. (8/11/2008 4:47:15 PM)
On Fri, 08 Aug 2008 18:56:02 +0000, crivera wrote:
> Thanks, Andrew, for your help.
>
> We are making changes one at a time and monitoring the cluster.
>
> The first change was to eliminate NIC balancing, and we are monitoring the cluster.
>
> If the problem recurs, we will modify the timeouts.
>
> Each node of the cluster has 2 dual-port NICs:
> .- N1000E
> .- BX2
>
> One port of the N1000E NIC is connected to one Cisco, and one port of the other NIC (BX2) is connected to the other Cisco. In other words, all 12 nodes are connected to both Ciscos.
>
> The problem happens on different nodes and with different resources.
>
> I agree with you that the problem may be in the LAN, but I can't prove it yet. Also, I have 4 more clusters that work without problems; only this 12-node cluster has this problem.
The BX2.LAN driver is a known problem. Version 3.41 isn't the worst (do not use 3.70), but Broadcom has released a new version, 4.41, with some help from Novell. You can get the new driver from: http://www.broadcom.com/support/ethernet_nic/netxtremeii.php
One of the problems with the BX2 is that it does not reset properly (or the driver does not reset the card), so the larger the cluster, the more likely the issue.
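As a rough illustration of why cluster size matters, here is a minimal Python sketch. It assumes (hypothetically) that each node independently has the same small probability p of hitting the NIC reset problem over some period; the chance that at least one of n nodes is affected is then 1 - (1 - p)^n, which grows with n. The function name and the value p = 0.02 are made up for illustration and are not measured from this cluster.

```python
def prob_any_node_affected(p: float, nodes: int) -> float:
    """Chance that at least one of `nodes` servers hits the fault,
    assuming each node fails independently with probability p (hypothetical model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Complement of "no node is affected": 1 - (1 - p)^nodes
    return 1.0 - (1.0 - p) ** nodes


# Hypothetical per-node probability, for illustration only.
p = 0.02
for n in (4, 8, 12):
    print(f"{n:2d} nodes: {prob_any_node_affected(p, n):.3f}")
```

Under this simple model, the same per-node fault shows up noticeably more often on a 12-node cluster than on a 4-node one, which is consistent with only the largest cluster showing the problem.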
-- Tim
Tim Heywood (SYSOP)
NDS8
Scotland (God's Country)
www.nds8.co.uk
In theory, practice and theory are the same. In practice, they are different.