Difference: BeowulfCluster ( vs. 1)

Revision 1 - 2015-11-10 - TWikiGuest

Server Documentation:

Switches have been stacked, but the management interface still treats them as two separate switches with the same IP
NOTE (9/5/2010): Turning on the 2nd switch stops the blades from communicating, and the NICs on the machines flash green very rapidly

Labelled the cables in the following way:

ControllerNode: Main server with fiber connection is node "00"

Remaining servers each have 2 ethernet data ports, labelled NODE/1 and NODE/2, where NODE is the server number, 01 to 10
Top Port (eth1) is hotpluggable to an internet connection (X.X.X.142 on our local lab network)
Bottom Port (eth0) is set up as IP 10.10.10.1NODE (for example, node 04 is 10.10.10.104), so IPs 100-110 are reserved for the Nodes
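As a rough illustration of the addressing scheme above, this Python sketch maps a node number to its eth0 address. The function name is hypothetical, and treating the controller (node 00) as 10.10.10.100 is inferred from the reserved range 100-110 rather than stated outright.

```python
def node_ip(node):
    """Return the eth0 address for a node, following the 10.10.10.1NODE pattern."""
    if not 0 <= node <= 10:
        raise ValueError("node must be between 0 and 10")
    # "1NODE" is the digit 1 followed by the two-digit node number.
    return f"10.10.10.1{node:02d}"


if __name__ == "__main__":
    for n in range(0, 11):
        print(f"node {n:02d} -> {node_ip(n)}")
```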

Both disks appear to the operating system as a single disk: /dev/cciss/c0d0

SlaveNode Configuration:


Drives are configured as a logical volume via the HP Smart Array P400 RAID controller (Manual), using RAID 0 as the "fault tolerance" setting (note that RAID 0 itself provides no redundancy)
Not sure what the other configuration settings are--ask Sirhan

Partition table of the logical drive. The drives must be set up through the RAID controller first (press F8 to get into it and set up the logical drives)

Settings of logical drives: Bays 1 and 2, or Bays 3 and 4 (paired)
Clonezilla will refer to the drives
Partition  Start cyl  End cyl  Type
p1         1          2432     primary LINUX
p2         2433       7295     primary LINUX
p3         7296       10942    primary LINUX
p4         10943      end      EXTENDED
p5         10943      17021    EXTENDED
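The cylinder ranges in the table can be turned into approximate sizes. The sketch below assumes the table uses the 8225280-byte cylinder (16065 * 512) given in the geometry notes that accompany it; the names are illustrative only, and real partitions lose a little space to alignment.

```python
BYTES_PER_CYLINDER = 16065 * 512  # 8225280 bytes, assumed geometry for the table

# (name, first cylinder, last cylinder) as listed in the partition table
PARTITIONS = [
    ("p1", 1, 2432),
    ("p2", 2433, 7295),
    ("p3", 7296, 10942),
    ("p5", 10943, 17021),
]


def partition_size_bytes(first, last):
    """Size of a partition spanning cylinders first..last inclusive."""
    return (last - first + 1) * BYTES_PER_CYLINDER


if __name__ == "__main__":
    for name, first, last in PARTITIONS:
        size = partition_size_bytes(first, last)
        print(f"{name}: {last - first + 1} cylinders, about {size / 1e9:.1f} GB")
```

Under that assumption the four sized partitions come to roughly 20, 40, 30 and 50 GB.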

63 sectors
121594 cylinders
16065 * 512 = 8225280 bytes
-----
[original defaults read:
32 sectors
239389 cylinders

8160 * 512 = 4177920 bytes]
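A quick arithmetic check of the two geometries, as a sketch. It assumes 512-byte sectors, reads the current cylinder count as 121594, and takes 16065 and 8160 as sectors per cylinder (both equal 255 times the sectors-per-track figure). Function names are hypothetical.

```python
SECTOR_BYTES = 512


def cylinder_bytes(sectors_per_cylinder):
    """Bytes in one cylinder for a given sectors-per-cylinder count."""
    return sectors_per_cylinder * SECTOR_BYTES


def disk_bytes(cylinders, sectors_per_cylinder):
    """Total capacity implied by a cylinder count and geometry."""
    return cylinders * cylinder_bytes(sectors_per_cylinder)


# Current geometry: 63 sectors per track, 16065 sectors per cylinder
current = disk_bytes(121594, 16065)
# Original defaults: 32 sectors per track, 8160 sectors per cylinder
original = disk_bytes(239389, 8160)

if __name__ == "__main__":
    print(f"current:  {current} bytes")
    print(f"original: {original} bytes")
    print(f"difference: {abs(current - original)} bytes")
```

Both geometries describe the same logical drive of about 1 TB; the totals differ by less than one cylinder because each geometry rounds down to a whole number of cylinders.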

ISSUES:
RAID controller doesn't like swapping of drives for Logical Volumes

NOTE ABOUT LOGICAL DRIVE MANAGEMENT:
If there are 2 logical volumes (4 drives) and the drives in bays 3 and 4 are removed and replaced, the HP Smart Array controller will not detect ANY logical volumes unless the old logical volume is properly deleted before the new disks are inserted

FAILED ATTEMPT TO CLONE:
/opt/drbl/sbin/ocs-onthefly -g auto -e1 auto -e2 -j2 -v -f sda -t sdb

dd if=/dev/sda of=/tmp/ocx

FRESH INSTALLATION ON A NODE:

1. SET UP THE RAID DISK ARRAY
- At the HP Smart Array P400 Controller Initialization screen, press F8 (it passes quickly so be ready)
2. Install the TESTING snapshot of Debian. I used the business card CD, installed the base system, and selected only the System Utilities and SSH server
3. Configure /etc/hosts, /etc/hostname and /etc/network/interfaces appropriately (fill this in)
4. Install the following packages: openmpi-common openmpi-bin libopenmpi1.3 libopenmpi-dev emacs23-nox and openssh-server
5. I think something needs to be done with SWAP files
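Step 3 is still to be filled in. As a starting point only, here is a sketch that renders /etc/hosts entries and a static eth0 stanza from the 10.10.10.1NODE scheme. The hostnames (nodeNN), the 255.255.255.0 netmask and the function names are all assumptions, not settings taken from the cluster.

```python
def node_ip(node):
    """eth0 address following the 10.10.10.1NODE pattern."""
    return f"10.10.10.1{node:02d}"


def hosts_file(nodes=range(0, 11)):
    """Render /etc/hosts content; 'nodeNN' hostnames are hypothetical."""
    lines = ["127.0.0.1\tlocalhost"]
    for n in nodes:
        lines.append(f"{node_ip(n)}\tnode{n:02d}")
    return "\n".join(lines) + "\n"


def interfaces_stanza(node, netmask="255.255.255.0"):
    """Render a static eth0 stanza for /etc/network/interfaces.

    The netmask default is an assumption.
    """
    return (
        "auto eth0\n"
        "iface eth0 inet static\n"
        f"    address {node_ip(node)}\n"
        f"    netmask {netmask}\n"
    )


if __name__ == "__main__":
    print(hosts_file())
    print(interfaces_stanza(4))
```

The output is meant to be reviewed and pasted by hand; eth1 is left out because it is hotplugged to the lab network.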


Currently Nodes 01-03 and 05-10 are set up with ext4 + LVM + NFS, but this seems to fail with the "scattertime" test (NFS hangs)

Node 04 is configured with ext3 + NFS and no LVM

We noticed that ext4 + NFS + LVM was causing problems (the system was hanging), so we have gone back to the original partition table (above) with ext3 partitions.

9/7/2010

We also had an issue with the speed of the ethernet cards. Rather than running at gigabit speed (1000BaseT), they were running at 10BaseT half duplex. This was determined by examining the /var/log/messages file or typing "dmesg" immediately after unplugging the cable and plugging it back in. A simple reboot solved the problem, but we are unsure what set the cards to that mode in the first place.
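As an alternative to reading dmesg, the negotiated speed can also be checked from a script. This sketch reads the Linux sysfs attributes speed and duplex under /sys/class/net/<interface>; it assumes the kernel and NIC driver on the nodes expose them, and the function names are hypothetical.

```python
import os


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return (speed_mbps, duplex) for an interface, or (None, None) if unreadable."""
    base = os.path.join(sysfs_root, iface)
    try:
        with open(os.path.join(base, "speed")) as f:
            speed = int(f.read().strip())
        with open(os.path.join(base, "duplex")) as f:
            duplex = f.read().strip()
    except (OSError, ValueError):
        # Missing attribute, link down, or unexpected content.
        return None, None
    return speed, duplex


def is_degraded(speed, duplex, expected_mbps=1000):
    """True if the link is unknown, slower than expected, or not full duplex."""
    return speed is None or speed < expected_mbps or duplex != "full"


if __name__ == "__main__":
    speed, duplex = link_state("eth0")
    status = "DEGRADED" if is_degraded(speed, duplex) else "ok"
    print(f"eth0: speed={speed} duplex={duplex} -> {status}")
```

Run on each node after boot, it would flag an interface that came up at 10 Mb/s half duplex instead of gigabit.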

-- StephenFox - 2010-08-23

 