Bright Cluster Manager


Introduction

While it is called a cluster manager, this is essentially a fairly generic Linux configuration management system with a cluster tilt in only a few areas. It allows "bare metal" reimaging of nodes from an image (typically stored on the headnode). There can be multiple images, one for each type of node. This is commercial software developed by Bright Computing; the development office is in Amsterdam, NL, with ING Bank as a backing shareholder.

It has two typical problems inherent in such systems:

  1. It creates an additional, fairly complex layer that obstructs viewing and understanding the lower layers. This especially matters for troubleshooting, which suffers badly when you need to debug issues that touch CM functionality. For example, after CM is installed on the headnode you can't easily change the hostname. Also, the default design, in which nodes use a dedicated private network, is suboptimal in cases where nodes need to connect to an external environment during computations (unless you have an extra interface, which is often not the case for blades; you can probably use virtual interfaces, though).
  2. It introduces a custom command language which, if you use it only episodically, is a pain in the neck. Because the language is not used often, you need a cheat sheet even for the most typical commands. The commands are not intuitive and the syntax is sometimes pretty weird. For anything more complex than typical operations you depend on CM support, which, actually, is pretty good.

With Red Hat introducing the Red Hat for HPC compute node license, the model Bright Cluster Manager relies upon breaks if the headnode uses a regular RHEL license: you can't patch the node image on the headnode, as it is a different flavor of the OS. So you need to switch the headnode to Red Hat for HPC, which is a pain in the neck.

Like many complex Unix management systems, it modifies many system files in ways you do not understand, which makes integration of new software more complex. Look, for example, at the definition of the parallel environment in SGE: it contains references to CM scripts. The SGE environment is loaded via environment modules, which are also kept in CM directories.

You can learn some useful stuff from those modifications, but they create unique troubleshooting problems. Sometimes the problem is fictitious and stems from a misunderstanding of how CM works. And whether the game is worth the candle is an open question. The ability to seamlessly restore a computational node from an image can be implemented in several other ways. Beyond that, Bright Cluster Manager does not offer anything special.

Interesting solutions found in CM

There are several interesting parts of CM. Among them:

Working with images is the most interesting part of Bright Cluster Manager. Few systems implement it as consistently as CM does. Here the designers demonstrated some original thinking (for example, using the boot record as the indicator of how a node should behave). You can create an image from a node and distribute it to other nodes; CM takes care of all the customization needed. If a node is configured for network boot (or if its boot record is absent), CM automatically reimages the node, or synchronizes it with the image if the image already exists. Otherwise you get a regular boot. That means that inserting/removing the boot record changes the behavior of a group of servers in a very useful way.

Managing images is done using chroot and is not very convenient, but since there is the possibility of creating an image from a node, you can do everything on a selected node instead, then create an image from this node and distribute it to other nodes.
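For illustration, here is a minimal cmsh sketch of this workflow; the clone and grabimage commands follow Bright 7.x conventions, and the image and node names are placeholders, so verify them against your version first:

[root@bright71 ~]# cmsh
[bright71]% softwareimage
[bright71->softwareimage]% clone default-image custom-image
[bright71->softwareimage*[custom-image*]]% commit
[bright71->softwareimage[custom-image]]% device
[bright71->device]% use node001
[bright71->device[node001]]% grabimage -w

Here grabimage -w pulls the node's current filesystem back into its assigned image, so the changes made on the live node survive the next reimage.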

Using Bright reduces the labor and effort needed for management and change control, and it can also be used with external clouds (sets of virtual machines). Bright offers an expandable and scalable turnkey solution for allocating resources.

It installs a lot of useful software, such as pdsh and environment modules. The latter come with integrated examples of packages which can serve as a framework for developing your own set of environment modules. Generally the supplied environment modules are of high quality.
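For example, a typical user session might look like the following; the module command comes from the environment modules package CM installs, but the module names (shared, sge) are assumptions that vary per site:

[maureen@bright71 ~]$ module avail
[maureen@bright71 ~]$ module load shared sge
[maureen@bright71 ~]$ module list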

Boot process

By default, nodes boot from the network when using Bright Cluster Manager. This is called a network boot, or sometimes a PXE boot. The head node runs a tftpd server from within xinetd, which supplies the boot loader for the default software image or for the image assigned to the node.
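If PXE boot misbehaves, it is worth confirming on the head node that xinetd actually serves tftp. A sketch; the file locations follow stock RHEL 6 conventions and may differ on a Bright head node:

[root@bright71 ~]# chkconfig --list tftp                              # the xinetd-based tftp service should show "on"
[root@bright71 ~]# grep -E 'disable|server_args' /etc/xinetd.d/tftp   # expect "disable = no" plus the boot directory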

Bright Cluster Manager for HPC lets you administer a set of servers (assumed to be a cluster, but not necessarily so) as a single entity, provisioning the hardware, operating system, and workload manager from a unified interface.

The Bright cluster management daemon (CMDaemon) keeps an eye on virtually every aspect of every node and reports any problems it detects in the software or the hardware, so that you can take action and keep your cluster healthy.

Power management

Power management in Bright Cluster Manager creates some opportunities for power savings, which is extremely important in large clusters. You can, for example, shut down inactive nodes and bring them back when there are jobs in the queue waiting for resources.
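For manual control, a minimal cmsh sketch; power operations in device mode are standard Bright functionality, but node001 is a placeholder and the exact option syntax (such as -n for node lists) varies between versions:

[root@bright71 ~]# cmsh
[bright71]% device
[bright71->device]% power status
[bright71->device]% power off -n node001
[bright71->device]% power on -n node001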

User management

As clusters are often used by a large number of researchers, user management presents some problems. CM allows you (via the usernodelogin setting of cmsh) to restrict direct user logins outside of the HPC scheduler, and is thus one way of preventing users from consuming node resources in an unaccountable manner. The usernodelogin setting is applicable to node categories only, not to individual nodes.

# cmsh
[bright71]% category use default
[bright71->category[default]]% set usernodelogin onlywhenjob
[bright71->category*[default*]]% commit

The possible values for usernodelogin are: always (direct logins allowed, the default), onlywhenjob (direct logins allowed only while the user has a job running on that node), and never (direct logins disabled).

Bright Cluster Manager runs its own LDAP service to manage users, rather than using Unix user and group files. That means that users and groups are managed via the centralizing LDAP database server running on the head node (accessible via cmgui), and not via entries in /etc/passwd or /etc/group.
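A quick way to verify where an account is coming from (standard NSS commands; maureen is the example user from the session below):

[root@bright71 ~]# getent passwd maureen      # resolves via NSS, which includes the LDAP backend
[root@bright71 ~]# grep maureen /etc/passwd   # returns nothing for an LDAP-managed user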

You can use cmsh too. For example:
[root@bright71 ~]# cmsh
[bright71]% user
[bright71->user]%
[bright71->user]% add user maureen
[bright71->user*[maureen*]]%
[bright71->user*[maureen*]]% commit
[bright71->user[maureen]]% show
You can set user and group properties via  the set command. Typing set and then either using tab to see the possible completions, or following it up with the enter key, suggests several parameters that can be set, one of which is password:
Example
[bright71->user[maureen]]% set
Name:
set - Set specific user or group property
Usage:

set <parameter>
set user <name> <parameter>
set group <name> <parameter>

You can edit groups with the append and removefrom commands. They are used to add extra users to, and remove extra users from, a group. For example, it may be useful to have a compiler group so that several users can share access to the Intel compiler.
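A sketch of such a session; the object and property names (group, members) are assumptions inferred from the set help shown above, so check them with show first:

[root@bright71 ~]# cmsh
[bright71]% user
[bright71->user]% add group compiler
[bright71->user*[compiler*]]% append members maureen
[bright71->user*[compiler*]]% commit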

Dell BIOS Management

Dell BIOS management in Bright Cluster Manager means that for nodes that run on Dell hardware, the BIOS settings and BIOS firmware updates can be managed via the standard Bright front end utilities to CMDaemon, cmgui and cmsh.

In turn, CMDaemon configures the BIOS settings and applies firmware updates to each node via a standard Dell utility called racadm. The racadm utility is part of the Dell OpenManage software stack. The Dell hardware supported includes the R430, R630, R730, R730XD, R930, FC430, FC630, FC830, M630, M830, and C6320. The racadm utility must be present on the Bright Cluster Manager head node; it is installed there if Dell is selected as the node hardware manufacturer during Bright Cluster Manager installation. IPMI must be working on all of the servers, meaning that it should be possible to communicate out-of-band from the head node to all of the compute nodes via their IPMI IP addresses.
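Since out-of-band IPMI access is a prerequisite, a simple connectivity check from the head node is a good first step (ipmitool is a standard utility; the IP address, user, and password below are placeholders):

[root@bright71 ~]# ipmitool -I lanplus -H 10.148.255.254 -U root -P <password> chassis status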

Documentation is junk

That's typical for complex software packages, but it is still pretty annoying. There is no clear description of cmsh with syntax diagrams, examples of the most useful commands, and the like. All you are left with is the command-line help.

Important nuances are not mentioned. Generally this documentation is useful in only one case: if you never read it and rely on CM support instead. If they point you to the documentation, just ignore it.

Road to hell is paved with good intentions

CM changes the behavior of some components, for example SGE, in a way that complicates troubleshooting. In one case it enforced a wrong number of cores on the servers, and if you correct it in the SGE all.q, after a while it reverts to the incorrect number.

If the initial configuration is incorrect, you are in trouble in more than one way. For example, with SGE I noticed a very interesting bug: if your server has 24 cores and all.q was initially misconfigured with the number of slots equal to 12, you are in trouble. You change it via the qconf command in SGE and think that you are done. Wrong. After a while it reverts to the incorrect number. At this moment you want to kill the CM designers, because they are clearly amateurs.
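For reference, the SGE-side change that keeps getting reverted is a standard qconf edit like the sketch below; the lasting fix has to be made on the CM side, because CMDaemon reasserts its own value:

[root@bright71 ~]# qconf -mattr queue slots 24 all.q              # set slots queue-wide
[root@bright71 ~]# qconf -aattr queue slots '[node001=24]' all.q  # or override for a single host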

Another case I already mentioned: if a node does not have a boot record, it can be reimaged from the image, and if there are differences between the current state of the node and the image, all those differences are lost. In the ideal case there should be none, but life is far from ideal.

Supplement 1: Vendor information about the package

NOTE: this is the kind of Microsoft-style advertising of the product. They present a nice GUI, but forget to mention that the GUI is not everything and you can't manage the whole cluster from it.

== quote ==

The sophisticated node provisioning and image management system in Bright Cluster Manager® allows you to do the following:

Bright Computing engineers will be on hand to demonstrate all the 7.1 updates that enable customers to deploy, manage, use, and maintain complete HPC clusters over bare metal or in the cloud even more effectively. Leading the list of enhancements is fully integrated support for Intel® Enterprise Edition for Lustre (IEEL), integrated Dell BIOS operations, and open source Puppet. Improved integration with several workload managers and a refactored Web portal round out the exciting enhancements.

Those who need to deploy, use and maintain a POSIX-compliant parallel file system will find the integrated IEEL support lets them do so efficiently and with the well-known Bright Cluster Manager interface. Fully integrated support for Puppet ensures the right services are up and running on the right platforms, through enforced configurations. With integrated support for Dell BIOS firmware and configuration settings, users can deploy and maintain supported Dell servers from the BIOS level, using Bright's familiar interface.

Broader and deeper support for Slurm, Sun Grid Engine, and Univa Grid Engine ensures that Bright Cluster Manager for HPC fully integrates the capability to optimally manage HPC workloads. Users can quickly and easily monitor their HPC workloads through the updated web interface provided by Bright's user portal. Version 7.1 also incorporates refactored internals for improved performance, as well as finer-grained management control that includes customized kernels.

"We are excited to share the latest updates and enhancements we've made to Bright Cluster Manager for HPC. Collectively, they further reduce the complexity of on-premise HPC and help our customers extend their on-premise HPC environment into the cloud," said Matthijs van Leeuwen, Bright Computing Founder and CEO. "The latest version allows our customers to manage their HPC environment alongside their platforms for Big Data Analytics, based on Apache Hadoop and Apache Spark, from a single management interface."

For more information, visit http://www.brightcomputing.com/Solutions-HPC



Old News ;-)

OpenStack Neutron Mellanox ML2 Driver Configuration in Bright

Download the latest Mellanox OFED package for CentOS/RHEL 6.5

http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

The package name looks like this: MLNX_OFED_LINUX-<version>-rhel6.5-x86_64 (the package can be downloaded either as an ISO or as a tarball).

The OFED package needs to be copied (one way or another) to all the compute hosts which require a firmware upgrade. (Note: we will describe the actual installation of the OFED package into the software images only at a later stage of this article. Right now we only want the file on the live nodes.)

An efficient way to upgrade the firmware on multiple hosts is to extract (in the case of a tar.gz file) or copy (in the case of an ISO) the OFED package directory to a shared location such as /cm/shared (which is mounted on compute nodes by default).
Then we can use the pdsh tool in combination with category names to parallelize the upgrade.

In our example we extract the OFED package to /cm/shared/ofed.
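A sketch of that extraction step; the tarball name matches the version used later in this article, so adjust the extension and version to your actual download:

[root@headnode ~]# mkdir -p /cm/shared/ofed
[root@headnode ~]# tar xzf MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64.tgz -C /cm/shared/ofed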

Before we begin the upgrade we need to remove the cm-config-intelcompliance-slave package to avoid conflicts:

[root@headnode ~]# pdsh -g category=openstack-compute-hosts-mellanox "yum remove -y cm-config-intelcompliance-slave"

(For now we only remove it from the live nodes. We will remove it from the software image later in the article. Do not forget to also run this command on the headnode.)

In some cases the package qlgc-ofed.x86_64 may also need to be removed; if it is present, the mlnxofedinstall run will not proceed. The installer log can always be viewed in /tmp/MLNX_OFED_LINUX-<version>.<num>.logs/ofed_uninstall.log to determine which package is conflicting, so it can be removed manually.

And then run the firmware upgrade:

[root@headnode ~]# pdsh -g category=openstack-compute-hosts-mellanox "cd /cm/shared/ofed/MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64/ && echo \"y\" | ./mlnxofedinstall --enable-sriov" | tee -a /tmp/mlnx-firmware-upgrade.log

(Do not forget to execute these two steps on the network node and the headnode)

Note that we are writing the output both to the screen and to a temporary file (/tmp/mlnx-firmware-upgrade.log). This helps in spotting any errors that might occur during the upgrade.

Running the mlnxofedinstall --enable-sriov utility does two things: it installs the OFED stack on the node's live filesystem, and it updates the adapter firmware with SR-IOV enabled.

Notice that in the case of the compute nodes (node001-node003), at this point we are mostly interested in the latter (the firmware update and enabling SR-IOV). Since we ran this command on the live nodes, the filesystem changes have not been propagated to the software image used by the nodes (i.e., at this point they would be lost on reboot). We will take care of that later in this article by installing the OFED into the software image as well.

In the case of the headnode, however, running this command also effectively installs OFED and updates the firmware, which is exactly what we want.
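For completeness, the image-side installation promised above boils down to running the same installer inside the software image via chroot. A sketch; the image path /cm/images/default-image follows Bright conventions, and --without-fw-update skips flashing firmware from inside the chroot:

[root@headnode ~]# cp -a /cm/shared/ofed/MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64 /cm/images/default-image/tmp/
[root@headnode ~]# chroot /cm/images/default-image /tmp/MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64/mlnxofedinstall --without-fw-update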

Bright Cluster Manager 7 for HPC - New

Bright Cluster Manager 7

Bright Cluster Manager for HPC lets customers deploy complete HPC clusters on bare metal and manage them effectively. It provides single-pane-of-glass management for the hardware, operating system, HPC software, and users. With Bright Cluster Manager for HPC, system administrators can get their clusters up and running quickly and keep them running reliably throughout their life cycle – all with the ease and elegance of a fully featured, enterprise-grade cluster manager.

With the latest release, we've added some great new features that make Bright Cluster Manager for HPC even more powerful.

New Feature Highlights

Image Revision Control – We've added revision control capability which means you can track changes to software images using standardized methods.

Integrated Cisco UCS Support – With the new integrated support for Cisco UCS rack servers, you can rapidly introduce flexible, multiplexed servers into your HPC environment.

Native AWS Storage Service Support – Bright Cluster Manager 7 now supports native AWS storage which means that you can use inexpensive, secure, durable, flexible and simple storage services for data use, archiving and backup in the AWS cloud.

Intelligent Dynamic Cloud Provisioning – By only instantiating compute resources in AWS when they're actually needed – such as after the data to be processed has been uploaded, or when on-site workloads reach a certain threshold – Bright Cluster Manager 7 can save you money.
