Bright Cluster Manager
While it is called a cluster manager, this is essentially a fairly generic Linux configuration management
system with a cluster tilt in only a few areas. It allows "bare metal" reimaging of nodes from
an image (which is typically stored on the headnode). There can be multiple images, one for each type
of node. This is commercial software developed by Bright Computing. The development office is in Amsterdam,
NL, with ING Bank as a shareholder.
It has two typical problems inherent in such systems:
- It creates an additional, pretty complex layer that obstructs viewing and understanding the lower
layers. This is especially important for troubleshooting, which is badly affected if you
need to debug issues that touch CM functionality. For example, after CM is installed on the
headnode you can't change the hostname easily. Also, the default solution, in which nodes use a specific
private network, is suboptimal in cases where you need to connect nodes to an external environment during
computations (unless you have an extra interface, which is often not the case for blades; you probably can use virtual interfaces instead).
- It introduces a custom command language which, if you use it only episodically, is a pain in the
neck. As the language is not used often, you need a cheat sheet even for the most typical commands.
They are not intuitive and the syntax is sometimes pretty weird. For anything more complex than typical
operations you depend on CM support, which, actually, is pretty good.
With Red Hat introducing the Red Hat for HPC compute node license, the model Bright Cluster Manager relies
upon is broken if the headnode uses a regular RHEL license: you can't patch the image on the headnode, as
this is a different flavor of the OS. So you need to switch to Red Hat for HPC for the headnode, which
is a pain in the neck.
Like many complex Unix management systems, it modifies many system files in ways you do not understand,
and that makes integration of new software more complex. Look, for example, at the definition of the parallel
environment in SGE: it contains references to some CM scripts. The SGE environment is loaded via
environment modules, which are also in CM directories.
You can learn some useful stuff from those modifications, but they create unique troubleshooting
problems. Sometimes the problem is fictitious and connected with a misunderstanding of how CM works.
And whether the game is worth the candle is an open question. The ability to seamlessly restore a compute
node from an image can be implemented in several other ways. Beyond that, Cluster Manager does not represent
much additional value.
There are several interesting parts of CM. Among them:
- Boot process
- Working with images
- Power management capabilities
- Working with Dell Bios
- Environment modules integration
The working with images part is the most interesting part of Bright Cluster Manager. Few systems
implement it as consistently as CM. Here the designers demonstrated some original thinking (for example,
the role of the "boot record" as an indicator of how the node should behave). You can create an image from
a node and distribute it to other nodes. CM takes care of all the customization needed. If a node is configured
for network boot (or if the boot record is absent), CM automatically reimages the node, or synchronizes it
with the image if the image already exists. Otherwise you get a regular boot. That means that inserting/removing
the boot record changes the behaviour of a group of servers in a very useful way.
Managing images is done using chroot and is not very convenient, but as there is a possibility of
creating an image from a node, you can do everything on a selected node instead, then create an image from
this node and distribute it to other nodes.
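As a sketch, both approaches might look like this (the image path /cm/images/default-image and the cmsh grabimage command follow common Bright defaults and may differ on your installation; the package name is a placeholder):

```
# Edit the image directly via chroot (inconvenient for anything complex)
chroot /cm/images/default-image
yum install -y <package>
exit

# Or do the work on a live node and pull the image back from it
cmsh
[bright71]% device use node001
[bright71->device[node001]]% grabimage -w
```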
Using Bright reduces the labor and effort needed for management and change control, and it can also
be used with external clouds (sets of virtual machines). Bright offers an expandable and scalable turnkey
solution for allocating resources.
It installs a lot of useful software, such as pdsh and environment modules. The latter is installed
with integrated examples of packages, which can serve as a framework for developing your own set of environment
modules. Generally the environment modules supplied are of high quality.
By default, nodes boot from the network when using Bright Cluster Manager. This is called
a network boot, or sometimes a PXE boot. The headnode runs a tftpd server from within xinetd and supplies
the boot loader for the default software image, or for the image assigned to the node.
Bright Cluster Manager for HPC lets you administer a set of servers (assumed to be clusters, but not
necessarily so) as a single entity, provisioning the hardware, operating system, and workload manager
from a unified interface.
The Bright cluster management daemon keeps an eye on virtually every aspect of every node and reports
any problems it detects in the software or the hardware, so that you can take action and keep your cluster
healthy.
Aspects of power management in Bright Cluster Manager include:
- managing the main power supply to nodes through the use of power distribution units, baseboard
management controllers, or CMDaemon (mainly with Dell)
- monitoring power consumption over time
- setting CPU scaling governors for power-saving
- setting power-saving options in workload managers
- ensuring the passive head node can safely take over from the active head during failover
- allowing cluster burn tests to be carried out
That creates some opportunities for power savings, which is extremely important in large clusters. You
can, for example, shut down inactive nodes and bring them back if there are jobs in the queue waiting
for resources.
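As a sketch, powering an idle node off and back on from cmsh might look like this (the node name is illustrative, and the exact power subcommand syntax may differ between CM versions):

```
cmsh
[bright71]% device
[bright71->device]% power off -n node001
[bright71->device]% power status -n node001
[bright71->device]% power on -n node001
```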
As clusters are often used by a large number of researchers, user management presents some problems.
CM allows you (via the usernodelogin setting of cmsh) to
restrict direct user logins from outside the HPC scheduler,
and is thus one way of preventing users from using node resources in an unaccountable manner. The
setting is applicable to node categories only, not to individual nodes. For example:
[bright71]% category use default
[bright71->category[default]]% set usernodelogin onlywhenjob
The attributes for usernodelogin are:
- always (the default): This allows all users to ssh directly
into a node at any time.
- never: This allows no user other than root to directly
ssh into the node.
- onlywhenjob: This allows the user to ssh directly into
the node when a job is running on it.
Bright Cluster Manager runs its own LDAP service to manage users, rather than using Unix user and
group files. That means that users and groups are managed via the centralizing LDAP database server
running on the head node (accessible via cmgui), and not via entries in the /etc/passwd or /etc/group files.
You can use cmsh too, for example:
[root@bright71 ~]# cmsh
[bright71->user]% add user maureen
You can set user and group properties via the set command. Typing set and then either using tab
to see the possible completions, or following it up with the enter key, suggests several parameters
that can be set, one of which is password:
set - Set specific user or group property
set user <name> <parameter>
set group <name> <parameter>
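A sketch of setting the password for the user added above and committing the change (in cmsh, changes take effect only after commit; the prompts shown are illustrative):

```
[bright71->user]% use maureen
[bright71->user[maureen]]% set password
enter new password:
retype new password:
[bright71->user*[maureen*]]% commit
```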
You can edit groups with the append and remove commands. They are used to add extra users
to, and remove extra users from, a group. For example, it may be useful to have a compiler group so that
several users can share access to the Intel compiler.
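Such group editing can be sketched in cmsh as follows (the group name intel and the members property name are assumptions; use tab completion to check the exact property on your version):

```
[bright71->user]% group
[bright71->group]% add intel
[bright71->group*[intel*]]% append members maureen
[bright71->group*[intel*]]% commit
```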
Dell BIOS management in Bright Cluster Manager means that, for nodes that run on Dell hardware, the
BIOS settings and BIOS firmware updates can be managed via the standard Bright front-end utilities to
CMDaemon, cmgui and cmsh.
In turn, CMDaemon configures the BIOS settings and applies firmware updates to each node via a standard
Dell utility called racadm. The racadm utility is part of the Dell OpenManage software stack. The Dell
hardware supported includes R430, R630, R730, R730XD, R930, FC430, FC630, FC830, M630, M830 and C6320.
The utility racadm must be present on the Bright Cluster Manager head node. The utility is installed
on the head node if Dell is selected as the node hardware manufacturer during Bright Cluster Manager
installation. IPMI must be working on all of the servers. This means that it should be possible to communicate
out-of-band from the head node to all of the compute nodes, via the IPMI IP address.
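A quick way to verify that out-of-band communication works is a plain ipmitool query from the headnode (the IPMI address and credentials below are placeholders, not values from this setup):

```
# Check that a node's BMC answers out-of-band from the headnode
ipmitool -I lanplus -H 10.148.255.1 -U ADMIN -P <password> chassis power status
```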
That's typical for complex software packages, but it is still pretty annoying. There is
no clear description of cmsh with syntax diagrams, examples of the most useful commands, and such. All you
are left with is the command-line help.
Important nuances are not mentioned. Generally this documentation is useful only in one
case: if you never read it and rely on CM support. If they point you to the documentation, that
does not help much.
CM changes the behavior of some components, for example SGE, in a way that complicates
troubleshooting. If the initial configuration is incorrect, you are in trouble in more than one way. For
example, with SGE I noticed a very interesting bug: if your server has 24 cores and all.q was mistakenly
initially configured with the number of slots equal to 12, you are in trouble. You change it via the
qconf command in SGE and think that you are done. Wrong. After a while it reverts to the
incorrect number, because CM enforces its own (wrong) value. At this moment you want to kill the CM
designers, because they are clearly amateurs.
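The SGE-side fix (which, as described above, CM will later silently override) might look like this; the queue name and slot count follow the example in the text:

```
# Set the slot count on all.q (standard SGE qconf syntax)
qconf -rattr queue slots 24 all.q
# Verify the change:
qconf -sq all.q | grep slots
```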
Another case I already mentioned: if the node does not have a boot record, it can be reimaged from
the image, and if there are differences between the current state of the node and the image, all such
differences are lost. In the ideal case there should not be any, but life is far from ideal.
NOTE: this is a kind of Microsoft-style advertising of the product. They present a nice GUI, but
forget to mention that the GUI is not everything and you can't manage the whole cluster from it.
== quote ==
The sophisticated node provisioning and image management system in Bright Cluster Manager® allows
you to do the following:
- Provision individual nodes or complete clusters from bare metal within minutes. This applies to big data clusters
and OpenStack private clouds in addition to HPC clusters.
- Create, manage and use as many node images as required.
- Create, manage and use images that are very different (for example, based on different Linux
kernels or distributions of Linux, Apache Hadoop and OpenStack).
- Create or change images substantially without breaking compatibility with application software.
- Assign images to individual nodes or groups of nodes with a single command or mouse click.
- Make changes to node images on the head node, without having to login to regular nodes.
- Synchronize a regular node image on the head node from a hard disk on a regular node.
- Apply RPM package commands to node images, manually or automatically.
- Update images incrementally, only transferring changes to the nodes.
- Update images live, without having to reboot nodes.
- Configure how disks should be partitioned (optionally using software RAID).
- Protect disks or disk partitions from being overwritten.
- Provision images to memory and run nodes diskless.
- Use revision control to keep track of changes to node images.
- Return to a previously stored node image if and when required.
- Backup all node images by backing up only the head node.
- Automatically update BIOS images or change BIOS configurations without keyboard or console access
to the nodes.
Bright Computing engineers will be on hand to demonstrate all the 7.1 updates that enable customers
to deploy, manage, use, and maintain complete HPC clusters over bare metal or in the cloud even more
effectively. Leading the list of enhancements is fully integrated support for
Intel® Enterprise Edition for Lustre (IEEL), integrated Dell BIOS operations, and
Puppet. Improved integration with several workload managers and a refactored Web portal round out
the exciting enhancements.
Those who need to deploy, use and maintain a POSIX-compliant parallel file system will find the integrated
IEEL support lets them do so efficiently and with the well-known Bright Cluster Manager interface. Fully
integrated support for Puppet ensures the right services are up and running on the right platforms,
through enforced configurations. With integrated support for Dell BIOS firmware and configuration settings,
users can deploy and maintain supported Dell servers from the BIOS level, using Bright's familiar interface.
Broader and deeper support for Slurm, Sun Grid Engine, and Univa Grid Engine ensures that Bright
Cluster Manager for HPC fully integrates the capability to optimally manage HPC workloads. Users can
quickly and easily monitor their HPC workloads through the updated web interface provided by Bright's
user portal. Version 7.1 also incorporates refactored internals for improved performance, as well as
finer-grained management control that includes customized kernels.
"We are excited to share the latest updates and enhancements we've made to Bright Cluster Manager
for HPC. Collectively, they further reduce the complexity of on-premise HPC and help our customers extend
their on-premise HPC environment into the cloud," said Matthijs van Leeuwen, Bright Computing Founder
and CEO. "The latest version allows our customers to manage their HPC environment alongside their platforms
for Big Data Analytics, based on Apache Hadoop and Apache Spark, from a single management interface."
For more information, visit
Download the latest Mellanox OFED package for CentOS/RHEL 6.5.
The package name looks like this: MLNX_OFED_LINUX-<version>-rhel6.5-x86_64 (the package can be
downloaded either as an ISO or a tarball).
The OFED package is to be copied (one way or another) to all the compute hosts which require an
upgrade of the firmware. (Note: only at a later stage of the article will we describe
the actual installation of the OFED package into the software images. Right now we only want
the file on the live nodes.)
An efficient way to upgrade the firmware on multiple hosts is to extract (in the case of a tar.gz
file) or copy (in the case of an ISO) the OFED package directory to a shared location such as /cm/shared
(which is mounted on compute nodes by default).
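Staging the package in the shared directory can be sketched as follows (the .tgz extension and the /mnt mount point are assumptions; the version string matches the commands used later):

```
# From a tarball:
mkdir -p /cm/shared/ofed
tar -xzf MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64.tgz -C /cm/shared/ofed
# Or from an ISO:
mount -o loop MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64.iso /mnt
cp -a /mnt /cm/shared/ofed/
```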
Then we can use the pdsh tool in combination with category names to parallelize the upgrade.
In our example we extract the OFED package to /cm/shared/ofed.
Before we begin the upgrade we need to remove the cm-config-intelcompliance-slave package to avoid conflicts:
[root@headnode ~]# pdsh -g category=openstack-compute-hosts-mellanox "yum remove -y cm-config-intelcompliance-slave"
(For now we will only remove it from the live nodes. We will remove it from the software image later
in the article. Do not forget to also run this command on the headnode.)
In some cases the package qlgc-ofed.x86_64 may also need to be removed; otherwise the mlnxofedinstall
will not proceed. A log of the installer can always be viewed in /tmp/MLNX_OFED_LINUX-<version>.<num>.logs/ofed_uninstall.log
to determine which package is conflicting, so that it can be removed manually.
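Scanning that log for the conflicting package can be sketched as a small shell helper (the function name and the matched keywords are our choice, not part of the OFED tooling):

```shell
# find_conflicts LOGFILE
# Print the unique lines of an installer log that look like
# package conflicts or removal failures.
find_conflicts() {
    grep -iE 'conflict|error|fail' "$1" | sort -u
}
```

Usage: `find_conflicts /tmp/MLNX_OFED_LINUX-<version>.<num>.logs/ofed_uninstall.log`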
And then run the firmware upgrade:
[root@headnode ~]# pdsh -g category=openstack-compute-hosts-mellanox "cd /cm/shared/ofed/MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64/
&& echo \"y\" | ./mlnxofedinstall --enable-sriov" | tee -a /tmp/mlnx-firmware-upgrade.log
(Do not forget to execute these two steps on the network node and the headnode)
Note that we are outputting both to the screen and to a temporary file (/tmp/mlnx-firmware-upgrade.log).
This can help in spotting any errors that might occur during the upgrade.
Running the 'mlnxofedinstall --enable-sriov' utility does two things:
- installs OFED on the live nodes
- updates the firmware on the InfiniBand cards and enables the SR-IOV functionality.
Notice that in the case of the compute nodes (node001-node003), at this point we are mostly interested
in the latter (the firmware update and enabling SR-IOV). Since we've run this command on the live nodes,
the filesystem changes have not been propagated to the software image used by the nodes (i.e. at
this point they would be lost on reboot). We will take care of that later on in this article by also
installing the OFED into the software image.
In the case of the headnode, however, running this command effectively both installs OFED and
updates the firmware, which is exactly what we want.
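When the time comes to make the change persistent, installing OFED into the software image follows the same chroot pattern used for image management earlier (the image path /cm/images/default-image is the common Bright default and is an assumption here):

```
# Install OFED inside the software image so it survives reimaging
cp -a /cm/shared/ofed/MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64 /cm/images/default-image/tmp/
chroot /cm/images/default-image /tmp/MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64/mlnxofedinstall
```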
Bright Cluster Manager 7
Bright Cluster Manager for HPC lets customers deploy complete HPC clusters on bare metal and
manage them effectively. It provides single-pane-of-glass management for the hardware, operating
system, HPC software, and users. With Bright Cluster Manager for HPC, system administrators can get
their clusters up and running quickly and keep them running reliably throughout their life cycle
– all with the ease and elegance of a fully featured, enterprise-grade cluster manager.
With the latest release, we've added some great new features that make Bright Cluster Manager
for HPC even more powerful.
New Feature Highlights
Image Revision Control – We've added revision control capability which means
you can track changes to software images using standardized methods.
Integrated Cisco UCS Support – With the new integrated support for Cisco UCS
rack servers, you can rapidly introduce flexible, multiplexed servers into your HPC environment.
Native AWS Storage Service Support – Bright Cluster Manager 7 now supports native
AWS storage which means that you can use inexpensive, secure, durable, flexible and simple storage
services for data use, archiving and backup in the AWS cloud.
Intelligent Dynamic Cloud Provisioning – By only instantiating compute resources
in AWS when they're actually needed – such as after the data to be processed has been uploaded, or
when on-site workloads reach a certain threshold – Bright Cluster Manager 7 can save you money.
Bright Cluster Manager Images
The Cluster Management GUI of Bright Cluster Manager 7 illustrating queued jobs. Some jobs are
running on compute nodes that have been dynamically provisioned in the AWS cloud.
The Cluster Management GUI of Bright Cluster Manager 7 capturing a summarized description of the cluster.
FAIR USE NOTICE This site contains
copyrighted material the use of which has not always been specifically
authorized by the copyright owner. We are making such material available
in our efforts to advance understanding of environmental, political,
human rights, economic, democracy, scientific, and social justice
issues, etc. We believe this constitutes a 'fair use' of any such
copyrighted material as provided for in section 107 of the US Copyright
Law. In accordance with Title 17 U.S.C. Section 107, the material on
this site is distributed without profit exclusively for research and educational purposes. If you wish to use
copyrighted material from this site for purposes of your own that go
beyond 'fair use', you must obtain permission from the copyright owner.
Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org
was created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time. This document is an industrial compilation designed and created exclusively
for educational use and is distributed under the Softpanorama Content License.
Original materials copyright belong
to respective owners. Quotes are made for educational purposes only
in compliance with the fair use doctrine.
Last modified: April 21, 2017