
Unix Configuration Management Tools

Is the King naked?

Version 2.72 (Mar 21, 2017)



“More generally, it's impressive how many people can look at the landscape of dysfunctional technology and failed promises that surrounds us today and still insist that the future won't be like that.

Most of us have learned already that upgrades on average have fewer benefits and more bugs than the programs they replace, and that products labeled "new and improved" may be new but they're rarely improved; it's starting to sink in that most new technologies are simply more complicated and less satisfactory ways of doing things that older technologies did at least as well at a lower cost.

Try suggesting this as a general principle, though, and I promise you that plenty of people will twist themselves mentally into pretzel shapes trying to avoid the implication that progress has passed its pull date…

~ John Michael Greer


Introduction

We know that the Linux complexity junkies at Red Hat and SUSE form a suicide cult masquerading as Linux distribution vendors ;-). Speaking of this unending drive toward higher and higher levels of overcomplexity, remember that any Linux sysadmin needs to know intimately roughly a hundred out of around 250 key utilities (some of them, such as yum, rpm, rsync, vi, find, curl, and wget, being quite complex software systems in themselves). Even the diff utility is more complex and has more capabilities than most people realize. To a lesser extent the same is true even for ls in its current implementation :-) How many sysadmins know the difference between -a and -A in the ls utility, or whether the alias "alias ll='ls -hAlF --group-directories-first'" works on RHEL 5 (it does work on RHEL 6 and 7)? Add to this the constant trouble with colors, where users who prefer a light terminal background are completely out of luck with the standard /etc/DIR_COLORS. That is definitely too much for a human brain... So people use a small subset of the available functionality and downgrade to it even if at some point they knew more. That is the only way to survive and preserve sanity in such an environment.
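
For the record, the difference is small but real: -a lists every entry including the . and .. directories, while -A omits those two. A quick check (the scratch directory below is just an example):

$ mkdir /tmp/demo && cd /tmp/demo && touch .hidden visible
$ ls -a
.  ..  .hidden  visible
$ ls -A
.hidden  visible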

And the utilities are only the tip of the iceberg. Sysadmins also need to know the location and structure of at least a couple of dozen important configuration files, including but not limited to hosts, passwd, group, shadow, profile (and the /etc/profile.d/ directory), resolv.conf, ntp.conf, fstab, exports, sshd_config, yum.conf (and the related /etc/yum.repos.d/ directory), sysctl.conf, sysconfig/network and several other files in the /etc/sysconfig directory, the /etc/xinetd.d/ directory, the /etc/init.d/ directory, and so on and so forth. You might be surprised by the result of the command find /etc -name "*\.conf" | wc -l -- typically more than 60 files are listed.

Next comes knowledge of the bash shell with its complex set of built-ins, which is a must for any system administrator. Next in importance is knowledge of Perl as a major scripting tool, available on all major platforms and far superior to bash for complex scripts. Then come Apache, PHP and MySQL (the so-called LAMP stack), which is widely used in many organizations and which sysadmins need to support and troubleshoot; it is the bread and butter of hosting companies, but in any organization you can find applications that depend on the LAMP stack, such as MediaWiki. Add to this a dozen common daemons such as atd, cron, init, iptables, nfs, nis, sshd, vsftpd, rsyslogd (or another variant of syslog), xinetd, postfix (or old Sendmail), bind, sysstat, each with its own configuration files and quirks. SELinux is another huge subsystem. Next come X11 and related daemons such as VNC and XRDP (and X11 is so complex that you start understanding it only after you have programmed a couple of applications for it). Then there is LVM (with its own set of a dozen utilities) -- a really complex subsystem. Then we need to know a set of backup utilities and archivers such as tar, gzip, zip/unzip, cpio, etc. And the set of command-line utilities a sysadmin typically uses, such as anaconda, expect, screen, lftp, dos2unix, mutt, scp, ssh, etc.

Python is another important scripting language, used in major Red Hat applications such as yum and anaconda and increasingly used for writing system applications, displacing Perl. But unless you are a really gifted programmer, three scripting languages (bash, Perl, and Python) is one too many. There is simply no space in the brain for a third scripting language unless you limit yourself to a basic subset. So, in essence, it is either Perl or Python, but not both.

And we have not even started talking about all those exciting games connected with compiling applications from source using the GNU compiler stack (gcc, make, configure), or the Intel compiler stack, which is growing in popularity, especially for computational applications, where it has already become the de facto standard. Add to this multithreading and we get not just a single mental "stack overflow" but a double one. And this situation has to be dealt with in an environment where the demand for your services is unpredictable, urgent, and, above all, relentless.

It is quite clear that Linux is now a definite example of a system that is far beyond human capacity to understand, and has been for some time. Although the analogy is definitely somewhat stretched, the behaviour of Linux distributors reminds me of the drive of financial institutions toward higher and higher levels of leverage in the quest for higher profits, which culminated in 2008: at some point the population just could not take on any more debt, and the system crashed. We already see a somewhat similar effect with Microsoft (the real king of complexity) in the PC world, where some people voluntarily downgrade the functionality of their desktops by switching from Microsoft Windows PCs to simpler (and better watched by the NSA ;-) systems such as Chromebooks.

This toxic mix of Linux (and Unix in general) overcomplexity and the proliferation of different versions of Unix/Linux within the same datacenter (often with almost half a dozen flavors in use, such as RHEL/CentOS/Oracle Linux, SUSE, Solaris, HP-UX and AIX) creates a need for systems that help to manage Linux/Unix and protect your sanity from the behaviour of Linux vendors, who are now replaying the Unix wars on a new, but no less nasty, level. In the case of Red Hat, this Linux version of the Unix wars reminds me of a civil war, as the differences between RHEL 6 and RHEL 7 are so substantial that they can be called alternatives, rather than one being the successor of the other ;-). This so-called Red Hat civil war appears to be fought within the Red Hat camp, between server-oriented "traditionalists" and a radical sect of fanatical adherents to the Linux desktop (the Linux Taliban ;-), in which the latter are winning. With the introduction of systemd, the Red Hat distribution became something like the Mad Hatter in Alice in Wonderland (slightly rephrasing: "Linux is a place like no place on Earth. A land full of wonder, mystery, and danger! Some say to survive it you need to be as mad as a hatter. Which luckily I am."):

Mercury was used in the manufacturing of felt hats during the 19th century, causing a high rate of mercury poisoning in those working in the hat industry.[1] Mercury poisoning causes neurological damage, including slurred speech, memory loss, and tremors, which led to the phrase "mad as a hatter"

...In the chapter "A Mad Tea Party", the Hatter asks a much-noted riddle "why is a raven like a writing desk?" When Alice gives up trying to figure out why, the Hatter admits "I haven't the slightest idea!".

With the default RHEL 7 settings systemd tends to talk to itself, polluting syslog with spam (you can cut this useless chatter with the command systemd-analyze set-log-level notice):

Mar  5 03:30:01 srv255 systemd: Starting user-0.slice.
Mar  5 03:30:01 srv255 systemd: Started Session 21356 of user root.
Mar  5 03:30:01 srv255 systemd: Starting Session 21356 of user root.
Mar  5 03:30:01 srv255 systemd: Removed slice user-0.slice.
Mar  5 03:30:01 srv255 systemd: Stopping user-0.slice.
Mar  5 03:40:02 srv255 systemd: Created slice user-0.slice.
Mar  5 03:40:02 srv255 systemd: Starting user-0.slice.
Mar  5 03:40:02 srv255 systemd: Started Session 21357 of user root.
Mar  5 03:40:02 srv255 systemd: Starting Session 21357 of user root.
Mar  5 03:40:02 srv255 systemd: Removed slice user-0.slice.
Mar  5 03:40:02 srv255 systemd: Stopping user-0.slice.
Mar  5 03:50:01 srv255 systemd: Created slice user-0.slice.
Mar  5 03:50:01 srv255 systemd: Starting user-0.slice.
Mar  5 03:50:01 srv255 systemd: Started Session 21358 of user root.
Mar  5 03:50:01 srv255 systemd: Starting Session 21358 of user root.
Mar  5 03:50:01 srv255 systemd: Removed slice user-0.slice.
Mar  5 03:50:01 srv255 systemd: Stopping user-0.slice.
Mar  5 04:00:01 srv255 systemd: Created slice user-0.slice.
Mar  5 04:00:01 srv255 systemd: Starting user-0.slice.
Mar  5 04:00:01 srv255 systemd: Started Session 21359 of user root.
Mar  5 04:00:01 srv255 systemd: Starting Session 21359 of user root.
Mar  5 04:00:01 srv255 systemd: Removed slice user-0.slice.
Mar  5 04:00:01 srv255 systemd: Stopping user-0.slice.
... ... ... 

I strongly encourage you to read the systemd-devel mailing list archive to see the issues you may face. Here is one example:

[systemd-devel] hanging reboot

Hajo Locke Hajo.Locke at gmx.de
Wed Mar 1 15:42:21 UTC 2017

Hello list, sometimes i have problems rebooting some machine. i think in that cases shutting down some services fails and machine stays somewhere between life and death.

Unfortunately my ssh window closes at first and no reconnect is possible, it only tells "Connection refused".

If this happens, then i have to do a call to someone who works in datacenter and resets my machine by hand.

I would like to keep sshd alive as long as possible to reconnect and fix this by hand.

How can i achieve this?

System is Ubuntu 16.04 with systemd 229-4ubuntu16

I goggled some similar questions and tried but without success. What could i do?

Thanks,

Hajo

Those are the issues that Unix configuration management systems are supposed to help solve. But can they? Can they provide real help, or is "the king naked" and they are only able to do easy, trivial tasks (which are also important) that do not matter much and can be performed equally well with other tools? That is the question.

Two approaches to the selection of a Unix configuration management system

At first sight the selection of the right Unix configuration management system is an easy task. There is no shortage of systems -- just install one of the popular ones and be happy. Both open source (with professional support) and proprietary Unix configuration management systems can be used by enterprise IT. Several of these systems have books published about them (in the case of Puppet, a couple of dozen low-quality books). But the problem is that the claim that they will make a sysadmin's life easier and configuration-related tasks a breeze to perform is slightly exaggerated :-).

Most of them suffer from the same disease they try to cure -- they are overly complex. Moreover, they create an additional (and somewhat artificial) layer of complexity on top of the existing layers. For example, distribution of packages typically relies on RPMs or similar packaging formats, which are already not transparent to most sysadmins and are typically used "as is". If an RPM you want to deploy runs into complex unresolved dependencies, or the server has a library conflict, or whatever other problem Linux package management can create for us, you need to switch to a lower level of abstraction and debug the problem in terms of the RPM infrastructure, or recompile the software package yourself. The Unix configuration management system is of no help at this point.

What that means in reality is that when such systems work, everything is fine, but when they don't, you are really screwed, because switching to a higher level of abstraction automatically means that you know less about the underlying layers. In other words, you become another variety of Windows system administrator, who knows how to use the Control Panel extremely well but knows very little about what is inside Windows and its registry. Also, none of the popular systems tries to adhere to the Unix paradigm of building systems with maximum utilization of existing tools. They prefer reinventing the wheel in the best Microsoft style -- creating another complex monolithic Swiss army knife with multiple bells and whistles.

There is a growing realization in the Linux sysadmin community that more system software is not always better, and that adding yet another complex software system that supposedly helps the sysadmin, on top of multiple (already underutilized) existing systems, might produce quite the opposite result. No Unix system administrator can hope to learn in a lifetime more than a small part of the functionality of the set of complex tools that he or she uses. There are just too many of them. And that includes Unix configuration systems.


Still, due to the overcomplexity of Linux, as well as the proliferation of virtual machines, you do need tools that can simplify your work. The question is: what is the optimal path? There are two paths to achieve this goal:

  1. Stay within the Unix paradigm and try to combine simple tools with shell or Perl as the glue (there is also space for innovation here). Perl was designed as a programming language for automating system administration tasks. Scripting your tasks in Perl or bash, using ssh, pdsh, rdist, rsync, tar, etc. as components, is a pretty powerful approach with zero learning curve (see the tarball approach to config management sketched after this list). In this sense tar, RPMs, parallel execution tools and rsync, possibly combined with tools such as Midnight Commander and Expect (or a substitute), can probably provide 70-80% of the necessary "API" for your scripts without extra hassle. A version control system can be gradually added to provide a central repository of changes on the seed server; I do not recommend deploying one on each server unless you are also a good programmer.
  2. Join the existing, fashionable (thanks to the DevOps hoopla) and growing trend of "Windowization of Linux" -- conversion to "integrated" and more "user friendly" systems that hide their complexity behind slick GUI interfaces. If you observe some precautions, that might not be a bad idea either. The rule is: never fight fashion, especially when it is merged with an influential techno-cult that has managed to brainwash the top IT brass, since it provides a smoke screen for further outsourcing. It might be better to declare, at least formally, that "you are in" and then use just the minimum functionality (the functionality of a parallel execution tool, as in ansible atlanta -a "/sbin/reboot"). Open resistance to the whims of the top brass in enterprise IT usually leads to complications during the annual performance review ;-). Also, as somebody said, there are no atheists in the trenches, so joining a techno-cult might just increase your chances of survival... The best path here is to get a system that can generate code in Perl or shell, so that it can be inspected and, if necessary, manually adapted before being applied to members of the group. It also helps if the system is rather small and written in a language that you know well, or at least want to learn, which limits the implementation languages to two (Perl and Python), unless you are a Ruby enthusiast. Such a system might also provide some inventory management and have a sophisticated integrated database that simplifies the creation of various reports and can absorb some hardware inventory tasks. It can also double as a monitoring system if your monitoring requirements are just average, typical for a medium-size datacenter. Please note that nothing can be fixed in a large enterprise environment in less than an hour, so probes that run once an hour are as good as, or better than, probes that run every minute ;-). The main drawback is that such a GUI-based Swiss army knife hides the internals, enforces the superficial "click-click-done" mentality of Windows administrators, and adds the burden of writing your own scripts in the DSL that it uses. Also, if the situation is not within the narrow parameters that those systems can handle, the sysadmin is completely lost, as the lower levels are now hidden from him.
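
A minimal sketch of the first path, assuming pdsh/pdcp are installed and using hypothetical hostnames (srv01-srv04) and paths:

# On the seed server: pack the config files you want to distribute
tar -C /srv/config/delta -czf /tmp/delta.tgz etc

# Copy the tarball to every node in the group and unpack it in place
pdcp -w srv[01-04] /tmp/delta.tgz /tmp/delta.tgz
pdsh -w srv[01-04] 'tar -C / -xzf /tmp/delta.tgz && rm -f /tmp/delta.tgz'

# Spot-check the result on all nodes
pdsh -w srv[01-04] 'md5sum /etc/ntp.conf'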

Today such names as Puppet (released in 2005, written in Ruby, and closely associated with DevOps) and cfengine (one of the oldest attempts to create this type of software, dating back to 1993, which never got much traction) are pretty well known, if not so widely used. And they are pushed upon us by all this DevOps hoopla, which has really found traction at the higher levels of IT management as a smoke screen for further outsourcing.

They want to provide you with the ability to bind a service to a particular network interface, or to configure different database servers for your application in different environments, or to do some other complex stuff. Fine. But in the process they make simple tasks complex. In other words, they are just redefining the existing API in a new way. Because of that, few, if any, of the popular configuration management systems succeed in lessening the load on the sysadmin and providing a positive return on the investment of time and effort needed to deploy them. They might have other benefits, but lessening the sysadmin workload is not one of them; the return on investment is either negative or close to zero.

They are essentially trying to reinvent the wheel -- repackage existing functionality in a new way. There are powerful Unix tools that in combination can provide at least 80% of the necessary functionality without the need to deploy and learn yet another complex software system (see Introduction – etch):

In either cfengine or puppet you have a maze of classes, controls, modules, resources, etc. Where you store your configuration within your cfengine or puppet tree has no obvious correlation to where it ends up on your clients. You can and will spend hours, quite possibly days, studying manuals and searching the web just to get the simplest initial setup.

... cfengine doesn't actually support doing much that is useful. So you end up using it as a framework for a bunch of little shell scripts you hack together. Puppet is somewhat better, but still lacking.

I would say more: cfengine lacks any significant ideas that could lessen the admin burden. It is just a "wishful thinking" solution in search of a useful application domain. This poverty of ideas is the real architectural problem, and no amount of enhancements can change that.

Those systems supposedly ensure that the complexity of changes to Linux/Unix is hidden in pre-written "recipes" (partially created by others, so there is some level of synergy and community in the use of such a system) and handled in a more systematic manner, closer to the software development paradigm (or fashion ;-). While theoretically that helps to ensure that a system is configured in a correct and reliable manner, the road to hell is always paved with good intentions.

Also please note that an idiot with a tool that handles changes in a systematic manner on multiple servers remains an idiot. The only difference is that now he is more dangerous and can do more damage.

Three major components of any Unix configuration management system

There are three major components of any Unix configuration management system:

  1. Repository subsystem. This is where you store the files and RPMs to distribute. RPM repositories are one example of such a repository, specialized for storing RPMs. A regular hierarchical filesystem structure with one node per server can also be used (see the sketch after this list). A full image of the server is also a kind of repository in disguise.
  2. Distribution subsystem. Currently ssh is the most popular protocol for securely retrieving and transmitting a set of files to a group of servers. NFS can do the same for a set of servers in the same datacenter. Some systems have agents which communicate with the "mother ship" over SSL. Most Unix administrators are not even slightly interested in using some half-baked new protocol (with possible security holes) for communication between the master server and the clients, if a server-client configuration management system is used.
  3. Configuration description language. Here the jury is still out on what the proper configuration language for this domain is, and whether it should be declarative. But much depends on its quality, or the lack thereof.
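
A minimal sketch of the per-node repository layout mentioned in the first item, kept as a plain directory tree (all paths and hostnames below are hypothetical):

# Hypothetical per-node repository kept as a plain directory tree on the seed server
mkdir -p /srv/configrepo/common/etc /srv/configrepo/srv01/etc/sysconfig /srv/configrepo/srv02/etc
cp /etc/ntp.conf          /srv/configrepo/common/etc/          # shared by every node
cp /etc/resolv.conf       /srv/configrepo/srv01/etc/           # per-node override
cp /etc/sysconfig/network /srv/configrepo/srv01/etc/sysconfig/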

What's wrong with DSL (domain specific languages)

"When people are free to do as they please, they usually imitate each other."

Eric Hoffer

Creating a new language specific to a particular set of problems is how humans typically approach new problems. In this sense the DSLs (domain-specific languages) that many Unix configuration management systems introduce are just a natural way to approach this problem. But the devil is in the details, and the road to hell is paved with good intentions.

Verbosity and absence of new constructive ideas

Creating the right DSL (domain-specific language) is not an easy task. Language design is an area that requires a unique, pretty rare talent, plus a lot of luck (like being in the right place at the right time). Most current DSLs are too verbose, and this is a mortal sin, as a sysadmin's time is a limited and valuable resource. The idea is to provide some additional functionality that is absent, or more difficult to achieve, with standard, classic Unix tools. But the questions are: is this true, and at what price is this functionality provided? Can simple wrappers written in shell or Perl replace these complex systems in the few cases when they are really needed? Let's look at a program written in the Rex DSL for deploying an NTP server on multiple nodes:

# Rexfile
use Rex -feature => ['1.3'];
user "root";
private_key "/root/.ssh/id_rsa";
public_key "/root/.ssh/id_rsa.pub";

group all_servers => "srv[001..150]";

task "setup_ntp", group => "all_servers", sub {
   pkg "ntpd",
     ensure => "present";
   file "/etc/ntp.conf",
      source    => "files/etc/ntp.conf",
      on_change => sub {
         service ntpd => "restart";
      };
   service "ntpd",
     ensure => "started";
};

Is "ensure => "present" is better then "install if absent" (default action of yum install). Is this really better then (using C3 Tools) something like  ?

cexec yum -y install ntp
cpush /myconfig/TT/ntp/ntp.conf /etc/ntp.conf
cexec "chkconfig --list ntpd | grep -v 3:on"
cexec service ntpd start
cexec "service ntpd status | grep -v 'is running'"
timestamp_on_master=$(date "+%D %H%M")
cexec "[[ \$(date '+%D %H%M') == '$timestamp_on_master' ]] || echo time is not correct"

So far I think there has been only very limited progress in creating an expressive DSL for configuration management. For example, from my point of view the Puppet DSL is completely unsatisfactory, amateurish and simply wrong. Similarly, any software version control system like git or subversion can be adapted to keep system configuration files in sync with a repository automatically on multiple servers, but does that mean that this is the best way to synchronize configuration files on multiple servers? Definitely not. So something new and different is not always better. It can be worse. As John Michael Greer noted on a different subject:

“More generally, it's impressive how many people can look at the landscape of dysfunctional technology and failed promises that surrounds us today and still insist that the future won't be like that.

Most of us have learned already that upgrades on average have fewer benefits and more bugs than the programs they replace, and that products labeled "new and improved" may be new but they're rarely improved; it's starting to sink in that most new technologies are simply more complicated and less satisfactory ways of doing things that older technologies did at least as well at a lower cost.

Try suggesting this as a general principle, though, and I promise you that plenty of people will twist themselves mentally into pretzel shapes trying to avoid the implication that progress has passed its pull date…

One of the better examples of the current breed of Unix configuration systems is probably Ansible. It already has a dozen or so books published about it. Ansible is an agentless IT automation tool developed in 2012 by Michael DeHaan, a former Red Hat associate. For RHEL and RHEL-based systems (CentOS, Scientific Linux, Oracle Linux), versions 6 and 7 have Ansible 2.0+ available from the EPEL repository. In its simplest form it can be used as just another parallel script execution tool that works via ssh. At a more complex level it can be scripted to perform various tasks.

But Ansible's idea of deployment scripts (which are called playbooks) is far from impressive. Here is one example:

- hosts: webservers
  user: root
  vars:
    apache_version: 2.6
    motd_warning: 'WARNING: Use by ACME Employees ONLY'
    testserver: yes
  tasks:
    - name: setup a MOTD
      copy:
        dest: /etc/motd
        content: "{{ motd_warning }}"   

First of all, there is the question of whether adopting a primitive syntax format is a way to achieve simplicity. It does help to prevent silly mistakes like the missing semicolon typical for Perl. But at the same time it looks like this is just the adoption of a primitive syntax to express the same set of wrong ideas that are present in Puppet. This small DSL "hello world" type example also looks too verbose for the very simple task it performs, which is essentially equivalent to a single command:

cpush webservers:  /srv/Templates/etc/motd /etc/motd

And here is another one that distributes the Apache config file and restarts the daemon:

- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum:
      name: httpd
      state: latest
  - name: write the apache config file
    template:
      src: /srv/httpd.j2
      dest: /etc/httpd.conf
    notify:
    - restart apache
  - name: ensure apache is running
    service:
      name: httpd
      state: started
  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted

Again, this is just more syntactic sugar for a wrong set of ideas about how to create a DSL for a Unix configuration management system: create a set of primitives for specific tasks (yum, template, notify, service) that are more general than the Unix utilities.

Or, more correctly, the absence of any constructive ideas. IMHO, you either should make the DSL compatible with the set of tools and languages that sysadmins already use for solving those problems (as was done in cdist, by adopting the POSIX shell as the DSL), or you need some set of new constructive ideas that allows abstracting those activities at a much higher level than at present. And the most primitive, but still somewhat useful, measure of the level of abstraction is the LOC metric applied to two scripts doing the same task. If bash comes close to, or beats, a particular Unix configuration management system on this metric, be wary. Be very wary.
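
For comparison, a rough shell equivalent of the Apache playbook above, again using the C3 tools (the template path is hypothetical, the Jinja2 templating is assumed to have been expanded on the master beforehand, and the daemon is restarted unconditionally rather than only on change):

cexec webservers: yum -y install httpd
cpush webservers: /srv/templates/httpd.conf /etc/httpd.conf
cexec webservers: chkconfig httpd on
cexec webservers: "service httpd status >/dev/null && service httpd restart || service httpd start"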

Creating a good, compact and expressive DSL for Unix configuration management requires talent. Creating yet another Unix configuration management DSL requires none. You do not need to read books about language and compiler design. You just do it :-)

And language design talent is a very rare thing. Even if we look at the modern scripting languages that achieved huge success, the main impression is that none of their creators had an outstanding level of talent. They repeated mistakes already known in algorithmic language design from the days of PL/1, if not earlier, at times stepping on the same rakes as the designers of the Korn shell, awk, PL/1, C and C++. I would say more: the creators of PHP were brain-dead in this particular area, repeating most of the common mistakes in language design. The Perl designer (Larry Wall) had some interesting insights, but he also could have done much better with namespaces and such. Why Perl got the state variable (a sort of replica of PL/1 static variables) only recently is inexplicable. The problem was probably that the project suffered from limited resources for most of its life (with the exception of a short period when O'Reilly milked the Perl book franchise ;-)

We also have the "yet another language" problem, which is independent of the quality of the language. In this sense the situation with DSLs for Unix configuration management systems is even worse, because such a language can't be the "primary" language for any sysadmin.

Writing a 30-line file to deploy NFS or NTP (possibly in an incorrect way ;-) is also not a very exciting prospect for any sysadmin, simply because there are many other things hanging over his shoulder, those tasks are far from frequent, and they are not more time-consuming to perform with bash and Unix tools than with a Unix configuration management system -- unless the latter shields you completely from Linux/Unix flavor differences, which at the current stage none of them does without explicit programming. So the question arises: do these tasks really need to be automated, if for most packages RPMs work well and can be supplemented with your own custom bash scripts? Controlling daemons (which on some systems have a tendency to die) is a useful task, but advertising this functionality in books devoted to Unix configuration management demonstrates the lack of good ideas, as this task belongs to the monitoring system domain.

The problem of software maintenance

We get rid of some of the problems of performing the task manually. But we get another, no smaller, problem instead -- what the intelligence community calls blowback. When you need to reuse a script that you wrote a year or two ago, you face the typical software developer's problem of maintenance. Options and locations of executables might change, new subsystems such as systemd get introduced, one package can be replaced with another (Sendmail with postfix in the past; syslog with rsyslogd; rsh no longer installed, etc.). And the script needs to incorporate those changes too. Some of those changes you may know about; others may come as a surprise, when the script stops working or works in a way different from what was intended.

Also, writing, debugging and testing a DSL script can take as much time as implementing the same thing in the old-style fashion, via a simple bash script. So only if you use the script with a really large number of different servers (let's say starting from 100) might you get some economy of scale, because some differences are hidden by the configuration management system, and that makes your task easier.

For small groups of more or less uniform servers, simpler approaches might well be more flexible, more reliable and easier to debug. For example, if you have to manage, say, 24 servers in one location (all RHEL 6.x), 32 in another (all CentOS 6.x and 7.x) and 16 in a third (various flavors of Debian), you might not recoup the investment of writing, debugging and then maintaining those "recipes" in some DSL, in comparison with custom or borrowed scripts using simpler tools. Of course, much depends on the quality of the particular Unix configuration management system and the set of ideas it is based on. Systems without any innovative ideas, based simply on the idea "let's create another way of expressing the same operations, throw it at the wall and see what sticks" (Puppet, cfengine), are usually the worst.

The problem with a set of DSL scripts and with your own bash scripts is basically the same: as the environment changes, the set of scripts that you wrote today might become inapplicable tomorrow. The major Linux distributors are not known for sticking to the same set of configuration files, or the same set of daemons, "almost forever" as in Solaris. Red Hat, for example, made quite a bit of the previous work done for RHEL 6.x obsolete with the introduction of RHEL 7, with its quirks and systemd. To what extent a given Unix configuration management system can hide these differences is an interesting test of its functionality.

I would like to stress it again: bash, despite its warts and historical baggage, is a pretty well debugged implementation of the Unix shell, and has an optional debugger, high-quality books and style guidelines. It is the language that is known by all Unix administrators, a kind of "lingua franca" of Unix. All of this is either completely absent or at a very rudimentary stage for the DSLs in Unix configuration management systems. The lack of intelligent debugging facilities is especially biting, and the ability to perform a "dry run", while definitely useful, is nowhere close to what is needed. That's probably why many users drop Puppet after trying it for a while (along with its multiple bugs). Pigs just don't fly, and if with enough thrust they attempt to fly, it is dangerous to stand where they are going to land.
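
For reference, the built-in facilities that make plain bash scripts relatively easy to troubleshoot (the script name below is hypothetical):

bash -n deploy_ntp.sh     # syntax check only, nothing is executed
bash -x deploy_ntp.sh     # trace every command as it runs
# or trace just the risky section from inside the script:
set -x
cp /srv/config/ntp.conf /etc/ntp.conf && service ntpd restart
set +x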

Questionable return on investment

So when your manager gives you another 32 servers to manage due to a redistribution of workload (read: the other sysadmin left and they do not want to hire a replacement) and tells you about the switch to DevOps, which should make everything very easy, the last thing that will excite you is that the previous sysadmin created a set of Puppet scripts linked to his own set of Python workflow scripts. What if you do not know and do not like Python and Ruby? Maintaining somebody else's software is more complex than writing it, by several orders of magnitude.

Idiosyncrasies matter too: if you need to understand the scripts and recipes of a guy who was an OO maniac and couldn't write even a simple, straightforward script without using classes and inheritance (and such guys for some reason are often attracted to Puppet), you are really screwed ;-). And believe me, such perverts are pretty common in Python-land.

Another problem with writing and debugging Unix configuration management scripts is that some activities -- such as checking file attributes for compliance with some set of rules -- are the domain of different systems: monitoring systems and hardening scripts (see, for example, the good old Tiger). The same is true for checking whether all the daemons necessary for a particular server are running. As such, these might be better performed with more specialized tools, although one advantage of tools like Puppet that I do see is that they can double as a Unix monitoring system, in some cases saving you from the necessity to deploy and learn yet another complex software package (although a simple Unix monitoring system is definitely preferable to Puppet).

As for the typical examples published in books devoted to Puppet and similar systems, I can tell you one thing: they are extremely naive about the maintenance problems that you might be facing, even discounting the fact that introductory books can't provide really complex examples. Let's ask ourselves a simple question: how many times a year do you deploy daemons like NFS and NTP -- the most typical examples discussed in such books? From one time to the next something might change, and you need to modify your scripts accordingly; you can't just run them blindly. Don't you do this via kickstart during the initial installation and then simply adapt or copy an existing config from a similar server? You know the answers.

Still, the fact that the examples you can find in the two or more dozen Puppet books are so simplistic and detached from reality should serve as a warning signal that the king is possibly naked. If the examples are not worth the paper on which they are printed, to say nothing of the price of the book, that suggests that the system described addresses the wrong problem, or addresses the right problem in the wrong way.

For example, the NTP deployment scripts published in many Puppet and other books that I have read typically miss the most important, vital test -- correspondence of the time after deployment to atomic clock time (which should be checked by comparing the time reported by the local NTP daemon with the time on a server which we know is configured correctly and on which NTP is working properly). There are way too many things that can go wrong with NTP to check them one by one. You need an integral check, and if it fails, manual troubleshooting should be done.
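
A minimal sketch of such an integral check, assuming ntpq is available on the nodes and using a hypothetical host range (the ninth column of ntpq -pn output is the offset, in milliseconds, to the currently selected peer):

# Nodes that print nothing, or a large offset, need manual attention
pdsh -w srv[001-150] "ntpq -pn | awk '/^\*/ {print \$9}'"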

Similarly, for NFS they often miss the firewall-related stuff, subnet restrictions, optimization of mount parameters, and the question of which version of the protocol a particular server should be using (v4 has huge problems in case of frequent disconnects).
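
To illustrate the mount-parameter side of this, one possible way to force NFSv3 over TCP with hard semantics and larger transfer sizes (the server name, export and option values are hypothetical and need tuning for the actual workload):

mount -t nfs -o vers=3,proto=tcp,hard,rsize=65536,wsize=65536 nfssrv01:/export/data /mnt/data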

This failure of Puppet books to provide information useful in a "real datacenter", in an engaging (or at least not extremely boring) manner, suggests one thing: the king may be naked. My impression is that the designers of Puppet were trying to create a Unix configuration management system, but ended up with a monitoring system that has some useful Unix configuration functionality. Somewhat similar to HP OpenView, but more modern, better designed, more programmable and cheaper (if you buy professional support).

In other words, the present generation of these systems accomplishes things that are worth almost nothing in the daily sysadmin workload and does not represent a huge advantage over a set of custom scripts written in Perl, bash, or whatever other scripting language you know best. Often they intrude into the monitoring area, which can be covered by other systems such as Nagios, and skip tasks that have real value.

Agent-based vs. agentless configuration management systems

Unless the same system is used for both configuration management and monitoring, I am slightly skeptical about the value of agents in Unix configuration management systems. In my opinion, the capabilities of ssh are in most cases adequate for the tasks that need to be performed, especially if you have a fast network and you do not want to substitute your current monitoring system with something more powerful and programmable, but less specialized.

In a way, any configuration management system which provides its own (often complex) agent working over SSL and does not provide adequate monitoring capabilities is trying to reinvent the wheel, and the quality of such agents is usually suspect. Moreover, they can introduce additional security vulnerabilities that are difficult to understand and slow to fix. As such, they do represent a security risk.

Essentially what they are doing is rewriting parts of the ssh daemon again and again, often with less qualification and with additional bugs, creating security problems or even backdoors that are difficult to understand and that are usually detected only far too late. Actually, this was the problem with OpenSSH itself for some time: in the past it was the most common way to break into ISPs.

Some Unix configuration management systems are specifically designed as agentless and are simpler than the alternatives. Among them:

Ansible 

This is a Linux configuration management system from Red Hat, so it is actively maintained. The latest version is 2.3.0 (March 15, 2017). It is written in Python and requires Python on all managed nodes. Nodes are managed from a controlling machine over SSH. In the simplest form a plain text file with a list of hostnames can serve as the inventory (/etc/ansible/hosts by default). To orchestrate nodes, Ansible deploys modules to them over SSH. Modules are temporarily stored on the nodes and communicate with the controlling machine through a JSON protocol over standard output. When Ansible is not managing nodes, it does not consume resources, because no daemons or programs are executing for Ansible in the background.
The design goals of Ansible include minimal dependencies on the managed nodes (just Python and SSH), no resident agents or extra open ports, and a low learning curve.

It can work both as a parallel execution tool and as a parallel scp command (in Ansible terminology these are called "ad hoc" operations):

ansible atlanta -a  "/sbin/reboot" 
ansible atlanta -m copy -a "src=/etc/hosts dest=/tmp/hosts"
ansible webservers -m file -a "dest=/srv/foo/b.txt mode=600 owner=joeuser group=joeuser"
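
In the same ad hoc style, a package can be installed through the yum module (the group name is the same hypothetical one used above):

ansible webservers -m yum -a "name=ntp state=present"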

Rex

Rex is one of the very few Unix configuration management systems that require only Perl 5 and ssh (both on the master and on the nodes); Perl is present by default on all commercial Unixes and Linuxes. It has a regular custom DSL -- nothing special in comparison with Puppet or Chef -- but using Perl both as the implementation language and as the language in which tasks are written is a better choice, and gives such systems an edge. See the overview by Andy Beverlay, An introduction to Rex - FLOSS UK DevOps York 2015. It was actively maintained until at least late 2016 (on 2016-07-16 (R)?ex 1.4.1 was released). There is also a draft of the book Rex Book (work in progress).

Like Ansible, it can work both as a parallel execution tool and as a parallel scp command, so you can start using it without writing any DSL at all.

There is also sparrow test framework by Andrey Melezhik which can serve as a monitoring system for REX (sparrow-rex.md at master · melezhik-sparrow · GitHub)

Running a sparrow plugin with Rex is easy using Rex::Misc::Sparrow; let's amend the Rexfile a little bit:
$ nano Rexfile

use Rex::Misc::ShellBlock;
use Rex::CMDB;

set cmdb => { type => 'YAML', path => 'cmdb' };
require Rex::Misc::Sparrow;

task "deploy", sub {
  shell_block <<'EOF';
    test -f ~/web-app/app.pid && kill `cat ~/web-app/app.pid`
    rm -rf ~/web-app
    git clone https://github.com/melezhik/web-app.git ~/web-app
    cd ~/web-app
    nohup plackup app.pl 1>nohup.log 2>&1 & echo -n $! > app.pid
EOF
};

Now, having added the Rex::Misc::Sparrow tasks to our Rexfile, let's create a cmdb file:

$ mkdir cmdb
$ nano cmdb/default.yml


sparrow:
    system:
        - checkname: webapp
          plugin: private@web-app-check

Now run rex -T to examine what has changed:

$ rex -T

Tasks
 deploy        

 Misc:Sparrow:check    Runs sparrow checks
 Misc:Sparrow:configure    Configure sparrow checks
 Misc:Sparrow:dump_config  Dumps sparrow configuration
 Misc:Sparrow:setup    Setup sparrow

 

cdist

Cdist does contain one unorthodox idea that brought my attention to it: the use of the regular POSIX shell as the DSL. This is an idea I also subscribe to.

Another notable idea is the generation of code (scripts) for execution on the nodes, albeit in a rudimentary form. I would also like to mention the creative use of the Unix hierarchical directory structure for encoding information about "objects" in this configuration management system.

It was written by Nico Schottelius and Steven Armstrong around 2011. The designers explicitly advocate simplicity and require only ssh and a shell (/bin/dash is recommended) on the target servers, while using Python 3.2 (not available by default on RHEL up to 7.2) on the master. Why they needed the latest and greatest version of Python while writing commodity software is a mystery to me. Documentation is very scarce and very bad; it is almost impossible to understand how the system operates and why this particular structure was adopted. But there is a cdist group on LinkedIn. The last release is from 2015, but the latest commit on GitHub is from Aug 19, 2016.

Use of the shell as the DSL means that after you install cdist you do not need to learn an ugly new DSL and curse the designers for incompetence and bugs.

The main concept of cdist is the so-called type, which (in my very limited understanding of the system) is a complex object consisting of a set of executables (let's say object methods ;-) and files (let's say object variables). The whole of cdist looks like a pretty sophisticated API for shell scripts, designed to simplify the writing of complex configuration management scripts. The set of files and directories (subtree) for a type includes the manifest, the explorer scripts, the gencode scripts and the parameter files, all discussed below.

Types are stored in the directory $CDIST_ROOT/cdist/conf/type/. The authors recommend prefixing type names with two underscores (__) to prevent collisions with other executables in $PATH, because in scripts you use just the names of those components and they should not conflict with system executables.

All type components are written as files into a special tree, with properties stored as files in higher-level directories such as parameter and parameter/default. For example, here is a partial definition of the type __nginx_vhost:
TARGET=$CDIST_ROOT/cdist/conf/type/__nginx_vhost
echo servername >> $TARGET/parameter/required
echo logdirectory >> $TARGET/parameter/optional
echo loglevel >> $TARGET/parameter/optional
echo use_ssl >> $TARGET/parameter/boolean
mkdir $TARGET/parameter/default
echo warning > $TARGET/parameter/default/loglevel
echo server_alias >> $TARGET/parameter/optional_multiple

As the manifest of a type is a shell script, you can call other "types" from it, creating a kind of "poor man's" inheritance. For example, in the __package type you can abstract away the OS type for which the package manager is executed in the following way (this is a bad example, which simultaneously shows a weakness of cdist -- the absence of a meaningful abstraction of the OS version -- but never mind):

os="$(cat "$__global/explorer/os")"
case "$os" in
      archlinux) type="pacman" ;;
      debian|ubuntu) type="apt" ;;
      gentoo) type="emerge" ;;
      *)
         echo "Don't know how to manage packages on: $os" >&2
         exit 1
      ;;
esac

__package_$type "$@"

This is actually an ugly solution (see Migrating away from Puppet to cdist (Python3) on Hacker News), which demonstrates a lack of imagination, but it is better than nothing.

Another interesting feature is that, unlike most other systems I have encountered, cdist is explicitly designed to generate code that can be executed either on the master or on the target nodes. In the generated scripts you have access to cdist variables such as $__object and $__object_id, but only for read operations; there is no copy-back of these files after the script execution. For example:

if [ -f "$__object/parameter/name" ]; then
   name="$(cat "$__object/parameter/name")"
else
   name="$__object_id"
fi

Explorers are scripts that are executed on the target for every created object. They are stored in the "explorer" directory below the type directory. For example, an explorer can check the md5sum of a file on the client, as in the example below (a shortened version derived from the type __file):

if [ -f "$__object/parameter/destination" ]; then
   destination="$(cat "$__object/parameter/destination")"
else
   destination="/$__object_id"
fi

if [ -e "$destination" ]; then
   md5sum < "$destination"
fi

Repositories and note on capabilities of RPM-based systems

Red Hat introduced RPM in 1995. While they never marketed it as a configuration management system, in reality it belongs to this class of systems. It was based on the Solaris packaging system and, like the latter, it operates with the notion of packages (cpio archives with additional pre- and post-processing scripts added). It is the most widespread type of repository of Linux packages (Debian's APT is a distant second), and as such its architecture and solutions are interesting for anybody who is interested in Linux configuration management systems.

The capabilities of the RPM system include package installation, upgrade and removal, dependency tracking, querying of package metadata, and verification of installed files against the package database.

There are two command-line tools which can provide information about installed and available packages: rpm and yum. GUI tools are also available. Yum is the more sophisticated of the two and provides automatic updates and package management, including dependency management. It works with repositories, which are collections of packages typically accessible over HTTP (http://), FTP (ftp://), or the filesystem (file:///). It is written in Python and is a derivative of the Yellowdog Updater -- an updater for the now defunct Yellow Dog Linux distribution for the Apple Macintosh -- which was adapted to Red Hat by folks at the Duke University Department of Physics.
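
A few of the query commands that cover most day-to-day needs:

rpm -qa | grep ntp          # which ntp-related packages are installed
rpm -qf /etc/ntp.conf       # which package owns this file
rpm -ql ntp                 # list the files a package installed
rpm -V ntp                  # verify installed files against the package database
yum list installed ntp      # the same question, answered through yum
yum deplist ntp             # show the package dependencies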

Yum has the ability to install groups of packages, which you can also create yourself if you maintain a private repository. This is really useful, because many tasks require a collection of software that at first glance may not look related at all. There are three types of packages in a group: mandatory, default and optional. By default yum installs the mandatory and default packages but not the optional ones. This is normally fine because it usually installs all of the key packages, but if you find it didn't install what you're looking for, you can still install any missing packages individually. To find out what groups are available (and which ones you have already installed), use the following:

yum grouplist

One of the groups that sysadmins tend to use a lot is Virtualization. This group contains all the packages you need such as the Xen kernel, support libraries, and administration tools.

To get information about the group, including the list of packages, use:

yum groupinfo Virtualization

To install a group, you use the groupinstall command:

yum groupinstall Virtualization

If the group you want to install has a space in the name, enclose it in quotes:

yum groupinstall "Yum Utilities"

As with installing packages, Yum will present you with a list of packages that it needs to download and install in order to fulfill your request.

A classic example of using this capability is installing X11, if you missed it during the initial install:

yum -y groupinstall "X Window System" "Desktop" "Fonts" "General Purpose Desktop"

To remove all packages, of any type, in the named group use groupremove.

yum groupremove groupname

It will also remove any package that depends on any of these packages.

There is also yum-groups-manager, which allows you to create groups in a YUM repository. See the manpage: yum-groups-manager(1) - Linux man page.

 

Is Unix configuration management a special case of software development?

  I read that book a long time ago. What I remember (perhaps incorrectly) is that there are simple, compound and complex failures. One error causes a simple failure, two a compound and three a complex. Complex failures are usually catastrophic. The errors were 1) failure to learn 2) failure to anticipate 3) failure to adapt. Perhaps a bit overly structural, but it did stick in my mind for years.

Comment by BobW to blog post

 "It aren't what you don't know that gets you into trouble. It's what you know for sure that just aren't so."  

Mark Twain

The now popular "software development" analogy, while interesting from a purely intellectual standpoint and appealing to those who write their own scripts on a regular basis, falls short if we analyze the realities of sysadmin work, with its major problems of accommodating various flavors of Linux/Unix and the unanticipated effects of even trivial changes (which can manifest themselves only after the fact and be discovered days or weeks after the change was made). The rollback of botched changes is also quite different in a system as complex as Linux -- on a server that runs applications, only a total reinstall of the OS from the previous version returns the previous state. As with a river, you can't step into the same Linux system twice :-)

In other words, the Unix/Linux datacenter is fundamentally a chaotic system with a high degree of complexity and indeterminacy and periodic crisis situations (in some of which heads roll). In this sense Unix system administration is a different activity from software development, although OS development is probably closer to it than other kinds (changing user requirements and a high, or very high, influence of fashion are very similar).

First of all, software maintenance of complex software systems such as an OS or a compiler (the author was involved in the latter) is far from paradise. The code loses architectural clarity with time, with new features, contributions from new people, bug fixes, and workarounds. The result quickly becomes unmanageable: difficult to modify without unexpected side effects, hard to reason about, and increasingly failure-prone. So it is unclear why this is an ideal to which we should aspire.

Also, the way Unix sysadmins think about changes to the system is different from how software developers think about developing or modifying software. I have done both, and I can tell you that, while I am a former programmer, I usually think about system administration tasks more in terms of a surgical operation on an "OS image" -- converting the current image of the OS into the desirable one, or curing some ailment. Sometimes "under anesthesia" -- with users disconnected, applications shut down, and the system booted to a special runlevel. And like any surgical operation it involves substantial risks and should adhere to the principle "First, do no harm". The consequences of "interventions" are often different from what you expect, sometimes very painfully so (that's why there are a lot of unpatched systems in major datacenters; sysadmins just do not want to take the risk of screwing up a complex system). In software development tasks, by contrast, I usually think in the simpler terms of adding new features or functionality, or fixing bugs. I never think about software maintenance as a surgical operation.

In this sense all this DevOps hoopla is missing the target, and as such it is just another variant of the Agile marketing scam (see Devops Is a Poorly Executed Scam), liberating organizational fools from their money:

I've got to hand it to the Agile development guys — they were really good at liberating money out of organizations that all had trouble with something inherently difficult. The geniuses who developed Scrum and Extreme Programming executed masterfully; selling books and training; and they made some serious bank doing it. If you hang around Silicon Valley long enough, you know to applaud the hustle. It's the classic Rainmaker scam. You pay a man to make it rain on your crops, and when it rains, he takes the credit. If it doesn't rain, he comes up with an excuse that involves you paying more money.

While the "surgical operation on an OS image" analogy is not perfect, to me it makes sense and allows me to organize my activities in a more predictable, controllable and safe manner. The OS image can literally exist as a file in the case of virtual servers. The "target system state" may already exist on one of the servers (a test or quality server). After that the task becomes the elimination of the differences from this "ideal state", much like in sculpture, where creating a statue is just taking a piece of marble and removing the extra. Differences between the current system state and the desired state imply that there is some "delta" -- a set of files and RPMs that needs to be applied to the non-conformant system to transform it into the desired state.

And this delta can be visualized as a tree of files that need to be changed plus a set of packages that need to be installed or updated. Such a tree is typically compressed into a tarball, distributed, and then "executed" (applied in a very controlled manner) on all target systems. So creating such a delta is more an iterative process of comparing two systems, removing the "extra" files and packages that differ, and adding or updating packages, than writing a program. Most sysadmin activities are closer in spirit to a complex synchronization task (a superset of what rsync is capable of) than to writing a set of boring, trivial or, in the case of Puppet, "intelligence insulting" scripts that push files and packages to given servers (although in some cases such an approach can also be useful) and which essentially hide what one wants to achieve.
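
A minimal sketch of computing such a delta against a seed server using rsync's dry-run mode (the hostname and paths are hypothetical; -n only reports what would change):

# Which files under /etc differ from the seed server
rsync -avn --delete seedsrv:/etc/ /etc/ > /tmp/etc.delta

# Which packages differ from the seed server
ssh seedsrv 'rpm -qa | sort' > /tmp/seed.pkgs
rpm -qa | sort > /tmp/local.pkgs
diff /tmp/seed.pkgs /tmp/local.pkgs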

That might mean that systems which keep images of servers, full or partial, in a special filesystem and implement instruments for manipulating them are a better way to go than the traditional "push the files" approach. After all, 100 full images of Linux system directories, say 6GB each, is only 600GB, or less than a terabyte, and now fits on a USB stick. 300 such images (which corresponds to a pretty large datacenter with more than 300 servers, as one image can correspond to multiple servers) fit on a 2TB USB drive. And you can still put such a drive in your pocket ;-)

Exaggerated, unrealistic claims are a hallmark of Unix configuration management systems and DevOps hoopla

 

charlatan --

  1. quack • charlatans harming their patients with dubious procedures

  2. one making usually showy pretenses to knowledge or ability

  3. fraud, faker • a charlatan willing to do and say virtually anything to remain in the spotlight — Alan Brinkley


Definition of Charlatan by Merriam-Webster

Most current Unix configuration management systems are still far from mature. The main push for their deployment comes from the DevOps hoopla. They suffer from verbose, non-standardized "configuration definition languages" (DSLs) and might be a dead end due to overcomplexity. Many suffer from abuse of XML and from a practice borrowed from the Agile folk -- inventing new terminology for the sake of new terminology and making simple things complex. Selling the king new clothes is an old and still very profitable business. Here is a typical example of a small, trivial program in the DSL (domain-specific language) used in Puppet (Puppet Show: Automating UNIX Administration). Essentially, the example below is equivalent to the "hello world" program used to introduce new programming languages. The purpose here is to create the file /tmp/testfile on a node (Puppet client) if it doesn't exist:

class test_class {
    file { "/tmp/testfile":
        ensure => present,
        mode   => 644,
        owner  => root,
        group  => root
    }
}

node puppetclient {
    include test_class
}

As everybody understands, copying one file to multiple servers with a given set of attributes can be accomplished with a single scp command (the -p option in scp preserves attributes), or two. So this is a pretty verbose alternative, which raises some concerns about the validity of the approach. Why is this type of DSL optimal? Why is it so verbose? This example also demonstrates both the strong point and the weakness of the typical approach to creating such systems -- the concentration on creating a custom DSL.
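For comparison, a minimal shell sketch of the same "hello world" task; the host names are illustrative and a working root ssh key is assumed:

# copy the file to each node, preserving mode and timestamps (-p);
# ownership will be root because we copy as root
for h in puppetclient1 puppetclient2; do
    scp -p /tmp/testfile root@"$h":/tmp/testfile
done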

And if somebody suggests that this is a new, more advanced way to perform Unix system administration, I have some reservations. If you are not involved full time, you will probably forget a large part of what needs to be done from one encounter to another (and if you are, you will become disconnected from the real challenges of system administration). So in the end you will use such a system in the most basic way, utilizing probably a tiny part of its capabilities.

I have the impression that the developers are simply barking up the wrong tree by creating this level of overcomplexity, or, in some cases, may even be artificially creating a franchise that they can milk. Not unlike the "Pet Rock" project. All of the leading systems of this class are huge monolithic systems and as such have poor integration with classic Unix utilities and with other components of the datacenter such as monitoring systems, helpdesk, etc. In this sense they do not look superior to the popular "tarball + parallel command execution tools" method, because they suffer from a lack of constructive ideas about how to maintain complex Unix configurations on multiple servers that run different versions of Unix. There is no "OS version abstraction layer", unless we consider the system itself to be such a layer. In most cases the differences in file locations and content need to be explicitly programmed into recipes.

And creating a new DSL is not an answer, unless it can be more concise, more expressive and more easily debugged than the alternatives.

The key problem with the existing systems is the lack of new constructive ideas, which demonstrates itself in extremely boring books. In a way, most popular systems can be viewed as an "extend and pretend" variant of the set of ideas that were introduced almost 30 years ago in the rdist utility (which was included in BSD 4.3, released in 1986). In other words they can be viewed as a slick repackaging of basic ideas that are 30 years old (actually reading a brilliant article about rdist by Benedikt Stockebrand -- Introduction to Rdist -- is probably the best introduction to this set of ideas).

Adding a more "modern" DSL (instead of the shell style used in rdist) and providing several bells and whistles changes very little. But, as Agile has shown, "rainmaker"-style marketing can be a success: attention alone is profitable if you can keep it. As the quote above suggests, in this case the income can come from books, training and conferences. In this sense even open source systems are not a panacea. They too can be a variation on the same theme as Agile.

NOTE: Rdist is a classic Unix utility to maintain identical copies of files over multiple hosts. It probably provided the first DSL for configuration management. Here is a large quote from the manpage that gives you some impression of the power of the utility (the example below really belongs to Unix as it existed around 1992 -- 25 years ago -- and as such is a historical artifact ;-) :

It preserves the owner, group, mode, and mtime of files if possible and can update programs that are executing. It can use SSH as the transport protocol and in this sense can be viewed as a more flexible and powerful form of scp. The rdist utility reads commands from a so-called distfile to direct the updating of files and/or directories. If distfile is '-', the standard input is used. If no -f option is present, the program looks first for 'distfile', then 'Distfile' to use as the input. If no names are specified on the command line, rdist will update all of the files and directories listed in distfile.

Otherwise, the argument is taken to be the name of a file to be updated or the label of a command to execute. If label and file names conflict, it is assumed to be a label. These may be used together to update specific files using specific commands.

The -c option forces rdist to interpret the remaining arguments as a small distfile. The equivalent distfile is as follows.

( name ... ) -> [login@]host

To use a transport program other than rsh(1c) use the -P option. Whatever transport program is used, must be compatible with the above specified syntax for rsh(1c). If the transport program is not, it should be wrapped in a shell script which does understand this command line syntax and which then executes the real transport program.

Here's an example which uses SSH as the transport:

rdist -P /usr/bin/ssh -f myDistfile
... ... ...

The distfile contains a sequence of entries that specify the files to be copied, the destination hosts, and what operations to perform to do the updating. Each entry has one of the following formats.

<variable name> '=' <name list>
[ label: ] <source list> '->' <destination list> <command list>
[ label: ] <source list> '::' <time_stamp file> <command list>
The first format is used for defining variables. The second format is used for distributing files to other hosts. The third format is used for making lists of files that have been changed since some given date. The source list specifies a list of files and/or directories on the local host which are to be used as the master copy for distribution. The destination list is the list of hosts to which these files are to be copied. Each file in the source list is added to a list of changes if the file is out of date on the host which is being updated (second format) or the file is newer than the time stamp file (third format).

... ... ...

These simple lists can be modified by using one level of set addition, subtraction, or intersection like this:
list '-' list
or
list '+' list
or
list '&' list

The shell meta-characters '[', ']', '{', '}', '*', and '?' are recognized and expanded (on the local host only) in the same way as csh(1). They can be escaped with a backslash. The '~' character is also expanded in the same way as csh but is expanded separately on the local and destination hosts

The following is a small example.

HOSTS = ( matisse root@arpa)

FILES = ( /bin /lib /usr/bin /usr/games
    /usr/include/{*.h,{stand,sys,vax*,pascal,machine}/*.h}
    /usr/lib /usr/man/man? /usr/ucb /usr/local/rdist )

EXLIB = ( Mail.rc aliases aliases.dir aliases.pag crontab dshrc
    sendmail.cf sendmail.fc sendmail.hf sendmail.st uucp vfont )

${FILES} -> ${HOSTS}
    install -oremove,chknfs ;
    except /usr/lib/${EXLIB} ;
    except /usr/games/lib ;
    special /usr/lib/sendmail "/usr/lib/sendmail -bz" ;

srcs:
/usr/src/bin -> arpa
    except_pat ( \\.o\$ /SCCS\$ ) ;

IMAGEN = (ips dviimp catdvi)

imagen:
/usr/local/${IMAGEN} -> arpa
    install /usr/local/lib ;
    notify ralph ;

${FILES} :: stamp.cory
    notify root@cory ;

As you can see from the example above, rdist covered almost all the ground that modern Unix configuration management systems cover in a more verbose way. The DSLs used in them are nothing new and might be one-step-forward-two-steps-back kinds of things. They are far from expressive (some are annoyingly verbose), and in many cases writing a special script in some new and obscure DSL is not a better or faster solution than using bash or Perl with command-line tools and scripts. And it gives you less control over the individual steps. In rdist DSL, the "hello world" example written in Puppet DSL above would look something like:

HOSTS = ( puppetclient )
FILES = ( /tmp/testfile )

${FILES} -> ${HOSTS}
    special "chmod 644 /tmp/testfile; chown root:root /tmp/testfile" ;
    notify root@master ;

Note: the special command is actually not necessary if the source file already has those attributes, since rdist preserves the owner, group, and mode of the files it copies.

And in both cases the proliferation of such scripts creates a software maintenance problem, which is an additional task for an already stressed and overloaded system administrator. This problem rears its ugly head with each release of RHEL or SUSE: after such a release there should be a period of adapting all the scripts to the new version. Nasty errors can be introduced by outdated or buggy scripts tuned to previous versions of the OS but executed on multiple servers in a group that includes newer versions of the same OS. Ask yourself how many of your own daemon control and verification scripts survived the transition from RHEL 6.8 to, say, RHEL 7.2 without major changes.
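As a hedged illustration of why such scripts break across releases: RHEL 6 controls services through SysV init (service/chkconfig), while RHEL 7 uses systemd, so even a trivial "restart ntpd" helper needs a guard like the one below (and on RHEL 7 the default time daemon may be chronyd rather than ntpd):

if command -v systemctl >/dev/null 2>&1; then
    systemctl restart ntpd      # RHEL 7 style
else
    service ntpd restart        # RHEL 5/6 style
fi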

Nevertheless, if you manage multiple flavors of Linux (or, worse, multiple flavors of both Linux and Unix), the need to automate some tasks does exist. The question is only what is the best way. And nearly every system administrator tasked with operating a large (as in several dozen) number of servers eventually finds, or writes, a set of scripts for executing the most common tasks. The most brave try to write their own custom mini Unix configuration system (although they probably do not call it that), increasing the level of automation of their work, but at the price of reinventing the bicycle.

So the first important observation about desirable properties of Unix configuration systems is that they should not force the sysadmin's hand, but should allow integration of his own scripts at least at the level the Midnight Commander user menu allows. Most sysadmins at the senior level are quite smart people and can automate many of the tasks they face themselves. At least using bash (and bash potential here is definitely underestimated; it is difficult to beat bash on the LOC metric for accomplishing a given task, even from Perl or Python).

So what system administrators really need is more like a custom IDE that helps to write such scripts and provides some minimal API that lessens the tendency to reinvent the bicycle again (logging, execution on multiple servers, and mechanisms for recovery when changes go wrong are the things that need to be provided). At the most primitive level that can be just a library of functions in bash or a set of modules in Perl. But, in any case, the last thing a sysadmin wants is to learn and then debug scripts in yet another badly constructed and badly implemented DSL, the path most Unix configuration system designers are hell-bent on pursuing. If I do not know the scripting language in which a particular configuration management system is written, I would choose bash over a new DSL any time.
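A minimal sketch of such a bash "API" might look as follows; the group file, log path and function names are assumptions, and passwordless root ssh to the listed hosts is assumed:

#!/bin/bash
# Tiny shared library for ad-hoc configuration scripts (names and paths are illustrative).
GROUP_FILE=${GROUP_FILE:-/etc/admin/servers.list}   # one hostname per line
LOG=${LOG:-/var/log/admin-actions.log}

log() { printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG"; }

# Run a command on every host in the group, logging failures instead of aborting.
run_on_group() {
    local host rc
    while read -r host; do
        [ -z "$host" ] && continue
        ssh -n -o ConnectTimeout=5 root@"$host" "$@"; rc=$?
        [ "$rc" -ne 0 ] && log "FAILED on $host (rc=$rc): $*"
    done < "$GROUP_FILE"
}

# Back up a file in place (dated copy next to the original) before changing it.
backup_in_place() { run_on_group "cp -p $1 $1.$(date +%Y%m%d)"; }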

There are multiple tools that help with such inventory-type tasks, and they usually fall outside the capabilities of Unix configuration management systems. Some "baseliners" can double as such inventory management tools. Comma-delimited files can be exported to Excel or another spreadsheet, which provides a perfect viewer for this info, far superior to anything that can be achieved via a Web interface.

The realities of system administration are quite different from software development: there are quite a lot of changes during the lifecycle of a server that require modification of scripts. And this is quite a different subject area, with a different price for mistakes (remember how NASA lost a probe due to a tiny error), and, because of that, a stricter discipline for applying changes to a large number of servers.

The key difference is that each change should be ("uniformly") applied to a large number of "slightly different" servers, each of which deviates from the "ideal" configuration in its own (possibly dynamic; see the problem of many cooks in the same kitchen) way. While writing software for several different OSes is similar, here we have more variety and complexity: writing software for 10 different OSes is a rare activity. So this "hell made of many small differences" is only superficially similar to the issue of portability in software (although it does have similarities with the year 2000 problem). That makes, for example, the distribution of an ntp.conf file to multiple (let's say 50) servers a non-trivial problem, just because you can't be sure that you know all the factors that are important. As Mark Twain quipped: "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."

For example, I once tried to deploy a modified version of user dot files (we changed to the environment modules package at the time) on servers that (as I discovered later) had users' home directories mounted "on demand" using NIS. The previous sysadmin had left three months before and this "nuance" was never documented.

Also, even for common files it can well be that some version of the OS on those servers uses a different name, location, or format for the file you distribute, which you are not aware of until it is too late. So even for such a trivial operation as, say, distribution of the /etc/DIR_COLORS file, you can run into the problem of incompatibility between different versions of Linux: the version that you created and tested for RHEL 6 will not work with RHEL 5. Who could guess? So distribution of files to multiple servers is not so much a question of the mechanics of distribution. It is mainly a question of knowing which servers form a uniform group and which are "outliers".

That's where key problems arise even if the servers you manage are all RHEL/CentOS/Oracle Linux and just have different versions varying from 5.11 to 7.2 (an almost ideal, dream situation for any sysadmin). That's why sysadmins usually think about such tasks in terms of a group of servers with a particular version of the OS, to which they need to distribute either an individual file, a tarball, or a set of RPMs that implement (and then can reverse, if needed) the set of changes. And their main concern is what happens if the change goes wrong on some servers of the group -- whether they overlooked some important existing idiosyncrasies of configuration. And what will happen if the server is rebooted after this "trivial" change ;-). Due to Murphy's law, the servers on which the change goes wrong can be hundreds or thousands of miles away from the sysadmin's office.

Again, even such a trivial operation as a reboot of a server that is working OK after some trivial changes were made represents some risk, and causes some fear that the server "will not come up", as any seasoned sysadmin can attest. In case things go wrong, he should be able to restore the previous state of the system quickly and, hopefully, correctly (with servers as with a river, you can't enter the same river twice ;-). Hopefully without discovering other things that went wrong in the process -- a typical example is that the remote control unit such as DRAC or ILO, on which he relies, has also crashed and he can't log into it (which, at one time, was a pretty common problem for HP servers due to a screwup in the firmware of those units; Dell DRAC was also affected at one time, and those naive folks who believed they would be able to connect to the server via DRAC, without checking, were burned. Some badly...)

This emphasis on the high cost of errors and on the necessity of being able to roll a change back is somewhat different from what we have in software systems, where rollback is usually trivial. Not so with a Linux OS ;-). Like with the river, you can't step into the same Linux twice :-). So even if you rolled the change back, chances are that you have a slightly different OS than before, unless you restored it from backup. That's why making a full backup before an important change is a sine qua non (you should not forget that the author has a PhD ;-) for any seasoned Unix sysadmin.

And only then is he interested in such niceties as history of changes, branches, and the other goodies associated with the advanced version control systems programmers typically use. His main concern is the validity of the backups of his systems and the complexities of rolling back after the failure of some complex deployment, such as RPMs that went wrong and "hosed" the system (in certain cases making the system unbootable). As for a version control system, a local backup of files (with timestamps) in the same directory in many cases serves as well as more sophisticated version control.

This tremendous complexity of the environment to which often trivial changes are applied distinguishes sysadmins from programmers, who usually need to worry only about backups of their own programs and the data on which they operate, and take it for granted that the system is functional. Only those programmers who deal with maintenance of legacy systems can appreciate the pains of regular sysadmin work. As a superficial analogy, we can say that the "year 2000 saga" is replayed in the sysadmin context each day. To remind you, all the year 2000 fuss was about a really trivial change in old software, often without active maintainers and written in obscure languages. Even the blunders sysadmins make are different from the ones programmers usually make. See Sysadmin Horror Stories

And while systems like Puppet can be useful in certain circumstances, in reality they play a really small role in the complex set of tasks that arise in managing a large number of "somewhat different" servers. Especially if you maintain three or more different flavors of Unix/Linux, each existing in at least two different versions. And for sysadmins even such a "Spartan" version control system as creating a backup in place before each change of a configuration file works surprisingly well in most cases. The files are typically really small, and diffing the current version against previous generations is not that difficult.

There are also a few dependencies between various daemons that are not that simple and are more of the "indirect influence" type. For example, many sophisticated daemons such as SGE depend on NTP working properly. The same is true about rsync. Changing your network parameters, such as the IP address, while on an ssh connection to the same box has a nuance which you can easily forget: your connection to the box can be cut after the change. So if something goes wrong, that's it. This is a very painful situation if the server does not have a remote control unit such as DRAC or ILO.

Inventory management as an important part of Unix configuration management

There are also more mundane, but still important, things in sysadmin work that lie outside configuration management per se, but are interconnected with it and extremely important. One such task is maintaining the "passport" of each server with the relevant information about it. This can be done with an Excel spreadsheet, with a set of HTML files, or with some more complex scheme, but it needs to be done. A simple quiz that illustrates the set of problems here is to answer the following two simple questions in less than five minutes (assuming that you have access to your network at the time):

The limits of complexity and the sad reality of Unix sysadmin life: gradual loss of the knowledge of some rarely used components of a complex system 

If you do not use a particular system on a daily basis, you forget a large part of the functionality that you once knew and essentially degrade to some very basic stuff. This is the sad reality of Unix sysadmin life, and I have observed this effect on myself multiple times. For example, at some point I realized that I had forgotten, if not most, then quite a lot of the functionality of find (despite teaching Unix in the past and writing my own tutorial on find usage).

If you use something only sporadically you can never become an expert in that particular system, and you will probably eventually settle for a small subset of its functionality. In this sense rich functionality and complexity are shortcomings, not advantages, of Unix configuration management systems.


The truth is that a lot of a regular sysadmin's time is consumed by activities that are different from configuration changes and maintenance. Concentrating on just the task of config management and creating and deploying a huge system to automate this particular area by assigning a special person to it, as large organizations can afford, is also not a good approach, as such a person becomes detached from the realities of sysadmin life and more often than not starts engaging in "art for art's sake" types of activities.

You need to try to time your typical working week to see what drains your time most. And believe me, you will discover that it is not maintenance of the configuration of your systems. Dealing with users might be one such activity for certain weeks, and this is typically complicated by the fact that the ticketing system might be really horrible and more a nuisance than a help. If this is where your time goes down the drain, that means that unless specialized personnel are involved (which is possible only in large organizations and has significant drawbacks) you will use this tool only occasionally. And you will forget most of what you knew from one instance of use to another, even if you keep your own journal (as you must).

So there is a clear limit on the complexity that one can stomach, and for Unix configuration systems my hypothesis (please call it the Softpanorama hypothesis to promote this site ;-) is that this level is very low, much lower than what exists in Puppet and friends. This is somewhat similar to what I previously observed in the computer security area (Softpanorama Laws of Computer Security):

There are also some inherent limitations in the level of security achievable in any given organization. The author formulated three laws of Computer Security:

  1. In the long run, the level of security of any large enterprise Unix environment cannot be significantly different from the average level of qualification of the system administrators responsible for this environment...
     
  2. If a large discrepancy between the level of qualification of system administrators and the level of computer security of the system or network exists, the main trend is toward restoring equilibrium at some, not so distant, point...
     
  3. In a large corporate environment, incompetent people implementing security solutions are a bigger problem than most OS security weaknesses, because users tend to react to actions that decrease the user-friendliness of the system with counteractions that tend to restore it, simultaneously weakening the security level, often to a lower level than existed before. Real computer security skills presuppose not only the knowledge of what should be done, but the knowledge of where to stop in order not to cause excessive backlash. The latter skill presupposes understanding of the architecture of the environment and is completely lacking in wanna-be security specialists. If incompetents happen to be in charge of security, one should expect that they will implement the most destructive (for corporate IT) security measures dictated by the current fashion, driven by excessive zeal and the desire to survive. Measures that backfire and, due to user counteractions, create security holes bigger than the ones they are trying to patch.

So the tools should be simple, preferably very simple, with a very low learning curve, at the expense of functionality. That excludes Puppet and similar "all singing, all dancing" software packages from consideration, unless you also use them as a monitoring system. If they are used only for configuration management, their complexity is just way too much for a system administrator to handle. Preferably, at level zero, such a tool should behave exactly like pdsh. Only a few of the Unix configuration management systems that I have encountered can do that; Rex is the only one that I know of.

Many sysadmins' approach to solving Linux configuration problems is an iterative guessing game: you search Google, then try one thing, then another. This happens mainly due to the overcomplexity of the environment, when you do not really understand fully the system you are working with, and have no chance of ever advancing to that level. And solving problems when you do not fully understand the environment is like searching for a black cat in a dark room.

This is especially true for patching Red Hat (and derivative) servers, which creates a set of complex problems, unique to this particular package management system, that cause a lot of headaches. On RHEL 6.x, if you, for example, install Mellanox InfiniBand drivers, regular RHEL patching does not work unless you exclude quite a few packages. Installing R from the EPEL repository also interferes with patching of RHEL (library conflicts), but removing EPEL from /etc/yum.repos.d allows patching to proceed OK. With CentOS the problem is the set of valid repositories. Once I managed to patch a server from CentOS 6.3 to CentOS 6.7 only after replacing the content of /etc/yum.repos.d with that from a CentOS 6.7 installation (before that, most repositories listed returned code 404 -- not found, probably because version 6.3 had already been removed from those repos). This was a remote server, and using a DVD for patching was not easy, as somebody would have needed to burn it, and I forgot that I could use a USB stick instead.
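The exclusion mechanism itself is standard yum; only the package globs below are illustrative and would have to match whatever the vendor stack actually owns on a given box:

# one-off update that leaves the vendor-managed InfiniBand stack alone
yum update --exclude='kernel-ib*' --exclude='opensm*' --exclude='libmlx*'
# or permanently, in /etc/yum.conf:
#   exclude=kernel-ib* opensm* libmlx*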

No configuration management system can solve this type of problem. Sometimes with RHEL one of several "very similar" systems can be patched, but on another yum complains. Using your own private repositories helps, but not always. The fact that a typical RHEL installation consists of around 1600 packages excludes any possibility of learning them all. Most system administrators (including myself) now do not understand even the role of the daemons that are active on runlevels 3 and 5 of RHEL 6. In other words, we need to deal with a closed system.

Also, the amount of information that you need to remember is such that some of it fades away, despite being essential. Sometimes I look at my old scripts and realize that, for example, in the past I knew find much better than I know it today.

But the key problem is that system configuration tasks are rarely central to sysadmin life, and this severely limits the level of complexity of the system you can "afford". A lot of a sysadmin's time is consumed by mundane problems and by dealing with (often clueless) users and (often equally clueless) managers :-). Among such drains of time we can mention:

The last problem -- the problem with the unending stream of security patches -- probably deserves a closer look. Many security problems covered by the stream of patches emanating from Red Hat and SUSE are either impossible to exploit remotely, or not applicable to the particular environment of the datacenter where the servers are installed. Also, the existence of the NSA and CIA guarantees that a sufficient set of vulnerabilities is always present to simplify their tasks ;-)

So all those efforts belong to the category of "waving a dead chicken". Avoiding blatant architectural errors and configuration blunders might be a more modest and more realistic goal, but it is never articulated as such. Instead we are fed an unending stream of "corporate speak" (aka corporate bullshit) about the importance of security, with one Potemkin village built after another. I think Hillary could now have a very successful corporate lecture tour on this particular topic.

The task of applying this stupid stream of security patches from Red Hat or SUSE is often raised to the level of a life-or-death problem by the security department, which, in order to justify its existence, insists that they all be applied in a timely fashion, even if those patches mean absolutely nothing at the overall (often dismal) level of security of the particular organization or datacenter. For example, if all servers access the Internet via a proxy, and in addition site- and server-based firewalls are used and, hopefully, properly configured, why should we bother with vulnerabilities that target closed ports? Not only are those patches often related to services already blocked by the firewall, they often require very special conditions to exploit (for example, an account on the server). And believe me, as a former security specialist: really good exploits are sold for money to three-letter agencies and "rich" hacker groups long before (often years before) they are patched by vendors ;-)

Soon you start to hate the security people involved. And often not without reason, as they are often people dumped from other IT departments because they were useless, or people who gravitated to security themselves as an opportunity to repair their injured ego ;-). Sometimes I saw a really amazing level of security paranoia in organizations, artificially maintained by the security department in order to preserve and maintain their value (often fictional; as I mentioned before, security in reality is a problem that exists and should be solved at the level of datacenter architecture, and the last department that should be involved in deciding architectural issues is the security department).

For example, I saw organizations which deploy their own internal DNS root (so you can't resolve any external IP without going via the proxy) and simultaneously, once a month or so, send their sysadmins a list of security patches that need to be applied ASAP -- a list created by scanning servers with some third-rate vulnerability detection system that produces a lot of false positives. But the latter does not bother anybody. Instead, efforts are concentrated on reporting and on maintaining spreadsheets about the percentage of fixes accomplished. Fortunately, there are some tricks that you can deploy against those security junkies, but this is quite another topic. See Softpanorama Bulletin. Vol 23, No.10 (October, 2011) An observation about corporate security departments

Of course, we also know about the opposite cases, when extremely sensitive systems were configured and administered as if they were home systems. See, for example, Understanding Hillary Clinton email scandal. Which is not surprising and is just the opposite side of the same utter incompetence coin: extremes meet.

What problems we are trying to solve

 

"The animals were happy as they had never conceived it possible to be. Every mouthful of food was an acute positive pleasure, now that it was truly their own food, produced by themselves and for themselves, not doled out to them by a grudging master."

- George Orwell, Animal Farm, Ch. 3

"I will work harder!"

- George Orwell, Animal Farm, Ch. 3

"All that year the animals worked like slaves. But they were happy in their work; they grudged no effort or sacrifice, well aware that everything they did was for the benefit of themselves and those of their kind who would come after them, and not for a pack of idle, thieving human beings."


- George Orwell, Animal Farm, Ch. 6

The work of Unix system administrators was always hard. It often requires long hours. Like in the popular song "A Cowboy's Work Is Never Done", the work of a Unix system administrator is never done. That reminds me of the tale of Sisyphus:

In Greek mythology Sisyphus was the king of Ephyra (now known as Corinth). He was punished for his self-aggrandizing craftiness and deceitfulness by being forced to roll an immense boulder up a hill, only to watch it roll back down, repeating this action for eternity.

And that situation does not change with the invention of Unix configuration management systems. You just get more systems to manage. But we are digressing.

There are three main problems that Unix configuration systems are trying to solve:

The level of non-uniformity of the datacenter is probably the most important factor, and one that corporate IT brass does not want to address. And it by and large determines which Unix configuration management system to use, because the tool needs to support all the flavors of Unix you have. Adding Windows installations to that is probably not wise: tools that try to support both Unix and Windows usually support neither well, and should probably be rejected, because striving for that is just greed (a larger market share) and is often an architectural error (unless Cygwin is used on the Windows side).

More specific problems that Unix configuration management systems are trying to solve include:

  1. The ability to hide most of the OS differences related to configuration and patching of the servers (now, with the dominance of Linux, this is less important, although Solaris and HP-UX still remain parts of enterprise datacenters) using a domain-specific language. That was actually the initial idea behind cfengine. Also, if you use just RHEL and derivatives you can use Kickstart for deployment and yum for package management, but if you have both SLES and RHEL, your situation is more difficult.
     
  2. Reporting about the changes you made to the server yourself, and the related problem of being informed about (sometimes wrong or redundant) changes done by other sysadmins. "Change we can believe in" made by somebody else, and which produced "interesting" side effects, is sometimes pretty difficult to detect :-).
     
  3. The ability to put specific configuration files under revision control and to ease the burden of having to remember to commit changes to multiple boxes (using a distribution to a specific group of servers instead). There are attempts to use git for this purpose, but git is badly suited to Unix configuration specifics, and unless you use git heavily for software development this is a bad idea. It's barking up the wrong tree. Such packages as etckeeper can be viewed as a failure. Of course, you can always write your own set of scripts to make git work better, using it just as a storage of configuration information, but this is another story.
     
  4. Consistency checks between servers belonging to one group, and comparing the current configuration with the configuration of another server or with the configuration of the same server N days ago. Existing configuration management systems are bad at this. Baseliners are a specialized class of programs designed with this particular goal, as you can diff two baselines (which are typically text files), but there are even better, more specialized systems for this purpose. Also, existing utilities like diff and mc have unique capabilities for this purpose: few people know that GNU diff can take two directories directly as parameters without any "input substitution magic". Try something like diff /etc /Rescue/Baseline/Etc_old
     
  5. Automation of similar changes (often distribution of patches or changes in configuration) to multiple servers (for a particular server group within which this group of changes supposedly does not break anything ;-) and maintaining consistency of a set of manually modified configuration files across all servers (/etc/resolv.conf, ntp.conf, /etc/postfix/main.cf, /etc/profile, /etc/bashrc and user dot files are good examples here). This is actually not that difficult to implement using such tools as rpm, pssh, PDSH, C3 Tools, but it is somewhat better to have integrated functionality which creates an integrated log of such operations and puts them in the general context of the "lifecycle" and "workflows" for the particular set of servers.
     
  6. Automation of collecting configuration or hardware information from multiple servers, both for resource management and for bare metal recovery. In Puppet such information is called facts and there is a special utility, facter, to collect them. If you use daily backups for your systems, you also have a collection of configuration files for the system as part of the backup. The problem here is that in an enterprise datacenter, backup is bureaucratized and fossilized. Baselines of the system and private tarballs are a simpler method of keeping a collection of basic configuration information over time (usually one year is enough). Baselines are organized for ease of comparing two systems, or two states of the same system, using regular diff. Similarly, tarballs of the /etc directory can be compared with the current state using tar itself. That makes creating a tarball backup of /etc on the first root login each day (from the root profile script) of paramount importance -- see the sketch after this list. Many SNAFUs can be avoided if you have a tarball of the /etc directory made at the beginning of a particular day.
     
  7. Control of some daemons that tend to self-destruct: verification that they are running, and restarting of daemons and applications in case they died (a task typically performed by monitoring systems, which are suited for it). Paradoxically, some monitoring system agents (for example, HP OpenView) are so notoriously unreliable that you need an additional layer of software to ensure that they are running properly (the HP OpenView agent consists of half a dozen daemons that tend to die and sometimes need troubleshooting to recover; here a Unix configuration management system can be of great help). In some large enterprises, giants of thought from the monitoring group (which in feudalized enterprise IT is, of course, a separate group with its own manager and its own interests, distinct from the interests of the enterprise as a whole) automatically create tickets for sysadmins for each dead daemon (probably because re-launching daemons would distract them from watching porn on the job; this is probably the closest approximation of Sisyphus' labor in modern IT :-)
     
  8. Semi-automatic verification (and reporting of violations) of important OS settings (the set of tasks which in the old days was usually incorporated in what were called "hardening" scripts). Unix configuration systems already provide a pretty developed infrastructure that can simplify (and also complicate) a set of "system sanity" checks, such as checks of file and directory permissions, presence of various banners, absence of typical errors (blunders) in configuration files that leave the server wide open, etc. In the past there was a class of software systems designed to verify certain settings and enforce some parameters. They were known as "hardening scripts". Such early systems as COPS by Dan Farmer and Titan by Brad Powell were probably the most well known. Later Solaris JASS and Linux Bastille (badly written, but hugely promoted) became somewhat popular. Around 2010 they eventually disappeared or, more correctly, went into a semi-forgotten stage, but the idea is still valid and now can be executed on a new level: the level of scriptable Unix configuration management systems. Actually, the task of re-implementing the functionality of a typical set of hardening scripts, such as Titan, is a very good test for a particular Unix configuration management system. It gives you a much better assessment of the strong and weak points of such a system than the creation of some stupid or not so stupid "evaluation matrix" -- the sport that became alarmingly popular in the enterprise IT environment, as such a matrix can hide the responsibility for a blunder.
     

     

  9. Documentation of the life cycle of the server -- events that happened and operations performed -- and presentation of this information in a blog or wiki format. The lack of documentation and the limitations of human memory when you are dealing with the typical flow of tickets in a corporate datacenter are such that here some aid is not only desirable, it is extremely, utterly necessary. It is a survival tool. And a simple paper log that in the past was "good enough", while still useful, is not adequate at the current level of complexity. You need a Web format like a blog or wiki to help deal with this level of complexity. Unfortunately, corporate ticketing systems (help desk systems) are so bureaucratized and mismanaged that they are more an obstacle than a tool for documenting changes in the systems you manage. Here different systems, which are less controllable by the corporate bureaucracy, might help. For example, PuppetDB stores and aggregates data about changes to nodes. Dashboards provide a web interface to review the data from PuppetDB, and there are tools that utilize the same DB as a data source.

    Often a new problem in the Unix system administration domain is nothing but a well-forgotten past problem. So maintaining records of your activities in a searchable format (not necessarily a database; HTML and plain files are as good, or even better) is of paramount importance. MediaWiki is often used for this purpose too, as learning it has value beyond this particular domain. While it is complex software and uses a wiki markup which I hate, it does provide several useful tools such as discussions, versioning and other wiki services, and it is pretty well debugged (this is the engine used by Wikipedia). The ability to document your day-by-day activities, and especially blunders, or as they are now called, SNAFUs, is now an important part of life in system administration, because you will lose most of this knowledge in two or three months, and if you face the same problem again you will most likely try to reinvent the bicycle ;-). Also, people tend to repeat blunders (and different administrators are susceptible to different blunders; our shortcomings are an extension of our strong traits) unless they periodically browse their logs. Weaker folk often try to sweep their mistakes under the carpet, which usually complicates the situation. See Sysadmin Horror Stories for some telling examples.
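The daily /etc tarball mentioned in item 6 can be as simple as the following fragment for the root profile script; the archive directory and naming scheme are illustrative assumptions:

# keep one tarball of /etc per day, created on the first root login of that day
ETC_ARCHIVE_DIR=/root/Archive/etc
TODAY_TARBALL="$ETC_ARCHIVE_DIR/etc.$(date +%Y%m%d).tar.gz"
mkdir -p "$ETC_ARCHIVE_DIR"
if [ ! -f "$TODAY_TARBALL" ]; then
    tar -czf "$TODAY_TARBALL" -C / etc 2>/dev/null
fi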

This (notably incomplete) list shows pretty clearly that such systems overlap with several existing systems, first of all with monitoring systems and with RPM-based patch distribution systems such as YUM, especially YUM's ability to use private repositories. The RPM format includes the capability of running pre and post scripts. As for the overlapping functionality with monitoring systems, as I mentioned before, Puppet can definitely compete with OpenView. Actually, only when I started to view Puppet as a monitoring system competing with OpenView did its design decisions start to make some sense to me, and stop looking like horrible overkill. Agents definitely have value in monitoring systems. There is actually a book about using Puppet for pure monitoring: Puppet Reporting and Monitoring by Michael Duffy (Packt Publishing, June 24, 2014)...

Another subset of functionality definitely belongs to version control systems such as Subversion and git. Actually, a central git repository can be used as the source for distribution of changes, which allows a very well controlled mode of distribution of configuration files, with the possibility to reverse the actions, simultaneous documentation of each change, a built-in diff mechanism, etc.
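A minimal sketch of that idea is shown below; the repository URL, the checkout location and the directory layout are assumptions:

# clone once, then pull the approved changes on each managed host
[ -d /srv/config/.git ] || git clone ssh://seed.example.com/git/config.git /srv/config
cd /srv/config && git pull --ff-only
# apply only the files tracked under etc/, keeping a dated copy of anything overwritten
rsync -a --backup --suffix=".$(date +%Y%m%d)" /srv/config/etc/ /etc/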

Yet another subset of similar functionality is implemented in so-called bug trackers, as most changes include not only a description of the problem but also a set of files and other documentation that needs to be stored. Trac integrates with git and Subversion and provides a minimal but adequate wiki for documentation. See Comparison of issue-tracking systems - Wikipedia

For any integrated system there are always some overlaps with existing systems. That's the nature of the game. The problem is the quality of implementation of the particular overlapping function in comparison with the "dedicated" implementation of the same. We all know about tools that can perform many functions but can't perform any of them well. Moreover, there are some niche products that essentially undermine the whole concept of a "Swiss army knife for Unix system configuration management". For example, "environment modules" represent a specialized configuration management system for a very narrow domain -- user .bash_profile and .bashrc scripts. This package defies the concept of a Swiss army knife for Unix configuration management. The same is actually true about another unique tool for the Unix system administrator -- Midnight Commander. But here I am not an impartial observer...

Also, it is clear that there has been no clear breakthrough in this type of system yet. There is only incremental and rather slow progress, and rising complexity of this category of tools, as if complexity solved problems rather than multiplying them. No exciting or revolutionary ideas were introduced by this type of software. All of them belong to the category "same old, same old".

For example, books about Puppet (more than a dozen exist) are so boring that reading them is a real pain. And they typically advertise boring, semi-useless examples, detached from real sysadmin needs, like deploying something like the NTP daemon as an ultimate achievement. And even for this simple task, the functionality they provide is not very convincing. For example, few such examples, even those published in the books, include the most vital check after the installation -- whether the time reported by the NTP daemon after the installation is correct (and this is the major real problem with NTP installation in any large organization, as complexities such as proxies and firewalls make everything pretty convoluted). In other words, what they are doing is not very useful and is a minor enhancement of the capabilities of the existing RPM package, which with minor modification would provide the same or better functionality with less effort. Moreover, unlike learning Puppet (unless you are a Ruby enthusiast), modifying an RPM package instantly teaches you really valuable skills, which can be applied in such areas as troubleshooting library conflicts in complex software installations.
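For the record, that vital post-install check the books skip is just a few commands; the reference host in the last line is an illustrative assumption:

ntpq -pn          # peer list: at least one line should start with '*' (the selected sync peer)
ntpstat           # where available, exits 0 only when the clock is actually synchronized
date; ssh reference-host date    # crude cross-check against a host you trust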

All those considerations again beg the question: "Is the king naked?"

The fact that some of those configuration management systems are used by several large and influential organizations proves nothing: large and influential organizations are notable for using software junk, because thanks to their huge available resources, including manpower, they can make it work and, at least, appear useful. Many such systems are examples of the "let's do something" approach to creating a Unix configuration management system and lack any constructive ideas and approaches to the problem (see GNU cfengine as a classic example).

The second problem is that their "server configuration description languages" are still in their infancy. Some are not really usable, and most are far from comfortable. The typical first reaction of a normal Unix sysadmin on seeing such a description is "why the hell do I need all this additional complexity?". That makes simple tools like baseliners, and some adaptation of software version management for configuration files, more attractive, as they provide, say, 80% of the necessary functionality with 20% of the trouble.

A related problem is that they try to solve tasks that are solvable no less well by other means, and avoid tasks for which a configuration management system is of primary importance -- such as automating the patching of groups of servers and creating a visual map of complex Unix server configurations, which would allow one to understand them better and make fewer mistakes when modifying them.

Wikipedia defines configuration management in the following way:

In information technology and telecommunications, the term configuration management or configuration control has the following meanings:

  1. The management of security features and assurances through control of changes made to hardware, software, firmware, documentation, test, test fixtures and test documentation of an automated information system, throughout the development and operational life of a system. Source Code Management or revision control is part of this.
  2. The control of changes--including the recording thereof--that are made to the hardware, software, firmware, and documentation throughout the system lifecycle.
  3. The control and adaption of the evolution of complex systems. It is the discipline of keeping evolving software products under control, and thus contributes to satisfying quality and delay constraints. Software configuration management (or SCM) can be divided into two areas. The first (and older) area of SCM concerns the storage of the entities produced during the software development project, sometimes referred to as component repository management. The second area concerns the activities performed for the production and/or change of these entities; the term engineering support is often used to refer this second area.
  4. After establishing a configuration, such as that of a telecommunications or computer system, the evaluating and approving changes to the configuration and to the interrelationships among system components.

How a simple task can become a pretty complex one due to the complexity of the environment:
distribution of a changed config file to multiple servers

One typical task that any Unix configuration management system should do well is to distribute a change in a single config file to multiple servers (a server group). The task looks simple, but in a typical datacenter it actually is not, mainly due to the multiple flavors of Unix/Linux involved.

There is a lot of hidden knowledge required to implement even simple changes, and this knowledge often exists outside of any automated system. Some of it is even difficult to formalize. The main complicating factors here are the number of affected servers and the "remoteness" of some of them. The latter means that if they crash there is no simple way to get into the server room where they are located, and often there is no personnel on duty to perform anything more complex than putting a DVD into the slot (or there is no personnel at all). Feedback, in case you lose the network connection, might be limited to pictures taken with a smartphone and such. Here is a list of some complexities that may arise and precautions that might need to be taken (especially for remote servers where there is no personnel on duty at the time of the change, due to differences in time zone or other reasons); a minimal pdsh/pdcp sketch combining several of these precautions follows the list:

  1. You need to preserve the previous version of the file on each affected server. Otherwise you can't roll the change back.
  2. You need to create a "manifest" -- a list of the files you distribute to each server -- to simplify rollback.
  3. If the change is "critical", you need a set of partial or full backups of each server to be performed first.
  4. Verification of the list of affected servers that constitute the group. The problem here is that your list of affected servers (the server group) can be wrong. This is a typical problem when attempting to propagate a file to a very large number of servers. There are always some outliers.
    1. There can be "manually patched" servers that already contain configuration files with a timestamp newer than the "cutoff" for the change. In other words, files that were edited manually and now can't be "unified" without understanding what was added and why. Such files can be detected because the diff for them will be different from the "typical" one. That means that you need to save the deltas and compare them.
    2. You need a method to verify that the old version of the files does exist on all affected servers and that you are not overwriting files on the wrong servers. In the most benign version this looks like sending an updated bash profile to an HP-UX server that does not have bash, or has too old a version of bash. In a more menacing version, this looks like overwriting files on a RHEL 5.x server with configuration files belonging to RHEL 6.x.
    3. You might also need to verify additional prerequisites for the update, especially if you update the kernel. Red Hat sometimes plays bad jokes with kernel updates.
  5. Changes of daemon config files often require a restart of the daemon or other post-installation actions. If you update the configuration file of a running daemon, you often need to restart the service for the change to take effect.
  6. Verification of the change. You need some method to verify that the change actually landed on all servers in the group and, what is even more important, produced the necessary change in behaviour. Even if you view the servers as identical, there might be some hidden differences that change the behaviour of the update. For example, one of the servers might have its registration expired and no longer be able to access the needed repositories. Actually, testing the validity of registration is a must if you install some package on multiple servers.
  7. Creating documentation for this patch. A group of changes is typically viewed in system administration as a patch. It is desirable to generate some documentation about the change made "just now", so that you don't forget about it and other system administrators are aware of it too.
  8. If things go wrong. In case of a SNAFU, you need some mechanism to uninstall the change (which in this case means restoring the old file and possibly restarting the daemon or similar actions) if you find that you made a mistake or the change does not work as expected (a situation which is typically detected when it is too late).
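The sketch mentioned above, for a single file distributed with pdsh/pdcp; the group file, the staged file name and the date stamp are illustrative assumptions:

G='^/etc/admin/groups/rhel6-prod'              # group file: one hostname per line
F=/etc/ntp.conf
pdsh -w "$G" "cp -p $F $F.$(date +%Y%m%d)"     # precaution 1: keep the previous version
pdcp -w "$G" ./ntp.conf.rhel6 "$F"             # push the new version to the whole group
pdsh -w "$G" "md5sum $F"                       # precaution 6: confirm every host got the same file
pdsh -w "$G" "service ntpd restart"            # precaution 5: restart the affected daemon
pdsh -w "$G" "ntpq -pn | grep -c '^\*'"        # precaution 6: check it actually resynchronized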

See also Config files distribution: copying a file to multiple hosts

Tarball distribution as a poor man's configuration management system

  Those days we weren't considered fun
A cowboy's work is never done

... ... ...

Right, I'd like to ride again some day
I think I'd still know how to play
I play the game but it's not fun
A cowboy's work is never done

Sonny Bono
Sonny & Cher - A Cowboy's Work Is Never Done

 

Standard Linux/Unix distributions contain enough powerful tools to significantly simplify accomplishing 80% of the tasks that Unix configuration systems perform, often with less trouble and a zero learning curve. At the core of any Unix configuration management system there are two very simple concepts, which were already present in rdist, created more than 30 years ago:

I would like to stress that the most typical way Linux/Unix administrators perform configuration management tasks connected with the distribution of a set of files to multiple servers is to create a tarball that contains the changed files and then use so-called "parallel execution tools" to back up the files to be changed, apply the tarball to the target group of servers, and then verify the results. This set of parallel distribution tools typically includes, but is not limited to, PDSH, C3 Tools, rsync, and rdist. NFS or any other shared filesystem such as GPFS can also be used for this purpose and is typically used in HPC cluster environments.

In Germany, Eastern Europe, and the xUSSR area, file managers such as Midnight Commander (which allows you to compare two directories and works well with RPMs) are often used as the sysadmin tool of choice for creating tarballs with changes.

Moreover, I use it as a frontend to my own scripts and integrate them into the Midnight Commander user menu, making selecting the files that are involved in a particular operation simpler and more reliable. This visualization of what you are changing or putting into your tarball (you can have the content of the tarball visible in the second panel of Midnight Commander while adding the files) is very important for creating the "right" tarball with all the necessary files to be changed.

Visual feedback increases "situation awareness" and as such cuts down on mistakes, especially disastrous ones. So Midnight Commander can serve as an important part of the sysadmin arsenal of configuration management tools -- a "universal frontend" that passes a list of selected files to your custom scripts. It also has a primitive ability to work with remote filesystems via ssh (providing you with a virtual filesystem).

After the tarball is created, tools such as the C3 Tools cexec/cpush utilities or rsync are used to distribute it to the particular group. If you do it from a script, the group to which the change applies can be supplied via an environment variable.
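
For example, here is a minimal sketch of such a push using pdcp/pdsh from the pdsh suite; the group name, tarball name, and the assumption that groups are defined for the -g option are illustrative:

GROUP=${GROUP:-webservers}                    # group supplied via environment variable
pdcp -g "$GROUP" config-update.tgz /tmp/      # copy the tarball to every node in the group
pdsh -g "$GROUP" "tar xzf /tmp/config-update.tgz -C /"   # unpack it in place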

One nice thing to do is to put into each changed file a unique "signature" (version of the file and date of change); by grepping for it you can determine that the right file is deployed without diffing it against the "etalon". Of course this is possible only with files that allow comments, but the version can also be encoded in the time fields as the number of seconds in the file's creation date.
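
A hedged sketch of such a signature check; the comment format "# cfgver:", the file name, and the group name are assumptions:

pdsh -g webservers 'grep -h "^# cfgver:" /etc/ntp.conf' | sort | uniq -c
# every host should report the same signature line; stragglers stand out immediately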

Naturally, checking that all the right servers really received the proper version of the changed files represents the most important half of the task of deploying any change, no matter how trivial, to multiple servers.

Rsync and custom RPMs represent the level above that and can work with "seed servers" and custom repositories. The latter can contain a set of RPMs for common operations, and in order to distribute a new change you just update the RPM that implemented this change in the past. This approach is indispensable if the task at hand is more complex than a tarball can handle and needs some pre- and post-conditions and/or per-server checking of applicability. Of course, pre- and post-scripts can be integrated into a tarball as well, so RPMs are not the only game in town if you need this functionality. The advantage of RPMs is that their deployment using yum provides you with history and other goodies, which in the case of tarballs are missing, so you need to create everything from scratch, reinventing the wheel. For a large number of servers this is an important advantage.

Some sysadmins also use a version control system like git with various levels of success (pulling updated files from the central repository is not a bad way of implementing some changes; it provides instant backup of previous versions and change control). But the capabilities required for rolling back a change in system administration are still different from those that git provides. That usually involves much more than restoring the previous content of the changed files.

Still, git (used in moderation) can help to maintain the log of changes for critical configuration files and allows rolling individual files back for several generations, if necessary. Git is also easy to deploy, is not that difficult to learn at least at a basic level, and is a very useful tool for your "seed" server. You just need a set of scripts that puts all changed files into git or another version control system automatically at the end of each day, because with multiple sysadmins you can't rely on everybody who touches a configuration file using the standard commit operation. In a way this can be considered a step in the direction of a "full" Unix configuration management system.
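
A minimal sketch of such an end-of-day auto-commit, assuming /etc has already been initialized as a git repository (as etckeeper does); run it from cron:

#!/bin/bash
# Commit any uncommitted changes in /etc at the end of the day.
cd /etc || exit 1
git add -A                                     # stage new, changed, and deleted files
git diff --cached --quiet || \
    git commit -m "automatic snapshot $(date +%F) by $(whoami)"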

Unix sysadmins are seldom excited about makefiles and other tools that software developers use daily and take for granted. The last thing any Unix sysadmin wants is to write a script to distribute a single file to multiple servers or, God forbid, to list explicitly the attributes of each file, as many examples in half-baked books on this topic recommend (Puppet books are especially bad in this respect, reflecting the weakness of the system). Only to the extent that a Unix administrator is often also a programmer (most senior sysadmins know, in addition to shell, at least one scripting language at a professional level; often this is Perl or Python) might he see the analogies more clearly and enjoy this way of applying changes more. But he can also clearly see the huge differences and the shortcomings of viewing Unix configuration of multiple servers as a software development task. It simply is not. Spending an hour relearning, testing, and deploying a script written a year or two ago for distributing updates to /etc/hosts files is not time well spent. Here the problems are well known, and such a change can be implemented in 10 minutes using cpush or a similar utility with the same reliability and even version control.

It goes without saying that Unix has always had tools to simplify such tasks. For example, rdist -- a program to maintain identical copies of files over multiple hosts (it preserves the owner, group, mode, and mtime of files if possible and can update programs that are executing) -- is almost as old as Unix. Later, ssh became the de facto standard protocol for distributing files to multiple servers that do not share a common filesystem with the "seed" server (such as NFS or GPFS), for example because they are on a different continent.

The tarball method of distributing multiple configuration files to multiple servers involves several steps:

  1. Implement the changes manually on one of the servers and verify that they work.
  2. Create a tarball of changes (possibly using Midnight Commander) and a "manifest" file with the list of files (just the list of files with absolute paths).
  3. Create a backup tarball on all servers of the selected group using the manifest file (see the sketch after this list). Verify that you are replacing the same set of files on all servers (some server might have a manually edited file among the group you intend to replace). This can be done in a loop comparing the tarballs from the group, one by one, with the "pristine" set of files to be changed.
  4. Use one of the methods of distributing changes to the target group of servers.
  5. Verify the result using some custom scripts or comparison with a known working instance. This last part is usually the most complex and challenging, as servers that appear identical and belong to the same group might have idiosyncrasies about which you forgot. And even one incorrectly deployed instance defeats the idea.
  6. Restore the servers for which the update failed to their initial state using the tarball of original files created at step 3.
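
A hedged sketch of the backup step (step 3) using pdcp/pdsh and a manifest file; the group name and paths are illustrative, and rpdcp (reverse pdcp from the pdsh suite) is assumed to be available:

GROUP=webservers
pdcp -g "$GROUP" manifest.txt /tmp/                 # manifest: one absolute path per line
pdsh -g "$GROUP" \
    "tar czf /root/cfg-backup-$(date +%F).tgz -T /tmp/manifest.txt"
# pull one backup per host back to the seed server for comparison with the pristine set
rpdcp -g "$GROUP" /root/cfg-backup-$(date +%F).tgz /var/tmp/backups/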

Another major problem is how to abstract the differences between various flavors of Unix. So far I have not seen any bright ideas in this area; all efforts are primitive and ad hoc. But that is the domain where a Unix configuration system should put the most effort, as such differences are the major pain in daily sysadmin work with multiple Linux/Unix flavors. Currently I do not see anything that exceeds the usefulness of a "poor man's configuration system" that uses a seed filesystem (which can be shared, for example via NFS) with a tree structure along the following lines:

  1. A set of directories, each of which represents one flavor of Linux and contains common configuration files. For example, the "seed filesystem" might have the following set of directories:
    /Seedfs/US/NJcenter/Linux/Rhel/6/etc^hosts
    /Seedfs/US/NJcenter/Linux/Rhel/7/etc^hosts
    /Seedfs/US/NJcenter/Linux/Sles/11/etc^hosts

    In this example, the files are "flattened" by replacing "/" in the path with "^" so that they can all reside in a single directory (which simplifies editing and processing them with scripts); see the sketch after this list. All three files can be symlinked from the "lower level" directory file /Seedfs/US/NJcenter/etc^hosts -- the hosts file common for the particular datacenter. The levels of hierarchy are optional and can be adjusted to your particular situation.

    As most configuration files come from the /etc directory, you can omit the etc^ prefix. So any file name with zero "^" symbols is assumed to be from /etc.
     

  2. A set of directories that contain the set of packages that need to be additionally deployed or removed for a particular group of servers. Groups can form a hierarchical structure, with lower nesting level groups (closer to the root) containing the packages common to all higher level (less general) groups; packages can possibly be symlinked from "lower level" directories or a common repository. For example:
    /Seedfs/Rpmdb/Rhel/common/
    /Seedfs/Rpmdb/Rhel/6/Webservers
    /Seedfs/Rpmdb/Rhel/7/Webservers
  3. (Optional) A set of "compiled" "partial images" of all groups of servers -- one image per server group -- a set of system directories with the files to be distributed, from which a tarball can be created.
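
A minimal sketch of working with the flattened names described in item 1, assuming the conventions above; the seed path, group directory, and staging location are illustrative:

# Map a flattened seed-file name back to its absolute path.
unflatten() {
    local name=$1
    if [[ $name != *'^'* ]]; then
        echo "/etc/$name"               # no "^" at all: assumed to come from /etc
    else
        echo "/${name//^//}"            # etc^sysconfig^network -> /etc/sysconfig/network
    fi
}

# Copy every file for the RHEL 6 group from the seed tree into a staging image.
SEED=/Seedfs/US/NJcenter/Linux/Rhel/6
STAGE=/tmp/rhel6-image
for f in "$SEED"/*; do
    dest="$STAGE$(unflatten "$(basename "$f")")"
    mkdir -p "$(dirname "$dest")"
    cp -p "$f" "$dest"
done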

Please note that in the case of the configuration files tree you can not only symlink files from more general levels of the hierarchy, but also programmatically generate them using scripts before distribution. The key idea here is that you "compile" the image of the server, using methods developed for code generation in compilers. This compilation can involve creating a set of yum commands to deploy or remove RPMs. Then, after testing, you synchronize this compiled image with the set of real servers in the particular group using a uniform "image synchronization script". Please note that if your image is full, it can be patched "in place" using the chroot command.

The level of detail can vary. In the ultimate form, the complete image is stored in each branch of the seed directory. In this case the problem of maintaining multiple servers is reduced to the problem of maintaining multiple images in the same filesystem, which facilitates sharing of files and other tricks that simplify system administration of multiple servers. This approach is, for example, used by Bright Cluster Manager, which provides the ability to reimage a server on reboot from the image assigned to it (one image can be used by a group of servers). This idea of reimaging servers or workstations from a central image or database was also at the core of the LCFG design (which in its key ideas is quite similar to Kickstart, which, in turn, was influenced by Solaris Jumpstart).

Kickstart implements this idea differently, allowing you to recreate the image of the standard DVD using a so-called kickstart file. In this case you reinstall the server from a kickstart file and then apply a set of additional changes using post scripts to achieve the required configuration. Outside of computational nodes on clusters and other simple server configurations this approach does not work well, as each server eventually becomes too idiosyncratic to be described by your post scripts, unless you generate them automatically (which is also possible).

A variation of the same method avoids creating the tarball by putting all changes into a version control system such as Subversion or git and extracting those files on the target servers. While this is a fancier way to accomplish the same thing, it does not produce critical advantages; it is usually enough to implement version control on the seed server. The main advantage is that it provides you with far better documentation of all changes on all servers of the group, as each change is recorded in the version control system.

The alternative way, more suitable for complex changes -- when you need a script to check whether the change is applicable (a test of the timestamp is not enough) and may need to execute some "post-change" scripts -- is to create your own RPM (or, better, modify an existing one, which is easy with Midnight Commander), distribute it to all servers (or put it in your private repository), and use yum to install this RPM. In this case documentation exists within the yum logs and the RPM database.
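
A hedged sketch of pushing such a custom RPM to a group and using yum as the built-in change log; the package name, spec file, and group name are hypothetical:

GROUP=webservers
rpmbuild -ba mycorp-sshd-config.spec              # build the custom config RPM
pdcp -g "$GROUP" ~/rpmbuild/RPMS/noarch/mycorp-sshd-config-1.2-1.noarch.rpm /tmp/
pdsh -g "$GROUP" "yum -y localinstall /tmp/mycorp-sshd-config-1.2-1.noarch.rpm"
pdsh -g "$GROUP" "yum history list mycorp-sshd-config"   # deployment history for free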

The third way (suitable only if you already have SGE or a similar grid scheduler deployed) is to use the grid scheduler and write your own "envelope" script (called a submission script) that provides pre- and post-checks. Then the job can be submitted to all nodes of the group via the scheduler. This method works well for clusters and can be combined with the two previous approaches. Essentially, in this case SGE is just a higher level, more sophisticated version of a parallel execution tool that has some additional capabilities (for example, it can wait until the server CPU is not loaded) and is scalable to thousands of hosts.

So you can view it as the "next generation" of such tools as cexec or PDSH. It also brings the concept of a group of servers to a new, more sophisticated level. Of course this approach is more suitable for clusters, where a grid scheduler is deployed by default and does not need to be specifically installed. In this case you also do not suffer from a learning curve, as this is a production tool without which the cluster is not operational.

In many cases the existing generation of Unix configuration management systems can't compete in efficiency and simplicity with those "primitive" approaches, and they add very little or nothing to the capabilities of those "poor man's" Unix configuration systems.

Some not so obvious problems

There are several not so obvious problems that arise in the environment when multiple system administrators try to manage multiple intersecting groups of servers.  Among them

There are many others, but that is enough to show that any Unix configuration system has severe limitations in what it is able to accomplish. The "human factor" remains a very significant, if not decisive, factor in this business, and all this "software development" talk is, in a way, just an attempt to sweep those problems under the carpet.

Problem of many cooks in the same kitchen

No matter what Unix configuration system you use and what the major flavor of Linux in your datacenters is, you face a set of additional complex problems when several sysadmins administer multiple servers. Usually they have unequal qualifications, some of them can behave badly under stress, and due to this some unique problems arise. The system administration area is far from being a paradise, and there are several complex problems that go above and beyond distribution of changes and patches. One such problem is the problem of multiple cooks in the same kitchen and informing members of this team about each other's actions. There is a fair amount of backstabbing as well, especially if there are one or two narcissistic jerks in the team who consider themselves superstars and everybody else "trash". A "cascade of interventions" can happen with multiple administrators when something goes wrong, especially if they work different shifts, often making the situation worse. When one administrator makes some disastrous change and then denies that he made it, it is easy to get too emotional. But it is better to get technical ;-). For that you need a tool that records changes and allows you to recreate the history and reverse changes without too much drama.

Even if you are a sole system administrator for the particular group of servers it makes sense to keep track of all the small changes you make to the configuration of each of them and understand three things: 

Without supporting tools these three simple items are an impossible task, as there are way too many changes for a human mind to remember. Also, with the complexity of modern Unixes, answering the second question often represents a formidable challenge, especially a month or two after the change was done. The key problem is that you forget significant, often critical details way too soon, typically within a couple of months or even sooner after the change is made. And what is important is that people tend to forget the most crucial, complex details, recovering which later will require substantial work and Google searching, with "reinventing the wheel" taking hours, days, or even a week. Keeping a personal log (for example, in the form of a private web site on a tablet or netbook) can help if done religiously, but the complexity here is such that it is not enough.

Also, the flow of problems is relentless and often you need to deal with more than one problem in a single day. Juggling several problems is a formidable challenge, and switching from one problem to another during the day is productive only if the problems are relatively minor. For a "real" problem you need 100% concentration, and here other problems are your enemies.

But constant distractions are the reality that Unix system administrators face. Add to this long hours, and you are really ready to try any tool that can help you. But often such a tool is a false promise.

The worst problem that you face is the limitation of human memory: there are just way too many things that a Linux/Unix sysadmin needs to remember. Even the number of utilities in Linux is such that without personal notes and manpages you are often lost, and forgetting some important nuance might lead you to make some disastrous move that you somehow managed to avoid the previous time.

Stress also creates additional problems. A stressed sysadmin usually commits more errors.

Also, some forms of protection from plain vanilla stupidity are welcome. I have been in this situation several times. And believe me, such things as rebooting the wrong server are just child's play in comparison with other blunders that you can step into under stress, in a hurry, or because you are too tired (and I do recommend that you rename the reboot command on production servers to something like reboot_usdell68 as part of post-installation tuning, where usdell68 is the name of the particular server, or replace it with a script that asks a simple question: is this the right server to reboot?). How about wiping the /etc directory on a critical corporate server in the middle of the day just because you have an etc directory in your home directory and accidentally put a slash in front of the etc in the rm command? That's why backing up the /etc directory should be done on the first login to the server (from your .bash_profile) during the day :-). It's really simple to implement, and at "level 0" it can be as simple as adding the following to your .bash_profile script:

if [ ! -f ~/backup/etc/etc$(date +%y%m%d).tgz ] ; then
   mkdir -p ~/backup/etc                                   # make sure the target directory exists
   tar czf ~/backup/etc/etc$(date +%y%m%d).tgz /etc &      # daily snapshot of /etc, in the background
fi

Such an operation is almost instant on modern servers.
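
The confirmation wrapper for reboot mentioned above can be as simple as the following hedged sketch; the install path (e.g. /usr/local/sbin/reboot, relying on it preceding /sbin in root's PATH) and the wording are assumptions:

#!/bin/bash
# Ask for the hostname before allowing a reboot of a production box.
echo "You are about to reboot $(hostname). Type the hostname to confirm:"
read -r answer
if [ "$answer" = "$(hostname)" ]; then
    exec /sbin/reboot "$@"
else
    echo "Hostname mismatch -- reboot aborted."
    exit 1
fi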

But additional protection from "stupid" operations on system directories should go far beyond that. Many Linux distributions now offer a primitive but important defense against wiping critical directories such as /etc with the rm command. But we need more sophisticated mechanisms in this area that really help sysadmins to avoid unpleasant SNAFUs. Something like a "safety net" can, for example, be implemented using AppArmor. Unfortunately this very interesting idea was killed due to RHEL dominance.

The problem of unannounced or forgotten changes, missing files, "history gaps", and the importance of your own knowledgebase

It is very difficult to restore the chain of events and actions using the tiny pieces of information that you can extract from /root/.bash_history, logs (if your organization keeps them that long), and files in your home directory (which should include tarballs of the /etc directory for the last year, or at least six months). It instantly becomes clear that important things were never documented, as they were not considered important in the heat of the moment. And later another fire prevented documenting everything.

Here is where using version control for system files can really help. But having version control records is also not a panacea, because it is not enough to have records; you should also understand the logic behind the changes made. The latter is not a given. That's why using HTML and a web site format on an SSD disk for your logs is better than a paper log. Searching an SSD disk is reasonably fast and can be done using standard Unix tools such as grep, and if you document your changes even in a very simple format such as "one directory, one change" (see Perl Wiki as a System Administrator Tool), in many cases you might uncover additional useful information that you previously recorded and whose existence you had already forgotten.

Another set of problems arises when another sysadmin leaves the company and his servers are transferred to you. No matter how hard you try to obtain the necessary knowledge before he leaves the company and no matter how cooperative he is, huge gaps will be discovered in your knowledge later. And documenting those problems and the solutions found, one by one, essentially creates your own knowledge database that helps you maintain those servers with less frustration.

Of course, we have just scratched the surface of this important topic, which deserves a separate page -- see Perl Wiki as a System Administrator Tool. In a way, nothing demonstrates the limited capacity of human brains better than modern Linux systems ;-). The complexity is just overwhelming and far beyond any human abilities. And vendors are trying to fatten/secure their bottom line by continuing to increase complexity with each OS release, each of them imitating Microsoft's path to glory.

In any case, the guiding principle is that you will forget important things and need to put considerable effort into preserving the "trail of evidence" of your own activities (if not the activities of your colleagues). That's why even such a thing as keeping a log file of your daily activities via screen logging, Teraterm logging, or some other activity logging tool is a step in the right direction. You need to be your own NSA :-)

Creating deltas of /etc, the root crontab, and other critical files (including /root/.bash_history) on a regular basis is also worthwhile. They should be stored on a remote server or at least a USB drive, so that they remain available if the server's root filesystem goes south. Reading .bash_history in the morning is a good practice that helps to avoid blunders and "revive" your previous actions. And it is vital if there are several sysadmins for the same server. Comparing the previous day's version of /etc with the current one and mailing yourself the difference can be put in a cron script.
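
A minimal sketch of such a cron job, assuming GNU date and a local mail command are available; the snapshot directory and recipient are illustrative:

#!/bin/bash
# Snapshot /etc daily and mail the difference with yesterday's snapshot.
SNAPDIR=/var/backups/etc-snapshots
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)
mkdir -p "$SNAPDIR/$TODAY"
rsync -a --delete /etc/ "$SNAPDIR/$TODAY/"
if [ -d "$SNAPDIR/$YESTERDAY" ]; then
    diff -ru "$SNAPDIR/$YESTERDAY" "$SNAPDIR/$TODAY" \
        | mail -s "/etc changes on $(hostname) $TODAY" root
fi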

In any case, you need to take steps to prevent typical SNAFUs caused by misunderstanding some aspects of the OS or utilities, or by plain vanilla human error. When a serious disaster strikes a particular server you can then get to your files instantly, not after an hour of talking about retrieving backup tapes. Also, the most typical "serious" problems arise when the problem itself is trivial but the latest backup, or all backups, are unreadable due to some unfortunate confluence of factors, which elevates the problem to the level of a major SNAFU. For example, HP Data Protector can abruptly stop backing up files, and if this situation is not noticed, you are in for major problems if something bad happens to the disks and the filesystem is lost (for example, the RAID controller died, or the server room was flooded). In this case your own private backups are all that is left.

If the situation is similar to what you experienced before (and many such cases are), browsing the history and your personal log might help to revive essential facts and ideas about what you did, why you did it, and how you recovered from this problem the last time (or, if you made some blunder, how not to repeat it again ;-).

Those "memory crutches" are far from being perfect, but they better then nothing and as with the current level of overcomplexity of Linux they are a must.  A typical Linux configuration management system does not address this important area at all. They concentrated on "operations" part, which represents only a tiny subset, a tip of the iceberg of problems you face. "Knowledge database" part is probably more important. 

Putting an undue amount of effort only into the "change implementation/change control" part is just barking up the wrong tree.

"Knowledge gaps" and lost parts of your own experience, misplaces or lost files, scripts and notes,  are probably the most important problem that you face even if you are the only administrator, who administer a set of Linux/Unix servers. That's why organizing them as a web site is so important and you should not spare efforts on creating this "private knowledgebase".

The situation of many cooks in the same kitchen just adds additional stress and complexity and requires additional efforts to avoid misunderstandings, but does not present anything new in this respect.

Only your own knowledgebase can help you promptly remember the details of how you resolved previous problems with your servers when they reoccur (possibly in a new context, but still when your previous experience of solving them is vital). Even remembering critical switches and options of Unix commands and utilities (which are way too numerous and duplicate each other) is simpler with your own pages, which can be populated each time you frantically search Google and man pages for some forgotten switch, example, or combination of switches.

And they can help you understand how the system evolved over the years. Without this knowledge, dealing with complex problems can be more difficult, and if you take a wrong direction you can easily make the situation worse (especially under pressure).

So creating and keeping your own knowledge base is probably the major part of the art of modern Unix configuration management  and Unix sysadmin skills in general.

Configuration management tools are supposed to help answer the problem of "too many cooks in one kitchen" in some way by standardizing common procedures and writing scripts for them either in a standard scripting language such as bash, Perl, Python, or Ruby, or in a special "domain specific language" (DSL). This approach is more helpful if the number of administrators per server is more than one and the number of servers is more than a hundred. A medium size datacenter usually has around 100-300 "real" servers. Large datacenters are a special case anyway, and they have the resources to tackle those problems.

Tracking changes in server configuration files is critical to understanding problems and often substantially helps to find the root cause and repair the server or the OS, including security problems. Making mistakes is easy; it is troubleshooting them that is hard.

Inability to find the necessary information that you know exists somewhere

When you manage a couple of dozen systems you can no longer view each system as an individual box, and you risk catastrophic errors like making changes on the wrong box or on not enough boxes. You need to log the changes "per group", not "in general", as different groups of servers present different sets of problems. That does not exclude having a "master" journal, but the only way to get it right is to use entries from the "group" journals.

An even nastier situation arises when you make changes on the right box but using a wrong set of assumptions about it, because between changes you forgot some important facts pertaining to the box or a group of boxes.

You can use for this purpose a separate small tablet (a 7" Samsung tablet with a bluetooth keyboard works OK) or a netbook (Dell 10" netbooks work perfectly well), so that it remains portable like a paper "lab journal". And reading your journal entries pertaining to a particular group of systems before making any important changes can usually save you from a lot of trouble. Just the act of printing and reading them (if you commute by train) is often worth more than the best configuration management system. A bug tracking system can also be used as a personal journal and provides a lot of useful functionality, but I have found that such a simple tool as an HTML editor (for example, FrontPage) with each group represented as one web site is good enough too. A Perl wiki or blog engine is also a viable option.

Avoiding SNAFU due to typos, making changes to the wrong server, and other trivial blunders

The key to avoiding SNAFUs when making changes to multiple servers is strictly following a standard software development process: use an IDE with an editor that has syntax coloring, and push each change through the standard sequence of steps -- documented, tested, and only then applied. In other words, in a complex environment there are no simple changes. All changes are complex and require a full software development cycle to be successful.

It is especially important to adhere to this simple rule for remote systems, visiting which involves driving over 100 miles or, worse, an airline trip. Using corporate bullshit we can state that:

Unmanaged configuration changes impact an organization's ability to prevent outages, understand the impact of planned changes, and especially in today's regulatory environment, adhere to corporate and government policies. Knowing who changed what and when is vital to complying with today's security requirements.

Tom Perrine of the San Diego Supercomputer Center recently offered this guidance to an Internet newsgroup aimed at university security administrators. It offers sage advice for anyone managing and securing networks of heterogeneous UNIX systems. I actually do not share his excitement over cfengine -- IMHO a badly architected agent-based system. Also, in a way cfengine is a misguided attempt to reinvent TCL by a person who has no real talent for language design. As happens in such cases, such attempts lead to predictably bad results.

Let me take a small step back and philosophize from a wider perspective.

The local Cray folks have a saying: "Wanna-bees worry about GigaFLOPS, and nanoseconds; real computer companies worry about *cooling*..."

I think that the real "higher ground" is security will be won (if it ever is) in two strongly-related areas: software quality (process) and (automated) configuration management.

Let's face it, the quality of most commercial software is pretty pitiful at worst, and sub-standard at best. As an industry, we have pretty much ignored 40 years of software process research and lessons learned. The first paper on what we now call "buffer overflows" was published in 1965. This paper and those related to it was influential in the design of Multics, portions of the original UNIX system-call interface, and security kernels. They called this problem "insufficient argument validation" in those papers), and it also influenced language design and the move towards higher-level languages.

We have ignored all the "formal methods", strong specification, structured design and adequate testing strategies. We have forgotten (or never learned) all the lessons of Mythical Man-Month, Peopleware, The Psychology of Computer Programming, Software Tools, and many other books, methodologies and studies. As in the security arena, we have most of the technology and lessons figured out, we just don't apply them :-(

Configuration management is related (a part of any proper development process), but we often fail to use it in non-software-development areas, even if we do use it for software. There is no reason for a person to *ever* ask "What version of *anything*, is this?" and not get a good answer. There is *no* reason for computers to have "version drift" where patches or software are inconsistent. Again, we have the technology, whether it is cfengine, SMS, or vendor-supplied or home-grown scripts, it is just not being applied.

So why are these basic technologies not being applied? The answer is short-term thinking, similar to that that drives the quarterly earnings drives of most US companies.

Let's face it, it initially takes longer to establish a proper software development (or any other) process. You have a steeper, longer initial spending/development curve, and pay more of the costs "up front", and dramatically lower costs in the maintenance and update phase. (You also have fewer bugs to fix, pushing the support costs even lower, but I digress.)

... ... ...

So I guess I believe that "wanna-bees" worry about exploits and patches; real security people are more concerned with process and management..."

For more of my heretical views, see "Security as Infrastructure: Are you shooting rabbits, or building fences", a USENIX LISA Invited Talk.

http://www.sdsc.edu/~tep/Presentations/1998.LISA.Security.Infrastructure/index.htm

Sorry for the rant, but this has been a hot-button for several years, as you may have noticed.

Tom Perrine
San Diego Supercomputer Center

Classic "missing backup" problem


Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.

unknown source (well originally Paul McCartney :-)

The classic "missing backup problem" looks trivial, but it is not. The essence is that you made some complex change, realize that it is not desirable (or worse botched the server) and now want to restore the system from backup. And at this point you discover that backup does not exist, or exist but is corrupted, or is not full, etc.

The key solution to this problem is to reverse the course of your actions: the implementation of a change should start with verification of the backup, or with making a full backup yourself. Only in very rare cases does it take more than a few hours, so there is no excuse not to perform this step. Both Relax-and-Recover and rsnapshot allow you to use USB drives for this purpose. The largest size of USB drive is now 8TB (with larger drives in the pipeline), so it is adequate for most local backup needs.
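
A hedged sketch of such a pre-change backup with Relax-and-Recover to a USB disk; the device name is an assumption (check it with lsblk first), and the exact options may differ between ReaR versions:

rear format /dev/sdb                       # label the USB disk for ReaR use (destructive!)
cat > /etc/rear/local.conf <<'EOF'
OUTPUT=USB
BACKUP=NETFS
BACKUP_URL=usb:///dev/disk/by-label/REAR-000
EOF
rear -v mkbackup                           # bootable rescue image plus backup on the USB disk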

See also  Bare metal recovery of Linux systems

How to pick a Unix configuration management system, if you need to or are forced to

The key problem with existing configuration management systems is that it is pretty difficult to distill the key ideas they are based on and determine their worth without actually using them for a prolonged period of time. Books are mostly descriptive and tell you how the system can do this and that, not why this particular method was chosen. Articles that compare configuration management systems are mostly superficial (see, for example, Comparison of open-source configuration management software). In no way do they answer the key questions: why should I use a particular configuration management system, and does it provide real benefits in comparison with a collection of simpler tools.

When you need to choose the right system for deployment, one that corresponds to the needs of your organization, your knowledge of a particular scripting language is one of the primary factors. For example, if you know Perl well, you had better limit yourself to Perl-based Unix configuration management systems, unless you want to learn Ruby. If you only know shell well, you should think about learning one additional scripting language ASAP, but meanwhile you can choose a system that is "shell friendly" and generates target scripts in shell.

If your fleet of servers is more or less uniform and consists of different versions of the same flavor of Linux (say RHEL), you can choose a very simple tool. If you need to support, in addition to Linux, also Solaris, HP-UX, and AIX, the tool needs to be more complex, as here the differences with Linux are considerable (especially with AIX and HP-UX).

There is also the problem of the tool fitting the size of the datacenter. The problems that exist in giant datacenters like Facebook or Yahoo are quite different from the problems in a regular enterprise datacenter or in a research center like a university lab. Yahoo and Facebook can allow themselves to hire developers to help them maintain and deploy Unix configuration tools, so they have local experts. This is typically out of the question for enterprises. Enterprise IT outside financial institutions is usually understaffed and overworked. The same is true for most universities, although by definition such places are friendlier to developers of open source software. But often both just cannot afford an additional complex software system to be implemented due to the lack of manpower, even if at the end of the day it might resolve some existing problems.

In any case, without a trial period lasting at least a couple of months (60 days) it is impossible to choose the right tool. And even with a trial period mistakes can be made if you evaluated only a single tool. Such an evaluation should include at least three different tools belonging to different weight categories, with at least one of them agentless. That gives you some perspective.

If you have the freedom of choice and really need one (two big ifs), you should always pick the Unix configuration system written in the scripting language you know best, be it Perl, Python, or Ruby. If the system uses a "plain vanilla" scripting language, it is a better system for Linux/Unix administrators. How many DSLs can a regular human learn? This problem of "yet another DSL" actually kills interest in such systems, unless they are pushed by higher management (with enough thrust pigs can fly; it is just unsafe to stand where they are going to land). The Catch-22 here is the following: learning a complex system like Puppet is close to a full time job. But if this is your full time job, you are by definition not a Unix administrator anymore, and as such are useless, because you can write only simple things, not the really challenging deployment scenarios where such a system can provide real value, since you are no longer involved with day-to-day administration tasks and do not understand the interplay of complex nuances involved, an understanding which can only be obtained by doing day-to-day administration. Large rich companies such as Facebook and Google can actually bury real IT talent in such "monkey see -- monkey do" jobs and achieve some level of success (at the expense of the people involved), but for other companies this is neither possible nor desirable, as top IT talent is a scarce commodity.

Also, a sysadmin can benefit from using the whole "undiluted" scripting language and just using an API to the system. This will help him stay current in his favorite scripting language (and that's why the system should be chosen from this angle). Even if a system uses a DSL for some things, it should either be maximally close to the underlying scripting language (in which the system is written), or it should use YAML. In this sense Chef, which uses YAML, is somewhat preferable to Puppet (it also has better written books, such as Learning Chef: A Guide to Configuration Management and Automation). Still, both are pretty complex, agent-based systems, and that has a significant downside.

The second thing is simplicity. Linux is already overwhelmingly complex even without a configuration management system :-). So any system that at least declares being minimalist as its design goal is preferable to the alternatives. Simplicity also implies a low learning curve.

Simplicity also depends on whether you know the scripting language in which the system is written. You always learn quicker and will be more productive with a system that is written in a scripting language you can program in yourself. Not only will most of the conventions used be natural for you, the learning curve will also be less steep. For this reason I think picking Puppet only because it is probably the most popular Unix configuration management system is not a very wise move if you do not know (or at least want to learn in depth) Ruby.

Excessive verbosity and the attempt to be more Catholic than the Pope are clear warning signs that this is the wrong system to deploy. You can spend a lot of time learning this crap with little or no tangible results. As somebody mentioned, after an initial period of excitement such systems tend to become a nuisance rather than a help. And believe me, this happens more often than people admit. So try to choose the system wisely, because from then on you are essentially forced to use it. And it would be sad if most of the tasks it performs could be accomplished better by other means. Not having a Unix configuration management system is a much better deal than having the wrong one.

A large part of the additional functionality of the current generation of Unix configuration management systems
is just reinventing the wheel: similar or even better functionality already exists in other tools

Unix configuration management is far from a new topic. On a basic level you just need to understand who made changes to a particular system and when, and to compare two server configurations belonging to two moments of the server's life, for example the current one and the one 60 days ago. The simplest tools for this are so-called baseliners. Baselines can be taken daily and stored offsite to prevent games with their modification "after the fact". Analyzing /root/.bash_history can help but usually is not enough; it is just a useful starting point (which amplifies the importance of using timestamps in bash history -- a must for "enlightened" sysadmins).
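
Enabling the history timestamps mentioned above and taking a crude daily baseline takes only a few lines; a hedged sketch, where the file locations and the exact items captured are assumptions:

# In /etc/profile.d/histtime.sh: timestamp every entry in bash history.
export HISTTIMEFORMAT='%F %T '
export HISTSIZE=10000

# Crude daily "baseliner" run from cron: capture package list, services, and /etc.
B=/var/backups/baseline-$(date +%F)
mkdir -p "$B"
rpm -qa | sort > "$B/packages.txt"
chkconfig --list > "$B/services.txt" 2>/dev/null
tar czf "$B/etc.tgz" /etc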

After you understand what needs to be changed, think about making the change as a software development process. You need to prepare the file, document your change, test it, and then distribute it to the set of nodes using some kind of tool. If your testing was deficient and you got into a SNAFU, you need a way to reverse the change.

Tasks that go above this functionality include some more sophisticated methods of synchronizing configuration files and patches applied to similar systems. There are several already widely used "tried and true" methods besides what a typical Unix configuration management system offers. All such "alternative tools" are available as RPMs for all major Linux distributions, or can be easily installed on all your systems. Extensive literature, including books, exists about their capabilities and use.

Among them:

Those methods can be combined: for example, grid schedulers can be used to deploy RPMs or run makefiles. The tools are known, well debugged, and involve zero learning curve for most senior level system administrators.

Conclusions

  The fact is that large sections of the officer corps ... had no desire to fight for the republic, which they despised...  The constant tension between political and military leaders is exacerbated by wartime conditions.

You should not fall under the spell of the magic words "configuration management system". In your particular circumstances, and especially for smaller organizations and datacenters, it may well be useless or even harmful. Other things being equal, a simpler system or even a set of existing tools will suit your needs better than a complex one. The return on investment for additional complexity is negative in most cases. Also, social factors often play a more important role. Mismanagement in large organizations sometimes takes on a really epic scale, and here no Unix configuration system can change the situation for the better. It will remain a horror show, and you just need to suffer or quit. And with the current trend toward outsourcing and virtualization of everything that management can look at, such situations happen pretty often. Like Minsky moments in economics, they look almost inevitable.

As Unix sysadmins are overloaded and need to know way too many software packages, adding another one can break the camel's back. That means that if you choose one, you need to choose the system with the flattest learning curve, one that allows initial usage as a simple parallel execution tool, such as Rex. Only when you feel ready to delve into more complex stuff do you need to start converting your pre-existing scripts into a Unix configuration management framework. And again, no framework can replace your own brain. It is just a very limited tool that may (or may not) help you in the complex environment you face.

No Unix configuration management system exists in a vacuum. They heavily interact with other systems such as version control systems, helpdesk systems, "knowledge" wikis, and monitoring systems. Some Unix configuration systems replicate the lion's share of the functionality of Unix monitoring systems and are useful as such, as usually they are better written and better architected than "pure" monitoring systems. That also justifies some level of additional complexity (for example, the existence of agents and the need to configure, maintain, and secure them on all servers). And, at best, they solve only a small part of the problem, and probably not the most difficult part, especially in situations when organizations use a monitoring system, bug tracking system, help desk system, etc., all of which intersect with the functionality of a typical configuration management system. In other words, it is important to understand to what extent already deployed systems complement and partially duplicate the capabilities of a typical configuration management system.

Due to this factor, even such simple and well known utilities as scp, rsync, and the NFS filesystem can be productively used for automation of Unix configuration management instead of a more complex system, and they can accomplish complex tasks if used in scripts written in bash or Perl. Moreover, a typical configuration management system does not provide the functionality of baseliners (which are typically used by Linux vendors for troubleshooting complex problems) or of backup tools, especially bare metal backup tools such as Relax-and-Recover (and believe me, a recent tarball is a perfect store of all the configuration information about a particular server that you want). And the ability to restore the system after a failed deployment of changes is a must for any system that deals with a complex production environment. Changes sometimes tend to destabilize normally working systems, and understanding why this happened can take weeks or even months of your time. Even in the very simple and uniform environment of web hosting, people periodically blow their systems out of the water due to rushed changes, causing pain for thousands of users and inflicting financial losses on their organizations (some users quit after such an incident).

While tar is rarely viewed as a system configuration tool, in reality it is a very useful one. That means that knowing its capabilities (which are not trivial) in detail is really important for Unix system administrators. Creating a tarball with changes, distributing it to servers, and then untarring it can be simplified by using Midnight Commander. And you can, for example, skip changing files with a timestamp more recent than the timestamp of the change (possible "outliers" whose existence you might not have known about or forgot).
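
GNU tar can do this last trick by itself; a hedged sketch, with an illustrative tarball name:

# Extract the update, but do not replace existing files that are newer than
# the copies in the archive -- likely "manually patched" outliers.
tar xzf config-update.tgz -C / --keep-newer-files 2> skipped-files.log
# tar reports every file it refuses to overwrite; review skipped-files.log by hand.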

Remember that if you encounter a real SNAFU after some configuration change on multiple servers (especially one connected with deploying software using RPMs or other packages), often only a backup can save your skin. Returning to the initial state by reinstalling RPMs often fails. Actually, re-reading Sysadmin Horror Stories before making a complex change on multiple and, especially, remote servers is a good method of raising situational awareness. And that might be more effective than using a complex configuration management system for deployment of the change ;-).

There is a huge advantage in sticking to simple tools (the KISS principle), which does not exclude some clever ways of combining them to enhance their usefulness. The main advantage of simple tools is that they do not stand between you and the task the way complex systems do, when you need to learn how to troubleshoot them in addition to how to perform tasks with them. After all, the Unix philosophy of software development is based on the idea of reusing existing tools. That's what, for example, Unix pipes and Unix shells are about: they allow you to combine simple tools to perform very complex tasks. That means that you can concentrate on the task at hand instead of learning the intricacies of some complex and potentially not very helpful tool, which reinvents the wheel, contains its own set of bugs, gotchas, and security vulnerabilities, and requires time to learn to use properly. In Unix configuration management there is always, as people say, "more than one way to skin a cat" ;-)

When an automated tool complicates the tasks that are relatively easy (forcing you to write long descriptions of what you intend to do) and makes it more difficult to perform the tasks which are really complex, one wonders why you need such a tool at all. In this case you need the courage to say "the king is naked" and choose another path.

As a final note, please understand that Unix configuration management systems are useless for fighting the incompetence of IT management in large corporations, which can be simply staggering and can be compared with the description of military bureaucracy in The Good Soldier Švejk. For sure it produces the same mixed feelings:

“All along the line,' said the volunteer, pulling the blanket over him, 'everything in the army stinks of rottenness. Up till now the wide-eyed masses haven't woken up to it. With goggling eyes they let themselves be made into mincemeat and then when they're struck by a bullet they just whisper, "Mummy!"

Heroes don't exist, only cattle for the slaughter and the butchers in the general staffs. But in the end every body will mutiny and there will be a fine shambles. Long live the army! Goodnight!”

Jaroslav Hašek, The Good Soldier Švejk

Environment Modules can also be considered, as their area of deployment is much larger than the HPC clusters where they are typically used (there is also a more modern version written in Lua). Mastering these (actually quite complex) tools at a professional level can pay off greatly in all other aspects of sysadmin activity, besides maintaining a proper configuration of the servers. Environment Modules beat Puppet and similar tools hands down for managing user dot files and creating a standard environment for applications.

There are cases when tools such as Puppet fit the environment better, for example when the requirements are so strict that "one step left or right and the guards shoot without warning", but those are very rare cases in civil institutions.

Only when the number of servers you manage exceeds any human capacity to understand them (which is probably true for any number above 100, and can be much less if the servers are non-uniform) do you probably need a stricter, more regulated approach, which involves a switch to some (preferably agentless) Unix configuration management system. But it should be written in a scripting language that you know well (the last requirement is really important for lasting success, unless you want to learn a new scripting language) and/or be able to generate "execution scripts" in bash or Perl. Because the damage from a SNAFU when you manage hundreds of servers (or virtual instances) can be tremendous.

Tricky deployments usually make it necessary to use scripts that take care of all the special cases, testing them (possibly involving QA) and then deploying them on the groups of servers that are affected by a particular change. Custom RPMs typically work very well for such cases and have the advantage that all the necessary infrastructure is already in place (rpm, yum, etc.) and is well debugged. All you need is to create and populate a custom repository, which can also be used for deployment of "non-standard" packages whose RPMs can't be found in the standard repositories to which your systems are connected.
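
Setting up such a custom repository is a few commands; a hedged sketch in which the web root, the server name, and the package name are assumptions:

# On the seed/web server: publish the custom RPMs.
mkdir -p /var/www/html/customrepo
cp mycorp-sshd-config-1.2-1.noarch.rpm /var/www/html/customrepo/
createrepo /var/www/html/customrepo              # generate the repodata

# On each client (the .repo file can be pushed with cpush or pdcp):
cat > /etc/yum.repos.d/custom.repo <<'EOF'
[custom]
name=Local custom packages
baseurl=http://seedserver.example.com/customrepo
enabled=1
gpgcheck=0
EOF
yum -y install mycorp-sshd-config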

Another important case which warrants extreme caution and additional effort is remote datacenters. Here the number of servers does not matter, as an error on even one is very costly (it may involve a trip to some God-forsaken location), and they tend to be non-uniform. If you seriously screw something up because you forgot about important differences when making a particular change, you need either to drive more than a hundred miles or to fly to fix the mess. But what is funny, remote datacenters are far less suitable for the typical activities that Puppet tries to automate, unless you view it as a monitoring system, in which role it is definitely an OK (but not very exciting) solution.

I would like to stress that in many aspects Puppet is competitive with OpenView. Even Puppet agents in this case make sense and have a right to exist.

Often modification of existing RPMs is simpler than playing with the level of complexity Puppet designers enforce on you, unless you are a Ruby enthusiast and would like to learn it better. The same is true of all other Unix configuration management systems that Puppet competes with. They are all created by overcomplexity junkies who in reality do not care about the realities of Unix system administration work and the tradeoffs involved. Maybe initially they cared, but later development deviated into "art for the sake of art" type functionality and the "Microsoft mentality" prevailed. This inability to keep the system simple and transparent, even at the cost of avoiding some peripheral functionality, is very upsetting.

And the last thing most sysadmins need is to master yet another complex software system; we have more than enough of them already. Attempts to design yet another DSL, without attempts to standardize them or take into account the learning curve, should probably be considered a special case of software graphomania.

Graphomania ... refers to an obsessive impulse to write....

Outside the psychiatric definitions of graphomania and related conditions, the word is used more broadly to label the urge and need to write excessively, whether professional or not...

Milan Kundera ironically explains the proliferation of non-professional writing as follows:

"Graphomania inevitably takes on epidemic proportions when a society develops to the point of creating three basic conditions:

  1. An elevated level of general well-being, which allows people to devote themselves to useless activities;
  2. A high degree of social atomization and, as a consequence, a general isolation of individuals;
  3. The absence of dramatic social changes in the nation's internal life. (From this point of view, it seems to me symptomatic that in France, where practically nothing happens, the percentage of writers is twenty-one times higher than in Israel)."

— Milan Kundera, The Book of Laughter and Forgetting, 1978

That's probably why the majority of "Puppet-related" books are so utterly useless (as in "do not contain information for solving your current problems") and extremely boring to read. And again, as far as I know, few sysadmins are Ruby enthusiasts. Most probably know some Perl or Python, but while Ruby is a Perl derivative, there is a big distance from programming in Perl to programming in Ruby.

 

Dr. Nikolai Bezroukov



Old News ;-)

[Feb 04, 2017] Quickly find differences between two directories

You will be surprised, but GNU diff as used in Linux understands the situation when its two arguments are directories and behaves accordingly
Feb 04, 2017 | www.cyberciti.biz

The diff command compares files line by line. It can also compare two directories:

# Compare two folders using diff ##
diff /etc /tmp/etc_old  
Rafal Matczak September 29, 2015, 7:36 am
§ Quickly find differences between two directories
And quicker:
 diff -y <(ls -l ${DIR1}) <(ls -l ${DIR2})  

Why You Need a Configuration Management Tool to Automate IT

There are a number of reasons why automated configuration management tools play a vital role in managing complex enterprise infrastructures. Here are four of the most popular reasons:

[Oct 08, 2014] CentOS 6.1, local VMs and Opscode Chef by chrisdag

Dec 29, 2011 | bioteam.net | 3 comments

Automating Internal Infrastructure Orchestration with Chef

BioTeam maintains its internal company IT infrastructure across a distributed mix of servers hosted both “in the cloud” as well as within our own offices and colocation cages. We've long been using Opscode Chef to “orchestrate” our cloud systems and recently have found it invaluable for automatic configuration management of our own local servers and VMs.

This blog post is just a quick one-off article to highlight how well Chef plays with non-cloud systems including local virtual machines that BioTeam is running via Citrix XenServer. It was so easy to spin up a new VM (“staff.bioteam.net”) and then use a single Chef one-liner command to bootstrap the server to configure user accounts, install new software (denyhosts) and adjust the configuration of the /etc/sudoers file that I wanted to screencast and share the process.

First things first …

Thanks to Steve Danna for publishing a CentOS-6 bootstrapping template script. In the screencast below where you see me typing the “knife bootstrap …” command I'm directly invoking the bootstrapping script for CentOS 6 systems that Steve put on github.

Screencast Ahead

In the video recorded below we start with a CentOS 6.1 Linux system. The VM was created from a pre-existing barebones XenServer template and really just contains a minimal operating system and network stack with almost no installed software.

Normally in “Xen” land, I'd fire up the new VM from a template and then do manual sysadmin “stuff” to the server to make it do what it needed to do.

For this particular server (“staff.bioteam.net”) we really just needed a few things to start with:

And wouldn't you know … BioTeam ALREADY has Chef recipes to do all those things because we need them on just about every cloud server we create.

The screencast below simply shows how I can do all the tasks listed above via my personal Mac OS X laptop with a single call to the Opscode Chef CLI tool named ‘knife'. The exact command used was:

 $ knife bootstrap -d centos6-gems --ssh-user root \
 --run-list "recipe[users::sysadmins], recipe[sudo], recipe[denyhosts]" \
 staff.bioteam.net

It's literally that easy.

The video below is not edited for time in any way. It really does take less than 4 minutes to take a ‘barebones' CentOS system, install all the software dependencies, build and configure chef, download the cookbooks and runlist and then “process them”. The end result is 100% automated provisioning of a new server while I check Facebook in another browser window.

And for people new to Opscode Chef this is a great example of how powerful and flexible these “infrastructure orchestration” systems have become. The Chef client running on the new server is doing far more than just simple installs of software from remote repositories. Of course it's doing that but it's also installing personal individual SSH keys, editing the contents of the /etc/sudoers file and installing, configuring and starting a new network security service (denyhosts). Try doing that amount of “custom” server config work using a “golden image” or Kickstart type method!

Note: The text-heavy screencast may best be viewed directly on youtube.com, particularly in the “big” 720p HD version …

About the Author

Chris is an infrastructure geek specializing in the applied use of IT to enable and enhance scientific research in life science informatics environments.

[Oct 28, 2011] synctool

Written in Python
freshmeat.net

synctool is a cluster administration tool that keeps configuration files synchronized across all nodes in a cluster. Nodes may be part of a logical group or class, in which case they need a particular subset of configuration files. synctool can restart daemons when needed, if their relevant configuration files have been changed. synctool can also be used to do patch management or other system administrative tasks.

[Feb 03, 2011] SPAM

Implemented in Perl
freshmeat.net

SPAM is a tool that assists in the management of system configuration and compliance. SPAM tracks, reports on, and compares system configurations across AIX systems.

Enterprise configuration tools

[Aug 24, 2010] Etch freshmeat.net

Ruby based...

Etch is a tool for system configuration management. It manages the configuration files of the operating system and core applications. It is easy for a professional system administrator to start using, yet is scalable to large and complex environments.

pacha - Project Hosting on Google Code

Basically, any running program that uses a configuration file can use Pacha to safeguard the changes made. Easily revert from mistakes in configuration (since it is already versioned via Mercurial) and keep track of what changed and at what time.

As long as you have Python, Mercurial and SSH installed, you are good to go!

[Aug 03, 2010] Puppet vs Chef BHUGA WOOGA!

I spent a while going over recipes, and comparing them to Puppet. For example, here's some code to manage sudo for Chef. The Chef code was written by Chef's authors; the Puppet code was written by myself. The Chef code is spread across 3 files.
# recipes/default.rb:
package "sudo" do
  action :upgrade
end
 
template "/etc/sudoers" do
  source "sudoers.erb"
  mode 0440
  owner "root"
  group "root"
  variables(
    :sudoers_groups => node[:authorization][:sudo][:groups], 
    :sudoers_users => node[:authorization][:sudo][:users]
  )
end
# attributes.rb:
authorization Mash.new unless attribute?("authorization")
 
authorization[:sudo] = Mash.new unless authorization.has_key?(:sudo)
 
unless authorization[:sudo].has_key?(:groups)
  authorization[:sudo][:groups] = Array.new 
end
 
unless authorization[:sudo].has_key?(:users)
  authorization[:sudo][:users] = Array.new
end
# metadata.rb:
maintainer        "Opscode, Inc."
maintainer_email  "cookbooks@opscode.com"
license           "Apache 2.0"
description       "Installs and configures sudo"
version           "0.7"
 
attribute "authorization",
  :display_name => "Authorization",
  :description => "Hash of Authorization attributes",
  :type => "hash"
 
attribute "authorization/sudoers",
  :display_name => "Authorization Sudoers",
  :description => "Hash of Authorization/Sudoers attributes",
  :type => "hash"
 
attribute "authorization/sudoers/users",
  :display_name => "Sudo Users",
  :description => "Users who are allowed sudo ALL",
  :type => "array",
  :default => ""
 
attribute "authorization/sudoers/groups",
  :display_name => "Sudo Groups",
  :description => "Groups who are allowed sudo ALL",
  :type => "array",
  :default => ""

Here's more or less the same thing for Puppet:

class sudo {

  package { ["sudo","audit-libs"]: ensure => latest }

  file { "/etc/sudoers":
    owner   => root,
    group   => root,
    mode    => 440,
    content => template("sudo/files/sudoers.erb"),
    require => Package["sudo"],
  }
}

Both Chef and Puppet then take this information and output it through an ERB template, which is an exercise for the reader, since it's basically the same for both.
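(A side note, not from the original post: if you want to smoke-test a manifest like the Puppet class above on a single node before rolling it out, a dry run along these lines should work with reasonably recent Puppet versions; the ./modules path is an assumption.)

# dry run: show what would change without touching the system;
# assumes the class lives in ./modules/sudo together with its sudoers.erb template
puppet apply --noop --modulepath=./modules -e 'include sudo'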

There's a few things worth noting here. First of all, Puppet has zero metadata available. If you want to set sudo-able groups, you need to know those variable names ahead of time and set them to what you want. Both your template and whatever code sets your sudo-able groups must magically 'just know' this information. Since the Puppet DSL is not even Ruby, you have *zero* ability to perform any kind of metadata analysis on these attributes in order to make code more generic.

Chef gives you complete metadata about the variables it's using. This is powerful and indeed critical in my imagined use domains for Chef (keep reading). That metadata comes at a cost of a lot of boilerplate code, though. Chef comes with some rake tasks to generate some scaffolding. I'm always uncomfortable with scaffolding like this; I think this kind of code generation is a bad way to do metaprogramming.

Chef spreads this information across 3 files, named a particular way. Puppet has a similar scheme of magically named files, but it's basically just a folder structure, a file called init.pp, and templates/source files. For a fairly simple task, Chef requires you to know a folder structure and 3 file names, and which data goes in which files. This is congruent with the Ruby world's (perhaps specifically the rails/merb world's?) general practice of 'convention not configuration'. This is in addition to all of the 'you just have to know' parts of the Chef system which are taken from Merb, such as where models and controllers live, though you would not need to edit those save for pretty advanced cases.

Lastly, Chef provides you with an actual data structure that is fed to the sudoers template. Puppet simply uses available dynamically-scoped variables in its template files. This is *awful*, and a big loss for puppet. I administrate Zimbra servers, for example, which require extra content in sudoers. I cannot add this to the zimbra module unless the zimbra module were to be the one including the sudo module. There are solutions to this, of course, but this is a really, really simple use case and we're already shaving yaks. Chef's method is undeniably superior.

All 3 of these are part of the same core difference between the two: Puppet is an application, and Chef is a part of one.

Chef is a library to be used in a combined system of resource management in which the application itself is aware of the hardware it's using. This allows certain kinds of applications to exist on certain kinds of platforms (particularly EC2) that simply couldn't before--an application using this system can declare a database just as well as it can declare an integer. That's fundamentally powerful, awesome, amazing.

Puppet is an application which has an enormous built-in library of control methods for systems. The puppet package manager, for example, supports multiple kinds of *nix, Solaris, HPUX, and so forth. Chef cookbooks can certainly be written to do this, but I imagine by the time you supported everything puppet does I don't think Chef would get a smiley-face sticker for being tiny and pure with extra ruby sauce. Puppet's not a fundamental change, it's just a really nice workhorse.

I picked puppet for the project I'm working on now. It made sense for a lot of reasons. Probably first and foremost, there are 3 other sysadmins working with me, some split between this project and others. None of us are Ruby programmers. We don't write rake tasks like we configure Apache; we don't want to explain to new hires the difference between a symbol and a variable, or where the default Merb configuration files live, or 100 other Ruby-isms. Meanwhile, most Puppet config, silly folder structure aside, is not any harder to configure than something like Nagios. I think it would be a mistake for an IT shop with a lot of existing systems running various old-fashioned stateful applications like databases or LDAP to suddenly declare that sysadmins need to be Merb programmers.

Puppet's much deeper out-of-the-box support for a lot of systems provides the kind of right-now real improvements that a lot of IT shops and random contractors desperately need. System administration is depressingly rarely about being elegant or 'the best' and much more frequently about being repeatable and reliable. It's just the nature of the business--if the systems ran themselves, there would be no administrators. Having a bunch of non-programmers become not just programmers but programmers specializing in a tiny subset of the ruby world is a lot of yaks to shave for an organization. This is not some abstract jab at my colleagues: I am most certainly not a Merb programmer, and even if I were, I have too many database copies to make, SQL queries to run, mysterious performance problems to diagnose and deployments to make to give this kind of development the attention it requires. How many system administrators do you know that use the kind of TDD that Merb can provide for their bash scripts? What would make one think that's going to happen with Chef?

The other big reason I picked Puppet is that it's got a sizable mailing list, a friendly and frequently used google group for help, and remains in active development after a couple of years. I don't think Reductive Labs is going away, and if it did, there have been a lot of contributors to the code base over those 2 years.

It's worth noting, though, that the Chef guys come with an impressive set of resumes. It seems to be somehow tied in with Engine Yard (several presentations about Chef include Ezra Zygmuntowicz as a speaker). I worry, though, that they are working the typical valley business model, namely to explode about a year after launch. Chef was released about 8 months before I write this. The organization I am installing Puppet for does not have the Ruby talent base required to ensure that they can fix bugs as required in the long term if Opscode goes away, or if they get hired on to Engine Yard and they make Chef into the kind of competitive differentiation secret it could be.

Chef currently manages the EC2 version of Engine Yard, and that's just the kind of thing I cannot imagine using puppet for: interact with a giant ruby application to manage itself. If you have a lot of systems joining and leaving the resource pool as required, Chef's ability to add nodes dynamically is going to save you. The ability to define resources programmatically is very powerful--one could easily imagine reducing the number of web server threads if a system's CPU use goes over a certain threshold, for example. I would not try that in puppet! But note that this is an application built from scratch to expect such a command and control system to exist. If you're just managing a bunch of LAMP stacks and samba servers, this is more power than you need. One of the Opscode founders has some slides that talk about this kind of model.

And Chef is powerful for that model, sure, but is that even the model you want for your applications? Applications should not have to worry about the hardware they use. Making an application's own hardware use visible to itself encourages programmers to spend time thinking about issues they should be trying their hardest to ignore. A better model is App Engine's, where the system just scales forever without developer intervention. Even Azure's service configuration schema model is better, in which different application roles (web, proxy, etc) are described as resources and given a dynamic instance count, and transparently scalable data stores are available. The number of 'nodes' in the system is never an issue for either model.

Chef is what you'd use to build that auto-scaling backend. Engine Yard uses it for, well, Engine Yard--scalable rails hosting, transparently sold as a service to folks who can then just blissfully program in rails and never think about Chef. Very few organizations are making that infrastructure, and most of those that are, are shaving really big yaks and need to stop and use one of the available clouds.

Meanwhile, a very many organizations are running 6 kinds of *nix to maintain tens of older applications built on the POSIX or LAMP paradigms, or hosting virtual machines running applications made who knows when. For these organizations, Puppet is probably the easiest thing that could work, and thus probably the best option.

I'm sure there are sysadmins out there who think I'm completely wrong, and that you just can't beat the elegance Chef provides. There are a lot of people better than me out there, and I'm sure they have a point. But in my experience, bad system administration happens when sysadmins try and do everything for themselves. For a given situation in system administration, it's highly unlikely a sysadmin can do a better job than an available tool. Puppet's sizable default library is what most organizations need, not the ability to write their own.

And all of the above aside, one thing is clear: there is little excuse for an organization with 3 or more *nix servers not to be using Puppet, Chef, cfengine, or *something*. I would argue that about 80% of the virtualization push is dodging some of the core questions of system administration, making systems movable to new resources indefinitely rather than making their configuration repeatable, but that's a topic for another post. Especially since nobody got this far on this one anyway.

Adam Jacob

Hi John! Thanks for being passionate about my favorite space - configuration management. You do great work, and I know your intent wasn't necessarily to sow discord - but I wanted to take a moment to comment on a few of your points that I think are either wrong or missing some important context.

1) Large installed base

Chef has somewhere in the neighborhood of ~1500 working installations. It's true that our early adopters are primarily large web players like Wikia, Fotopedia, and 37signals. We also have a growing number of people integrating Chef directly into their service offering - it's not just Engine Yard, it's RightScale and others.

2) Large developer base

According to Ohloh, 39 developers have contributed to Puppet in the last 12 months, and 71 over the project's entire history.

Chef has been open source for a year. We just had our 100th CLA (contributor license agreement, meaning they can contribute code). Over the course of the year, 52 different people have contributed to Chef, including significant functionality (for the record, 5 of them work for Opscode.) We're incredibly proud of the community of developers who have joined the project in the last year, and the huge amount of quality code they produce.

3) Dedicated Configuration Language

To each their own, man. :) My preference for writing configuration management in a 3GL was born out of frustration with doing the higher order systems integration tasks. By definition, internal DSLs aren't meant to do that - when they start being broadly applicable, they lose the benefits they gained from domain specificity. For me, the benefits of being able to leverage the full power of a 3GL dramatically outweigh the learning curve, and I think a side-by-side comparison of the two languages shows just how close you can get to never having to leave the comfort of your DSL most of the time.

4) Robust Architecture

Chef is built to scale horizontally like a web application. It's a service-oriented architecture, built around REST and HTTP. Like cfengine, it pushes work to the edges, rather than centralizing it. There are large (multi-thousand node) Chef deployments, and larger ones coming. Chef scales just fine.

5) Documentation

It's true, we've been focused pretty intently on refining Chef in tandem with our earlier adopters, and that focus has had an impact on the clarity of our documentation. Rest assured, we're working on it.

6) Language/Framework Neutral

I'm not sure where this comes from, other than we've had great adoption in the Ruby community. People deploy and manage every imaginable software stack with Chef - Java, Perl, Ruby, PHP - it's all being managed with Chef.

7) Multi-Platform

It's true that, at release a year ago, Chef didn't support many platforms. Since then, we've been growing that support steadily - all the platforms you list run Chef just fine, with the exception of AIX. We have native packages for Red Hat (community maintained by the always awesome Matthew Kent!) and Ubuntu that ship regularly at every release. As for the Chef Server only running on Ubuntu - that's just not true.

8) Doesn't re-invent the wheel

Again, to each their own. I think Chef's deterministic ordering, ease of integration, wider range of actions, directly re-usable cookbooks, and lots of other things make it quite innovative. I'm pleased to explain it to you over beer, on my dime. :)

9) Dependency Management

While I understand how you can think this would be true, it isn't. Chef does have dependency management, and a more robust notification system than Puppet. Each resource is declarative and idempotent. Within a recipe, resources are executed in the order they are written - meaning the way you write it is the way it runs. This is frequently the way Puppet manifests are written as well. The difference being, there is no need to declare resource-level dependency relationships.

With Chef, you focus on recipe-level dependencies. “Apache should be working before I install Tomcat”. You can ensure that another recipe has been applied at any point, giving you great flexibility, along with a high degree of encapsulation.

One added benefit of the way Chef works is that the system behaves the exact same way, every time, given the same set of inputs. This greatly eases debugging of ordering issues, and results in a system that is, in my opinion, significantly easier to reason about at scale (thousands of resources under management).

10. Big Mindshare

There is a bit of survivor bias happening here. I meet people every day who are starting with, or switching to, Chef. You don't, because, well - you don't use Chef.

* Conclusion

Thanks for taking the time to write about Puppet and Chef - I know your heart is in the right place. Next time, come talk to us - we're pretty accessible guys, and I would be happy to provide feedback and education about how Chef works. I won't even try and convince you to switch. :)

Best regards,
Adam

[Aug 03, 2010] Puppet versus Chef 10 reasons why Puppet wins Bitfield Consulting

Puppet, Chef, cfengine, and Bcfg2 are all players in the configuration management space. If you're looking for Linux automation solutions, or server configuration management tools, the two technologies you're most likely to come across are Puppet and Opscode Chef. They are broadly similar in architecture and solve the same kinds of problems. Puppet, from Reductive Labs, has been around longer, and has a large user base. Chef, from Opscode, has learned some of the lessons from Puppet's development, and has a high-profile client: EngineYard.

You have an important choice to make: which system should you invest in? When you build an automated infrastructure, you will likely be working with it for some years. Once your infrastructure is already built, it's expensive to change technologies: Puppet and Chef deployments are often large-scale, sometimes covering thousands of servers.

Chef vs. Puppet is an ongoing debate, but here are 10 advantages I believe Puppet has over Chef today.

1. Larger installed base

Put simply, almost everyone is using Puppet rather than Chef. While Chef's web site lists only a handful of companies using it, Puppet's has over 80 organisations including Google, Red Hat, Siemens, lots of big businesses worldwide, and several major universities including Stanford and Harvard Law School.

This means Puppet is here to stay, and makes Puppet an easier sell. When people hear it's the same technology Google use, they figure it works. Chef deployments don't have that advantage (yet). Devops and sysadmins often look to their colleagues and counterparts in other companies for social proof.

2. Larger developer base

Puppet is so widely used that lots of people develop for it. Puppet has many contributors to its core source code, but it has also spawned a variety of support systems and third-party add-ons specifically for Puppet, including Foreman. Popular tools create their own ecosystems.

Chef's developer base is growing fast, but has some way to go to catch up to Puppet - and its developers are necessarily less experienced at working on it, as it is a much younger project.

3. Choice of configuration languages

The language which Puppet uses to configure servers is designed specifically for the task: it is a domain language optimised for the task of describing and linking resources such as users and files.

Chef uses an extension of the Ruby language. Ruby is a good general-purpose programming language, but it is not designed for configuration management - and learning Ruby is a lot harder than learning Puppet's language.

Some people think that Chef's lack of a special-purpose language is an advantage. “You get the power of Ruby for free,” they argue. Unfortunately, there are many things about Ruby which aren't so intuitive, especially for beginners, and there is a large and complex syntax that has to be mastered.

There is experimental support in Puppet for writing your manifests in a domain language embedded in Ruby just like Chef's. So perhaps it would be better to say that Puppet gives you the choice of using either its special-purpose language, or the general-purpose power of Ruby. I tend to agree with Chris Siebenmann that the problem with using general-purpose languages for configuration is that they sacrifice clarity for power, and it's not a good trade.

4. Longer commercial track record

Puppet has been in commercial use for many years, and has been continually refined and improved. It has been deployed into very large infrastructures (5,000+ machines) and the performance and scalability lessons learned from these projects have fed back into Puppet's development.

Chef is still at an early stage of development. It's not mature enough for enterprise deployment, in my view. It does not yet support as many operating systems as Puppet, so it may not even be an option in your environment. Chef deployments do exist on multiple platforms, though, so check availability for your OS.

5. Better documentation

Puppet has a large user-maintained wiki with hundreds of pages of documentation and comprehensive references for both the language and its resource types. In addition, it's actively discussed on several mailing lists and has a very popular IRC channel, so whatever your Puppet problem, it's easy to find the answer. (If you're getting started with Puppet, you might like to check out my Puppet tutorial here.)

Chef's developers have understandably concentrated on getting it working, rather than writing extensive documentation. While there are Chef tutorials, they're a little sketchy. There are bits and pieces scattered around, but it's hard to find the piece of information you need.

6. Wider range of use cases

You can use both Chef and Puppet as a deployment tool. The Chef documentation seems largely aimed at users deploying Ruby on Rails applications, particularly in cloud environments - EngineYard is its main user and that's what they do, and most of the tutorials have a similar focus. Chef's not limited to Rails, but it's fair to say it's a major use case.

In contrast, Puppet is not associated with any particular language or web framework. Its users manage Rails apps, but also PHP applications, Python and Django, Mac desktops, or AIX mainframes running Oracle.

To make it clear, this is not a technical advantage of Puppet, but rather that its community, documentation and usage have a broader base. Whatever you're trying to manage with Puppet, you're likely to find that someone else has done the same and can help you.

7. More platform support

Puppet supports multiple platforms. Whether it's running on OS X or on Solaris, Puppet knows the right package manager to use and the right commands to create resources. The Puppet server can run on any platform which supports Ruby, and it can run on relatively old and out-of-date OS and Ruby versions (an important consideration in many enterprise environments, which tend to be conservative about upgrading software).

Chef supports fewer platforms than Puppet, largely because it depends on recent versions of both Ruby and CouchDB. As with Puppet, though, the list of supported platforms is growing all the time. Puppet and Chef can both deploy all domains of your infrastructure, provided it's on the supported list.

8. Doesn't reinvent the wheel

Chef was strongly inspired by Puppet. It largely duplicates functionality which already existed in Puppet - but it doesn't yet have all the capabilities of Puppet. If you're already using Puppet, Chef doesn't really offer anything new which would make it worth switching.

Of course, Puppet itself reinvented a lot of functionality which was present in earlier generations of config management software, such as cfengine. What goes around comes around.

9. Explicit dependency management

Some resources depend on other resources - things need to be done in a certain order for them to work. Chef is like a shell script: things are done in the order they're written, and that's all. But since there's no way to explicitly say that one resource depends on another, the ordering of your resources in the code may be critical or it may not - there's no way for a reader to tell by looking at the recipe. Consequently, refactoring and moving code around can be dangerous - just changing the order of resources in a text file may stop things from working.

In Puppet, dependencies are always explicit, and you can reorder your resources freely in the code without affecting the order of application. A resource in Puppet can ‘listen' for changes to things it depends on: if the Apache config changes, that can automatically trigger an Apache restart. Conversely, resources can ‘notify' other resources that may be interested in them. (Chef can do this too, but you're not required to make these relationships explicit - and in my mind that's a bad thing, though some people disagree. Andrew Clay Shafer has written thoughtfully on this distinction: Puppet, Chef, Dependencies and Worldviews).

Chef fans counter that its behaviour is deterministic: the same changes will be applied in the same order, every time. Steve Traugott and Lance Brown argue for the importance of this property in a paper called Why Order Matters: Turing Equivalence in Automated Systems Administration.

10. Bigger mindshare

Though not a technical consideration, this is probably the most important. When you say ‘configuration management' to most people (at least people who know what you're talking about), the usual answer is ‘Puppet'. Puppet owns this space. I know there is a large and helpful community I can call on for help, and even books published on Puppet. Puppet is so widely adopted that virtually every problem you could encounter has already been found and solved by someone.

Conclusion

Currently ‘Chef vs. Puppet' is a rather unfair comparison. Many of the perceived disadvantages of Chef that I've mentioned above are largely due to the fact that Chef is very new. Technically, Puppet and Chef have similar capabilities, but Puppet has first mover advantage and has colonised most corners of the configuration management world. One day Chef may catch up, but my recommendation today is to go with Puppet.

Selected Comments

Julian Simpson:

Culture is an important reason as to why people gravitate to one tool or another. Chef will draw in Ruby developers because it's not declarative, and because it's easy.

My experience is that most developers don't do declarative systems. Everyday languages are imperative, and when you're a developer looking to get something deployed quickly, you're most likely to pick the tool that suits your world view.

Systems Administrators tend to use more declarative tools (make, etc.)

Developers and Systems Administrators also have a divergent set of incentives. Developers are generally rewarded for delivering systems quickly, and SAs are rewarded for stability. IMHO, Chef is a tool to roll out something quickly, and Puppet is the one to manage the full lifecycle. That's why I think Chef makes a good fit for cloud deployment, because VM instances have a short lifespan.

I think it's still anybody's game. The opportunity for Chef is that the developer community could build out an ecosystem very quickly.

vvuksan:

It seems to me that both system have quite a bit of support out there and it really comes down to what you as the sysadmin/developer prefer.

I would also agree with ripienaar's tweet about disagreeing with point 6. Configuration management systems are not really intended for deploying software but for making sure that systems conform to a certain policy, i.e. a webserver policy, etc.

Nick Anderson:

I'm an SA and have worked closely with developers for years. It never ceased to amaze me how differently we think. It does boil down to priorities, culture, and incentives, as Julian mentioned. I have not used Chef, but I saw quite a stir the last time I mentioned Puppet, in "Puppet Works Hard To Make Sure Nodes Are In Compliance".

I have used puppet both as a deployment tool and a configuration management tool. It really can do both just fine, as a deployment is essentially a configuration change. But I have found it easier to use a tool like fabric when I need to perform “actions” on a group of machines, especially when those actions are many and very possibly one-time. I have found it a bit daunting if you put too much into your configuration management tool, as over time it becomes a lot to sift through, and when it's time to remove a configuration you have to leave that part of the configuration there (the part that removes whatever it was).

Maybe I haven't looked around enough, but I really want to see a puppet reporting tool. I know bcfg2 has a decent one. I want to be able to know the current status of my nodes: who is in compliance, who isn't, when I last spoke with what node, the last time node X changed and what changed.

John Arundel:

It is hard to be objective - probably impossible. I'm sure I haven't been.

My background is that I've used Puppet for commercial sysadmin work for several years (basically since it came out), and it currently manages many infrastructures for many of my clients (I'm a freelancer). The biggest deployment I've worked on is probably 25-30 servers, and a comparable number of desktops. Maybe 6,000 lines of manifest code (not counting templates).

When Chef was first announced, I set aside time to build a Chef server and try it out, with a view to adopting it if it was superior to Puppet. I found it quite hard going (admittedly that was early days for Chef), and I didn't find sufficient advantages for Chef to migrate any of my clients to it. If a client asked for Chef specifically, I'd be quite happy to use it, but so far no-one has.

So based on what I know, I use Puppet and that's what I recommend to others. I'm very interested in hearing from anyone who knows different.

Anonymous

Readers, do your homework too and stop reading articles with the title ‘versus', the hallmark of propaganda. If you must read on, here are some specific points, with the disclosure that I'm a Chef early adopter with previous Puppet exposure.

#1, #2, #5, #7, #10: puppet is more mature than Chef

All software starts with a small install base, fewer adherents, etc. That doesn't make it more suitable for your specific environment or taste in software development (configuration management is development too). The answer here is to try both systems yourself and compare them - something the author of this article seems to not have done yet. It's not just about the code, it's about the software used to deploy it, the way it authenticates, etc. These things should also influence your decision.

#9: Dependency management

“Chef has no support for specifying dependencies (ordering resources). Chef is like a shell script: things are done in the order they're written, and that's all.”

Chef's default behavior is to process resources in the order you write them. It has other dependency features just like Puppet does - see below.

“A resource in Puppet can ‘listen' for changes to things it depends on: if the Apache config changes, that can automatically trigger an Apache restart. Conversely, resources can ‘notify' other resources that may be interested in them.”

This has been possible in Chef for a long time. See this real world example: http://gist.github.com/276246

http://wiki.opscode.com/display/chef/Resources - See the ‘notifies' attribute in the Meta section.

#3 Dedicated configuration language

“Ruby is a good general-purpose programming language, but it is not designed for configuration management - and learning Ruby is a lot harder than learning Puppet's language.”

Sysadmins who can code can learn Ruby quickly, and there are plenty of resources on how to write Ruby. While most of the time you can stick to the Chef style of Ruby, you have access to the power of a mature programming language for free. If you think this language is easier, show why that would be the case for someone who already knows at least one programming language.

I see nothing inherent in Puppet's language that makes it better suited to configuration management. If you think there is, show some examples.

#6: Language/framework neutral

Straight up bullshit here. There is nothing in Chef specific to Ruby on Rails. All chef deployments I know of (including our own) are used for deploying entire stacks of software totally unrelated to Ruby or Rails, just like Puppet.

Conclusion: In the next installment, show more code examples and tell us why Chef didn't work for you where Puppet did. Try both software packages the day before you write the article, not 6 months before. Assume your readers write code and already know that adopting less mature software is more risky.

R.I.Pienaar:

I'd agree with almost everything above; this strikes me as mostly self-promoting b/s written with the express intent of driving traffic to a blog. Especially given the spammy nature of its promotion.

As an aside, and I wouldn't want to distract from the fantasy here with actual facts, but Puppet is getting a native Ruby-based DSL some time soon and so will please both sides of that particular fence.

[Aug 03, 2010] Puppet Labs, Cfengine, and Chef by Opscode rPath

Configuration files contain complex information associated with a system's host environment, including settings for network, storage and other run-time resources. Application, OS and middleware configuration files typically need to be heavily modified to "contextualize" a system for its local host environment.

Today, rPath supports open source configuration tools such as Puppet, Cfengine and Opscode's Chef in two ways:

According to Sorofman: "rPath offers the most advanced capabilities available for provisioning and maintaining software systems across physical, virtual or cloud environments. Increasingly, advanced IT shops—including several rPath customers—are using tools like Puppet, Opscode's Chef and Cfengine to manage configuration settings. But they recognize that these tools are poorly suited to managing software systems, which is rPath's strength. It's a logical combination."

[Mar 6, 2010] Server configuration management track changes with subversion and be notified - VACS Blog

This is an interesting idea but not a real solution as /etc/ is a dynamic directory into which files are often installed as new packages are added. This is especially typical for Linux.
Tracking changes in a server configuration can be critical to understand problems, identify security breaches and repair a server. When several people are in charge of administering one or several servers, sharing the configuration changes is helpful to inform each other about these modifications. The article describes a simple organization that uses subversion and daily mail notifications in case of change.

The overall idea is to put the server configuration files stored in the /etc directory under a version control system: Subversion. The VCS is configured to send an email to the system administrators. The email contains the differences with a previous version. A cron script is executed every day to automatically commit the changes, thus triggering the email.

The best practice is of course that each system administrator commits their changes after they have validated the new running configuration. If they do so, they can specify a comment, which is helpful for understanding what was done.

First, you should install subversion with its tools.

sudo apt-get install -y subversion subversion-tools

Mail notification

For the mail notification, you may use postfix, exim or sendmail. But to avoid setting up a complete mail system, you may just use a simple mail client. For this, you can use the combination of esmtp and procmail.

sudo apt-get install -y procmail esmtp

Create the subversion repository

The subversion repository will contain all the version and history of your /etc. It must be protected carefully because it contains sensitive information.

sudo mkdir /home/svn
sudo svnadmin create /home/svn/repos
sudo chmod 700 /home/svn
sudo chmod 700 /home/svn/repos

Now, setup the subversion repository to send an email for each commit. For this, copy or rename the post-commit.tmpl file and edit it to specify to whom you want the email to be sent:

sudo cp /home/svn/repos/hooks/post-commit.tmpl  \
          /home/svn/repos/hooks/post-commit

and change the last line to something like (with your email address)

/usr/share/subversion/hook-scripts/commit-email.pl \
 --from yoda+mercure@alliance.com \
 "$REPOS" "$REV" yoda@alliance.com

Initial import

To initialize the repository, we can use the svn import command:

sudo svn import -m 'Initial import of /etc' \
              /etc file:///home/svn/repos/etc

Subversion repository setup in /etc

Now the hard part is to turn /etc into a subversion working copy without breaking the server. For this, we check out the subversion /etc repository somewhere else and copy only the .svn administrative directories into /etc.

sudo mkdir /home/svn/last
sudo sh -c "cd /home/svn/last && svn co file:///home/svn/repos/etc"
sudo sh -c "cd /home/svn/last/etc && tar cf - `find . -name .svn` | (cd /etc && tar xvf -)"

At this step, everything is ready. You can go in /etc directory and use all the subversion commands. Example:

sudo svn log /etc/hosts

to see the changes in the hosts file.
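A couple of other everyday checks work the same way (a short sketch; PREV is a standard Subversion revision keyword):

# what has been changed locally but not yet committed?
sudo svn status /etc
# show the last committed change to a single file
sudo svn diff -r PREV /etc/hosts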

Auto-commit and detection of changes

The goal now is to detect every day the changes that were made and send a mail with the changes to the supervisor. For this, you create a cron script that you put in /etc/cron.daily. The script will be executed every day at 6:25am. It will commit the changes that were made and send an email for the new files.

#!/bin/sh
SVN_ETC=/etc
HOST=`hostname`
# Commit those changes
cd $SVN_ETC && svn commit -m "Saving changes in /etc on $HOST"
# Email address to which changes are sent
EMAIL_TO="TO_EMAIL"
STATUS=`cd $SVN_ETC && svn status`
if test "T$STATUS" != "T"; then
  (echo "Subject: New files in /etc on $HOST";
   echo "To: $EMAIL_TO";
   echo "The following files are new and should be checked in:";
   echo "$STATUS") | sendmail -f'FROM_EMAIL' $EMAIL_TO
fi

In this script you will replace TO_EMAIL and FROM_EMAIL by real email addresses.
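The script also needs to be executable and placed in the daily cron directory; a minimal sketch, assuming you saved it under the (arbitrary) name svn-etc:

sudo install -m 755 svn-etc /etc/cron.daily/svn-etc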

Complete setup script

To help setup and configure all this easily, I'm now using a script that configures everything. You can download it: mk-etc-repository. The usage of the script is really simple, you just need to specify the email address for the notification:

sudo sh mk-etc-repository 

[Sep 11, 2008] The LXF Guide: 10 tips for lazy sysadmins (Linux Format)

Roll out changes to multiple systems

The one-button install concept should extend to other aspects of your systems, for much the same reasons. Puppet enables you to manage your systems centrally - you change files or settings in the repository on the central Puppet server, and they're rolled out automatically to all your Puppet clients. You will still have to change things twice (once on a test machine to be sure of what you're doing, then once in the central Puppet repository), but it'll save a lot of time and reduce mistakes. (Remember that it really is important to test - Puppet also makes it really fast to propagate an error across all your systems.)
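One way to reduce that risk is to preview a run on a single client before the change goes everywhere. With reasonably recent Puppet versions a dry run looks like this (a sketch, not taken from the article):

# contact the central server, show what would change, but apply nothing
puppet agent --test --noop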

... ... ...

Send commands to several PCs

Not everything that you want to do on all machines will work well with Puppet - you might, for example, want to temporarily mount a particular disk on all machines. ClusterSSH is great for this - it enables you to log onto a number of machines at once and issue the same command on all of them simultaneously. Usefully, you can also click on a particular machine's screen and issue a command just on that machine, in case one machine is misbehaving.

You can set up groups of machines, as well, so that you can log in immediately to all your servers, or all your desktops. Combine this with a root ssh key and ssh-agent, and save yourself both typing and time.
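A minimal sketch of that group setup (the cluster and host names are made up): define the groups in /etc/clusters, then open one window per host with a single command.

# /etc/clusters
webservers web1 web2 web3
desktops desk1 desk2

# log into every host in the "webservers" group at once
cssh webservers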

[Aug 25, 2008] pssh 1.4.0 by Brent N. Chun -

About: pssh provides parallel versions of the OpenSSH tools that are useful for controlling large numbers of machines simultaneously. It includes parallel versions of ssh, scp, and rsync, as well as a parallel kill command.

Changes: A 64-bit bug was fixed: select now uses None when there is no timeout rather than sys.maxint. EINTR is caught on select, read, and write calls. Longopts were fixed for pnuke, prsync, pscp, pslurp, and pssh. Missing environment variables options support was added.
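A quick sketch of typical usage (the host list file and commands are made up; hosts.txt contains one host name per line):

# run the same command on every host, printing each host's output inline
pssh -h hosts.txt -l root -i 'uptime'

# push a file to all of them in parallel
pscp -h hosts.txt -l root /etc/ntp.conf /etc/ntp.conf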

[May 6, 2008] Project details for Silk Tree by Aleksandr O. Levchuk

Ruby script

Silk Tree propagates /etc/passwd and /etc/group files from a master to a list of hosts via SSH. Neither the sending nor the receiving end connects to the other as root. Instead there is a read-only sudo sub-component on the receiver's side that makes the final modifications in /etc. Many checks are made to ensure reliable authorization updates. ACLs are used to enforce a simple security policy. Differences between old and new versions are shown. Two small scripts are included for exporting LDAP users and groups.

Project details for schily

About: The "Schily" Tool Box is a set of tools written or managed by Jörg Schilling. It includes programs like: cdrecord, cdda2wav, readcd, mkisofs, smake, bsh, btcflash, calc, calltree, change, compare, count, devdump, hdump, isodebug, isodump, isoinfo, isovfy, label, mt, p, sccs, scgcheck, scpio, sdd, sfind, sformat, smake, sh, star, star_sym, suntar, gnutar, tartest, termcap, and ved.

Changes: The source for "copy" (an accurate, sparse-file-enabled copy program) has been added. The source for the "mountcd" program from SchilliX has been added. The source for "udiff", a diff program with human-readable output, has been added. Star has been bumped to 1.5-final. bsh and sh now skip BASH time stamps from the .history file. smake adds MAKE_SHELL_FLAG/MAKE_SHELL_IFLAG macros.

[Apr 22, 2008] Project details for Multi Remote Tools

Apr 18, 2008 | freshmeat.net

MrTools is a suite of tools for managing large, distributed environments. It can be used to execute scripts on multiple remote hosts without prior installation, copy a file or directory to multiple hosts as efficiently as possible in a relatively secure way, and collect a copy of a file or directory from multiple hosts.

Release focus:

Initial freshmeat announcement

Changes:

Hash tree cleanup in thread tracking code was improved in all tools in the suite. MrTools has now adopted version 3 of the GPL. A shell quoting issue in mrexec.pl was fixed. This fixed several known limitations, including the ability to use mrexec.pl with Perl scripts and awk if statements. This fix alone has redefined mrexec.pl's capabilities, making an already powerful tool even more powerful.

[Feb 8, 2008] Project details for Scmbug

Written in Perl
Feb 8, 2008 | freshmeat.net

Scmbug integrates software configuration management (SCM) with bug-tracking. It aims to solve the integration problem once and for all. It will glue any source code version control system (such as CVS/CVSNT, Subversion, and Git) with any bug tracking system (such as Bugzilla, Mantis, Request Tracker, Test Director).

[Feb 7, 2008] System Configuration Collector 1.8.7 (Stable) by siem

Feb 7, 2008 | freshmeat.net

About: System Configuration Collector (SCC) is yet another configuration collector. It consists of a client and a server part. The client collects configuration data in a structured snapshot, compares the new snapshot with the previous one, and adds differences to a logbook.

Then the snapshot and the logbook are converted to HTML for local inspection. Optionally, the data can be sent to a system running the server software. On the server, summaries of the data are generated, and search/compare operations on the snapshots and logbooks are available via a Web interface.

Changes: Some changes to support ServerOrientedLinux have been implemented. The determination of an active name has been corrected. This release avoids messages when the LVM directory is absent on a cluster node. Config files in /etc/rc.d have been added.

[Jan 24, 2008] Project details for cgipaf

The package also contains a Solaris binary of a chpasswd clone, which is extremely useful for mass changes of passwords in corporate environments that include Solaris and other Unixes that do not have the chpasswd utility (HP-UX is another example in this category). Version 1.3.2 now includes a Solaris binary of chpasswd which works on Solaris 9 and 10.
Jan 23, 2009 | freshmeat.net

cgipaf is a combination of three CGI programs.

All programs use PAM for user authentication. It is possible to run a script to update SAMBA passwords or NIS configuration when a password is changed. mailcfg.cgi creates a .procmailrc in the user's home directory. A user with too many invalid logins can be locked. The minimum and maximum UID can be set in the configuration file, so you can specify a range of UIDs that are allowed to use cgipaf.
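For reference, the standard chpasswd that the bundled Solaris clone mimics reads user:password pairs from stdin, which is what makes scripted mass changes easy (a sketch; the user names and passwords are made up):

# change several passwords in one shot (run as root)
chpasswd <<EOF
alice:NewPass123
bob:OtherPass456
EOF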

[Jan 10, 2008] ProShield - Debian Linux security program

Written in shell. Looks very similar to Titan: a simple configuration management tool with a security/hardening bent.
ProShield is a system administration program for Ubuntu/Debian Linux. It helps ensure your system is secure and up-to-date by checking many different aspects of your system. Regular use is recommended.

Whether you are a Linux novice or a system administrator with a dozen servers, ProShield is designed to be usable by all. ProShield's main goal is to help secure a newly installed box (computer), as well as maintain the security of an existing box on a maintenance basis. It's part security, part security administration.

The main features of ProShield are:

When the program is done analyzing your system, it displays an "advisory report", and then (if necessary), guides you through a series of interactive questions to help you solve any problems it found.

[Dec 30, 2007] Project details for ns4

ns4 is a configuration management tool which allows the automated backup of node configurations.

Commands are defined within a configuration file, and when they are executed, the output is sent to a series of FTP servers for archiving. As well as archiving configurations, it allows scripts to be run on nodes; this allows configurations to be applied en masse and allows conditional logic so different bits of scripts are run on different nodes.

[Oct 1, 2007] Several useful articles

[May 24, 2007] Cultured Perl Managing Linux configuration files by Teodor Zlatanov

The idea of storing files without the full path is questionable: "In my configuration scheme, each configuration file is in a single directory or in one of its subdirectories. The configuration files are named uniquely, and the directories denote machines or platforms rather than location."
A more interesting variant of the same scheme, based on Subversion, was proposed in Tracking, auditing and managing your server configuration with Subversion in 10 minutes » The R Zone
Jun 10, 2004 | DeveloperWorks

The average developer spends more time navigating, learning, and debugging configuration files than you'd expect. But you can save that time -- and loads of energy and frustration -- with one of the tools you probably use every day: your CVS tree. Take these tips on backing up, distributing, and making portable your peskiest Linux™ (and UNIX®) config files.

Working with configuration files can be a bewildering part of using Linux and computers in general. No standards exist, though several have been proposed. For example, Samba and rsync use INI-style configurations; passwd is in a decades-old colon-separated format that doesn't allow colons in any field; sudo comes with a visudo program to keep people from entering wrong information in the sudoers file; Emacs uses Lisp for configuration files. And the list goes on...

Now, I'm not complaining about the variety of configuration files. I understand the historical and practical reasons for this Configuration Tower of Babel. Changing the Samba configuration format, for instance, would annoy thousands upon thousands of administrators. In another example, Emacs' internal language is Lisp, a powerful high-level language, so using anything else for Emacs configuration files would be ridiculous.

No, my point is the effect all this variety has on the Linux user: a large portion of a Linux user's computer time is spent learning, writing, and debugging configuration files. Thus, it is useful to have a system in which these configuration files (1) are backed up automatically, (2) are distributed automatically, and (3) work on multiple flavors of UNIX and distributions of Linux. This article explains how to achieve the first two goals, and gets you started on the road to achieving the third one.

The Plan

We'll use CVS to hold the configuration files. Feel free to use any other versioning system. Subversion is gaining popularity quickly. The FSF has GNU tla (GNU arch), another nice versioning system. The essential features you need are provided by all those and many others, including the non-free ones like Rational® ClearCase®.

In my configuration scheme, each configuration file is in a single directory or in one of its subdirectories. The configuration files are named uniquely, and the directories denote machines or platforms rather than location. Thus, the file name maps uniquely to a location in the filesystem. For example, passwd will always be used for /etc/passwd, while cshrc will be used for /home/tzz/.cshrc for user tzz.

For a few programs I use daily, I'll show how I handle multiple platforms with the help of my configuration system and changing the configuration files themselves.

All the examples I show use the C shell to set environment variables. Modifying them to use GNU bash or something else should not be terribly difficult.

Setting up CVS

You probably already have CVS installed on your machine. If not, get it (see the Resources section) and install it. If you are using another versioning system, try to set up something similar to what I show below.

First of all, you need to create a CVS repository. I'll assume you have access to a machine that can be used as a CVS server through OpenSSH or Pserver CVS access (Pserver is the communication protocol for CVS; see Resources for more information). Then, you need to create a module called config, which I will use to hold the sample configuration files. Finally, you need to arrange a way to use your CVS repository remotely non-interactively, through OpenSSH, Pserver, or whatever is appropriate. This last point is highly dependent on your particular system administration skills, level of paranoia, and environment, so I can only point you to some information in the Resources. I will assume you have configured non-interactive (ssh-agent) logins through OpenSSH for the rest of this article.

Listing 1. Set up the CVS repository on a machine

# assume that /cvsroot is your repository's home
> setenv CVSROOT /cvsroot
# this will use $CVSROOT if no -d option is specified
> cvs init
# check that it worked
> ls /cvsroot
# you should see one directory called CVSROOT
CVSROOT

Now that the repository is set up, you can continue using it remotely (you can do the steps below on the CVS server, too -- just leave CVSROOT as in Listing 1).

Listing 2. Remotely add the config module to CVS

# user tzz, machine home.com, directory /cvsroot is the CVSROOT
> setenv CVSROOT tzz@home.com:/cvsroot
# use SSH as the transport
> setenv CVS_RSH ssh
# use a temporary directory for the module creation
> cd /tmp
> mkdir config
> cd config

# tzz is the "vendor name" and initial is the "release tag", they can
# be anything; the -m flag tells CVS not to ask us for a message

# if this fails due to SSH problems, see the Resources
> cvs import -m '' config tzz initial
No conflicts created by this import
# now let's do a test checkout
> cd ~
> rm -rf /tmp/config
> cvs co config
cvs checkout: Updating config
# check everything is correct
> ls config
CVS

Now you have a copy of the config CVS module checked out in your home directory; we'll use that as our starting point. I'll use my user name tzz and home directory /home/tzz in this article, but, of course, you should use your own user name and directory as appropriate.

Let's create a single file. The CVS options file, cvsrc, seems appropriate since we'll be using CVS a lot more.

Listing 3. Create and add the cvsrc file

> cd ~/config
> echo "cvs -z3" > cvsrc
> echo "update -P -d" >> cvsrc
> cvs add cvsrc
# you really don't need log messages here
> cvs commit -m ''
> ln -s ~/config/cvsrc ~/.cvsrc

From this point on, all your CVS options will live in ~/config/cvsrc, and you will update that file instead of ~/.cvsrc. The specific options you added tell CVS to retrieve directories when they don't exist, and to prune empty directories. This is usually what users want. For the remaining machines you want to set up this way, you need to check out the config module again and make the link again.

Listing 4. Check out the config module and make the cvsrc link

> cd ~
# set the following two for remote access
> setenv CVSROOT ...
> setenv CVS_RSH ...
# now check out "config" -- this will get all the files
> cvs checkout config
> cd ~/config
> ln -s ~/config/cvsrc ~/.cvsrc

You may also know that Linux allows for hard links in addition to the symbolic ones you just created. Because of the limitations of hard links, they are not suitable to this scheme. For instance, say you create a hard link, ~/.cvsrc, to ~/config/cvsrc and later you remove ~/config/cvsrc (there are many ways this could happen). The ~/.cvsrc file would still hold the old contents of what used to be ~/config/cvsrc. Now, you check out ~/config/cvsrc again. The ~/.cvsrc file, however, will not be updated. That's why symbolic links are better in this situation.
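
As an illustration (not part of the original article), the following shell session shows why the hard link breaks; the extra link name is arbitrary:

> cd ~/config
# hard link: both names point to the same inode
> ln cvsrc ~/.cvsrc-hard
# the file goes away for whatever reason
> rm cvsrc
# checking it out again creates a brand new inode...
> cvs update cvsrc
# ...so the inode numbers now differ, and ~/.cvsrc-hard keeps the stale contents
> ls -i cvsrc ~/.cvsrc-hard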

Let's say you change cvsrc to add one more option:

Listing 5. Modify and commit cvsrc

> cd ~/config
> echo "checkout -P" > cvsrc
> cvs commit -m ''

Now, to update ~/.cvsrc on every other machine you use, just do the following:

Listing 6. Update the config module

> cd ~/config
> cvs update

This is nice and easy. What's even nicer is that the CVS update shown above will update every file in ~/config, so all the files you keep under this CVS scheme will be up-to-date at once with one command. This is the essence of the configuration scheme shown here; the rest is just window dressing.

Note that once you've checked out a module, there's a directory in it called "CVS." The CVS directory has enough information about the CVS module that you can do update, commit, and other CVS operations without specifying the CVSROOT variable.

Automatic updates and commits

For automatic updates and commits, I have written a very simple Perl program, maintain.pl. The longest part of the program is the help text, so you can imagine it's not full of complex code. I will go through it regardless, but keep in mind that a shell script could do the same job if needed.

The only thing maintain.pl does not do is make the symbolic links. That has to be done only once, and on some systems you do not want all of the links anyway, so the complexity of automating the task outweighed the simplicity of doing it manually. I know because I wrote the symbolic-link code and got rid of it later.

To support it, I had to write and maintain yet another configuration file that mapped out many filenames, and there were many exceptions -- for example, the Linux and Solaris systems I use have radically different setups. There were just too many things to worry about, and I found that installing the links manually was much easier. Of course, your experience may vary -- I encourage you to find the most appropriate approach for your own environment.
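
The listing of maintain.pl itself is omitted here; as a very rough sketch of the idea (not the author's actual program), an automatic update-and-commit pass amounts to something like this:

#!/bin/sh
# illustrative sketch only, not the original maintain.pl
cd $HOME/config || exit 1
# pull in whatever was committed from other machines
cvs -q update -d -P
# push any local edits back to the repository
cvs -q commit -m "automatic commit from `hostname`"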

... ... ...

Conclusion


I hope you found this article interesting and useful. Take what you can from it -- I've spent years perfecting my setup, and it should stand you in good stead.

Convert to this scheme a little at a time so you don't get overwhelmed. You can easily spend days rewriting your configurations -- do it gradually and you'll enjoy the process.

The greatest benefit you'll see is the automatic update function. On any of your machines, you can commit a file and it will show up everywhere else the next time maintain.pl is run! Even if you disagree with the directory structure, think about the power of the automatic updates and how they can be useful to you.

The second benefit you get is configuration archiving. Every version of your configurations will be in the revision control system! If you make a mistake, you can go back to an earlier version. If you lose a whole machine to, say, disk failure -- you can recover all the time-consuming configuration files you wrote for it in minutes.

Don't be tempted to convert everything to this scheme. Convert just the things you want to keep or reuse. Binary files don't work well with CVS -- at the very least, you won't have the diff capability that CVS provides for text files. Also, CVS has trouble with renaming directories, although it's certainly possible if you also rename the directory in the repository.

Finally, keep good backups of your CVSROOT repository, wherever it is. I hope you never need them.

Resources

About the author
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since 1992, using Perl, Java™, C, and C++. His interests are in open source work, Perl, text parsing, three-tier client-server database architectures, and UNIX system administration. Suggestions and corrections are welcome; contact Ted at tzz@bu.edu

[May 23, 2007] freshmeat.net Project details for MID

The Machine Inventory Database (MID) is a Perl-based CGI interface for managing the machines on and off your network, from both the IP-assignment and the asset-tracking perspectives. On top of acting as a frontend to a handful of MySQL tables, it handles IP assignment and acts as a frontend to the configuration files for BIND, YP, and DHCPD, reducing the chance of typos in the configuration files, which tend to bring down services.

[May 22, 2007] Linux Distros with CVS-RCS for Config Files

Slashdot

Just do it (Score:1)
by choi (189590) on Monday July 19, @07:47PM (#9743061)

nothing prevents you from just installing cvs and importing/checking out your config directories. i think it's really not that much work to justify a distro on its own.

Do it yourself (Score:1) Matt Perry (793115) on Monday July 19, @07:51PM (#9743096)

Why not just do it yourself? I keep all of my config files in CVS on my Debian and RedHat boxes. It's pretty easy to set things up to do this.

Gentoo does this. (Score:4, Informative)
by djcapelis (587616) on Monday July 19, @07:54PM (#9743121)
(http://new.se.foml.inodetech.com/)

Gentoo offers several choices for managing the configuration files in /etc; one of these options is the dispatch-conf script, which keeps all changes in RCS. This is mostly for updating... so it's not everything, but it's definitely a strong start and you could likely use the same system to keep track of your own modifications.

Nothing is stopping you from doing this. (Score:5, Informative)
by Feztaa (633745) on Monday July 19, @08:02PM (#9743198)
(http://rbpark.ath.cx/ | Last Journal: Wednesday June 30, @04:56AM)

Just go into your /etc/, do a 'mkdir RCS', and then start checking your config files in and out of RCS to edit them. There's no code anywhere in linux that says 'if there's a directory I don't recognize, then crash spectacularly', so just adding the RCS directory itself isn't going to adversely affect anything.

That's actually a really good idea, too, I'm not sure why I never thought of it myself...
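
[A minimal illustration of the RCS round-trip described above; the file name is just an example.]

[root@localhost]# cd /etc
[root@localhost]# mkdir RCS
[root@localhost]# ci -u resolv.conf     (initial check-in; -u keeps a working copy in place)
[root@localhost]# co -l resolv.conf     (lock and check out the file for editing)
[root@localhost]# vi resolv.conf
[root@localhost]# ci -u resolv.conf     (check the change back in)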

works for my user accounts (Score:5, Interesting)
by x00101010x (631764) on Monday July 19, @08:04PM (#9743224)
(http://slashdot.org/~x00101010x/ | Last Journal: Monday February 16, @04:44AM)

I keep my entire home directory in a Subversion repository. Works great for linux and my windows boxes. Firefox and thunderbird user directories are compatible across platforms.

I just add 'svn up' to my login script and 'svn ci --message "%HOST%@%TIME%%DATE%"' to my logout script.
No reason it shouldn't work for a whole system with an initial 'svn up' somewhere in rc.local and periodic updates in a cron job. Just do a commit whenever you change things on your template system and 5 minutes later it'll be on all your boxen.

There was a slashdot article about putting a home directory under version control a few months ago from which I got the idea, too lazy to find the link at the moment though.

BitKeeper (Score:2)
by twoflower (24166) on Monday July 19, @08:36PM (#9743457)

Larry McVoy designed BitKeeper with the specific aim of doing this. I believe they also offer special single-user free licenses for this; you may want to check the BitKeeper documentation to see if there are any Linux distributions who actually took him up on this.

Most distros have CVS installed, right?
by Neil Blender (555885) on Monday July 19, @08:41PM (#9743490)

so:

[user@localhost]# su
password:
[root@localhost]# cd /
[root@localhost]# cvs import -m 'my linux distro' mydistro username start

Yes, Gentoo... (Score:2, Informative)
by andrewdk (760436) on Monday July 19, @08:49PM (#9743568)
(http://turbogfx.homelinux.org/)

Yes, Gentoo can do this. Just emerge rcs, make an /etc/config-archive dir, setup /etc/dispatch-conf.conf, and just do dispatch-conf in place of etc-update.

An old idea for modern times... (Score:3, Insightful)
by Deagol (323173) on Monday July 19, @11:24PM (#9744742)
(http://slashdot.org/)

I think it was OpenVMS (fuzzy memories of a freshman computer class) that had version control built into the filesystem. I'm amazed that this hasn't been introduced into the more popular filesystem(s) yet. I've wished for it on many occasions.

Or am I just being impatient? Will Reiser4 provide this capability?

FreeBSD (Score:2, Interesting)
by Scythe0r (197724) on Tuesday July 20, @12:14AM (#9745206)

You should really check out a utility for FreeBSD called mergemaster. You run it after rebuilding/upgrading your system and it compares the latest "vanilla" system configuration files to what you've got.

You can choose to overwrite your file, keep your file or merge the two together. I like to think of it as the ultimate choice in system housekeeping.

System Restore (Score:2)
by yotaku (26455) on Tuesday July 20, @01:51AM (#9745748)
(http://yotaku.homeip.net/)

As many people have pointed out, having versioning on the config of a system is hardly a new idea. If you try to make the idea simple and easy to use, it might end up being something like System Restore for Windows, which stores versions before updates (and, if you're smart, you make a checkpoint before installing any questionable software or drivers) and then lets you roll back if something goes wrong and the uninstall doesn't fix it.

changetrack (Score:1)
by Christopher Cashell (2517) on Tuesday July 20, @04:19AM (#9746397)
(http://www.zyp.org/ | Last Journal: Saturday August 18, @01:39AM)

sudo apt-get install changetrack
For non-Debian users, download changetrack [sourceforge.net] from SourceForge.

changetrack uses RCS as its backend, not CVS (support for CVS is on the Todo list), but the end result is the same. It is specifically intended for tracking system files like those in /etc.

dispatch-conf (Score:1)
by trickycamel (696375) on Tuesday July 20, @09:21AM (#9747791)

Gentoo does this for your files in /etc. Use dispatch-conf and forget about etc-update. You can set it to use RCS, so no more overwrites of your configs.

RCS and vim (Score:1)
by wolf31o2 (778801) on Tuesday July 20, @10:43AM (#9748872)
(http://www.gentoo.org/)

At work, we have a simple wrapper for vim that does all of the RCS stuff for us, like checking in and checking out files. We use it on all of our production servers, as it gives us nice revision control over our files.

#!/bin/bash

ORIGVI=rcsvi

case $1 in
-r[0-9]*) VERSION=$1; shift ;;
esac

[ $# -eq 1 ] || { echo usage: vi [-rrev] filename >&2; exit 1; }
DIR=`dirname $1`
FILE=`basename $1`

### let vi handle error conditions
cd $DIR || exec $ORIGVI $1
[ -d $FILE ] && exec $ORIGVI $FILE

### skip certain directories
{ [ -r $HOME/.rcsvirc ] && . $HOME/.rcsvirc; } ||
{ [ -r /etc/rcsvirc ] && . /etc/rcsvirc; } ||
EXCLUDE="/tmp | /tmp/* | /etc/skel | /etc/skel/* | /home | /home/* | /usr/home | /usr/home/*"
[ -n "$EXCLUDE" ] && eval "case $PWD in $EXCLUDE) exec $ORIGVI $FILE ;; esac"

### create RCS directory if not exist
[ -d RCS ] || { mkdir RCS || exit $?; }

### check $FILE for existence, break possible lock or exit, check in
[ -e $FILE ] && { [ -e RCS/$FILE,v ] && { rcs -l $FILE || exit 1; }
ci -q -l $FILE </dev/null; }
[ -n "$VERSION" ] && { co $VERSION $FILE; chmod u+w $FILE; }

### edit $FILE
$ORIGVI $FILE

### check in $FILE
ci -u $FILE

# EOF

cfengine (Score:2, Informative)
by bandix (184495) on Tuesday July 20, @06:36PM (#9754240)
(http://www.geekpunk.net/)

You'll spend years fooling around with RCS and CVS for configuration versioning before realizing that what you really need is cfengine. CVS or svn for source code, cfengine for configuration. Cut to the chase:

http://www.cfengine.org/

[May 21, 2007] freshmeat.net Project details for cvs2cl.pl

Perl script to generate GNU-style ChangeLogs for CVS

cvs2cl.pl generates GNU-style ChangeLogs for a CVS working copy. There are many options to control the output.

[Dec 3, 2006] UsingCfrubyTutorial - SciRuby

Cfruby allows managed system administration using Ruby, by David Powers and PjotrPrins. It is both a library of Ruby functions for system administration and a Cfengine-like clone. Cfruby is currently deployed on servers, clusters and workstations. See below for an introduction to both.

Cfruby can be downloaded from http://rubyforge.org/projects/cfruby/ as a gem. You can also access the SVN repository through the Rubyforge web interface.

It is important to understand that Cfruby is really two in one:

  1. Cfrubylib is a pure Ruby library with classes and methods for system administration. This includes file copying, finding, checksumming, package management, user management etc. etc.
  2. Cfenjin is a simple scripting language for system administration - allowing for scripting of configuration tasks (without knowledge of Ruby). Naturally Cfenjin uses Cfrubylib itself.

So, if you are looking for a Ruby API check out Cfrubylib. But if you are looking for a scripting language check out Cfenjin.

To confuse matters more: you can use Ruby mixed with Cfenjin style scripting - but that is for those who have a weird streak - also known as geekishness.

Cfrubylib

Cfrubylib is a Ruby library for system administration. It can do most of the common tasks like file tidying, editing etc. etc. Best to study the API and code in:

http://cfruby.rubyforge.org/cfrubylib/

and the source repository:

http://rubyforge.org/viewvc/lib/libcfruby/?root=cfruby

More written documentation can be found in the source repository:

http://rubyforge.org/viewvc/documentation/libcfruby/?root=cfruby

Why reinvent the wheel? You'll find it gives a lot more power than most configuration tools. Cfrubylib includes cfyaml - a YAML configurator - and support for FreeBSD Portage, Linux Debian, Linux Gentoo and OS-X Fink packages. Adding support for your favourite package manager should be straightforward.

Cfenjin

Cfenjin is a GNU Cfengine clone written in Ruby. It does not offer a full replacement for Cfengine (for one, we don't have a client/server protocol, though cfrubylib has some support for that itself) - but it is Ruby and consists of only a few lines of code using Cfrubylib.

Bits and pieces of documentation have been written, but for now it is probably best to study the examples in:

http://rubyforge.org/viewvc/documentation/cfenjin/examples/?root=cfruby

after reading the tutorial below.

Enjoy!

[May 27, 2006] Tracking, auditing and managing your server configuration with Subversion in 10 minutes

The R Zone

I'm assuming that you have Subversion installed; in other words, you should have the svn and svnadmin commands and they should work properly. I'm also assuming that you'll be performing the following tasks as root.

The ideal situation to begin applying this tutorial is right after your server has been freshly installed. However, for practical purposes, any server that's configured and running will do.

Okay. That's enough of the lists and introductions. Time for some action.

Creating the Subversion repository

If you're familiar with UNIX, you'll know /var is the customary directory for system-wide files that change over time. So, following tradition, we'll create the repository in /var/preserve/config. Type the following command at your console:

[rudd-o@amauta2 ~]# svnadmin create \
    /var/preserve/config

(the backslashes are only used to break long commands across lines)

That should create a /var/preserve/config directory, with a couple of files in it. Those files are not meant to be edited, and they'll be opaque to us for the rest of the tutorial. As usual, I'd advise you to secure that directory so only root can read and write files in it.

Now, you'll create two directories directly in the repository. You'll use these directories to travel back and forth between known configuration states.

To perform this task, just type:

[rudd-o@amauta2 ~]# svn mkdir \
    file:///var/preserve/config/trunk/ \
    file:///var/preserve/config/tags/ \
    -m 'Creating trunk and tags directories'

The -m argument specifies a message to attach to the operation. You can consult these messages afterwards through the svn log command.

Preparing the configuration directory

In true UNIX tradition, /etc is the place to go for system-wide configuration. For the rest of the tutorial, I'll assume those are the files you want to keep in check.

To track files in /etc, you need to turn /etc into a working copy of the repository's trunk and then put the existing files under version control.

The first step is easily accomplished via the following command:

[rudd-o@amauta2 ~]# svn checkout \
    file:///var/preserve/config/trunk/ /etc

Once you've done that, /etc will be a working copy. Time to add existing files into Subversion.

Checking existing configuration files into the repository

[rudd-o@amauta2 ~]# cd /etc
[rudd-o@amauta2 /etc]# svn status

You should see a long listing of files, like this:

?      4Suite
?      acpi
?      adjtime

The question marks at the beginning of each line mean that Subversion has no idea what those files are doing there. So, you'll add them to the repository:

[rudd-o@amauta2 /etc]# svn add *

You'll see svn working intensely to add those files. Note that the files are not being added to the repository yet — they're only being queued for addition. To commit these files into the repository:

[rudd-o@amauta2 /etc]# svn commit \
    -m 'Initial addition of files'

And svn should start doing its magic. Once it's done, it'll tell you the revision number.

Followup maintenance

Okay, let's review a few things you need to keep in mind from now on.

When configuration files are added to /etc

Check for added files with svn status /etc. You should see them listed with a question mark.

You should use svn add to add them to the working copy, and then svn commit the added files into the repository. Many people make the mistake of configuring freshly installed files. Do not do that. Instead, commit new files first, then edit. That way, you'll have a way to track modifications right back to the pristine configuration files.
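
For instance, after a package drops a new file into /etc, the sequence might look like this (the file name is just an example):

[rudd-o@amauta2 /etc]# svn status /etc
?      newpackage.conf
[rudd-o@amauta2 /etc]# svn add newpackage.conf
[rudd-o@amauta2 /etc]# svn commit -m 'Add pristine newpackage.conf' newpackage.conf

Edit the file afterwards and commit again, so both the pristine and the customized versions end up in the repository.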

When configuration files are deleted from /etc

Check for deleted files with svn status /etc. You should see them listed with an exclamation sign.

After doing the check, svn delete them. Don't forget to commit at the end.
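
Again as an example (hypothetical file name):

[rudd-o@amauta2 /etc]# svn status /etc
!      oldpackage.conf
[rudd-o@amauta2 /etc]# svn delete oldpackage.conf
[rudd-o@amauta2 /etc]# svn commit -m 'Remove oldpackage.conf'

If svn complains because the file is already gone from disk, add --force to the svn delete command.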

[Oct 7, 2005] mValent ¦ Powerful Change Control

mValent Integrity tracks changes to deployed servers and monitors configuration drift, alerting IT teams to potentially critical problems. By comparing application environments in mValent Integrity for differences in granular configuration items, IT teams can rapidly isolate the root causes of production incidents. These teams can then model fixes to problems to validate their impact and automatically deploy them.

[Jul 13, 2005] System Configuration Collector

System Configuration Collector (SCC) is yet another configuration collector. It consists of a client and a server part. The client collects configuration data in a structured snapshot, compares the new snapshot with the previous one, and adds differences to a logbook. Then the snapshot and the logbook are converted to HTML for local inspection. Optionally, the data can be sent to a system running the server software. On the server, summaries of the data are generated, and search/compare operations on the snapshots and logbooks are available via a Web interface.

Changes: This release will not update the keep file when running in interactive mode. It ignores differences in the main log file when moving data to "split" hosts. Split conditions have been extended with a simple process check. A correction for Debian for large lines with many fields. Include files have been added for logrotate.conf. Includes for Apache have been corrected. Netscape Fasttrack server has been added.

Remote System Management Tool Overview

Remote Server Management Tool is an Eclipse plug-in that provides an integrated graphical user interface (GUI) environment and enables testers to manage multiple remote servers simultaneously.

alphaWorks

What is Remote System Management Tool?

Remote Server Management Tool is an Eclipse plug-in that provides an integrated graphical user interface (GUI) environment and enables testers to manage multiple remote servers simultaneously. The tool is designed as a management tool for those who would otherwise telnet to more than one server to manage the servers and who must look at different docs and man pages to find commands for different platforms in order to create or manage users and groups and to initiate and monitor processes. This tool handles these operations on remote servers by using a user-friendly GUI; in addition, it displays the configuration of the test server (number of processors, RAM, etc.). The activities that can be managed by this tool on the remote and local servers fall into several categories: process, file system, and user and group management.

How does it work?

This Eclipse plug-in was written with the Standard Widget Toolkit (SWT). The tool has a perspective named Remote System Management; the perspective consists of test servers and a console view. The remote test servers are mounted in the Test Servers view for management of their resources (process, file system, and users or groups).

At the back end, this Eclipse plug-in uses the Software Test Automation Framework (STAF). STAF is an open-source framework that masks the operating system-specific details and provides common services and APIs in order to manage system resources. The APIs are provided for a majority of the languages. Along with the built-in services, STAF also supports external services. The Remote Server Management Tool comes with two STAF external services: one for user management and another for providing system details.

[Apr 18, 2005] Taking the Configuration Management Database to the Next Level The Federated Data Model - Computerworld by Doug Mueller...

Apr 18, 2005 | COMPUTERWORLD

With the growing interest in adopting best practices across IT departments, particularly according to standards such as the Information Technology Infrastructure Library (ITIL), many organizations are deciding to implement a configuration management database (CMDB). A CMDB should help them discover and manage the elements in their IT infrastructure so they can better understand the relationships among components and facilitate changes effectively. This is important because there is a significant business value in having a single "source of record" that provides a logical model of the IT infrastructure to identify, manage and verify all configuration items in the environment.

Having reliable data requires more than a database. It requires a well-conceived configuration management strategy; without knowing what's in your environment, you can't hope to control it, maintain it or improve it.

Since configuration items are at the heart of the CMDB, it's important to understand what they encompass. A configuration item is an instance of a physical, logical or conceptual entity that is part of your environment and has configurable attributes specific to that instance. Examples of configuration items would be a computer system (attributes could include a serial number or IP address) or even an employee (with configurable attributes such as hours worked and department number).

Getting Started: Developing the Right Strategy

Once you have determined that you may need a CMDB, how do you select the approach that's best for you? Everything begins with ITIL, the industry framework for IT service management. To get started on developing a configuration management strategy, set your objectives according to ITIL goals, which state that configuration management accounts for all the IT assets and configurations within the organization and its services. According to ITIL, the ideal CMDB should also provide accurate information on configurations and their documentation to support all the other service management processes. In addition, it must provide a sound basis for incident management, problem management, change management and release management. It must be able to verify the configuration records against the infrastructure and correct any exceptions. If you think that creating a CMDB is a major undertaking, you're right. But it can be done effectively if you follow the right approach for your organization.

Lessons Learned: The Evolution of the CMDB

The concept of a CMDB has evolved over the years from a collection of isolated data stores to integrated data stores to a single, central database. Each time, it gets closer to being the source of record for configuration data without taking a toll on the infrastructure. However, those who have tried these approaches find that they have serious drawbacks that make them difficult or impossible to scale. A better alternative is the federated data model. This approach features a centralized database linked to other data stores with a common data model that carries information from one point to another, without the need to rewrite code. I will describe this model in more detail after providing an overview of how it evolved.

The predecessors to CMDBs, popular in the 1990s, consisted of several applications that stored their own data, including configuration data. This approach could meet ITIL's goal of accounting for IT assets and services, but because the data wasn't integrated, the approach fell short of other objectives, such as understanding dependencies and relationships among configuration items. With isolated data stores, your asset management application may not see data from a discovery application, and your service-impact management application may not be able to modify service-level agreements.

IT organizations also tried to create CMDBs by directly integrating their various data sources and applications, connecting each data consumer to each provider from which it needed data. This approach allowed different configuration management processes to share data, greatly improving the CMDB's usefulness as a means to integrate applications and IT processes they support. But it required a lot of resources to create and maintain what tend to be brittle, hard-coded connections between systems.

Recently, vendors have been offering a single, all-encompassing CMDB to hold configuration data that's accessible by all applications that need the data. But an all-encompassing database isn't feasible in a large, distributed organization. It creates an access bottleneck because all requests for and updates to data pass through the same path. It also requires a massive migration to get all of your data into the single database, creating a complicated data model that must change if any application integrated with the CMDB changes.

Putting It All Together With a Federated Data Model

The most effective approach is the federated data model. It's the best way to share configuration data without the high setup and maintenance costs associated with the pure centralized approach. It puts primary and widely shared configuration-item data in a common data store and federates other noncritical attribute data from other application databases. According to a recent Gartner Inc. study ("Defining a Configuration Management Database," by P. Adams and R. Colville, November 2004), "A practical approach for a successful implementation of a configuration management database will require a federated data model with a consistent view that receives at least some data from element-specific tools (for example, desktop configuration management, server configuration management, network management and storage management)."

This federated approach to a CMDB offers a single, common set of information on each configuration item and its relationships with other configuration items in a manner that can be leveraged by all relevant IT processes -- creating cost-saving synergy among different service management functions. A federated data model enables you to fully integrate critical service and infrastructure management applications and break down the traditional functional silos that often exist within an IT organization, all of which streamlines delivery of IT services.

Important Benefits of a Federated Approach

What should this federated model look like?

This model refines ITIL's idea of a CMDB by breaking up the CMDB and its infrastructure into three layers. These are the CMDB itself; related data linked to or from the CMDB, called the CMDB Extended Data; and applications that interact with these two layers, called the CMDB Environment.

The CMDB and CMDB Extended Data layers together contain the information ITIL suggests be stored in a CMDB. Separating this information into two layers is what distinguishes the federated CMDB approach from other, less-successful CMDB approaches. The CMDB holds only configuration items and their relationships. However, not all available configuration-item attributes must be stored in the CMDB. In fact, to keep the CMDB scalable and manageable, you should store only the key attributes here and link to the less-important ones in the CMDB Extended Data.

The CMDB Extended Data layer holds related data, such as help desk tickets, change events, contracts, service-level agreements, a definitive software library and much more. Although these things aren't configuration items, they contain information about your configuration items and form an important part of your IT infrastructure. In addition, the CMDB Extended Data layer holds any configuration-item attributes judged as unnecessary to be stored in the CMDB.

The data in the CMDB Extended Data layer is linked to the configuration item data in the CMDB. By definition, federated configuration-item attributes are linked from their instances in the CMDB, allowing requests to the CMDB to reach these attributes. But for other types of extended data, the link can be in either or both directions. For example, a change-request record could have a link through which you can access the instances of the configuration items it will change, and each configuration-item instance could have a link through which you can access the change requests that affect it.

To pursue ITIL's goals for configuration management, you should consider the advantages of a federated data model and what it can do for you.

Doug Mueller is the chief technology officer at the Service Management business unit of BMC Software Inc. and a co-founder of Remedy Corp., now a part of BMC.

[Jan 21, 2005] Enterprise Systems Management BMC Debuts Configuration-Management Database

InformationWeek

The software is designed to help businesses unify service- and infrastructure-management tools to promote database management consistency and simplified integration among processes.

By Darrell Dunn, InformationWeek
Jan. 21, 2005
URL: http://www.informationweek.com/story/showArticle.jhtml?articleID=57702869

BMC Software on Monday will announce the availability of its Atrium Configuration Management Database (CMDB), intended to help customers unify their service and infrastructure management.

Based on industry-standard IT Infrastructure Library requirements for enterprisewide database management with consistency and simplified integration among different management processes, the CMDB is also the first offering by BMC to be branded under the Atrium name, says Andrej Vlahcevic, senior product marketing manager for change and configuration management at BMC.

Over the course of the year, BMC plans to introduce other management products under the Atrium brand. "A lot of people see a CMDB as a common set of information that captures data on the configuration and relationship of items in your IT environment," Vlahcevic says. "We believe it has to be more." The Atrium database was designed to integrate both service and infrastructure-management applications, he says, as well as complement the company's existing line of discovery tools.

The Atrium CMDB includes a reconciliation engine that lets users combine input from multiple data sources and identify and reconcile any differences to establish a configuration profile. "If you don't have strong reconciliation, the CMDB will end up with repetitive data that ultimately will create confusion," Vlahcevic says.

The Atrium CMDB was designed with industry standards in mind, he says, including those endorsed by the Distributed Management Task Force and the Common Information Model. The platform supports all primary IT Infrastructure Library configuration item classes and more than 80 potential relationship types that can be leveraged to characterize an IT environment.

The Atrium CMDB is integrated with eight existing BMC applications, including the IT Discovery Suite, Service Impact Manager version 5.0, and Remedy IT Service Management Suite version 6.0. It's available now and can be purchased as part of any BMC Remedy IT Service Management version 6.0 products and the Service Impact Manager version 5.0.

PIKT - system monitoring, configuration management software

PIKT's initial release was in 1998. Written in C.

PIKT® is a registered trademark of the University of Chicago. Copyright © 1998-2005 Robert Osterlund. All rights reserved.

PIKT is cross-categorical, multi-purpose software to monitor and configure computer systems, report and fix problems, manage system security, arrange job scheduling, format documents, install files, assist command-line work, and perform many other common systems administration tasks. PIKT is used primarily for system monitoring, and secondarily for configuration management, but its flexibility and extendibility evoke many other uses limited only by your imagination. One reviewer said of PIKT, "this is by far one of the most interesting/powerful tools I have seen for Linux administration." Another wrote that PIKT "excels at handling a diverse collection of machines, saves time and eliminates repetition, and gives you a global view of your site." PIKT has been compared favorably to commercial software costing hundreds of thousands of dollars. Yet PIKT costs you nothing! Who uses PIKT? The answer might surprise you. To learn more, read the Introduction pages. For example uses and configurations, visit the Samples pages.

What is PIKT

PIKT is Open Source software distributed under the GNU GPL.

What is PIKT not?


Why the name "PIKT"?
PIKT is like a military picket, "a group of soldiers or a single soldier stationed, usually at an outpost, to guard a body of troops from surprise attack" (Webster's New World College Dictionary). A picket's primary mission is to warn of the enemy's advance, but to fight if necessary. Similarly, PIKT's primary task is to warn of problems, but to fix those problems when needed.

How do you pronounce "PIKT"?
"PIKT" rhymes with "ticket".

Kickstart, APT and RGANG usage note for farm administration -- Mirko Corosu (INFN Genova), Alex Barchiesi, Marco Serra (INFN Roma)

Contents

1 Introduction

This document is a basic introduction to a few useful tools for a sysadmin who wants to install an OS, perform simultaneous operations on multiple machines via ssh, and upgrade an already installed machine using an automatic (or manual) procedure. For more detailed information please refer to the bibliography added in the following paragraphs.
IMPORTANT: this document is based on our experience with a farm running Scientific Linux CERN 3.0.4 and should not be considered a general guide.

2 Kickstart

It is already described in another document:

how it is possible to set up a kickstart installation server. Here we will add only a few notes about the customization of the kickstart file, providing an example:

that must be changed according to your specific site configuration. This example was written with the idea of installing a Scientific Linux CERN OS, from which we removed a few packages (or turned off a few services) not strictly needed for machines not located at CERN. To find all the possible options for a kickstart file please refer to:

2.1 Add/Remove groups or single package

Our kickstart file example shows how to add (or remove) different groups of packages; for example:

@ Text-based Internet

adds the packages mutt, fetchmail and elink.

It is possible to use the graphical tool redhat-config-packages to show the full list of packages in a group like Text-based Internet.
To add or remove a single rpm, use a single line like:

-phone

to exclude the installation of the phone rpm. Conversely, to add an rpm use:

+<package name>

for example if you want to install wget it is sufficient to add:

+wget

2.2 Start/Stop services

In our example kickstart file a few services are explicitly started or stopped using chkconfig.
The pcmcia service is turned off:

chkconfig pcmcia off

while ntpd is turned on:

chkconfig ntpd on

to have time synchronization.

2.3 Post-install examples

In the kickstart file it is possible to include operations to be performed after the OS installation, at the first reboot. In our kickstart file a few examples are present as a reference, in the %post section. We will comment on them in the APT section.
As an example, if you want to configure the INFN AFS cell, add the following lines to the post-install section of the kickstart file:

mv /usr/vice/etc/ThisCell /usr/vice/etc/ThisCell.orig
cat >> /usr/vice/etc/ThisCell <<EOF
infn.it
EOF

3 APT in the Scientific Linux CERN

The CERN Scientific Linux distribution uses the APT tool as package manager. You can find more detailed information about APT here:

In this distribution, APT comes by default with a CERN configuration that uses the CERN RPM repository. Details of this configuration, with some explanation of apt commands, are available here:

3.1 Local RPM repository for APT

In our kickstart file example we included a post-install section to re-configure APT in order to use a local RPM repository (see also http://grid-it.cnaf.infn.it/fileadmin/sysadm/akserver/akserver.html).
You can change the APT sources.list.d configuration via post-install:

mv /etc/apt/sources.list.d/dag.list /etc/apt/sources.list.d/dag.list.orig
cat >> /etc/apt/sources.list.d/local.list <<EOF
# Your local repository
rpm http://<YOUR_KICKSTART_SERVER> rep/slc304-i386 os updates extras localrpms
EOF 

where <YOUR_KICKSTART_SERVER> is your RPM server configured for APT usage. Our re-configuration adds a local repository (localrpms) that can be used to customize your OS by including, for example, ``private'' RPMs (ssh configuration, tools, ...).

3.2 Update/upgrade

Here are a few examples of how to run APT manually from a node you want to upgrade.
To check which updated packages are available, run:

apt-get update
 

To perform the necessary dependency resolution, download the packages, and install them, run:

apt-get upgrade

Alternatively, you can configure APT to update your machines automatically using the apt-autoupdate tool. It is possible to run it by hand:

apt-autoupdate

or to configure it as a service:

chkconfig --add apt-autoupdate

3.3 Kernel

Please note that the kernel upgrade is not included in the commands of the previous section; you have to force it in this way:

apt-get upgrade-kernel

3.4 Pin preference for local repository

If in your installation you need to give preference to some RPMs, it is possible to use the APT ``pin'' feature; for details refer to:

In our kickstart example we included the APT preferences modification to give higher priority to all the RPMs in the localrpms section of the repository.

mv /etc/apt/preferences /etc/apt/preferences.orig
cat >> /etc/apt/preferences <<EOF
# Maximum priority to local rpms
Package: *
Pin: release c=localrpms
Pin-Priority: 1001
EOF

For example, in CERN-SL pine is installed via a CERN-customized rpm. If you put a ``plain'' pine rpm in the localrpms repository, it will replace the CERN one the first time apt-autoupdate runs.

Also, even if a higher version of the ``CERN'' pine becomes available in CERN-SL, apt-autoupdate will preserve the ``localrpms'' one.

It is also possible to use the pin mechanism for a single rpm instead of a whole repository section, for example for the sylpheed package, by including in the APT preferences:

Package: sylpheed
Pin: version 0.4.99*
Pin-Priority: 1001

4. Introduction to RGANG

Nearly every system administrator tasked with operating a cluster of Unix machines will eventually find or write a tool which will execute the same command on all of the nodes.
At Fermilab a tool called "rgang" has been created, written by Marc Mengel, Kurt Ruthmansdorfer, Jon Bakken (who added "copy mode") and Ron Rechenmacher (who included the parallel mode and "tree structure").

The tool was repackaged as an rpm and is available here:

It relies on files in /etc/rgang.d/farmlets/ which define sets of nodes in the cluster.

For example, "all" (/etc/rgang.d/farmlets/all) lists all farm nodes, "t2_wn" lists all your t2_wn nodes, and so forth.
The administrator issues a command to a group of nodes using this syntax:

rgang farmlet_name command arg1 arg2 ... argn

On each node in the file farmlet_name, rgang executes the given command via ssh, displaying the result delimited by a node-specific header.
"rgang" is implemented in Python and works forking separate ssh children which execute in parallel. After successfully waiting on returns from each child or after timing out it displays the output as the OR of all exit status values of the commands executed on each node.
To allow scaling to kiloclusters it can utilize a tree-structure, via an "nway" switch. When so invoked, rgang uses ssh to spawn copies of itself on multiple nodes. These copies in turn spawn additional copies.

4.1 Required Hardware and Software

Users will need to have Python installed (tested on Python 1.5.2 and 2.3.4). A "frozen" version of rgang that does not need any additional packages is also supplied; it can be found in /usr/lib/rgang/bin/.

4.2 Product Installation

Install the rpm and that's it.

rpm -iv rgang.rpm

It has been created a "pre-script" (/usr/bin/rgang ) that sets the appropriate environmental variables and then execs the python script or "frozen" version. You have to change the name of the executable depending on the one you are planning to use. In the python case:

#!/bin/sh
pathToRgang=/usr/lib/rgang/bin
rgOpts="--rsh=ssh --rcp=scp"
# this has to be uncommented if you have a Python version over 2.3
#pyOpts="-W ignore::FutureWarning" 
exec python $pyOpts $pathToRgang/rgang.py $rgOpts "$@"

If you need to use the frozen version, modify the pre-script as follows:
#!/bin/sh
pathToRgang=/usr/lib/rgang/bin
rgOpts="--rsh=ssh --rcp=scp"
# this has to be uncommented if you have a Python version over 2.3
#pyOpts="-W ignore::FutureWarning" 
exec $pathToRgang/rgang $rgOpts "$@"

4.3 Running the Software

The following lines show by example the typical usage of 'rgang'; refer to the documentation or to the usage/help from 'rgang -h' for the full set of options.
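
The original examples are not reproduced here; based on the syntax described above, typical invocations look roughly like the following (farmlet names and commands are only illustrative):

# run a command on every node listed in /etc/rgang.d/farmlets/all
rgang all uptime
# run a command on the worker nodes defined in the "t2_wn" farmlet
rgang t2_wn 'rpm -q openssh-server'
# copy mode (-c): push a local file to the same path on all nodes
rgang -c all /etc/ntp.conf /etc/ntp.conf

See 'rgang -h' for the exact spelling of the tree-spawning "nway" switch and the other options.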

5 Troubleshooting

6 Appendix

6.1 Setting the RSA keys

It could be useful to distribute an ssh public key from your mother-node to your target-nodes so that you can use ssh-agent for authentication.
To create a key on your mother-node:

ssh-keygen -t dsa

then to copy the public key to the target-nodes in interactive mode (``-pty''):

rgang --pty -c <nodes-spec> /root/.ssh/id_dsa.pub /root/.ssh/authorized_keys

then on your mother-node:

ssh-agent <your shell>
ssh-add

and type the pass-phrase you chose when you created the key; then use 'rgang' as usual (no interactive option).

Freshmeat admin script selection:

[Mar 25, 2004] Interview with Siem Korteweg System Configuration Collector By Benjamin D. Thomas

3/25/2004

In this interview we learn how the System Configuration Collector (SCC) project began, how the software works, why Siem chose to make it open source, and information on future developments.

Introduction:

Have you ever noticed changes on your departmental server, but couldn't quite pinpoint what exactly happened? How many times have staff forgotten to make an entry in the log-book, or the entries made were not detailed enough? Administrators are faced with these problems on a day-by-day basis. The System Configuration Collector (SCC) project attempts to automate this process. Rather than depending on staff to keep accurate records, SCC enables a system to record all changes taking place. Additionally, the software has the functionality to send all configuration data to a central server so that it can be analyzed when needed.

System Configuration Collector Project Website: http://www.open-challenge.nl/scc/index.html

LinuxSecurity.com: Please tell us about the SCC project and how it began. When did it start, and who are some of the key contributors?

Siem Korteweg: In 2001 a younger colleague asked whether it was possible to automatically track the changes that were made to the configuration of a system. I told him that was impossible due to the variable nature of the output of the commands we have to use to show the configuration of a system. Being a much younger colleague, he accepted this answer. But I did not like to say it was "impossible" and it kept nagging me.

I thought that when I could split the variable and fixed parts of the output of system commands, I would be able to track changes. I started a small, hobby project by collecting configuration data and preceding each line with "fix:" or "var:". After some time I was able to detect some changes made to configuration. But when a kernel parameter was changed, all I saw was a change from 128 to 256. I had to search in the snapshot to find out what part of the configuration had changed. Therefore I extended the fix-var classification with a hierarchy of keywords indicating the nature of the data.

The development continued and the customer where I was developing the software, was wondering how to maintain this software without hiring me indefinitely. By that time I realized that this software also could/should be used by others. I talked to the manager of the customer and to the manager of the company I am working for and suggested to make SCC a GPL project. They both agreed and from then on, SCC was an Open Source project. To extend the collection of configuration data I looked at the code of cfg2html and check.sh (HP specific) and the FAQ's of several newsgroups. At the customer site where I started developing SCC, we deployed the software on some 300 systems. This gave us a great opportunity to tune the "fixed" and "variable" parts of the configuration to avoid unnecessary changes.

The first versions of the software collected configuration data and converted the data and logbook to HTML on a per system basis. At the customer site, Bram Lous started to collect all snapshots and logbooks on a server and built the first version of the CGI-interface. Later on, Paul te Vaanholt contributed much to the HP OpenView modules. His main contribution is the analysis and conversion to SCC-format of the Operations Center database. A colleague, Oscar Meijer, wrote the Windows version of the SCC-client, based on WMI and WSH. The configuration of the data we are collecting on Windows systems still needs to be tuned. The software itself is stable, but it detects too many changes. The whole process of tuning what data is "fixed" and what data is "variable" takes quite some time.

LinuxSecurity.com: What is the most important benefit an administrator can get out of SCC? How can this improve the overall security of a network or host?

Siem Korteweg: Each administrator should document his/her systems. We all know that, but we all lack the time to do this properly. SCC automates the documentation process. For HP-UX systems more than 95% of the configuration of the system is covered by SCC. For other systems the percentage is somewhat lower at the moment.

The logbooks and snapshots can assist administrators in finding the cause of an incident. Configuration changes can have unwanted side-effects (on other systems). By examining the logbooks for the changes during the last days/weeks an administrator might find the cause of an incident easier/faster. Another way of using the SCC-data to find the cause of an incident is to compare (parts of) the configuration of a system with a comparable system that does function correctly.

Comparing the configuration of systems can also be used to assure that the systems in a cluster are consistent and identical. Do they run the same (versions of) software? Do they have the same kernel-configuration? It is also possible to check your security policies. Just check the snapshots on the server for the aspects of the policies. By default the server checks and signals accounts without a password.

Another use of the SCC-data on the server is to quickly identify systems. After an advisory from Sun, I was able to identify within one minute the 100 systems that needed to be addressed out of a total of 600 systems. Because the selection was automated and because the collection of SCC-data was accurate and up-to-date, I did not miss a system. This obviously contributes to the safety of the network.

LinuxSecurity.com: How difficult is it to get started? How long would it take for an administrator to get the system fully setup? Can you describe at a high level the steps necessary to setup SCC?

Siem Korteweg: The easiest way to start and get the feeling of the software is to install only the client part and keep the data and logbook on the client. Just create a simple cron-job after the installation of the client and you are finished. This way you are able to pilot the software before you deploy it more widely.
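
[As a purely hypothetical illustration -- the actual path and options of the scc client depend on how it is packaged on your system -- such a cron-job could be as simple as:]

# run the scc client once a night and keep the data on the client
# (adjust the command path and options to your installation)
30 23 * * * /usr/bin/scc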

The setup of the server takes some more steps. First you have to decide how to transport the SCC-data from the clients to the server. Supported mechanisms are email (optionally encrypted, using OpenSSL), scp, rcp and cp. Then setup the webserver to display the data. To achieve this, you have to indicate the path under the document-root and indicate the CGI-script of SCC. Then schedule a cron-job to transfer the SCC-data that is sent by the clients from the transfer-area to the website Finally all cronjobs of the clients have to be extended with the proper options to transfer the SCC-data to the scc-server.

For several systems I recorded the entire process of configuring the server in logbooks. These logbooks are present at the website. For our HP-UX 11.i system: http://www.open-challenge.nl/scc/scc-web-demo/scc.hpux11i.log.html

LinuxSecurity.com: What improvement would you like to make in the future? What direction is this project heading?

Siem Korteweg: When running SCC on a system that uses clustering software, like MC ServiceGuard from HP, switching a "package" from one system to another, results in changes of the SCC-data for both systems involved in switching. We want to make the software cluster-aware by extracting the configuration data for each package and sending it separately to the scc-server.

Another future extension is the collection of the configuration of network devices like routers and switches.

LinuxSecurity.com: What advantage does SCC have over using a typical pen & paper log book for recording system changes?

Siem Korteweg:
- It is automated, so it does not "forget" to record a change (supposing the changed attribute is part of the SCC-snapshot). It is not lazy (once you run it through cron).
- The pen & paper logbook is a physical item that can only be at one place. Each admin of a group of systems can be at a different place, without access to the paper logbook. Suppose 7x24 systems, where the admins "follow the sun".
- By consolidating all snapshots on a system with scc-srv, you obtain much data that can be searched automatically. This enables you to quickly identify the systems that need an update or to compare two systems when one of them does not function correctly. This is impossible with pen & paper.

LinuxSecurity.com: What operating systems does SCC run on? What type of license is it under?

Siem Korteweg: HP-UX, Solaris, AIX, Linux (RedHat, Suse, Gentoo). As the code of SCC only uses "standard" Unix tools, I think it runs on almost all Unix/Linux systems. The coverage of the configuration data depends on the OS. For example the coverage of HP-UX configuration is more than 90%. For other systems this will be less. The license is GPL.

LinuxSecurity.com: If an administrator needs assistance setting up or configuring SCC is support available? If so, how can support be obtained?

Siem Korteweg: Besides the documentation on our website, SCC comes with documentation and manual pages. We offer an implementation service, where a consultant visits a customer and installs the server and at most 5 clients and introduces the software to the admins of the customer. This is only feasible in the Netherlands. Otherwise, support via email is possible. When the requested support is more than a few simple questions, we have to agree upon payment.

LinuxSecurity.com: How does SCC differ from other similar configuration collectors? What are some of the strengths and weaknesses of SCC?

Siem Korteweg: SCC collects configuration data without formatting it immediately to HTML. Instead it prefixes each line of configuration data with fix/var and a hierarchical classification. This makes it easy to process the snapshots. The processing consists of comparing consecutive snapshots to generate the logbook, formatting the snapshot to HTML and comparing the snapshots of two systems to determine the differences.

The philosophy of SCC is to collect data, not to judge its value or correctness. Stupid configuration errors in Apache/Samba are not detected by scc; this should be done at the server where all snapshots are collected. Some might question the value of all the data in the snapshots. It is true that a considerable part of the snapshots will never change during the lifetime of a system. Nevertheless this data is collected, just in case someone needs it sometime.

One commercial configuration collector works by allowing remote root-access to all clients from their server. This is not very security minded. I had security in mind when coding scc and scc-srv.

A weakness of SCC is that I coded the classifications of all collected configuration data. This classification has to be used when an admin wants to view specific information. I decided to store cron configuration data under classification "software:cron:" and swap info under classification "system:swap:". Each user of SCC has to follow my intuition.

Another weak point is that the clients are autonomous. The scc-srv can be DOSed by mailing many snapshots from seemingly different systems. Therefore, I suggest installing scc-srv only in a "trusted" network. Finally, scc has to do "reverse engineering" to collect, for example, the Apache configuration. Apache can be installed and configured in dozens of different locations. We have to determine the correct paths and files from the running processes.

LinuxSecurity.com: How can the project benefit from the open source community?

Siem Korteweg: The project can benefit from the open source community when admins use it and contribute their extensions. These extensions can be specific applications/hardware/OS they use or new features. At the moment some people already contribute knowledge of specific software. Feedback concerning the strong and weak aspects admins experience while they are using SCC, is also valuable.

Areas for future extension are SAN/NAS and network devices. I am looking for people and organisations that are willing to contribute in any way in these areas.

LinuxSecurity.com: I wish to thank Siem, and other contributors to the System Configuration Collector project. We at LinuxSecurity.com would like to wish you the best of luck!

Brains2Bytes Consulting

About: Alist is a program that collects hardware and software information about systems and stores it in a database for users to browse and search via a Web interface. The program consists of three parts: a client portion that collects the information, a daemon that receives data sent from clients, and a CGI that displays and lets you search for information. Clients for Solaris, Linux, FreeBSD, OpenBSD, and Mac OS X are currently available.

Changes: There is a new Windows module (MSWIN32.pm), a new Irix module (irix.pm), bugfixes for the Linux module on Debian, and bugfixes for client/alist and hpux.pm.

Alist is written entirely in Perl 5. The server portion has been tested on Linux, Solaris, and Mac OS X, and should run without problems on any modern Unix OS, but may not work on non-Unix-like operating systems due to calls to fork(). The server needs a web server, Perl 5, and the Perl CGI.pm module.

The client portion requires Perl 5, but no modules outside the core distribution. There are currently clients for Solaris, Linux, OS X, FreeBSD, OpenBSD, Windows, HP-UX, and Irix. Clients explicitly tested can be found here.

SSGDOC - System Administration at cs.unm.edu

BitKeeper - The Scalable Distributed Software Configuration Management System

BitMover builds and markets enterprise level development tools for software and web developers. Our flagship product is BitKeeper, a powerful replicated and distributed configuration management system. BitKeeper is supported on most platforms, such as Microsoft Windows as well as the various commercial and free Unix platforms. See the products section for more information about BitKeeper and our other products.

Never used BitKeeper? Take the test drive and see how easy it is to get started!

Please enjoy our web site and let us know if there is anything we can do for you.

ITracker

About: ITracker is a Java J2EE issue/bug tracking system designed to support multiple projects with independent user bases. It supports features such as multiple versions and project components, detailed histories, issue searching, file attachments, dynamic reports with charts, and multiple email notifications.

Team Development with WebSphere Studio Application Developer -- Part 3 Installing and Configuring CVS on RedHat Linux 7 as an SCM Repository

This article, the third one in a series on team development in IBM® WebSphere® Studio Application Developer, focuses on installing and configuring CVS on RedHat Linux 7 as an SCM Repository. WebSphere Studio Application Developer (hereafter called Application Developer) works seamlessly with CVS, the dominant open-source, network-transparent version control system. CVS runs on most platforms, including Windows®, Linux, AIX®, and UNIX®. Installing it with Application Developer on RedHat Linux has several advantages:

[Jan 04, 2002] O'Reilly Network: Introduction to CVS

LinuxProgramming: GNOME 2.0 Summary (How to compile GNOME 2.0 from CVS)(May 02, 2001)
LinuxPlanet: Don't Trip on the Red Carpet, Evolve with GNOME CVS(Feb 23, 2001)
Advogato: CVS mixed-tagging for massive Open Source Project Management(Feb 21, 2001)
zez.org: Version Control management with CVS - Part 2(Nov 26, 2000)
zez.org: Version Control Management with CVS - Part 1(Nov 07, 2000)

[July 23, 2001] Automating UNIX system administration with Perl

Note: the article has disappeared from the IBM site. The author was probably Teodor Zlatanov (tzz@iglou.com), Programmer, Gold Software Systems
developerWorks
... ... ...

The tool cfengine

If you are serious about automating system administration, cfengine is a tool you should know. Ignoring cfengine is a viable option only if you like to spend your days in the vi editor.

cfengine is a system configuration engine. It takes configuration scripts as input, and then takes actions based on these scripts. It is currently at version 1.6.3 (a very stable release), and version 2.0 is on the horizon. For more information on cfengine development, visit the cfengine Web site (see Resources later in this article).

You don't have to use everything cfengine offers, and you will probably not need the whole thing all at once. Your cfengine configuration files should start out simple, and grow as you discover more things that you want automated.

From the cfengine command reference, here are its most notable features:

Even though you can do with Perl all the things that cfengine does, why would you want to reinvent the wheel? Editing files, for instance, can be a simple one-liner if you want to replace one word with another. When you start allowing for system subtypes, logical system divisions, and all the other miscellaneous factors, your one-liner could end up being 300 lines. Why not do it in cfengine, and produce 100 lines of readable configuration code?

From my own experience, introducing cfengine to a site is quite easy, because you can start out with a minimal configuration file and gradually move things into cfengine over time. No one likes sudden change, least of all system administrators (because they will get blamed if anything goes wrong, of course).

Configuration file management

Managing configuration files is tough. You can start by considering whether cfengine is adequate for the task. Unfortunately, cfengine's editing is line oriented, so complex configuration files will probably not be a good match for it. But simple files such as the TCP wrappers configuration file /etc/hosts.allow are best done through cfengine.

Usually, you will want to keep more than one version of configuration files. For instance, you may need two sets of DNS configurations in /etc/resolv.conf, one for external, and another for internal machines. The external DNS resolv.conf file could, naturally, go into a directory called "external", while the internal resolv.conf could go into the corresponding "internal" directory. Let's assume both directories are under a global "spec" directory, which is a sort of root for configuration files.

The following code will traverse the spec directory, searching for a filename suitable for a given machine. It will start at /usr/local/spec and go down, looking for files that match the one requested. Furthermore, it will check whether or not each directory's name is the same as the class belonging to some machine. Thus, if we request locate_global('resolv.conf', 'wonka'), the function will look under /usr/local/spec for files named resolv.conf that are in either the root directory, or in children of the root directory whose names match the classes that the "wonka" machine belongs to. So, if "wonka" belongs to the "chocolate" class, and if there is a /usr/local/spec/chocolate/resolv.conf file, then locate_global() will return "/usr/local/spec/chocolate/resolv.conf".

If locate_global() finds multiple matching versions of a file (for instance, /usr/local/spec/chocolate/resolv.conf and /usr/local/spec/resolv.conf), it will give up. The assumption is that we are better off with no configuration than with one of the two wrong ones. Also, note that machines can belong to more than one class.

You can build on this structure. For instance, a spec tree laid out as sketched below will contain files for external and internal "chocolate" and "sugar" machines. You just have to set up your machine_belongs_to_class() function correctly.
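A hypothetical layout (the directory names are illustrative only; "external"/"internal" and the class names follow the convention described above):

/usr/local/spec/
    resolv.conf                    <- generic fallback copy
    external/
        chocolate/resolv.conf
        sugar/resolv.conf
    internal/
        chocolate/resolv.conf
        sugar/resolv.conf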

Once locate_global() returns a file name, it's pretty simple to copy it to the remote system with scp or rsync. Remember, always preserve the permissions and attributes of the file. Scp needs the "-p" flag, and rsync needs the "-a" flag. Consult the documentation for the file copy command you want to use. And there you have a unified configuration file tree.

Listing 1: Spec directory traversal

# {{{ locate_global: use spec directory to find a file matching the current class
use File::Find;            # provides find() plus $File::Find::name and ::prune

sub locate_global($$)
{
    my $spec_dir = '/usr/local/spec';
    my $file     = shift || return undef;   # file name sought
    my $machine  = shift || return undef;   # machine name
    my @matches;

    my $find_sub =
        sub
        {
            print "found file $_\n";

            push @matches, $File::Find::name if ($_ eq $file);

            # the machine_belongs_to_class sub returns true if a machine
            # belongs to a class; we stop traversing down otherwise
            $File::Find::prune = 1 unless
                machine_belongs_to_class($machine, $_) || $_ eq '.';
        };

    find($find_sub, $spec_dir);

    if (scalar @matches > 1)
    {
        print "More than one match for file $file, ",
              "machine $machine found: @matches\n";
        return undef;                        # ambiguous -- refuse to guess
    }
    elsif (scalar @matches == 1)
    {
        return $matches[0];                  # this is the right match
    }
    else
    {
        return undef;                        # no files found
    }
}
# }}}
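As a minimal sketch of the copy step described before Listing 1 (locate_global() is the function above; the host name "wonka" and the destination path are placeholders only), the selected file can be pushed out with rsync -a so permissions and attributes are preserved:

# Minimal sketch, assuming locate_global() above is loaded; "wonka" and the
# destination path are placeholders only.
my $src = locate_global('resolv.conf', 'wonka')
    or die "no unique resolv.conf found for wonka\n";
system('rsync', '-a', $src, 'wonka:/etc/resolv.conf') == 0
    or warn "rsync to wonka failed: exit status $?\n";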

One challenge once you set up this sort of /usr/local/spec structure is: how do we know that resolv.conf should go into /etc? You either have to do without the nice hierarchical structure shown here, adapt it (replace "/" with "+", for instance -- a risky and somewhat ugly approach), or maintain a separate mapping between symbolic names and real names. For instance, "root-profile" can be the symbolic name for "~root/.profile". The last approach is the one I prefer, because it flattens out filenames and eliminates the problem of having hidden filenames. Everything is visible and tidy, under one directory structure. Of course, it's a little more work every time you add a file to the list. The program has to know that "resolv.conf" should be copied to "/etc/resolv.conf" on the remote system, and "dfstab" should go to "/etc/dfs/dfstab" (the Solaris file for sharing NFS filesystems).
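A minimal sketch of that last approach (the entries below are illustrative only; the real map would grow as files are added to the spec tree):

# Hypothetical symbolic-name to installed-path map, per the approach above.
my %install_path = (
    'resolv.conf'  => '/etc/resolv.conf',
    'dfstab'       => '/etc/dfs/dfstab',    # Solaris NFS share table
    'root-profile' => '/root/.profile',     # flattened name for ~root/.profile
);

my $symbolic = 'resolv.conf';
my $target   = $install_path{$symbolic}
    or die "no installation path known for $symbolic\n";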

Now let's talk about what you can do once you have this spec directory hierarchy set up. You could, if you wanted to, look for all the users named Joe:

Listing 2: Find all password files and grep them for Joe


grep Joe `find /usr/local/spec -name passwd`

Or you can use a tool such as rep.pl, written by David Pitts, to replace every word with another:

Listing 3: Find all hosts files and change "wonka" to "willy"


find /usr/local/spec -name hosts -exec rep.pl wonka willy {} \;

Now, you can write both Listing 2 and 3 in Perl, if you want; the find2perl utility was written just for that. It's much simpler, however, to just use find from the start. It really is a wonderful utility that every system administrator should use. More importantly, it took me 5 minutes to write the two listings. How long would it take you to figure out how to use find2perl, store the code it produces in a file, then run that file? Try it and see for yourself!

Task automation

Task automation is an extremely broad topic. I will limit this section to only simple automation of non-interactive UNIX commands. For automation of interactive commands, Expect is the best tool currently available. You should either learn its syntax, or use the Perl Expect.pm module. You can get Expect.pm from CPAN; see Resources for more details.
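As a small, hedged illustration (not from the original article): Expect.pm can drive an interactive command such as passwd. The prompt patterns and the account name below are assumptions and vary between Unix variants.

#!/usr/bin/perl -w
# Hedged sketch of Expect.pm driving "passwd"; the prompts and the account
# name ("testuser") are assumptions -- adjust them for your platform.
use strict;
use Expect;

my $user     = 'testuser';
my $password = 'S3cret!';

my $exp = Expect->spawn('passwd', $user)
    or die "cannot spawn passwd\n";

# Wait up to 10 seconds for each password prompt, then answer it.
$exp->expect(10, -re => '[Nn]ew [Pp]assword:')
    or die "never saw a password prompt\n";
$exp->send("$password\n");
$exp->expect(10, -re => '[Pp]assword:');    # the "retype" prompt
$exp->send("$password\n");
$exp->soft_close();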

With cfengine, you can automate almost any task based on arbitrary criteria. Its functionality, however, is a lot like the Makefile functionality in that complex operations on variables are hard to do. When you find that you need to run commands with parameters obtained from a hash, or through a separate function, it's usually best to switch to a shell script or to Perl. Perl is probably the better choice because of its functionality. You shouldn't discard shell scripts as an alternative, though. Sometimes Perl is overkill and you just need to run a simple series of commands.

Automating user addition is a common problem. You can write your own adduser.pl script, or you can use the adduser program provided with most modern UNIX systems. Make sure the syntax is consistent between all the UNIX systems you will use, but don't try to write a universal adduser program interface. It's too hard, and sooner or later someone will ask for a Win32 or MacOS version when you thought you had all the UNIX variants covered. This is one of the many problems that you just shouldn't solve entirely in Perl, unless you are very ambitious. Just have your script ask for user name, password, home directory, etc. and invoke adduser with a system() call.

Listing 4: Invoking adduser with a simple script


#!/usr/bin/perl -w

use strict;

my %values;                             # will hold the values to fill in

# these are the known adduser switches
my %switches = ( home_dir => '-d', comment => '-c', group => '-G',
                 password => '-p', shell => '-s', uid => '-u');

# this location may vary on your system
my $command = '/usr/sbin/adduser ';

# for every switch, ask the user for a value
foreach my $setting (sort keys %switches, 'username')
{
 print "Enter the $setting or press Enter to skip: ";
 $values{$setting} = <STDIN>;
 chomp $values{$setting};
 # if the user did not enter data, kill this setting
 delete $values{$setting} unless length $values{$setting};
}

die "Username must be provided" unless exists $values{username};

# for every filled-in value, add it with the right switch to the command
foreach my $setting (sort keys %switches)
{
 next unless exists $values{$setting};
 $command .= "$switches{$setting} $values{$setting} ";
}

# append the username itself
$command .= $values{username};

# important - let the user know what's going to happen
print "About to execute [$command]\n";

# return the exit status of the command
exit system($command);

Another task commonly done with Perl is monitoring and restarting processes. Usually, this is done with the Proc::ProcessTable CPAN module, which can go through the entire process table, and give the user a list of processes with many important attributes. Here, however, I must recommend cfengine. It offers much better process monitoring and restarting options than a quick Perl tool does, and if you get serious about writing such a tool, you are just reinventing the wheel (and cfengine is stealing your hubcaps). If you do not want to use cfengine for your own reasons, consider the pgrep and pkill utilities that come with most modern UNIX systems. pkill -HUP inetd will do in one concise command as much as a Perl script four or more lines long. This said, you should definitely use Perl if the process monitoring you are doing is very complex or time sensitive.

For the sake of completeness, here is a Proc::ProcessTable example that shows how to use the kill() Perl function. The "9" as a parameter is the strongest kill() argument, meaning roughly "kill process with extreme prejudice, then feed it to the piranhas." Do not run this as root, unless you really want to kill your inetd processes.

Listing 5: Running through the processes, and killing all inetds


use strict;
use Proc::ProcessTable;

my $t = Proc::ProcessTable->new;

foreach my $p (@{$t->table})
{
 # note that we will also kill "xinetd" and all processes
 # whose command line contains "inetd"
 kill 9, $p->pid if $p->cmndline =~ /inetd/;
}

Host Factory (white paper)

A typical Unix contains 20,000 files. A typical large site contains 100 or more hosts. Keeping each of the resultant 2 million files correct and consistent is a difficult version control problem. Often the problem is not solved, and each host becomes a unique collection of files from differing operating system versions. Reliability plummets as versions of programs interact that vendors never tested for interoperability, and the cost of maintenance soars as the same problem is solved differently for each host. What is needed is a place to store operating system distributions under version control, a place to generate configuration files that differ between hosts, and a method to install these files onto running systems with minimum interruption and maximum automation. The Host Factory software from Working Version fulfills all of these needs. Components of Host Factory include the Pgfs version control filesystem, a Host Profile developed for your site, and the Pdist filesystem replicator.

netSwitch 0.1.3 A boot-time network configuration tool for Linux laptops.

Helix Setup Tools 0.2.0 A simplified interface for Unix workstation configuration.

Information Resource Manager - IRM is a Web-based asset and problem tracking system built for IT departments and helpdesks. It keeps detailed information, both hardware and software, about each computer, as well as a complete history of all work requests ever placed.

SFI Director - The SFI Director is a tool for managing distributed, heterogeneous UNIX Systems.

Its functionality includes System Configuration, Application Distribution, NIS & NIS+ Management, User Creation and Dynamic System Documentation.

LANdb: The Network Administration DB

LANdb is a network administration CGI package written in Perl. It uses an RDBMS (i.e., MySQL or Oracle) to store information on all network hardware, connections, and connection statuses.

cfengine daemon - Perl-cfd is a superior implementation of the cfengine 1.x server daemon. It has been tested with cfengine v1.4.17 and v1.5.3 clients. It should work with older v1.4.x and other v1.5.x clients.

SysWatch - SysWatch is a Perl CGI to display current information about your UNIX system. It can display drive partitions, drive use, as well as resource hogs, and what current users are doing.

Large Scale System Administration

[This article is essentially a compacted-for-LINK.bnl version of one of the topics covered by MIX (Monthly Information eXchange) Meeting Notes - 09/24/97, written by Susan Sevian. The speaker for this topic was Jim Flanagan of CCD's Advanced Technology and Planning Section.

Notes from any of our MIXes -- generally more detailed than what we provide in LINK.bnl -- are available on the web. Please see the reference to MIXed Notes at the bottom of our MIX page.]

Tools for large scale system administration are being developed in conjunction with the RCF (RHIC Computing Facility) / CCD effort to set up and manage computing systems for RHIC. With a large number (hundreds) of RHIC computers, such system administration tools are needed in order to avoid tedious and error-prone manual efforts to synchronize operating system and node configuration changes.

Under the strategy adopted by RHIC/CCD, configuration information is kept in a hierarchical, class-based central repository, with the configuration of each node viewed as a specialization of more abstract configuration classes. The tool being developed for manipulating this repository is SyRCS, a wrapper around the Revision Control System (RCS), written in Perl. SyRCS provides simple, familiar commands (emulating such UNIX and RCS commands as ls, ci, co), which are used to maintain and inspect the repository and to check node configurations against the repository for "undisciplined" or unauthorized changes.

Unix SysAdm Resources Automated Unix SysMgmt Software

[May 12, 2001] Sys Admin Magazine Online Automatic UNIX Documentation with unixdoc by Roman Marxer

There's no need to spend days documenting your servers. I've written a program that can help. unixdoc collects all the configuration files and other information about your computers into an HTML file and sends it to a display server where it can be viewed with a browser. It works on Solaris 2.6/7/8 and on HP-UX 10.20. On the display server, you can see an overview page with all your systems as shown in Figure 1. By selecting a computer, the unixdoc HTML page of this computer will be displayed as shown in Figure 2.

The unixdoc HTML file of a Solaris computer consists of the following 18 sections:

  1. Hardware
  2. Eeprom
  3. Kernel
  4. Networking
  5. Software
  6. Nameservices
  7. Bootup
  8. Disk
  9. Disk Hardware
  10. Users
  11. dmesg
  12. Printers
  13. Cron
  14. Rhosts
  15. Quota
  16. Syslog
  17. Xntpd
  18. Sendmail

The information in these sections consists of either config files or the output of a command. With unixdoc, it is easy to compare the configuration of two servers. You just have to open the two unixdoc HTML pages of the servers and compare the content, section after section. You don't have to do a login on the two servers, or to remember all those commands to display the configuration. I find subsection 4.1.1 ifinfo helpful, because it provides a good overview of all the network interfaces (speed, mode, etc.). (Subsection 4.1.1 is shown in Figure 3.) The information in this subsection is very useful when verifying the speed/mode settings between your switches and servers. An example of the entire unixdoc HTML page can be found at:
http://www.net.li/article
The software can be found at: http://www.net.li/article

[Mar 19, 2001] In Daniel Robbins' newest tutorial, learn to use CVS to check out the latest software sources, or begin using CVS as a full-fledged developer. (Linux)


Document Management Systems

[Apr 04, 2001] Ecora -- very nice package that includes Solaris documenter with HTML output

Whether you are an IT manager, systems integrator, consultant, or reseller, the demands on the IT environments you support are considerable and complex. Preparing for an IT audit, for example, is a time-consuming and tedious process. Our Documentor and IT Auditor products automatically create a comprehensive, natural-language report of your IT infrastructure. This can be used to create an audit trail to meet HIPAA requirements, prepare for a security audit or provide thorough documentation for a system audit. We invite you to experience for yourself the benefits of documentation. Click here to download an .exe file to document a server for free.

Benefits to system documentation:

 

Perl Rescues a Major Corporation

www.perl.com

Company B received a contract to develop a new piece of hardware. As part of this contract, they were to supply their documents online.

First, company B looked into a Commercial, Off-The-Shelf (COTS) document management system. It seemed to meet all of their needs, until they found out that the cost was over $600,000. The price was way too high; in fact, it was higher than the original budget for the whole contract!

Next, they decided to go with a proprietary document management system (DMS) that the company had an enterprise license for. This DMS was supposed to be the "do-all, end-all" DMS that would solve all of their problems. And since it was a commercial product and they had an enterprise license for it, the managers of the project assumed that there must be plenty of support available for it.

Company B spent over 6 months installing, configuring, and tweaking this DMS system on the new hardware that they had to buy in order to run it. When they ran into trouble, they called the people within company B who were supposed to be experts on the system for help. These experts didn't know the system any better than the group working on the project and support from the software company was either too pricey, or not much help. So much for the availability of support for this COTS product!

After 6 months of frustration, they gave up on the company-standard DMS and implemented a "solution" using File Manager. This solution provided none of the features of a DMS, was cumbersome, and left documents hard to find.

Perl to the Rescue

At this point I came along - and I was completely confident that I could solve their dilemma using a web-based solution with Perl. What other language would I use?

I talked with the program managers and we discussed what the needs of the DMS were. Next, I gathered user input, which, in my opinion, is the most important factor. When developing a system that is going to impact the way your users work on a system, it is important to understand their needs. After considering the needs of users and management, I proposed a Web-based DMS which management quickly approved. Now all I had to figure out was: how am I going to pull this off?

I started to develop the new system and the pieces seemed to fall into place. Eight weeks later, when we rolled out the new Perl DMS system, I completely shut off the existing File Manager access so users had no choice but to use the new system. It was a rather brutal way to force them onto the new system, but one that I felt was necessary.

The New System

The new Perl DMS system has the following features (and more):

[Sep 30, 2000] Linux PR OpenWatcom.org to Use Perforce the Fast Software Configuration Management System

The Open Watcom project requires an industrial strength source control system, that's why we selected Perforce for the job.

ALAMEDA, Calif., Sept. 29 /PRNewswire/ -- Perforce Software, Inc. today announced that SciTech Software has selected the Perforce source code control system to manage the Open Watcom source code base. The Perforce software will enable the large team of developers participating in the Open Watcom worldwide to have up-to-the-minute access to the latest Open Watcom source code via the Internet.

"Perforce itself has benefited tremendously from Open Source software, and we feel it is only fitting that we return the favor. We're especially happy to be supporting the Watcom C++ compiler, which powers a number of our platforms," said Christopher Siewald, president and chief technology officer of Perforce Software.

Perforce Software makes its Fast Software Configuration Management System available at no charge to bona fide organizations developing freely available software, such as OpenWatcom.org. The Open Watcom code base consists of nearly three million lines of code.

"The Open Watcom project requires an industrial strength source control system, that's why we selected Perforce for the job," said Kendall Bennett, Director of Engineering at SciTech Software, Inc. "SciTech uses Perforce for internal projects, so we know that it can handle the massive demands that the Open Watcom project is going to place on a distributed source control system."

Developers wishing to access the Open Watcom Perforce system can register at Open Watcom's web site ( http://www.openwatcom.org ) to be automatically notified when it comes online.

About Open Watcom

Open Watcom is the result of the Open Source release of the Sybase Watcom C/C++ and Fortran compilers. The Open Watcom products are the first mass market, proprietary compilers to be open sourced and, weighing in at nearly three million lines of source code, represent one of the largest pools of commercial source code of any type ever released under an Open Source license. Sybase, Inc. developed the original Watcom code and SciTech Software, Inc. is the official maintainer of the project. The project has already stirred tremendous interest among thousands of developers worldwide, who will use and contribute to its further development. Open Watcom supports software development in Windows, DOS, OS/2, Netware, QNX, and other operating systems. A Linux version of Open Watcom is planned. The Open Watcom web address is http://www.openwatcom.org.

BitKeeper - Distributed source management and version control

A scalable configuration management system, supporting globally distributed development, disconnected operation, compressed repositories, change sets, and named lines of development (branches).

Distributed means that every developer gets their own personal repository and the tool handles moving changes between repositories. SSH, RSH, and/or SMTP can all be used as communication transports between repositories; or, if both are local, the system just uses the file system. For example, this resyncs from a local file system to a remote system using ssh:

bk resync /home/lm/bk bitmover.com:/home/bk

Other features: file names are revisioned and propagated just like contents; graphical interfaces are provided for merging, browsing, and creating changes; changes are logged to a private or public change server for centralized tracking of work; bug tracking is in the works and will be integrated.

Autoconfiscating Amd Automatic Software Configuration of the Berkeley Automounter -- a very interesting paper.

Process Improvement -- slides

Wilma 1.xMN

Wilma is a suite of CGI scripts that allows you to easily manage a list of items (broken into discrete categories) on the Web. With Wilma, you can make lists of bookmarks, resources, reviews, classified ads, 'what's new' lists, bulletin boards and much more. Anything that needs to be indexed and easily maintained is a good candidate for Wilma.

Version 1.xMN of Wilma is independent of the original distribution by E-doc. It is free for non-commercial use (i.e., as long as you don't make money off it-- see the license), and requires Perl 5 on a Unix machine.

Using Wilma

Wilma is extremely flexible. You can have a public submission facility, to allow anyone to add resources, or you can password protect it (with .htaccess) to restrict access to selected people; in this way, you can manage lists of meeting minutes, job offerings or items for sale. You can even use Wilma (or several Wilmas) to manage an entire site's index. By keeping control over the organization of a site with Wilma while allowing people to add and update pages at will, you can take the headache out of Intranet management.

Downloading Wilma

The most current version of Wilma is 1.36MN, which includes bugfixes and several new features. It's probably a good idea to read some documentation first. Wilma is available in a tarred, gzipped archive. To unpack it, move it to the desired directory and type

$ gzip -d wilma1.36.tar.gz
$ tar -xvf wilma1.36.tar

I'd love to hear what you think of my version of Wilma; drop me a line!

About this Version

This version of Wilma is by Mark Nottingham, and is unsupported by E-doc. While there have been many enhancements, none of them would have been possible without E-doc's generous contribution of the original software to the 'net. Thanks, Andrew and Daniel! Support queries and bug reports should go to Mark Nottingham. Please check the FAQ before mailing. If you're upgrading from a previous version, you'll find that changing to this version only requires entering your values into the new wilma.conf file and copying your data directory over. Please pay attention to the license information found in the docs/ directory, as use of this software implies responsibilities to the current author, as well as the original authors. Enjoy!

5/12/97


Recommended Links

Softpanorama hot topic of the month

Softpanorama Recommended

Guidelines

Configuration files

Wikipedia

General

Tutorials

FAQs

 

Recommended Papers

Love/Hate

Although, or perhaps because, I quit my first real job (at a quickly defunct startup company called Enfoprise, building "business workstations") on the first day because they had changed my job assignment from UNIX driver writing to "Systems Integration", I have had a longstanding love/hate relationship with configuration management tools like SCCS and RCS.

Boxes

My first published paper was "Boxes, Links, and Parallel Trees: Elements of a Configuration Management System," presented at the first USENIX Workshop on Software Management. In it I described a centralized RCS database, with multiple "views" and hardlink cloning to save space and time, as used by Gould Computer Systems Division's UNIX team.

Dissed by CVS

Brian Berliner (who preceded me at Gould, before he left for Prisma) deprecates my approach in one of the CVS papers, mainly because he advocates an optimistic concurrency control approach, whereas he thought that I advocated locking. Actually, I advocate optimistic concurrency control, but I also advocate locking in case the optimistic version gets into livelock; and, I usually insist that there be a single, identified, serial schedule of source code checkins so that testing can proceed in a linear manner. I require programmers to test that their new code works in a system with all previous fixes applied. (Although I recognize that even this requirement can be relaxed.) I am amused that locking has slowly been creeping back into CVS.

ITworld.com - How to manage system files (and anything else) with SCCS

How often does this happen to you? You add a new Web server to the network, inserting its IP address in /etc/hosts with plenty of time to spare before the Demo For Big People. At T-minus one hour to demo, your browser can't resolve the hostname. Neither can anyone else's.

Frantic, you check everything before finally coming back to /etc/hosts. Your change is gone, probably because someone else edited the file around the same time and overwrote or removed your edits. You either need some strong configuration control, or a truly loud warning bell that signals anyone's attempt to modify a critical file. Text editors aren't databases -- they don't impose transactional consistency or concurrency control for multiple updates. This doesn't affect you one bit if you're the sole system manager at your site, but as soon as two or more people are chartered to maintain the environment, you need some sort of control system to serialize and document configuration changes. The downside is that you'll spend a non-trivial amount of time deciphering changes made by your peers or undoing valid work that conflicts with items on your own task list.

In this feature we look at the source code control system, or SCCS, bundled into nearly every Unix operating system and a staple of simple configuration control.

After explaining the basics of SCCS file administration, we'll look at the more difficult issues of merging changes and dealing with files owned by root. Our goal is to reduce the mystery and annoyance factor of SCCS, and make it a viable tool for producing an electronic version of your "site book" documenting the who, what, and why of system-configuration changes.

Rewriting history
SCCS is really a collection of tools that control updates to ASCII files. You can use SCCS with binary data, which will be converted into ASCII form using uuencode, but we'll limit this discussion to ASCII data since that's the source for most configuration files. SCCS lets you put files under configuration control, check out read-only copies, acquire write locks for updates, check in and document changes, print histories, and identify and combine specific updates. Any text file can be put under SCCS's control, making it useful for managing plain text documentation and meeting notes.

Before going into the functional details, here's a bit of terminology:

When you place a file under SCCS control, SCCS creates the history file. To change the file, you check it out for editing, and then each subsequent change to the file is annotated in the history file when you check the modified version back in. SCCS locks the history file while one user is editing it to prevent concurrent updates.

Bones of contention
Let's walk through some basic SCCS operations to see how the components fit together, and then get into the grittier problems that make SCCS more of a benefit than an added burden. First, you'll need to have /usr/ccs/bin in your path, since that's where the SCCS commands live (in SunOS, they're part of /usr/bin).

You can call the individual SCCS commands, or use the sccs front-end tool to simplify life. We'll use the front-end for illustrative purposes, but you can also call the SCCS subcommands directly. Make sure you have an obvious place to store history files, such as a subdirectory called SCCS. SCCS commands look for this subdirectory if you don't give an explicit history file location.

Take a vanilla ASCII file and put it under SCCS control, using the admin command:

 huey% sccs admin -ihosts hosts 
This creates an SCCS history file (SCCS/s.hosts) initialized with the content of the file named hosts. You want the history file and the actual file to be namesakes unless you're particularly good at associating strange path names with your /etc files. You can choose any file you want for the initialization; if you've just sorted your hosts file into /tmp/hosts.sorted, the above command line might be:


   
 huey% sccs admin -i/tmp/hosts.sorted hosts 
If all goes well, sccs admin returns quietly to the shell prompt. The most common complaint is that the initial file doesn't contain any ID keywords, which are magic strings filled in by SCCS with the file name, delta numbers, and date and time stamps. We'll talk about the keywords and how to maximize your enjoyment of them shortly. Successful submission of a file to SCCS creates a new s-file in the SCCS directory. The file is primarily ASCII text, with SCCS records marked with an ASCII SOH (start of header) character, showing up as control-A in most editors. All revisions, delta histories and access control information goes into the s-file.

When you're ready to use the file, check out a read-only copy:

 huey% sccs get hosts
 1.2
 10 lines
SCCS tells us the current SID of the file and its size. The get operation produces a read-only file in the current directory, and it will complain if there's a writeable version of the file already present. After you initialize a history file, be sure to rename or remove the initial file to prevent problems on your first check-out operation.


   

Edit the file by checking out a writeable version, using sccs get -e or the shorthand sccs edit:

 huey% sccs edit hosts
 1.2
 new delta 1.3
 10 lines

This time, we're told the new delta number to be created by our editing session. If someone else is editing the file at the time, SCCS produces an error:

 huey% sccs edit hosts
 1.2
 ERROR [SCCS/s.hosts]: being edited: `1.2 1.3 stern 95/06/16 17:41:22' (ge17)

Our first contention point is removed: any request to edit a file that is already being consumed by another system administrator is met with a cryptic yet gentle slap on the keyboard. If you want to find out who is currently editing SCCS-controlled files, use the info subcommand:

 huey% sccs info
 hosts: being edited: 1.2 1.3 stern 95/06/16 17:41:22
 aliases: being edited: 1.45 1.46 wendyt 95/06/17 14:50:33

Make your changes a part of the file's permanent record using sccs delta:

 huey% sccs delta hosts
 comments? added two new host entries
 1.3
 2 inserted
 0 deleted
 10 unchanged

Your writeable source file is removed when you file the deltas, so you have to do another sccs get to fetch the latest, read-only copy, or merge the delta and get operations together with sccs delget hosts.

At this point, you can feed the read-only file into whatever system management step comes next: running an NIS make, executing newaliases, or restarting a daemon with its new configuration file.

Letters of intent
How can you determine the version number of a file, or if it's even SCCS controlled? When you check a file out, the get subcommand fills in SCCS keywords with values such as the SID, pathname of the history file, date, and time. The SCCS magic cookie indicating a keyword is a single, capital letter between percent signs, such as %Z%. Put the SCCS keywords in a comment header in your file, and you have a built-in identification scheme. Here's a sample header for a configuration file that uses the pound sign (#) as a comment character:

 # %M% %I% %H% %T% 

This set of keywords gives you the filename (M), the file revision or SID (I), the current date (H), and the time of checkout (T). You may also choose to insert the pathname to the s-file (P). (Here is a partial list of SCCS magic cookies.) The %W% keyword generates the filename and SID prefixed with the string @(#), which is assumed to be unique to the SCCS system. The what utility searches for the SCCS prefix and prints any information after it, allowing you to quickly identify any number of files.

To include other information to be picked up by what, use the %Z% keyword to insert an SCCS cookie and then build your own identification string. A more verbose version of the example above is easily found by what:

 # %Z% common hosts file revision %I% of %H% at %T% 

what is smart enough to look in the string tables of executables and libraries, so it will identify the SCCS versions of each object component. Bundle an SCCS string into a C program with a global definition like this:

 char *sccs_id = "%Z% %I% %H% %T%"; 

While peeking at the SID and file origins is useful for quick sanity checks, reviewing the delta history of a file is more likely to tell you who changed something and why. When you create the delta, SCCS asks for a comment which is then recorded with your login in the history file. Dump the delta history using sccs prs:

 huey% sccs prs hosts
 SCCS/s.hosts:
 D 1.2 95/06/16 16:49:32 stern 2 1 00002/00002/00008
 COMMENTS:
 added alias for wind, new host shower
 D 1.1 95/06/16 16:43:30 stern 1 0 00010/00000/00000
 COMMENTS:
 date and time created 95/06/16 16:43:30 by stern
The line introducing each delta shows you the SID, date and time of change, and the login of the person making the change. The slash-separated numbers are the line counts of new, deleted and unchanged lines. The manual pages for the prs subcommand also list all of the possible SCCS keywords and their expanded values.


   

Merge ahead
We still haven't tackled two of the hardest problems in change management: how do you get multiple users to access SCCS files, particularly when the files are owned by root, and how do you merge changes together? The first problem doesn't have an easy solution. You can keep all of your SCCS history files in /etc/SCCS, and insist that system administrators include their user names when making changes as root. Since this is fairly unlikely, the next step is to make the SCCS history files group-writeable by members of your system management group (creating a new user group if you need to). Create private SCCS work areas for each system manager using symbolic links to the actual history file location:

huey% mkdir ~stern/etc
huey% ln -s /etc/SCCS ~stern/etc/SCCS
huey% cd ~stern/etc
huey% sccs edit hosts

Within ~stern/etc, an sccs edit hosts picks up the s-file /etc/SCCS/s.hosts (via the symlinked SCCS subdirectory), giving me a private copy of the hosts file to work on.

When I check it back in, the single host-specific copy is returned where other managers (and the system) can find it, but it has my user name attached to changes instead of root. To publicize the changes, I need to su to root, cd into /etc, and then do an sccs get hosts to fetch my latest changes and install the file. Note that the symbolic link points to a machine-specific location, which means I have to be logged on to the machine on which I want to make the edits before doing the checkout. I can always move SCCS files around, as long as files get installed on the appropriate machines.

If you're worried about giving up some measure of security regarding permissions on /etc/hosts, remember that only root can install the file in /etc and rebuild NIS maps or restart daemons. For an added layer of safety, using the SCCS access control feature, explicitly name allowed users with sccs admin -a:

 huey% sccs admin -astern hosts
 huey% sccs admin -awendyt hosts

But the opening question still lingers: how do I find out what happened to my hosts file at 3:30 on Friday afternoon June 16, and who did it? The easiest way is to look at the delta history since that time:

 huey% sccs prs -l -c95-06-16-15-30 hosts 

The -l flag says I'm interested in things that occurred after the time specified with the -c flag. The time and date are given in YYMMDDHHMM format, with any non-white space character separating the items. This example shows me the revision history comments and the user names responsible for making changes.

If I want to see the actual line by line edits, it's sccs diffs to the rescue:

 huey% sccs diffs -c95-06-16-15-30 hosts 

Like the diff command, this compares the current working copy of a file to any older delta, identified by SID or by a timestamp. In this example, I'll see the list of changes between the current hosts file and the one that existed at 3:30 PM on June 16. Want to regenerate the hosts file, minus a few changes? get lets you include or exclude any SID, providing a simple mechanism to drop changes from the current copy of a file:

 huey% sccs get -x1.6,1.7 hosts 

The current hosts file is retrieved without the changes applied in SIDs 1.6 and 1.7. If you want to extract the changes made in those deltas, generate the differences with context in a form that can be later fed to sed, just like the output of the standard Unix diff command:

 huey% sccs get -r1.6 hosts
 huey% sccs diffs -r1.5 hosts > hosts.sed.6

If you plan on applying the patches at a later time, when the hosts file may have undergone some additional minor edits, you'll need to generate context differences that can be fed through patch:

 huey% sccs diffs -C -r1.5 hosts > hosts.sed.6 

diff takes the -c flag for generating context differences, but sccs diffs takes -C to avoid conflict with the timestamp flag.

Control freaks
Like all powerful system administration tools, SCCS has a number of poorly documented but interesting features and subtle caveats:

There's certainly much more that can be done with SCCS. In the last issue of Advanced Systems, Chuck Musciano suggested using a Web browser front end for checking files in and out, and viewing the history. A bit of creative perl or awk programming lets you generate HTML out of the sccs prt output. Send us your marriage proposals for HTML and SCCS, and we'll attach the interesting submissions to this page.
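Such a script can be quite short. Here is a hedged sketch against the sccs prs output shown earlier (the field positions in the "D ..." header lines are inferred from that sample, and the HTML produced is deliberately bare):

#!/usr/bin/perl -w
# Hedged sketch: turn "sccs prs" output into a minimal HTML list.
use strict;

my $file = shift or die "usage: $0 file-under-sccs\n";
open my $prs, '-|', 'sccs', 'prs', $file
    or die "cannot run sccs prs $file: $!";

print "<html><body><h2>Delta history of $file</h2>\n<ul>\n";
while (<$prs>) {
    chomp;
    if (/^D\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
        print "<li><b>$1</b> $2 $3 ($4)";    # SID, date, time, login
    } elsif (/^COMMENTS:/) {
        print " -- ";
    } elsif (/\S/ && !/^SCCS\//) {
        print escape($_);                    # comment text
    }
}
print "\n</ul></body></html>\n";
close $prs;

sub escape {
    my $s = shift;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}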

The hidden agenda of using SCCS is accountability. You want to know who inflicted a change, and why, and under whose authority. A rigorous policy for attributing changes and accepting responsibility for their implementation and effects is fundamental to any robust, mission-critical environment.

Dan Geer, noted security expert and frequent speaker, tells the story of an investment bank executive who demanded a systems change to circumvent normal reporting and control code. The hole was later exploited to execute trades that violated various internal and external regulations. Who was responsible?

Tracing the changes from idea to deployment gives you the first measure of accountability. It's a good thing to have when you hear those warning bells.

alphaWorks Remote System Management Tool Overview

What is Remote System Management Tool?

Remote Server Management Tool is an Eclipse plug-in that provides an integrated graphical user interface (GUI) environment and enables testers to manage multiple remote servers simultaneously. It is aimed at those who would otherwise telnet to each server and consult different docs and man pages to find platform-specific commands for creating or managing users and groups and for initiating and monitoring processes. The tool handles these operations on remote servers through a user-friendly GUI; in addition, it displays the configuration of the test server (number of processors, RAM, etc.). The activities that can be managed by this tool on the remote and local server are divided as follows:

How does it work?

This Eclipse plug-in was written with the Standard Widget Toolkit (SWT). The tool has a perspective named Remote System Management; the perspective consists of test servers and a console view. The remote test servers are mounted in the Test Servers view for management of their resources (process, file system, and users or groups).

At the back end, this Eclipse plug-in uses the Software Test Automation Framework (STAF). STAF is an open-source framework that masks operating system-specific details and provides common services and APIs for managing system resources. The APIs are provided for a majority of languages. Along with the built-in services, STAF also supports external services. The Remote Server Management Tool comes with two STAF external services: one for user management and another for providing system details.
About the technology author(s):
Geetha Adinarayan is an advisory software specialist from IBM Software Labs, Bangalore, India. She has five years of experience in IBM messaging middleware products. Ms. Adinarayan holds a degree in information systems from BITS, Pilani, India; she is also a Certified Software Test Engineer and IBM Certified System Administrator for WebSphere Business Integration Message Broker 5. Currently, Ms. Adinarayan works with the High Performance On Demand Solutions (HiPODs) team in India. Her interests are in performance analysis of complex customer solutions and in autonomic computing.

Shashi K. Dalmia is a staff software engineer from IBM Software Labs, Bangalore, India. He has been with IBM for five years and in the IT field for a total of ten years. He has experience in application development, systems software, and messaging middleware. Mr. Dalmia holds a master's degree in software systems from BITS, Pilani, India, and he is an IBM Certified Systems Administrator for Websphere Business Integrator 2.1. Currently, he works on Websphere Business Integrator, Message Broker 6.0, with the Systems Test team in India. His interests include learning new technologies and creating tools to help ease the work of testers and developers.

Rahul Gupta is a computer science engineer from the National Institute Of Engineering, Mysore. He is skilled in the Software Test Automation System (STAF) and Eclipse plug-in development.

Sreenandan Iyengar is a computer science engineer from National Institute Of Engineering, Mysore. He is skilled in the Software Test Automation System (STAF) and Eclipse plug-in development.

 

PIKT Intro The Big Picture

 

scr+dmi - summary

System Configuration Repository (SCR) captures and stores information about your system's configuration on request or at scheduled times. Desktop Management Interface (DMI) operates between your management software and your system's components. The DMI standard gives technical support personnel, IT managers, and individual users a common path to access information about all aspects of a computer system.
Version B.11.11.32, B.11.00.32 and version B.10.20.32 of SCR+DMI for HP-UX are now available free for download and use from this Web site. There is also a CD containing the product that you can order. Select the link above to see how.

InterWorks 99 Session 027 - Managing System Config Data

The System Configuration Repository (SCR) is an application that tracks changes in a system's configuration over time. SCR can take snapshots of system configuration information periodically or manually before and after major configuration changes. SCR provides tools to filter and compare snapshots from different times or from different machines.

The information that is stored in snapshots comes from DMI, and is stored in a database. Currently, the configuration information available through DMI includes system information such as devices, volume groups, file systems, kernel parameters, etc., and information about software products, including information such as bundles and filesets. (Developers can write their own DMI instrumentation in order to expand the information stored in SCR.)

SCR is highly configurable and can be used in many ways. For example, SCR can be used to maintain consistency on a system or across systems, to recover a machine's configuration information in case of disaster, or to maintain consistency between test systems and production systems,...

Included in this presentation is an overview of SCR, future directions, and example scripts for how to use SCR most efficiently. In addition, we will be soliciting input on additional APIs and additional data coverage.

SCR+DMI for HP-UX

Etc

Depot -- a discontinued project

Host Factory

Working Version

Creating multiple, identical copies of a system can be hard work; it becomes even harder if patches and diffs need to be maintained. Multiply this by hundreds of computers ... and Unix sysadmins go crazy.

The Working Version company has created a system version control and distribution mechanism to manage entire installed system versions.

Safari

Infrastructure A Prerequisite for Effective Security

History



Etc




Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speeding up access. In case softpanorama.org is down, you can use the mirror at softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: March 22, 2017