Softpanorama
May the source be with you, but remember the KISS principle ;-)

Missing backup horror stories

If you try to distill the essence of horror stories, most of them are about inadequate backups. Anyone who has worked as a system administrator in a large corporation for a substantial period of time can confirm a general observation: large organizations tend to opt for incredibly expensive, incredibly complex, incredibly overblown backup "solutions" sold to them by vendors (e.g., Data Protector, Tivoli backup, or other expensive, closed-source, proprietary, non-portable, slow, bulky software) rather than using the stock, well-tested, reliable tools that they already have.

Home users have their own set of problems. According to a recent Carnegie-Mellon University report, hard drive failures affect up to 13 percent of all personal computer users each year. And yet surveys show almost half of users do not back up their data. Of course, SSDs are no longer that expensive, but they fail too, although they are more resistant to falling from the desk to the floor.

Having a good recent backup that can be restored is the key feature that distinguishes a mere nuisance from a full-blown disaster. Note the phrase "that can be restored". This point is very difficult for novice enterprise administrators to grasp. Often the "missing backup" situation arises when a backup is available but can't be used for restoration, restores only a part of the filesystem, or is not current. There are some rules that help both to prevent such situations and to recover from them.

Rephrasing Benjamin Franklin, we can say "Experience keeps the most expensive school, but most sysadmins are unable to learn anywhere else". Please remember that in an enterprise environment you will almost never be rewarded for innovations and contributions, but in many cases you will be severely punished for blunders. In other words, typical enterprise IT is a risk-averse environment, and you had better understand that sooner rather than later...

You should not be passive in accepting your fate. There should be a conscious effort to locate and test the backup before engaging in any potentially dangerous manipulations with the OS.

Test your backups to make sure they are readable before starting any potentially dangerous manipulations with the OS.

Handle the format program (and anything else that writes directly to disk devices) like nitroglycerine.

If you've never done sysadmin work before, take a formal vendor training class even if this means paying your own money.

Testing your backups periodically should be a habit, and it is better still to integrate it into your monitoring system. At a minimum, browse the backup and check that the data are intact; comparing it with the current state of the server is even better. Skipping this is negligence on the part of the system administrator.
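The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the backup is assumed to be a tar archive (plain or compressed) whose member names are paths relative to some live root directory, and the names `verify_backup`, `archive_path` and `live_root` are hypothetical, not taken from any particular backup tool.

```python
import hashlib
import os
import tarfile


def sha256_of(fileobj, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary file object."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_backup(archive_path, live_root):
    """Read every regular file in the archive and compare it with live_root.

    Returns three lists of member names: (unreadable, missing, changed).
    If the archive as a whole is damaged, tarfile raises an exception,
    which is itself the answer to "can this backup be restored?".
    """
    unreadable, missing, changed = [], [], []
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            try:
                # Reading the whole member proves it can actually be extracted.
                backup_hash = sha256_of(archive.extractfile(member))
            except (tarfile.TarError, OSError):
                unreadable.append(member.name)
                continue
            live_path = os.path.join(live_root, member.name.lstrip("/"))
            if not os.path.isfile(live_path):
                missing.append(member.name)
                continue
            with open(live_path, "rb") as live_file:
                if sha256_of(live_file) != backup_hash:
                    changed.append(member.name)
    return unreadable, missing, changed
```

A non-empty `unreadable` list is the real alarm. Entries in `missing` and `changed` are expected on a live server and mostly show how stale the backup is.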

Please remember that a backup is your last chance to restore the system if something goes terribly wrong. That means that before any dangerous step you need to locate the backup and check that it exists.

In an enterprise environment, making a private backup is also a good idea, so that you have two or more recent copies of your OS and some user and data directories. It does not need to be complete. FIT flash drives limit the total size to 128GB, but they are almost invisible once inserted into a USB port on the server, and they provide important and cheap insurance for your OS, baseline, and critical user and data files.
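A private backup of this kind can be as simple as a dated tar archive of a few directories. The following Python sketch is an illustration only: the function names, the archive naming scheme and the idea of checking the uncompressed size against a limit before writing are all assumptions, and the mount point of the flash drive is left to the caller.

```python
import os
import tarfile
import time


def tree_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def private_backup(sources, dest_dir, limit_bytes):
    """Archive sources into dest_dir; refuse if they exceed limit_bytes.

    The limit is checked against the uncompressed size, which is a
    conservative estimate of the space the archive will need.
    """
    needed = sum(tree_size(src) for src in sources)
    if needed > limit_bytes:
        raise ValueError(f"{needed} bytes exceeds limit of {limit_bytes} bytes")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = os.path.join(dest_dir, f"private-backup-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for src in sources:
            # Store paths without the leading slash, as tar does by default.
            archive.add(src, arcname=src.lstrip("/"))
    return archive_path
```

For example, `private_backup(["/etc", "/root"], "/mnt/usb", 100 * 10**9)` would write one archive to a hypothetical mount point `/mnt/usb`, leaving some headroom on a 128GB drive.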

The feeling of desperation one experiences after getting into this classic horror story is well reflected in the following parody of Yesterday:

The Beatles' Yesterday -- variation for sysadmins

Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.

Suddenly,
There's not half the files there used to be,
And there's a milestone hanging over me
The system crashed so suddenly.

I pushed something wrong
What it was I could not say.
Now all my data's gone
and I long for yesterday-ay-ay-ay.

Yesterday,
The need for back-ups seemed so far away.
I knew my data was all here to stay,
Now I believe in yesterday.


Old News ;-)

[Dec 16, 2011] Acronis restore story

This is about Windows, but the lesson is valuable in any case.

If you are not careful, you can wipe out your C disk while restoring a Windows C partition image to a USB drive, because selecting a bootable recovery image somehow redirects the recovery to disk C. The warning sign is Acronis True Image asking to reboot the computer in order to proceed.

If you are brave enough to go past this point, then despite the fact that you explicitly chose a target different from the boot drive, you will face unpleasant consequences -- your C partition is now gone.

You can imagine my surprise at the results. I once did that. Thank God there was no critical data on the wiped C drive; I had already migrated it to a new PC. My first reaction was to throw this garbage program where it belongs. But the problem is that other similar programs are not much better, and now that I am trained not to trust Acronis I can probably do better in the future. Another factor is that if you don't use Acronis True Image often, you forget its capabilities. In this case the right decision would have been to use the disk cloning operation rather than restoration from the image, but the problem was that the disk and the image were slightly different, and I wanted the content of the image, not the content of the disk.

Still, the right way would have been to clone the disk first and then restore the image to that drive. As I don't perform complex operations with Acronis often, I forgot about that and was punished. And believe me, your jaw really drops when you see the results...

AIX/370 cluster story

Another time, our AIX/370 cluster managed to trash the /etc/passwd file. All 4 machines in the cluster lost their copies within milliseconds. In the next few minutes, I discovered that (a) the nightly script that stashed an archive copy hadn't run the night before and (b) that our backups were pure zorkumblattum as well. (The joys of running very beta-test software).

I finally got saved when I realized the cluster had *5* machines in it - a lone PS/2 had crashed the night before, and failed to reboot. So it had a propagated copy of /etc/passwd as of the previous night.

Go to that PS/2, unplug its Ethernet.. reboot it. Copy /etc/passwd to floppy, carry to a working (?) PS/2 in the cluster, tar it off, let it propagate to other cluster sites. Go back, hook up the crashed PS/2's Ethernet.. All done.

Only time in my career that having beta-test software crash a machine saved me from bugs in beta-test software. ;)

Bad backup story

Once I was in the position of upgrading a Gould PN/9080. I was a good sysadmin, took a backup before I started, since the README said that they had changed the I-node format slightly. I do the upgrade, and it goes with unprecedented (for Gould) smoothness. mkfs all the user partitions, start restoring files. Blam.

I/O error on the tape. All 12 tapes. Both Sets of backups.

However, 'dd' could read the tape just fine.

36 straight hours later, I finally track it down to a bad chip on the tape controller board - the chip was involved in the buffer/convert from a 32-bit backplane to an 8-bit I/O cable. Every 4 bytes, the 5th bit would reverse sense. 20 mins later, I had a program written, and 'dd | my_twiddle | restore -f -' running.
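The original twiddle program is not shown in the story, so what follows is only a guess at its shape, written in Python for illustration. It assumes that "the 5th bit" means bit mask 0x10 and that the damaged byte is the last one in each group of four; both are parameters, since the story does not pin them down. Because XOR is its own inverse, the same filter that models the damage also undoes it.

```python
def twiddle(data, start=0, period=4, phase=3, mask=0x10):
    """Flip the bits in mask in one byte out of every period bytes.

    start is the absolute stream offset of data[0], so the pattern stays
    aligned when a long stream is processed chunk by chunk.
    phase and mask are assumptions about which byte and bit were hit.
    """
    out = bytearray(data)
    # First index i in this chunk with (start + i) % period == phase.
    first = (phase - start) % period
    for i in range(first, len(out), period):
        out[i] ^= mask
    return bytes(out)


def filter_stream(src, dst, chunk_size=1 << 16):
    """Copy binary stream src to dst, applying twiddle to every chunk."""
    offset = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(twiddle(chunk, start=offset))
        offset += len(chunk)
```

Called as `filter_stream(sys.stdin.buffer, sys.stdout.buffer)` (with `import sys`), this could sit between dd and restore in a pipeline like the one in the story.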

Moral: Always *verify* the backups - the tape drive didn't report a write error, because what it *received* and what went on the tape were the same....

I'm sure I have other sagas, but those are some of the more memorable ones I've had...

Valdis Kletnieks
Computer Systems Engineer
Virginia Tech

"on-the-job training"

From: rca@Ingres.COM (Bob Arnold)
Organization: Ask Computer Systems Inc., Ingres Division, Alameda CA 94501

Many moons ago, in my first sysadmin job, learning via "on-the-job training", I was in charge of a UNIX box whose user disk developed a bad block. (Maybe you can see it already ...)

The "format" man page seemed to indicate that it could repair bad blocks. (Can you see it now?) I read the man page very carefully. Nowhere did it indicate any kind of destructive behavior.

I was brave and bold, not to mention boneheaded, and formatted the user disk.

Heh.

The good news:
1) The bad block was gone.
2) I was about to learn a lot real fast :-)
The bad news:
1) The user data was gone too.
2) The users weren't happy, to say the least.

Having recently made a full backup of the disk, I knew I was in for a miserable all day restore. Why all day? It took 8 hours to dump that disk to 40 floppies. And I had incrementals (levels 1, 2, 3, 4, and 5, which were another sign of my novice state) to layer on top of the full.

Only it got worse. The floppy drive had intermittent problems reading some of the floppies. So I had to go back and retry to get the files which were missed on the first attempt.

This was also a port of Version 7 UNIX (like I said, this was many moons ago). It had a program called "restor", primordial ancestor of BSD's "restore". If you used the "x" option to extract selected files (the ones missed on earlier attempts), "restor" would use the *inode number* as the name of the extracted files. You had to move the extracted files to their correct locations yourself (the man page said to write a shellscript to do this :-(). I didn't know much about shell scripts at the time, but I learned a lot more that week.
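The relocation script itself is not part of the story. As an illustration of the task, here is a minimal sketch in Python rather than the shell of the time. It assumes a text listing with one "inode path" pair per line is available from somewhere (for example, saved `ls -i` style output); that listing format and the names `parse_listing` and `relocate` are hypothetical.

```python
import os
import shutil


def parse_listing(lines):
    """Parse 'inode path' lines into a dict {inode string: relative path}."""
    mapping = {}
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            mapping[parts[0]] = parts[1].strip().lstrip("/")
    return mapping


def relocate(extract_dir, target_root, mapping):
    """Move files named by inode number to their mapped paths.

    Returns the list of names left in place because no mapping was found.
    """
    unmatched = []
    for name in sorted(os.listdir(extract_dir)):
        source = os.path.join(extract_dir, name)
        if not os.path.isfile(source):
            continue
        if name not in mapping:
            unmatched.append(name)
            continue
        destination = os.path.join(target_root, mapping[name])
        # Recreate the directory tree that the extraction flattened.
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.move(source, destination)
    return unmatched
```

Files without a mapping are deliberately left where they are, so nothing is lost if the listing is incomplete. Ownership and permissions are not restored by this sketch.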

Yes, it took me a full week, including the weekend, maybe 120 hours or more, to get what I could (probably 95% of the data) off the backups.

And there were a few ownership and permissions problems to be cleaned up after that.

Once burned twice shy. This is the only truly catastrophic mistake I've ever made as a sysadmin, I'm glad to be able to say.

I kept a copy of my memo to the users after I had done what I could. Reading it over now is sobering indeed. I also kept my extensive notes on the restore process - thank goodness I've never had to use them since.

Morals:
1) The "man" pages don't tell you everything you need to know.
2) Don't do backups to floppies.
3) Test your backups to make sure they are readable.
4) Handle the format program (and anything else that writes directly to disk devices) like nitroglycerine.
5) Strenuously avoid systems with inadequate backup and restore programs wherever possible (thank goodness for "restore" with an "e"!).
6) If you've never done sysadmin work before, take a formal training class.

Well, I haven't thought about that one in a while! I can laugh about it now ....

Bob

Some lessons about cutting costs

From: rca@Ingres.COM (Bob Arnold)
Organization: Ask Computer Systems Inc., Ingres Division, Alameda CA 94501

In article <1992Oct12.233524.13463@pony.Ingres.COM> I wrote:

>I was brave and bold, not to mention boneheaded, and formatted the user disk.

> [ rest of story deleted ... Bob ]

>Morals:
> 1) The "man" pages don't tell you everything you need to know.
> 2) Don't do backups to floppies.
> 3) Test your backups to make sure they are readable.
> 4) Handle the format program (and anything else that writes directly
> to disk devices) like nitroglycerine.
> 5) Strenuously avoid systems with inadequate backup and restore
> programs wherever possible (thank goodness for "restore" with
> an "e"!).
> 6) If you've never done sysadmin work before, take a formal
> training class.

Just thought of a few more related morals (managers pay attention now):

7) You get what you pay for.
8) There's no substitute for experience.
9) It's a lot less painful to learn from someone else's experience than your own (that's what this thread is about, I guess :-) )

Part of the story I should tell here. My employer had been looking for a way to cut costs. I was 15% cheaper than their previous sysadmin so they let him go and hired me. It wasn't as nasty as it sounds, since they kept him on as a consultant at 4 hours a week and he ended up with a better job too (so did I). Everyone benefited in the end. I leaned heavily on his consulting, which was great. He was older and wiser, and probably had his own horror stories to tell. After this one, so did I!


FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.

Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: July 20, 2017