Systems Administration

Jan 2012

Cluster Services Built With FOSS

LinuxForce staff have been hard at work over the past months developing and deploying LinuxForce Cluster Services, which are built exclusively on Free/Open Source Software (FOSS) technologies. On December 15th we put out a press release:

Announcing LinuxForce Cluster Services

In September, Laird Hariu wrote the article “File Servers – The Business Case for High Availability” in which, in addition to building a case for using clusters, he briefly outlined how Debian and other FOSS could be used to create a cluster for a file server. File servers are just the beginning; we have deployed clusters that host web, mail, DNS and more.

The core of this infrastructure is 64-bit Debian 6.0 (Squeeze). Depending on the customer’s needs and budget, and on whether they need high availability, we add tools including Pacemaker, Corosync, rsync, DRBD and KVM. The infrastructure is managed remotely through the libvirt virtualization API, using virsh and Virtual Machine Manager.

The ability to use such high-quality tools directly from the repositories of the stable Debian distribution keeps our maintenance costs down, avoids vendor lock-in and gives companies like ours the ability to offer these enterprise-level clustering solutions to small and medium-sized businesses at reasonable prices.

May 2010

Please Document the Shop: On the importance of good systems documentation

We have all heard this: You need to document the computer infrastructure. You never know when you might be “hit by a bus”. We hear this and think many frightening things, reassure ourselves that it will never happen and then put the request on the back burner. In this article I will expand on the phrase “hit by a bus” and then look at the consequences.

Things do happen to prevent people from coming into work. The boss calls home, talks to the wife and makes the sad discovery that Mike won’t be coming in anymore: he passed away last night in bed. People get sudden illnesses that disable them. Car accidents happen.

More often than these tragedies occur, thank goodness, business conditions change without warning. In reorganizations whole departments disappear, computer rooms are consolidated and moved, companies are bought and whole workforces replaced. I have had the unhappy experience of living through some of this.

Some organizations have highly transient workforces because of the environment that they operate in. Companies located near universities benefit from an influx of eager, young, upwardly mobile university graduates. These workers are keen to gain experience but soon find higher paying jobs in the “real world” further away from campus. These companies have real turnover problems. People are moving up so quickly that they don’t have time to write things down.

Even when you keep people in place and maintain a fairly stable environment, people discover that what they have documented in their heads can just fade away. This is getting to be more and more of an issue. Networks and servers and other such infrastructure functions have been around for 20 years in many organizations. Fred the maintainer retired five years ago. Fred the maintainer was transferred to sales. The longer systems are around, the more things can happen to Fred. Fred might be right where he was 20 years ago. He just can’t remember what he did.

What does all this mean? What are the consequences of losing organizational knowledge in a computer organization? To be blunt, it creates a hideous environment for your computer people. The system is a black box to them. They are paralyzed. They are rightfully afraid. Every small move they make can bring down the system in ways they cannot predict. Newcomers take much longer to train. Old-timers learn to survive by looking busy while doing nothing. The politics of the shop and the whole company are made bloody by the various interpretations of the folklore of the black box. Whoever waves their arms hardest rules the day. This is no way for your people to live.

This is no way for the computer infrastructure to live either. While the games are played, the infrastructure evolves more and more slowly. Before long the infrastructure is frozen. Nobody dares to touch it. The only way to fix it is to completely replace it at considerable expense. In elaborate infrastructures this is easier said than done. The productive lifetime of the platform is shortened, because it was not allowed to grow and evolve. Think of the Hubble Telescope without all the repairs and enhancements over the years. It would have burned up on re-entry long ago.

Having made my case, I ask again: for your own good, please document the shop. Make these documents public and make them accurate. Record what actually is rather than what you wish it to be. It is better to be a little embarrassed for a short while than to be misled later on. Update the documentation when changes occur. An out-of-date document can be as bad as no document at all. Make an effort to record facts. At the same time, don’t leave out the general philosophies that guided the design and other qualitative information, because they help your successors interpret the facts when ambiguities occur.

Think of what you leave behind. Persuade your boss to make this a priority as well. Hopefully the people at your next workplace will do the same.

Dec 2009

Given 250,000 tools on the shelf, how do you manage them?

Although I haven’t seen a thoroughly researched study, I figure there must be at least 250,000 FOSS (Free and Open Source Software) tools available to every systems administrator on the planet (230,000 at SourceForge + 15,000 at Launchpad + 12,000 at CodePlex + 5,000 at Google Code and that doesn’t count the Linux kernel or any of the myriad other self-hosted projects). These 250,000+ resources comprise the full “toolbox” that admins can use for building solutions with FOSS; they represent the FOSS equivalent of COTS (Commercial Off-The-Shelf). Of course, if you add open source but non-free or commercial tools, the problem explodes combinatorially.
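The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the per-site counts quoted in the paragraph; the variable names are illustrative.

```python
# Approximate project counts per hosting site, as quoted in the text above.
hosted_projects = {
    "SourceForge": 230_000,
    "Launchpad": 15_000,
    "CodePlex": 12_000,
    "Google Code": 5_000,
}

# Sum the quoted counts; self-hosted projects such as the Linux kernel
# are not included, so the real number of tools is higher still.
total = sum(hosted_projects.values())
print(f"{total:,} hosted projects")

# The "at least 250,000" figure is a conservative rounding of this sum.
assert total >= 250_000
```

The quoted counts add up to 262,000, so "at least 250,000" leaves some margin for projects hosted on more than one site.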

How can a systems administrator support the largest possible subset of these “on the shelf” resources to best service the next need from a stakeholder (like the boss or a new client)?

First let me emphasize the difficulty of the task with a list of items that systems administrators and systems management firms like LinuxForce are expected to do whenever a stakeholder presents a software need:

Find and Evaluate software that can meet the need:

Identify several candidate applications that might meet the business requirements for a given project, function, or need
Research the options to assess their ability to meet the requirements. (We, the systems administrators of the world, are actually expected to know which tool is “best of breed” just from our past experience. The false assumption is that if a tool isn’t well known, it must not be any good. The long tail applies to the 250,000+ FOSS tools too!) In our experience such research is essential; unfortunately, there is rarely enough budget to carefully explore the options.
Install the tool(s) in a “sandbox” to allow the stakeholder to “try it out”
Select a tool to use or look for more options

Put the tool into production

Read the docs to identify best practices for the software’s configuration
Prepare an installation plan that will address (as best as possible) any upgrade glitches (yes, you have to anticipate them now or suffer the consequences later!) so that you’re prepared for when a security advisory is released (or when the stakeholder starts begging for features from a new release)
Figure out a support plan to handle the inevitable questions that will arise during operations
Integrate these considerations into the process of either installing a package or using the “configure, make, make install” steps that most FOSS tools provide for installation
Carefully document the “as built” configuration including all assumptions and anticipated glitches to help yourself or future admins during the maintenance phase

On-Going Maintenance

Monitor the software
Subscribe to any relevant security mailing lists for the software so that you are apprised when a security (or other major) problem is detected
Track general trends relating to the software and its alternatives so that you are ready to respond if the project goes dormant or is eclipsed by newer, superior technology.
Upgrade routinely

About 15 years ago I noticed that the explosion of ready-to-use FOSS tools, plus the trend toward general-purpose tools and away from custom software, was leading to a combinatorial crisis in software maintenance. I saw that it was the systems administrator’s responsibility to address the situation.

It became apparent to me that the solution would require the use of convention, standards and policy to reduce the complexity of the problem to manageable proportions. I searched for the most “standardized”, convention- and policy-enforcing environment that would also provide the most flexible access to the most FOSS tools. The solution I found is Debian GNU/Linux, the universal operating system (although Ubuntu and other Debian derivatives provide most of these benefits as well).

Debian simplifies the software evaluation process (apt-cache [search|show]). Debian simplifies installation (apt-get install) as well as security and new version upgrades (apt-get [upgrade|dist-upgrade]). Debian uses conventions and packages to simplify identifying best practices for administering the software (/usr/share/doc/[package]/, /var/lib/dpkg/info/[package].postinst, and wikis, mailing lists, bug reports, etc.). But the key benefit for managing the combinatorial explosion of FOSS tools is the Debian community’s commitment to configuring each package so that it automatically supports the most common use cases while still allowing unusual configurations (so you save tons of time in configuring the software).
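The life cycle described above can be laid out as one command per phase. This is a minimal sketch, not a management tool: it only builds and prints the command lines and never runs them. The function names and phase labels are illustrative, and rsync is just an example package. Note that the search and show subcommands are provided by apt-cache, while install, upgrade and dist-upgrade belong to apt-get.

```python
import shlex


def apt_workflow(package):
    """Return the Debian command for each phase of a package's life cycle."""
    return {
        "evaluate: search": ["apt-cache", "search", package],
        "evaluate: show": ["apt-cache", "show", package],
        "install": ["apt-get", "install", package],
        "upgrade": ["apt-get", "upgrade"],
        "upgrade (may add/remove packages)": ["apt-get", "dist-upgrade"],
        "read packaged docs": ["ls", f"/usr/share/doc/{package}/"],
    }


def print_workflow(package):
    """Print each phase next to its shell-quoted command line (dry run only)."""
    for phase, argv in apt_workflow(package).items():
        print(f"{phase:36} {shlex.join(argv)}")


print_workflow("rsync")
```

Keeping the commands as argument lists rather than strings means they could later be handed to subprocess.run without shell quoting problems, should one choose to execute them.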

In summary, the Debian GNU/Linux system provides the infrastructure needed to manage the combinatorial explosion of off-the-shelf FOSS tools cost-effectively. If you have to service a lot of users, customers, or clients with challenging, diverse needs, I think Debian is the most cost-effective way to meet those needs and deliver quality maintenance on an on-going basis year after year after year.

Dec 2009

A FOSS Perspective On Richard Schaeffer’s Three Tactics For Computer Security

Federal Computer Week published a great, succinct quote from Richard Schaeffer Jr., the NSA’s (National Security Agency) information assurance director, on three approaches that are effective in protecting systems from security attacks:

“We believe that if one institutes best practices, proper configurations
[and] good network monitoring that a system ought to be able to
withstand about 80 percent of the commonly known attack mechanisms
against systems today,” Schaeffer said in his testimony. “You can
actually harden your network environment to raise the bar such that
the adversary has to resort to much, much more sophisticated means,
thereby raising the risk of detection.”

Taking Schaeffer’s three tactics as our lead, here is a FOSS perspective on these protection mechanisms:

Best practices implies community effort: discussing, sharing and collectively building understanding and techniques for managing systems and their software components. FOSS (Free and Open Source Software) communities develop, discuss and share these best practices in their project support and development forums. Debian’s package management system implements some of these best practices in the operating system itself thereby allowing users who do not participate in the development and support communities to realize the benefits of best practices without understanding or even knowing that they exist. This is one of the important benefits of policy- and package-based operating systems like Debian and Ubuntu.

Proper configuration is the tactical implementation of best practices. Audit is a critical element here. Debian packages can use their postinst scripts (which are run after a package is installed, upgraded, or re-installed) to audit and sometimes even automatically fix configuration problems. Right now, attentive, diligent systems administrators, i.e., human beings, are required to ensure proper configuration as no vendor — not even Debian — has managed to automate the validation let alone automatically fix bad configurations. I think this is an area where the FOSS community can lead by considering and adopting innovations for ensuring proper configuration of software.
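To make the idea of a configuration audit concrete, here is a minimal sketch that checks a configuration file against a small policy. It is not how any Debian postinst script works; the policy, the function name and the choice of sshd_config as the example are all assumptions made for illustration.

```python
def audit_sshd_config(text, policy=None):
    """Return a list of findings where the config text departs from policy."""
    # Hypothetical policy: required value for each keyword (illustration only).
    if policy is None:
        policy = {"permitrootlogin": "no", "passwordauthentication": "no"}

    seen = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Keywords are case-insensitive; keep only the first value seen,
            # since sshd uses the first value it obtains for a keyword.
            seen.setdefault(parts[0].lower(), parts[1].strip().lower())

    findings = []
    for key, wanted in policy.items():
        actual = seen.get(key)  # None means the keyword is not set explicitly
        if actual != wanted:
            findings.append(f"{key}: expected {wanted!r}, found {actual!r}")
    return findings


sample = """
# example configuration
PermitRootLogin yes
PasswordAuthentication no
"""
for finding in audit_sshd_config(sample):
    print(finding)
```

A script like this only reports problems; deciding whether and how to fix them still falls to the attentive administrator described above.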

Good network monitoring invokes the discipline of knowing what services are running and investigating when service interruptions occur. Monitoring can contribute to configuration auditing and can help focus one’s efforts on any best practices that should be considered. That is, monitoring helps by engaging critical thinking and building a tactile awareness of the network: what it does and what is exposed to the activities of a frequently malicious Internet. So, like proper configuration, monitoring requires diligent, attentive systems administrators to maintain security. LinuxForce’s Remote Responder℠ service builds best practices around three essential FOSS tools for good network monitoring: Nagios, Munin, and Logcheck.
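The simplest form of "knowing what services are running" is checking whether each expected service accepts connections. The sketch below illustrates that idea with Python's standard socket module. It is not how Nagios, Munin or Logcheck are implemented, and the service inventory and function names are hypothetical.

```python
import socket


def service_is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Hypothetical inventory of services that are supposed to be running.
EXPECTED_SERVICES = {"ssh": 22, "smtp": 25, "http": 80}


def unexpected_outages(host, services=EXPECTED_SERVICES):
    """List the names of expected services that do not accept connections."""
    return [name for name, port in services.items()
            if not service_is_up(host, port)]


if __name__ == "__main__":
    # Report which expected services are not reachable on this machine.
    print(unexpected_outages("127.0.0.1"))
```

A check like this only says that a port answers. Investigating why a service stopped answering, and whether it should be exposed at all, remains the administrator's job.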

Nov 2009

Forthcoming Design Science Symposium and Systems Administration

Ever since I started doing systems administration, I’ve been interested in applying Buckminster “Bucky” Fuller’s comprehensive anticipatory design science to the task. Bucky extolled the virtues of a comprehensive approach. Put bluntly, the comprehensive perspective says “since what you don’t attend to will get ya, you had better consider everything.” Or, said positively: only by considering all elements in a system and all its interrelationships with other (relevant) systems can you ensure reliable on-going operation. In addition, the proactive or anticipatory approach is essential to prevent system complexities from impacting operations. I think of design as human initiative-taking to provide a service or artifact and science as experience-based learning. Evidently, design science is implicit in the work of systems administrators. I think the discipline of comprehensive anticipatory design science can be positively applied to the practice of systems administration.

So I am excited that on November 14 & 15, I will be attending the Synergetics Collaborative’s two-day Symposium on “Design Science” at the Rhode Island School of Design (RISD). I serve as volunteer Executive Director of the Synergetics Collaborative, and together with the organizing committee I have put together a program that will develop a deeper understanding of design science. So even though computer systems administration is not on the agenda, I think anyone with a problem-solving focus in their work (including systems administrators) would benefit from attending.

To find out more about this exciting event, visit the Design Science: Nature’s Problem Solving Method Symposium home page.

Oct 2009

Introducing RemoteResponder.LinuxForce.Net

If you know the history of LinuxForce, you know that we’ve been doing remote systems administration using FOSS (Free and Open Source Software) since our founding in 1995. And we’ve called our remote systems administration service Remote Responder℠ for a long time too. But the website RemoteResponder.LinuxForce.Net is new.

The new site is part of our educational initiative to explain the issues involved in administering FOSS-based IT infrastructures to achieve the promise of greater reliability and ever-improving functionality while keeping costs low and meeting an organization’s ever-evolving business needs. Check out our new website RemoteResponder.LinuxForce.Net and let us know what you think.
