Eternally Regenerative Software Administration

Oct 2012

Debian-Powered Drupal Configuration Policy

LinuxForce’s web hosting services are designed to provide our customers with the benefits of strong security, simple on-going administration and maintenance, and support for the web software to work well in a large number of situations including multi-site support for applications.

We achieve these objectives by operating a Debian infrastructure with close adherence to Debian’s policy, web application policy and PHP policy.

In particular, we use the Debian package for Drupal 6. This provides community supported upgrades with a strong, well-documented policy. Like other software which ships in Debian, the software version is often somewhat older than the most recent upstream release, but it is regularly patched for security by the maintainer and the Debian security team.

In addition, this infrastructure also offers:

Only one package needs to be upgraded whenever there is a security patch, instead of every site individually
Strong separation of site-specific themes, modules, libraries, and uploaded files helps keep files you want to edit away from the Drupal core files (which you don’t want to edit)
The automation the infrastructure provides allows disk, memory, and sysadmin time to be minimized, thus reducing costs
The benefits of code maturity, as the Debian Drupal maintainers have thought through many boundary cases which it would take our staff time and trial-and-error to re-discover

Site layout

Experienced Drupal admins may find some of the file locations for the package confusing at first, so let’s clarify the differences to ease the transition.

Our infrastructure supports multiple sites per server. This is implemented by providing each site with its own dbconfig.php and settings.php files, plus the following directories:

files/
themes/
modules/
libraries/

All of these are configurable by the user, and are located in /etc/drupal/6/sites/drupal.example.com/. The site also inherits the contents of these default directories from core Drupal.

Access to these files via FTP is discussed below.

Core Drupal files are located in /usr/share/drupal6/ and are shared between all the Drupal sites on the server. They should not be edited since all changes to these files will be lost upon upgrade of the Debian Drupal package.
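As a rough illustration of the layout described above, the following Python sketch builds the expected per-site paths and checks whether a path falls inside the shared core tree. The function names (site_layout, is_core_path) are hypothetical helpers invented for this example; only the directory and file names come from the policy text.

```python
from pathlib import PurePosixPath

# Locations taken from the policy text above.
SITES_ROOT = PurePosixPath("/etc/drupal/6/sites")  # per-site, user-editable
CORE_ROOT = PurePosixPath("/usr/share/drupal6")    # shared core, do not edit

SITE_FILES = ("dbconfig.php", "settings.php")
SITE_DIRS = ("files", "themes", "modules", "libraries")


def site_layout(hostname):
    """Return the expected per-site paths for one hosted site (hypothetical helper)."""
    base = SITES_ROOT / hostname
    return {
        "base": base,
        "files": [base / name for name in SITE_FILES],
        "dirs": [base / name for name in SITE_DIRS],
    }


def is_core_path(path):
    """True if the path is the shared core tree or lives somewhere beneath it."""
    path = PurePosixPath(path)
    return path == CORE_ROOT or CORE_ROOT in path.parents


layout = site_layout("drupal.example.com")
for entry in layout["files"] + layout["dirs"]:
    print(entry)
```

A check like is_core_path could be used in a deployment script to refuse edits that would be lost on the next package upgrade.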

Users & Permissions

A jailed userdrupal FTP account is created to manage the files for the Drupal install.

The userdrupal account is the owner of all the configurable files located in /etc/drupal/6/sites/drupal.example.com/

The system-wide www-data user must have access to the following by having the www-data group own them and be given the appropriate permissions:

Read access to dbconfig.php
Write access to the files/ directory (this is where Drupal typically stores uploaded files)

All files (with the exception of dbconfig.php) should be writable by userdrupal. The userdrupal group itself is enforced by the system, but to maintain strict security all users should be configured client-side to respect group read-write permissions, which means a umask of 002.
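To see why a umask of 002 matters for group read-write access, here is a minimal Python sketch, assuming a Linux system with a conventional filesystem. It creates one file under umask 002 and one under the common default of 022, then compares the resulting permission bits. The helper name create_with_umask and the file names are made up for the illustration.

```python
import os
import stat
import tempfile


def create_with_umask(directory, name, mask):
    """Create an empty file under the given umask and return its permission bits."""
    previous = os.umask(mask)  # os.umask returns the previous mask
    try:
        path = os.path.join(directory, name)
        # Request rw for everyone (0o666); the kernel clears the bits set in the umask.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
        os.close(fd)
    finally:
        os.umask(previous)  # always restore the original mask
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as workdir:
    shared = create_with_umask(workdir, "shared.txt", 0o002)
    private = create_with_umask(workdir, "private.txt", 0o022)

print(oct(shared))   # 0o664: group members can write
print(oct(private))  # 0o644: group members can only read
```

With 022, a file uploaded by one member of the userdrupal group cannot be modified by another, which is the situation the 002 setting avoids.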

Additional Users and non-Drupal Directories

If there is a non-developer who needs to have access to the files directory, for instance, a specific FTP user for that use may be created and added to the userdrupal group.

For ease of administration, if multiple users exist and we are able to support an SSH account (static IP from client required) for handling administration, sudo can be configured to allow said user to execute commands on behalf of the userdrupal FTP user. Remember to add “umask 002” to the .bashrc to respect group read-write permissions.

If additional directories outside the Drupal infrastructure are required for a site, they will be placed in /srv/www/example.com/ and a separate jailed FTP user created to manage the files there. If a cgi-bin is required, it will be placed in /srv/www/example.com/cgi-bin/

Caveats and Cautions

Because of shared use for core Drupal files located in /usr/share/drupal6/ you may experience problems with the following:

Some drush commands and plugins assume the Drupal files are editable files within the same document root as the rest of your files, and thus may not work as expected
When you upgrade the drupal6 package, all sites are upgraded at once without testing. Customers who are concerned about the impact of the upgrade and require very high uptime can be accommodated by testing the site with the upgraded versions of PHP and Drupal in a testing VM
Since the Debian package name does not change, you cannot install the drupal6 package from an older Debian version alongside one from a current version. A separate virtualized Debian environment may be needed for testing upgrades if support is uncertain (this issue does not exist with upgrades from drupal6 to drupal7, as they can be installed alongside each other)
A policy for handling root-level .htaccess files should be developed if they are to be used

Conclusion

Although the Debian way differs from the Drupal tarball approach, it makes it possible to scale the service to many sites, saving disk, memory, and sysadmin effort. By leveraging this Drupal infrastructure provided by Debian, LinuxForce provides one-off Debian package deployments to dedicated systems, shared arrangements for small businesses that are running several sites, and infrastructure deployments for businesses that provide hosting services. We also offer a boutique hosting service for select customers on one of our systems.

Sep 2011

File Servers – The Business Case for High Availability

Introduction

You have probably heard of high availability transaction processing servers. You have most likely read about the sophisticated systems used by the airlines to sell tickets online. They have to be non-stop because downtime translates to lost orders and revenue. In this article I will discuss the economics of using non-stop technologies for everyday applications. I will show that even ordinary file sharing applications can benefit from inexpensive Linux based Pacemaker clustering technology.

Availability Goal

What is our availability goal? Our goal should be to take prudent and cost-effective measures to reduce computer downtime to nil in the required service window. I’m not talking about 99.999% (five 9s) uptime. This is the popular (and very expensive) claim made by high availability vendors. I’m talking about maintaining enough uptime to service the application. Take a simple example: for office document preparation the service time window is office hours (9-5). The rest of the time the desktop PCs can be turned off; nobody is there to operate them anyway. You only need the PCs for 8 hours a day, 5 days a week, or 2,080 hours per desktop PC per year. This translates into an uptime requirement of about 24 percent of the 8,760 hours in a year. Ideally you want the desktop PCs to be available all the time during office hours but are willing to give up availability for routine maintenance and for the infrequent breakdowns that may occur only once per workstation every five years or so. Perhaps you have two spare desktop PC workstations for every 100. This extra capacity allows your office workers to resume their work on a spare while their workstation is being repaired. In this example the cost of maintaining adequate availability is the cost of maintaining two spare desktop PCs. You might adjust this cost to account for real world conditions at the work site. Wide swings in operating temperature or a poor-quality electricity supply might dictate that you increase the number of spare PCs. Sounds like a low stress, straightforward availability solution.
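The arithmetic behind the service window can be checked with a few lines of Python. This is only a sketch of the calculation in the paragraph above; the 52-week year and the 365-day year are simplifying assumptions, and the five 9s figure is included purely for contrast.

```python
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52          # assumption: no holidays subtracted
HOURS_PER_YEAR = 365 * 24    # 8760, ignoring leap years

# Hours per year during which a desktop PC actually has to be available.
service_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR
required_uptime = service_hours / HOURS_PER_YEAR

# For contrast: downtime permitted per year by a round-the-clock five 9s target.
five_nines_downtime_minutes = HOURS_PER_YEAR * 60 * 1e-5

print(service_hours)                          # 2080
print(f"{required_uptime:.1%}")               # 23.7%
print(f"{five_nines_downtime_minutes:.2f}")   # 5.26
```

The office only needs its PCs for roughly a quarter of the year, while a five 9s promise allows little more than five minutes of downtime per year around the clock.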

Network Effects

The problem gets more complicated when the desktop PCs are networked together and all the documents are stored on a central file server rather than on each workstation’s hard drive. There is a multiplicative effect. If the file server is not available then all 100 document processing PCs are rendered unavailable. Then you have 98 (remember the 2 spares from above) workers being paid but not producing documents. A failure during office hours can become expensive. One hour of downtime can cost as much as $1,500 in lost worker wages. A day of downtime can cost $12,000 in lost worker wages. How long will it take for a hardware repair person to travel to your site? How long will it take for spare parts to arrive? How long will it take the repair person to replace the parts? How long will it take for damaged files to be restored from backup by your own people? A serious but not unlikely failure can take several days to be completely resolved. It’s not unreasonable to assume that such a failure, costing $24,000 for two days of downtime, can occur once every 5 years. This is a very simple example. We are not talking about a complicated order-entry or inventory control system. We are talking about 98 office workers saving files to a central file share so that they can be indexed and backed up.
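The downtime figures above follow from simple multiplication, sketched here in Python. The worker count and the $1,500 hourly figure come from the text; the two-day outage length and the derived values are illustrative assumptions, and downtime_cost is a hypothetical helper name.

```python
WORKERS = 98
COST_PER_HOUR = 1500        # lost wages per hour of file server downtime (from the text)
OFFICE_HOURS_PER_DAY = 8


def downtime_cost(hours, cost_per_hour=COST_PER_HOUR):
    """Lost wages for an outage lasting the given number of office hours."""
    return hours * cost_per_hour


implied_hourly_wage = COST_PER_HOUR / WORKERS
cost_per_day = downtime_cost(OFFICE_HOURS_PER_DAY)
two_day_outage = downtime_cost(2 * OFFICE_HOURS_PER_DAY)  # assumed outage length
expected_cost_per_year = two_day_outage / 5               # one such outage every 5 years

print(f"{implied_hourly_wage:.2f}")   # 15.31
print(cost_per_day)                   # 12000
print(two_day_outage)                 # 24000
print(expected_cost_per_year)         # 4800.0
```

Spread over five years, the assumed two-day outage works out to an expected cost of $4,800 per year, which is a useful number to keep in mind when pricing a replacement.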

The Effects of Time

I’m going to add another wrinkle to our office document processing example. This file sharing setup has been in use for 4 years. Time flies. The hardware is getting old faster than you realize. Old hardware is more likely to fail. It has been through more thunderstorms, more A/C breakdowns, people knocking the server by accident and all that. You’ve been noticing that your hardware maintenance plan is costing more every year. How long is the hardware vendor going to stock spare parts for your obsolete office equipment? Please forgive me for playing on your paranoia but the real world can be rude.

Time for an Upgrade

In this scenario you conclude that you are going to have to replace that file server soon. It’s going to be a pain to migrate all the files to a new unit. I am going to have to upgrade to a new version of Windows server. How much is that going to cost? How much has Windows changed? If I am going to have to go to all this trouble, why not get some new improvement out of it? I know I can get bigger disks and more RAM (random access memory) for less money than I paid for the old server. Whoops. Windows is going to cost more. I have to pay a charge for every workstation attached to it. That CAL (Client Access License) price has gone up. I read something about high availability clustering in Windows. Enterprise Server does that. Wow. Look at the price of that! Remember that $12,000 per day of downtime cost overhang? It’s more of an issue now that you are dealing with an old system.

A Debian Cluster Solution

Enough of this already. Since I asked so many questions and raised so many doubts, I owe you, the reader, some answers. Debian Linux provides a very nice high availability solution for file servers. You need two servers with directly attached storage and also a third little server that can be little more than a glorified workstation. You need the 64-bit edition of Debian Squeeze with the Pacemaker, Corosync, DRBD and Samba packages installed on each server. The software is free. You pay for the hardware and a trustworthy Linux consultant who can set everything up for you. What you get is a fully redundant quorum cluster with fully redundant storage, multiple CPU cores on each node, much more RAM than you had before and much more storage capacity.
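The role of the third, tie-breaker node is easier to see with a small sketch of majority voting. The Python below illustrates the general strict-majority rule only; it is not the Corosync or Pacemaker implementation, and has_quorum is a name chosen for this example.

```python
def has_quorum(votes_present, total_votes):
    """Strict majority: more than half of all configured votes must be present."""
    return 2 * votes_present > total_votes


# Three-node cluster (two file servers plus the tie breaker): losing any
# single node still leaves two of three votes, so service can continue.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False

# Two-node cluster: if the nodes lose contact with each other, each side
# sees one of two votes, neither has a majority, and neither can safely
# take over the storage.
print(has_quorum(1, 2))  # False
```

This is why a two-node design needs a third vote: without it, a broken link between the file servers cannot be told apart from a dead peer.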

Here are hardware price estimates:

Tie breaker node: Two hard drives, 512MB RAM	$500.00
Name brand file server node: eight 2TB SATA drives, 24GB RAM, one 4-core CPU, 3-year on-site parts and labor warranty.	$6,000.00
Second file server node like above.	$6,000.00
Misc parts for storage and control networks.	$200.00
Total:	$12,700.00

Each file server node has software RAID 5, so each node holds 14 terabytes of usable disk storage. Because the storage is completely redundant across nodes, total cluster storage capacity is also 14 terabytes. Performance of this unit will be much better than the old unit. It effectively has 4 CPU cores per file storage node and much more RAM for file buffering. Software updates from Debian are free. You just need someone to apply the security patches and version upgrades.
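The 14 terabyte figure follows from how RAID 5 and cross-node mirroring consume raw capacity. The Python sketch below reproduces that arithmetic under simplifying assumptions: filesystem overhead and the difference between decimal and binary terabytes are ignored, and raid5_usable_tb is a helper name made up for the example.

```python
def raid5_usable_tb(drive_count, drive_size_tb):
    """RAID 5 spends one drive's worth of space on parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_tb


DRIVES_PER_NODE = 8
DRIVE_SIZE_TB = 2
FILE_SERVER_NODES = 2

per_node_usable = raid5_usable_tb(DRIVES_PER_NODE, DRIVE_SIZE_TB)
raw_total = FILE_SERVER_NODES * DRIVES_PER_NODE * DRIVE_SIZE_TB
# The second node holds a full copy of the first, so mirroring adds
# redundancy rather than capacity.
cluster_usable = per_node_usable

print(per_node_usable)  # 14
print(raw_total)        # 32
print(cluster_usable)   # 14
```

In other words, 32 terabytes of raw disk yield 14 terabytes of usable space, which is the price of surviving both a drive failure and the loss of a whole node.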

The best feature is complete redundancy for file processing. In our file server example, any one of the nodes can completely fail and file server processing will continue. Based on the lost labor time cost estimates above, this system roughly pays for itself if it eliminates one day of downtime in a five-year period ($12,700 of hardware against about $12,000 of lost wages per day). You also have hardware maintenance savings of whatever the yearly charge is for your old system times 3 years because you get 3 years of warranty coverage on the new hardware. You have the consultant’s charges for converting to the new system, but remember, you were going to have to pay that fee for a new Windows system as well.
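A quick break-even calculation makes the payback claim concrete. This Python sketch uses only the hardware estimates and lost-wage figures given earlier; it leaves out consultant fees and maintenance savings, which the text does not quantify, and the dictionary keys are labels chosen for the example.

```python
HARDWARE_ESTIMATES = {
    "tie breaker node": 500,
    "file server node 1": 6000,
    "file server node 2": 6000,
    "misc network parts": 200,
}
COST_PER_HOUR_DOWN = 1500    # lost wages per hour of downtime
COST_PER_DAY_DOWN = 12000    # lost wages per 8-hour day of downtime

hardware_total = sum(HARDWARE_ESTIMATES.values())
break_even_hours = hardware_total / COST_PER_HOUR_DOWN
break_even_days = hardware_total / COST_PER_DAY_DOWN

print(hardware_total)              # 12700
print(f"{break_even_hours:.2f}")   # 8.47
print(f"{break_even_days:.2f}")    # 1.06
```

On these numbers the hardware is recovered after about eight and a half office hours of avoided downtime, slightly more than one working day.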

Conclusion

I hope I have stirred your interest in Linux Pacemaker based clusters. I have shown a file server upgrade that pays for itself by reducing downtime. You also upgrade your file server’s performance while reducing out of pocket expenses for software and hardware maintenance. Not a bad deal.

May 2010

Please Document the Shop: On the importance of good systems documentation

We have all heard this: You need to document the computer infrastructure. You never know when you might be “hit by a bus”. We hear this and think many frightening things, reassure ourselves that it will never happen and then put the request on the back burner. In this article I will expand on the phrase “hit by a bus” and then look at the consequences.

Things do happen to prevent people from coming into work. The boss calls home. Talks to the wife and makes the sad discovery that Mike won’t be coming in anymore. He passed away last night in bed. People get sudden illnesses that disable them. Car accidents happen.

More often than these tragedies occur, thank goodness, business conditions change without warning. In reorganizations whole departments disappear, computer rooms are consolidated and moved, companies are bought and whole workforces replaced. I have had the unhappy experience of living through some of this.

Some organizations have highly transient workforces because of the environment that they operate in. Companies located near universities benefit from an influx of eager young, upwardly mobile university graduates. These workers are eager to gain experience but soon find higher paying jobs in the “real world” further away from campus. These companies have real turnover problems. People are moving up so quickly, they don’t have time to write things down.

Even when you keep people in place and maintain a fairly stable environment, people discover that what they have documented in their heads can just fade away. This is getting to be more and more of an issue. Networks and servers and other such infrastructure functions have been around for 20 years in many organizations. Fred the maintainer retired five years ago. Fred the maintainer was transferred to sales. The longer systems are around, the more things can happen to Fred. Fred might be right where he was 20 years ago. He just can’t remember what he did.

What does all this mean? What are the consequences of losing organizational knowledge in a computer organization? To be blunt, it creates a hideous environment for your computer people. The system is a black box to them. They are paralyzed. They are rightfully afraid. Every small move they make can bring down the system in ways they cannot predict. Newcomers take much longer to train. Old-timers learn to survive by looking busy while doing nothing. The politics of the shop and the whole company is made bloody by the various interpretations of the folklore of the black box. He/she who waves their arms hardest rules the day. This is no way for your people to live.

This is no way for the computer infrastructure to live either. While the games are played the infrastructure evolves more and more slowly. Before long the infrastructure is frozen. Nobody dares to touch it. The only way to fix it is to completely replace it at considerable expense. In elaborate infrastructures this is easier said than done. The productive lifetime of the platform is shortened. It was not allowed to grow and evolve to lengthen its lifetime. Think of the Hubble Telescope without all the repairs and enhancements over the years. It would have burned up on re-entry long ago.

Having made my case, I ask again: for your own good, please document the shop. Make these documents public and make them accurate. Record what actually is rather than what you wish it to be. It is better to be a little embarrassed for a short while than to be misled later on. Update the documentation when changes occur. An out-of-date document can be as bad as no document at all. Make an effort to record facts. At the same time, don’t leave out the general philosophies that guided the design and other qualitative information, because they help your successors interpret the facts when ambiguities occur.

Think of what you leave behind. Persuade your boss to make this a priority as well. Hopefully the people at your next workplace will do the same.

Nov 2009

Seven Observations On Software Maintenance And FOSS

The November 2009 issue of Communications of the ACM (CACM) has a very interesting article by Paul Stachour and David Collier-Brown entitled “You Don’t Know Jack About Software Maintenance”. The authors argue energetically for using versioned data structures and “continuous upgrading” to improve the state of the art of software maintenance.

The piece got me thinking about FOSS (Free and Open Source Software) and “continuous upgrading”. Here are seven observations on FOSS software maintenance that occurred to me as I reflected on the CACM article:

1. FOSS projects “continuously” apply bug fixes and feature enhancements at no additional cost to their users. By applying these improvements “continuously”, the user reaps a steady stream of “interest payments” providing ever-improving security, performance, and functionality.
2. Since FOSS incurs no licensing or license management costs, upgrading FOSS is not hindered by capital expenses.
3. Typically, support in FOSS projects is focused on the current stable version. Therefore, upgrading to the current stable version is the preferred way to receive the best support from FOSS communities.
4. One of the key reasons behind Debian’s strong track record of “continuous upgrading” is its way of handling the tricky issues involved with dependent library upgrades (such as libc6, libssl.so.0.9.8, etc.). The chapter on Shared Libraries in the Debian Policy Manual details a proven method to effectively handle library upgrade issues (including its sophisticated handling of versions).
5. When upgrading is applied routinely and “continuously”, it becomes crucial to support customizations across upgrades, which can be one of the biggest obstacles to a smooth upgrade (see my earlier post on customization and upgradeability). One reason for Debian’s effectiveness in this regard is its robust configuration file handling policy.
6. It is worth noting that the “continuous” implied here is not the one emphasized in dictionaries (which takes its nuances from the mathematical / physics concept of “no interruptions” and the epsilon-delta definition that students of calculus learn). That concept of “continuous” is impossible in systems administration, which is necessarily discrete, as are all computer operations. The connotation required here is, perhaps, “unending”, or “eternal” or somesuch.
7. The “right” frequency for “continuous” upgrades is a complex tradeoff between business requirements and upgrade infrastructure maturity. Debian and Ubuntu provide very mature support for “continuous upgrading”. They support the upgrade of production servers through release after release after major release with minimal downtime or risk of a glitch that could affect users. Their current release frequency of about 2 years may be the best we can do given the current state of the art of software maintenance. I hope we can learn to increase the frequency as better engineered upgrade policies are developed.

I prefer the name “eternally regenerative software administration” over “continuous upgrading”. It avoids the philosophical problems with the word “continuous” and emphasizes the active, “ecological” approach needed to envision the engineering of “regenerativity” in software. By that I mean software maintenance should involve building the system so each new version enables installation of the next while facilitating management of any customizations and integration with other software (including libraries and other “helper” applications). Regenerativity is the process of growth and change used by Nature itself. Software maintenance needs to follow similar principles.

Oct 2009

Customization, Upgradeability and Eternally Regenerative Software Administration

Mary Hayes Weier wrote an interesting article in this week’s edition of InformationWeek on "Alternative IT: CIOs are more receptive than ever to new software models". What is great about her article is how she captured the divergent views on IT models (such as SaaS, cloud computing, etc.) and gave nice vignettes of different organizations trying different parts of various models. I especially valued her use of cognitive dissonance to leave the reader thinking … better informed but without a firm conclusion.

There are so many parts of the article that I could blog about, but the one that touched the core of my thinking about “eternally regenerative software administration” was the quote by Bill Louv, CIO at GlaxoSmithKline, who said

"And here’s the rub: When you customize software, it’s difficult to implement future upgrades from the vendor"

Louv touched the very bane of eternally regenerative software administration! Software should accommodate both customization and upgradeability. These two elements are at the heart of my notion of eternally regenerative software administration: how to preserve customizations while providing smooth upgrades (near zero downtime with almost no glitches) through major release after major release. It is a big challenge, but in our experience the Free and Open Source Software (FOSS) communities are at the leading edge in finding solutions to these conflicting objectives. Here are some of the innovative ideas from the FOSS world which should serve as models or design patterns for all software developers (if only these ideas would become commonplace!).

First, Debian (a FOSS operating system which is the root of Ubuntu, Knoppix, Xandros and many other Linux distributions) requires that its official packages, each a collection of software prepared for easy administration, adhere to a very mature policy. Debian’s policy is a marvel in the FOSS world and to a very large degree is responsible for its strong support for both customization and upgradeability. I think Debian’s reputation for stability and maintainability is almost certainly due to its decision to develop a consensus-driven policy that its software must implement.

For example, the Debian package maintainer, Luigi Gangitano, for Drupal, a FOSS content management platform, did a great job making the software both customizable and maintainable. The package supports configuration of multiple virtual hosts which can all be upgraded at once! And the Debian drupal6 package stores the look-n-feel in /etc/drupal/6/themes/ so that each site’s GUI can be customized without interfering with upgrades. If only all web applications were built to be as maintainable as Debian’s Drupal package!

Another example is the overlay support included in RT: Request Tracker, a FOSS ticket tracking system. This allows putting replacement subroutines in special files in /usr/local/share/ which overlay or substitute the upstream code. This approach is more likely to break on upgrades, but it supports minimal changes to the business logic with a decent chance that upgrades will be smooth.

There are countless more examples from the FOSS world of innovative solutions that reconcile customization with upgrades in support of eternally regenerative software administration. What are some of your favorite examples?
