Petter Reinholdtsen <pere@td.org.uit.no>, 2000-04-17

Administrating the CIIPS Network - the easy way

Some suggestions to make it easier to maintain and improve the CIIPS network.
To maintain a multi-user network, it is vital to reduce the possibility of human error as much as possible. It is also necessary to make sure that any error that does occur has as little impact as possible.

To achieve this, I believe the following must be done:

Keep things consistent.
Make sure administrative knowledge is generic and not tied to individual computers. Make sure every supported program is available on all platforms and operating systems, with the same version and the same configuration. (cfengine, Store)
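As a minimal sketch of what checking for consistency could look like, the function below compares a per-host package inventory and reports every package whose version differs between hosts or which is missing on some host. The inventory format, the host names and the version numbers are assumptions made for illustration. This is not how cfengine or Store work internally.

```python
from collections import defaultdict


def find_version_drift(inventory):
    """Return {package: {version: [hosts]}} for every package that is not
    installed with one and the same version on all hosts."""
    versions = defaultdict(lambda: defaultdict(list))
    for host, packages in inventory.items():
        for package, version in packages.items():
            versions[package][version].append(host)

    all_hosts = set(inventory)
    drift = {}
    for package, by_version in versions.items():
        installed_on = {h for hosts in by_version.values() for h in hosts}
        missing = sorted(all_hosts - installed_on)
        if len(by_version) > 1 or missing:
            report = {v: sorted(hosts) for v, hosts in by_version.items()}
            if missing:
                report["(missing)"] = missing
            drift[package] = report
    return drift


# Hypothetical inventory; in practice it would be collected from the hosts.
inventory = {
    "hostA": {"emacs": "20.5", "gcc": "2.95.2"},
    "hostB": {"emacs": "20.5", "gcc": "2.91.66"},
    "hostC": {"emacs": "20.5"},
}
```

With this inventory, find_version_drift(inventory) reports only gcc, since emacs has the same version everywhere.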
Automate as much as possible.
By leaving the repetitive tasks to the computer, one ensures they are done the same way everywhere, every time. (cfengine)
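One way to make an automated task safe to repeat is to describe the desired end state instead of the steps to get there. The sketch below is an assumption-laden illustration of that idea, not cfengine syntax: the hypothetical helper ensure_line makes sure a line is present in a file and does nothing if it is already there, so it can be run on every host as often as needed with the same result.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` occurs in the file at `path`.

    Returns True if the file had to be changed, False if it was already
    in the desired state. Running it twice gives the same file as running
    it once.
    """
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

For example, ensure_line("/tmp/example.conf", "option = yes") changes the file the first time and is a no-op on every later run. The file name and the line are made up.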
Make it possible to back out changes.
When problems arise, make sure it is easy to get back to an earlier state when things were working. Use version control systems to keep track of all human-edited configuration files, and make sure it is easy to remove a newly upgraded software package if problems are discovered. (CVS, Store)
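The principle can be illustrated with a much simpler mechanism than CVS: never overwrite a configuration file without first saving the working copy. The sketch below is only that, a sketch. The function names and the ".orig" suffix are assumptions, and it keeps a single earlier state, where a version control system keeps the full history.

```python
import shutil
from pathlib import Path


def replace_with_backup(path, new_text):
    """Save the current file as <name>.orig, then write the new content.

    Returns the path of the backup copy.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".orig")
    shutil.copy2(path, backup)  # keeps timestamps and permissions
    path.write_text(new_text)
    return backup


def back_out(path, backup):
    """Restore the file from the backup made by replace_with_backup."""
    shutil.copy2(backup, path)
```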
Detect problems as early as possible.
If potential problems are detected and fixed before they become real problems, far less work is required than when cleaning up the domino effect that follows when one system fails. (Example: a full disk can stop the backup system from working properly, or a failing NFS server can hang processes on other servers and eventually bring them to their knees.)
Make sure to monitor all hosts and services, and warn when something is about to go wrong. Keep statistics to find repeating problems. (Palantir, mon)
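The full-disk example lends itself to a small sketch of early warning: check how full each filesystem is and complain before it reaches 100%. The 90% threshold and the paths are assumptions to be tuned locally. A real setup would run such a check from a monitoring system such as mon. The sketch does not use mon's own interface.

```python
import shutil


def disk_warnings(paths, warn_at=0.90, usage=shutil.disk_usage):
    """Return one warning string for every path whose filesystem is at
    least `warn_at` (a fraction between 0 and 1) full.

    `usage` can be replaced by another function for testing; it must
    return (total, used, free) in bytes like shutil.disk_usage does.
    """
    warnings = []
    for path in paths:
        total, used, _free = usage(path)
        if total == 0:
            continue  # pseudo filesystem without real storage
        fraction = used / total
        if fraction >= warn_at:
            warnings.append(f"{path}: {fraction:.0%} full")
    return warnings
```

Run regularly, for example as disk_warnings(["/", "/home"]), and with the results stored over time, this also gives the statistics needed to spot a disk that fills up again and again.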
Keep the users informed.
Make sure the users know where to find information on current problems, and the status of their reported problems. This is most easily solved with a database-backed, web-based problem tracking system. (Bugzilla)
Prepare for disaster.
Some suggestions:

Tools

I've already installed Store in the robotics lab, and it will take about 15 minutes to set it up on other hosts as well. With proper setup, this will make most hosts less dependent on the NFS servers.