<?xml version="1.0" encoding="utf-8"?>
<rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
<channel>
<title>Petter Reinholdtsen - Entries tagged sysadmin</title>
<description>Entries tagged sysadmin</description>
<link>http://people.skolelinux.org/pere/blog/</link>


<item>
<title>Some notes on fault tolerant storage systems</title>
<link>http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Some_notes_on_fault_tolerant_storage_systems.html</guid>
<pubDate>Wed, 1 Nov 2017 15:35:00 +0100</pubDate>
<description>&lt;p&gt;If you care about how fault tolerant your storage is, you might
find these articles and papers interesting. They have shaped how I
think when designing a storage system.&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;USENIX ;login: &lt;a
href=&quot;https://www.usenix.org/publications/login/summer2017/ganesan&quot;&gt;Redundancy
Does Not Imply Fault Tolerance. Analysis of Distributed Storage
Reactions to Single Errors and Corruptions&lt;/a&gt; by Aishwarya Ganesan,
Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, and Remzi
H. Arpaci-Dusseau&lt;/li&gt;

&lt;li&gt;ZDNet
&lt;a href=&quot;http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/&quot;&gt;Why
RAID 5 stops working in 2009&lt;/a&gt; by Robin Harris&lt;/li&gt;

&lt;li&gt;ZDNet
&lt;a href=&quot;http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/&quot;&gt;Why
RAID 6 stops working in 2019&lt;/a&gt; by Robin Harris&lt;/li&gt;

&lt;li&gt;USENIX FAST&#39;07
&lt;a href=&quot;http://research.google.com/archive/disk_failures.pdf&quot;&gt;Failure
Trends in a Large Disk Drive Population&lt;/a&gt; by Eduardo Pinheiro,
Wolf-Dietrich Weber and Luiz André Barroso&lt;/li&gt;

&lt;li&gt;USENIX ;login: &lt;a
href=&quot;https://www.usenix.org/system/files/login/articles/hughes12-04.pdf&quot;&gt;Data
Integrity. Finding Truth in a World of Guesses and Lies&lt;/a&gt; by Doug
Hughes&lt;/li&gt;

&lt;li&gt;USENIX FAST&#39;08
&lt;a href=&quot;https://www.usenix.org/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram_html/&quot;&gt;An
Analysis of Data Corruption in the Storage Stack&lt;/a&gt; by
L. N. Bairavasundaram, G. R. Goodson, B. Schroeder, A. C.
Arpaci-Dusseau, and R. H. Arpaci-Dusseau&lt;/li&gt;

&lt;li&gt;USENIX FAST&#39;07 &lt;a
href=&quot;https://www.usenix.org/legacy/events/fast07/tech/schroeder/schroeder_html/&quot;&gt;Disk
failures in the real world: what does an MTTF of 1,000,000 hours mean
to you?&lt;/a&gt; by B. Schroeder and G. A. Gibson.&lt;/li&gt;

&lt;li&gt;USENIX FAST&#39;08 &lt;a
href=&quot;https://www.usenix.org/events/fast08/tech/full_papers/jiang/jiang_html/&quot;&gt;Are
Disks the Dominant Contributor for Storage Failures? A Comprehensive
Study of Storage Subsystem Failure Characteristics&lt;/a&gt; by Weihang
Jiang, Chongfeng Hu, Yuanyuan Zhou, and Arkady Kanevsky&lt;/li&gt;

&lt;li&gt;SIGMETRICS 2007
&lt;a href=&quot;http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf&quot;&gt;An
analysis of latent sector errors in disk drives&lt;/a&gt; by
L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, and J. Schindler&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Several of these research papers are based on data collected from
hundreds of thousands or millions of disks, and their findings are
eye-opening. The short story is: do not implicitly trust RAID or
redundant storage systems. Details matter. And unfortunately there
are few options on Linux addressing all the identified issues. Both
ZFS and Btrfs are doing a fairly good job, but have legal and
practical issues of their own. I wonder how cluster file systems like
Ceph do in this regard. After all, there is an old saying: you know
you have a distributed system when the crash of a computer you have
never heard of stops you from getting any work done. The same holds
true if fault tolerance does not work.&lt;/p&gt;

&lt;p&gt;Just remember, in the end it does not matter how redundant or how
fault tolerant your storage is, if you do not continuously monitor its
status to detect and replace failed disks.&lt;/p&gt;
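
&lt;p&gt;To illustrate the kind of monitoring I have in mind, here is a
minimal Python sketch that reports degraded Linux software RAID (md)
arrays by looking for missing devices in the status markers in
/proc/mdstat. It is only an example of the idea, and it assumes the
md driver; other redundancy layers need their own checks:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
#!/usr/bin/env python
# Sketch: report Linux software RAID (md) arrays with missing
# devices, by looking for a &#39;_&#39; in the [UU...] status marker in
# /proc/mdstat. Exits non-zero if any array is degraded.
import re
import sys

def degraded_arrays():
    degraded = []
    array = None
    for line in open(&#39;/proc/mdstat&#39;):
        match = re.match(r&#39;^(md\S+)\s*:&#39;, line)
        if match:
            array = match.group(1)
        elif array and re.search(r&#39;\[[U_]*_[U_]*\]&#39;, line):
            degraded.append(array)
    return degraded

failed = degraded_arrays()
for array in failed:
    print &#39;warning: %s is degraded&#39; % array
sys.exit(1 if failed else 0)
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;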

&lt;p&gt;As usual, if you use Bitcoin and want to show your support of my
activities, please send Bitcoin donations to my address
&lt;b&gt;&lt;a href=&quot;bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&amp;label=PetterReinholdtsenBlog&quot;&gt;15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b&lt;/a&gt;&lt;/b&gt;.&lt;/p&gt;
</description>
</item>

<item>
<title>Detecting NFS hangs on Linux without hanging yourself...</title>
<link>http://people.skolelinux.org/pere/blog/Detecting_NFS_hangs_on_Linux_without_hanging_yourself___.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Detecting_NFS_hangs_on_Linux_without_hanging_yourself___.html</guid>
<pubDate>Thu, 9 Mar 2017 15:20:00 +0100</pubDate>
<description>&lt;p&gt;Over the years of administrating thousands of NFS-mounting Linux
computers at a time, I have often needed a way to detect if a machine
was experiencing an NFS hang. If you try to use &lt;tt&gt;df&lt;/tt&gt; or look at a
file or directory affected by the hang, the process (and possibly the
shell) will hang too. So you want to be able to detect this without
risking that the detection process gets stuck too. It has not been
obvious how to do this. When the hang has lasted a while, it is
possible to find messages like these in dmesg:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;
nfs: server nfsserver not responding, still trying
&lt;br&gt;nfs: server nfsserver OK
&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;It is hard to know if the hang is still going on, and it is hard to
be sure looking in dmesg is going to work. If there are lots of other
messages in dmesg, the lines might have rotated out of sight before
they are noticed.&lt;/p&gt;

&lt;p&gt;While reading through the NFS client implementation in the Linux
kernel code, I came across some statistics that seem to offer a way to
detect it. The om_timeouts SUNRPC value in the kernel will increase
every time the above log entry is inserted into dmesg. And after
digging a bit further, I discovered that this value shows up in
/proc/self/mountstats on Linux.&lt;/p&gt;

&lt;p&gt;The mountstats content seems to be shared between files using the
same file system context, so it is enough to check one of the
mountstats files to get the state of the mount point for the machine.
I assume this will not show lazily umounted NFS points, nor NFS mount
points in a different process context (i.e. with a different file
system view), but that does not worry me.&lt;/p&gt;

&lt;p&gt;The content for an NFS mount point looks similar to this:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
[...]
device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
opts: rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
age: 7863311
caps: caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0
bytes: 166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
READLINK: 125 125 0 20472 18620 0 1112 1118
READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
MKDIR: 3680 3680 0 773980 993920 26 23990 24245
SYMLINK: 903 903 0 233428 245488 6 5865 5917
MKNOD: 80 80 0 20148 21760 0 299 304
REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
RMDIR: 3367 3367 0 645112 484848 22 5782 6002
RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
FSINFO: 2 2 0 232 328 0 1 1
PATHCONF: 1 1 0 116 140 0 0 0
COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
[...]
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;The key number to look at is the third number in the per-op list.
It is the number of NFS timeouts experienced per file system
operation, here 22 write timeouts and 5 access timeouts. If these
numbers are increasing, I believe the machine is experiencing an NFS
hang. Unfortunately the timeout value does not start to increase
right away. The NFS operations need to time out first, and this can
take a while. The exact timeout value depends on the setup. For
example, the defaults for TCP and UDP mount points are quite
different, and the timeout value is affected by the soft, hard, timeo
and retrans NFS mount options.&lt;/p&gt;
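
&lt;p&gt;To make the idea concrete, here is a minimal Python sketch that
sums the per-op timeout counters for each NFS mount point in
/proc/self/mountstats. It is only an illustration of the parsing,
with the field layout assumed from the example output above; run it
periodically and treat a growing count as a sign of an NFS hang:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
#!/usr/bin/env python
# Sketch: sum the per-op timeout counters (the third number in each
# per-op line) for every NFS mount point in /proc/self/mountstats.
def nfs_timeouts():
    timeouts = {}
    mountpoint = None
    in_perop = False
    for line in open(&#39;/proc/self/mountstats&#39;):
        words = line.split()
        if words[:1] == [&#39;device&#39;]:
            # device SERVER:/path mounted on /path with fstype nfs ...
            if len(words) &gt; 7 and words[7].startswith(&#39;nfs&#39;):
                mountpoint = words[4]
            else:
                mountpoint = None
            in_perop = False
        elif words[:2] == [&#39;per-op&#39;, &#39;statistics&#39;]:
            in_perop = True
        elif mountpoint and in_perop and len(words) &gt; 3:
            # per-op lines look like &quot;WRITE: ops trans timeouts ...&quot;
            timeouts[mountpoint] = timeouts.get(mountpoint, 0) \
                + int(words[3])
    return timeouts

for mountpoint, count in nfs_timeouts().items():
    print &#39;%s %d&#39; % (mountpoint, count)
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;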

&lt;p&gt;The only way I have found to get the timeout count on Debian and
Red Hat Enterprise Linux is to peek in /proc/. But according to the
&lt;a href=&quot;http://docs.oracle.com/cd/E19253-01/816-4555/netmonitor-12/index.html&quot;&gt;Solaris
10 System Administration Guide: Network Services&lt;/a&gt;, the &#39;nfsstat -c&#39;
command can be used to get these timeout values there. But this does
not work on Linux, as far as I can tell. I
&lt;a href=&quot;http://bugs.debian.org/857043&quot;&gt;asked Debian about this&lt;/a&gt;,
but have not seen any replies yet.&lt;/p&gt;

&lt;p&gt;Is there a better way to figure out if a Linux NFS client is
experiencing NFS hangs? Is there a way to detect which processes are
affected? Is there a way to get the NFS mount going again quickly once
the network problem causing the NFS hang has been cleared? I would
very much welcome some clues, as we regularly run into NFS hangs.&lt;/p&gt;
</description>
</item>

<item>
<title>Debian Jessie, PXE and automatic firmware installation</title>
<link>http://people.skolelinux.org/pere/blog/Debian_Jessie__PXE_and_automatic_firmware_installation.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Debian_Jessie__PXE_and_automatic_firmware_installation.html</guid>
<pubDate>Fri, 17 Oct 2014 14:10:00 +0200</pubDate>
<description>&lt;p&gt;When PXE installing laptops with Debian, I often run into the
problem that the WiFi card requires some firmware to work properly,
and it has been a pain to fix this using preseeding in Debian, as
normally something more is needed. But thanks to
&lt;a href=&quot;https://packages.qa.debian.org/i/isenkram.html&quot;&gt;my isenkram
package&lt;/a&gt; and its recent tasksel extension, it has now become easy
to do this using simple preseeding.&lt;/p&gt;

&lt;p&gt;The isenkram-cli package provides tasksel tasks which will install
firmware for the hardware found in the machine (actually, requested by
the kernel modules for the hardware). (It can also install user space
programs supporting the detected hardware, but that is not the focus
of this story.)&lt;/p&gt;

&lt;p&gt;To get this working in the default installation, two preseeding
values are needed. First, the isenkram-cli package must be installed
into the target chroot (aka the hard drive) before tasksel is executed
in the pkgsel step of the debian-installer system. This is done by
preseeding the base-installer/includes debconf value to include the
isenkram-cli package. The package name is then passed to debootstrap
for installation. With the isenkram-cli package in place, tasksel
will automatically use the isenkram tasks to detect hardware specific
packages for the machine being installed and install them, because
isenkram-cli contains tasksel tasks.&lt;/p&gt;

&lt;p&gt;Second, one needs to enable the non-free APT repository, because
most firmware unfortunately is non-free. This is done by preseeding
the apt-mirror-setup step. This is unfortunate, but for a lot of
hardware it is the only option in Debian.&lt;/p&gt;

&lt;p&gt;The end result is two lines needed in your preseeding file to get
firmware installed automatically by the installer:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
base-installer base-installer/includes string isenkram-cli
apt-mirror-setup apt-setup/non-free boolean true
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;The current version of isenkram-cli in testing/jessie will install
both firmware and user space packages when using this method. It also
does not work well, so use version 0.15 or later. Installing both
firmware and user space packages might give you a bit more than you
want, so I decided to split the tasksel task into two, one for
firmware and one for user space programs. The firmware task is
enabled by default, while the one for user space programs is not.
This split is implemented in the package currently in unstable.&lt;/p&gt;

&lt;p&gt;If you decide to give this a go, please let me know (via email) how
this recipe works for you. :)&lt;/p&gt;

&lt;p&gt;So, I bet you are wondering how this can work. First and
foremost, it works because tasksel is modular, and driven by whatever
files it finds in /usr/lib/tasksel/ and /usr/share/tasksel/. So the
isenkram-cli package places two files for tasksel to find. First there
is the task description file (/usr/share/tasksel/descs/isenkram.desc):&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
Task: isenkram-packages
Section: hardware
Description: Hardware specific packages (autodetected by isenkram)
 Based on the detected hardware various hardware specific packages are
 proposed.
Test-new-install: show show
Relevance: 8
Packages: for-current-hardware

Task: isenkram-firmware
Section: hardware
Description: Hardware specific firmware packages (autodetected by isenkram)
 Based on the detected hardware various hardware specific firmware
 packages are proposed.
Test-new-install: mark show
Relevance: 8
Packages: for-current-hardware-firmware
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;The key parts are the Test-new-install line, which indicates how
the task should be handled, and the Packages line referencing a script
in /usr/lib/tasksel/packages/. The scripts use other scripts to get a
list of packages to install. The for-current-hardware-firmware script
looks like this to list relevant firmware for the machine:&lt;/p&gt;

&lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
#!/bin/sh
#
# Make sure isenkram-autoinstall-firmware is found, as /usr/sbin
# might not be in the default PATH.
PATH=/usr/sbin:$PATH
export PATH
isenkram-autoinstall-firmware -l
&lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;

&lt;p&gt;With those two pieces in place, the firmware is installed by
tasksel during the normal d-i run. :)&lt;/p&gt;

&lt;p&gt;If you want to test what tasksel will install when isenkram-cli is
installed, run &lt;tt&gt;DEBIAN_PRIORITY=critical tasksel --test
--new-install&lt;/tt&gt; to get the list of packages that tasksel would
install.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://wiki.debian.org/DebianEdu/&quot;&gt;Debian Edu&lt;/a&gt; will be
piloting this feature, as isenkram is used there now to install
firmware, replacing the earlier scripts.&lt;/p&gt;
</description>
</item>

<item>
<title>Scripting the Cerebrum/bofhd user administration system using XML-RPC</title>
<link>http://people.skolelinux.org/pere/blog/Scripting_the_Cerebrum_bofhd_user_administration_system_using_XML_RPC.html</link>
<guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Scripting_the_Cerebrum_bofhd_user_administration_system_using_XML_RPC.html</guid>
<pubDate>Thu, 6 Dec 2012 10:30:00 +0100</pubDate>
<description>&lt;p&gt;Where I work at the &lt;a href=&quot;http://www.uio.no/&quot;&gt;University of
Oslo&lt;/a&gt;, we use the
&lt;a href=&quot;http://sourceforge.net/projects/cerebrum/&quot;&gt;Cerebrum user
administration system&lt;/a&gt; to maintain users, groups, DNS, DHCP, etc.
I&#39;ve known since the system was written that the server provides
an &lt;a href=&quot;http://en.wikipedia.org/wiki/XML-RPC&quot;&gt;XML-RPC&lt;/a&gt; API, but
I had never spent time trying to figure out how to use it, as we
always use the bofh command line client at work. Until today. I want
to script the updating of DNS and DHCP to make it easier to set up
virtual machines. Here are a few notes on how to use it with
Python.&lt;/p&gt;

&lt;p&gt;I started by looking at the source of the Java
&lt;a href=&quot;http://cerebrum.svn.sourceforge.net/viewvc/cerebrum/trunk/cerebrum/clients/jbofh/&quot;&gt;bofh
client&lt;/a&gt; to figure out how it connected to the API server. I also
googled for Python examples on how to use XML-RPC, and found
&lt;a href=&quot;http://tldp.org/HOWTO/XML-RPC-HOWTO/xmlrpc-howto-python.html&quot;&gt;a
simple example&lt;/a&gt; in the XML-RPC HOWTO.&lt;/p&gt;

&lt;p&gt;This simple example code shows how to connect, get the list of
commands (as a JSON dump), and get the information about the
user currently logged in:&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;
#!/usr/bin/env python
import getpass
import xmlrpclib

server_url = &#39;https://cerebrum-uio.uio.no:8000&#39;
username = getpass.getuser()
password = getpass.getpass()

# Connect and authenticate to get a session ID for the other calls.
server = xmlrpclib.Server(server_url)
sessionid = server.login(username, password)

# Uncomment to dump the list of available commands (as JSON).
#print server.get_commands(sessionid)

print server.run_command(sessionid, &quot;user_info&quot;, username)
result = server.logout(sessionid)
print result
&lt;/pre&gt;&lt;/blockquote&gt;
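
&lt;p&gt;Building on this, a small helper class to keep the session handling
in one place might look like the following sketch. It only uses the
login, run_command and logout methods shown above, so it should behave
the same way:&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;
import getpass
import xmlrpclib

class BofhdSession:
    # Thin convenience wrapper around the XML-RPC session calls.
    def __init__(self, url, username, password):
        self.server = xmlrpclib.Server(url)
        self.sessionid = self.server.login(username, password)
    def run(self, command, *args):
        return self.server.run_command(self.sessionid, command, *args)
    def close(self):
        self.server.logout(self.sessionid)

session = BofhdSession(&#39;https://cerebrum-uio.uio.no:8000&#39;,
                       getpass.getuser(), getpass.getpass())
print session.run(&#39;user_info&#39;, getpass.getuser())
session.close()
&lt;/pre&gt;&lt;/blockquote&gt;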

&lt;p&gt;Armed with this knowledge I can now move forward and script the DNS
and DHCP updates I wanted to do.&lt;/p&gt;
</description>
</item>

</channel>
</rss>