1 <?xml version="1.0" encoding="utf-8"?>
2 <rss version='2.0' xmlns:lj='http://www.livejournal.org/rss/lj/1.0/'>
3 <channel>
4 <title>Petter Reinholdtsen - Entries tagged sysadmin</title>
5 <description>Entries tagged sysadmin</description>
6 <link>http://people.skolelinux.org/pere/blog/</link>
7
8
9 <item>
10 <title>Detecting NFS hangs on Linux without hanging yourself...</title>
11 <link>http://people.skolelinux.org/pere/blog/Detecting_NFS_hangs_on_Linux_without_hanging_yourself___.html</link>
12 <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Detecting_NFS_hangs_on_Linux_without_hanging_yourself___.html</guid>
13 <pubDate>Thu, 9 Mar 2017 15:20:00 +0100</pubDate>
14 <description>&lt;p&gt;Over the years, administrating thousands of NFS-mounting Linux
15 computers at a time, I often needed a way to detect if a machine
16 was experiencing an NFS hang. If you try to use &lt;tt&gt;df&lt;/tt&gt; or look at a
17 file or directory affected by the hang, the process (and possibly the
18 shell) will hang too. So you want to be able to detect this without
19 risking the detection process getting stuck too. It has not been
20 obvious how to do this. When the hang has lasted a while, it is
21 possible to find messages like these in dmesg:&lt;/p&gt;
22
23 &lt;p&gt;&lt;blockquote&gt;
24 nfs: server nfsserver not responding, still trying
25 &lt;br&gt;nfs: server nfsserver OK
26 &lt;/blockquote&gt;&lt;/p&gt;
27
28 &lt;p&gt;It is hard to know if the hang is still going on, and it is hard to
29 be sure looking in dmesg is going to work. If there are lots of other
30 messages in dmesg, the lines might have rotated out of sight before they
31 are noticed.&lt;/p&gt;
32
33 &lt;p&gt;While reading through the NFS client implementation in the Linux kernel
34 code, I came across some statistics that seem to give a way to detect
35 it. The om_timeouts sunrpc value in the kernel will increase every
36 time the above log entry is inserted into dmesg. And after digging a
37 bit further, I discovered that this value shows up in
38 /proc/self/mountstats on Linux.&lt;/p&gt;
39
40 &lt;p&gt;The mountstats content seems to be shared between files using the
41 same file system context, so it is enough to check one of the
42 mountstats files to get the state of the mount points for the machine.
43 I assume this will not show lazily unmounted NFS mount points, nor NFS mount
44 points in a different process context (i.e. with a different filesystem
45 view), but that does not worry me.&lt;/p&gt;
46
47 &lt;p&gt;The content for an NFS mount point looks similar to this:&lt;/p&gt;
48
49 &lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
50 [...]
51 device /dev/mapper/Debian-var mounted on /var with fstype ext3
52 device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
53 opts: rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
54 age: 7863311
55 caps: caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
56 sec: flavor=1,pseudoflavor=1
57 events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0
58 bytes: 166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809
59 RPC iostats version: 1.0 p/v: 100003/3 (nfs)
60 xprt: tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
61 per-op statistics
62 NULL: 0 0 0 0 0 0 0 0
63 GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
64 SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
65 LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
66 ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
67 READLINK: 125 125 0 20472 18620 0 1112 1118
68 READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
69 WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
70 CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
71 MKDIR: 3680 3680 0 773980 993920 26 23990 24245
72 SYMLINK: 903 903 0 233428 245488 6 5865 5917
73 MKNOD: 80 80 0 20148 21760 0 299 304
74 REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
75 RMDIR: 3367 3367 0 645112 484848 22 5782 6002
76 RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
77 LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
78 READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
79 READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
80 FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
81 FSINFO: 2 2 0 232 328 0 1 1
82 PATHCONF: 1 1 0 116 140 0 0 0
83 COMMIT: 0 0 0 0 0 0 0 0
84
85 device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
86 [...]
87 &lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
88
89 &lt;p&gt;The key number to look at is the third number in the per-op list.
90 It is the number of NFS timeouts experienced per file system
91 operation. Here there are 22 write timeouts and 5 access timeouts. If these
92 numbers are increasing, I believe the machine is experiencing an NFS
93 hang. Unfortunately the timeout count does not start to increase right
94 away. The NFS operations need to time out first, and this can take a
95 while. The exact timeout value depends on the setup. For example, the
96 defaults for TCP and UDP mount points are quite different, and the
97 timeout value is affected by the soft, hard, timeo and retrans NFS
98 mount options.&lt;/p&gt;
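
&lt;p&gt;The check described above can be sketched in Python. This is a minimal
sketch, not a finished monitoring tool: it assumes the mountstats layout
shown in the example, and the function names (nfs_timeouts, increased,
check_once) are made up for illustration. It only reads the mountstats
file and never touches the NFS mount point itself.&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;
```python
import re
import time

# Header line of each mountstats entry, for example
# "device host:/export mounted on /mnt with fstype nfs statvers=1.1"
DEVICE_RE = re.compile(r"^device (\S+) mounted on (\S+) with fstype (\S+)")
# Per-op line, for example "WRITE: 8479010 8494376 22 ..."; the third
# number is the timeout count discussed above.
OP_RE = re.compile(r"^\s*([A-Z_0-9]+): (\d+) (\d+) (\d+)\b")


def nfs_timeouts(mountstats_text):
    """Return {mount point: {operation: timeout count}} for NFS mounts."""
    result = {}
    ops = None          # per-op dict of the NFS entry being read, if any
    in_per_op = False   # True once "per-op statistics" has been seen
    for line in mountstats_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            in_per_op = False
            ops = None
            if dev.group(3) in ("nfs", "nfs4"):
                ops = result.setdefault(dev.group(2), {})
            continue
        if ops is None:
            continue    # inside an entry that is not NFS
        if line.strip() == "per-op statistics":
            in_per_op = True
            continue
        op = OP_RE.match(line) if in_per_op else None
        if op:
            ops[op.group(1)] = int(op.group(4))
    return result


def increased(before, after):
    """Return [(mount point, operation, delta)] where timeouts grew."""
    changes = []
    for mount, ops in sorted(after.items()):
        old = before.get(mount, {})
        for name, count in sorted(ops.items()):
            delta = max(0, count - old.get(name, 0))
            if delta:
                changes.append((mount, name, delta))
    return changes


def check_once(path="/proc/self/mountstats", interval=60):
    """Take two snapshots 'interval' seconds apart and compare them."""
    with open(path) as f:
        first = nfs_timeouts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = nfs_timeouts(f.read())
    return increased(first, second)
```
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;p&gt;Calling check_once() returns an empty list when no timeout counter
grew during the interval. As noted above, the counters only move after
the NFS operations have timed out, so a short interval can miss a hang
that has just started.&lt;/p&gt;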
99
100 &lt;p&gt;The only way I have found to get the timeout count on Debian and RedHat
101 Enterprise Linux is to peek in /proc/.
102 According to the
103 &lt;a href=&quot;http://docs.oracle.com/cd/E19253-01/816-4555/netmonitor-12/index.html&quot;&gt;Solaris
104 10 System Administration Guide: Network Services&lt;/a&gt;, the &#39;nfsstat -c&#39;
105 command can be used to get these timeout values, but this does not work
106 on Linux, as far as I can tell. I
107 &lt;a href=&quot;http://bugs.debian.org/857043&quot;&gt;asked Debian about this&lt;/a&gt;,
108 but have not seen any replies yet.&lt;/p&gt;
109
110 &lt;p&gt;Is there a better way to figure out if a Linux NFS client is
111 experiencing NFS hangs? Is there a way to detect which processes are
112 affected? Is there a way to get the NFS mount going quickly once the
113 network problem causing the NFS hang has been cleared? I would very
114 much welcome some clues, as we regularly run into NFS hangs.&lt;/p&gt;
115 </description>
116 </item>
117
118 <item>
119 <title>Debian Jessie, PXE and automatic firmware installation</title>
120 <link>http://people.skolelinux.org/pere/blog/Debian_Jessie__PXE_and_automatic_firmware_installation.html</link>
121 <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Debian_Jessie__PXE_and_automatic_firmware_installation.html</guid>
122 <pubDate>Fri, 17 Oct 2014 14:10:00 +0200</pubDate>
123 <description>&lt;p&gt;When PXE installing laptops with Debian, I often run into the
124 problem that the WiFi card requires some firmware to work properly.
125 And it has been a pain to fix this using preseeding in Debian.
126 Normally something more is needed. But thanks to
127 &lt;a href=&quot;https://packages.qa.debian.org/i/isenkram.html&quot;&gt;my isenkram
128 package&lt;/a&gt; and its recent tasksel extension, it has now become easy
129 to do this using simple preseeding.&lt;/p&gt;
130
131 &lt;p&gt;The isenkram-cli package provides tasksel tasks which will install
132 firmware for the hardware found in the machine (actually, requested by
133 the kernel modules for the hardware). (It can also install user space
134 programs supporting the hardware detected, but that is not the focus
135 of this story.)&lt;/p&gt;
136
137 &lt;p&gt;To get this working in the default installation, two preseeding
138 values are needed. First, the isenkram-cli package must be installed
139 into the target chroot (aka the hard drive) before tasksel is executed
140 in the pkgsel step of the debian-installer system. This is done by
141 preseeding the base-installer/includes debconf value to include the
142 isenkram-cli package. The package name is then passed to debootstrap
143 for installation. With the isenkram-cli package in place, tasksel
144 will automatically use the isenkram tasks to detect hardware specific
145 packages for the machine being installed and install them, because
146 isenkram-cli contains tasksel tasks.&lt;/p&gt;
147
148 &lt;p&gt;Second, one needs to enable the non-free APT repository, because
149 most firmware unfortunately is non-free. This is done by preseeding
150 the apt-mirror-setup step. This is unfortunate, but for a lot of
151 hardware it is the only option in Debian.&lt;/p&gt;
152
153 &lt;p&gt;The end result is two lines needed in your preseeding file to get
154 firmware installed automatically by the installer:&lt;/p&gt;
155
156 &lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
157 base-installer base-installer/includes string isenkram-cli
158 apt-mirror-setup apt-setup/non-free boolean true
159 &lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
160
161 &lt;p&gt;The current version of isenkram-cli in testing/jessie will install
162 both firmware and user space packages when using this method. It also
163 does not work well, so use version 0.15 or later. Installing both
164 firmware and user space packages might give you a bit more than you
165 want, so I decided to split the tasksel task in two, one for firmware
166 and one for user space programs. The firmware task is enabled by
167 default, while the one for user space programs is not. This split is
168 implemented in the package currently in unstable.&lt;/p&gt;
169
170 &lt;p&gt;If you decide to give this a go, please let me know (via email) how
171 this recipe works for you. :)&lt;/p&gt;
172
173 &lt;p&gt;So, I bet you are wondering, how can this work? First and
174 foremost, it works because tasksel is modular, and driven by whatever
175 files it finds in /usr/lib/tasksel/ and /usr/share/tasksel/. So the
176 isenkram-cli package places two files for tasksel to find. First there
177 is the task description file (/usr/share/tasksel/descs/isenkram.desc):&lt;/p&gt;
178
179 &lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
180 Task: isenkram-packages
181 Section: hardware
182 Description: Hardware specific packages (autodetected by isenkram)
183 Based on the detected hardware various hardware specific packages are
184 proposed.
185 Test-new-install: show show
186 Relevance: 8
187 Packages: for-current-hardware
188
189 Task: isenkram-firmware
190 Section: hardware
191 Description: Hardware specific firmware packages (autodetected by isenkram)
192 Based on the detected hardware various hardware specific firmware
193 packages are proposed.
194 Test-new-install: mark show
195 Relevance: 8
196 Packages: for-current-hardware-firmware
197 &lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
198
199 &lt;p&gt;The key parts are Test-new-install, which indicates how the task
200 should be handled, and the Packages line, which refers to a script in
201 /usr/lib/tasksel/packages/. The scripts use other scripts to get a
202 list of packages to install. The for-current-hardware-firmware script
203 looks like this to list relevant firmware for the machine:&lt;/p&gt;
204
205 &lt;p&gt;&lt;blockquote&gt;&lt;pre&gt;
206 #!/bin/sh
207 #
208 PATH=/usr/sbin:$PATH
209 export PATH
210 isenkram-autoinstall-firmware -l
211 &lt;/pre&gt;&lt;/blockquote&gt;&lt;/p&gt;
212
213 &lt;p&gt;With those two pieces in place, the firmware is installed by
214 tasksel during the normal d-i run. :)&lt;/p&gt;
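
&lt;p&gt;To inspect which tasks a description file declares, the stanza layout
shown earlier can be read with a few lines of Python. This is a rough
sketch and not the parser tasksel itself uses. The function names are
made up for illustration, and treating a leading "mark" in
Test-new-install as "enabled by default" is an assumption based on the
two tasks shown in this post.&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;
```python
def parse_tasks(desc_text):
    """Split a tasksel .desc file into one {field: value} dict per task."""
    tasks = []
    current = None      # dict of the stanza being read, if any
    last_field = None   # field that continuation lines belong to
    for line in desc_text.splitlines():
        if not line.strip():
            current = None          # a blank line ends the stanza
            continue
        if line[0] in " \t" and current is not None and last_field:
            # Indented line: continuation of the previous field,
            # typically the long description.
            current[last_field] += "\n" + line.strip()
            continue
        field, sep, value = line.partition(":")
        if not sep:
            continue                # not a "Field: value" line
        if current is None:
            current = {}
            tasks.append(current)
        last_field = field.strip()
        current[last_field] = value.strip()
    return tasks


def marked_by_default(tasks):
    """Names of tasks whose Test-new-install line starts with 'mark'."""
    names = []
    for task in tasks:
        words = task.get("Test-new-install", "").split()
        if words[:1] == ["mark"]:
            names.append(task.get("Task"))
    return names
```
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;p&gt;Fed the isenkram.desc content shown earlier, marked_by_default() would
report only isenkram-firmware, which matches the split between the
firmware task and the user space task described above.&lt;/p&gt;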
215
216 &lt;p&gt;If you want to test what tasksel will install when isenkram-cli is
217 installed, run &lt;tt&gt;DEBIAN_PRIORITY=critical tasksel --test
218 --new-install&lt;/tt&gt; to get the list of packages that tasksel would
219 install.&lt;/p&gt;
220
221 &lt;p&gt;&lt;a href=&quot;https://wiki.debian.org/DebianEdu/&quot;&gt;Debian Edu&lt;/a&gt; will be
222 the pilot for testing this feature, as isenkram is now used there to
223 install firmware, replacing the earlier scripts.&lt;/p&gt;
224 </description>
225 </item>
226
227 <item>
228 <title>Scripting the Cerebrum/bofhd user administration system using XML-RPC</title>
229 <link>http://people.skolelinux.org/pere/blog/Scripting_the_Cerebrum_bofhd_user_administration_system_using_XML_RPC.html</link>
230 <guid isPermaLink="true">http://people.skolelinux.org/pere/blog/Scripting_the_Cerebrum_bofhd_user_administration_system_using_XML_RPC.html</guid>
231 <pubDate>Thu, 6 Dec 2012 10:30:00 +0100</pubDate>
232 <description>&lt;p&gt;Where I work at the &lt;a href=&quot;http://www.uio.no/&quot;&gt;University of
233 Oslo&lt;/a&gt;, we use the
234 &lt;a href=&quot;http://sourceforge.net/projects/cerebrum/&quot;&gt;Cerebrum user
235 administration system&lt;/a&gt; to maintain users, groups, DNS, DHCP, etc.
236 I&#39;ve known since the system was written that the server provides
237 an &lt;a href=&quot;http://en.wikipedia.org/wiki/XML-RPC&quot;&gt;XML-RPC&lt;/a&gt; API, but
238 I have never spent time figuring out how to use it, as we
239 always use the bofh command line client at work. Until today. I want
240 to script the updating of DNS and DHCP to make it easier to set up
241 virtual machines. Here are a few notes on how to use it with
242 Python.&lt;/p&gt;
243
244 &lt;p&gt;I started by looking at the source of the Java
245 &lt;a href=&quot;http://cerebrum.svn.sourceforge.net/viewvc/cerebrum/trunk/cerebrum/clients/jbofh/&quot;&gt;bofh
246 client&lt;/a&gt;, to figure out how it connected to the API server. I also
247 googled for Python examples on how to use XML-RPC, and found
248 &lt;a href=&quot;http://tldp.org/HOWTO/XML-RPC-HOWTO/xmlrpc-howto-python.html&quot;&gt;a
249 simple example in&lt;/a&gt; the XML-RPC howto.&lt;/p&gt;
250
251 &lt;p&gt;This simple example code shows how to connect, get the list of
252 commands (as a JSON dump), and how to get the information about the
253 user currently logged in:&lt;/p&gt;
254
255 &lt;blockquote&gt;&lt;pre&gt;
256 #!/usr/bin/env python
257 import getpass
258 import xmlrpclib  # Python 2 standard library XML-RPC client
259 server_url = &#39;https://cerebrum-uio.uio.no:8000&#39;
260 username = getpass.getuser()  # assumes the local login name is the Cerebrum user
261 password = getpass.getpass()  # prompt without echoing
262 server = xmlrpclib.Server(server_url)
263 sessionid = server.login(username, password)
264 #print server.get_commands(sessionid)  # uncomment to list the commands
265 print server.run_command(sessionid, &quot;user_info&quot;, username)
266 result = server.logout(sessionid)
267 print result
268 &lt;/pre&gt;&lt;/blockquote&gt;
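
&lt;p&gt;For scripts that run several commands, the login and logout calls can
be wrapped so the session is always closed, even when a command fails.
The following is a sketch for Python 3, where xmlrpclib has become
xmlrpc.client. The helper name bofhd_session is made up for
illustration, and the sketch assumes only the login, run_command and
logout methods used in the example above.&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;
```python
import contextlib


@contextlib.contextmanager
def bofhd_session(server, username, password):
    """Log in, yield a function that runs commands, and always log out."""
    sessionid = server.login(username, password)
    try:
        # The yielded function fills in the session id for the caller.
        yield lambda command, *args: server.run_command(
            sessionid, command, *args)
    finally:
        server.logout(sessionid)


# Possible use against the server from the example above (not run here):
#
#   import getpass
#   import xmlrpc.client
#   server = xmlrpc.client.ServerProxy("https://cerebrum-uio.uio.no:8000")
#   user = getpass.getuser()
#   with bofhd_session(server, user, getpass.getpass()) as run:
#       print(run("user_info", user))
```
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;p&gt;Passing the server object in as an argument keeps the helper
independent of the URL and makes it possible to try it against a stand-in
object before pointing it at the real service.&lt;/p&gt;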
269
270 &lt;p&gt;Armed with this knowledge I can now move forward and script the DNS
271 and DHCP updates I wanted to do.&lt;/p&gt;
272 </description>
273 </item>
274
275 </channel>
276 </rss>