- <div class="title"><a href="http://people.skolelinux.org/pere/blog/Discharge_rate_estimate_in_new_battery_statistics_collector_for_Debian.html">Discharge rate estimate in new battery statistics collector for Debian</a></div>
- <div class="date">23rd May 2016</div>
- <div class="body"><p>Yesterday I updated the
-<a href="https://tracker.debian.org/pkg/battery-stats">battery-stats
-package in Debian</a> with a few patches sent to me by skilled and
-enterprising users. There were some nice user and visible changes.
-First of all, both desktop menu entries now work. A design flaw in
-one of the script made the history graph fail to show up (its PNG was
-dumped in ~/.xsession-errors) if no controlling TTY was available.
-The script worked when called from the command line, but not when
-called from the desktop menu. I changed this to look for a DISPLAY
-variable or a TTY before deciding where to draw the graph, and now the
-graph window pop up as expected.</p>
-
-<p>The next new feature is a discharge rate estimator in one of the
-graphs (the one showing the last few hours). New is also the user of
-colours showing charging in blue and discharge in red. The percentages
-of this graph is relative to last full charge, not battery design
-capacity.</p>
-
-<p align="center"><img src="http://people.skolelinux.org/pere/blog/images/2016-05-23-battery-stats-rate.png"/></p>
-
-<p>The other graph show the entire history of the collected battery
-statistics, comparing it to the design capacity of the battery to
-visualise how the battery life time get shorter over time. The red
-line in this graph is what the previous graph considers 100 percent:
-
-<p align="center"><img src="http://people.skolelinux.org/pere/blog/images/2016-05-23-battery-stats-history.png"/></p>
-
-<p>In this graph you can see that I only charge the battery to 80
-percent of last full capacity, and how the capacity of the battery is
-shrinking. :(</p>
-
-<p>The last new feature is in the collector, which now will handle
-more hardware models. On some hardware, Linux power supply
-information is stored in /sys/class/power_supply/ACAD/, while the
-collector previously only looked in /sys/class/power_supply/AC/. Now
-both are checked to figure if there is power connected to the
-machine.</p>
-
-<p>If you are interested in how your laptop battery is doing, please
-check out the
-<a href="https://tracker.debian.org/pkg/battery-stats">battery-stats</a>
-in Debian unstable, or rebuild it on Jessie to get it working on
-Debian stable. :) The upstream source is available from <a
-href="https://github.com/petterreinholdtsen/battery-stats">github</a>.
-Patches are very welcome.</p>
-
-<p>As usual, if you use Bitcoin and want to show your support of my
-activities, please send Bitcoin donations to my address
-<b><a href="bitcoin:15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b">15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b</a></b>.</p>
+ <div class="title"><a href="http://people.skolelinux.org/pere/blog/Detecting_NFS_hangs_on_Linux_without_hanging_yourself___.html">Detecting NFS hangs on Linux without hanging yourself...</a></div>
+ <div class="date"> 9th March 2017</div>
+ <div class="body"><p>Over the years, administrating thousand of NFS mounting linux
+computers at the time, I often needed a way to detect if the machine
+was experiencing NFS hang. If you try to use <tt>df</tt> or look at a
+file or directory affected by the hang, the process (and possibly the
+shell) will hang too. So you want to be able to detect this without
+risking the detection process getting stuck too. It has not been
+obvious how to do this. When the hang has lasted a while, it is
+possible to find messages like these in dmesg:</p>
+
+<p><blockquote>
+nfs: server nfsserver not responding, still trying
+<br>nfs: server nfsserver OK
+</blockquote></p>
+
+<p>It is hard to know if the hang is still going on, and it is hard to
+be sure looking in dmesg is going to work. If there are lots of other
+messages in dmesg the lines might have rotated out of site before they
+are noticed.</p>
+
+<p>While reading through the nfs client implementation in linux kernel
+code, I came across some statistics that seem to give a way to detect
+it. The om_timeouts sunrpc value in the kernel will increase every
+time the above log entry is inserted into dmesg. And after digging a
+bit further, I discovered that this value show up in
+/proc/self/mountstats on Linux.</p>
+
+<p>The mountstats content seem to be shared between files using the
+same file system context, so it is enough to check one of the
+mountstats files to get the state of the mount point for the machine.
+I assume this will not show lazy umounted NFS points, nor NFS mount
+points in a different process context (ie with a different filesystem
+view), but that does not worry me.</p>
+
+<p>The content for a NFS mount point look similar to this:</p>
+
+<p><blockquote><pre>
+[...]
+device /dev/mapper/Debian-var mounted on /var with fstype ext3
+device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
+ opts: rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
+ age: 7863311
+ caps: caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
+ sec: flavor=1,pseudoflavor=1
+ events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0
+ bytes: 166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809
+ RPC iostats version: 1.0 p/v: 100003/3 (nfs)
+ xprt: tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
+ per-op statistics
+ NULL: 0 0 0 0 0 0 0 0
+ GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
+ SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
+ LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
+ ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
+ READLINK: 125 125 0 20472 18620 0 1112 1118
+ READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
+ WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
+ CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
+ MKDIR: 3680 3680 0 773980 993920 26 23990 24245
+ SYMLINK: 903 903 0 233428 245488 6 5865 5917
+ MKNOD: 80 80 0 20148 21760 0 299 304
+ REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
+ RMDIR: 3367 3367 0 645112 484848 22 5782 6002
+ RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
+ LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
+ READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
+ READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
+ FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
+ FSINFO: 2 2 0 232 328 0 1 1
+ PATHCONF: 1 1 0 116 140 0 0 0
+ COMMIT: 0 0 0 0 0 0 0 0
+
+device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
+[...]
+</pre></blockquote></p>
+
+<p>The key number to look at is the third number in the per-op list.
+It is the number of NFS timeouts experiences per file system
+operation. Here 22 write timeouts and 5 access timeouts. If these
+numbers are increasing, I believe the machine is experiencing NFS
+hang. Unfortunately the timeout value do not start to increase right
+away. The NFS operations need to time out first, and this can take a
+while. The exact timeout value depend on the setup. For example the
+defaults for TCP and UDP mount points are quite different, and the
+timeout value is affected by the soft, hard, timeo and retrans NFS
+mount options.</p>
+
+<p>The only way I have been able to get working on Debian and RedHat
+Enterprise Linux for getting the timeout count is to peek in /proc/.
+But according to
+<ahref="http://docs.oracle.com/cd/E19253-01/816-4555/netmonitor-12/index.html">Solaris
+10 System Administration Guide: Network Services</a>, the 'nfsstat -c'
+command can be used to get these timeout values. But this do not work
+on Linux, as far as I can tell. I
+<ahref="http://bugs.debian.org/857043">asked Debian about this</a>,
+but have not seen any replies yet.</p>
+
+<p>Is there a better way to figure out if a Linux NFS client is
+experiencing NFS hangs? Is there a way to detect which processes are
+affected? Is there a way to get the NFS mount going quickly once the
+network problem causing the NFS hang has been cleared? I would very
+much welcome some clues, as we regularly run into NFS hangs.</p>