From nobody@FreeBSD.org  Wed Sep  5 16:47:32 2001
Return-Path: <nobody@FreeBSD.org>
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 1CC7437B406
	for <freebsd-gnats-submit@FreeBSD.org>; Wed,  5 Sep 2001 16:47:32 -0700 (PDT)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.4/8.11.4) id f85NlWb42283;
	Wed, 5 Sep 2001 16:47:32 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200109052347.f85NlWb42283@freefall.freebsd.org>
Date: Wed, 5 Sep 2001 16:47:32 -0700 (PDT)
From: mark wolgemuth <mark@node.to>
To: freebsd-gnats-submit@FreeBSD.org
Subject: vmstat returns impossible data
X-Send-Pr-Version: www-1.0

>Number:         30360
>Category:       bin
>Synopsis:       vmstat(8) returns impossible data
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          analyzed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 05 16:50:00 PDT 2001
>Closed-Date:    
>Last-Modified:  Fri Sep  9 01:40:02 UTC 2011
>Originator:     mark wolgemuth
>Release:        4.2-STABLE FreeBSD 4.2-STABLE #2: Thu Feb  1 16:01:55 GMT 2001
>Organization:
employease inc
>Environment:
FreeBSD rogueblue.eease.com 4.2-STABLE FreeBSD 4.2-STABLE #2: Thu Feb  1 16:01:55 GMT 2001 root@rogueblue.tek.eease.com:/usr/src/sys/compile/ROGUEBLUE  i386
This is a 2 CPU machine !!!
>Description:
After extended uptime:
rogueblue root ~> uptime 
 7:40PM  up 193 days, 17:02, 2 users, load averages: 0.16, 0.10, 0.08
on a reasonably busy server,
vmstat returns this:
rogueblue root ~ *> vmstat   
 procs      memory     page                    disks     faults      cpu
 r b w     avm   fre  flt  re  pi  po  fr  sr da0 md0   in   sy  cs us sy id
 0 0 0  271196 13148  154   0   0   0 204   1   0   0   26   46  75 -1750 -1843 3693

The problem here is the usr / sys / idle count:
-1750 / -1843 / 3693

>How-To-Repeat:
The box needs to have a large number of cpu ticks.
I could get this number for you if I knew how.
Run vmstat with no arguments. It returns bad data.
>Fix:
I imagine the problem is that there is an integer somewhere that cannot handle the number of clock ticks this machine has had, with 2 CPUs after 193 days. The integer is then improperly taking on a sign. I have experienced the same problem on Redhat linux machines (except there the vmstat segfaults instead of returning incorrect data).

>Release-Note:
>Audit-Trail:

From: Alexander Best <arundel@freebsd.org>
To: bug-followup@freebsd.org
Cc:  
Subject: Re: bin/30360: vmstat(8) returns impossible data
Date: Wed, 6 Oct 2010 02:20:27 +0000

 this problem still exists. the following lines are `vmstat` outputs from:
 
 hub.freebsd.org (7.3-STABLE):
 
  procs      memory      page                   disk   faults         cpu
  r b w     avm    fre   flt  re  pi  po    fr  sr mf0   in   sy   cs us sy id
  0 0 0   2383M  1856M   144   1   2   0   231  74   0  315   27   94 -62 -15 177
 
 kern.cp_time: 410252301 85725730 109247565 6553778 2886521518
 
 
 freefall.freebsd.org (8.1-PRERELEASE):
 
  procs      memory      page                    disks     faults         cpu
  r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in   sy   cs us sy id
  0 0 2   1036M   294M   561   0   0   0    37  46   0   0  498  700  683 -11 -14 125
 
 kern.cp_time: 100842337 28852004 161611939 12315794 2770482371
 
 the problem seems to be in percent() or cpustats() in vmstat.c.
 
 cheers.
 alex
 
 -- 
 a13x

From: Sergey Kandaurov <pluknet@gmail.com>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: bin/30360: vmstat(8) returns impossible data
Date: Sat, 1 Jan 2011 21:23:24 +0300

 That's a type overflow bug which I think isn't easy to fix, b.c.
 fixing it breaks the cp_time ABI.
 cp_time is (roughly) an array[CPUSTATES] of longs.
 long type is 4-bytes on i386, and 8-bytes on amd64.
 That's why I don't see this bug on amd64 boxes.
 
 Sometimes the bug might not manifest on i386 sysctl kern.cp_time, but
 generally it does.
 That's because the exported cp_time[] fmt (used by /sbin/sysctl) is
 different ("UL"), and that gives extended type capacity (for a while)
 by casting signed to unsigned.
 
 In this example the bug manifests for `id' as well with /sbin/sysctl
 on i386 (uptime 597 days):
 # sysctl kern.cp_time
 kern.cp_time: 4021277307 75175092 2025746497 49748493 2746074583
 # vmstat
  procs      memory      page                    disks     faults      cpu
  r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy  cs us sy id
  1 5 0   93720 458992   14   0   0   3  53   1   0   0   37    1   5 -61 633 -472
 
 Both boxes, hub and freefall, reported by arundel@ are i386.
 
 In this example /sbin/sysctl abuses "UL" fmt, but it doesn't work for vmstat
 which uses libdevstat which in turn properly uses cp_time[] as signed long.
 
 # sysctl kern.cp_time
 kern.cp_time: 795491304 5844771 246148418 43709451 2752874123
 # ./test
 printf("%lu\n", l): 2752874123
 printf("%ld\n", l): -1542093173 [compare]
 
 # ./vmstat
  procs      memory      page                   disk   faults         cpu
  r b w     avm    fre  flt  re  pi  po  fr  sr aa0   in   sy  cs us sy id
  3 3 0   5776M   172M  173  39  22   5 617 444   0  743  193  60
 cpustats(): before 'total += cur.cp_time[state]': cp_time[]: 795758944
 cpustats(): before 'total += cur.cp_time[state]': total: 0.000000
 cpustats(): after  'total += cur.cp_time[state]': cp_time[]: 795758944
 cpustats(): after  'total += cur.cp_time[state]': total: 795758944.000000
 
 cpustats(): before 'total += cur.cp_time[state]': cp_time[]: 5844771
 cpustats(): before 'total += cur.cp_time[state]': total: 795758944.000000
 cpustats(): after  'total += cur.cp_time[state]': cp_time[]: 5844771
 cpustats(): after  'total += cur.cp_time[state]': total: 801603715.000000
 
 cpustats(): before 'total += cur.cp_time[state]': cp_time[]: 246218512
 cpustats(): before 'total += cur.cp_time[state]': total: 801603715.000000
 cpustats(): after  'total += cur.cp_time[state]': cp_time[]: 246218512
 cpustats(): after  'total += cur.cp_time[state]': total: 1047822227.000000
 
 cpustats(): before 'total += cur.cp_time[state]': cp_time[]: 43723365
 cpustats(): before 'total += cur.cp_time[state]': total: 1047822227.000000
 cpustats(): after  'total += cur.cp_time[state]': cp_time[]: 43723365
 cpustats(): after  'total += cur.cp_time[state]': total: 1091545592.000000
 
 cpustats(): before 'total += cur.cp_time[state]': cp_time[]: -1541158615 [compare]
 cpustats(): before 'total += cur.cp_time[state]': total: 1091545592.000000
 cpustats(): after  'total += cur.cp_time[state]': cp_time[]: -1541158615
 cpustats(): after  'total += cur.cp_time[state]': total: -449613023.000000
 
  -178 -64 343
 ^^1   ^^2    ^^3
 
 (1) and (2) are negative b.c. both are positive values multiplied by
 the neg. total cp_time index;
 (3) is positive b.c. it's the neg. cp_time[CP_IDLE] multiplied by the
 same neg. total cp_time index.
 After summation, total has the wrong sign and the wrong value, hence
 the out-of-range pct. values.
 
 -- 
 wbr,
 pluknet
State-Changed-From-To: open->analyzed 
State-Changed-By: arundel 
State-Changed-When: Tue Feb 15 14:03:33 UTC 2011 
State-Changed-Why:  
Thanks Sergey for the in depth analysis. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=30360 

From: Ed Maste <emaste@freebsd.org>
To: <bug-followup@FreeBSD.org>
Cc:  
Subject: Re: bin/30360: vmstat(8) returns impossible data
Date: Thu, 8 Sep 2011 21:27:13 -0400

 See also the thread at
 http://lists.freebsd.org/pipermail/freebsd-current/2010-October/020564.html
 
 A suggestion from one of the followups:
 
 > I'd be very happy if all vmstat and iostat would get a command line
 > switch to suppress the "summary since last reboot" line.
 > This information may be useful for some cases but in other cases, like
 > creating performance data for monitoring systems like Icinga / Nagios
 > one has to remove the first line(s) manually.
 
 Adding this mode would solve the problem for non-interactive use of
 vmstat (in that overflow doesn't matter for the delta between two
 samples), but will leave the bogus data in the normal case.
 
 Maybe -q (quiet) could suppress the header and another -q could
 suppress also the first line?  So vmstat -qq -w 1 -c 2 would provide
 just the one delta summary line after one second.
 
 (For reference, spare flags on vmstat and iostat seem to be
 b e g j l q r u v y)
 
>Unformatted:
