From nobody@FreeBSD.org  Sat Jun 22 01:48:35 2013
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	by hub.freebsd.org (Postfix) with ESMTP id 062F24BF
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Jun 2013 01:48:35 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from oldred.freebsd.org (oldred.freebsd.org [8.8.178.121])
	by mx1.freebsd.org (Postfix) with ESMTP id DD6901483
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Jun 2013 01:48:34 +0000 (UTC)
Received: from oldred.freebsd.org ([127.0.1.6])
	by oldred.freebsd.org (8.14.5/8.14.7) with ESMTP id r5M1mYAp076077
	for <freebsd-gnats-submit@FreeBSD.org>; Sat, 22 Jun 2013 01:48:34 GMT
	(envelope-from nobody@oldred.freebsd.org)
Received: (from nobody@localhost)
	by oldred.freebsd.org (8.14.5/8.14.5/Submit) id r5M1mYaJ076069;
	Sat, 22 Jun 2013 01:48:34 GMT
	(envelope-from nobody)
Message-Id: <201306220148.r5M1mYaJ076069@oldred.freebsd.org>
Date: Sat, 22 Jun 2013 01:48:34 GMT
From: adrian chadd <adrian@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: [hwpmc] process-mode counters aren't correctly read on multi-core machines
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         179827
>Category:       kern
>Synopsis:       [hwpmc] process-mode counters aren't correctly read on multi-core machines
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 22 01:50:00 UTC 2013
>Closed-Date:    
>Last-Modified:  Sun Jun 23 01:10:00 UTC 2013
>Originator:     adrian chadd
>Release:        9-STABLE, r252049, amd64 (and i386)
>Organization:
Netflix, Inc
>Environment:
>Description:
When using a process-mode counting PMC on a multi-core machine, the pmc_read() -> PMC_OP_PMCRW operation does not correctly read the PMC.

What's going on in hwpmc_mod.c is this:

* It grabs the current thread CPU;
* It then checks if the PMC is active;
* If it's currently active, it reads the PMC from the current CPU rather than the target CPU;
* .. otherwise it returns the old counter value.

When I enabled pmc debugging (DEBUG in the kernel) and then set:

# sysctl kern.hwpmc.debugflags="csw=swo,swi pmc=ops"

I'd see this:

* lots of swi, swo (swapin, swapout) of the process, which updates the cached PMC value
* a PMC,OPS event every second whilst it was reading the process mode counter
* .. but the PMC,OPS value would always be the "old" one, rather than the updated one.

The only time the counter value was updated was if the PMC,OPS (ie, the PMC_OP_PMCRW event) occurred _during_ the running of the process - ie, the scheduler put the process being counted _and_ the pmcstat process on the same CPU, switched out the process under profiling (which would update the saved value) and switched in pmcstat to do the read.

This reads like it was really designed for reading counters from an active process - ie, where you'd call pmc_read() at specific points in a program's execution in order to count how long certain events were taking.

It's totally bogus for the SMP case, where the process being counted is running on another CPU and doesn't get switched out for long stretches of time, whilst pmcstat wakes up once a second on another CPU to do a counter read.

>How-To-Repeat:
* Grab a multi-core device;
* Grab some CPU-chewing process that doesn't do any kind of blocking - ie, some minute-long math problem;
* Run pmcstat in counting mode: pmcstat -p instructions -w 1 ./process
* Watch the counters look totally bogus.
* See above for the sysctl line to enable pmc debugging and you'll see how bogus the sampling is.

>Fix:
I'm not sure. There may be multiple problems here:

* One is ensuring the PMC stats are read from the CPU the PMC is currently active on, as it may be a different CPU from the one running the thread that calls pmc_read() -> PMC_OP_PMCRW;
* Another is checking to see if the PMC is actually active at the present moment on _any_ CPU and if it is, fetch the statistics from said correct CPU.
* I'm not sure what the deal is with running counting mode samples on a multi-core CPU with multiple threads running in the process under count. I.e., if you have 4 threads running on 4 separate CPUs and you're trying to count '-p instructions', then I think the right behaviour is to enable the instructions counter on -all- the CPUs with those threads active. So what should pmc_read() do? Does the kernel side correctly read all the CPUs with threads active for that particular registered PMC? Or what is actually going on?
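
A pseudocode sketch of the first two points (the helpers pmc_get_active_cpu(), pmc_hw_read() and pmc_run_on_cpu() are invented names here, stand-ins for whatever IPI/rendezvous mechanism a real fix would use):

```
/*
 * Pseudocode only -- not the real hwpmc code.  The idea: find where
 * the PMC is live and read it *there*, instead of trusting the cache.
 */
static int
pmc_read_crosscpu(struct pmc *pm, uint64_t *value)
{
	int cpu = pmc_get_active_cpu(pm);	/* which CPU, if any, has
						   the counter programmed? */

	if (cpu < 0) {
		/* Not scheduled anywhere: the cached value is current. */
		*value = pm->pm_saved;
		return (0);
	}
	if (cpu == curcpu) {
		/* Caller shares the CPU: a direct read is correct. */
		*value = pmc_hw_read(pm);
		return (0);
	}
	/* Different CPU: execute the read on the target CPU (e.g. via
	 * an IPI/rendezvous) so the live hardware counter is sampled. */
	return (pmc_run_on_cpu(cpu, pmc_hw_read, pm, value));
}
```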


>Release-Note:
>Audit-Trail:

From: hiren panchasara <hiren@FreeBSD.org>
To: bug-followup@FreeBSD.org, adrian@FreeBSD.org
Cc:  
Subject: Re: kern/179827: [hwpmc] process-mode counters aren't correctly
 read on multi-core machines
Date: Sat, 22 Jun 2013 02:35:55 -0700

 This is what I am seeing on head:
 
 -bash-4.2$ uname -a
 FreeBSD head.box.some.com 10.0-CURRENT FreeBSD 10.0-CURRENT #0
 r251604: Tue Jun 11 19:08:35 UTC 2013
 hirenp@head.box.some.com:/home/hirenp/head/sys/amd64/compile/GENERIC
 amd64
 -bash-4.2$ sysctl hw.ncpu
 hw.ncpu: 24
 -bash-4.2$
 
 -bash-4.2$ sysctl -a | grep hwpmc
 kern.features.hwpmc_hooks: 1
 kern.hwpmc.softevents: 16
 kern.hwpmc.callchaindepth: 8
 kern.hwpmc.debugflags: csw=swo,swi pmc=ops
 kern.hwpmc.hashsize: 16
 kern.hwpmc.nsamples: 512
 kern.hwpmc.mtxpoolsize: 32
 kern.hwpmc.logbuffersize: 4
 kern.hwpmc.nbuffers: 64
 -bash-4.2$
 
 -bash-4.2$ cat count.c
 #include <stdio.h>
 #include <stdint.h>
 #include <time.h>
 #include <unistd.h>	/* sleep() */
 
 int main() {
 	for (int i = 0; i < 60; i++) {
 		sleep(1);
 	}
 	return 0;
 }
 
 
 -bash-4.2$ pmcstat -p instructions -w 1 ./count
 CSW:SWO:1: cpu=2 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 PMC:OPS:1: start pmc=0xfffffe005d81e800 mode=3 ri=21
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, pmcstat) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=2 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 ri=21 new=0
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, pmcstat) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21 tmp=6285116
 CSW:SWI:1: cpu=20 proc=0xfffffe0047afc970 (2616, pmcstat) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=20 ri=21 new=6285116
 CSW:SWO:1: cpu=20 proc=0xfffffe0047afc970 (2616, pmcstat) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=20 ri=21 tmp=6245302
 CSW:SWI:1: cpu=1 proc=0xfffffe0047afc970 (2616, pmcstat) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=1 ri=21 new=12530418
 CSW:SWO:1: cpu=1 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=1 ri=21 tmp=11918927
 
 #  PMC:OPS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 1751141
 CSW:SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSWnstructions
             244493:S45 WO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=1 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1:
  c#  p/instructionpu=s
 1 ri=21 new=24449345
 PMC:OPS:1: rw id=-16579051 flags=0x20
 CSW:SWO:1: cpu=1 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 PMC:OPS:2: rw id=21 -> old 1751141
 C                SW:0 SWO:1: cpu=1 ri=21 tmp=97027040
 CSW:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 
 PMC#  p/instruction:Os
 PS:1: rw id=-16579051 flags=0x20
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 PMC:OPS:2: rw id=21 -> old 73d9521
 C         9702704SW:0 SWI:1: cpu=16 ri=21 new=121476385
 CSW:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21 tmp=121961432
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old e8290f9
         12196143CS2 W:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=16 ri=21 new=243437817
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21 tmp=56320362
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 11ddf263
          5632036CSW2 :SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=16 ri=21 new=299758179
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21 tmp=56408687
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 153aacd2
          5640868CSW7 :SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=16 ri=21 new=356166866
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21 tmp=56364300
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 1896b9de
          5636430CS0 W:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=16 ri=21 new=412531166
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21 tmp=56239777
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 1bf0e07f
          5623977CS7 W:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=16 ri=21 new=468770943
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=16 ri=21
 tm#  p/instructionp=5s
 6380282
 PMC:OPS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 1f4d2bf9
          5638028CSW2 :SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=16 ri=21
 ne#  p/instructionw=s
 525151225
 PMC:OPS:1: rw id=-16579051 flags=0x20
 CSW:SWO:1: cpu=16 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 PMC:OPS:2: rw id=21 -> old 1f4d2bf9
 C                SW:0 SWO:1: cpu=16 ri=21 tmp=96934506
 CSW:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3
 pr#  p/instructionoc=s
 0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 PMC:OPS:1: rw id=-16579051 flags=0x20
 CSW:SWI:1: cpu=3 ri=21 new=622085731
 PMC:OPS:2: rw id=21 -> old 25144663
 C         9693450SW6 :SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616,
 count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWO:1: cpu=3 ri=21 tmp=105656199
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 2b6075ea
         10565619CSW9 :SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=3 ri=21 new=727741930
 CSW:SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=3 ri=21 tmp=56325090
 
 PMC#  p/instruction:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 2ebbe9cc
          5632509CSW0 :SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=3 ri=21 new=784067020
 CSW:SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=3 ri=21 tmp=56354857
 
 PMC#  p/instruction:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 3217d1f5
          5635485CSW7 :SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=3 ri=21 new=840421877
 CSW:SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=3 ri=21 tmp=56379009
 
 PMC#  p/instruction:OPs
 S:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 35741876
          5637900CSW9 :SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=3 ri=21 new=896800886
 CSW:SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=3 ri=21 tmp=56376678
 
 PMC#  p/instruction:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 38d055dc
          5637667CSW8 :SWO:1: cpu=12 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=3 ri=21 new=953177564
 CSW:SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=3 ri=21 tmp=56363010
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 3c2c5dde
          5636301CS0 W:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 C
 SW#  p/instruction:SWs
 I:1: cpu=3 ri=21 new=1009540574
 PMC:OPS:1: rw id=-16579051 flags=0x20
 CSW:SWO:1: cpu=3 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=6 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWO:1: cpu=3 ri=21 tmp=96981520
 CSW:SWO:1: cpu=0 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 PMC:OPS:2: rw id=21 -> old 3c2c5dde
                 CSW0 :SWO:1: cpu=0 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 
 PM#  p/instructionC:Os
 PS:1: rw id=-16579051 flags=0x20
 CSW:SWI:1: cpu=14 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=0 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=14 ri=21 new=1106522094
 PMC:OPS:2: rw id=21 -> old 41f42fee
 C         9698152SW0 :SWO:1: cpu=14 proc=0xfffffe0047afc970 (2616,
 count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=17 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWO:1: cpu=14 ri=21 tmp=106788274
 
 P#  p/instructionMC:s
 OPS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 4851a5a0
         10678827CS4 W:SWO:1: cpu=3 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=10 ri=21 new=1213310368
 CSW:SWO:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=10 ri=21 tmp=57088659
 
 P#  p/instructionMC:s
 OPS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 4bb8c033
          5708865CSW9 :SWO:1: cpu=3 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=10 ri=21 new=1270399027
 CSW:SWO:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=10 ri=21 tmp=57123975
 
 P#  p/instructionMCs
 :OPS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 4f2064ba
          5712397CSW5 :SWO:1: cpu=3 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=10 ri=21 new=1327523002
 CSW:SWO:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=10 ri=21 tmp=57119065
 
 P#  p/instructionMCs
 :OPS:1: rw id=-16579051 flags=0x20
 PMC:OPS:2: rw id=21 -> old 5287f613
          5711906CSW5 :SWO:1: cpu=3 proc=0xfffffe0047ea6970 (2615, pmcstat) pp=0
 CSW:SWI:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWI:1: cpu=10 ri=21 new=1384642067
 CSW:SWO:1: cpu=10 proc=0xfffffe0047afc970 (2616, count) pp=0xfffffe005dbd6c00
 CSW:SWO:1: cpu=10 ri=21 tmp=57119883
 
 <truncated> as it was too long.

From: Adrian Chadd <adrian@freebsd.org>
To: hiren panchasara <hiren@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: kern/179827: [hwpmc] process-mode counters aren't correctly
 read on multi-core machines
Date: Sat, 22 Jun 2013 10:03:46 -0700

 You can't do sleep(); that yields the CPU.
 
 Replace it with a busy loop that constantly does some math.
 
 
 
 
 Adrian
 
 On 22 June 2013 02:35, hiren panchasara <hiren@freebsd.org> wrote:
 > This is what I am seeing on head:
 > [... uname, sysctl, count.c and pmcstat debug output quoted verbatim
 > from the previous message; snipped ...]

From: Adrian Chadd <adrian@freebsd.org>
To: hiren panchasara <hiren.panchasara@gmail.com>
Cc: bug-followup@freebsd.org
Subject: Re: kern/179827: [hwpmc] process-mode counters aren't correctly
 read on multi-core machines
Date: Sat, 22 Jun 2013 13:23:24 -0700

 Right. Do that but with the test running in another window, so you
 don't get the message overlap.
 
 Then look at how the results show lots of '0' from pmcstat, and then
 correlate that with the one-second "PMC,OPS" sample lines. Whenever
 they run whilst the process is running (ie, the last event was SWI)
 and it hasn't yet been de-scheduled, the saved counter is never
 updated from the live PMC, so subsequent reads (until it does get
 de-scheduled!) return the same cached value. Hence lots of '0's,
 followed by a big, big counter value.
 
 
 
 
 adrian

From: hiren panchasara <hiren.panchasara@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: bug-followup@freebsd.org
Subject: Re: kern/179827: [hwpmc] process-mode counters aren't correctly
 read on multi-core machines
Date: Sat, 22 Jun 2013 18:05:13 -0700

 On Sat, Jun 22, 2013 at 1:23 PM, Adrian Chadd <adrian@freebsd.org> wrote:
 > Right. Do that but with the test running in another window, so you
 > don't get the message overlap.
 
 Here is how it looks:
 
 #  p/instructions
         293898819
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
                 0
 #  p/instructions
      201677536862
 
 >
 > Then look at how the results show lots of '0' from pmcstat, and then
 > correlate that with the one-second "PMC,OPS" sample lines. Whenever
 > they run whilst the process is running (ie, the last event was SWI)
 > and it hasn't yet been de-scheduled, the saved counter is never
 > updated from the live PMC, so subsequent reads (until it does get
 > de-scheduled!) return the same cached value. Hence lots of '0's,
 > followed by a big, big counter value.
 
 Precisely how it looks.
 
 Thanks for the explanation.
 
 cheers,
 Hiren
>Unformatted:
