From nobody@FreeBSD.org  Tue Jan 17 18:13:34 2012
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 55BBE1065678
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 17 Jan 2012 18:13:34 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22])
	by mx1.freebsd.org (Postfix) with ESMTP id 452D28FC17
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 17 Jan 2012 18:13:34 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
	by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q0HIDXeZ059039
	for <freebsd-gnats-submit@FreeBSD.org>; Tue, 17 Jan 2012 18:13:34 GMT
	(envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
	by red.freebsd.org (8.14.4/8.14.4/Submit) id q0HIDXnd059034;
	Tue, 17 Jan 2012 18:13:33 GMT
	(envelope-from nobody)
Message-Id: <201201171813.q0HIDXnd059034@red.freebsd.org>
Date: Tue, 17 Jan 2012 18:13:33 GMT
From: Oliver <grimeton@gmx.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: Journal Overflow
X-Send-Pr-Version: www-3.1
X-GNATS-Notify:

>Number:         164252
>Category:       kern
>Synopsis:       [geom] gjournal overflow
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-geom
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jan 17 18:20:07 UTC 2012
>Closed-Date:    
>Last-Modified:  Wed May  9 16:20:08 UTC 2012
>Originator:     Oliver
>Release:        9.0-RELEASE
>Organization:
n/a
>Environment:
FreeBSD test 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
>Description:
I'm running some performance tests with iozone to see which filesystem/
volume manager/RAID version/... fits my needs best.

After running the test on top of ZFS, I destroyed the pool and created a
freebsd-ufs partition on the now-empty drive. Here is the output of
"gpart show":

=>       34  976773101  ada0  GPT  (465G)
         34  976773101     1  freebsd-ufs  (465G)

I then labeled the partition ada0p1 with gjournal and created a new UFS2
filesystem on top of ada0p1.journal with journaling enabled (-J).

Then the partition was mounted and the iozone test was started inside it.
The command was "iozone -RA -g 16G". When it started to write 1GB of data
in 64k chunks, the system crashed (see the attached screenshot, which is
literally a photo of the screen).

The system wasn't even able to write more than 2GB of the 8GB of memory
back to disk before it completely froze.

I did the exact same test on this system with ZFS on the empty disk,
and everything worked as expected.

I wasn't able to attach the screenshot, so I uploaded it here:
http://img515.imageshack.us/img515/9622/gjournal.jpg

If you have any questions just contact me.

KR, Grimeton

>How-To-Repeat:
- Set up 9.0-RELEASE on one hard drive
- Fetch the ports tree and compile benchmarks/iozone
- Create a GPT scheme on another, empty disk
- Set up a freebsd-ufs partition
- Label the partition with gjournal
- Create UFS2 on top of the journaled partition with journaling
  enabled (-J)
- Mount the new filesystem at /test
- cd to /test
- Start iozone like "iozone -Ra -g 16G"
- Wait and watch it crash
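
Assuming a spare disk at ada1 (the device name and mount point are
illustrative, not taken from the report), the steps above correspond
roughly to the following commands, run as root:

```shell
# Hedged repro sketch; ada1 and /test are assumed names.
gjournal load                             # load the geom_journal class
gpart create -s GPT ada1                  # new GPT scheme on the empty disk
gpart add -t freebsd-ufs ada1             # one freebsd-ufs partition -> ada1p1
gjournal label ada1p1                     # attach a journal; creates ada1p1.journal
newfs -J /dev/ada1p1.journal              # UFS2 with gjournal support enabled
mkdir -p /test
mount -o async /dev/ada1p1.journal /test  # gjournal(8) recommends async mounts
cd /test && iozone -Ra -g 16G             # the benchmark run that crashed
```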
>Fix:
No clue.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->freebsd-geom 
Responsible-Changed-By: linimon 
Responsible-Changed-When: Wed Jan 18 01:02:55 UTC 2012 
Responsible-Changed-Why:  
Over to maintainer(s). 

http://www.freebsd.org/cgi/query-pr.cgi?pr=164252 

From: Andreas Longwitz <longwitz@incore.de>
To: bug-followup@freebsd.org
Cc: grimeton@gmx.net
Subject: Re: kern/164252: [geom] gjournal overflow
Date: Thu, 26 Apr 2012 17:17:01 +0200

 On one of my machines I can reproduce this panic and give some debug
 information. If I run
    mount -u -o noatime -o async -o snapshot /prod/.snap/fscktest /prod
 on one gjournaled disk and, some seconds later,
    cd /backup; iozone -Ra -g 2G
 on a second gjournaled disk, the panic is triggered immediately.
 
 My disk configuration is:
 Filesystem                    Size    Used   Avail Capacity  Mounted on
 /dev/mirror/gm0p7.journal     989M    207M    702M    23%    /
 devfs                         1.0k    1.0k      0B   100%    /dev
 /dev/mirror/gm0p8.journal     3.9G    477M    3.1G    13%    /var
 /dev/mirror/gm0p9.journal      15G    3.1G     11G    21%    /usr
 /dev/mirror/gm0p10.journal    101G    120M     93G     0%    /home
 /dev/mirror/gm1p2.journal     1.8T     94G    1.5T     6%    /prod
 /dev/mirror/gm2p2.journal     1.8T     94G    1.5T     6%    /backup
 
 From /var/log/messages:
 FreeBSD 8.2-STABLE #0: Mon Apr 23 00:20:44 CEST 2012
     root@dsssrvt2.incore:/usr/obj/usr/src/sys/SERVER64 amd64
 ...
 GEOM: da0: the secondary GPT header is not in the last LBA.
 GEOM: da1: the secondary GPT header is not in the last LBA.
 GEOM: da2: the secondary GPT header is not in the last LBA.
 GEOM: da3: the secondary GPT header is not in the last LBA.
 GEOM_MIRROR: Device mirror/gm0 launched (2/2).
 GEOM: da4: the secondary GPT header is not in the last LBA.
 GEOM_MIRROR: Device mirror/gm1 launched (2/2).
 GEOM: da5: the secondary GPT header is not in the last LBA.
 GEOM_MIRROR: Device mirror/gm2 launched (2/2).
 GEOM_JOURNAL: Journal 2877441109: mirror/gm0p3 contains journal.
 GEOM_JOURNAL: Journal 4086979879: mirror/gm0p4 contains journal.
 GEOM_JOURNAL: Journal 1430126911: mirror/gm0p5 contains journal.
 GEOM_JOURNAL: Journal 2027035949: mirror/gm0p6 contains journal.
 GEOM_JOURNAL: Journal 2877441109: mirror/gm0p7 contains data.
 GEOM_JOURNAL: Journal mirror/gm0p7 clean.
 GEOM_JOURNAL: Journal 4086979879: mirror/gm0p8 contains data.
 GEOM_JOURNAL: Journal mirror/gm0p8 clean.
 GEOM_JOURNAL: Journal 1430126911: mirror/gm0p9 contains data.
 GEOM_JOURNAL: Journal mirror/gm0p9 clean.
 GEOM_JOURNAL: Journal 2027035949: mirror/gm0p10 contains data.
 GEOM_JOURNAL: Journal mirror/gm0p10 clean.
 GEOM_JOURNAL: Journal 3941023214: mirror/gm1p1 contains journal.
 GEOM_JOURNAL: Journal 3941023214: mirror/gm1p2 contains data.
 GEOM_JOURNAL: Journal mirror/gm1p2 clean.
 GEOM_JOURNAL: Journal 817888587: mirror/gm2p1 contains journal.
 GEOM_JOURNAL: Journal 817888587: mirror/gm2p2 contains data.
 GEOM_JOURNAL: Journal mirror/gm2p2 clean.
 Trying to mount root from ufs:/dev/mirror/gm0p7.journal
 em0: link state changed to UP
 GEOM_MIRROR[1]: Kernel dump will go to da0.
 
 Sometimes I see messages like
 fsync: giving up on dirty
 0xffffff0002ef6000: tag devfs, type VCHR
     usecount 1, writecount 0, refcount 18 mountedhere 0xffffff0002d8ec00
     flags (VV_COPYONWRITE)
     v_object 0xffffff0002eda360 ref 0 pages 79834
     lock type devfs: EXCL by thread 0xffffff0002a00460 (pid 18)
         dev mirror/gm2p2.journal
 GEOM_JOURNAL: Cannot suspend file system /backup (error=35).
 This should be harmless according to
 http://freebsd.monkey.org/freebsd-stable/200609/msg00020.html.
 
 My kernel has a patch for geom/journal/g_journal.c, because I want to
 know which of the gjournals triggers the panic:
 
 <   panic("Journal overflow (joffset=%jd active=%jd inactive=%jd)",
 <       (intmax_t)sc->sc_journal_offset,
 ---
 >   panic("Journal overflow (id=%d joffset=%jd active=%jd inactive=%jd)",
 >       sc->sc_id, (intmax_t)sc->sc_journal_offset,
 
 Further, I have (the same 'id' as in the panic string):
 
 gjournal dump mirror/gm2p1:
 Metadata on /dev/mirror/gm2p1:
      magic: GEOM::JOURNAL
    version: 0
         id: 817888587
       type: 2
      start: 0
        end: 2147483136
    joffset: 0
        jid: 0
      flags: 1
 hcprovider:
   provsize: 2147483648
   MD5 hash: 38681a7ffb1c2e570a0ce09d6da60a5f
 
 gjournal dump mirror/gm2p2:
 Metadata on mirror/gm2p2:
      magic: GEOM::JOURNAL
    version: 0
         id: 817888587
       type: 1
      start: 0
        end: 2147483136
    joffset: 182301184
        jid: 2952916330
      flags: 0
 hcprovider:
   provsize: 1998251415552
   MD5 hash: 3a3039ec8bf7be3ed1293fc82b52691a
 
 After the panic is triggered, here is some ddb output:
 
 panic: Journal overflow (id=817888587 joffset=1027771904
 active=182301184 inactive=1026233856)
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 kdb_backtrace() at kdb_backtrace+0x37
 panic() at panic+0x187
 g_journal_flush() at g_journal_flush+0x8c8
 g_journal_worker() at g_journal_worker+0x14bb
 fork_exit() at fork_exit+0x11f
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff82452d0d00, rbp = 0 ---
 KDB: enter: panic
 [thread pid 36 tid 100089 ]
 Stopped at      kdb_enter+0x3b: movq    $0,0x46ffb2(%rip)
 db:0:kdb.enter.panic> watchdog
 No argument provided, disabling watchdog
 db:0:kdb.enter.panic>  run ddbinfo
 db:1:ddbinfo> capture on
 db:1:on>  run lockinfo
 db:2:lockinfo> show lock Giant
  class: sleep mutex
  name: Giant
  flags: {DEF, RECURSE}
  state: {UNOWNED}
 db:2:Giant>  show lockedvnods
 Locked vnodes
 
 0xffffff01b40f0b10: tag ufs, type VDIR
     usecount 4, writecount 0, refcount 5 mountedhere 0
     flags ()
     v_object 0xffffff01c8b75ca8 ref 0 pages 1
     lock type ufs: SHARED (count 1)
         ino 3, on dev mirror/gm1p2.journal
 
 0xffffff01b43f1588: tag ufs, type VREG
     usecount 1, writecount 0, refcount 45167 mountedhere 0
     flags (VV_SYSTEM)
     lock type ufs: EXCL by thread 0xffffff00348aa8c0 (pid 80175)
  with shared waiters pending
         ino 4, on dev mirror/gm1p2.journal
 
 0xffffff01e6844ce8: tag ufs, type VREG
     usecount 1, writecount 1, refcount 1027 mountedhere 0
     flags ()
     v_object 0xffffff01cee58510 ref 0 pages 4100
     lock type ufs: EXCL by thread 0xffffff0042fe1460 (pid 80395)
         ino 4, on dev mirror/gm2p2.journal
 db:2:lockedvnods>  show lockchain
 thread 100089 (pid 36, g_journal mirror/gm) running on CPU 0
 db:2:lockchain>  show sleepchain
 thread 100089 (pid 36, g_journal mirror/gm) running on CPU 0
 db:1:sleepchain>  show allpcpu
 Current CPU: 0
 
 cpuid        = 0
 dynamic pcpu = 0x236980
 curthread    = 0xffffff0002ded000: pid 36 "g_journal mirror/gm"
 curpcb       = 0xffffff82452d0d10
 fpcurthread  = none
 idlethread   = 0xffffff0002359460: tid 100006 "idle: cpu0"
 curpmap      = 0xffffffff808557d0
 tssp         = 0xffffffff808c3100
 commontssp   = 0xffffffff808c3100
 rsp0         = 0xffffff82452d0d10
 gs32p        = 0xffffffff808c1f38
 ldt          = 0xffffffff808c1f78
 tss          = 0xffffffff808c1f68
 
 cpuid        = 1
 dynamic pcpu = 0xffffff807f7fc980
 curthread    = 0xffffff0002d35460: pid 25 "syncer"
 curpcb       = 0xffffff8242f96d10
 fpcurthread  = none
 idlethread   = 0xffffff00023598c0: tid 100005 "idle: cpu1"
 curpmap      = 0xffffffff808557d0
 tssp         = 0xffffffff808c3168
 commontssp   = 0xffffffff808c3168
 rsp0         = 0xffffff8242f96d10
 gs32p        = 0xffffffff808c1fa0
 ldt          = 0xffffffff808c1fe0
 tss          = 0xffffffff808c1fd0
 
 cpuid        = 2
 dynamic pcpu = 0xffffff807f803980
 curthread    = 0xffffff000234d000: pid 11 "idle: cpu2"
 curpcb       = 0xffffff8000051d10
 fpcurthread  = none
 idlethread   = 0xffffff000234d000: tid 100004 "idle: cpu2"
 curpmap      = 0xffffffff808557d0
 tssp         = 0xffffffff808c31d0
 commontssp   = 0xffffffff808c31d0
 rsp0         = 0xffffff8000051d10
 gs32p        = 0xffffffff808c2008
 ldt          = 0xffffffff808c2048
 tss          = 0xffffffff808c2038
 
 cpuid        = 3
 dynamic pcpu = 0xffffff807f80a980
 curthread    = 0xffffff0002359000: pid 12 "swi1: netisr 0"
 curpcb       = 0xffffff8000060d10
 fpcurthread  = none
 idlethread   = 0xffffff000234d460: tid 100003 "idle: cpu3"
 curpmap      = 0xffffffff808557d0
 tssp         = 0xffffffff808c3238
 commontssp   = 0xffffffff808c3238
 rsp0         = 0xffffff8000060d10
 gs32p        = 0xffffffff808c2070
 ldt          = 0xffffffff808c20b0
 tss          = 0xffffffff808c20a0
 
 db:1:allpcpu>  bt
 Tracing pid 36 tid 100089 td 0xffffff0002ded000
 kdb_enter() at kdb_enter+0x3b
 panic() at panic+0x180
 g_journal_flush() at g_journal_flush+0x8c8
 g_journal_worker() at g_journal_worker+0x14bb
 fork_exit() at fork_exit+0x11f
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff82452d0d00, rbp = 0 ---
 db:1:bt>  ps
   pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
 80395 80387 80395     0  S+      wdrain   0xffffffff8087cc80 initial thread
 80387 80386 80387     0  S+      ttyin    0xffffff0002cdf4a8 sh
 80386 79541 80386  1003  S+      wait     0xffffff01bc6f9470 su
 80199  5911    37     0  S       nanslp   0xffffffff80859308 sleep
 80192 79516 80192  1003  S+      ufs      0xffffff01b43f1620 ls
 80175 80155 80144     0  R+                                  mount
 80155 80148 80144     0  S+      wait     0xffffff01bc6f98e0 sh
 80149 80145 80144     0  S+      piperd   0xffffff00345e4888 sed
 80148 80145 80144     0  S+      wait     0xffffff01d3c97000 sh
 80145 80144 80144     0  S+      wait     0xffffff007a9c1470 sh
 80144 80123 80144     0  S+      wait     0xffffff0002e188e0 sh
 80123 80122 80123     0  S+      ttyin    0xffffff0002a600a8 sh
 80122 78340 80122  1003  S+      wait     0xffffff01b531d000 su
 79541 79540 79541  1003  Ss+     wait     0xffffff01b9bd2000 bash
 79540 79538 79538  1003  S       select   0xffffff0002c9c540 sshd
 79538  5898 79538     0  Ss      sbwait   0xffffff00345f5694 sshd
 79516 79515 79516  1003  Ss+     wait     0xffffff01bc6f9000 bash
 79515 79513 79513  1003  S       select   0xffffff01f7afe740 sshd
 79513  5898 79513     0  Ss      sbwait   0xffffff0002f613ec sshd
 78340 78339 78340  1003  Ss+     wait     0xffffff01c97d1000 bash
 78339 78337 78337  1003  S       select   0xffffff0002daf840 sshd
 78337  5898 78337     0  Ss      sbwait   0xffffff00344703ec sshd
 70905     1 69750     0  S       piperd   0xffffff0002f602d8 sh
 70904     1 69750     0  S       pause    0xffffff00345c7980 top
 49914     1 49914     0  Ss      nanslp   0xffffffff80859308 watchdogd
 49465     0     0     0  SL     ipmireq  0xffffff00346693b8 [ipmi0: kcs]
  6330     1  6330     0  Ss+     ttyin    0xffffff0002a6bca8 getty
  6329     1  6329     0  Ss+     ttyin    0xffffff0002a7c0a8 getty
  6328     1  6328     0  Ss+     ttyin    0xffffff0002a7bca8 getty
  6327     1  6327     0  Ss+     ttyin    0xffffff0002a748a8 getty
  6326     1  6326     0  Ss+     ttyin    0xffffff0002a74ca8 getty
  6325     1  6325     0  Ss+     ttyin    0xffffff0002a7a0a8 getty
  6324     1  6324     0  Ss+     ttyin    0xffffff0002a7a4a8 getty
  6323     1  6323     0  Ss+     ttyin    0xffffff0002a7a8a8 getty
  6287     1  6287     0  Ss      select   0xffffff003443f5c0 inetd
  6267     1  6267     0  Ss      select   0xffffff0002adac40 moused
  6108     1  6108     0  Ss      nanslp   0xffffffff80859308 cron
  6102     1  6102    25  Ss      pause    0xffffff0034329980 sendmail
  6098     1  6098     0  Ss      select   0xffffff0002daf340 sendmail
  5911     1    37     0  S+      wait     0xffffff0034026470 sh
  5903     1  5903     0  Ss      select   0xffffff0002c9ca40 sshd
  5898     1  5898     0  Ss      select   0xffffff0002c9cac0 sshd
  5882  5833  5833    88  S       (threaded)                  mysqld
 100191                   S       sigwait  0xffffff82454cea38 mysqld
 100190                   S       ucond    0xffffff0002dad200 mysqld
 100169                   S   select   0xffffff0002b39240 initial thread
  5833     1  5833    88  Ss      wait     0xffffff003432b8e0 sh
  5693     1  5693   181  Ss      select   0xffffff00343032c0 nrpe2
  5641     1  5641     0  Ss      select   0xffffff00343035c0 ntpd
  5480     0     0     0  SL      mdwait   0xffffff0002b63000 [md0]
  5435     1  5435    53  Ss      (threaded)                  named
 100158                   S       kqread   0xffffff0002c7f700 named
 100157                   S       ucond    0xffffff0002c28080 named
 100156                   S       ucond    0xffffff0002b3a480 named
 100155                   S       ucond    0xffffff0002b3a380 named
 100154                   S       ucond    0xffffff0002b3a400 named
 100153                   S       ucond    0xffffff0002c28000 named
 100145                   S       sigwait  0xffffff82453e8a38 named
  5344     1  5344     0  Ss      select   0xffffff0002de96c0 syslogd
  5308     0     0     0  SL      -       0xffffffff80855960 [accounting]
  5103     1  5103     0  Ss      select   0xffffff0002bb2e40 devd
  1522  1514  1514    64  R                                   pflogd
  1514     1  1514     0  Ss      sbwait   0xffffff003403b144 pflogd
   264     1   264     0  Ss      pause    0xffffff0002e17510 adjkerntz
    36     0     0     0  RL      CPU 0            [g_journal mirror/gm]
    35     0     0     0  SL      gj:work  0xffffff0002dde800 [g_journal
 mirror/gm]
    34     0     0     0  SL      gj:work  0xffffff0002d8dc00 [g_journal
 mirror/gm]
    33     0     0     0  SL      gj:work  0xffffff0002d8d600 [g_journal
 mirror/gm]
    32     0     0     0  SL      gj:work  0xffffff0002d8c600 [g_journal
 mirror/gm]
    31     0     0     0  SL      gj:work  0xffffff0002d8c800 [g_journal
 mirror/gm]
    30     0     0     0  SL      m:w1  0xffffff0002d6f200 [g_mirror gm2]
    29     0     0     0  SL      m:w1  0xffffff0002d6c000 [g_mirror gm1]
    28     0     0     0  SL      m:w1  0xffffff0002d6e200 [g_mirror gm0]
    27     0     0     0  RL                               [softdepflush]
    26     0     0     0  SL      vlruwt   0xffffff0002d388e0 [vnlru]
    25     0     0     0  RL      CPU 1                       [syncer]
    24     0     0     0  SL      psleep   0xffffffff8087cc48 [bufdaemon]
    23     0     0     0  SL      pgzero   0xffffffff8088af4c [pagezero]
    22     0     0     0  SL      pollid   0xffffffff80858108 [idlepoll]
    21     0     0     0  SL      psleep   0xffffffff8088a2e8 [vmdaemon]
    20     0     0     0  SL      psleep  0xffffffff8088a2ac [pagedaemon]
    19     0     0     0  SL      ccb_scan 0xffffffff808244e0 [xpt_thrd]
    18     0     0     0  SL      suspwt   0xffffff0002df5c34 [g_journal
 switcher]
    17     0     0     0  SL      pftm     0xffffffff801cf420 [pfpurge]
    16     0     0     0  SL      (threaded)                  usb
 100063                   D       -        0xffffff8000429e18 [usbus3]
 100062                   D       -        0xffffff8000429dc0 [usbus3]
 100061                   D       -        0xffffff8000429d68 [usbus3]
 100060                   D       -        0xffffff8000429d10 [usbus3]
 100058                   D       -        0xffffff8000420ef0 [usbus2]
 100057                   D       -        0xffffff8000420e98 [usbus2]
 100056                   D       -        0xffffff8000420e40 [usbus2]
 100055                   D       -        0xffffff8000420de8 [usbus2]
 100053                   D       -        0xffffff8000417ef0 [usbus1]
 100052                   D       -        0xffffff8000417e98 [usbus1]
 100051                   D       -        0xffffff8000417e40 [usbus1]
 100050                   D       -        0xffffff8000417de8 [usbus1]
 100048                   D       -        0xffffff800040eef0 [usbus0]
 100047                   D       -        0xffffff800040ee98 [usbus0]
 100046                   D       -        0xffffff800040ee40 [usbus0]
 100045                   D       -        0xffffff800040ede8 [usbus0]
    15     0     0     0  SL      idle 0xffffff80003f3000 [mpt_recovery3]
     9     0     0     0  SL      idle 0xffffff80003de000 [mpt_recovery2]
     8     0     0     0  SL      idle 0xffffff80003b7300 [mpt_raid1]
     7     0     0     0  SL      idle 0xffffff80003b7000 [mpt_recovery1]
     6     0     0     0  SL      idle 0xffffff80003a2300 [mpt_raid0]
     5     0     0     0  SL      idle 0xffffff80003a2000 [mpt_recovery0]
    14     0     0     0  SL      -        0xffffffff80859004 [yarrow]
     4     0     0     0  RL                                  [g_down]
     3     0     0     0  SL      -        0xffffffff80854cc0 [g_up]
     2     0     0     0  SL      -        0xffffffff80854cb0 [g_event]
    13     0     0     0  SL      (threaded)                  ng_queue
 100016                   D       sleep    0xffffffff807ecdf0 [ng_queue3]
 100015                   D       sleep    0xffffffff807ecdf0 [ng_queue2]
 100014                   D       sleep    0xffffffff807ecdf0 [ng_queue1]
 100013                   D       sleep    0xffffffff807ecdf0 [ng_queue0]
    12     0     0     0  RL      (threaded)                  intr
 100069                   I                            [swi0: uart uart]
 100068                   I                            [irq12: psm0]
 100067                   I                            [irq1: atkbd0]
 100066                   I                            [irq17: ichsmb0]
 100065                   I                            [irq15: ata1]
 100064                   I                            [irq14: ata0]
 100059                   I                            [irq23: ehci0]
 100054                   I                            [irq18: uhci2]
 100049                   I                            [irq19: uhci1]
 100044                   I                            [irq16: uhci0]
 100042                   I                            [irq257: mpt3]
 100040                   I                            [irq256: mpt2]
 100035                   I                            [irq25: mpt1]
 100032                   I                            [irq26: mpt0]
 100031                   I                            [irq9: acpi0]
 100029                   I                            [swi5: +]
 100028                   I                            [swi2: cambio]
 100022                   I                            [swi6: task queue]
 100021                   I                           [swi6: Giant taskq]
 100012                   I                            [swi3: vm]
 100011                   I                            [swi4: clock]
 100010                   I                            [swi4: clock]
 100009                   I                            [swi4: clock]
 100008                   I                            [swi4: clock]
 100007                   Run     CPU 3                [swi1: netisr 0]
    11     0     0     0  RL      (threaded)                idle
 100006                   CanRun                            [idle: cpu0]
 100005                   CanRun                            [idle: cpu1]
 100004                   Run     CPU 2                     [idle: cpu2]
 100003                   CanRun                            [idle: cpu3]
     1     0     1     0  SLs     wait     0xffffff000234a8e0 [init]
    10     0     0     0  SL      audit_wo 0xffffffff808888d0 [audit]
     0     0     0     0  SLs     (threaded)                  kernel
 100039                   D       -        0xffffff000250e480 [em1 taskq]
 100038                   D       -        0xffffff000250c700 [em0 taskq]
 100030                   D       -   0xffffff000242c500 [thread taskq]
 100027                   D       -   0xffffff00023f7c00 [acpi_task_2]
 100026                   D       -   0xffffff00023f7c00 [acpi_task_1]
 100025                   D       -   0xffffff00023f7c00 [acpi_task_0]
 100024                   D       -   0xffffff00023f7c80 [ffs_trim taskq]
 100023                   D       -   0xffffff000242c180 [kqueue taskq]
 100000                   D       sched    0xffffffff80854e00 [swapper]
 
 db:1:ps>  show thread
 Thread 100089 at 0xffffff0002ded000:
  proc (pid 36): 0xffffff0002ddb470
  name: g_journal mirror/gm
  stack: 0xffffff82452cd000-0xffffff82452d0fff
  flags: 0x10004  pflags: 0x210000
  state: RUNNING (CPU 0)
  priority: 76
  container lock: sched lock 0 (0xffffffff8085fcc0)
 db:1:thread>  alltrace (!! NOT ALL GIVEN)
 Tracing command iozone pid 80395 tid 100387 td 0xffffff0042fe1460
 sched_switch() at sched_switch+0xde
 mi_switch() at mi_switch+0x176
 sleepq_wait() at sleepq_wait+0x42
 _sleep() at _sleep+0x317
 waitrunningbufspace() at waitrunningbufspace+0x4d
 bufwrite() at bufwrite+0x15b
 ffs_syncvnode() at ffs_syncvnode+0x201
 ffs_fsync() at ffs_fsync+0x3f
 fsync() at fsync+0x155
 amd64_syscall() at amd64_syscall+0x1f4
 Xfast_syscall() at Xfast_syscall+0xfc
 --- syscall (95, FreeBSD ELF64, fsync), rip = 0x180885f9c, rsp =
 0x7fffffffe7e8, rbp = 0x180a20000 ---
 
 Tracing command ls pid 80192 tid 100390 td 0xffffff01b00e6460
 sched_switch() at sched_switch+0xde
 mi_switch() at mi_switch+0x176
 sleepq_wait() at sleepq_wait+0x42
 __lockmgr_args() at __lockmgr_args+0x83c
 ffs_lock() at ffs_lock+0x8c
 VOP_LOCK1_APV() at VOP_LOCK1_APV+0x46
 _vn_lock() at _vn_lock+0x47
 vget() at vget+0x70
 vfs_hash_get() at vfs_hash_get+0xea
 ffs_vgetf() at ffs_vgetf+0x48
 ufs_lookup_ino() at ufs_lookup_ino+0xccb
 vfs_cache_lookup() at vfs_cache_lookup+0xf0
 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
 lookup() at lookup+0x454
 namei() at namei+0x53a
 kern_statat_vnhook() at kern_statat_vnhook+0x8f
 kern_statat() at kern_statat+0x15
 lstat() at lstat+0x2a
 amd64_syscall() at amd64_syscall+0x1f4
 Xfast_syscall() at Xfast_syscall+0xfc
 --- syscall (190, FreeBSD ELF64, lstat), rip = 0x18098ff1c, rsp =
 0x7fffffffe368, rbp = 0x180c276d0 ---
 
 Tracing command mount pid 80175 tid 100254 td 0xffffff00348aa8c0
 sched_switch() at sched_switch+0xde
 mi_switch() at mi_switch+0x176
 sleepq_wait() at sleepq_wait+0x42
 _sleep() at _sleep+0x317
 bwait() at bwait+0x64
 bufwait() at bufwait+0x56
 bread() at bread+0x1e
 ffs_balloc_ufs2() at ffs_balloc_ufs2+0xbd2
 cgaccount() at cgaccount+0x357
 ffs_snapshot() at ffs_snapshot+0xb33
 ffs_mount() at ffs_mount+0x652
 vfs_donmount() at vfs_donmount+0xdc5
 nmount() at nmount+0x63
 amd64_syscall() at amd64_syscall+0x1f4
 Xfast_syscall() at Xfast_syscall+0xfc
 --- syscall (378, FreeBSD ELF64, nmount), rip = 0x1807b065c, rsp =
 0x7fffffffdaa8, rbp = 0x180a080d8 ---
 
 Tracing command g_journal mirror/gm pid 36 tid 100089 td 0xffffff0002ded000
 kdb_enter() at kdb_enter+0x3b
 panic() at panic+0x180
 g_journal_flush() at g_journal_flush+0x8c8
 g_journal_worker() at g_journal_worker+0x14bb
 fork_exit() at fork_exit+0x11f
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff82452d0d00, rbp = 0 ---
 
 Tracing command syncer pid 25 tid 100078 td 0xffffff0002d35460
 cpustop_handler() at cpustop_handler+0x3a
 ipi_nmi_handler() at ipi_nmi_handler+0x30
 trap() at trap+0x155
 nmi_calltrap() at nmi_calltrap+0x8
 --- trap 0x13, rip = 0xffffffff803d6736, rsp = 0xffffff800001efe0, rbp =
 0xffffff8242f96930 ---
 panic() at panic+0x96
 ffs_bufwrite() at ffs_bufwrite+0x2bc
 ffs_sbupdate() at ffs_sbupdate+0x98
 ffs_sync() at ffs_sync+0x623
 sync_fsync() at sync_fsync+0x16a
 sync_vnode() at sync_vnode+0x15e
 sched_sync() at sched_sync+0x1c9
 fork_exit() at fork_exit+0x11f
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff8242f96d00, rbp = 0 ---
 
 Tracing command g_journal switcher pid 18 tid 100071 td 0xffffff0002a00460
 sched_switch() at sched_switch+0xde
 mi_switch() at mi_switch+0x176
 sleepq_wait() at sleepq_wait+0x42
 _sleep() at _sleep+0x317
 vfs_write_suspend() at vfs_write_suspend+0x100
 g_journal_switcher() at g_journal_switcher+0xd2b
 fork_exit() at fork_exit+0x11f
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff8242f73d00, rbp = 0 ---
 
 Tracing command intr pid 12 tid 100007 td 0xffffff0002359000
 cpustop_handler() at cpustop_handler+0x3a
 ipi_nmi_handler() at ipi_nmi_handler+0x30
 trap() at trap+0x155
 nmi_calltrap() at nmi_calltrap+0x8
 --- trap 0x13, rip = 0xffffffff803af7d5, rsp = 0xffffff800002cfe0, rbp =
 0xffffff8000060bf0 ---
 ithread_loop() at ithread_loop+0x65
 fork_exit() at fork_exit+0x11f
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xffffff8000060d00, rbp = 0 ---
 
 db:1:alltrace>  capture off
 db:0:kdb.enter.panic>  call doadump
 Dumping 2691 out of 8181 MB:..1%..11%..21%..31%
 ..41%..51%..61%..71%..81%..91%
 Dump complete
 = 0
 db:0:kdb.enter.panic>  reset
 
 The snapshot mount sits in bwait(), iozone sits in waitrunningbufspace();
 maybe we must adapt some sysctls? From the dump I have
 
 (kgdb) p hirunningspace
 $9 = 1048576
 (kgdb) p lorunningspace
 $10 = 524288
 (kgdb) p runningbufspace
 $11 = 5687296
 
 I need help to track this problem down further and can provide any output
 from the written vmcore.
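 
 The question of adapting sysctls maps to the vfs.*runningspace knobs;
 as an untested suggestion (the values here are only an example), they
 can be inspected and raised at runtime:
 
 ```shell
 # Inspect the limits that gate waitrunningbufspace():
 sysctl vfs.hirunningspace vfs.lorunningspace
 # Example values only; more headroom before writers block:
 sysctl vfs.hirunningspace=16777216
 sysctl vfs.lorunningspace=8388608
 ```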
 
 -- 
 Dr. Andreas Longwitz
 
 Data Service GmbH
 Beethovenstr. 2A
 23617 Stockelsdorf
 Amtsgericht Lübeck, HRB 318 BS
 Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau
 

From: Andreas Longwitz <longwitz@incore.de>
To: bug-followup@freebsd.org
Cc:  
Subject: Re: kern/164252: [geom] gjournal overflow
Date: Wed, 09 May 2012 18:12:51 +0200

 The panic "gjournal overflow" is caused by design problems of snapshot
 and/or gjournal on big disks (1 TB or more). Each journaled partition is
 served by a g_journal kernel thread, and there is one kernel thread, the
 g_journal switcher, responsible for all journaled partitions. Aside from
 mount/umount, the g_journal switcher and the ffs snapshot code are the
 only kernel paths using vfs_write_suspend(), which holds the "suspwt"
 lock. After starting a snapshot on a big disk with
        mksnap_ffs /backup/.snap/snapshot
 the "suspwt" lock is acquired and held for a long time. Some seconds
 later the g_journal switcher tries to switch the journal of the backup
 partition and blocks on the "suspwt" lock. It therefore cannot handle
 the other journaled partitions anymore; it must wait until the snapshot
 releases the "suspwt" lock.
 
 On my test server (FreeBSD 8.3) /backup is mounted on /dev/mirror/gm2p2
 (1.8 TB), and kern.geom.journal.debug=1 gives
 
 May  9 09:59:47 : Data has been copied.
 May  9 09:59:47 : Msync time of /backup: 0.031111s
 May  9 09:59:47 : Sync time of /backup: 0.086015s
 May  9 09:59:47 : Cache flush time: 0.000705s
 May  9 09:59:47 : BIO_FLUSH time of mirror/gm2p2: 0.015049s
 May  9 10:04:12 : Journal mirror/gm2p2 71% full, forcing journal switch.
 May  9 10:17:48 : Suspend time of /backup: 1080.955120s
 May  9 10:17:48 : Starting copy of journal.
 May  9 10:17:48 : Cache flush time: 0.013182s
 May  9 10:17:48 : Cache flush time: 0.027241s
 May  9 10:17:48 : Switch time of mirror/gm2p2: 0.206213s
 May  9 10:17:48 : Entire switch time: 1081.589788s
 
 The critical "Suspend time" was 18 minutes with no I/O on any other
 partition. The same test with some I/O on any other partition triggers
 the panic immediately, because my journal is not big enough to hold
 18 minutes' worth of I/O.
 The same problem occurs on removing the snapshot: the g_journal switcher
 waits for 10 minutes on the "ufs" lock.
 The same test on a 1 TB disk drops the "Suspend time" to 190 seconds for
 the snapshot and to 12 seconds for the removal.
 
 My conclusion is that a snapshot (as used by dump -L) on a journaled
 partition is not safe when the "Suspend time" for the biggest journaled
 partition is more than about 20 seconds.
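 
 As a back-of-the-envelope check of this conclusion (only the ~2 GB
 journal size comes from the metadata dump above; the write rate is an
 assumed number), the journal has to absorb roughly suspend time times
 write rate of data before it overflows:
 
 ```shell
 # With the ~2 GB journal (end: 2147483136) and an assumed 20 MB/s of
 # write traffic on other partitions, the journal fills in:
 journal_bytes=2147483136
 write_rate=$((20 * 1024 * 1024))      # 20 MB/s, assumed
 echo $((journal_bytes / write_rate))  # seconds until overflow: 102
 ```
 
 About 102 seconds is far less than the observed 18-minute suspend time,
 which matches the immediate panic once other I/O is added.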
 
 -- 
 Dr. Andreas Longwitz
 
 Data Service GmbH
 Beethovenstr. 2A
 23617 Stockelsdorf
 Amtsgericht Lübeck, HRB 318 BS
 Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau
 
>Unformatted:
