From Tor.Egge@idi.ntnu.no  Sun May 25 21:07:18 1997
Received: from pat.idt.unit.no (0@pat.idt.unit.no [129.241.103.5])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA25936
          for <FreeBSD-gnats-submit@freebsd.org>; Sun, 25 May 1997 21:07:13 -0700 (PDT)
Received: from ikke.idi.ntnu.no (tegge@ikke.idi.ntnu.no [129.241.111.65])
	by pat.idt.unit.no (8.8.5/8.8.5) with ESMTP id GAA29993
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 26 May 1997 06:07:09 +0200 (MET DST)
Received: (from tegge@localhost)
	by ikke.idi.ntnu.no (8.8.5/8.8.5) id GAA01188;
	Mon, 26 May 1997 06:07:09 +0200 (MET DST)
Message-Id: <199705260407.GAA01188@ikke.idi.ntnu.no>
Date: Mon, 26 May 1997 06:07:09 +0200 (MET DST)
From: Tor Egge <Tor.Egge@idi.ntnu.no>
Reply-To: Tor.Egge@idi.ntnu.no
To: FreeBSD-gnats-submit@freebsd.org
Subject: fsck -p gets transient unexpected inconsistensies
X-Send-Pr-Version: 3.2

>Number:         3688
>Category:       kern
>Synopsis:       fsck -p gets transient unexpected inconsistencies
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 25 21:10:01 PDT 1997
>Closed-Date:    Thu Sep 18 04:18:21 PDT 1997
>Last-Modified:  Thu Sep 18 04:18:46 PDT 1997
>Originator:     Tor Egge
>Release:        FreeBSD 3.0-CURRENT i386
>Organization:
Norwegian University of Science and Technology, Trondheim, Norway
>Environment:

FreeBSD 3.0-CURRENT

ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 19 on pci0:9:0
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
	
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 16 on pci0:12:0
ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs

sd0:  scbus0 target 0 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
cd1:  scbus0 target 1 lun 0: <MATSHITA CD-ROM CR-506 8S05> type 5 removable SCSI 2
sd2:  scbus0 target 2 lun 0: <SEAGATE ST15150N 0905> type 0 fixed SCSI 2
sd3:  scbus0 target 3 lun 0: <Quantum XP34300 L915> type 0 fixed SCSI 2
sd6:  scbus1 target 2 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd7:  scbus1 target 3 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd8:  scbus1 target 4 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd9:  scbus1 target 5 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd10: scbus1 target 6 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd11: scbus1 target 8 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd12: scbus1 target 9 lun 0: <QUANTUM XP34550W LXY4> type 0 fixed SCSI 2
sd13: scbus1 target 10 lun 0: <QUANTUM XP34550W LXY1> type 0 fixed SCSI 2

/etc/ccd.conf:
ccd0    64      0       /dev/sd6d /dev/sd7d
ccd1    64      0       /dev/sd8a /dev/sd9a /dev/sd10a
ccd2    64      0       /dev/sd11a /dev/sd12a /dev/sd13a
ccd3    64      0       /dev/sd2a /dev/sd3a

sd0:
  a:   176715        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 10)
  b:  1413720   176715      swap                        # (Cyl.   11 - 98)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)
  d:   160650  1590435    4.2BSD      512  4096    16   # (Cyl.   99 - 108)
  e:   321300  1751085    4.2BSD      512  4096    16   # (Cyl.  109 - 128)
  g:  6811560  2072385    4.2BSD     1024  8192    16   # (Cyl.  129 - 552)

sd2:
  a:  8385930        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 521)
  c:  8385930        0    unused        0     0         # (Cyl.    0 - 521)

sd3:
  a:  8385930        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 521)
  c:  8385930        0    unused        0     0         # (Cyl.    0 - 521)

sd6 and sd7:
  a:   128520        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 7)
  b:   706860   128520      swap                        # (Cyl.    8 - 51)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)
  d:  8048565   835380    4.2BSD     1024  8192    16   # (Cyl.   52 - 552)

sd8, sd9, sd10, sd11, sd12, sd13:
  a:  8883945        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 552)
  c:  8883945        0    unused        0     0         # (Cyl.    0 - 552)

ccd0:
  c: 16097024        0    4.2BSD        0     0     0   # (Cyl.    0 - 7859*)

ccd1 and ccd2:
  c: 26651712        0    4.2BSD        0     0     0   # (Cyl.    0 - 13013*)

ccd3:
  c: 16771712        0    4.2BSD        0     0     0   # (Cyl.    0 - 8189*)

/etc/fstab:
/dev/sd0b                       none            swap    sw 0 0
/dev/sd6b                       none            swap    sw 0 0
/dev/sd7b                       none            swap    sw 0 0
/dev/sd0a                       /               ufs     rw 1 1
/dev/sd0e                       /store          ufs     rw 1 2
/dev/sd0g                       /usr            ufs     rw 1 2
/dev/sd0d                       /var            ufs     rw 1 2
proc                            /proc           procfs  rw 0 0
/dev/sd0b                       /tmp            mfs     rw,-s=240000 0 0
/dev/sd6a                       /resroot1       ufs     rw 1 2
/dev/sd7a                       /resroot2       ufs     rw 1 2
/dev/ccd0c                      /export/ftpsearch1 ufs  rw 1 2
/dev/ccd1c                      /export/ftpsearch2 ufs  rw 1 2
/dev/ccd2c                      /export/ftpsearch3 ufs  rw 1 2
/dev/ccd3c                      /mirror            ufs  rw 1 2 

>Description:

When recovering from a system crash, `fsck -p' in /etc/rc complained about
unexpected inconsistencies on 3 different filesystems. When running fsck
manually on each of these filesystems, only the clean flag needed to be set in
the superblock. The three filesystems were all located on ccd devices.

When recovering from the next system crash, `fsck -p' in /etc/rc complained
that the values in the superblock did not agree with those in the first
alternate, but when running fsck manually, only the clean flag needed to be
set in the superblock. This was on a small partition (/resroot2) where no
write operations had been performed since the last boot.

When investigating the probable cause (on a different 3.0-CURRENT machine), I
found that simultaneously opening several partitions on a disk (where no
partitions were open before the attempt) caused inconsistent behaviour,
sometimes with a kernel crash.

sdopen on the different partitions ends up calling dsopen. Until the first
dsopen on the device has completed, each new call to dsopen repeats the same
read of the disklabel from the device. After the first call to dsopen returns,
the other calls may still be modifying the disk label and slice maps, which
can cause reads and writes on the partition opened by the first call to access
the wrong places on the disk.

Writes to freed kernel memory might also occur. This probably triggered
the kernel crash during the investigation.

This bug does not explain my problems with fsck, which must have been
caused by a different bug.

>How-To-Repeat:

Configure a disk (sd1) with several file systems:
  a:   204800        0    4.2BSD     1024  8192    16   # (Cyl.    0 - 99)
  c:  3450880        0    unused        0     0         # (Cyl.    0 - 1684)
  d:   204800   204800    4.2BSD     1024  8192    16   # (Cyl.  100 - 199)
  e:   204800   409600    4.2BSD     1024  8192    16   # (Cyl.  200 - 299)
  f:   204800   614400    4.2BSD     1024  8192    16   # (Cyl.  300 - 399)
  g:   204800   819200    4.2BSD     1024  8192    16   # (Cyl.  400 - 499)
  h:  2426880  1024000    4.2BSD     1024  8192    16   # (Cyl.  500 - 1684)

No partitions from the disk should be mounted when performing the parallel
open of the raw devices:

#!/bin/sh
fsck -n /dev/rsd1a &
fsck -n /dev/rsd1d &
fsck -n /dev/rsd1e &
fsck -n /dev/rsd1f &
fsck -n /dev/rsd1g &
fsck -n /dev/rsd1h &

Run this script several times. You should get some error messages 
similar to the following:

----
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s4: cannot find label (no disk label)
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s4: cannot find label (no disk label)
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1: raw partition size != slice size
sd1: start 0, end 3450901, size 3450902
sd1c: start 0, end 3450879, size 3450880
sd1s1: raw partition size != slice size
sd1s1: start 0, end 3450901, size 3450902
sd1s1c: start 0, end 3450879, size 3450880
sd1: ILLEGAL REQUEST asc:21,0 Logical block address out of range

Fatal trap 12: page fault while in kernel mode
cpunumber = 1
fault virtual address	= 0x8
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xe0150db7
stack pointer	        = 0x10:0xe94afde0
frame pointer	        = 0x10:0xe94afe00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 6 (cpuidle1)
interrupt mask		= 
---

>Fix:
	
Protect the critical parts of dsopen (and other routines?) with a lock?
>Release-Note:
>Audit-Trail:

From: Tor Egge <Tor.Egge@idi.ntnu.no>
To: Tor.Egge@idi.ntnu.no
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/3688: fsck -p gets transient unexpected inconsistensies
Date: Sun, 22 Jun 1997 17:56:39 +0200

 I wrote:
 
 > When recovering from a system crash, `fsck -p' in /etc/rc complained about
 > unexpected inconsistencies on 3 different filesystems. When running fsck
 > manually on each of these filesystems, only the clean flag needed to be set in
 > the superblock. The three filesystems were all located on ccd devices.
 
 This is caused by a superblock for a different filesystem being present
 in memory when comparing the superblock with the alternate superblock. :-(
 
 Under some circumstances, copy-on-write handling is broken.
 
 How-To-Repeat:
 
 Compile the appended program, using the `-static' option.
 
 Run this program on a machine with a recent 3.0-current SMP kernel and
 only one CPU enabled (sysctl -w kern.smp_active=1).
 ----------
 #include <sys/types.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <stdlib.h>
 #include <sys/errno.h>
 #include <errno.h>
 #include <assert.h>
 #include <fcntl.h>
 #include <fstab.h>
 #include <string.h>
 
 #ifndef NUMCHILD
 #define NUMCHILD 3
 #endif
 
 struct child
 {
   pid_t pid;
   int fd;
 } children[NUMCHILD];
 
 char okbuf[1];
 char childbuf[1];
 
 void touchbuf(char *buf);
 
 int main(int argc,char **argv)
 {
   int i;
   pid_t pid;
   int pipefd[2];
   char *buf;
   int exitcode;
 
   exitcode = 0;
   buf = malloc(8*1024);
   for (i=0;i<NUMCHILD;i++) {
     if (pipe(pipefd)) {
       perror("pipe");
       exit(1);
     }
     fflush(stdout);
     fflush(stderr);
     pid = fork();
     if (pid<0) {
       perror("fork");
       exit(1);
     }
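     if (pid == 0) {
       /* child: relies on pipes being bidirectional on this system */
       close(pipefd[0]);
       /* read-only touch, so the page is mapped but still shared */
       touchbuf(buf);
       /* the kernel now stores into buf; this must force a private copy */
       if (read(pipefd[1],buf,1)!=1) {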
 	perror("child read");
 	exit(1);
       }
       memcpy(okbuf,buf,1);
       sleep(1);
       if (write(pipefd[1],buf,1)!=1) {
 	perror("child write");
 	exit(1);
       }
       if (memcmp(okbuf,buf,1)) 
 	printf("Child %d (pid %d) broken\n",i,getpid());
       exit(0);
     }
     /* parent */
     close(pipefd[1]);
     children[i].fd = pipefd[0];
     children[i].pid = pid;
   }
   for (i=0;i<NUMCHILD;i++) {
     memset(childbuf,i+1,sizeof(childbuf));
     if (write(children[i].fd,childbuf,1)!=1) {
       perror("parent write");
       exit(1);
     }
   }
   sleep(3);
   for (i=0;i<NUMCHILD;i++) {
     memset(okbuf,i+1,sizeof(okbuf));
     printf("Verifying child %d (pid %d)\n",i,
 	children[i].pid);
     if (read(children[i].fd,childbuf,1)!=1) {
       perror("parent piperead");
     }
     if (memcmp(childbuf,okbuf,1)) {
       printf("BAD ");
       exitcode=1;
     } else
       printf("GOOD");
     printf(" (got %d, expected %d)\n",childbuf[0],okbuf[0]);
   }
   exit(exitcode);
 }
 
 char ___xxx;
 void touchbuf(char *buf)
 {
   ___xxx = * buf;
 }
 
 ----------
 
 The result might be something like:
 
 ------
 ikke:/amd/kamelia/home/kamelia/a/tegge$ ./bad2
 Child 1 (pid 553) broken
 Child 0 (pid 552) broken
 Verifying child 0 (pid 552)
 BAD  (got 3, expected 1)
 Verifying child 1 (pid 553)
 BAD  (got 3, expected 2)
 Verifying child 2 (pid 554)
 GOOD (got 3, expected 3)
 ------
 
 When using gdb on the kernel and looking at the page tables, the
 virtual page pointed to by the `buf' variable is mapped read-only and
 backed by the same physical page in all three child processes.
 
 - Tor Egge

From: Tor Egge <Tor.Egge@idi.ntnu.no>
To: Tor.Egge@idi.ntnu.no
Cc: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: Re: kern/3688: fsck -p gets transient unexpected inconsistensies
Date: Mon, 23 Jun 1997 02:05:59 +0200

 > 
 > I wrote:
 > 
 > > When recovering from a system crash, `fsck -p' in /etc/rc complained about
 > > unexpected inconsistencies on 3 different filesystems. When running fsck
 > > manually on each of these filesystems, only the clean flag needed to be set in
 > > the superblock. The three filesystems were all located on ccd devices.
 > 
 > This is caused by a superblock for a different filesystem being present
 > in memory when comparing the superblock with the alternate superblock. :-(
 
 Suggested fix:
 
 Index: mp_machdep.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
 retrieving revision 1.17
 diff -c -r1.17 mp_machdep.c
 *** mp_machdep.c	1997/06/02 10:44:08	1.17
 --- mp_machdep.c	1997/06/22 23:50:10
 ***************
 *** 45,50 ****
 --- 45,51 ----
   #include <machine/cpufunc.h>
   #include <machine/segments.h>
   #include <machine/smptests.h>	/** TEST_DEFAULT_CONFIG, LATE_START */
 + #include <machine/specialreg.h>
   
   #include <i386/i386/cons.h>	/* cngetc() */
   
 ***************
 *** 429,434 ****
 --- 430,442 ----
   
   	/* start each Application Processor */
   	start_all_aps(boot_addr);
 + 
 + 	/* 
 + 	 * The init process might be started on a different CPU now,
 + 	 * and the boot CPU might not call prepare_usermode to get
 + 	 * cr0 correctly configured. Thus we initialize cr0 here.
 + 	 */
 + 	load_cr0(rcr0() | CR0_WP | CR0_AM);
   }
   
   
State-Changed-From-To: open->closed 
State-Changed-By: phk 
State-Changed-When: Thu Sep 18 04:18:21 PDT 1997 
State-Changed-Why:  

The suggested patch was committed some time ago. 
>Unformatted:
