From blackye@bag.ieo-research.it  Sun Oct  5 19:36:52 2003
Return-Path: <blackye@bag.ieo-research.it>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 138D016A4B3
	for <FreeBSD-gnats-submit@freebsd.org>; Sun,  5 Oct 2003 19:36:52 -0700 (PDT)
Received: from bag.ieo-research.it (bag.ieo-research.it [213.92.108.146])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6313743F85
	for <FreeBSD-gnats-submit@freebsd.org>; Sun,  5 Oct 2003 19:36:50 -0700 (PDT)
	(envelope-from blackye@bag.ieo-research.it)
Received: from bag.ieo-research.it (localhost [127.0.0.1])
	by bag.ieo-research.it (8.12.9/8.12.9) with ESMTP id h962YLTu033669
	for <FreeBSD-gnats-submit@freebsd.org>; Mon, 6 Oct 2003 04:34:21 +0200 (CEST)
	(envelope-from blackye@bag.ieo-research.it)
Received: (from blackye@localhost)
	by bag.ieo-research.it (8.12.9/8.12.9/Submit) id h962YK9Y033668;
	Mon, 6 Oct 2003 04:34:20 +0200 (CEST)
Message-Id: <200310060234.h962YK9Y033668@bag.ieo-research.it>
Date: Mon, 6 Oct 2003 04:34:20 +0200 (CEST)
From: Andrea Cocito <blackye@break.net>
Reply-To: The Black Hacker <blackye@break.net>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Boot failing for ALi chipsets, patch attached
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         57631
>Category:       kern
>Synopsis:       [agp] [patch] boot failing for ALi chipsets
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    jhb
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Oct 05 19:40:25 PDT 2003
>Closed-Date:    
>Last-Modified:  Tue Oct 25 03:26:58 GMT 2005
>Originator:     The Black Hacker
>Release:        FreeBSD 5.1-RELEASE i386
>Organization:
>Environment:
System: FreeBSD bag.ieo-research.it 5.1-RELEASE FreeBSD 5.1-RELEASE #0: Mon Jul 21 16:46:37 CEST 2003 root@bag.ieo-research.it:/usr/src/sys/i386/compile/BAG i386
>Description:
Booting fails on several chipsets, including ALi and others used in
laptops and industrial systems. The AGP bus reports an aperture size
of zero, which is probably wrong in itself, but the code in
src/sys/pci/agp_*.c that is supposed to handle this case is badly broken.
>How-To-Repeat:
Try to boot any 5.* version on one of the affected machines.
>Fix:
Patch follows.
Index: sys/pci/agp_ali.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/agp_ali.c,v
retrieving revision 1.8
diff -u -r1.8 agp_ali.c
--- sys/pci/agp_ali.c	22 Aug 2003 07:13:20 -0000	1.8
+++ sys/pci/agp_ali.c	5 Oct 2003 18:19:02 -0000
@@ -102,21 +102,20 @@
 		return error;
 
 	sc->initial_aperture = AGP_GET_APERTURE(dev);
+	gatt = NULL;
 
-	for (;;) {
+	while (AGP_GET_APERTURE(dev) != 0) {
 		gatt = agp_alloc_gatt(dev);
-		if (gatt)
+		if (gatt != NULL)
 			break;
-
-		/*
-		 * Probably contigmalloc failure. Try reducing the
-		 * aperture so that the gatt size reduces.
-		 */
-		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
-			agp_generic_detach(dev);
-			return ENOMEM;
-		}
+		AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
+	}
+		
+	if (gatt == NULL) {
+		agp_generic_detach(dev);
+		return ENOMEM;
 	}
+
 	sc->gatt = gatt;
 
 	/* Install the gatt. */
Index: sys/pci/agp_amd.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/agp_amd.c,v
retrieving revision 1.16
diff -u -r1.16 agp_amd.c
--- sys/pci/agp_amd.c	22 Aug 2003 07:13:20 -0000	1.16
+++ sys/pci/agp_amd.c	5 Oct 2003 18:19:02 -0000
@@ -240,19 +240,20 @@
 	sc->bsh = rman_get_bushandle(sc->regs);
 
 	sc->initial_aperture = AGP_GET_APERTURE(dev);
+	gatt = NULL;
 
-	for (;;) {
+	while (AGP_GET_APERTURE(dev) != 0) {
 		gatt = agp_amd_alloc_gatt(dev);
-		if (gatt)
+		if (gatt != NULL)
 			break;
+		AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
+	}
 
-		/*
-		 * Probably contigmalloc failure. Try reducing the
-		 * aperture so that the gatt size reduces.
-		 */
-		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2))
-			return ENOMEM;
+	if (gatt == NULL) {
+		agp_generic_detach(dev);
+		return ENOMEM;
 	}
+
 	sc->gatt = gatt;
 
 	/* Install the gatt. */
Index: sys/pci/agp_intel.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/agp_intel.c,v
retrieving revision 1.19
diff -u -r1.19 agp_intel.c
--- sys/pci/agp_intel.c	17 Sep 2003 02:58:17 -0000	1.19
+++ sys/pci/agp_intel.c	5 Oct 2003 18:19:02 -0000
@@ -154,21 +154,20 @@
 	    MAX_APSIZE;
 	pci_write_config(dev, AGP_INTEL_APSIZE, value, 1);
 	sc->initial_aperture = AGP_GET_APERTURE(dev);
+	gatt = NULL;
 
-	for (;;) {
+	while (AGP_GET_APERTURE(dev) != 0) {
 		gatt = agp_alloc_gatt(dev);
-		if (gatt)
+		if (gatt != NULL)
 			break;
-
-		/*
-		 * Probably contigmalloc failure. Try reducing the
-		 * aperture so that the gatt size reduces.
-		 */
-		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
-			agp_generic_detach(dev);
-			return ENOMEM;
-		}
+		AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
+	}
+		
+	if (gatt == NULL) {
+		agp_generic_detach(dev);
+		return ENOMEM;
 	}
+
 	sc->gatt = gatt;
 
 	/* Install the gatt. */
Index: sys/pci/agp_sis.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/agp_sis.c,v
retrieving revision 1.9
diff -u -r1.9 agp_sis.c
--- sys/pci/agp_sis.c	22 Aug 2003 07:13:20 -0000	1.9
+++ sys/pci/agp_sis.c	5 Oct 2003 18:19:02 -0000
@@ -104,21 +104,20 @@
 		return error;
 
 	sc->initial_aperture = AGP_GET_APERTURE(dev);
+	gatt = NULL;
 
-	for (;;) {
+	while (AGP_GET_APERTURE(dev) != 0) {
 		gatt = agp_alloc_gatt(dev);
-		if (gatt)
+		if (gatt != NULL)
 			break;
-
-		/*
-		 * Probably contigmalloc failure. Try reducing the
-		 * aperture so that the gatt size reduces.
-		 */
-		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
-			agp_generic_detach(dev);
-			return ENOMEM;
-		}
+		AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
+	}
+		
+	if (gatt == NULL) {
+		agp_generic_detach(dev);
+		return ENOMEM;
 	}
+
 	sc->gatt = gatt;
 
 	/* Install the gatt. */
Index: sys/pci/agp_via.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/agp_via.c,v
retrieving revision 1.11
diff -u -r1.11 agp_via.c
--- sys/pci/agp_via.c	22 Aug 2003 07:13:20 -0000	1.11
+++ sys/pci/agp_via.c	5 Oct 2003 18:19:03 -0000
@@ -112,21 +112,20 @@
 		return error;
 
 	sc->initial_aperture = AGP_GET_APERTURE(dev);
+	gatt = NULL;
 
-	for (;;) {
+	while (AGP_GET_APERTURE(dev) != 0) {
 		gatt = agp_alloc_gatt(dev);
-		if (gatt)
+		if (gatt != NULL)
 			break;
-
-		/*
-		 * Probably contigmalloc failure. Try reducing the
-		 * aperture so that the gatt size reduces.
-		 */
-		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
-			agp_generic_detach(dev);
-			return ENOMEM;
-		}
+		AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
+	}
+		
+	if (gatt == NULL) {
+		agp_generic_detach(dev);
+		return ENOMEM;
 	}
+
 	sc->gatt = gatt;
 
 	/* Install the gatt. */

>Release-Note:
>Audit-Trail:

From: Andrea Cocito <blackye@break.net>
To: freebsd-gnats-submit@FreeBSD.org
Cc:  
Subject: Re: kern/57631: Boot failing for ALi chipsets, patch attached
Date: Mon, 1 Dec 2003 22:48:55 +0100

 Just in case: if someone needs a working miniinst ISO with the patch,
 it is available at http://bio.ieo-research.it/tmp/
 
 About 20 people have asked me for a fixed kernel. I don't understand
 why this has not been fixed.
 
 The code still makes no sense in the current 5.2 (where the specific
 ALi issue is fixed by a workaround). This piece of code:
 
 	for (;;) {
 		gatt = agp_alloc_gatt(dev);
 		if (gatt)
 			break;
 
 		/*
 		 * Probably contigmalloc failure. Try reducing the
 		 * aperture so that the gatt size reduces.
 		 */
 		if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
 			agp_generic_detach(dev);
 			return ENOMEM;
 		}
 	}
 
 ... just makes no sense!
 
 It will NEVER try to allocate a smaller aperture (as the comment
 suggests): all the checks are reversed. It will either panic while
 trying to allocate zero bytes or fail in any case, unless it is asked
 to allocate ONE byte.
 
 Instead of the dirty workaround to avoid reaching the broken
 code:
 +	if (entries == 0) {
 +		device_printf(dev, "bad aperture size\n");
 +		return NULL;
 +	}
 
 ... it is way better to FIX the broken code.
 
 The proposed patch fixes it, also for other AGP devices.
 
 
 Ciao,
 
 A.
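
As a rough illustration of the workaround quoted in the message above, the
sketch below models in user space how a zero aperture turns into a request
for zero GATT entries, and how an early check rejects it. The 4 KB page
shift and every model_* name are assumptions made for this example; this is
not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KB AGP pages, i.e. a page shift of 12. */
#define MODEL_AGP_PAGE_SHIFT	12

/* Hypothetical helper: number of GATT entries needed for an aperture. */
static uint32_t
model_gatt_entries(uint32_t aperture_bytes)
{
	return (aperture_bytes >> MODEL_AGP_PAGE_SHIFT);
}

/*
 * Hypothetical stand-in for the allocator's early check: returns 0 when
 * the request is acceptable and -1 for a "bad aperture size".
 */
static int
model_check_aperture(uint32_t aperture_bytes)
{
	if (model_gatt_entries(aperture_bytes) == 0)
		return (-1);
	return (0);
}

/* Small self-check that exercises the model. */
static void
model_self_check(void)
{
	/* 64 MB / 4 KB = 16384 entries. */
	assert(model_gatt_entries((uint32_t)64 << 20) == 16384);
	/* A zero aperture is rejected before any allocation is attempted. */
	assert(model_check_aperture(0) == -1);
}
```

Calling model_self_check() from a test program exercises both cases. The
guard only keeps a zero-sized request away from the allocator; it leaves
the retry loop discussed in this PR unchanged.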
 
Responsible-Changed-From-To: freebsd-bugs->jhb 
Responsible-Changed-By: jhb 
Responsible-Changed-When: Wed Apr 7 08:29:47 PDT 2004 
Responsible-Changed-Why:  
I'll take this one. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=57631 

From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-gnats-submit@FreeBSD.org, blackye@break.net
Cc: Doug Rabson <dfr@nlsystems.com>
Subject: Re: kern/57631: Boot failing for ALi chipsets, patch attached
Date: Wed, 7 Apr 2004 14:08:01 -0400

 Part of this PR confuses me.  Specifically, the statement that the current 
 code will never try to reduce the size of the aperture because the return 
 value checks are incorrect.  This seems wrong.  AGP_SET_APERTURE() returns 0 
 on success and EINVAL on failure.  Thus, the old code will try to set the 
 aperture to half of its current size.  If it fails, then it does a detach and 
 returns ENOMEM.  If it succeeds, it tries to allocate the gatt again.  Which 
 test is wrong?  Also, your new code is subject to an infinite loop it seems.  
 Specifically, your new code doesn't check for an error from AGP_SET_APERTURE 
 at all.  For the agp drivers I've looked at, this means that when you finally 
 get down to an invalid value (such as 2mb) that the aperture size will stay 
 at the previous value (4mb) since a failing SET doesn't change the aperture 
 size, so you will stay in the loop forever.  It seems that you should break 
 out of the loop if AGP_SET_APERTURE fails, so something like:
 
 	sc->initial_aperture = AGP_GET_APERTURE(dev);
 	gatt = NULL;
 	while (AGP_GET_APERTURE(dev) != 0 && gatt == NULL) {
 		gatt = agp_alloc_gatt(dev);
 		if (gatt == NULL) {
 			/* Probably contigmalloc failure. */
 			if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2) != 0)
 				break;
 		}
 	}
 	if (gatt == NULL) {
 		agp_generic_detach(dev);
 		return (ENOMEM);
 	}
 	sc->gatt = gatt;
 
 In that case, I can almost see just adding the check for zero initial size 
 instead as that is less code churn.
 
 -- 
 John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
 "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
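
The termination concern raised in the reply above can be tried out in user
space. The following is a minimal sketch, not driver code: struct model_dev,
the model_* functions, the minimum aperture and the iteration cap are all
hypothetical stand-ins for AGP_GET_APERTURE, AGP_SET_APERTURE and
agp_alloc_gatt(). It assumes, as the reply states, that a failing set
returns EINVAL and leaves the aperture unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MB(x)	((uint32_t)(x) << 20)

/* Hypothetical device model; not the real softc or AGP interface. */
struct model_dev {
	uint32_t aperture;	/* current aperture size in bytes */
	uint32_t min_aperture;	/* smallest size the chipset accepts */
	uint32_t alloc_limit;	/* largest aperture whose GATT can be allocated */
};

static uint32_t
model_get_aperture(const struct model_dev *dev)
{
	return (dev->aperture);
}

/* Returns 0 on success, EINVAL on failure; a failed set changes nothing. */
static int
model_set_aperture(struct model_dev *dev, uint32_t size)
{
	if (size < dev->min_aperture)
		return (EINVAL);
	dev->aperture = size;
	return (0);
}

/* Returns non-zero when the pretend GATT allocation succeeds. */
static int
model_alloc_gatt(const struct model_dev *dev)
{
	return (dev->aperture != 0 && dev->aperture <= dev->alloc_limit);
}

/*
 * Loop shaped like the submitted patch: the set result is ignored.
 * The iteration cap exists only so this demo can return; it yields -1
 * where the uncapped loop would keep spinning.
 */
static int
model_attach_unchecked(struct model_dev *dev, int max_iter)
{
	int got = 0;
	int i = 0;

	assert(dev != NULL);
	while (model_get_aperture(dev) != 0) {
		if (i++ >= max_iter)
			return (-1);
		got = model_alloc_gatt(dev);
		if (got)
			break;
		(void)model_set_aperture(dev, model_get_aperture(dev) / 2);
	}
	return (got ? 0 : ENOMEM);
}

/* Loop shaped like the suggestion in the reply: stop when the set fails. */
static int
model_attach_checked(struct model_dev *dev)
{
	int got = 0;

	assert(dev != NULL);
	while (model_get_aperture(dev) != 0 && !got) {
		got = model_alloc_gatt(dev);
		if (!got &&
		    model_set_aperture(dev, model_get_aperture(dev) / 2) != 0)
			break;
	}
	return (got ? 0 : ENOMEM);
}
```

With an allocator that never succeeds and a 4 MB minimum,
model_attach_checked() returns ENOMEM once the aperture can no longer be
halved, while model_attach_unchecked() returns only because of the added
cap. When the allocation does succeed at some smaller size, both loops
behave the same.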
>Unformatted:
