From nobody@FreeBSD.org  Fri Jun 16 16:51:44 2006
Return-Path: <nobody@FreeBSD.org>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8DA2B16A474
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Jun 2006 16:51:44 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [216.136.204.117])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 46A4943D48
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Jun 2006 16:51:44 +0000 (GMT)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.13.1/8.13.1) with ESMTP id k5GGphvF025450
	for <freebsd-gnats-submit@FreeBSD.org>; Fri, 16 Jun 2006 16:51:43 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.13.1/8.13.1/Submit) id k5GGphOd025449;
	Fri, 16 Jun 2006 16:51:43 GMT
	(envelope-from nobody)
Message-Id: <200606161651.k5GGphOd025449@www.freebsd.org>
Date: Fri, 16 Jun 2006 16:51:43 GMT
From: Dirk Jagdmann <dj@secion.de>
To: freebsd-gnats-submit@FreeBSD.org
Subject: SATA timeouts on HP DL140 G2
X-Send-Pr-Version: www-2.3

>Number:         99039
>Category:       i386
>Synopsis:       [ata] SATA timeouts on HP DL140 G2
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    remko
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jun 16 17:00:35 GMT 2006
>Closed-Date:    Mon Sep 11 13:22:23 GMT 2006
>Last-Modified:  Mon Sep 11 13:22:23 GMT 2006
>Originator:     Dirk Jagdmann
>Release:        6.1
>Organization:
Secion GmbH
>Environment:
FreeBSD shape1.net 6.1-RELEASE-p2 FreeBSD 6.1-RELEASE-p2 #2: Fri Jun 16 16:21:50 CEST 2006     root@shape1.net:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
The HP DL140 G2 uses the Intel ICH5 SATA controller. With FreeBSD 6.1
and BIOS v1.16, the system suffers from SATA timeouts whenever the hard
disk is accessed after about 30 seconds or more of idle time. For example,
if you load a text file into an editor, edit it for one minute (and thus
do no hard disk I/O), and then save it, the system hangs for about 3 s
before the file is written.
Occasionally the kernel generates a message like:

kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55292287

Apart from the timeouts, the system seems stable and works fine.
There is no data corruption when accessing the hard disk.
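
To count how often the stall shows up in the system log, a message like the one above could be picked apart with a small helper. This is only a sketch: the struct AtaTimeout and the function parseAtaTimeout are hypothetical names, and the line format is assumed from the single sample message in this report, so other variants of the driver's message are not handled.

```cpp
#include <stdint.h>
#include <string>

// One parsed TIMEOUT message (hypothetical helper type, not part of FreeBSD).
struct AtaTimeout {
  std::string device;   // e.g. "ad2"
  std::string command;  // e.g. "WRITE_DMA"
  uint64_t lba;         // value following "LBA="
};

// Parses a line shaped like the sample in this report:
//   kernel: ad2: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=55292287
// Returns false, leaving out untouched, if the line does not match.
bool parseAtaTimeout(const std::string &line, AtaTimeout &out)
{
  const std::string marker=": TIMEOUT - ";
  const std::string::size_type m=line.find(marker);
  if(m==std::string::npos || m==0)
    return false;

  // The device name is the token directly in front of the marker.
  const std::string::size_type sp=line.rfind(' ', m);
  const std::string::size_type devStart=(sp==std::string::npos) ? 0 : sp+1;
  if(devStart==m)
    return false;

  // The command name is the token directly after the marker.
  const std::string::size_type cmdStart=m+marker.size();
  const std::string::size_type cmdEnd=line.find(' ', cmdStart);
  if(cmdEnd==std::string::npos || cmdEnd==cmdStart)
    return false;

  // The LBA is the run of decimal digits after "LBA=".
  const std::string::size_type l=line.find("LBA=", cmdEnd);
  if(l==std::string::npos)
    return false;
  std::string::size_type p=l+4;
  if(p>=line.size() || line[p]<'0' || line[p]>'9')
    return false;
  uint64_t lba=0;
  for(; p<line.size() && line[p]>='0' && line[p]<='9'; ++p)
    lba=lba*10+(line[p]-'0');

  out.device=line.substr(devStart, m-devStart);
  out.command=line.substr(cmdStart, cmdEnd-cmdStart);
  out.lba=lba;
  return true;
}
```

Feeding each line of the system log to parseAtaTimeout and counting the matches per device would show how often the stall is also reported by the kernel.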
>How-To-Repeat:
I wrote a test program that times a small amount of hard disk activity,
waits for a while, and then repeats the measurement. If the wait is 31, 61
or 91 seconds, the first action after the wait takes about 3200 ms to
finish. The test program follows:

// timeout.cpp 2006-06-16 Dirk Jagdmann <dj@secion.de>
//
// Test program to reproduce the SATA timeouts on a DL140 G2 with BIOS
// version 1.16.
//
// The program measures the execution time of the command CMD in
// milliseconds. On an unloaded system the time should be below 10 ms. After
// every 10 executions of CMD the program waits for a period of time
// (1, 31, 61, 91, 121 seconds). If the system has the SATA timeout, the
// first execution of CMD after the wait takes ~3200 ms.
//
// The error occurs on DL140 G2, BIOS 1.16, FreeBSD 6.1 GENERIC kernel.
// An update to BIOS 1.17 fixes the error; SATA timeouts no longer occur
// afterwards.
//
// The program prints three numbers. The first number is the running
// number of the call of CMD. After 300 calls the program exits. The second
// number is the time waited before the last call of CMD, in seconds. The
// third number is the execution time of CMD in milliseconds. If the
// execution time is below 10 ms, the current line is overwritten.
//
// At program exit a matrix of the measured times is printed.

#include <stdlib.h>
#include <unistd.h>   // sleep()
#include <map>
#include <iostream>
#include <vector>
#include <stdint.h>
#include <sys/time.h>

#define CMD "ls -l /etc > /tmp/doj"

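// Returns the current wall-clock time in milliseconds, truncated to 32 bits.
// The multiplication is done in 64 bits so that tv_sec*1000 cannot overflow
// a 32-bit time_t; the difference of two nearby return values stays correct.
inline uint32_t timeGetTime()
{
  struct timeval tv;
  if(gettimeofday(&tv, NULL) == 0)
    return static_cast<uint32_t>(static_cast<uint64_t>(tv.tv_sec)*1000 + tv.tv_usec/1000);
  return 0;
}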

int main()
{
  typedef std::vector<int> stat_v;
  typedef std::map<int, stat_v> stat_t;
  stat_t stat;

  int z=0, s=1;
  while(++z<300)
    {
      sleep(s);

      for(int j=0; j<10; j++)
	{
	  const uint32_t before=timeGetTime();
	  system(CMD);
	  const int t=timeGetTime()-before;
	  stat[s].push_back(t);

	  std::cout << "\r#" << z << "\t" << s << "s \t" << t << "ms    " << std::flush;
	  if(t>=10)
	    std::cout << std::endl;
	}

      s+=30;
      if(s>=122)
	s=1;
    }

  for(stat_t::iterator i=stat.begin(); i!=stat.end(); ++i)
    {
      std::cout << i->first << ":\t";
      const stat_v &v=i->second;
      // use a distinct name so the inner iterator does not shadow the outer i
      for(stat_v::const_iterator j=v.begin(); j!=v.end(); ++j)
	std::cout << *j << ' ';
      std::cout << std::endl;
    }

  return 0;
}
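
The matrix printed at the end can be long, so it may help to reduce it to the wait durations that actually showed a stall. The following is a minimal sketch under assumptions: the function name stalledWaits and the threshold parameter are hypothetical and not part of the test program above, and a threshold of about 1000 ms is only a guess that sits well between the normal (<10 ms) and stalled (~3200 ms) times reported here.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Same container layout as in the test program:
// wait duration in seconds -> measured run times in milliseconds.
typedef std::vector<int> stat_v;
typedef std::map<int, stat_v> stat_t;

// Returns, in ascending order, every wait duration for which at least one
// measured run time reached thresholdMs (hypothetical parameter).
std::vector<int> stalledWaits(const stat_t &stat, int thresholdMs)
{
  std::vector<int> result;
  for(stat_t::const_iterator i=stat.begin(); i!=stat.end(); ++i)
    {
      const stat_v &v=i->second;
      if(v.empty())
	continue;
      // the slowest run decides whether this wait duration counts as stalled
      if(*std::max_element(v.begin(), v.end()) >= thresholdMs)
	result.push_back(i->first);
    }
  return result;
}
```

Called as stalledWaits(stat, 1000) just before the program returns, this would list only the wait durations after which the delay was observed.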

>Fix:
A BIOS update to version 1.17 fixes the timeout problem. The BIOS was downloaded from:
http://h18023.www1.hp.com/support/files/server/us/download/24573.html

>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->suspended 
State-Changed-By: linimon 
State-Changed-When: Sat Jun 24 03:19:03 UTC 2006 
State-Changed-Why:  
Mark 'suspended' since a workaround (upgrading the firmware) is available. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=99039 
State-Changed-From-To: suspended->feedback 
State-Changed-By: remko 
State-Changed-When: Mon Sep 11 13:19:55 UTC 2006 
State-Changed-Why:  
Hello,

There seems to be a good workaround, which in my opinion people should
apply. Has the behaviour been seen again since? If not, this PR can be
closed, with the workaround recorded as the solution.

Thanks


Responsible-Changed-From-To: freebsd-i386->remko 
Responsible-Changed-By: remko 
Responsible-Changed-When: Mon Sep 11 13:19:55 UTC 2006 
Responsible-Changed-Why:  
grab the pr 

State-Changed-From-To: feedback->closed 
State-Changed-By: remko 
State-Changed-When: Mon Sep 11 13:22:22 UTC 2006 
State-Changed-Why:  
The mail bounces; record the workaround as the solution and close the PR.

>Unformatted:
