From markusk@pvv.ntnu.no  Mon Nov  3 00:03:41 2003
Return-Path: <markusk@pvv.ntnu.no>
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2C2D416A4CE
	for <FreeBSD-gnats-submit@freebsd.org>; Mon,  3 Nov 2003 00:03:41 -0800 (PST)
Received: from proto.pvv.ntnu.no (proto.pvv.ntnu.no [129.241.210.168])
	by mx1.FreeBSD.org (Postfix) with SMTP id 5E93C43FCB
	for <FreeBSD-gnats-submit@freebsd.org>; Mon,  3 Nov 2003 00:03:39 -0800 (PST)
	(envelope-from markusk@pvv.ntnu.no)
Received: (qmail 71891 invoked by uid 28724); 3 Nov 2003 08:03:37 -0000
Message-Id: <20031103080337.71890.qmail@proto.pvv.ntnu.no>
Date: 3 Nov 2003 08:03:37 -0000
From: Markus Bjartveit Kruger <markusk@pvv.ntnu.no>
Reply-To: Markus Bjartveit Kruger <markusk@pvv.ntnu.no>
To: FreeBSD-gnats-submit@freebsd.org
Cc:
Subject: Improvement of radixsort() implementation
X-Send-Pr-Version: 3.113
X-GNATS-Notify:

>Number:         58860
>Category:       misc
>Synopsis:       Improvement of radixsort() implementation
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kientzle
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:  
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Nov 03 00:10:12 PST 2003
>Closed-Date:    Mon Nov 24 12:38:57 PST 2003
>Last-Modified:  Mon Nov 24 12:38:57 PST 2003
>Originator:     Markus Bjartveit Kruger
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
Programvareverkstedet
>Environment:
System: FreeBSD proto.pvv.ntnu.no 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Tue Sep 16 21:03:28 CEST 2003 root@bacchus.pvv.ntnu.no:/local/obj/usr/src/sys/PROTO i386


	
>Description:
	
I've found a possible improvement to the implementation of radixsort()
located in /usr/src/lib/libc/stdlib/radixsort.c, which will make the
implementation more efficient when sorting data sets containing many
strings that share a common prefix.

The improvement consists in checking for bins where all strings have
the same character at the current position of the radix sort, and
moving on to the next character instead of needlessly shuffling the
strings in the bin around.  When I used the modified radixsort() to
sort a web log file, which typically contains many strings with common
prefixes (IP addresses), I registered a speedup of approximately 15%.
On a constructed dataset where all strings had a long common prefix,
the speedup was 50%.
>How-To-Repeat:
	
Compile a program with the old and new version of radixsort() and compare
processing times.  Remember to set appropriate optimization options, as
the efficiency gain can be lost otherwise.
>Fix:

	

--- radixsort-patch begins here ---
--- radixsort.c	Tue Mar 11 12:39:57 1997
+++ radixsort.c.patch	Fri Oct 31 12:55:10 2003
@@ -175,6 +175,17 @@
 		}
 
 		/*
+		 * Special case: if all strings have the same
+		 * character at position i, move on to the next
+		 * character.
+		 */
+		if (nc == 1 && count[bmin] == n) {
+			push(a, n, i+1);
+			nc = count[bmin] = 0;
+			continue;
+		}
+
+		/*
 		 * Set top[]; push incompletely sorted bins onto stack.
 		 * top[] = pointers to last out-of-place element in bins.
 		 * count[] = counts of elements in bins.
--- radixsort-patch ends here ---


>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: freebsd-bugs->kientzle 
Responsible-Changed-By: kientzle 
Responsible-Changed-When: Mon Nov 10 20:34:39 PST 2003 
Responsible-Changed-Why:  
I'm testing this and will commit it tonight. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=58860 
State-Changed-From-To: open->closed 
State-Changed-By: kientzle 
State-Changed-When: Mon Nov 24 12:37:04 PST 2003 
State-Changed-Why:  
Committed fix to -CURRENT, submitter has checked. 
Still need to MFC this, though. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=58860 
>Unformatted:
