Subj : Re: any function to handle this kind of counting? To : comp.programming From : Jon Harrop Date : Sat Jul 30 2005 04:53 am Randy wrote: > ... > But just because there's no pretty solution doesn't mean there's no ugly > solution. I posted an elegant 10-line solution... > For option #2, a dynamic data structure, you could use a 2D linked list, Yes, my program used the type "(int * (char * int) list) list". > Either way, rebuilding the dynamic data structure as you go is gonna be > slow. No, that is completely wrong. What you are calling "dynamic data structures" can be asymptotically faster than the array-based approach. Performance of my list-based implementation could be improved upon by using sets or hash tables instead of association lists. For example, the following program uses a set (balanced binary tree): module CharMap = Map.Make(Char) module Int = struct type t = int let compare (i : t) j = compare i j end module IntMap = Map.Make(Int) let ab = List.combine a b let add b m = CharMap.add b (try 1 + CharMap.find b m with Not_found -> 1) m let add2 (a, b) m = let m2 = add b (try IntMap.find a m with Not_found -> CharMap.empty) in IntMap.add a m2 m List.fold_right add2 ab IntMap.empty and this version uses hash tables: let ab = List.combine a b in let add b m = try incr (Hashtbl.find m b) with Not_found -> Hashtbl.add m b (ref 1) in let m = Hashtbl.create 1 in let add2 (a, b) = try add b (Hashtbl.find m a) with Not_found -> let m2 = Hashtbl.create 1 in add b m2; Hashtbl.add m a m2 in List.iter add2 ab; Here is an array-based implementation: I get the following times with first elements in the range [0..10) and second elements in the range [32..127): $ time ./collate 10 2000 List based real 0m0.018s user 0m0.010s sys 0m0.000s $ time ./collate2 10 2000 Tree based real 0m0.011s user 0m0.010s sys 0m0.000s $ time ./collate3 10 2000 Hash table based real 0m0.008s user 0m0.000s sys 0m0.010s $ time ./collate4 10 2000 Array based real 0m0.007s user 0m0.010s sys 0m0.000s With a larger range [0..10^5), giving sparse arrays, the array-based method requires a lot more memory (150Mb vs 5Mb) and is 500x slower than the tree-based method: $ time ./collate 100000 2000 List based real 0m0.347s user 0m0.330s sys 0m0.010s $ time ./collate2 100000 2000 Tree based real 0m0.017s user 0m0.010s sys 0m0.000s $ time ./collate3 100000 2000 Hash table based real 0m0.030s user 0m0.010s sys 0m0.000s $ time ./collate4 100000 2000 Array based real 0m9.139s user 0m7.880s sys 0m0.290s Well-written C equivalents using the non-trivial data structures would require hundreds of lines of code. C is a very poor choice of language for this kind of work... -- Dr Jon D Harrop, Flying Frog Consultancy http://www.ffconsultancy.com .