Subj : Re: any function to handle this kind of counting? To : comp.programming From : Jon Harrop Date : Mon Aug 01 2005 09:26 pm Peter Ammon wrote: >>>A data structure impedance mismatch is easier to deal with than a >>>language mismatch. >> >> You're probably going to end up copying the data structures in both >> cases. > > That's so. But it's still substantially more difficult to modify your > OCaml code to operate with ObscureCLibrary than it would be if you were > all C. Going from C->OCaml is probably even harder, but less likely to > come up. The easiest way to call C code from OCaml is to write an IDL interface and compile it into bindings using camlidl. The main difference in complexity is that you would have to create a formal definition of the C library's interface. However, you would want to know the API in order to use the C library from C code anyway, so it really isn't that different. For example, you'd have to supplement the C declarations with "in", "out" or "in/out". There are pathological cases (like the GLU tesselator) where the C interface is not easily represented by OCaml's type system. However, it took me much longer to get working C++ code that it did to write OCaml bindings to the GLU tesselator. >>>>3. Efficiency (a "home grown" set implementation in C is unlikely to >>>>match the performance of OCaml's set implementation). >>> >>>Doubtful. OCaml is very fast, but it still targets the same instruction >>>set as C, and there are very fast C libraries as well. >> >> No, my statement is true. Instruction sets are basically irrelevant. > > The point was that there's no optimizations that are possible in OCaml > that are not possible in C, at least in principle. If you mean that you can write a C equivalent of any OCaml code then I think that is incorrect. I believe there are parts of FPL compilers (probably ocamlopt) that must be written in assembler. For example, some GCs. > Or if there are, I > don't know about them :> My C compiler does optimizations typically > associated with functional languages, like transforming recursive > functions to tail-recursive form and then applying tail-recursion > optimization. No, there are many optimisations that are performed by FPL compilers but not by C compilers. Typically, either the optimisations don't make sense in C (e.g. unboxing) or C code doesn't facilitate the optimisation by making the necessary information available (e.g. referential transparency). Languages like OCaml are at a huge disadvantage in terms of performance. It is precisely these optimisations that allow the implementation to regain much of the lost performance. Out of curiosity, can your C compiler in-line a function when it is passed by pointer to another function? >> Specifically in the context of sets, the OCaml stdlib implementation will >> return the union of two non-overlapping sets in O(1) time. That is >> substantially faster than the O(n) of the STL set_union and all other >> imperative set implementations that I have heard of. > > That's really cool, if true! Can you point me at more information about > it? All the references I can find say that OCaml's module Set uses > red-black trees, and I don't see how you can get O(1) unions with those. I think OCaml's set module uses AVT not RB trees. OCaml's implementation is fast not because of the balancing, but because it is a purely functional data structure and, therefore, it can exploit referential transparency. Essentially, the STL can't reuse any of the previous sets (i.e. referencing them) when creating the union/intersection/difference because you might mutate the data in an old set and that would affect the new set. Instead, the STL must explicitly copy all of the elements that it uses. In contrast, OCaml "knows" that your set is immutable (a functional data structure) so it is free to refer back to any old bits of tree that it can find when constructing the result. Consequently, OCaml's implementation is much faster whenever it can reuse old bits of tree, such as when computing the union of two equal-sized non-overlapping trees. I wrote a program in C++ and OCaml to compute the set of "n"th-nearest neighbours of a given atom in a conventional condensed matter physics simulation of an amorphous material (it also applies to proteins, etc.). The OCaml code is less than half the length and 100x faster than the C++ implementation. Both implementations can be optimised, of course. To optimise the C++ you must compute set unions without the STL's set_union function. The optimisations are all rather complicated. Anyway, the point is that OCaml is much closer to the sweet spot on the writing-time vs running-time for a wide variety of programs. >>>>4. Capability (OCaml's built-in hash function can traverse any data >>>>structure). >>> >>>I don't understand what you mean by this. >> >> The GC must be able to traverse any data structure using only information >> available at run-time (OCaml strips out as much run-time type information >> as possible in order to evade run-time type checking). OCaml also comes >> with several functions (e.g. polymorphic comparison operators) that use >> this capability to traverse any data structure in order to compare, hash >> and so forth. > > Interesting. But wouldn't that make hashing a large data structure > expensive? Yes. OCaml's built-in hash function stops after 100 data, IIRC. You can supply your own hash function, of course. Ideally, you don't want to repeatedly hash big data structures in any language. So you'd normally compute the hash once and store it in the data structure ready to be looked up. >>>For the record, here's how I'd do it in Cocoa/Objective-C, using a hash >>>table: >>> >>>NSMutableDictionary* dict = [NSMutableDictionary dictionary]; >>>NSEnumerator* enumer = [input objectEnumerator]; >>>NSString* string; >>>while ((string = [enumer nextObject])) >>>[dict setObject:[NSNumber numberWithInt:1 + [[dict objectForKey:string] >>>intValue]] forKey:string]; >>> >>>Certainly nothing approaching "hundreds of lines!" If anything, it's >>>shorter than the OCaml approach. >> >> That's also very nice. I can't understand it and I don't know if it is >> correct (e.g. does it compute the same result as my OCaml?). > > If I understand your code correctly, I think so. It computes a > dictionary of string->number of appearances of that string. While > Objective-C shares OCaml's nasty syntax and lacks OCaml's elegance, it > makes up for that in sheer practicality: > > 1) Dictionaries can work with any type that can hash itself and compare > itself to another object, so I don't have to write the "setup" > type-declaring code. This is the same in OCaml. > 2) Messages sent to nil objects return 0, and a missing key in a > dictionary gives you nil, so I don't have to special-case looking up an > object that is not in the dictionary. OCaml doesn't have a "nil" value. You usually have a "find" function that returns the result of the map or raises an exception to say that the key wasn't found. > Just wait 'til the Perl guy gets here, though! Yes, any language with built-in dictionaries will be great for that task. C is an awful choice... > (Seriously, tell more about these sets.) :-) Check out the freebies on our company website if you want more OCaml examples. Also, check out the first chapter of my book and the free code from it. Feel free to buy a few copies too. ;-) -- Dr Jon D Harrop, Flying Frog Consultancy http://www.ffconsultancy.com .