Subj : Re: any function to handle this kind of counting?
To   : comp.programming
From : Peter Ammon
Date : Mon Aug 01 2005 11:38 am

Jon Harrop wrote:
> Peter Ammon wrote:
> 
>>Jon Harrop wrote:
>>
>>>1. Stability (many relevant bits of OCaml's stdlib have been formally
>>>proven correct using theorem provers and will have been used by many more
>>>people than a "home grown" C implementation).
>>
>>I'd wager that more eyes have looked at something like the GLib C data
>>structure library than at OCaml.
> 
> 
> I'll wager that formally proving something correct improves stability more
> than having "eyes looking at it".

You're probably right.

> 
> 
>>>2. Generality (what happens if you want to combine two difference C
>>>programs which use two different "home grown" set implementations?).
>>
>>If you want a library to support some task, you're going to find a lot
>>more options in C than in OCaml.
> 
> 
> Incompatible options, yes.
> 
> 
>>A data structure impedance mismatch is easier to deal with than a language
>>mismatch. 
> 
> 
> You're probably going to end up copying the data structures in both cases.

That's so.  But it's still substantially more difficult to modify your 
OCaml code to operate with ObscureCLibrary than it would be if you were 
all C.  Going from C->OCaml is probably even harder, but less likely to 
come up.

> 
> 
>>>3. Efficiency (a "home grown" set implementation in C is unlikely to
>>>match the performance of OCaml's set implementation).
>>
>>Doubtful.  OCaml is very fast, but it still targets the same instruction
>>set as C, and there are very fast C libraries as well.
> 
> 
> No, my statement is true. Instruction sets are basically irrelevant.

The point was that there's no optimizations that are possible in OCaml 
that are not possible in C, at least in principle.  Or if there are, I 
don't know about them :>  My C compiler does optimizations typically 
associated with functional languages, like transforming recursive 
functions to tail-recursive form and then applying tail-recursion 
optimization.

> The
> point is that an optimiser can do a lot better when it has more information
> to act upon, i.e. in a higher-level language.
> 
> Specifically in the context of sets, the OCaml stdlib implementation will
> return the union of two non-overlapping sets in O(1) time. That is
> substantially faster than the O(n) of the STL set_union and all other
> imperative set implementations that I have heard of.

That's really cool, if true!  Can you point me at more information about 
it?  All the references I can find say that OCaml's module Set uses 
red-black trees, and I don't see how you can get O(1) unions with those.

> 
> 
>>>4. Capability (OCaml's built-in hash function can traverse any data
>>>structure).
>>
>>I don't understand what you mean by this.
> 
> 
> The GC must be able to traverse any data structure using only information
> available at run-time (OCaml strips out as much run-time type information
> as possible in order to evade run-time type checking). OCaml also comes
> with several functions (e.g. polymorphic comparison operators) that use
> this capability to traverse any data structure in order to compare, hash
> and so forth.

Interesting.  But wouldn't that make hashing a large data structure 
expensive?

> 
> 
>>For the record, here's how I'd do it in Cocoa/Objective-C, using a hash
>>table:
>>
>>NSMutableDictionary* dict = [NSMutableDictionary dictionary];
>>NSEnumerator* enumer = [input objectEnumerator];
>>NSString* string;
>>while ((string = [enumer nextObject]))
>>[dict setObject:[NSNumber numberWithInt:1 + [[dict objectForKey:string]
>>intValue]] forKey:string];
>>
>>Certainly nothing approaching "hundreds of lines!"  If anything, it's
>>shorter than the OCaml approach.
> 
> 
> That's also very nice. I can't understand it and I don't know if it is
> correct (e.g. does it compute the same result as my OCaml?).
> 

If I understand your code correctly, I think so.  It computes a 
dictionary of string->number of appearances of that string.  While 
Objective-C shares OCaml's nasty syntax and lacks OCaml's elegance, it 
makes up for that in sheer practicality:

1) Dictionaries can work with any type that can hash itself and compare 
itself to another object, so I don't have to write the "setup" 
type-declaring code.

2) Messages sent to nil objects return 0, and a missing key in a 
dictionary gives you nil, so I don't have to special-case looking up an 
object that is not in the dictionary.

Just wait 'til the Perl guy gets here, though!

(Seriously, tell more about these sets.)

-Peter

-- 
Pull out a splinter to reply.

.