Subj : Re: Polymorphism sucks [Was: Paradigms which way to go?]
To   : comp.programming,comp.object
From : Chris Sonnack
Date : Fri Jul 29 2005 09:10 pm

topmind writes:

>>> It's not necessarily about OOD. [...] it was talking about trees in
>>> general, not necessarily related to polymorphism or OO. The context
>>> was about the apparent popularity of trees among computer users
>>> IIRC, not just developers.
>>
>> In fact--as your own task break down showed--it's a natural form for
>> any intelligent person when breaking down something comprised of many
>> parts.  Or more properly, of parts and sub-parts...as all tasks are.
> 
> Like I keep saying, small trees are fine in some cases. When routines
> get bigger, one has to form sub-routines to avoid duplicating
> code (ie, duplicate nodes) in most cases. I rarely encounter routines
> that grow more than about 200 lines before candidates for
> duplication-factoring start to appear.

You're confusing several issues here.  By your own quote at the top,
we're talking about the popularity of trees (which I'd say goes beyond
computer *users* to all intelligent minds).

Trees--as an analytic tool--are *extremely* valuable, and for good
reason.  You demonstrated this yourself when you described a task as
an outline.  And, contrary to your statements about size, they become
even more valuable when the task is large.

I'm moving a later section up here, because it applies:

>>>> The simple fact is, you broke a common task into levels.  Just about
>>>> any set of tasks naturally breaks down that way.  (In fact, I've
>>>> spent this week working with a project leader for a coming project
>>>> doing just that.  The project is very large, and without a
>>>> hierarchical breakdown, it'd be beyond the capacity of any human
>>>> to deal with.)
>>>
>>> I bet you start to have cycles (or node duplication) past a certain
>>> point.
>>
>> Nope.  There are *no* duplicate tasks.  Each is distinct.
> 
> I don't believe you. I would have to see the code. I think
> there is probably a misunderstanding somewhere here.

Yes.  Yours.  Nowhere above am I talking about code.  I'm talking about
breaking down a large project into separate tasks.  Exactly what you
did a few weeks ago with a very small task.  In this case, we're talking
about a three-month project involving about a dozen people (more if you
count involved customers and testers).

I would submit that an outline (aka tree) is the ONLY viable way to
do such a breakdown.  It's not a "convenient lie", it's a reality.
Large tasks are made of small tasks.  It's natural.


WARNING: NOW WE'RE TALKING ABOUT DATA....

>> If all we're talking about is the taxonomy, then one can use as
>> many trees as needed to express different views of the information
>> depending on what one is looking for.
> 
> Yes, but IMO sets are superior to multiple trees in most cases.

For my money, if I had a situation with a pile of data and I wanted
different (hierarchical) views, I'd very likely put the data in a
database, table or whatever (aka set) and *VIEW* it with a tree tool.

If the situation were such that the data was naturally hierarchical
and had a predominant tree structure to it, I might very well store
it in a tree--at least while in memory.  As I've said before, if I
need to persist data (that is, store it on disk), then typically I
use some sort of table, flat file or database.  It all depends.
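
The "store it flat, view it as a tree" idea can be sketched in a few
lines. This is only an illustration: the records, field names and the
tree_view() helper are all made up for the example, not taken from any
real system.

```python
from collections import defaultdict

# Hypothetical flat records, as they might come back from a table.
FILES = [
    {"name": "slides.shw", "owner": "bob", "project": "sales", "year": 2005},
    {"name": "budget.xls", "owner": "bob", "project": "sales", "year": 2004},
    {"name": "notes.txt", "owner": "ann", "project": "sales", "year": 2005},
    {"name": "plan.doc", "owner": "ann", "project": "audit", "year": 2005},
]


def tree_view(records, keys):
    """Group flat records into nested dicts, one tree level per key."""
    if not keys:
        # Leaf level: just list the file names.
        return sorted(r["name"] for r in records)
    groups = defaultdict(list)
    for r in records:
        groups[r[keys[0]]].append(r)
    return {value: tree_view(group, keys[1:]) for value, group in groups.items()}


# Two different hierarchical views over the same flat data.
by_owner = tree_view(FILES, ["owner", "project"])
by_year = tree_view(FILES, ["year", "owner"])
```

The data stays in one flat collection; each call just picks a different
order of keys to build a different tree on demand.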


WARNING: NOW WE'RE TALKING ABOUT CODING...

>>> Just like a record from a "Drink" entity.
>>
>> Nope.  A record is dumb.  It requires code to process it.
> 
> [...snip non-responsive answer...]
> 
>>>> Imagine I have a collection of Drink objects and I want to list
>>>> them in order by the amount of caffeine.  Simple, since they all
>>>> have a common interface that lets me ask for the caffeine content.
>>>
>>> Just like a record from a "Drink" entity.
>>
>> Nope.  Records are dumb.  They have no interface.
> 
> They have no "interface"? Data queries.

Records can't be queried.  DATABASES can.  Records are just dumb
"things"....a collection of fields.
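
A minimal sketch of the Drink example, assuming a common interface with
a single method. The class names, the caffeine_mg() method and the
caffeine figures are invented for illustration only.

```python
from abc import ABC, abstractmethod


class Drink(ABC):
    """Common interface: every drink can report its own caffeine content."""

    @abstractmethod
    def caffeine_mg(self) -> float:
        ...


class Espresso(Drink):
    def __init__(self, shots: int):
        self.shots = shots

    def caffeine_mg(self) -> float:
        return 60.0 * self.shots  # made-up figure per shot


class Cola(Drink):
    def __init__(self, millilitres: float):
        self.millilitres = millilitres

    def caffeine_mg(self) -> float:
        return 0.1 * self.millilitres  # made-up figure per ml


class HerbalTea(Drink):
    def caffeine_mg(self) -> float:
        return 0.0


drinks = [Cola(350), HerbalTea(), Espresso(2)]

# The sort needs no knowledge of the concrete types, only the interface.
by_caffeine = sorted(drinks, key=lambda d: d.caffeine_mg())
```

Each object carries its own rule for computing the answer, so the
calling code never switches on the kind of drink.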


>> Database design is a pretty well-explored domain.  There are some very
>> fundamental limitations here.  You either need to look at each record
>> to determine if it matches your query, OR you need to--da da--look at
>> a subset....a child of the full dataset.
> 
> Perhaps, but there are potentially other indexing schemes that
> are non-tree. But it is a moot point anyhow. Just because the
> underlying machine uses 1's and 0's does not nec. mean that
> programmers should also.

All programmers ultimately do use 1s and 0s.  We just have tools that
usually shield us from the messy details.  More to the point, what the
DB indexes demonstrate is that, yet again, hierarchical organization
can be a huge win.
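
To put rough numbers on "look at each record" versus "look at a
subset", here is a small sketch. It uses a plain binary search over
sorted keys as a stand-in for an index (real databases commonly use
B-trees, which this does not attempt to model). The function names and
data are hypothetical.

```python
def scan(records, key):
    """Look at each record in turn; return (position, records_examined)."""
    for examined, (k, _) in enumerate(records, start=1):
        if k == key:
            return examined - 1, examined
    return -1, len(records)


def halving_search(records, key):
    """Binary search over records sorted by key.

    Each comparison discards half of what is left, i.e. it descends one
    level of an implicit tree over the data.
    """
    lo, hi, examined = 0, len(records), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        k = records[mid][0]
        if k == key:
            return mid, examined
        if k < key:
            lo = mid + 1
        else:
            hi = mid
    return -1, examined


records = [(i, f"row-{i}") for i in range(1_000_000)]

scan_pos, scan_cost = scan(records, 999_999)
tree_pos, tree_cost = halving_search(records, 999_999)
```

For a million sorted records the full scan may touch every one of them,
while the halving search never needs more than 20 comparisons.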

>> When it comes to dealing with large datasets, partitioning is a win.
>> Database designers know this, hence the indexing technology.
> 
> Philosophers have long known there is rarely One Right Taxonomy/
> Partitioning for a given thing.

So you use multiple partition schemes where needed.  No biggie.
The fact remains, without partitioning *somehow*, you have a mess.


>>> One could use SQL, another relational query language, and/or
>>> Query-By-Example to find stuff.
>>
>> Not seeing anything "relational" about this.  You do understand the
>> term, right?   (SQL, for example, works just fine in non-relational
>> databases.)
> 
> Hmmmm. I don't think I agree with this, but will have to think about
> that one.

I thought you were a "database guy"?  Think about a database with only
one table.  Think you can write an SQL query to return a subset of
the records in that table?   Of COURSE you can.
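
For the record, a one-table query looks like this. The sketch uses
Python's built-in sqlite3 module purely as a convenient SQL engine; the
table and column names are made up. (SQLite is of course a relational
engine itself; the point here is only that the query involves a single
table and no joins or relationships between tables.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table, no foreign keys, no joins.
conn.execute("CREATE TABLE drinks (name TEXT, caffeine_mg REAL)")
conn.executemany(
    "INSERT INTO drinks VALUES (?, ?)",
    [("espresso", 64), ("cola", 34), ("herbal tea", 0)],
)

# Return a subset of the records in that one table.
rows = conn.execute(
    "SELECT name FROM drinks WHERE caffeine_mg > ? ORDER BY caffeine_mg DESC",
    (10,),
).fetchall()
```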


WARNING: TALKING ABOUT VIRTUAL FILE SYSTEMS NOW...

>> Numerical identifiers for folders?  Very dumb idea!
>> The example shows:
>>
>> }  westsrvr:4251/slides.shw
>>
>> "4251" has no connection to anything real.  How can anyone think
>> users can remember "4251"?  I have thousands of files in hundreds
>> of directories.  No human could remember distinct numbers for all
>> those folders.  I can't believe anyone sane could think this was
>> a viable option.
> 
> There may be other ways to label folders, and each may also have
> a description attribute, and perhaps even make it a primary key.

Of course, and your page mentions that.  I was commenting there on
the stupidity of subjecting human users to numerical identifiers.

Consider a large--very large--file system.  How many attributes does
it take to uniquely identify a file, to enable you to locate it?

Consider that these attributes are essentially AND'd together.  If
you were writing a query, your WHERE clause would have many sub-clauses
connected with ANDs.

Which, logically speaking, is EXACTLY what a file path is.
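
The equivalence can be sketched as follows. Everything here is an
assumption made for the example: the attribute names, the fixed
ORDER of path parts, and the query()/lookup() helpers.

```python
from pathlib import PurePosixPath

# Hypothetical attribute records for a handful of files.
FILES = [
    {"server": "westsrvr", "owner": "bob", "project": "sales", "name": "slides.shw"},
    {"server": "westsrvr", "owner": "bob", "project": "audit", "name": "plan.doc"},
    {"server": "westsrvr", "owner": "ann", "project": "sales", "name": "notes.txt"},
]

# Assumed convention: which attribute each path part stands for.
ORDER = ("server", "owner", "project", "name")


def query(records, **where):
    """Attribute lookup: every condition must hold (conditions are AND'd)."""
    return [r for r in records if all(r[k] == v for k, v in where.items())]


def lookup(records, path):
    """Path lookup: turn each path part into one AND'd condition."""
    parts = PurePosixPath(path).parts[1:]  # drop the leading "/"
    return query(records, **dict(zip(ORDER, parts)))


full = lookup(FILES, "/westsrvr/bob/sales/slides.shw")
same = query(FILES, server="westsrvr", owner="bob", project="sales", name="slides.shw")

# A partial path is just a shorter WHERE clause, i.e. browsing a directory.
bobs_files = lookup(FILES, "/westsrvr/bob")
```

Under this convention a full path and a four-clause AND query pick out
the same record, and a shorter path simply applies fewer conditions.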

The difference is that a file path is a lot easier to browse.
I looked at your--whatchamacallit--finder window.  Considering that
in a day's work I reference hundreds of files, having to use some
GUI search tool each time I wanted to locate a file.....NFW.

> For example, when companies move articles around in
> web URL paths, often it busts existing browser bookmarks.
> The same thing can happen with "meaningful" names.
> A "dumb" key is safer from such because it carries no
> external meaning.

What makes you think a dumb key prevents files from being moved?
The exact same problem exists.  The only difference is that with a
"dumb" key, there's no logical handle.  At least "Bobs Sales Project"
is a sensible thing to search for.

>> Further, all it's done is create an "abbreviation" for a location
>> (but a very difficult abbreviation to remember).  The same issue
>> of changes applies.  Change the location, and everyone's links
>> are wrong.
> 
> Whaaaaaat?

You seem to be assuming that files in your system never move.  I
have no reason--quite the contrary, in fact--to think that's true.
Files move for lots of reasons.

What you don't seem to understand is that a regular hierarchical file
system is logically equivalent to your system.  You have attributes,
a regular FS has path parts.  If you change the attributes/path parts,
the file "moves" and people's static references to it are no longer
valid.

The only real difference is that your virtual system requires a lot
more overhead.  And, ironically, a real implementation of your imaginary
system would probably use a hierarchical FS on the back end anyway.
If you throw all the files in one BIG directory, performance tends to
drop a lot.  (It's exactly the same as how indexes are hierarchical.)


>> Later on the page is the idea of associating properties to a file and
>> later searching for it by properties.  Which is fine until you forget
>> what properties you used, make up too many to handle, or change them.
>> How do you browse through the data to find the lost file??
> 
> How is this worse than forgetting a giant path???

It's approximately the same, I'd say.  The difference is that your system
uses a lot more overhead and--depending on how it's implemented--may be
a lot harder to browse quickly.

>> It also requires a big database in which to store all this.
> 
> So? Higher abstractions require more horse-power. I don't
> think 30,000 desktop files requires that much horse-power.

I thought we were talking WANs and big companies.  I likely have more
than 30K files--way more--on just my own machine.  My company must have
millions.


WARNING: TALKING ABOUT WORDS NOW...

>>>> It's simple and undeniable: sets have less structure than trees.  EOS.
>>>
>>> Prove it!
>>
>> I don't have to, you just admitted it.
> 
> It depends entirely on how one defines "structure".

Let's move this later bit to here:

>>>> Higher: above, superior.
>>>> Order: degree.
>>>> Structure: a complex construction or entity.
>>>
>>> Define "superior". Define "complex".
>>
>> My guess is you know what "superior" and "complex" mean.
> 
> Those terms tend to be relative and vague.

They may be somewhat relative--many things are--but I can't say
I find them in any way vague.

> It is too imprecise for our purposes.

So propose an alternative.

> For example, Bill Gates may be "superior" from a money tally
> standpoint, but if he ends up in hell when he dies (as a
> hypothetical example), then he is not superior from a
> religious standpoint.

Why do you think an attribute applies to all contexts?  From a
financial point of view, Gates is *way* superior.  Period.  There
is nothing vague or relative about that.

So, getting back to the point, structure is well-defined and we HAVE
a context.  Let's talk set theory.  You're big on sets, hopefully
you have some grasp of the underlying theory of sets.  Let's find
out.

Here's a set:   {{red} {blue} {green} {yellow} {thursday} {28} {}}

What does it mean?  What is its structure?

Here's another:  {{1} {3} {2} {1.5} {34} {99} {0}}

What does it mean?  Is it the same as the other set?  If so, why?
If not, why not?

Here's another:  {{} {} {} {} {} {} {}}
And another:  {{{}}}

Are these the same or different from the first two?  Why or why not?

What--if any--structure exists in any of the above sets?
What--if anything--can you say about the above sets?

Now consider this data structure:
	{red}
		{green}
			{yellow}
			{28}
		{blue}
			{thursday}
			{}

What--if any--structure exists in the above data structure?
What--if anything--can you say about the above data structure?
(Does the fact that it's called a data STRUCTURE suggest anything?)

We await your answers--no copying your neighbor's, now.
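
One way to poke at these questions is to model both in code. This is a
sketch only: it uses Python frozensets for the sets and nested
(label, children) tuples for the tree, with the empty string standing in
for the empty node. Those representation choices are assumptions of the
example.

```python
members = ["red", "blue", "green", "yellow", "thursday", "28"]

# The first set, built in two different orders.
a = frozenset(frozenset([m]) for m in members) | {frozenset()}
b = frozenset(frozenset([m]) for m in reversed(members)) | {frozenset()}

# Seven copies of the empty set are still just one member.
seven_empties = frozenset([frozenset()] * 7)

# The tree from the outline above: (label, children).
TREE = ("red", [
    ("green", [("yellow", []), ("28", [])]),
    ("blue", [("thursday", []), ("", [])]),
])


def paths(node, prefix=()):
    """Yield the root-to-node path for every node in the tree."""
    label, children = node
    here = prefix + (label,)
    yield here
    for child in children:
        yield from paths(child, here)


# Things the tree can answer that the bare set cannot.
depth = {p[-1]: len(p) - 1 for p in paths(TREE)}
parent = {p[-1]: p[-2] for p in paths(TREE) if len(p) > 1}
```

The set only records membership, so order and repetition vanish; the
tree additionally records who is above whom and at what level.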


WARNING: BACK TO HIERARCHIES AND TREES IN GENERAL...

>>> Having 2 bosses in a really small company. Who is the "root"?
>>
>> The CEO.
> 
> I worked at a company that had 3 owners and I had 3 bosses.

Can the bosses give you orders?  Can they give the owners orders?

What happened when each of the three bosses asked you to be at a
different meeting scheduled for the same time?


>> To the point: the situation is *entirely* hierarchical.
> 
> No, that is not a pure tree.

You keep saying that like it means something.  It doesn't.


>>> A GUI page A where you launch page B, but click a link which opens
>>> another instance of page A. This is common during web-browsing, I
>>> would note.
>>
>> ?? What does browser history have to do with anything?  That doesn't
>> in any way speak to recursion in the software.
> 
> If A calls B, B calls C, and C calls A, it *is* recursion.

The browsing *history* follows a recursive path.  The browser software
does not.  Your claim that GUI software is recursive is incorrect.

GUI **usage** may recurse, but so what?


>>>>> But a pure tree has no duplicate nodes.
>>>>
>>>> That's BS.  Show me one authority that agrees.
>>>
>>> Connect the subroutine calls on the paper. Don't take my word for it,
>>> get your pen out.
>>
>> So,... when you can't respond to a point, you just throw in something
>> totally irrelevant?  Okay.  We'll just assume capitulation.
> 
> No, I just cannot easily describe it without a visual.

Please read carefully.  Please read the first quoted line above.  Then
read the response--second quoted line above.  Your response--third line
above--is non-responsive, and the point is lost.  Let's make it more
clear:

>you> But a pure tree has no duplicate nodes.
>>
>me> That's BS.  Show me one authority that agrees.

Your turn again.


>>>> Totally False.  The call tree represents reality.
>>>
>>> And lots of duplications of parts of reality.
>>
>> If a routine is entered multiple times, that **IS** the reality.
> 
> And it is duplication.

What part of "that IS the reality" don't you get?  If a routine is
entered multiple times, the call tree **MUST** have duplicates.
(You DO know what a call tree is, don't you?--a(n imaginary) tree
listing the thread of execution of a running program.)

Once more: the call tree is a "pure" tree--no node links back, no
branches connect--but does indeed have "duplicate" nodes.  That is,
the name of the node is duplicated.  In the reality of the running
program, each visit to a given routine is *contextually* different.

Consider a parse tree.  Most programs use the language statements many
times, so there are duplicate, say, "IF/ELSE" nodes.  Probably lots
and lots of them.  But the parse tree itself is still a "pure" tree,
because each "IF/ELSE" is contextually different.
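
A small sketch of the point, recording each call as its own node while
a toy program runs. The traced() decorator and the routine names are
hypothetical, loosely modelled on the kind of outline discussed in this
thread.

```python
import functools

# Each node is (routine_name, children). ROOT stands for the program itself.
ROOT = ("main", [])
_stack = [ROOT]


def traced(fn):
    """Record every call to fn as a new child of the currently running call."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        node = (fn.__name__, [])
        _stack[-1][1].append(node)
        _stack.append(node)
        try:
            return fn(*args, **kwargs)
        finally:
            _stack.pop()

    return wrapper


@traced
def load_list(name):
    return [name]


@traced
def get_special_targets():
    return load_list("special")


@traced
def get_defaults():
    return load_list("defaults")


get_special_targets()
get_defaults()


def names(node):
    """Yield the routine name of every node in the recorded tree."""
    label, children = node
    yield label
    for child in children:
        yield from names(child)


load_list_visits = list(names(ROOT)).count("load_list")
```

The name load_list shows up twice, but as two separate nodes under two
different parents; no branch links back or joins another.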

Get it?  Keep at it until you do.


>>>> I DO event-driven programming, and my programs definitely have
>>>> high level routines and low level routines (and many medium level
>>>> routines).
>>>
>>> Show me the hierarchy here in out-line form then.
>>
>> Let's consider the most recent project I've worked on:
>>
>> Main
>> 	Initialize_Common_Globals()
>> 	Initialize_Program()
>> 	Load_Program()
>> 		LoadProperties()
>> 	Get_Special_Target_List()
>> 		LoadList(SpecialTargetList)
>> 			OpenTable("[SpecialTargets]")
>> 			<load loop>
>> 			Close
>> 	Load(ApplicationForm)
>>
>> Et cetera.  Get the picture?
> 
> That looks like an implementation of the event-handling engine,
> not actual events themselves.

What part of "my programs definitely have high level routines and
low level routines" did you fail to apprehend?


>>> Lack of experience? I am a middle-aged developer. Started out on VAX's
>>> and PRIME minicomputers.
>>
>> Time in the saddle doesn't necessarily translate to knowledge.  You just
>> don't talk like someone who really understands data structures or OOD or
>> how polymorphism is used.
> 
> I similarly feel you don't have experience with relational
> and databases,...

Despite that I've worked heavily with them for 20 years?  (-:

> ...such as your complaint about auto-generated folder keys above.

My complaint stands.  It's a dumb idea.


> (Some relational fans don't agree with auto-gen keys, but know that
> named primary keys are also a possibility without giving it second
> thought. I had to remind you.)

I'm quite familiar with all this.  What you don't seem to appreciate is
that numerical identifiers are, in this case--to be blunt--stupid.  No
human will like them or use them.  If you replace them with string keys
(which, BTW, is my preference in DB *design* where possible--I only use
auto-gen'd keys where absolutely necessary), then you're no different
than using a file system path.

And incidentally, I will give you this: a virtual file system ON TOP OF
a regular one--a system that attaches searchable attributes and other
meta-data to files--IS a cool thing.


-- 
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL  |
|_____________________________________________|_______________________|
