Subj : Re: Change Patterns (was: Polymorphism sucks)
To   : comp.programming,comp.object
From : Chris Sonnack
Date : Fri Jul 01 2005 06:35 pm

topmind writes:

>> It has nothing to do with disk footprint.  It has to do
>> with the comparable overhead of *managing* and *using*
>> an external system.
> 
> If it is compiled into the EXE, how is it more "external"
> than the string library, math library, file library, etc.?

Can you name a single DBS that can be compiled into the app?

More to the point, why, when all I need to do is pass data
between functions?


>> It's a simple equation.  External DBS solutions don't buy
>> anything in this case and cost more than simple
>> "in-language" solutions.
> 
> DB does not necessarily mean an external-language.

By "in-language" I mean that the language itself provides all
the functionality I require.

I don't need a separate database.
I don't need a separate library.
I don't have to worry about licensing & distribution issues.
I don't have to concern myself with the DB API.


>> I would generally use a DB if the ability to search were
>> important, or if it were important to query for a subset
>> of the total dataset.
> 
> And that is not an uncommon need. Often one cannot know
> future uses of a given set of information at the start
> of a project.  Thus, DB's are a good future-change hedge.

In fact, any time one is dealing with a substantial dataset,
a DB is probably a good choice.  And they're an obvious
choice if you need to persist data between program runs.

However, at a guess, the bulk of programming problems do
not lie in that domain.

>>>> From an engineering and maintenance point of view, not
>>>> to mention from an elegance and simplicity point of view,
>>>> you'd like to completely decouple the component from the
>>>> data provider as well as from the data consumer.
>>>
>>> What about decoupling from input order?
>>
>> Implicit in what I wrote above.
> 
> Well, I don't see it.

What do you think "COMPLETELY DECOUPLE the component from
the data provider as well as from the data consumer" means?


>> What about trivially easy to implement, native to the language,
>> no libraries, no IPC, no database, and low overhead?
> 
> If you want to forever live in the 80's and avoid powerful
> tools, be my guest.

I use powerful tools WHEN APPROPRIATE.  In the example I provided
you (in great detail) they are not, for reasons I've listed repeatedly.


>>> I don't have a need to swap the output format that often anyhow.
>>
>> It was YOUR example in the first place.
> 
> I don't think that is the case.

This is a quote from a post of yours dated 6/18 (this thread):

} Let me talk your language and give a systems-software example:
} suppose you wanted a device/io driver to output *both* to HTML
} and a database. Or three: email, database, and HTML file

> IIRC it stems from one of R. Martin's scenarios where his
> wiki could "store" messages in RAM, DB, or files.

(Why double-quote "store"?)

That was another example in another thread.  And that was an
excellent solution he delivered using polymorphism.  It allowed
others to develop plugins for his app without a need to recompile.
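
For illustration only, here is a rough Python sketch of that kind of
pluggable storage.  The names (MessageStore, RamStore, FileStore,
post_message) are hypothetical and not taken from the actual wiki
code; the point is just that the caller depends on the interface,
not on the backend.

```python
import os
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Hypothetical storage interface; names are illustrative only."""

    @abstractmethod
    def save(self, key, text):
        ...

    @abstractmethod
    def load(self, key):
        ...


class RamStore(MessageStore):
    """Keeps messages in a plain dict."""

    def __init__(self):
        self._data = {}

    def save(self, key, text):
        self._data[key] = text

    def load(self, key):
        return self._data[key]


class FileStore(MessageStore):
    """Keeps each message in its own text file under one directory."""

    def __init__(self, directory):
        self._dir = directory

    def _path(self, key):
        return os.path.join(self._dir, key + ".txt")

    def save(self, key, text):
        with open(self._path(key), "w", encoding="utf-8") as f:
            f.write(text)

    def load(self, key):
        with open(self._path(key), encoding="utf-8") as f:
            return f.read()


def post_message(store, key, text):
    # The caller never learns which backend it is talking to.
    store.save(key, text)
    return store.load(key)
```

A third backend (say, a database) would be one more subclass;
post_message would not change.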


>>> I am not sure what "other thread" you are referring to.
>>
>> But you're only participating in two here.
>> Can't you keep them straight?
> 
> I meant multiple messages and examples, not multiple topics.

It's still a key example you introduced.  You really should
remember it.  Bank account, combined savings/checking.  Ring
a bell?  It would seem you remembered it a couple posts ago:

>>> At least a reader can see that a bank account can
>>> potentially be both checking and savings.
>>
>> Once more: I agree and I've shown you (as did Robert) how
>> it can be done in an elegant, clean fashion.  Now that we've
>> done that, you want to disavow your own example.
> 
> And I have shown how that technique does not scale.

Why would it need to?  How many account types can any bank
possibly have?

> Yes, if that was the ONLY change, then a hybrid account
> "type" may do it.

It's not a "hybrid".  It's a distinct type.  Common functionality
is factored out to reduce code.  There are MANY ways to do this
depending on the circumstances and language used.

> I am unclear whether you are suggesting that things will stay
> small, and thus hybridization will be "sufficient", or whether
> you are rejecting the idea that hybridization does not scale.

In this case, things probably will stay small.  More importantly,
OOD actually is good at adding new types.  What's harder is adding
new functionality (methods) to existing types.  Procedural design
can be better (easier) at adding new functionality, but has a
harder time (sometimes MUCH harder time) adding new types.

> If the first (stay simple), then competitors, such as case
> statements, will not be a problem either.

That's almost always false.  Staying simple, in this case, means
having few types (of accounts) to manage.  However, the banking
software itself is likely large and needs to "switch" on a type
in many, many places.

This is exactly why OOD shines at adding new types.  You can add
a CheckingSavings class and--if necessary--factor out common code.
This change takes place in one place: the class code.  And because
the change is entirely local, it can be heavily unit tested to
guarantee the class performs as specified.

Meanwhile, there are case statements switching on type scattered
throughout the banking software.  EACH of these must be
located and checked for changes.  Now you've changed large amounts
of the main software modules and all those changes need to be
tested.

> Either way, you are between a rock and a hard-place as far as
> scale and change.

Nope.  The OOD localizes the addition of a new type.  It's highly
likely the main software modules need no change at all.
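
A minimal Python sketch of what I mean, with made-up rules.  The
class names other than CheckingSavings, the interest rate, and the
month_end_report function are all assumptions for illustration, and
inheriting from Savings is only one of the many ways to factor it.

```python
class Account:
    """Hypothetical base class; the rules below are made up for illustration."""

    def __init__(self, balance):
        self.balance = balance

    def monthly_interest(self):
        return 0.0

    def can_write_checks(self):
        return False


class Checking(Account):
    def can_write_checks(self):
        return True


class Savings(Account):
    RATE = 0.02  # assumed annual rate, illustration only

    def monthly_interest(self):
        return self.balance * self.RATE / 12


class CheckingSavings(Savings):
    # The new type: earns interest like Savings, writes checks like Checking.
    def can_write_checks(self):
        return True


def month_end_report(accounts):
    # "Main module" code: it never switches on type, so it needs
    # no edit when CheckingSavings is added.
    return [
        (type(a).__name__, round(a.monthly_interest(), 2), a.can_write_checks())
        for a in accounts
    ]
```

Adding CheckingSavings touched only the class code.  With a case
statement on account type, month_end_report (and every function
like it) would have needed a new branch.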

>>> I already gave an anecdote of such breaking down during a
>>> budgeting summary app.
>>
>> Which, from your description, you misunderstood the natural
>> design from the beginning and had to re-do it.
> 
> IIRC, it was the user, an accountant, who decided to move away
> from a tree when she or her boss was not happy with the first
> round of output. It was not careless analysis on my part.
> People change their mind, and those changes are not always
> tree-shaped.

(I have no idea what the heck "tree-shaped changes" would be.)

Maybe I've just been in the corporate environment long enough
to know budgets don't have tree-like characteristics.

> For the sake of argument, even if it was due to careless
> analysis, sets are a better hedge against such mistakes.
> Trees degenerately more poorly than sets do tree-ness if
> they have to.

I'm sorry, I can't parse the grammar on that one.  From what I
can extract, I don't agree, simply because these are two tools
with separate domains.  One shines in some areas, the other
shines in others.


>> I've given you an awful lot of counter-examples upon which you
>> seem mute.
> 
> I don't remember any good ones.

Companies, the military, BOMs, CAD/CAM assemblies, XML, HTML,
animal nervous or circulatory systems, utility distribution
networks,... (to name just a few).

> You seemed to simply deny or question my anecdotes.

Do you expect to say things unquestioned?


>> Searching trees is trivially easy.
> 
> Bull. Only if you use the tree's main factors. The "most recent
> files regardless of directory" example I already gave is an
> example.

Trivially easy.  Stupid old MS-DOS can list files recursively
and sorted by date (which you'd have to text process after, but
the point is that even MS-DOS can do tree->list).  In Unix,
you could do it from the command line using find.  On either
(or any platform supporting perl), it's trivially done in a
small number of code lines.

Recursing a tree structure is Programming 201 (if not 101).

    function Scan_Tree (Tree t, Output o)
        foreach n in t.nodelist
            if n.ischild
                -- n is itself a subtree: recurse into it
                Scan_Tree (n, o)
            else
                o.Write(n.contents)

You want to find "most recent" files?  Define "most recent".
Shall we say all files modified today (and we'll use a bit
different tree design just for fun)?

    function Scan_Tree (Directory d, Output o)
        -- do the files
        foreach file in d.files
            if file.date == TODAY
                o.Write(file.name)
        -- do the subdirectories
        foreach subdir in d.subdirs
            Scan_Tree (subdir, o)

Incidentally, note the usefulness of the polymorphic Output
object, "o".  The function is completely decoupled from the
output mechanism.  Could be the screen, could be a file,
could be the network.  Doesn't matter.
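
For anyone who wants to run it, here is a rough Python rendering of
the second Scan_Tree.  It is a sketch under a few assumptions: "most
recent" means "modified today", symlinks are not followed, and
ListOutput is a hypothetical in-memory stand-in for the Output object.

```python
import os
from datetime import date


class ListOutput:
    """Collects lines in memory; any object with write() would do."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)


def scan_tree(directory, out, day=None):
    """Write the path of every file under `directory` modified on `day`."""
    day = day or date.today()
    subdirs = []
    # do the files
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                modified = date.fromtimestamp(entry.stat().st_mtime)
                if modified == day:
                    out.write(entry.path + "\n")
    # do the subdirectories
    for subdir in subdirs:
        scan_tree(subdir, out, day)
```

Passing sys.stdout as `out` prints to the screen, an open file
object writes to disk, and ListOutput collects the names for later
use.  scan_tree itself does not change.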

> Let's see you search for the largest or smallest parts
> in a BOM tree.

Pretty much the same trivially easy algorithm as above.
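
A hedged Python sketch of that, assuming a hypothetical BomNode
with a made-up size_mm attribute (real BOMs will carry whatever
measure "largest" is supposed to mean):

```python
from dataclasses import dataclass, field


@dataclass
class BomNode:
    """Hypothetical BOM node: a part if it has no children, else an assembly."""
    name: str
    size_mm: float = 0.0  # assumed attribute, illustration only
    children: list = field(default_factory=list)


def parts(node):
    """Yield every leaf part under `node`, depth first."""
    if not node.children:
        yield node
    for child in node.children:
        yield from parts(child)


def size_extremes(root):
    """Return (smallest, largest) part in the tree by size_mm."""
    all_parts = list(parts(root))
    smallest = min(all_parts, key=lambda p: p.size_mm)
    largest = max(all_parts, key=lambda p: p.size_mm)
    return smallest, largest
```

Same recursion as Scan_Tree; only what is done at each leaf differs.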

>>> probably different than what a material specialist would want
>>> to see. A material specialist may not really care how stuff is
>>> connected.
>>
>> And would not likely be working from a BOM.
> 
> So we have to duplicate the information to see it different
> ways? Not very good factoring there.

Not duplicating, presenting differently.  A BOM usually contains
nodes that refer to the actual parts.  A materials specialist
isn't interested in the BOM, which may list the same screw in
many sub-assemblies.  They just care about that screw.  They'd
work from the database that contains the individual part data
(and that's where a database would be the right tool, BTW).
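
To make "presenting differently" concrete, a small Python sketch.
The part numbers, the PARTS table, and the tuple layout of BOM are
all invented for illustration; the BOM leaves only refer to part
numbers, and the flat view is derived from it, not stored twice.

```python
from collections import Counter

# Hypothetical part master: one record per part number, as a DB table might hold.
PARTS = {
    "S-100": "M4 screw",
    "B-200": "bracket",
    "C-300": "chassis",
}

# Hypothetical BOM: nested (label, children) tuples; leaves carry a part number.
BOM = ("chassis assembly", [
    ("C-300", []),
    ("S-100", []),
    ("bracket assembly", [
        ("B-200", []),
        ("S-100", []),
        ("S-100", []),
    ]),
])


def part_counts(node, counts=None):
    """Flatten the BOM into {part number: quantity}, ignoring structure."""
    counts = Counter() if counts is None else counts
    label, children = node
    if not children:
        counts[label] += 1
    for child in children:
        part_counts(child, counts)
    return counts
```

The screw shows up in two assemblies in the BOM but has exactly one
record in PARTS, which is the view the materials specialist wants.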

>> A BOM isn't a taxonomy, it's a hierarchical list.
> 
> Either way, similar problems creep in.

What problems?  How else would you like a Bill of Materials?
As one flat list with additional fields describing where each
piece goes?  I think the users would be after you with torches
and pitchforks!

A BOM, like an outline, presents a large amount of data in a way
that allows you to view the level of detail you need at the mo'.


>> A, B and C are typically all children of an assembly.
> 
> If everything is connected, where does one assembly start
> and another end?

Where the designer specifies (be honest, have you ever actually
used a CAD/CAM program?).

> Suppose we have a chassis and many things are bolted to a
> chassis. What are the bolt's parents?

The chassis assembly.  Which consists of the chassis, all the
mounted sub-assemblies and the mounting hardware.


>>>> Hierarchies and fractals have a lot in common in that there
>>>> are self-similar *levels* and in the concept of "drilling down"
>>>> (or "zooming back").  The tree structure--as much as you try
>>>> to find fault with it--IS a fundamental, natural relationship.
>>>
>>> Wrong. [...snip unsupported hand waving...]
>>
>> I see your claim and hand waving.  Now let's see some analysis
>> or cogent thinking to back it up.
> 
> Back it up? The default is chaos. If you claim tree-ness, you have
> to show tree-ness. You claim a pattern, you have to demonstrate that
> pattern.

Do you deny the hierarchical nature of fractals?
Do you deny that fractals are widely found in nature?

I've pointed out *many* natural hierarchies.
I've illustrated many programming abstractions that are trees.

All you've done is make claims.  And presented one personal experience
where you got into trouble with a tree-shaped abstraction.


>>>> Databases fail to efficiently model trees, because trees are
>>>> "triangular" in nature.  Databases are tabular, or "rectangular",
>>>> in nature.  These are mutually exclusive natures.
>>>
>>> RDB's are whatever shape you want.
>>
>> Not at their core.  Each table is 2D.  Each query returns a 2D
>> dataset. You can build upon that, but at their root, databases
>> are 2D.
> 
> Wrong. They are not bound to any dimension limit.
> The 2D thing is how we typically present them on
> paper, but tables can represent hundreds of
> dimensions.

Do you deny that a database table has ONLY rows and columns?

The key word above is "can represent".  Of course they can.
You can build all sorts of abstractions on top of the data,
but you either use denormalized data and duplicate a LOT, or
you use the overhead of RDB and need to manage normalizing
the data and maintaining the links and data integrity.

All of which is quite doable, but it's a *cost*.  In many cases,
that cost is unacceptable given other solutions with lower cost.
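
To show what "can represent" looks like in practice, here is a
sketch using Python's sqlite3 module with an in-memory database.
The node table and its contents are invented, and the adjacency-list
layout (a parent column) is only one of several ways to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES node(id),   -- NULL for the root
        name   TEXT NOT NULL
    );
    INSERT INTO node VALUES
        (1, NULL, 'chassis assembly'),
        (2, 1,    'chassis'),
        (3, 1,    'bracket assembly'),
        (4, 3,    'bracket'),
        (5, 3,    'screw');
""")


def subtree(root_id):
    """Return (depth, name) for root_id and everything beneath it."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, sub.depth + 1
            FROM node n JOIN sub ON n.parent = sub.id
        )
        SELECT depth, name FROM sub ORDER BY depth, id
    """, (root_id,))
    return rows.fetchall()
```

The table itself is still flat rows and columns.  The tree exists
only as links between rows, and a recursive query has to rebuild
it every time you want to walk a branch.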


>>> RDB's can represent trees also,...
>>
>> Inefficiently.
> 
> You are right that they won't be able to do tree operations
> as fast as a tree-only system.

Or efficiently.

> But DB's are more general purpose so that they can handle new
> "shapes" without painful overhauls.

I VERY much doubt that the redesign of the relations required
to change the shape is trivial.  You're essentially talking
about re-doing the database schema and all queries and procs
based on the schema.


>>> A tree with duplicate nodes. Genetic copy-and-paste.
>>
>> No, not duplicate nodes.  Nodes that connect.
> 
> If the connections are cross-branch, it is not a tree.
> Remember, everybody has/had a mother and father.

Genealogy trees usually treat "parents" as a node.

Tell you what.  Simple challenge:  show me how you'd represent
your paternal grandparents and all their offspring.  Remember,
your answer can't be tree-shaped.


>> SELECT *
>> FROM table_a
>> WHERE (CREATED >= '2005-06-01') AND (CREATED < '2005-07-01')
>>
>> Needs to examine every record in the database to produce the
>> query, right?
> 
> Not necessarily. Read about "indexes".

Fine, needs to examine every entry in the indexes table (unless
it's smart enough to store the indexes in some form of, um, TREE
structure that allows drilling down).
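
A rough Python sketch of the "smart enough" case.  The index
contents and row ids are invented, and a sorted list with binary
search is only a stand-in here; a real DBMS would typically keep
the index in a B-tree.

```python
from bisect import bisect_left

# Hypothetical index on CREATED: (date, row id) pairs kept in sorted order.
# ISO-format date strings sort correctly as plain text.
index = sorted([
    ("2005-04-17", 11), ("2005-05-30", 12), ("2005-06-01", 13),
    ("2005-06-15", 14), ("2005-06-30", 15), ("2005-07-01", 16),
])


def rows_created_between(start, end):
    """Row ids with start <= CREATED < end, found without a full scan."""
    # A 1-tuple (date,) sorts before every (date, row_id) with the same
    # date, so bisect_left lands on the first entry for that date.
    lo = bisect_left(index, (start,))
    hi = bisect_left(index, (end,))
    return [row_id for _, row_id in index[lo:hi]]
```

The two lookups are binary searches, which is the list equivalent
of drilling down a tree; only the matching slice gets read.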

>> Now imagine a magical database that is hierarchical with a root,
>> a first level of "By Year" and a second level of "By Month".
>>
>> SELECT * FROM table_a.2005.June
>>
>> Only has to search a subset of the total.
> 
> Databases can take shortcuts on frequently-used factors by using
> indexes.

Who said this was frequently used?  It's an ad hoc query.

The point, in both cases, is that querying for data requires,
one way or other, examining that dataset.  You've GOT to be
aware of how queries slow down as the data grows EVEN with
indexes.
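
The "By Year" / "By Month" hierarchy from the magical database can
be sketched in a few lines of Python.  The records and the select
function are hypothetical; nested dicts play the part of the tree.

```python
from collections import defaultdict

# Hypothetical records: (created "YYYY-MM-DD", payload).
records = [
    ("2004-06-20", "a"), ("2005-05-30", "b"),
    ("2005-06-01", "c"), ("2005-06-15", "d"), ("2005-07-01", "e"),
]

# root -> year -> month -> list of records
tree = defaultdict(lambda: defaultdict(list))
for created, payload in records:
    year, month, _day = created.split("-")
    tree[year][month].append((created, payload))


def select(year, month):
    """Rough equivalent of the imaginary SELECT * FROM table_a.2005.June."""
    # .get() avoids creating empty branches for years or months not present.
    return tree.get(year, {}).get(month, [])
```

select("2005", "06") touches only the June 2005 bucket, however
many other years and months are in the tree.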


>> Remember, we were discussing where there are natural trees in
>> real life.
> 
> How are you defining "natural"?

The usual way.


>>> But if you are trying to figure out if there is a copper processing
>>> expert in Cleveland, the tree is mostly moot.
>>
>> Actually, the tree is very useful, since it lets me look ONLY in
>> Cleveland, rather than everywhere in the world.
> 
> Perhaps, but says nothing about copper processors.

And?   What would you rather?  Query all employees everywhere in the
world, or just those in Cleveland?


>> In places where the problem isn't hierarchical, I do.
> 
> But one cannot know ahead of time what will stay
> hierarchical and what will not.

Many things are naturally hierarchical and can be assumed to remain
so.  Some of it comes from experience in the domain.  For example,
your budget summary problem: after 20 years in corporate programming,
I *know* budget allocations are not tree-shaped.


-- 
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL  |
|_____________________________________________|_______________________|
