Subj : Re: Change Patterns (was: Polymorphism sucks)
To   : comp.programming,comp.object
From : topmind
Date : Sat Jul 02 2005 12:37 am



Chris Sonnack wrote:
> topmind writes:
>
> >> It has nothing to do with disk footprint.  It has to do
> >> with the comparable overhead of *managing* and *using*
> >> an external system.
> >
> > If it is compiled into the EXE, how is it more "external"
> > than the string library, math library, file library, etc.?
>
> Can you name a single DBS that can be compiled into the app?

http://directory.fsf.org/libs/c/SQLite.html

http://linux.techass.com/projects/xdb/xbasedocs/xbase_c1.html

http://x2c.dtop.com/x2c-spec.html

>
> More to the point, why, when all I need to do is pass data
> between functions?

Fine, if you have super-simple stuff, go for it.
Don't need a DB for Hello World.

>
>
> >> It's a simple equation.  External DBS solutions don't buy
> >> anything in this case and cost more than simple "in-
> >>language" solutions.
> >
> > DB does not necessarily mean an external-language.
>
> By "in-language" I mean that the language itself provides all
> the functionality I require.
>
> I don't need a separate database.
> I don't need a separate library.
> I don't have to worry about licensing & distribution issues.
> I don't have to concern myself with the DB API.

You sound like a Luddite.

>
>
> >> I would generally use a DB if the ability to search were
> >> important, or if it were important to query for a subse
> >> of the total dataset.
> >
> > And that is not an uncommon need. Often one cannot know
> > future uses of a given set of information at the start
> > of a project.  Thus, DB's are a good future-change hedge.
>
> In fact, any time one is dealing with a substantial dataset,
> a DB is probably a good choice.  And they're an obvious
> choice if you need to persist data between program runs.
>
> However, at a guess, the bulk of programming problems do
> not lie in that domain.

I question that.

>
> >>>> From an engineering and maintenance point of view, not
> >>>> to mention from an elegance and simplicity point of view,
> >>>> you'd like to completely decouple the component from the
> >>>> data provider as well as from the data consumer.
> >>>
> >>> What about decoupling from input order?
> >>
> >> Implicit in what I wrote above.
> >
> > Well, I don't see it.
>
> What do you think "COMPLETELY DECOUPLE the component from
> the data provider as well as from the data consumer" means?


Coupling to time is still coupling. If time requirements
change, you have to change your coding strategy away
from pipes.


> >>> I don't have a need to swap the output format that often anyhow.
> >>
> >> It was YOUR example in the first place.
> >
> > I don't think that is the case.
>
> This is a quote from a post of yours dated 6/18 (this thread):
>
> } Let me talk your language and give a systems-software example:
> } suppose you wanted a device/io driver to output *both* to HTML
> } and a database. Or three: email, database, and HTML file
>
> > IIRC it stems from one of R. Martin's scenarios where his
> > wiki could "store" messages in RAM, DB, or files.
>
> (Why double-quote "store"?)

It reflects a long debate that once took place about the
concept and definition of "storage".

>
> That was another example in another thread.  And that was an
> excellent solution he delivered using polymorphism.  It allowed
> others to develop plugins for his app a need to recompile.
>

IIRC, R. Martin never claimed his system could do
both at the same time without a fair amount of changes.

>
> >>> I am not sure what "other thread" you are referring to.
> >>
> >> But you're only participating in two here.
> >> Can't you keep them straight?
> >
> > I meant multiple messages and examples, not multiple topics.
>
> It's still a key example you introduced.  You really should
> remember it.  Bank account, combined savings/checking.  Ring
> a bell?  It would seem you remembered it a couple posts ago:

It does not matter. I guessed that's what you were talking
about and answered it ANYHOW in that
topic. Go back and look. Why fuss over
non-issues such as what I remembered or didn't? It makes it
appear that you are more interested in trying to embarrass
me via irrelavent trivia than deal with technical issues.

>
> >>> At least a reader can see that a bank account can
> >>> potentially be both checking and savings.
> >>
> >> Once more: I agree and I've shown you (as did Robert) how
> >> it can be done in an elegant, clean fashion.  Now that we've
> >> done that, you want to disavow your own example.
> >
> > And I have shown how that technique does not scale.
>
> Why would it need to?  How many account types can any bank
> possibly have?

That depends on how you define "type". The real world is
is not really based on "types" in the longer run. "Types"
are a poor modeling concept and should be banished,
especially for complex business entities/objects.

Customers want an a-la-carte smorgish-bord of choices.
They don't want stuff lumped together into arbitrary
"types". I don't want to pay for a spoiler on a car just
to get leather seats. Sometimes regulation or
red-tape forces things into such lumps, but
even legislation changes over time. A stock-bond
hybrid (such as a portfolio) is perfectly possible
if regulators allow some creativity.
Some combinations perhaps don't make
sense, but the pattern of such misfits is not
necessarily tree-shaped.

>
> > Yes, if that was the ONLY change, then a hybrid account
> > "type" may do it.
>
> It's not a "hybrid".  It's a distinct type.

Any hybrid can be shoehorned into hierachical classification,
but that is messy. What was that combination formula
again? f-squared minus one, where f is each factor?

IT DOES NOT SCALE. Face it. Every new factor can
explode the size of the tree if you use combo
subtypes.


> Common functionality
> is factored out to reduce code.  There are MANY ways to do this
> depending on the circumstances and language used.

Yes, and sets are a very good way because they don't
assume any particular pattern of change. Thus, if you
guess wrong up front, you are less penalized. Let's
see a tree handle 10 relatively orthogonal factors
of variation. Go for it. Tree me!

>
> > I am unclear whether you are suggesting that things will stay
> > small, and thus hybridization will be "sufficient", or whether
> > you are rejecting the idea that hybridization does not scale.
>
> In this case, things probably will stay small.  More importantly,
> OOD actually is good at adding new types.


Yes. We all know that. The issue is whether that is a
common change pattern. In my observation it is not (at
least not "clean" types).


> What's harder is adding
> new functionality (methods) to existing types.  Procedural design
> can be better (easier) at adding new functionality, but has a
> harder time (sometimes MUCH harder time) adding new types.

That is why for more complex projects, dynamic sets are the
way to go.

>
> > If the first (stay simple), then competitors, such as case
> > statements, will not be a problem either.
>
> That's almost always false.  Staying simple, in this case, means
> having few types (of accounts) to manage.  However, the banking
> software itself is likely large and needs to "switch" on a type
> in many, many places.

So? If you don't add new "types" (cough) very often, it does
not matter. Plus, one can add new operations without visiting
a bunch of subclasses.

>
> This is exactly why OOD shines at adding new types.  You can add
> a CheckingSavings class and--if necessary--factor out common code.
> This change takes place in one place: the class code.


No. You have to move the checkings logic and the savings logic
up the tree in order to share it. I don't have to shuffle things
to different routines because the dispatching is based on
dynamic references instead of the physical packaging of
code. I have moved to the virtual world while you are pushing
stones up the hill. Yabba Dabba Du!


> And because
> the change is entirely local, it can be heavily unit tested to
> guarentee the class performs as specified.
>
> Meanwhile, there are case statements switching on type scattered
> all through out the banking software.  EACH of these must be
> located and checked for changes.  Now you've changed large amounts
> of the main software modules and all those changes need to be
> tested.

If that was a common change pattern, you would indeed win.
But, that is not how most real things change.

>
> > Either way, you are between a rock and a hard-place as far as
> > scale and change.
>
> Nope.  The OOD localizes the addition of a new type.  It's highly
> likely the main software modules need no change at all.

Not if the sharing has to change levels.

>
> >>> I already gave an anecdote of such breaking down during a
> >>> budgeting summary app.
> >>
> >> Which, from your description, you misunderstood the natural
> >> design from the beginning and had to re-do it.
> >
> > IIRC, it was the user, an accountant, who decided to move away
> > from a tree when she or her boss was not happy with the first
> > round of output. It was not careless analysis on my part.
> > People change their mind, and those changes are not always
> > tree-shaped.
>
> (I have no idea what the heck "tree-shaped changes" would be.)

A new subtype that does not share any existing implementation.
In other words, simply add a new node without having to
shuffle existing nodes or parts of existing nodes in order
to get the needed sharing of behavior.

>
> Maybe I've just been in the corporate environment long enough
> to know budgets don't have tree-like characteristics.
>
> > For the sake of argument, even if it was due to careless
> > analysis, sets are a better hedge against such mistakes.
> > Trees degenerately more poorly than sets do tree-ness if
> > they have to.
>
> I'm sorry, I can't parse the grammar on that one.  From what I
> can extract, I don't agree, simply because these are two tools
> with separate domains.  One shines in some areas, the other
> shines in others.
>
>
> >> I've given you an awful lot of counter-examples upon which you
> >> seem mute.
> >
> > I don't remember any good ones.
>
> Companies, the military, BOMs, CAD/CAM assemblies, XML, HTML,
> animal nervous or circulatory systems, utility distribution
> networks,... (to name just a few).

I have debunked most of them. Sets represent assemblies just
as well if not better (speed aside).

Circulatory systems are a directed graph, not a tree.

Utility distribution? Thatsa new one. Are you sure they
are pure trees?

>
> >> Searching trees is trivially easy.
> >
> > Bull. Only if you use the tree's main factors. The "most recent
> > files regardless of directory" example I already gave is an
> > example.
>
> Trivially easy.  Stupid old MS-DOS can list files recursively
> and sorted by date (which you'd have to text process after, but
> the point is that even MS-DOS can do tree->list).


That is a work-around to trees, not using trees.


> In Unix,
> you could do it from the command line using find.  On either
> (or any platform supporting perl), it's trivially done in a
> small number of code lines.
>
> Recursing a tree structure is Programming 201 (if not 101).
>
>     function Scan_Tree (Tree t, Output o)
>         foreach n in t.nodelist
>             if n.ischild
>                 Scan_Tree (t,o)
>             else
>                 o.Write(n.contents)

Again, that is a work-around to emulating a sequential
search. It is not something faciliated by trees.

>
> You want to find "most recent" files?  Define "most recent".
> Shall we say all files modified today (and we'll use a bit
> different tree design just for fun)?
>
>     function Scan_Tree (Directory d, Output o)
>         -- do the files
>         foreach file in t.files
>             if file.date == TODAY
>                 o.Write(file.name)
>         -- do the subdirectories
>         foreach subdir in d.subdirs
>             Scan_Tree (subdir, o)
>
> Incidentally, note the usefulness of the polymorphic Output
> object, "o".  The function is completely decoupled from the
> output mechanism.  Could be the screen, could be a file,
> could be the network.  Doesn't matter.

Lovely. More device driver examples touting polymorphism.

>
> > Let's see you search for the largest or smallest parts
> > in a BOM tree.
>
> Pretty much the same trivially easy algorithm as above.

And something not aided by trees. It is a Turing Tarpit.

>
> >>> probably different than what a material specialist would want
> >>> to see. A material specialist may not really care how stuff is
> >>> connected.
> >>
> >> And would not likely be working from a BOM.
> >
> > So we have to duplicate the information to see it different
> > ways? Not very good factoring there.
>
> Not duplicating, presenting differently.  A BOM usually contains
> nodes that refer to the actual parts.  A materials specialist
> isn't interested in the BOM, which may list the same screw in
> many sub-assemblies.  They just care about that screw.  They'd
> work from the database that contains the individual part data
> (and that's where a database would be the right tool, BTW).

Then you don't really have a tree after all, but a network
database, which were pretty much shot in the back with
Dr. Codd's paper. Network databases were tried in the
60's and lost out (except in narrow niches). The witch
is dead.

>
> >> A BOM isn't a taxonomy, it's a hierarchical list.
> >
> > Either way, similar problems creep in.
>
> What problems?  How else would you like a Bill of Materials?
> As one flat list with additional fields describing where each
> piece goes?  I think the users would be after you with torches
> and pitchforks!

Learn about relational modeling.

>
> A BOM, like an outline, presents a large amount of data in a way
> that allows you to view the level of detail you need at the mo'.
>
>
> >> A, B and C are typically all children of an assembly.
> >
> > If everything is connected, where does one assembly start
> > and another end?
>
> Where the designer specifies (be honest, have you ever actually
> used a CAD/CAM program?).
>
> > Suppose we have a chassis and many things are bolted to a
> > chassis. What are the bolt's parents?
>
> The chassis assembly.  Which consists of the chassis, all the
> mounted sub-assemblies and the mounting hardware.

Sounds like when you don't know where else to put stuff, you
just dump it into some "assembly".

>
>
> >>>> Hierarchies and fractals have a lot in common in that there
> >>>> are self-similar *levels* and in the concept of "drilling down"
> >>>> (or "zooming back").  The tree structure--as much as you try
> >>>> to find fault with it--IS a fundamental, natural relationship.
> >>>
> >>> Wrong. [...snip unsupported hand waving...]
> >>
> >> I see your claim and hand waving.  Now let's see some analysis
> >> or cogent thinking to back it up.
> >
> > Back it up? The default is chaos. If you claim tree-ness, you have
> > to show tree-ness. You claim a pattern, you have to demonstrate that
> > pattern.
>
> Do you deny the hierarchical nature of fractals?
> Do you deny that fractals are widely found in nature?
>
> I've pointed out *many* natural hierarchies.
> I've illustrated many programing abstractions that are trees.

Not "are". They are purely a human imaginitive invention.

>
> All you've done is make claims.

Like claim that things *are* trees? Can you prove they
are not sets, or that sets are less flexible?

> And presented one personal experience
> where you got into trouble with a tree-shaped abstraction.
>

I pointed out at least 3 I beleive.

>
> >>>> Databases fail to efficiently model trees, because trees are
> >>>> "triangular" in nature.  Databases are tabular, or "rectangular",
> >>>> in nature.  These are mutually exclusive natures.
> >>>
> >>> RDB's are whatever shape you want.
> >>
> >> Not at their core.  Each table is 2D.  Each query returns a 2D
> >> dataset. You can build upon that, but at their root, databases
> >> are 2D.
> >
> > Wrong. They are not bound to any dimension limit.
> > The 2D thing is how we typically present them on
> > paper, but tables can represent hundreds of
> > dimensions.
>
> Do you deny that a database table has ONLY rows and columns?

Yes, but rows and columns can be used to represent 3D and 4D
structures, etc. Trees can be represented as 1D structures (like
a Lisp program string). Does that make them 1D?

>
> The key word above is "can represent".  Of course they can.
> You can build all sorts of abstractions on top of the data,
> but you either use denormalized data and duplicate a LOT, or
> you use the overhead of RDB and need to manage normalizing
> the data and maintaining the links and data integrity.
>
> All of which is quite doable, but it's a *cost*.  In many cases,
> that cost is unacceptable given other solutions with lower cost.

Data integrity is a bonus, not a requirement. For that
matter so is normalizing, but you take a big risk by
not. Normalization is mostly about
the removal of duplication.

>
>
> >>> RDB's can represent trees also,...
> >>
> >> Inefficiently.
> >
> > You are right that they won't be able to do tree operations
> > as fast as a tree-only system.
>
> Or efficiently.
>
> > But DB's are more general purpose so that they can handle new
> > "shapes" without painful overhauls.
>
> I VERY much doubt that the redesign of the relations required
> to change the shape is trivial.  You're essentially talking
> about re-doing the database schema and all queries and procs
> based on the schema.

They are far more flexible than trees. If you want a node
to be part of a tree, just add a Parent_ID column. If you
want it to be part a bunch of sets, just create a
many-to-many table with references to the record.
The original record is *still* there.
The few change patterns that reqire
schema overhauls usually mess up trees also.

>
> >>> A tree with duplicate nodes. Genetic copy-and-paste.
> >>
> >> No, not duplicate nodes.  Nodes that connect.
> >
> > If the connections are cross-branch, it is not a tree.
> > Remember, everybody has/had a mother and father.
>
> Geneology trees usually treat "parents" as a node.
>
> Tell you what.  Simple challenge:  show me how you'd represent
> your paternal grandparents and all their offspring.  Remember,
> your answer can't be tree-shaped.
>

I cannot provide a sketch here and am not in the mood for
ascii art.

>
> >> SELECT *
> >> FROM table_a
> >> WHERE (CREATED >= '2005-06-01') AND (CREATED < '2005-07-01')
> >>
> >> Needs to examine every record in the database to produce the
> >> query, right?
> >
> > Not necessarily. Read about "indexes".
>
> Fine, needs to examine every entry in the indexes table (unless
> it's smart enough to store the indexes in some form of, um, TREE
> structure that allows drilling down).

You are right, indexes use trees. That is a good use for them.
RDBMS can use the power of trees without forcing external
trees on the user of the RDBMS. Thus, you get the power without
the limits. Eat Cake!


>
> >> Now imagine a magical database that is hierarchical with a root,
> >> a first level of "By Year" and a second level of "By Month".
> >>
> >> SELECT * FROM table_a.2005.June
> >>
> >> Only has to search a subset of the total.
> >
> > Databases can take shortcuts on frequently-used factors by using
> > indexes.
>
> Who said this was frequently used?  It's an ad hoc query.

How do you do ad-hoc queries on trees where the factor is
not a branching factor? More costly recursion to emulate a
sequential search?

>
> The point, in both cases, is that querying for data requires,
> one way or other, examining that dataset.  You've GOT to be
> aware of how queries slow down as the data grows EVEN with
> indexes.

RDBMS are faster than tree DB's for ad-hoc queries.

>
> >> Remember, we were discussing where there are natural trees in
> >> real life.
> >
> > How are you defining "natural"?
>
> The usual way.

"Natural" seems to be your definition for how YOU think.
I now tend to think in terms of sets far more than trees.
That is "natural" to me.

>
>
> >>> But if you are trying to figure out if there is a copper processing
> >>> expert in Cleveland, the tree is mostly moot.
> >>
> >> Actually, the tree is very useful, since it lets me look ONLY in
> >> Cleveland, rather than everywhere in the world.
> >
> > Perhaps, but says nothing about copper processors.
>
> And?   What would you rather?  Query all employees everywhere in the
> world, or just those in Cleveland?

That is up to the database. I just ask for what I want. How it is
internally stored is generally not my concern. And, quickness
tricks such as adding indexes don't change my original query.

>
>
> >> In places where the problem isn't hierarchical, I do.
> >
> > But one cannot know ahead of time what will stay
> > hierarchical and what will not.
>
> Many things are naturally hierarchical and can be assumed to remain
> so.  Some of it comes from experience in the domain.  For example,
> your budget summary problem: after 20 years in corporate programming,
> I *know* budget allocations are not tree-shaped.

Yes, but not everyone has the benefit of 20-years. Thus, when
in doubt, take the better hedge: sets.

> 
> 

-T-

.