Subj : Re: How to diagnose/resolve libthread panic?
To   : comp.programming.threads
From : Giancarlo Niccolai
Date : Tue Jun 07 2005 04:25 pm

David Butenhof wrote:

> 
> The thread library and libc are especially vulnerable, because they tend
> to be called a lot; so the addresses of their internal data will
> frequently appear on the stack. (I've often thought that a useful,
> though expensive, diagnostic mode would be a compiler switch that would
> "scrub" the stack in the procedure epilogue to prevent this... then
> again, you wouldn't compile that way for production code, so it'd just
> be less likely that you'd hit the problem in testing and even more
> likely that it'll show up at the big customer site on their busiest day
> of the year.)

To Dr. Butenhof's excellent description, I would just like to add that
LIBC calls are particularly vulnerable to errors caused in the main
application also because, for efficiency reasons and (wise) design
decisions, they provide minimal or no parameter/memory error detection.
OTOH, many applications do minimal parameter/data checking during their
execution, but often those "sanity" checks just move the point where the
bug blows up to other parts of the application.
For example, suppose that I have a function that silently returns if a null
pointer is passed to it, and then I call libc this way:

char *buf = 0;
// oops, forgot to allocate it

prepare_buffer( buf );  // this silently returns when buf is zero
sprintf( buf, "...", ... );  // bang, segfault in LIBC.

After a month or so, it's hard to remember that prepare_buffer silently
returns when fed a 0 pointer, and at first glance the developer hunting
the bug may say "well, if prepare_buffer didn't crash, then there's
something wrong in sprintf". This is a very simple example, but this class
of "problem-masking bug" can get extremely subtle and complicated in a
real-world application (especially in a multithreaded architecture).

I often met this kind of error in code I had to review, so I took up the
healthy habit of crashing the application whenever a sanity check fails,
instead of just trying to fix things on the fly... (e.g. with assert() in
C and C++, or similar self-made things).
And yes, when there's a segfault in LIBC code, the first thing you should
do is check the operational data that LIBC is being fed, as
the poor, defenseless LIBC cannot parry all the blows the evil apps are
going to deliver to it ;-)
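
To make the crash-on-failed-check habit concrete, here is a minimal sketch
in C. The SANITY_CHECK macro and format_message() are hypothetical names
made up for illustration, and this body of prepare_buffer() is only my
assumption of what such a function might do. It is one possible way to do
it, not a definitive implementation:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical self-made check: report where the check failed, then abort.
   Unlike assert(), it stays active even when NDEBUG is defined. */
#define SANITY_CHECK(cond)                                          \
    do {                                                            \
        if (!(cond)) {                                              \
            fprintf(stderr, "%s:%d: sanity check failed: %s\n",     \
                    __FILE__, __LINE__, #cond);                     \
            abort();                                                \
        }                                                           \
    } while (0)

/* Assumed fail-fast variant of prepare_buffer(): a null pointer now
   stops the program here instead of being silently ignored. */
static void prepare_buffer(char *buf)
{
    SANITY_CHECK(buf != NULL);
    buf[0] = '\0';              /* start from an empty string */
}

/* Hypothetical caller: checks its own inputs before handing them to libc. */
static int format_message(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);   /* compiled out when NDEBUG is defined */
    prepare_buffer(buf);
    return snprintf(buf, size, "value=%d", value);
}
```

With this version, passing a null buf aborts inside prepare_buffer() with
the file and line of the failed check, so the bug hunt starts at the real
culprit rather than inside sprintf.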

Giancarlo Niccolai.
