Subj : Re: How to diagnose/resolve libthread panic?
To   : comp.programming.threads
From : Giancarlo Niccolai
Date : Tue Jun 07 2005 04:25 pm

David Butenhof wrote:

> The thread library and libc are especially vulnerable, because they tend
> to be called a lot; so the addresses of their internal data will
> frequently appear on the stack. (I've often thought that a useful,
> though expensive, diagnostic mode would be a compiler switch that would
> "scrub" the stack in the procedure epilogue to prevent this... then
> again, you wouldn't compile that way for production code, so it'd just
> be less likely that you'd hit the problem in testing and even more
> likely that it'll show up at the big customer site on their busiest day
> of the year.)

To Dr. Butenhof's excellent description, I would just like to add that libc calls are particularly vulnerable to errors caused in the main application for another reason: for efficiency and by (wise) design, they provide minimal or no detection of bad parameters or bad memory.

OTOH, many applications do perform some minimal parameter/data checks during execution, but often those "sanity" checks just move the point where the bug blows up to another part of the application. For example, suppose I have a function that silently returns if a null pointer is passed to it, and then I call libc this way:

    char *buf = 0;                // oops, forgot to point it at real memory
    prepare_buffer( buf );        // this silently returns when buf is zero
    sprintf( buf, "...", ... );   // bang, segfault in libc

After a month or so, it is hard to remember that prepare_buffer silently fails (and returns) when fed a 0 pointer, and at first glance the developer hunting the bug may say "well, if prepare_buffer didn't crash, then there's something wrong in sprintf".

This is a very simple example, but this class of "problem-masking bug" can get extremely subtle and complicated in a real-world application (especially in an MT architecture).
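The masking effect described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names prepare_buffer_silent and prepare_buffer_checked and the extra size parameter are hypothetical and do not come from the original code. The second function shows one possible alternative that stops at the failed check instead of returning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper in the style described above: it "protects" itself
 * against a null pointer by returning without doing anything, so the
 * caller never learns that its pointer was bad. */
void prepare_buffer_silent(char *buf, size_t size)
{
    if (buf == NULL || size == 0)
        return;                 /* the caller's bug is swallowed here */
    memset(buf, 0, size);
}

/* Hypothetical fail-fast variant: the same precondition, but a violation
 * aborts right here, next to the real culprit, instead of letting a
 * later libc call take the blame. Note that assert() compiles to nothing
 * when NDEBUG is defined. */
void prepare_buffer_checked(char *buf, size_t size)
{
    assert(buf != NULL);
    assert(size > 0);
    memset(buf, 0, size);
}
```

With the silent variant, a call such as prepare_buffer_silent(NULL, 64) returns normally and the crash only appears in whatever touches the pointer next; with the checked variant, the same call stops the program at the assertion, with the file and line of the failed check in the diagnostic.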
I have often met this kind of error in code I had to review, so I took up the healthy habit of crashing the application whenever a sanity check fails, instead of trying to fix things on the fly (e.g. with assert() in C and C++, or similar home-made macros).

And yes, when there is a segfault in libc code, the first thing you should do is check the operational data that libc is being fed, as the poor, defenseless libc cannot parry all the blows the evil apps are going to deliver to it ;-)

Giancarlo Niccolai.