







                                     Chapter 3.2:

                                  EErrrroorrss iinn RRFFCC 11003366



          IInnttrroodduuccttiioonn

               RFC 1036, the standard du jour for the format of Usenet
          (netnews) messages, contains significant errors, enumerated below.
          References  are made  to RFC  850,  the previous  netnews message
          format standard,  and also  to RFC  822, the mail  message format
          standard (which, note, has been slightly amended by RFC 1123).

          HHeeaaddeerr oorrddeerr iinnssiiggnniiffiiccaanntt

               Between RFC  850 and RFC  1036, a sentence  stating that the
          order of  message headers is insignificant has  fallen out of the
          standard.  This  may be a  reflection of the reality  that B 2.10
          news did indeed care about the order of FFrroomm: and PPaatthh:

          ``RRee:'' iiss oonnllyy tthhrreeee cchhaarraacctteerrss

               One  sees the  contradiction ``the  four  characters "Re:"''
          repeatedly; there should be a space after the colon.

          ccmmssgg iinnccoorrrreeccttllyy ddeessccrriibbeedd

               Similary, RFC 1036 claims that a SSuubbjjeecctt: prefix of ``cmsg''
          will be  interpreted as denoting  a control message;  the correct
          prefix is ``cmsg '' (including a space).
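
               As a minimal sketch of the two corrected prefixes, a C check
          might look like the following.  The function names are
          hypothetical (they are not taken from B News or C News), and
          exact, case-sensitive matching is an assumption.

```c
#include <string.h>

/* Hypothetical helpers; not taken from B News or C News. */

/* Return 1 if the Subject: contents begin with the four characters
 * 'R', 'e', ':', ' ' (assumed case-sensitive), else 0. */
int is_followup_subject(const char *subject)
{
    return strncmp(subject, "Re: ", 4) == 0;
}

/* Return 1 if the Subject: contents begin with the five characters
 * "cmsg " (note the trailing space), else 0. */
int is_cmsg_subject(const char *subject)
{
    return strncmp(subject, "cmsg ", 5) == 0;
}
```

          Requiring the trailing space keeps a subject such as ``cmsgs are
          fun'' from being mistaken for a control message.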

          XXrreeff iiss ttrraannssmmiitttteedd

               RFC 1036 says  that XXrreeff: headers should not be transmitted,
          yet they are  stored on disk as part of  message headers, so they
          will be  transmitted by both B and C  news.  The standard appears
          to be too strict.

          ccaanncceellss sshhoouulldd pprrooppaaggaattee aallwwaayyss

               RFC  1036  says that  _c_a_n_c_e_l  control  messages should  stop
          propagating  if the  receiving system is  ``unable to  cancel the
          message as  requested''.  It is not clear  what this means, given
          that modern news systems hang onto cancellations for not-yet-seen
          articles in hopes of being able  to cancel them in the future.  B
          2.11  interprets absence  of the  target  article as  ``unable to
          cancel''.   It  would improve  the  efficacy  and reliability  of
          _c_a_n_c_e_ls to  propagate them anyway, given  that feed anomalies are
          widespread.   There   have  been  verified   instances  in  which
          cancellations did  not achieve  anywhere near the  propagation of
          the  original article.   In the interests  of robustness,  C News
          interprets absence of the target article as deferred cancellation
          rather than failure to cancel, and propagates the cancel.

          ccaanncceell vvaalliiddaattiioonn

               RFC 1036  requires that a  _c_a_n_c_e_l message have  a SSeennddeerr: or
          FFrroomm: header  matching the message  it is cancelling.   It is not
          entirely clear from the text whether this restriction is supposed
          to be enforced at the originating site or at each receiving site,
          although the latter is implied.

               More seriously,  it is not clear  what ``matching'' means in
          this  context, considering  that  a substantial  fraction of  the
          information  in such  lines  is typically  in  RFC 822  comments.
          There  is an  unfortunate  tradition of  news readers  generating
          header comments in varying ways.  There is also a lot of obsolete
          or misdesigned  news software still  operational, and some  of it
          gratuitously alters  the header comments (and  sometimes even the
          non-comment parts  of the headers!) in  messages passing through.
          While  theoretically   these  complications  should   affect  the
          original and  the cancellation  identically, in practice  this is
          not  consistently   so,  and  it  is   difficult  to  generate  a
          cancellation   that   works  dependably.    This   is  not   just
          speculation;  there  are verified  cases  of  the originators  of
          messages having  considerable difficulty cancelling  them when it
          was important to do so.

               The  value  of  RFC  1036  authentication is  also  somewhat
          questionable.   It provides  no useful  security  against malice,
          because news is  so easy to forge.  While there  is some value in
          preventing accidents, there is  room for doubt as to whether this
          is worth the interference with legitimate cancellations.

               C  News takes  the position  that the  RFC 1036  approach to
          authentication is impossible to implement in a practical way, due
          to   its  vagueness   and  the   prevalence  of   gratuitous  and
          unpredictable header  rewriting, and on balance  the inability to
          cancel is  worse than the largely-illusory  security provided.  C
          News therefore does not authenticate cancellations.

               Doing   something   about    the   problem   is   difficult.
          Specification of  a precise  algorithm for header  matching would
          help,  but  finding one  that  will  disregard gratuitous  header
          mangling  is  hard.   A  more  appealing  approach  would  be  to
          authenticate cancellations by  cryptographic means, but there are
          severe  difficulties in  key distribution  on an  unreliable non-
          real-time  network   like  Usenet,  and  the   cost  of  checking
          cryptographic credentials  is disturbingly high.   Ultimately, it
          may  be   necessary  to  abandon   destructive  control  messages
          entirely,  or reserve  them for rare  emergencies and  route them
          through a trusted moderator for cryptographic authentication.

          iihhaavvee/sseennddmmee nnoott ddooccuummeenntteedd







                                        - 46 -







               The description of  the ihave/sendme protocol is so vague as
          to be  useless to an  implementor.  See the  C news documentation
          for an adequate  description of the protocol.  The description in
          RFC 1036 also contains an error: _r_e_m_o_t_e_s_y_s is not optional; given
          that there may  be multiple message-ids preceding it, there would
          be no  way (other than  ad-hocery) to tell if  the final argument
          were a message-id or a _r_e_m_o_t_e_s_y_s
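
               With remotesys mandatory, splitting the arguments needs no
          guesswork: the last word is the remotesys and everything before
          it is a message-id.  A minimal sketch, assuming the arguments
          have already been split into words (the structure and function
          names are hypothetical, and this is not the C News code):

```c
#include <stddef.h>

/* Hypothetical result type: the split form of an ihave/sendme
 * argument list. */
struct ihave_args {
    char **msgids;      /* points at the first message-id */
    size_t nmsgids;     /* number of message-ids */
    char *remotesys;    /* mandatory final argument */
};

/* Split argv[0..argc-1], the words following "ihave" or "sendme".
 * Returns 0 on success, or -1 if there is not at least one
 * message-id followed by a remotesys. */
int split_ihave_args(int argc, char **argv, struct ihave_args *out)
{
    if (argc < 2)
        return -1;
    out->msgids = argv;
    out->nmsgids = (size_t)(argc - 1);
    out->remotesys = argv[argc - 1];    /* always the last word */
    return 0;
}
```

          No inspection of the final word is needed, which is exactly the
          ad-hocery an optional remotesys would force.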

          CCaassee-sseennssiittiivviittyy iinn mmeessssaaggee-iiddss

               RFC 1036  says nothing  about whether message-ids  are case-
          sensitive or not, thereby punting  the issue to RFC 822.  The RFC
          822 rules  are horrendously complex  and no news  system has ever
          implemented them  correctly.  (B 2.10 considers  them fully case-
          sensitive,  which is  wrong.  B 2.11  considers them  fully case-
          insensitive, which  is also wrong.   C News gets  the normal case
          right, but correct handling of certain obscure RFC 822 constructs
          would require a complex parsing algorithm; fortunately, the cases
          where this  matters appear to be  extremely rare.) Simplification
          appears necessary.
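
               One possible simplification is sketched below.  It treats
          the text before the last `@' as case-sensitive and the text
          after it as case-insensitive.  That is an assumption about what
          the ``normal case'' means, not a transcription of C News code,
          and it deliberately ignores RFC 822 quoting and domain literals.

```c
#include <ctype.h>
#include <string.h>

/* Compare two message-ids of the form <local@domain>.
 * Assumption: the part before the last '@' is case-sensitive and the
 * part after it is case-insensitive.  RFC 822 quoted strings and
 * domain literals are NOT handled.  Returns 1 if equal, 0 otherwise. */
int msgid_equal(const char *a, const char *b)
{
    const char *at_a = strrchr(a, '@');
    const char *at_b = strrchr(b, '@');
    size_t i;

    if (at_a == NULL || at_b == NULL)
        return strcmp(a, b) == 0;       /* no '@': exact match only */
    if (at_a - a != at_b - b || strncmp(a, b, (size_t)(at_a - a)) != 0)
        return 0;                       /* local parts differ */
    for (i = 0; at_a[i] != '\0' && at_b[i] != '\0'; i++)
        if (tolower((unsigned char)at_a[i]) !=
            tolower((unsigned char)at_b[i]))
            return 0;                   /* domains differ */
    return at_a[i] == at_b[i];          /* both must end together */
}
```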

          NNeeww hheeaaddeerrss

               The B news SSuuppeerrsseeddeess:  header needs to be documented in the
          next  revision of  the RFC,  as does  the C  news generalisation,
          AAllssoo-CCoonnttrrooll: (see _r_e_l_a_y_n_e_w_s

          ``KKeeyywwoorrddss''

               Section 2  says that a  header begins with  a ``keyword'' as
          the header  name.  RFC 1036 never defines what  a keyword is, and
          RFC  822  does  not  use  the  term.  ``Keyword''  here  must  be
          considered an informal  term with no precise meaning, imposing no
          additional restrictions on header syntax.

               In particular, things like ``>from: foo@bar'', which cause
          B News  to choke,  appear to  be legal RFC  822 (and  hence 1036)
          headers.  (Before quoting  lexical rules, such as the requirement
          for balancing  brackets, please note  that the 822  lexical rules
          are context-sensitive.)

               Theoretical  legality notwithstanding,  such bizarre  header
          names are dubious  and unwise practice.  RFC 1036 probably should
          be tightened up to exclude them.

          RRFFCC 882222 CCoommmmeennttss

               RFC 1036  section 2 implies, both  in its general discussion
          and  in its  discussion  of the  ``From:'' header,  that RFC  822
          comments  are  not,  in general,  accepted  in  RFC 1036  article
          format.  However,  the point is  not made loudly  and explicitly,
          and  some  nit-pickers   argue  that  RFC  1036  permits  dubious
          practices like timezone name comments in ``Date:'' headers.  This
          needs  to be  nailed down  in black  and white.   C News  takes a
          strict  position  on  this in  cases  where  it  cares about  the
          contents of headers.

          DDuupplliiccaattee HHeeaaddeerrss

               RFC 822 requires that  at most one ``Date:'' header occur in
          a message,  and likewise for ``From:'',  although careful reading
          is needed to discover  this.  It permits more than one ``Message-
          ID:''  or  ``Subject:'' header,  and  is  (of course)  completely
          silent  about ``Newsgroups:'' and  ``Path:''.  With  the arguable
          exceptions of ``From:''  and ``Subject:'', duplicates of required
          headers are highly undesirable in news and cause difficulties for
          current  implementations.   RFC  1036  vaguely implies  that  the
          required headers are expected  to be unique, but never says this.
          This needs to  be made much more precise.  C  News takes a strict
          position and rejects articles with duplicate required headers.
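
               A check of this kind might be sketched as follows.  The
          function names are hypothetical, the list of required headers
          is the one given in RFC 1036, and header names are assumed to
          match without regard to case; this is not the C News
          implementation.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if `line` begins with header name `name` followed by ':',
 * comparing the name without regard to case. */
static int has_name(const char *line, const char *name)
{
    while (*name != '\0') {
        if (tolower((unsigned char)*line) != tolower((unsigned char)*name))
            return 0;
        line++;
        name++;
    }
    return *line == ':';
}

/* Return 1 if any required header occurs more than once among the
 * nlines header lines, else 0. */
int has_duplicate_required(const char *const *lines, int nlines)
{
    static const char *const required[] = {
        "From", "Date", "Newsgroups", "Subject", "Message-ID", "Path"
    };
    size_t r;
    int i, count;

    for (r = 0; r < sizeof required / sizeof required[0]; r++) {
        count = 0;
        for (i = 0; i < nlines; i++)
            if (has_name(lines[i], required[r]) && ++count > 1)
                return 1;
    }
    return 0;
}
```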

                                     Chapter 3.3:

                          KKnnoowwnn PPoorrttiinngg PPrroobblleemmss WWiitthh CC NNeewwss



          IInnttrroo

               C News in general is pretty portable.  People have got it to
          run  on  a  very  wide  range  of systems  with  little  trouble.
          Difficulties  are usually  problems in  the  system, not  C News.
          Some of them, however, are widespread enough to be worth comment,
          for the  guidance of people  having problems.  If you  run into a
          novel  problem, we  are always interested  in hearing  about such
          things.

          UUnniixx DDeeppeennddeenncciieess

               The biggest portability glitch  in C News is that it depends
          a  lot on  Unix utilities.   The extensive  use of  complex shell
          files,  _s_e_d and  _a_w_k programs,  and a wide  range of  lesser Unix
          utilities  would make  it quite  difficult  to move  C News  to a
          system that  is seriously  non-Unix-like.  The actual  C programs
          seldom depend on Unix in major ways.  (An exception is the use of
          _r_e_a_d  system calls  in _e_x_p_i_r_e, to  avoid difficulties  with stdio
          end-of-file behavior;  we now know how to  avoid this but haven't
          implemented the fixes yet.)

               We  know that  _a_w_k and  the colon (:)  operator of  _e_x_p_r are
          problem areas under Minix.

          SShheellll PPrroobblleemmss

               C  News seriously  stress-tests shells.   The  current Minix
          shell is not robust enough in the face of complex inputs, botches
          some  constructs  entirely, and  can  run out  of  memory on  the
          complex  shell files.   Any shell  that is  too old  to implement
          comments  begun  by  ``#'' is  big  trouble,  since  we use  such
          comments everywhere.

               Any system/shell combination that thinks that a shell script
          starting with ``#! /bin/sh''  should  be   run  by  the  C  Shell
          (because it  starts with `#') is also big  trouble: you will have
          to change that line to ``: use /bin/sh''  everywhere.    We  know
          that at  least some releases  of Xenix have this  problem.  It is
          not necessary that  your kernel understand the ``#!'' feature--we
          believe that nothing in C  News relies on it--but it is essential
          that it not cause invocation of the C Shell.

               We know that some Hewlett-Packard Unixes have broken shells,
          probably the result of mistakes in HP's efforts to make the shell
          8-bit-clean; the symptom is that something like:
          x=y
          if test " $x" != " y"
          then
                  echo oops
          fi
          prints ``oops''.  This is, again, big trouble, because we do that
          a lot.

               Many people using 3B1s, aka  UNIX PCs, run the Korn shell as
          their /_b_i_n/_s_h.   Some other folks  may do this  too.  Beware that
          _k_s_h was not fully _s_h-compatible for a long time, with some subtle
          differences  in  the ill-documented  behavior  of backquotes  and
          backslashes.  Some  of the C  News shell scripts,  notably _i_n_e_w_s,
          are known to hit these bugs.   We are _t_o_l_d that current _k_s_hs have
          fixed them.

               It is  reliably reported that  SunOS 4.0._x shells  will dump
          core in some ill-defined circumstances, when the user environment
          (sum  of all  environment variables) is  exactly the  wrong size.
          Perhaps this has been fixed in 4.1.

               It is  reliably reported  that the  VAX 3.1 Ultrix  shell is
          somewhat broken and gives various kinds of trouble.  Switching to
          /_b_i_n/_s_h_5  (note that  this requires  fixing the  first line  of a
          zillion shell files) is reported to banish the problems.

               It is  reliably reported that  recent SunOS shells  give the
          wrong exit status from the wait command: they give a 0 for a
          successful  wait,  rather  than giving  the  exit  status of  the
          process waited for.  This makes inews -W appear to always succeed
          even if backgrounded parts of it failed, which can be troublesome
          in NNTP environments where correct exit status is important.

          MMaakkee PPrroobblleemmss

               There is  a persistent problem on  3B2s with implementations
          of _m_a_k_e that  violate the SVID in a subtle  way.  They attempt to
          execute makefile commands directly, rather than via the shell, if
          the commands do not contain metacharacters.  This means that if--
          as  on  many  3B2s--_t_e_s_t is  a  shell  builtin  _a_n_d  _t_h_e_r_e _i_s  _n_o
          /_b_i_n/_t_e_s_t _p_r_o_g_r_a_m, the makefile line ``test -s file''  will cause
          _m_a_k_e to  complain about an unknown command.   (The SVID says that
          makefile commands  must be executed  as if by the  shell, and the
          shell will  execute this line correctly.) We've  added `;' on the
          ends  of such  lines, which  suffices to convince  _m_a_k_e to  run a
          shell  on the  systems  we've encountered,  but AT&T  is good  at
          finding  ways to  break such workarounds.   This problem  is also
          known to occur in A/UX.

               Another obscure problem, a bug in either make or the shell,
          appears in at least some releases of Ultrix: a construct like
                  ln ... || cp ...
          in a shell file is seen as an error--and make aborts--when the ln
          fails, even though the cp would work.

          OOffffsseettooff

               ANSI  C requires  C compilers  to  supply a  macro _o_f_f_s_e_t_o_f,
          which can be used to find the offset of a structure member within
          the structure.  _R_e_l_a_y_n_e_w_s's header-parsing code uses it, defining
          it  if the  system  has not  supplied it.   Unfortunately, it  is
          really  hard   to  write  a   portable  version  of   this.   The
          implementation we currently use is:
          #ddeeffiinnee ooffffsseettooff(ttyyppee, mmeemm) ((cchhaarr *)&((ttyyppee *)NNUULLLL)->mmeemm - (cchhaarr
*)NNUULLLL)
          The  table in  _r_e_l_a_y/_h_d_r_d_e_f_s._c  puts invocations  of _o_f_f_s_e_t_o_f  in
          initializers.  This  turns out to  be a severe stress  test for C
          compilers.  A compilation error in _h_d_r_d_e_f_s._c is almost certain to
          be problems with this  macro.  Some compilers, notably the one in
          Microport  System V  for the  286,  reject it.   We have  heard a
          report that  System V Release  2 on the  VAX silently miscompiles
          it!   If  you have  trouble  with _o_f_f_s_e_t_o_f,  you  might try  this
          version instead:
          #ddeeffiinnee ooffffsseettooff(ttyyppee, mmeemm) ((iinntt)&((ttyyppee *)NNUULLLL)->mmeemm)

          FFaasstt SSttddiioo RRoouuttiinneess

               We  supply a  set  of fast  standard-I/O  routines that  are
          compatible with most AT&T-derived implementations of _s_t_d_i_o.  They
          speed up  C News quite  a bit.  However,  they don't work  on all
          Unixes.  The  tester program  we supply, which  the library-build
          procedure runs, is thought  to diagnose such problems 100% of the
          time.  It  has been reported in the past  that A/UX and Microport
          386  stdios flunk  the test.   SunOS  4.0 used  to pass  the test
          falsely, but improvements in  both the test and the routines seem
          to have cured  the problems: 4.0.3 passes the test  and as far as
          we can tell, the routines run correctly under it.

               In  any  case, if  you  are feeling  nervous  or are  having
          mysterious problems, telling build that you don't want to use the
          fast-stdio stuff is always  safe.  This is also the best response
          if you have trouble compiling those routines.

          vvooiidd

               Old compilers that don't understand the _v_o_i_d type will choke
          on   much    of   our    code.    There   is    a   commented-out
          ``#define void int'' in  _n_e_w_s._h that cures most  cases of this if
          you uncomment it.   (We have a report that you  might need to add
          ``-Dvoid=int''  to the  Makefile in  _l_i_b_v_7  if you're  using that
          library.) C News does not rely on the ANSI C ``void *''  type  as
          far as we know.

          MMooddeess iinn ffooppeenn

               Unix V7 documented  only ``r'', ``w'', and ``a'' as suitable
          mode arguments to  _f_o_p_e_n.  It actually implemented the read/write
          modes, ``r+'', ``w+'', and ``a+'',  as well, and C News relies on
          them.  Unix reimplementations based on old documentation may have
          trouble here;  we know that at least the  older versions of Minix
          really don't implement these modes.

               A related complication in Minix is that ftell reportedly
          doesn't give the right answer in ``a'' mode.  This makes dbz
          flunk its regression test.
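
               A small probe along the following lines can show whether a
          given stdio implements the read/write modes.  It is a sketch,
          not part of C News; the function name and the caller-supplied
          scratch path are assumptions, and it does not test the ftell
          behavior mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if fopen mode "w+" permits reading back what was written,
 * 0 if the file opens but the read-back fails, and -1 if fopen with
 * "w+" fails outright (unsupported mode or unwritable path).
 * `path` names a scratch file that is removed afterwards. */
int update_mode_works(const char *path)
{
    char buf[8];
    int ok = 0;
    FILE *f = fopen(path, "w+");

    if (f == NULL)
        return -1;
    /* A seek is required between writing and reading on one stream. */
    if (fputs("probe", f) != EOF && fseek(f, 0L, SEEK_SET) == 0 &&
        fgets(buf, (int)sizeof buf, f) != NULL &&
        strcmp(buf, "probe") == 0)
        ok = 1;
    fclose(f);
    remove(path);
    return ok;
}
```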

          MMAAXXLLOONNGG

               The  _r_e_l_a_y/_c_p_u._h file  formerly defined  a  constant _M_A_X_L_O_N_G
          which is  the biggest positive  value of a  _l_o_n_g.  The definition
          was  clever but  failed on some  odd systems  (Unisys?).  Current
          versions  of C  News  generate the  value dynamically  in a  less
          fallible  way, and  check the value  for plausibility.   (This is
          encountered when _r_e_l_a_y_n_e_w_s  is asked to process a single article,
          not a  batch.  This happens  primarily when an  article is posted
          locally,  with  _i_n_e_w_s.)  It is  still  barely  possible that  the
          plausibility check will fail on some bizarre system.
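
               One way to derive such a value at run time is sketched
          below, under the assumption that unsigned long has exactly one
          more value bit than long.  The function names and the
          particular plausibility test are illustrative only; the text
          does not say how C News performs its computation or its check.

```c
#include <limits.h>

/* Build the largest positive long by setting every bit of an
 * unsigned long and shifting out the top bit.  Assumes unsigned long
 * has exactly one more value bit than long. */
long compute_maxlong(void)
{
    return (long)(~0UL >> 1);
}

/* Illustrative plausibility check: the value must be at least
 * 2^31 - 1 and must be one less than a power of two. */
int maxlong_plausible(long v)
{
    unsigned long u = (unsigned long)v;

    return v >= 2147483647L && ((u + 1) & u) == 0;
}
```

          On a system with an ANSI <limits.h>, the result can be
          compared against LONG_MAX as a further cross-check.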

          ddff OOuuttppuutt FFoorrmmaatt

               The _s_p_a_c_e_f_o_r  utility needs to understand  the output format
          of _d_f, unless  you're lucky enough to have a  system that has one
          of the  semi-standard system calls  to report disk  space.  There
          are numerous variations  on _d_f.  _B_u_i_l_d and the relevant makefiles
          know  about  the  more  common  ones,  but customization  may  be
          necessary.  _S_p_a_c_e_f_o_r  is commented well enough  that it should be
          possible  to  figure  out  the  necessary  changes;  usually  the
          initializations  of _n_r  and _n_f  are all  that need  changing.  If
          there  are colons  (:)  in your  _d_f's output  format, you  should
          probably  start from  the  ``sysv'' _s_p_a_c_e_f_o_r,  which attempts  to
          preprocess the  output to get rid of  System V garbage; otherwise
          the ``bsd'' one is a reasonable starting point.

               One constant nuisance is dfs that are too stupid to take a
          directory name as an argument.  The long-term solution to this is
          to edit a suitable variant of spacefor to know about the proper
          arguments.  A short-term solution is to use the ``null'' variant,
          sacrificing  space checking  for  the sake  of getting  something
          working.

               We're told  that HP-UX 7.0 users are  best advised to choose
          the ``bsd'' variant of spacefor, and edit it to call bdf instead
          of df.  Similar approaches may be useful on other hybrid SysV/BSD
          systems.

          FFllooaattiinngg PPooiinntt

               The only places in our code where floating point is used, as
          far as we know, are in  the calculation of expiry dates in _e_x_p_i_r_e
          and the calculation of space in some of the variants of _s_p_a_c_e_f_o_r.
          These  are not  performance bottlenecks,  so  slow floating-point
          arithmetic is not  a problem.  Complete absence of floating point
          would require  only minor modifications.  Note,  however, that we
          use _a_w_k  a lot, and the typical  _a_w_k implementation uses floating
          point extensively.




                                        - 52 -







          338866 OOppttiimmiizzeerr vvss. ddbbzz

               We have  a reliable report  that the System  V 386 optimizer
          (invoked  when _c_c  is given  the -OO  option) miscompiles  the _d_b_z
          package badly  enough to cripple it,  without producing any error
          messages.  The only fix is to compile _d_b_z without -OO.

               SCO Xenix/386 2.3 has the same problem.

          nnnnaaffrreeee aanndd nnnnffrreeee

               We have  a reliable report  that the HP  Spectrum C compiler
          has  an optimiser  bug  that makes  it  throw up  (with a   ``cc:
          Internal   error    3279:   Please   contact    your   local   HP
          representative''  message) on  the _n_n_a_f_r_e_e  macro (and  _n_n_f_r_e_e, a
          historical synonym)  in _h/_n_e_w_s._h.  The  following revised version
          of the macro reportedly avoids the problem.
          #ddeeffiinnee nnnnaaffrreeee(mmeemmpppp) ddoo { iiff (*(mmeemmpppp) != 00) { ffrreeee((cchhaarr *)*(mmeemmp
ppp)); \
             *(mmeemmpppp) = 00; }} wwhhiillee (00)

               It  is also  reliably reported  that the  Microport compiler
          objects to  these macros in  large model.  Whether  the above fix
          would suffice is not known.  Manual expansion [barf!] is known to
          work,  although it  would be  less painful  to define  a function
          containing  the  right code  and  change the  macro  to call  the
          function.  Code for a suitable function can, in fact, be found in
          _h/_n_e_w_s._h, inside `#ifdef lint'.

          AANNSSII CC

               Although we made an  effort to be ANSI-C compatible, lack of
          access to a  real ANSI C compiler limits our  ability to do this.
          A secondary  problem is  that it's  really very difficult  to get
          code that  is 100% acceptable to both ANSI  C compilers and older
          compilers.   Some  issues inevitably  got  resolved  in favor  of
          current  compilers,   and  may  cause  complaints   from  ANSI  C
          compilers.

               Work   is  in   progress  on  moving   us  closer   to  ANSI
          compatibility.  Beware that if __STDC__ is defined by your
          compiler but it is not ANSI compatible, you are on your own as
          far as we're concerned, even if the value is specified as 0.  (We
          can't just use ``#if __STDC__'' because  some preprocessors choke
          on the appearance of an unknown identifier in #if.)

          GGNNUU CC

               If  compiling with  the  GNU compiler,  you may  need the  -
          ttrraaddiittiioonnaall  flag.  Beware,  also,  that if  you  are using  your
          system's _d_b_m library, it contains functions that return structure
          values,  and the  GNU conventions  for  handling such  values are
          incompatible with the ones in many AT&T-derived compilers.  The -
          ffppcccc-ssttrruucctt-rreettuurrnn option cures this.

          AAwwkk PPrroobblleemmss



                                        - 5533 -








               A number  of problems can arise if your  _a_w_k has bugs, since
          the shell files rely on _a_w_k fairly extensively.  For example, _a_w_k
          is a  prime suspect  if _s_p_a_c_e_f_o_r  doesn't work.  We've  fixed the
          worst trouble spots, but would appreciate detailed information on
          any more.

               One  known problem  that  is hard  to  avoid is  line-length
          limits in awk.  In particular, for several purposes in control-
          message  handling   C  News   wants  to  build   a  ``canonical''
          representation of the sys file, with backslashed newlines
          removed.  This is done by NEWSBIN/relay/canonsys.awk.  Most awks
          have limits on line length, and some of the limits are too low to
          cope with long multiply-continued sys lines.  512-byte limits,
          found in a number of old awks, are particularly troublesome.  A
          quick look indicates  that this will interfere, to some uncertain
          extent, with checkgroups and sendsys.  Big deal.  :-)  The
          complaint may also appear from newgroup, but there it should be
          harmless.

               Bart Muyzer and Martijn Roos Lindgreen report that HP-UX 8.0
          _a_w_k  is  badly  broken,  such  that  (for  example)  the  regular
          expression ``/[\t ]/'' will  match backslashes and t's as well as
          tabs  and  spaces.   Installing the  HP-UX  7.0  _n_a_w_k  as _a_w_k  is
          reportedly a workable fix.

          SSyysstteemmss WWiitthhoouutt HHaarrdd LLiinnkkss

               Some vaguely Unixoid  systems have trouble implementing real
          (``hard'')  links.  Examples  are VMS  in  general and  Eunice in
          particular.  There are some hooks for dealing with this, but it's
          not trivial.

               _R_e_l_a_y_n_e_w_s will try to make symbolic links if real ones fail.
          There is  one exception: if _r_e_l_a_y_n_e_w_s cannot  buffer up enough of
          the article  in memory  to find  the `Newsgroups:' line,  it will
          drop the  article into  a temporary file  and will rely  on being
          able  to  move that  to  the  first of  the  `real' locations  by
          manipulating links.  This  should essentially never happen except
          on 16-bit machines, and should be rare even on them.
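
               The fallback from real links to symbolic links can be
          sketched as follows, assuming a system that provides both
          link() and symlink().  The function name is hypothetical and
          this is not the relaynews source.

```c
#include <unistd.h>

/* Give the existing file `from` the additional name `to`.
 * Returns 0 if a hard link was made, 1 if a symbolic link was made
 * instead, and -1 if both attempts failed. */
int link_or_symlink(const char *from, const char *to)
{
    if (link(from, to) == 0)
        return 0;
    if (symlink(from, to) == 0)
        return 1;
    return -1;
}
```

          Because a symbolic link stores the path text itself, a caller
          of such a routine would normally pass an absolute path as the
          first argument.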

               _E_x_p_i_r_e has a -ll option  which tells it to consider the first
          filename  of an  article its `leader',  not expiring  the article
          under that name until it  has expired under all others.  This has
          not been tested much recently.

               The locking system (both C routines and the newslock
          program) will need revision in some system-dependent way to avoid
          use of links.

               There is one minor use of links in installation (inews is
          found  in two  places, and  the Makefile  attempts a  link before
          doing a copy), and substantially more in the regression tests.

          1166-bbiitt MMaacchhiinneess



                                        - 5544 -








               C News  has been tested  on 16-bit machines--indeed,  a good
          bit of  the early development work was done  on one--and does run
          on them.   Nothing relies on ints being  32 bits.  Nothing relies
          on pointers  and ints  being the  same size, as  far as  we know.
          Nothing  relies on  large  address spaces,  although  one or  two
          modules  come in separate  small-space and  large-space versions,
          and the small-space versions are slower.

               However,  there  are some  fundamental  limits to  consider.
          Both  _r_e_l_a_y_n_e_w_s and  _e_x_p_i_r_e--the  usual trouble  spots for  space
          shortages--want to  keep lots of stuff in  core.  There isn't any
          easy way around this one.

          NNuummbbeerr ooff FFiillee DDeessccrriippttoorrss

               There  is  a constant,  NOPENBFS,  in _r_e_l_a_y/_t_r_b_a_t_c_h._c,  that
          defines how  many batch files  are kept open  simultaneously.  If
          you are  feeding much news  to more systems  than this, _r_e_l_a_y_n_e_w_s
          performance will suffer.

               The major  limit on  NOPENBFS is available  file descriptors
          (although on  a 16-bit machine there might also  be a shortage of
          memory for stdio buffers).  Other parts of relaynews want perhaps
          10 file  descriptors for other purposes, so  with the usual total
          supply of  20, a NOPENBFS value  of 10 is the  right default.  If
          you feed  many people,  and your  kernel provides a  process with
          more  than  20  file  descriptors,  you  probably want  to  boost
          NOPENBFS (this can  be done with -DNOPENBFS=xxx in the makefile).
          Remember to leave about 10 descriptors worth of headroom.
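
               The headroom arithmetic above can be sketched in C.  The
          function name suggested_nopenbfs and the constant FD_HEADROOM
          are hypothetical, introduced only for illustration; they are
          not part of C News.  The descriptor limit is passed in by the
          caller, since how to find it varies from system to system.

```c
/* Hypothetical helper, not part of C News: suggests a NOPENBFS value. */
#define FD_HEADROOM 10	/* descriptors relaynews wants for other purposes */

/*
 * Given the per-process file descriptor limit, return how many batch
 * files could be kept open while leaving FD_HEADROOM descriptors free.
 * Never returns less than 1.
 */
int
suggested_nopenbfs(int total_fds)
{
	int n = total_fds - FD_HEADROOM;

	return (n < 1) ? 1 : n;
}
```

               With the usual supply of 20 descriptors this yields 10,
          the default quoted above; the result is what one would give
          to -DNOPENBFS= in the makefile.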

          SShheellll PPrroocceessssiinngg OOrrddeerr

               Normally, shell variable  expansion should take place before
          scanning  for syntax  elements such  as  ``0<&1''.  At  least one
          reimplementation  of  the shell  (specifically,  Bash 1.04)  does
          things in  the wrong order.   This is known to  affect, at least,
          _r_e_l_a_y/_s_h/_a_n_n_e._j_o_n_e_s, which  can be fixed by  changing (circa line
          44)
          "")     UUSSEERR="`wwhhoo aamm ii <&$ffdd |
          to
          "")     UUSSEERR="`eevvaall \"wwhhoo aamm ii <&$ffdd\" |
          or so we are told.

          BBiinnaarryy-MMooddee FFooppeenn

               In several  places, the new _d_b_z uses  ANSI C ``binary mode''
          fopen codes, e.g.  `fopen(..., "r+b")'.       The     text/binary
          distinction  involved is  meaningless under  Unix, and  most Unix
          implementations  just  ignore  trailing  nonsense in  the  second
          argument of _f_o_p_e_n, so it all works out nicely.







                                        - 55 -







               Unfortunately...  at  least  one  version  of  DEC's  Ultrix
          objects to  the `b's, we are  told.  Sigh.  DEC will  have to fix
          this to make their systems ANSI compatible, but heaven only knows
          how long that will take.

               Meanwhile,  the fix  is  to delete  the `b's  in the  second
          arguments  of  the   _f_o_p_e_ns  in  three  places  in  _d_b_m_i_n_i_t()  in
          _d_b_z/_d_b_z._c,  if  your  system has  this  particular  bit of  brain
          damage.
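
               The fix described above is a hand edit of the three
          calls.  As a different technique, a program that must run
          on both kinds of system could strip the `b' at run time.  The
          sketch below assumes this approach; strip_binary_flag is a
          hypothetical name and is not part of dbz.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of dbz: copy an fopen mode string into
 * out, dropping any 'b' characters.  Returns out, or NULL if out is too
 * small to hold the result.
 */
char *
strip_binary_flag(const char *mode, char *out, size_t outlen)
{
	size_t n = 0;

	if (outlen == 0)
		return NULL;
	for (; *mode != '\0'; mode++) {
		if (*mode == 'b')
			continue;	/* the flag the old Ultrix objects to */
		if (n + 1 >= outlen)
			return NULL;	/* no room for this char plus the NUL */
		out[n++] = *mode;
	}
	out[n] = '\0';
	return out;
}
```

               The cleaned-up mode string would then be passed as the
          second argument of fopen in place of the literal.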

          ssiizzee_tt

               Some  systems, notably  from  Microport, do  not define  the
          _s_i_z_e__t type  in the <_s_y_s/_t_y_p_e_s._h>  header.  Worse, the  _s_i_z_e__t in
          that header doesn't necessarily bear any relationship to the ANSI
          C _s_i_z_e__t.  This causes  trouble in the _d_b_z library in particular.
          The  proper type  for _s_i_z_e__t  is whatever  the C  _s_i_z_e_o_f operator
          returns,  nominally an  unsigned type  which  is large  enough to
          contain the  size of any memory object.   We think nothing relies
          too heavily on it being  unsigned.  Note that _s_i_z_e__t must also be
          suitable for use in the two middle arguments of _f_r_e_a_d and _f_w_r_i_t_e,
          the last argument of _m_e_m_c_p_y, _m_e_m_c_h_r, and _m_e_m_c_m_p, and the argument
          of _m_a_l_l_o_c.

          BBaacckkggrroouunndd PPrroocceesssseess vvss. ccsshh

               _I_n_e_w_s runs much of its processing in the background.  We are
          told  that this  can hit  problems,  in some  circumstances, with
          _c_s_h's  manipulations of  signals,  terminal modes,  etc etc.   We
          prefer a  standard shell, and have made  no attempt to understand
          the  C  shell's   weirdnesses.   We're  aware  that  well-written
          programs can fail under the  C shell due to bizarre problems with
          weird signals,  etc., but  we class  this as the  fault of  the C
          shell and its co-conspirators and decline to contort our programs
          to  compensate for  its failings.  We  do sympathize  with people
          victimized by it, but can be of no practical help.

          CCoommpprreessss BBeehhaavviioorr

               Extremely old (pre-1985) versions of _c_o_m_p_r_e_s_s run off at the
          mouth  with a  status message  on _s_t_d_e_r_r  even when  nothing goes
          wrong in the  compression.  This upsets the batcher, which thinks
          any _s_t_d_e_r_r output means trouble.

          uulliimmiitt

               Most versions of System V have the concept of _u_l_i_m_i_t, a per-
          process limit  on how big an individual file  can be.  This limit
          can  be lowered  by  anyone but  raised only  by the  super-user;
          normally  _i_n_i_t or  _l_o_g_i_n initializes it  to some  suitable value.
          Unfortunately, many System Vs set  it far too low, at 1 megabyte.
          This  causes  trouble   with  many  things,  but  in  particular,
          _r_e_l_a_y_n_e_w_s, _e_x_p_i_r_e, etc.  need to be able to work with the _h_i_s_t_o_r_y
          file, which  can easily be several  megabytes.  It's necessary to
          deal with this on all paths  by which any of these programs might



                                        - 56 -







          be  invoked: from  _u_u_c_p or other  transport software  bringing in
          news, from  _c_r_o_n, and by users via _i_n_e_w_s  for local postings.  It
          is  difficult  to  do this  in  a  portable  way when  super-user
          privileges are needed.

          RReessttrriicctteedd SShheellllss

               There is an unfortunate interaction between the `#!' feature
          in shell files and the ``restricted shell'' feature found in some
          Unixes (notably System V): the restricted shell typically is just
          a  different  way  of invoking  /_b_i_n/_s_h,  and  in some  versions,
          careless code just checks the  first letter of the name the shell
          was invoked  under to see  if it was `r'.   Unfortunately, if the
          system has the `#!' feature and there is a shell file named, say,
          _r_n_e_w_s whose first line is `#! /bin/sh', this  shell file will end
          up invoking the restricted shell!

               Simply  deleting the  `#!' line might  fix this;  on systems
          which have the Korn shell, changing `#! /bin/sh' to `#! /bin/ksh'
          might  work.  Otherwise  you will  have to  arrange to  have your
          neighbors  invoke _c_u_n_b_a_t_c_h  instead  of _r_n_e_w_s,  or  else write  a
          simple _r_n_e_w_s  that invokes the  real one under  another name.  It
          would be  wise to check  for other shell files  whose names begin
          with `r', also, as _r_n_e_w_s definitely isn't the only one.
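
               As an aid to that check, here is a minimal sketch of the
          first-letter test such a careless shell is described as making.
          The function name is hypothetical, and the test is an assumption
          based only on the description above, not on any shell's source.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if the last component of path begins
 * with 'r', i.e. a shell file of that name might be taken for a
 * restricted-shell invocation by the first-letter check described in
 * the text.  Returns 0 otherwise.
 */
int
name_starts_with_r(const char *path)
{
	const char *base = strrchr(path, '/');

	base = (base != NULL) ? base + 1 : path;
	return base[0] == 'r';
}
```

               Running this over the names of the installed shell files
          would flag rnews and any others that deserve a closer look.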

          RReemmoottee IInnvvooccaattiioonn vvss. NNoonnssttaannddaarrdd SShheellllss

               When  _n_e_w_s_r_u_n is  invoked on  a  host that  is not  the news
          server, it uses _r_s_h to  propagate news batches to the server.  It
          runs  /_b_i_n/_s_h explicitly  to avoid  major difficulties  with non-
          standard shells, but it has  to rely on the invoker's login shell
          to run  that one command.  This means  _n_e_w_s_r_u_n will emit spurious
          output  if its  invoker's  login shell  is  the C  shell and  its
          invoker's  ._c_s_h_r_c  contains  commands  that generate  output.   A
          similar problem occurs with _b_a_s_h and ._b_a_s_h_r_c.

               The simplest  solution is to use /_b_i_n/_s_h  as the login shell
          for _n_e_w_s_r_u_n's  invoker.  Otherwise, if you  have a networked news
          server, check  that the login shell is  standard enough to invoke
          /_b_i_n/_s_h by executing the following command as _n_e_w_s_r_u_n's invoker.
          rrsshh _n_e_w_s_s_e_r_v_e_r eexxeecc /bbiinn/sshh -cc ttrruuee
          This command should output nothing.

               A slightly  related problem is  that not everyone  calls the
          run-remote-shell  command _r_s_h;  on  System V  in particular,  _r_s_h
          means  something different.   For  the moment  we  have opted  to
          ignore   this  issue,   as  the   possibilities   for  gratuitous
          differences boggle the mind.  People facing this problem may wish
          to place an _r_s_h shell file in the search path to invoke the right
          command in the right way, whatever that is.

          VVaalluueess ooff LLooggiiccaall OOppeerraattoorrss






                                        - 57 -







               There seem to be compilers, e.g. the Ultrix one on DEC's new
          RISC  workstations,  that  go  into  convulsions  when  they  see
          something like
          *pp++ = iissaasscciiii(cc) && iissaallnnuumm(cc);
          because they don't know how to generate a numeric value for `&&',
          or because they don't know how  to turn that value into a `char'.
          One or two places in C News use constructs like this.  If you run
          into this,  you might want  to try replacing  the right-hand side
          with something like ``(...) ? 1 : 0''  to   get  the  troublesome
          operator back into a conditional context.
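
               A minimal sketch of the rewritten form follows.  The
          function flag_alnum is hypothetical and not taken from C News,
          and the ASCII range test is written out by hand rather than
          with isascii, purely to keep the example portable.

```c
#include <ctype.h>

/*
 * Hypothetical example, not taken from C News: for each character of s,
 * store 1 in flags if it is an ASCII alphanumeric and 0 otherwise.
 * The conditional expression keeps && in a conditional context, as the
 * workaround in the text suggests.  flags needs one byte per character.
 */
void
flag_alnum(const char *s, char *flags)
{
	char *p = flags;

	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;

		*p++ = (c < 128 && isalnum(c)) ? 1 : 0;
	}
}
```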

          EEmmppttyy LLiinneess

               Some backward  operating systems (through which  your C News
          distribution may have passed on its way to you), and perhaps some
          stupid text-handling software  even on sane operating systems, do
          not recognize the notion of  an empty line.  They think all lines
          must have  at least one  character in them; the  closest they can
          come to  an empty line  is a line  consisting of a  single blank.
          This matters because  _r_e_l_a_y_n_e_w_s will tolerate white space only in
          certain places  in the _s_y_s file, and  in particular, it tolerates
          empty lines but not  lines consisting solely of white space.  The
          result will  be complaints (in  _e_r_r_l_o_g) about white  space in the
          _s_y_s line for a system named `` ''.

          aaccttiivvee-FFiillee DDaattee

               On  some Bull  systems, at least  ones running  DPX/2 B.O.S.
          1.0, apparently _r_e_l_a_y_n_e_w_s updates the contents of the _a_c_t_i_v_e file
          correctly, but  the file's date remains  unchanged!  This appears
          to be  a kernel bug.  It reportedly upsets  some news readers.  A
          workaround, said to be effective, is to add the line
          uuttiimmee(ccttllffiillee(aaccttrreellnnmm), (ttiimmee_tt *)NNUULLLL);
          after the call to _n_n_a_f_r_e_e in _a_c_t_f_s_y_n_c in _l_i_b_b_i_g/_a_c_t_i_v_e._b_i_g._c.

          eennuumm OOppeerraattoorrss

               Some  compilers have  difficulty compiling  the  _r_e_a_d_n_e_w_s we
          supply, because  they object to  applying the `!'  operator to an
          _e_n_u_m type.  Changing the definition of _b_o_o_l_t_y_p_e in _r_n_a/_d_e_f_s._h to
          ttyyppeeddeeff iinntt bbooooll;
          #ddeeffiinnee ffaallssee   00
          #ddeeffiinnee ttrruuee    11
          is reported to solve this.

          AAmmiiggaa LLiibbrraarryy OOrrddeerriinngg

               It  is reliably  reported that  when  compiling some  of the
          programs under  SVR4 on the Amiga, it is  necessary to give ``-lc
          -lucb'' as library options--linking of the C library _m_u_s_t precede
          linking of the  Berkeley-emulation library, or the code links but
          will not run.

          AAIIXX aanndd MMaacchh vvss. ffssyynncc()




                                        - 58 -







               The _r_e_l_a_y_n_e_w_s regression test fails under some (all?) AIXes,
          because the  system refuses to  do an _f_s_y_n_c on  a file descriptor
          open  to  /_d_e_v/_n_u_l_l It  is  possible that  this  does not  affect
          production use, however.  Mach (at least on the NeXT) is reported
          to have similar problems.

          AAIIXX/337700 vvss. SShheellll FFiilleess

               AIX/370  has added  at  least one  keyword  (``on'') to  the
          shell, and  this is known to cause syntax  complaints in at least
          one shell script (_n_e_w_s_r_u_n_n_i_n_g).  Unless this is also a keyword in
          the final version of POSIX 1003.2, we don't plan to fix this.

          SSttrruucctt CCoonnddiittiioonnaall EExxpprreessssiioonnss

               Some (all?) SCO  Xenix compilers take offense to expressions
          like
          vvaalluuee = (ddbbzziinntt) ? ddbbzzffeettcchh(kkeeyy) : ffeettcchh(kkeeyy);
          where the  functions return _s_t_r_u_c_t values.   This occurs in three
          places  in _d_b_z/_d_b_z_m_a_i_n._c  and the workaround  is to  expand those
          conditionals to statements like:
          iiff (ddbbzziinntt)
                  vvaalluuee = ddbbzzffeettcchh(kkeeyy);
          eellssee
                  vvaalluuee = ffeettcchh(kkeeyy);

          ssttaalleenneessss vvss. UUllttrriixx

               Several  Ultrix  users  have  reported  a problem  with  the
          ``staleness'' command.   It seems Ultrix's _s_e_d  is an antique and
          blows up on the  complex regular expression in _s_t_a_l_e_n_e_s_s.  A fix,
          at some  small cost  in performance, is  to change the  last four
          lines of _s_t_a_l_e_n_e_s_s to something like
          eexxeecc aawwkk '$11 == "/eexxppiirreedd/" { pprriinntt "-oo", $33 }' $NNEEWWSSCCTTLL/eexxpplliisstt

          SSCCOO XXeenniixx ssttrriinngg ffuunnccttiioonnss

               Under SCO Xenix 2.3, and perhaps other recently-released SCO
          systems,  the  string  functions like  _s_t_r_c_h_r  exist  but can  be
          inordinately slow when dealing with long strings.  This is not an
          academic issue:  one symptom is that _r_e_l_a_y_n_e_w_s  takes a long time
          to start  up, eating 10-15  seconds of CPU time  before it starts
          processing articles.  This  apparently is a combination of badly-
          written  code  and  strange internationalization  support.   Just
          using  our string  functions, by telling  _b_u_i_l_d that  your system
          does not  have them, works  much better.  Telling  the compiler -
          nnooiinnttll may be helpful if you don't want to go that far.

          OOlldd SSCCOO XXeenniixx vvss. _s_e_t_v_b_u_f

               The  _d_b_z package  makes  some use  of  the _s_e_t_v_b_u_f  routine.
          Incredible though it sounds, old versions of SCO Xenix reportedly
          had  the order  of _s_e_t_v_b_u_f's  arguments wrong!   If you  have SCO
          Xenix version 2.2 or  earlier, check the arguments to _s_e_t_v_b_u_f: if
          the second and third are a type and a buffer pointer respectively



                                        - 59 -







          (they are supposed to be a  buffer pointer and a type), you're in
          trouble and will have to tinker with the _d_b_z sources.

          SSuunnOOSS 44.11.11 vvss. wwrriittee()

               In some circumstances, a  SunOS 4.1.1 _w_r_i_t_e system call to a
          disk file  can be  interrupted by a  signal.  No other  Unix does
          this, and routines like _f_w_r_i_t_e  are not prepared to cope with it.
          This can result in gratuitous failures of _d_b_z in particular.

               It is thought not to be  a problem in C News, but some other
          packages using  _d_b_z suffer, and we mention it  just in case.  Sun
          acknowledges it as a bug.  The bug-id is 1052649.  It is fixed in
          patch 100293-01.
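
               For code that calls write directly and cannot rely on the
          patch, a common defensive idiom is to retry after an
          interruption.  The sketch below assumes a POSIX-style errno
          value of EINTR for that case; write_all is a hypothetical
          wrapper and is not part of dbz or C News.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper, not part of dbz or C News: write all len bytes
 * of buf to fd, restarting after signal interruptions and continuing
 * after short writes.  Returns 0 on success, -1 on any other error
 * (errno is left set by write).
 */
int
write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: try again */
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}
```

               This does nothing for stdio routines such as fwrite,
          which do their own calls to write.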

          uuuuccpp VVaarriiaattiioonnss

               There are innumerable variations on the details of _u_u_c_p that
          may require appropriate  modifications to _q_u_e_u_e_l_e_n.  For example,
          some versions of Honey DanBer (aka BNU) _u_u_c_p cut all system names
          down to seven characters, and _q_u_e_u_e_l_e_n will have to be altered to
          do likewise.
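
               The truncation rule itself is simple.  The sketch below
          shows it in C for illustration only; queuelen's own code is
          not shown here, and the names uucp_short_name and
          UUCP_NAME_MAX are hypothetical.

```c
#include <stddef.h>

#define UUCP_NAME_MAX 7	/* limit quoted in the text for some uucps */

/*
 * Hypothetical helper: copy a system name into out, keeping at most
 * UUCP_NAME_MAX characters.  out must have room for UUCP_NAME_MAX + 1
 * bytes.
 */
void
uucp_short_name(const char *sysname, char *out)
{
	size_t i;

	for (i = 0; i < UUCP_NAME_MAX && sysname[i] != '\0'; i++)
		out[i] = sysname[i];
	out[i] = '\0';
}
```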

          mmaalllloocc VVaarriiaattiioonnss

               On some  systems, performance is noticeably  better if the -
          llmmaalllloocc library is used, rather than relying on the _m_a_l_l_o_c in the
          standard C library.  A/UX is reportedly an example.

          SSllooww eeggrreepp

               At  least some  System V  suppliers  (including, reportedly,
          Apple in some [now obsolete?] A/UX versions) have broken _e_g_r_e_p in
          such  a  way that  it  is  inordinately slow.   It  may be  worth
          substituting _g_r_e_p  for _e_g_r_e_p in  some of the  shell scripts, with
          appropriate  caution since  they  do not  accept  quite the  same
          pattern syntax.





















                                        - 60 -








                                     Chapter 3.4:

                                    CC NNeewwss vvss. VVMMSS



               To run C News at all, you need a fairly good emulation of
          Unix.  There are several such for VMS.  They have various minor
          imperfections.  The only one we specifically know of that is a
          real problem for C News is the inability to make real links for
          cross-postings.  There is some half-hearted code in various
          places that tries to deal with this situation.  It has not been
          tested too thoroughly.

               Relaynews normally files an article under its first group
          and then makes hard links into further groups.  If relaynews
          finds itself unable to make a hard link, it will try making a
          symbolic link instead.  The one situation where this will fall
          down is if a news article's header is enormous, too big to fit in
          core.  In this case, relaynews stores the article in a temporary
          file, makes links to it under all the appropriate names, and then
          unlinks the temporary name.  This obviously isn't going to work
          if the appropriate-name links are symbolic.  We believe this case
          essentially never happens on large-address-space machines, and is
          quite rare even for small address spaces.
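
               The hard-link-then-symbolic-link fallback can be sketched
          as follows.  This is a minimal illustration under the
          assumption that link(2) and symlink(2) are both available; it
          is not the relaynews source, and link_or_symlink is a
          hypothetical name.

```c
#include <unistd.h>

/*
 * Hypothetical sketch, not the relaynews source: make newname refer to
 * the article stored as oldname.  Try a hard link first; if that fails,
 * fall back to a symbolic link.  Returns 0 on success, -1 on failure.
 * For the symbolic link to resolve, oldname should be an absolute path
 * or one that is valid from newname's directory.
 */
int
link_or_symlink(const char *oldname, const char *newname)
{
	if (link(oldname, newname) == 0)
		return 0;
	return symlink(oldname, newname);
}
```

               Note that a symbolic link made this way dangles once
          oldname is removed, which is exactly the temporary-file
          failure described above.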

               _E_x_p_i_r_e has  a -ll option that tells it  to consider the first
          name of an article as the ``leader'', not to be deleted until all
          others have been deleted.

               The one  place where extra work would  be necessary would be
          _m_k_h_i_s_t_o_r_y, which has no notion that some links are different from
          others.

               So  far as  we know,  we  don't get  into any  of the  other
          trouble areas of Unix emulation  on VMS, at least with the Eunice
          emulator.  We don't  have a VMS handy for testing,  so we make no
          guarantees.



















                                        - 61 -








                                      Section 4:

                                    IImmpplleemmeennttaattiioonn






                                     Chapter 4.1:

                       CCoonnttrrooll MMeessssaaggee IImmpplleemmeennttaattiioonn iinn CC NNeewwss



          IInnttrroodduuccttiioonn

               Netnews  _c_o_n_t_r_o_l   _m_e_s_s_a_g_e_s  are   ordinary-looking  netnews
          articles which contain  the special header CCoonnttrrooll: Such articles
          are  filed  in the  pseudo-newsgroup  _c_o_n_t_r_o_l  and cause  related
          actions by the  local news system, such as mailing  a file to the
          poster of the control message.

          BBuuiilltt-iinnss

               iihhaavvee sseennddmmee and  ccaanncceell are handled internally by _r_e_l_a_y_n_e_w_s
          because processes  cannot share open  _d_b_m databases, there  is no
          standard way  to close them,  and these control  messages read or
          write  the _h_i_s_t_o_r_y  files, including  the  _d_b_m files.   _I_h_a_v_e and
          _s_e_n_d_m_e are also permitted to have message-id arguments containing
          < and >  both of which are rejected by  _r_e_l_a_y_n_e_w_s in arguments to
          externally-implemented  control messages,  since  they are  shell
          metacharacters  (/  and   ..  are  also  bounced)  and  could  be
          indicative of an attempt to do something nasty.
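
               The rejection rule for externally-implemented control
          messages can be sketched as a small predicate.  This is an
          illustration of the rule as stated above, not the relaynews
          source; ctl_arg_ok is a hypothetical name.

```c
#include <string.h>

/*
 * Hypothetical check, not the relaynews source: return 1 if a control
 * message argument is acceptable for an externally-implemented control
 * message, 0 if it contains <, >, / or the sequence "..".
 */
int
ctl_arg_ok(const char *arg)
{
	if (strpbrk(arg, "<>/") != NULL)
		return 0;
	if (strstr(arg, "..") != NULL)
		return 0;
	return 1;
}
```

               The built-in ihave and sendme handlers would skip such a
          check for < and >, since message-ids necessarily contain them.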

          NNoorrmmaall CCoonnttrrooll MMeessssaaggeess

               Most  control  messages  are  implemented  by  _r_e_l_a_y_n_e_w_s  by
          executing the command  line following CCoonnttrrooll: with a search path
          of $_N_E_W_S_C_T_L/_b_i_n:$_N_E_W_S_B_I_N/_c_t_l  and with standard input  set to the
          control message article.   The command inherits the standard news
          search path and _r_e_l_a_y_n_e_w_s user and group ids, typically _n_e_w_s this
          can be  important to  gain access  rights to control  files.  The
          news system  will be locked (by $_N_E_W_S_C_T_L  while the command runs,
          because this  is often  important for manipulating  control files
          from the  command, and  because the  news system is  locked while
          _r_e_l_a_y_n_e_w_s runs.   If that  command returns non-zero  exit status,
          mail is  sent to $_N_E_W_S_M_A_S_T_E_R (usually  uusseenneett Standard output and
          standard  error often  are  redirected to  $_N_E_W_S_C_T_L and  $_N_E_W_S_C_T_L
          respectively; invocations of _r_e_l_a_y_n_e_w_s by _i_n_e_w_s are exceptions.







                                        - 62 -








                                     Chapter 4.2:

                 AA TToouurr TThhrroouugghh tthhee CC NNeewwss LLiibbrraarriieess aanndd IInncclluuddee FFiillees
s



          lliibbcc aanndd ffrriieennddss

          _l_i_b_c contains  routines that are sufficiently  useful and general
          that they could profitably  be added to one's C library.  Indeed,
          on some  systems they are  in the C  library.  Notable inventions
          include _f_g_e_t_m_f_s which  safely reads arbitrarily-long input lines;
          it has an ffggeettmmffss.hh over  in the header directories.  _l_d_i_v is the
          ANSI one, if  you need it.  _m_e_m_l_i_s_t is a  package to ease keeping
          track  of a  lot of  allocated memory.   _s_t_d_f_d_o_p_e_n is  invoked by
          setuid programs to  ensure that the standard file descriptors are
          indeed   open,  opening   /ddeevv/nnuullll  on   each   closed  standard
          descriptor.

          lliibbssttddiioo  contains  new  (faster)  guts  for the  original  _s_t_d_i_o
          library; if  they compile on your V7, 4BSD  or System III system,
          you may want to use them  instead of the default versions in your
          C library.  On System  V, these routines are only slightly faster
          than the versions in the C library.

          lliibbffaakkee  contains  things  that  probably  should  be in  your  C
          library,  but might  not be,  and a couple  of fake  routines for
          system calls you might not have.

          lliibbccnneewwss

          lliibbccnneewwss contains functions  of general use to news software, but
          not  the world  at  large.  _c_o_m_p_l_a_i_n  is like  _w_a_r_n_i_n_g but  never
          prints the symbolic value of _e_r_r_n_o

               There  is a locking  package, containing  _l_o_c_k_d_e_b_u_g _n_e_w_s_l_o_c_k
          _n_e_w_s_u_n_l_o_c_k _e_r_r_u_n_l_o_c_k  and _n_e_m_a_l_l_o_c _l_o_c_k_d_e_b_u_g  enables or disables
          lock  debugging.  _n_e_w_s_l_o_c_k  attempts to  lock the  news transport
          layer against read-then-write  access to the aaccttiivvee file, writing
          to the hhiissttoorryy lloogg eerrrrlloogg  and batch files.  It returns only when
          it has the  lock; in the meantime it sleeps  and retries until it
          gets  the  lock.   There  is  no  timeout;  this  is  a  feature.
          _n_e_w_s_u_n_l_o_c_k  removes  the  above-mentioned  lock if  this  process
          locked the  news system.  _e_r_r_u_n_l_o_c_k is like  _e_r_r_o_r except that it
          unlocks the  news system (via  _n_e_w_s_u_n_l_o_c_k) before exiting;  it is
          should always be  called instead of _e_x_i_t or _e_x_i_t  if there is any
          chance that this process has locked the news system.  _n_e_m_a_l_l_o_c is
          like _e_m_a_l_l_o_c but it calls _e_r_r_u_n_l_o_c_k if it can't allocate memory.

               _l_t_o_z_a  converts a  lloonngg iinntt  to a string  of a  given width,
          containing the decimal  representation, zero-padding as needed on
          the left.  _l_t_o_z_a_n is like _l_t_o_z_a but omits the terminating NUL, so
          it  can be  used  to overwrite  a string  without truncating  it.
          _n_g_m_a_t_c_h returns a  truth-value resulting from comparing a list of



                                        - 63 -







          newsgroup patterns and  a list of newsgroups.  One may substitute
          ``distribution'' for ``newsgroup''.
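
               The zero-padding behaviour described for ltoza and ltozan
          can be sketched as below.  The real functions' signatures are
          not given in the text, so zero_pad and zero_pad_n are
          hypothetical stand-ins, and the error returns are an assumption
          of this sketch.

```c
/*
 * Hypothetical stand-in for the behaviour described for ltozan: write
 * exactly width decimal digits of value into out, zero padded on the
 * left, with no terminating NUL.  Returns 0 on success, -1 if value is
 * negative, width is less than 1, or value needs more than width digits
 * (in which case only the low-order digits have been stored).
 */
int
zero_pad_n(char *out, long value, int width)
{
	int i;

	if (value < 0 || width < 1)
		return -1;
	for (i = width - 1; i >= 0; i--) {
		out[i] = (char)('0' + value % 10);
		value /= 10;
	}
	return (value == 0) ? 0 : -1;
}

/* Same, but also stores a terminating NUL; out needs width + 1 bytes. */
int
zero_pad(char *out, long value, int width)
{
	int status = zero_pad_n(out, value, width);

	if (width >= 1)
		out[width] = '\0';
	return status;
}
```

               Because zero_pad_n stores no NUL, it can rewrite a
          fixed-width numeric field in the middle of a longer string and
          leave the text after the field intact.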

               There is a package of pathname manipulators.  artfile
          returns a name for its filename argument, assumed to be relative
          to $NEWSARTS.  fullartfile promises to return a fully-qualified
          path name.  ctlfile returns a name for its filename argument,
          assumed to be relative to $NEWSCTL.  binfile returns a name for
          its filename argument, assumed to be relative to $NEWSBIN.  cd
          changes to its argument directory, checks for errors, and notes
          the directory name, by making a private copy, for later
          optimisations.  newsumask returns the value of $NEWSUMASK.
          newspath returns the value of $NEWSPATH.  newsmaster returns the
          value of $NEWSMASTER.  All these functions supply default values
          for the NEWS variables if none are found in the environment.  If
          values are found in the environment, they are used, and a
          function named unprivileged is called.

               _r_e_a_d_l_i_n_e   is  like   _f_g_e_t_s  but   executes   _n_e_w_s_l_o_c_k  upon
          encountering EOF and retries the read.  This is used when reading
          growing  files such  as hhiissttoorryy or  batch files.   _s_t_r_l_o_w_e_r down-
          cases an entire string, in  place.  _s_t_r_s_a_v_e is like _s_t_r_d_u_p but it
          calls _n_e_m_a_l_l_o_c rather  than _e_m_a_l_l_o_c _s_t_r_3_s_a_v_e takes three strings,
          allocates  space for their  concatenation via  _n_e_m_a_l_l_o_c including
          terminating NUL, and  concatenates them onto it.  A _N_U_L_L argument
          will  be  replaced  by  an  empty  string.   _t_i_m_e_s_t_a_m_p  writes  a
          timestamp on a given stream,  and returns the current time via an
          argument for later use.
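
               The three-string concatenation described for str3save can
          be sketched as follows.  The sketch uses plain malloc and
          returns NULL on failure, which is an assumption made to keep it
          self-contained; the text says the real function allocates via
          nemalloc.  The name concat3 is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the behaviour described for str3save: return
 * newly allocated storage holding s1, s2 and s3 joined together.  NULL
 * arguments count as empty strings.  Returns NULL if malloc fails; the
 * caller frees the result.
 */
char *
concat3(const char *s1, const char *s2, const char *s3)
{
	char *result;

	if (s1 == NULL)
		s1 = "";
	if (s2 == NULL)
		s2 = "";
	if (s3 == NULL)
		s3 = "";
	result = malloc(strlen(s1) + strlen(s2) + strlen(s3) + 1);
	if (result == NULL)
		return NULL;
	strcpy(result, s1);
	strcat(result, s2);
	strcat(result, s3);
	return result;
}
```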

          UUnniixx-vvaarriiaanntt-ssppeecciiffiicc lliibbrraarriieess

          There are several libraries that provide functions for talking to
          specific  Unix  variants.   These  are basically  functions  that
          change from one variant to another.  lliibbffaakkee (see above) contains
          things which simply might not be there in a particular system.

          These  libraries  all  provide  the  same  virtual  interface  to
          operating-system-dependent  services:  _f_c_l_s_e_x_e_c _f_o_p_e_n_e_x_c_l  _g_e_t_c_w_d
          and  _g_e_t_h_o_s_t_n_a_m_e Implementations  for vanilla  implementations of
          these  variants are provided:  Seventh Edition,  including 4.1BSD
          (lliibbvv77  Eighth  and   Ninth  Editions  (lliibbvv88  4.2BSD  and  later
          (lliibbbbssdd4422  System III  and  System V  (lliibbuussgg  _f_c_l_s_e_x_e_c sets  the
          close-on-exec flag for  a given _s_t_d_i_o stream.  _f_o_p_e_n_e_x_c_l performs
          an ``exclusive create'' open:  the open fails if the file already
          exists.

          AAddddrreessss-ssppaaccee-ssiizzee-ssppeecciiffiicc lliibbrraarriieess

          These libraries provide  alternate versions of the aaccttiivvee and ssyyss
          file  code:   lliibbssmmaallll  should  work  on   any  machine,  but  is
          suboptimally  fast; lliibbbbiigg  has worked  even  on PDP-11s,  and is
          quite fast, but consumes memory without apology.

          IInncclluuddee ffiilleess




                                        - 64 -








          lliibbhh contains  include files unique to C  news.  nneewwss.hh defines a
          few limits, some file names, some types (bboooolleeaann and ssttaattuusstt some
          characters, some status bits, some macros for speed, and declares
          functions  in lliibbccnneewwss  or miscellaneous  functions  in _r_e_l_a_y_n_e_w_s
          (oops!).   ccoonnffiigg.hh declares the  pathname functions  in lliibbccnneewwss
          ffggeettmmffss.hh declares  symbolic values and macros  for using ffggeettmmffss
          lliibbcc.hh  is a  start at  a  V9-like declaration  of all  of the  C
          library.  mmeemmlliisstt.hh defines the interface to mmeemmlliisstt

          hhffaakkee contains a few include files that your system ought to have
          but might  not.  ssttddlliibb.hh is a  degenerate ANSI ssttddlliibb.hh ssttrriinngg.hh
          declares  the string functions.   ttiimmeebb.hh declares  the structure
          used by _f_t_i_m_e












































                                        - 65 -








                                     Chapter 4.3:

                                   LLoocckkiinngg iinn CC NNeewwss



          Several parts  of C News  need some way  of locking parts  of the
          news  subsystem against  concurrent  execution.  Various  system-
          specific locking  system calls exist,  but none of  them is truly
          portable, and most of them provide far more functionality than we
          need.

          C News locking uses the link(2) system call and pre-agreed names.
          Link has the necessary characteristic for safe locking: it is an
          atomic test-and-set operation.  Furthermore, it exists in all
          Unixes.

          All locks are created in the NEWSCTL directory (see Configuration
          Mechanisms in C News for where this directory is to be found and
          how programs can determine this) and have names starting with
          `LOCK'.  To acquire a lock, first create a temporary file in
          NEWSCTL with a name of the form `L.n', where n is your process
          id.  You are urged to also write your process id, in decimal
          ASCII, into this file.  Then attempt to link the temporary file
          to `LOCKx', where x is chosen based on what sort of locking you
          wish to do.  Existing lock names are:

                    LOCK          relaynews, modifications to control files
                    LOCKinput     input subsystem processing spooled input
                    LOCKbatch     batcher preparing batches
                    LOCKexpire    expire expiring articles

          If the link fails, sleep and try again.  If it succeeds, proceed.
          The temporary file may be removed then or at the same time as the
          lock is removed.  Programs are expected to make a determined
          effort to remove lock files when they terminate, normally or as a
          result of signals.
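
          The procedure above might look like this in C.  This is a minimal
          sketch, not the C News source: try_lock, get_lock and
          release_lock are hypothetical names, and error handling is
          reduced to a return code.

```c
#include <stdio.h>
#include <unistd.h>

/* Build "dir/name" in buf; returns 0, or -1 if it does not fit. */
static int mkpath(char *buf, size_t len, const char *dir, const char *name)
{
    int n = snprintf(buf, len, "%s/%s", dir, name);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* One attempt at the lock `lockname` in `ctldir` (NEWSCTL in the text).
 * Returns 0 if the lock was taken, -1 otherwise. */
int try_lock(const char *ctldir, const char *lockname)
{
    char tname[32], temp[1024], lock[1024];
    FILE *fp;
    int status;

    snprintf(tname, sizeof tname, "L.%ld", (long)getpid());
    if (mkpath(temp, sizeof temp, ctldir, tname) < 0 ||
        mkpath(lock, sizeof lock, ctldir, lockname) < 0)
        return -1;

    fp = fopen(temp, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", (long)getpid());   /* pid in decimal ASCII */
    if (fclose(fp) != 0) {
        unlink(temp);
        return -1;
    }

    status = link(temp, lock);  /* atomic: fails if the lock exists */
    unlink(temp);               /* the lock, if made, keeps the pid */
    return status == 0 ? 0 : -1;
}

/* Sleep and retry until the lock is ours. */
void get_lock(const char *ctldir, const char *lockname, unsigned delay)
{
    while (try_lock(ctldir, lockname) != 0)
        sleep(delay);
}

/* Remove the lock; returns 0 on success, -1 on error. */
int release_lock(const char *ctldir, const char *lockname)
{
    char lock[1024];

    if (mkpath(lock, sizeof lock, ctldir, lockname) < 0)
        return -1;
    return unlink(lock) == 0 ? 0 : -1;
}
```

          The sketch removes the temporary file immediately after the link
          attempt, which is one of the two options the text allows.  Note
          that get_lock retries on any failure, including an unwritable
          directory; a real program would want to tell those cases apart.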

          Shell programs have an additional problem in that System V has
          broken ln(1) so that it removes a pre-existing destination file.
          C News therefore provides a pure, simple locking program under
          the name NEWSBIN/newslock (if the recommendations in Directory
          Layout and PATH in C News are followed, this will automatically
          be in the search path of shell programs).  Usage is
          `newslock tempfile lockfile'; exit status is 0 for success, 1 for
          failure, 2 for wrong number of arguments.  No messages are
          printed for normal failure, so no redirection of output is
          needed.
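
          For illustration, the whole job of such a program fits in a few
          lines of C.  The function below is a guess at the behaviour just
          described, not the newslock source; the name newslock_sketch is
          hypothetical, and printing a usage message in the wrong-argument
          case is an assumption.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the exit status described in the text:
 * 0 = lock made, 1 = link failed, 2 = wrong number of arguments. */
int newslock_sketch(int argc, char *argv[])
{
    if (argc != 3) {
        /* Assumption: a usage error is worth a message. */
        fprintf(stderr, "usage: %s tempfile lockfile\n",
                argc > 0 ? argv[0] : "newslock");
        return 2;
    }
    /* Silent on ordinary failure, so callers need no redirection. */
    return link(argv[1], argv[2]) == 0 ? 0 : 1;
}
```

          A main that simply returns newslock_sketch(argc, argv) would turn
          this into a stand-alone command.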

          A suitable locking procedure  for a shell file using the standard
          configuration facilities is:

                                        - 6666 -







          lloocckk="$NNEEWWSSCCTTLL/LLOOCCKKxxxxxx"         # mmooddiiffyy nnaammee aass aapppprroopprriiaattee
          lltteemmpp="$NNEEWWSSCCTTLL/LL.$$"
          eecchhoo $$ >$lltteemmpp
          ttrraapp "rrmm -ff $lltteemmpp ; eexxiitt 00" 00 11 22 1155
          wwhhiillee ttrruuee
          ddoo
                  iiff nneewwsslloocckk $lltteemmpp $lloocckk
                  tthheenn
                          ttrraapp "rrmm -ff $lltteemmpp $lloocckk ; eexxiitt 00" 00 11 22 1155
                          bbrreeaakk
                  ffii
                  sslleeeepp 3300
          ddoonnee
          A template of this form can be found in the file _n_e_w_s_l_o_c_k._s_h.
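
          A C program needs the same cleanup that the trap commands give
          the shell procedure.  One possible arrangement is sketched below;
          hold_lock and drop_lock are hypothetical names, not C News
          functions, and exiting with status 1 on a signal is an assumption
          (the shell template exits 0).

```c
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char held_lock[1024];    /* empty string: no lock held */

/* Remove the recorded lock file, if any.  Only unlink() is called,
 * which is async-signal-safe, so this may run from a handler. */
void drop_lock(void)
{
    if (held_lock[0] != '\0') {
        unlink(held_lock);
        held_lock[0] = '\0';
    }
}

/* Signal handler: clean up, then leave without running atexit code. */
static void on_signal(int sig)
{
    (void)sig;
    drop_lock();
    _exit(1);                   /* assumption: nonzero status on signal */
}

/* Record `path` as the held lock and arrange for its removal on
 * exit() and on signals 1, 2 and 15.  Returns 0, or -1 if too long. */
int hold_lock(const char *path)
{
    static int installed = 0;

    if (strlen(path) >= sizeof held_lock)
        return -1;
    strcpy(held_lock, path);
    if (!installed) {
        atexit(drop_lock);
        signal(SIGHUP, on_signal);
        signal(SIGINT, on_signal);
        signal(SIGTERM, on_signal);
        installed = 1;
    }
    return 0;
}
```

          The program would call hold_lock once the link has succeeded and
          drop_lock when its protected work is done.  The sketch tracks a
          single lock; nothing can be done about SIGKILL or a crash of the
          machine, which is why stale locks remain possible.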

          Although there are various thorny questions associated with
          breaking locks left by dead programs, reboot is a time when
          surviving locks are definitely invalid.  (There are problems even
          here if a networked group of systems is not rebooted as a unit.)
          For this and other reasons, a system running C News should
          execute NEWSCTL/bin/newsboot at reboot time (e.g. from /etc/rc).

