  WebLoad 4.1.3 User's Guide
  Linas Vepstas (linas@linas.org)
  v4.1.3, 4 January 2000

  WebLoad is a collection of tools for testing and measuring web
  applications and web servers.  The tools can be used to record and
  play back HTTP conversations between a web browser and a web server,
  make timing measurements, and gather statistics.  The true power of
  this tool lies in its stress-test and stress-measurement abilities: It
  can simulate a large number of users accessing a web site in
  structured yet randomized sessions.
  ______________________________________________________________________

  Table of Contents

  1. Overview

  2. New Features, Compatibility

     2.1 Compatibility Issues
     2.2 New Features since Version 3.2.2

  3. QuickStart

     3.1 Intro
     3.2 Prerequisites
     3.3 Collecting a Session File
     3.4 Collecting Traces
     3.5 Performing a Single-User Run
     3.6 Performing a Multi-User Run
     3.7 Accessing External Hosts from Behind a Firewall
        3.7.1 NAT (Network Address Translation) / Masquerading Bastion Firewalls
        3.7.2 Proxy Servers
        3.7.3 SOCKS Bastions and Gateways
           3.7.3.1 SOCKS Unix Instructions:
           3.7.3.2 SOCKS Windows95 or Windows NT Instructions:
              3.7.3.2.1 Hummingbird Installation Instructions
              3.7.3.2.2 The NEC SocksCap Package

  4. Recording Sessions with WebMon

        4.0.1 Limitations
     4.1 Using Webmon
     4.2 SSL Cipher-Suite Codes
     4.3 Theory of Operation
        4.3.1 Proxy Mode
        4.3.2 Rewrite Mode
        4.3.3 Passthru Mode
        4.3.4 Firewall Operation
     4.4 Example Usage
     4.5 Statistics Reported by Webmon

  5. Playing Back Sessions with WebClient

     5.1 Features
     5.2 Command Line Summary
     5.3 Example Usage
     5.4 Example Input Files

  6. Webclient Command Line Flags

     6.1 Cookie Handling
     6.2 GIF Fetching
     6.3 HTTP/1.0, HTTP/1.1, KeepAlive and Multi-Threading
     6.4 Substitution and Re-Writing
     6.5 Header Modification and Key-Value Substitution
        6.5.1 The
        6.5.2 The
     6.6 Example: HTTP Authorization
     6.7 Example: Setting the User-Agent Tag
     6.8 URL-embedded State (URL Cookies)
     6.9 Substituting in GET Requests and POST Bodies
     6.10 Example: Substitution for <<USER>>, <<PIN>>, and <<PASSWD>>
     6.11 Error Detection and Reporting
     6.12 Clean Exit On Error
     6.13 Page Validation with Check Sums
     6.14 Think Time Distributions
     6.15 Other Flags
        6.15.1 Debugging & Tracing
        6.15.2 MultiUser Options

  7. Webclient Request File Format

     7.1 Request File Basics
     7.2 Specifying Blocks of Requests
     7.3 Custom Think Times

  8. Webclient Statistics Reports

     8.1 Definition of Statistics
     8.2 SSL Timing Anomalies
     8.3 Reporting Summary Statistics for Blocks of URL's
     8.4 Obtaining Individual Transaction Completion Times

  9. How To Do Multi-User Runs

     9.1 Workload Parameters
     9.2 Notes on the "Ramp-Up" Process
     9.3 Interpreting the Output
     9.4 Gathering Other Statistics
     9.5 Post-Processing and Data Reduction
     9.6 Stress Testing
     9.7 Large Numbers of Clients

  10. Troubleshooting

     10.1 Error Messages
     10.2 Known Problems or Likely Trouble Spots
        10.2.1 Checksums
        10.2.2 Handles

  11. License

     11.1 License


  ______________________________________________________________________

  11..  OOvveerrvviieeww

  WebLoad is a collection of tools for testing and measuring web
  applications and web servers.  The tools can be used to record and
  play back HTTP conversations between a web browser and a web server,
  make timing measurements, and gather statistics.  The true power of
  this tool lies in its stress-test and stress-measurement abilities: It
  can simulate a large number of users accessing a web site in
  structured yet randomized sessions.


  WebLoad has been loosely derived from the public-domain WebStone
  package, but has seen a major redesign since then, with the addition
  of numerous features and enhancements.  Features include:


  +o  RReeccoorrdd aanndd ppllaayybbaacckk ooff UURRLL''ss,, iinncclluuddiinngg ......

  +o  Ability to handle generic HTTP traffic, including XML-based
     protocols such as OFX.  Custom HTTP headers and bodies may be
     specified.

  +o  Ability to trace/record through multiple servers, capturing entire
     web surfing sessions over many sites.

  +o  Can act as man-in-the-middle, gathering detailed traces of HTTP
     traffic to debug server, browser or proxy bugs.


  +o  Checksum generation and validation to detect mangled, missing or
     misdelivered pages.

  +o  Adjustable timeout to detect non-responding servers.

  +o  Login password support, allowing websites that require customer
     registration and login to be traced.

  +o  Autologoff when error condition is detected, allowing a login id to
     be reused, thus avoiding a manual reset or login timeout.

  +o  Support for cookies and handles embedded in URL's.

  +o  Limited ability to make generic text and key-value substitutions in
     HTTP header and body.

  +o  Support for SSLv2 and SSLv3 encryption.

  +o  Support for HTTP/1.0 and HTTP/1.1 protocols

  +o  Works through Socks & Proxy firewalls.

  +o  Rudimentary support for JavaScript, Frames, Layers, base href tags.
     This allows more complex sites to be correctly traversed.

  +o  Automatic followup of redirects (302 and 304 return codes).

  +o  Works with password-protected sites.




  +o  AAbbiilliittyy ttoo eemmuullaattee aa rreeaall uusseerr tthhrroouugghh aa vvaarriieettyy ooff ffeeaattuurreess,, ssuucchh
     aass ......

  +o  Emulation of the browser image (GIF) cache; cached images, and
     audio/video clips are not re-fetched.

  +o  Fixed or randomly variable "think time" to simulate a user pausing
     to read a web page.  Think time may be assigned per page, or an
     average declared for the session.

  +o  Blocks of URL's can be assigned fractional probabilities for
     playback.  Thus, complex user behaviors, such as visiting only part
     of a web site only some of the time can be emulated.




  +o  AAbbiilliittyy ttoo ggaatthheerr ttiimmeessttaammppss aanndd ppeerrffoorrmmaannccee ssttaattiissttiiccss,, iinncclluuddiinngg
     ......

  +o  Ability to time individual events, as well as gather summary
     statistics for entire sessions.  Features include:

  +o  Ability to timestamp a user session, including socket connect, SSL
     negotiation, socket read and write times, elapsed delay times.
     Timestamps can be obtained during record and during playback.

  +o  Automatic collection of statistics, including average, min, max and
     standard deviations.

  +o  Stats may be reported in detail, in blocks, and as a summary.  For
     example, the standard deviation for socket connection delay for a
     particular URL can be obtained, or the average connect time for a
     block of URL's, or the average end-to-end time over all pages for
     the entire session.




  +o  AAbbiilliittyy ttoo ssttrreessss llooaadd aa sseerrvveerr aanndd ggaatthheerr ssttrreessss ssttaattiissttiiccss
     tthhrroouugghh mmuullttii--uusseerr eemmuullaattiioonn..  FFeeaattuurreess iinncclluuddee ......

  +o  Load Ramp-up. Startup of clients staggered in blocks, avoiding
     initialization bottlenecks, as well as lockstep servicing,
     "sloshing" and other multi-user measurement pitfalls.

  +o  Statistics gathering synchronization.  This guarantees that no
     statistics are gathered during the ramp-up and ramp-down phases.

  +o  Statistics gathering check-pointing.  Statistics are gathered and
     presented only for whole sessions, thus excluding partially begun
     or partially completed sessions that occur at ramp-up and ramp-
     down.  Stats are "rolled back" to last complete session.

  +o  Variety of scripts to extract, summarize and manipulate multi-user
     run reports, including CPU usage, queue lengths, page and session
     times.




  +o  MMuullttii--OOSS ssuuppppoorrtt:: mmoosstt UUnniixx''ss,, LLiinnuuxx,, 9955 aanndd NNTT..


  +o  Ports to other Unix's should be straightforward.

  +o  Multiuser support, stress-loading and tools are not available on
     95/NT due to OS limitations.  (Multiuser support requires shared
     memory; timeouts require alarms; neither is supported on NT/95/98.
     SSL support requires encryption, which is available on NT only
     under restrictive licenses.  In addition, running multiple copies
     of webclient on NT seems to bring out stability problems in the
     NT/95/98 TCP/IP stack (viz. hangs & crashes).)



  22..  NNeeww FFeeaattuurreess,, CCoommppaattiibbiilliittyy

  Version 4.1 of the WebLoad tools adds a number of new features.  In
  the course of adding these features, there have been minor changes
  that break backwards compatibility of scripts with older tools.



  22..11..  CCoommppaattiibbiilliittyy IIssssuueess


  +o  The webbot program is no longer supported.  All features previously
     found in webbot are now supported with webclient.

  +o  The webmon program now uses the -v flag where in the past it used
     the -q flag.  This makes the meanings of these flags the same on
     both webmon and webclient.  This change will break existing
     scripts that use webmon.

  +o  The webclient program uses the HTTP/1.1 protocol by default, rather
     than the HTTP/1.0 that earlier versions used.  By default, it will
     use four threads to fetch gifs in parallel, and obeys the full
     HTTP/1.1 Persistent Connection (KeepAlive) semantics.

  22..22..  NNeeww FFeeaattuurreess ssiinnccee VVeerrssiioonn 33..22..22


  +o  Allows use of custom headers and provides extensive ``header
     re-writing capabilities''.

  +o  Added support for arbitrarily large requests, such as those used by
     OFX and other HTTP-based, non-HTML protocols.

  +o  Request ``headers'' and bodies may be retrieved from separate
     files.

  +o  Support for ``HTTP Authorization''.

  +o  Checksum computation can be disabled on a per-URL basis by
     specifying -1 for the ``checksum''.

  +o  Extended ``input file format'' is both more flexible and easier to
     read.  Backwards compatibility maintained for older files.

  +o  Support for handles ( ``URL-cookies'') embedded in HTTP response
     bodies.

  +o  Gaussian or Exponential ``think-time distributions'' may be
     specified.

  +o  Enhanced ``runtime statistics'' reporting by run.workload allows
     better at-a-glance tracking of performance while multi-user runs
     are underway.




  33..  QQuuiicckkSSttaarrtt


  This chapter provides a quick introduction to the WebLoad tools,
  concepts and procedures.  It should get you up and running quickly
  without having to read a lot.  For more complete information, please
  follow industry standard RTFM procedures.



  33..11..  IInnttrroo

  WebLoad is an HTTP server performance measurement tool suite.  In
  typical usage, there are three phases to using WebLoad:



  1. Collecting and recording a "session" (a sequence of URL's) based on
     an actual tour of a web site with a browser,

  2. Using the session file to make single-user statistical &
     performance measurements,

  3. Using the session file to emulate a load of tens or hundreds of
     users actively accessing the web site.


  WebLoad can also be used in other ways, such as collecting detailed
  traces of actual HTTP traffic, periodically "pinging" a server to see
  if it's alive, checking up on site response times, or validating web
  pages.


  33..22..  PPrreerreeqquuiissiitteess

  The WebLoad tools have been compiled and run under a variety of
  Unix's, as well as Win95 and WinNT.  The current distribution has only
  been tested under Linux, although ports to other platforms should be
  easy.

  Note, however, that not all features are available under 95/NT (due
  to MS limitations).  The multiuser tools cannot be made to work on
  NT/95.  The data analysis scripts require perl to be installed;
  however, without the tools, there's no data to analyze ...


  33..33..  CCoolllleeccttiinngg aa SSeessssiioonn FFiillee

  A sequence of URL's representing a user's session with a browser can
  be collected with the webmon tool.  The webmon program acts as an HTTP
  proxy in that it listens for and accepts connections from a web
  browser, and then forwards the actual URL requests to the true
  destination.  It will record the requests that it receives in a
  session file that can later be "played back" with the webclient tool.


  The webmon tool takes a number of command line parameters that must be
  specified in order for the tool to work; these parameters include the
  port number to listen on, the name of the session capture file, and
  whether, and what kind of SSL encryption should be used.


  Rather than typing in all of the required command-line parameters, it
  is easiest to run webmon from a shell script.  An example AIX script
  is provided in examples/run.webmon; the Windows 95 equivalent is the
  batch file examples/webmon.bat.  You should make a copy of this file,
  and edit it to suit your tastes and particular setup.


  To collect a session log, simply start run.webmon, and then configure
  your browser to use webmon as a proxy.  To do this, you will need to
  find the Preferences menu entry for your browser.  For example, for
  the Netscape browser, follow the menu entries Edit...Preferences...,
  then choose Advanced...Proxies and select Manual proxy
  configuration...View....  You will see a dialog window with a number
  of blanks, including two labeled HTTP Proxy and Security Proxy.  Type
  in the name of the host on which you plan to run webmon, and use 5080
  (the default webmon port number) as the port.  Leave the field for
  the SOCKS Host blank.  Then click on OK to complete the changes.

  You can now use the browser as you ordinarily would, visiting
  whichever sites you please.  Webmon will record and trace your
  activity.  To stop webmon, just kill it.  It cleans up automatically.

  BBee ssuurree ttoo cclleeaarr yyoouurr bbrroowwsseerr''ss ccaacchhee bbeeffoorree ssttaarrttiinngg..  If you've been
  previously surfing a site before you turn on webmon it is likely that
  your browser will work with pages taken from the browser's cache,
  rather than sending new requests to the server.  If the browser
  doesn't contact the server, then webmon can't record the traffic.
  What it will record will look fragemented and confusing.

  Webmon will create a file that resembles examples/http.requests, while
  the messages written to the screen should resemble
  examples/webmon.out.
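
  For reference, a minimal hand-typed invocation of webmon (using the
  flags documented in the command-line summary in chapter 4; the file
  names here are hypothetical) might look like:

       ______________________________________________________________________

       webmon -p 5080 -v http.requests -r webmon.report
       ______________________________________________________________________

  This listens on port 5080, records the browser's requests in
  http.requests, and writes timing reports to webmon.report.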





  33..44..  CCoolllleeccttiinngg TTrraacceess

  webmon can be used to collect raw http traces, showing all requests
  and responses.  Tracing is enabled by specifying the -t flag followed
  by the file name where the trace should be written.  You can
  selectively trace only the client requests, or the server responses,
  by using the --trace-client and --trace-server flags.
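
  For example, to write a full trace, or only the client side of the
  conversation (the trace file name here is hypothetical):

       ______________________________________________________________________

       webmon -p 5080 -t http.trace
       webmon -p 5080 -t http.trace --trace-client
       ______________________________________________________________________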




  33..55..  PPeerrffoorrmmiinngg aa SSiinnggllee--UUsseerr RRuunn

  After a URL trace has been collected, the webclient tool can be used
  to collect single-user timing statistics.  It will take a session of
  URL's recorded with webmon, and play them back in order, measuring a
  variety of statistics for each URL.  To obtain a good sampling of
  average statistics, it can be made to replay the session over again
  multiple times.  To better simulate an actual user, it can be made to
  pause between each fetch.


  Webclient supports a number of optional features.  Among these are:


  +o  compute checksums of returned pages, and compare them to reference
     checksums.  This is useful for detecting situations where the web
     server is returning garbage and/or completely unexpected data.

  +o  generate an error if the web server took too long to respond.

  +o  run a log-off script if an error occurred, so that logged-on
     accounts are not left hanging.


  The best way to create a list of URL's for the webclient program is
  to modify the output trace of the webmon proxy server.  The sequence
  of requests to be played back must be delimited with a <<START>> tag.
  This tag is automatically written by webmon.  The end of the sequence
  of requests must be delimited with the <<END>> tag, followed by two
  numbers: the count and the think time.  The count is the number of
  times that the session should be run.  The think time is the number
  of seconds to pause between each request.  A zero think time causes
  webclient to replay the requests as quickly as it can; a non-zero
  value causes webclient to pause between requests, emulating a user
  reading and thinking about a web page, before moving on to the next
  request.


  To further simulate a realistic setting, the think time can be
  handled in one of two ways: either as a fixed length of time to wait,
  or as a random amount of time to wait, where the average of the
  random times is specified.  To get webclient to use fixed-length
  pauses, specify the think time as a *negative* value.  Webclient will
  pause minus the specified number of seconds.  To get randomized pause
  intervals, specify the think time as a positive quantity.  The
  average length of the pause will then be the specified think time.


  Webmon automatically writes out the <<END>> tag, like so:






  ______________________________________________________________________

  <<END>> COUNT MEAN_THINK_TIME
  ______________________________________________________________________




  This needs to be edited to look more like so:


       ______________________________________________________________________

       <<END>> 5  12.6
       ______________________________________________________________________




  which specifies five repetitions of the session, and an average pause
  of 12.6 seconds.
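
  To make the pauses fixed-length rather than randomized, negate the
  think time, as described above:

       ______________________________________________________________________

       <<END>> 5  -12.6
       ______________________________________________________________________

  This specifies five repetitions with a fixed pause of exactly 12.6
  seconds between requests.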


  An example input file can be found in examples/webclient.input, which
  is just a modified version of the file examples/http.requests which
  was generated by webmon.  The output report that results from running
  webclient can be seen in examples/webclient.report.  In order to
  simplify the running of webclient, the shell script
  examples/run.webclient can be modified and used to run webclient. It
  documents some of the more basic flags that are used with webclient;
  for a full description of the webclient parameters, refer to the
  webclient documentation.




  33..66..  PPeerrffoorrmmiinngg aa MMuullttii--UUsseerr RRuunn

  Performing accurate multi-user performance measurements is harder than
  merely starting 25 copies of webclient within a single shell script.
  This package includes several utilities to simplify the process.
  Foremost is the script run.workload.  It performs one important,
  crucial function: it synchronizes the clients so that none of them are
  making performance measurements until all of them have started.  This
  is vital for getting accurate measurements, as the ramp-up times for
  starting dozens of clients can take dozens of minutes, and the CPU,
  network, paging-space usage will vary wildly during ramp-up.  In
  addition,  one typically wants to start the clients in a staggered
  fashion, so that they are not all fetching the same pages at the same
  times.  The run.workload script provides both this synchronization,
  and the staggered startup.


  The run.workload script also simplifies measurements if multiple
  servers need to be loaded simultaneously.  It also provides facilities
  for specifying multiple, unique logins & passwords for each client.


  Briefly, take the following steps to do a multi-user run:



  +o  review the chapter ``How To Do Multi-User Runs'' below.

  +o  In the examples directory, edit run.workload; modify the number of
     servers, the host addresses and port numbers, etc. as appropriate.

  +o  In the examples directory, edit the passwd file, and add/change any
     login names and pins/passwords as required.

  +o  run run.vmstat on the server machine to get server CPU usage info

  +o  use cputotals script to summarize stats about CPU usage

  +o  use sumstats to boil down multiple client stats into a shorter,
     more manageable collection.




  33..77..  AAcccceessssiinngg EExxtteerrnnaall HHoossttss ffrroomm BBeehhiinndd aa FFiirreewwaallll

  There are many different types of firewalls; the mechanism used to get
  access to the outside world depends on the type of firewall. You
  should consult with the local sysadmin to find out what kind of
  firewall you have.  WebLoad supports three styles of firewalls:
  NAT/Masq, Proxy, and SOCKS.


  If you are reading this, you probably don't have a NAT/Masq firewall,
  as this would be completely transparent to you, and you wouldn't know
  that you were behind a firewall.


  If you have a choice between a Proxy and SOCKS firewall, the Proxy is
  recommended.  This is because the Proxy is much easier to configure,
  and the code/implementation is simpler, more robust, and less prone to
  unpleasant surprises.  SOCKS may yield slightly better performance,
  maybe.




  33..77..11..  NNAATT ((NNeettwwoorrkk AAddddrreessss TTrraannssllaattiioonn)) // MMaassqquueerraaddiinngg BBaassttiioonn FFiirree--
  wwaallllss

  These should be completely transparent to clients behind the firewall.
  All of the WebLoad clients should "just work".  If they do not, check
  that the default route (the route for all external IP packets) is
  aimed at the firewall.  If the route (netstat -r) appears OK, then
  check with the firewall admin that port 80 and/or port 443 (the SSL
  port) have not been filtered or disabled.
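
  The route check mentioned above can be performed like so (the
  firewall's address, of course, varies per site):

       ______________________________________________________________________

       netstat -rn
       ______________________________________________________________________

  The line whose destination is "default" (or 0.0.0.0) should list the
  firewall's address as its gateway.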




  33..77..22..  PPrrooxxyy SSeerrvveerrss

  Proxy Servers are a special kind of webserver designed to relay HTTP
  traffic between the internal, protected network and the external
  world.  Proxy servers are typically installed on the bastion host,
  and are typically configured to listen to port 80 or 1080 for HTTP
  traffic.


  The two clients webmon and webclient can chat proxy-HTTP with proxy
  servers. Simply specify the proxyhost:portnum with the -P flag
  (capital P flag).


  If you are able to access ordinary web sites, but not SSL-secured
  websites with the -P flag, then please verify with the sysadmin that
  the proxy understands the CONNECT protocol, and that it is configured
  to allow CONNECT's.
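
  For example, to record a session through a proxy firewall (the proxy
  host name and port here are hypothetical):

       ______________________________________________________________________

       webmon -p 5080 -P proxy.example.com:8080 -v http.requests
       ______________________________________________________________________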




  33..77..33..  SSOOCCKKSS BBaassttiioonnss aanndd GGaatteewwaayyss

  SOCKS-based firewalls require specially-modified clients and special
  configuration files.



  33..77..33..11..  SSOOCCKKSS UUnniixx IInnssttrruuccttiioonnss::

  The "r" versions of the clients, rwebmon and rwebclient, are special
  socks-ified versions of the usual clients that know how to get
  through a SOCKS-style firewall.  They do require special
  configuration.


  There must be a properly configured socks client file in
  /etc/socks.conf.  This file must be installed by root.  A sample
  socks.conf file can be found in the examples directory.  This file
  specifies which IP addresses lie behind the firewall, which ones are
  external, and how to route external addresses to the appropriate SOCKS
  server.  Ask the sysadmin to help setup and configure the
  /etc/socks.conf file.


  Alternately, a socks server can be specified with the SOCKS_SERVER
  environment variable.  For example, if the machine soxy.lady.com has
  a SOCKS server configured to listen on port 1080, then try




       ______________________________________________________________________
       export SOCKS_SERVER=soxy.lady.com:1080   # in sh/bsh/bash/ksh

       setenv SOCKS_SERVER soxy.lady.com:1080   # in csh/tcsh
       ______________________________________________________________________







  33..77..33..22..  SSOOCCKKSS WWiinnddoowwss9955 oorr WWiinnddoowwss NNTT IInnssttrruuccttiioonnss::

  There are several packages available for Win95 that implement the
  SOCKS package.  Two of note are the NEC SocksCap package, and the
  Hummingbird SOCKS package.  It appears that the NEC package will not
  inter-operate properly with some SSL libraries used by the WebLoad
  software, causing Win95 to lock up hard.  The Hummingbird package
  appears to work quite well.  The instructions below detail
  installation & operation of the Hummingbird package.


  The Hummingbird package is available free-of-charge at
  <http://www.hummingbird.com/products/socks/>.  It is also accessible
  through <http://www.hummingbird.com/freestuff.htm>.  The license
  appears to be fairly unencumbered & does not seem to restrict use in
  any way.




  33..77..33..22..11..  HHuummmmiinnggbbiirrdd IInnssttaallllaattiioonn IInnssttrruuccttiioonnss


  1. The download file is called socksx86.zip

  2. unzip it in directory C:\EXCEED\SOCKSX86

  3. While running Win95, open a DOS shell.

  4. Run .\INSTALL.BAT at the command line

  5. Warning: double-clicking on the INSTALL.BAT icon from the Win95
     file browser will cause installation to fail.  It will *NOT*
     install this way. It MUST be run from the Windows shell command
     line.  Furthermore, this must be done from a DOS shell opened while
     Win95 is running.  It will not install properly if Win95 is not
     running.

  6. Edit C:\WIN\SYSTEM\SOCKS.CNF.  This file uses the same format and
     syntax as the Unix /etc/socks.conf file; so if you have one handy
     that works well, just copy it to C:\WIN\SYSTEM\SOCKS.CNF.
     Otherwise, edit this file and add a line similar to the following:



       ______________________________________________________________________
          SOCKD4 @=9.3.199.116 0.0.0.0 0.0.0.0
       ______________________________________________________________________





  The first number is the IP address of your socks server.  The second
  two numbers are masks.  In this example, the masks are set so that
  *all* IP traffic to & from your box will go through the socks server,
  even if it is internal traffic.  You may want to change this so that
  internal traffic does not go through socks.

  That's it.


  After installing Hummingbird SOCKS, note that *all* TCP/IP
  applications are socks-ified, and get access to the external world.
  In particular, Netscape and MSIE will both access the external world,
  even if they are configured with proxy turned off (i.e. configured
  for direct connection to the Internet).

  The webmon and webclient exe's have been tested and work just fine
  with the Hummingbird socks package.





  33..77..33..22..22..  TThhee NNEECC SSoocckkssCCaapp PPaacckkaaggee

  The following instructions are xxxx'ed out because the SocksCap
  package causes Win95 to lock up hard when used with these clients on
  my machine.  You may have better luck, and so I left these
  instructions in.  Note that the Hummingbird SOCKS package described
  above does work.

  Installing SOCKS for Win95/NT:


  1. Go to  <http://www.socks.nec.com/>

  2. Download SocksCap for Win32

  3. Run sc32r103.exe (the setup file)

  4. Start SocksCap (sc32.exe)

  5. Choose config from the menu; set up your socks server and port,
     user (your name), and protocol (socks 4 or socks 5; probably
     socks 4 at most sites)

  6. Create a new "application profile"; add the name of your
     application and its working directory

  7. Start the application from the SocksCap control panel by double-
     clicking

  To run a program under SocksCap from the command line or a batch file:


       ______________________________________________________________________
       C:\fullpath\to\sockscap\sc32.exe c:\fullpath\to\your\app  -flags -for -your-app
       ______________________________________________________________________








  44..  RReeccoorrddiinngg SSeessssiioonnss wwiitthh WWeebbMMoonn

  WebMon is an HTTP tracing and capture tool.  It can be used to
  examine the HTTP dialogue between a web browser and a web server.  It
  provides a variety of features & functions:



  +o  Record a web-surfing session.  The recorded URL's can then be
     played back with the webclient tool.

  +o  Write a trace of all HTTP traffic between the server & browser to a
     file.  The trace will show both requests and responses, and is
     useful for debugging header and data problems.

  +o  Use SSL to connect to either the browser, the server, or both.  In
     a very limited way, it can be used to initiate "man-in-the-middle"
     attacks.  Non-SSL browsers can be made to connect to SSL servers,
     and vice versa.

  +o  Provide request and response timing information about an actual
     browsing session.  Since timing is done at the socket level, no
     browser overhead is included in the timings.  The browser will see
     a somewhat degraded response, but the measurements produced by
     webmon will be repeatable and far more accurate than a hand with a
     stopwatch.

  +o  Rewrite and make substitutions in the HTTP request, and in the
     reply.  Substitutions include values in the HTTP request and
     response headers, as well as specific strings that occur in the
     text bodies.  Rewriting support is partial, experimental, and
     poorly documented at this time.  It usually works... but contact
     the maintainer for additional support.

  WebMon's most popular use is to provide the input files for the
  webclient tool.



  44..00..11..  LLiimmiittaattiioonnss

  webmon does not try to match the connection behavior of the browser
  in how it fetches pages.  It will open distinct and unique sockets
  for each web-page GIF fetch, instead of reusing an existing socket
  like many web browsers.  In particular, it does not emulate the HTTP
  "KeepAlive" protocol, nor does it use multiple parallel connections
  to the web server.  Thus, it cannot be used to diagnose connection-
  handling problems.  In most respects, webmon appears to be a single-
  threaded process to the web server.




  44..11..  UUssiinngg WWeebbmmoonn

  In the default mode of operation, webmon acts as a proxy server.
  Therefore, you must configure your browser to use webmon as a proxy,
  as described in the previous chapter.  Specific example usages are
  given in the next section, below.

  XXX this section needs more work ...


  The following command-line options are defined:


  ______________________________________________________________________
  Usage: webmon [options]

  -a, --act-as-proxy        behave as a proxy server
  -A, --alarm=<time>        turns on timeout alarms (delay 'time')
  -d, --debug               print basic debugging messages
  -D, --Debug               turns on verbose debugging
  -E, --Debug-debug         turns on extra verbose debugging
  -e, --print-each          print individual response time observations
  -h, --help                print this message
  -n, --no-xlation          disable host/url translation/rewriting
  -p, --listen-port=<port>  specifies the port to listen to
  -P, --proxy=<proxy:port>  specifies the proxyserver and port
  -q, --debug-time          turns on debugging output for timing statistics
  -Q, --Debug-time          turns on verbose timing debug output
  -r, --report-file=<file>  specifies the report file name
  -t, --trace-file=<file>   write HTTP traces to file
  -U, --user-agent=<string> specify value of 'User-Agent:' in HTTP header
                            string must be enclosed in single quotes
  -v, --request-file=<file> record browser requests in this file
  -w, --webserver=<server:port> specifies the webserver and port

  Flags without short-form equivalents:
  --access-log=<file>       write webserver-style access log
  --no-bug-compat           enable strict conformance to HTTP standards
  --quiet                   minimize messages written to stdout
  --show-progress           write out each URL as it's fetched
  --trace-client            write exchanges with client to trace file
  --trace-server            write exchanges with server to trace file
  --version                 print version info and exit

  SSL options:

        --ssl-server           use SSL to connect to server
    -S, --server-cipher=NUM    force use of a specific cipher
        --server-timeout=NUM   set SSL session cache timeout (seconds)
        --ssl-browser          use SSL to connect to browser
        --browser-cipher=NUM   force use of a specific cipher
        --browser-timeout=NUM  set SSL session cache timeout (seconds)
        --certif=<filename>    specify file containing certificates
    -k, --keyring=<filename>   specify file containing keys
    -K, --password=<passwd>    specify keyring password
    -N, --distinguish=<name>   specify a distinguished name

  Cipher suites may be specified in the form SSLV2:N with N = 1,2,3,...7,
  or SSLV3:NN for NN = 00, 01, 02 ... 06 or 0A.  If a cipher suite has
  been specified, then no cipher suite negotiation is performed.
  ______________________________________________________________________






  44..22..  SSSSLL CCiipphheerr--SSuuiittee CCooddeess

  The Cipher-suite numbering convention follows that of the SSLv3 spec.
  Below is a partial listing; see the spec for a full listing.


  SSLV2 Cipher suite names and N  values:





  ______________________________________________________________________
  SSL2_RC4_128_WITH_MD5 =                 1
  SSL2_RC4_128_EXPORT_40_WITH_MD5 =       2
  SSL2_RC2_128_CBC_WITH_MD5 =             3
  SSL2_RC2_128_CBC_EXPORT40_WITH_MD5 =    4
  SSL2_IDEA_128_CBC_WITH_MD5 =            5
  SSL2_DES_64_CBC_WITH_MD5 =              6
  SSL2_DES_192_EDE3_CBC_WITH_MD5 =        7
  ______________________________________________________________________




  The Cipher values +rc4export and +rc2export in the magnus.conf file
  for the webserver appear to correspond to cipher suite specifications
  of 2 and 4, respectively.  (Yes, they are backwards....)


  SSLV3 Cipher suite names and NN values:



       ______________________________________________________________________
       SSL_NULL_WITH_NULL_NULL =                   0x0000
       SSL_RSA_WITH_NULL_MD5 =                     0x0001
       SSL_RSA_WITH_NULL_SHA =                     0x0002
       SSL_RSA_EXPORT_WITH_RC4_40_MD5 =            0x0003
       SSL_RSA_WITH_RC4_128_MD5 =                  0x0004
       SSL_RSA_WITH_RC4_128_SHA =                  0x0005
       SSL_RSA_EXPORT_WITH_RC2_CBC_40_MD5 =        0x0006
       SSL_RSA_WITH_DES_CBC_SHA =                  0x0009
       SSL_RSA_WITH_3DES_EDE_CBC_SHA =             0x000A
       ______________________________________________________________________





  OpenSSL supports a large number of additional ciphers not listed here.
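
  The SSLV2:N / SSLV3:NN spec strings accepted by the cipher flags split
  cleanly at the colon.  A small sketch of that parsing (the spec value
  here is just an example, not a recommended cipher):

```shell
#!/bin/sh
# Sketch of splitting a cipher-suite spec string (SSLV2:N or SSLV3:NN)
# into its family and number parts, as the -S flag's argument is shaped.
spec='SSLV3:04'
family=${spec%%:*}    # everything before the colon -> SSLV3
number=${spec##*:}    # everything after the colon  -> 04
echo "family=$family number=$number"
```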




  44..33..  TThheeoorryy ooff OOppeerraattiioonn

  Webmon can be used in one of three different ways (which may seem
  similar, but have important, fundamental differences).  Let's term
  these the "proxy" mode, the "rewrite" mode, and the "passthru" mode.


  44..33..11..  PPrrooxxyy MMooddee

  For most users, the "proxy" mode is suggested for normal operation, as
  it is the simplest to configure, understand and use.  In this mode,
  webmon acts as an ordinary web proxy.  That is, it will listen for URL
  requests that have a server name embedded in them, and will contact
  these servers on behalf of the browser.  Thus, a typical HTTP request
  line sent by the browser to webmon will look like "GET
  http://some.where.com/some/file.html HTTP/1.0".  Upon receipt, webmon
  will contact the host some.where.com on port 80 and will issue it the
  usual HTTP request "GET /some/file.html HTTP/1.0".  It will then wait
  for a response, and pass this response back to the browser,
  unmodified.  Essentially all proxies work in this way.
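
  The request-line transformation described above can be sketched in a
  few lines of shell (the request line is just the example value from
  the text):

```shell
#!/bin/sh
# Sketch of the request-line rewrite a proxy like webmon performs.
req='GET http://some.where.com/some/file.html HTTP/1.0'

# Pull the host out of the absolute URL ...
host=$(echo "$req" | sed 's|^GET http://\([^/]*\)/.*|\1|')
# ... and keep only the path for the request sent to the origin server.
path=$(echo "$req" | sed 's|^GET http://[^/]*\(/[^ ]*\).*|\1|')

echo "connect to $host on port 80"
echo "issue: GET $path HTTP/1.0"
```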

  When webmon is used in proxy mode to access secure (encrypted) web
  sites, it does do something unusual.  Instead of passing the secure
  connection request to the secure server, it will instead present its
  own credentials, initiating a "man-in-the-middle" attack.  This allows
  webmon to decrypt the traffic from the browser and record it, i.e. to
  snoop on it.  It does this by listening for the CONNECT HTTP
  directive, and instead of connecting, it responds with its own
  certificate, as if it were the end host.  As long as the browser
  accepts the certificate, webmon will be able to snoop on the browser's
  traffic.

  Note that webmon is distributed with a working example certificate in
  the examples directory.  This certificate contains an export-grade
  (512-bit) private key and a signature from a fictitious certification
  authority.  This signing authority will not be recognized as official
  by most browsers, and thus, most browsers will put up warning dialogs
  when presented with this certificate.  If a signed certificate were
  obtained from the usual recognized authorities, webmon could be used
  to snoop on encrypted traffic without the user's knowledge (provided
  the user had configured the browser to use webmon as a proxy).

  To use webmon in the "proxy" mode, specify the -a and -n command line
  flags.



  44..33..22..  RReewwrriittee MMooddee

  Another mode of operation for webmon is the "rewrite" mode.  In this
  mode, the browser accesses webmon as if it were the true server, i.e.
  it uses webmon's URL as the server URL.  As in the other modes, webmon
  will pass the request on to the true server, and wait for a reply.
  When it receives the reply, it will "rewrite" it, replacing all
  embedded URL's and converting them into local URL's.  In this way, it
  can keep the browser from ever "escaping" to a different server: any
  link that the user clicks on will always be a link that is "local" to
  the webmon server, and thus the browser will always direct this
  request back to webmon.

  In "rewrite" mode, the browser *must* be specifically aimed at webmon,
  and at the port that webmon listens to, and webmon must in turn be
  aimed at a specific website.  One must tell webmon about an initial
  web server with the -w flag, thus for example: webmon -w
  www.fictitious.com.  Do *not* use the -a or -n flags in this mode;
  they are mutually incompatible.  Then, in order for the browser to
  access the site www.fictitious.com, the user must request the URL
  http://your.webmonhost.yourdomain.com:5080/.  Similarly, if the user
  wants to view http://www.fictitious.com/some/page.html, they should
  request http://your.webmonhost.yourdomain.com:5080/some/page.html.  As
  long as the user just clicks on links, and does not type any in,
  webmon will capture the surfing session and record it.
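
  The URL mapping that rewrite mode performs can be illustrated with a
  single sed substitution (the host names are the fictitious examples
  from the text above):

```shell
#!/bin/sh
# Sketch of rewrite mode's URL translation, using the example hosts.
remote='http://www.fictitious.com/some/page.html'
local_url=$(echo "$remote" | \
    sed 's|^http://www\.fictitious\.com|http://your.webmonhost.yourdomain.com:5080|')
echo "$local_url"
```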

  Note that in "rewrite" mode, webmon needs to perform some amount of
  parsing of HTML pages in order to find links and re-write them.  It is
  a very simple parser, and it can get confused by more complex sites.
  It does understand some basic JavaScript, and forms, but it can be
  confused by layers and ilayers, and will occasionally trip over a
  <base href= tag.  Because of this, the rewrite mode is not the
  recommended mode of operation.

  Note that a certain amount of SSL testing can be done in rewrite mode
  that is difficult to do in other modes.  In this mode, webmon can
  accept unencrypted connections, and convert them to encrypted
  connections; it can also do the reverse.  The --ssl-server flag is
  used to indicate that an SSL session should be negotiated with the
  server; the --ssl-browser flag is used to negotiate an SSL session
  with the client.  Note that a certificate is needed when talking SSL
  to the client.  Instead of allowing SSL to negotiate a cipher suite,
  the --server-cipher and --browser-cipher flags can be used to
  explicitly specify an SSLv2 or SSLv3/TLSv1 cipher suite.



  44..33..33..  PPaasssstthhrruu MMooddee

  The "passthru" mode is the most transparent mode of operation that
  webmon has.  In this mode, all re-writing is disabled.  It neither re-
  writes the requests (as is done in proxy mode) nor does it rewrite the
  responses (as it does in rewrite mode).  It simply passes any received
  request to an indicated server.  This is done by using the -w and the
  -n command-line flags together.

  The most useful form of this mode is when the -w flag is used to
  specify a proxy (and *not* the -P flag), and the browser is told that
  webmon is a proxy.  In this configuration, webmon will simply record
  all the traffic between the browser and the proxy, without changing
  any of it.

  Note that in pass-through mode, webmon will still listen for the
  CONNECT protocol directive used for establishing encrypted sessions,
  and will initiate a man-in-the-middle attack on the traffic.  In this
  way, it can still snoop on the encrypted traffic.



  44..33..44..  FFiirreewwaallll OOppeerraattiioonn

  All three of the modes described above can be used with either SOCKS
  or proxy style firewalls.  Firewall operation proceeds as described
  elsewhere in this document, and webmon will always "do the right
  thing" in such situations.  These combinations are not described
  further here, since their operation is transparent and detailing them
  would only confuse the discussion.



  44..44..  EExxaammppllee UUssaaggee

  One typically wants to use a number of command-line flags with webmon,
  and it is easiest to put these into a shell script or batch file, and
  treat it as a config file.  The example file examples/run.webmon
  illustrates such a file.  Various important switch settings are
  reviewed below.


  To configure webmon to act as an HTTP proxy, listening to port 5080,
  and recording the session to a file my.session, start webmon as
  follows:



       ______________________________________________________________________
                webmon -a -n -p 5080 -v my.session
       ______________________________________________________________________




  Note that long-form versions of these flags can be used: these require
  more typing, but can serve as memory aids to make the file more
  readable:




  ______________________________________________________________________
           webmon --act-as-proxy          \
                  --no-xlation            \
                  --listen-port=5080      \
                  --request-file=my.session
  ______________________________________________________________________




  When webmon starts it will print a summary of the command-line
  options, and then indicate that it is listening to a particular port.
  The tail end of this message will look like so:



       ______________________________________________________________________
                Info: webmon: listening for connections at yourmachinename:5080
                Info: webmon: waiting for a connection.......
       ______________________________________________________________________





  XXXXXX UUNNDDEERR CCOONNSSTTRRUUCCTTIIOONN Finish writing this section...

  To collect timing statistics and put them in the file webmon.report,
  the command would look like the following:



       ______________________________________________________________________
                 webmon -a -n -p 5080 -r webmon.report
       ______________________________________________________________________




  To trace all traffic exchanged with the webserver, writing it to the
  file webmon.session:


       ______________________________________________________________________
                  webmon -a -n -p 5080 -t webmon.session
       ______________________________________________________________________




  or restrict the trace to one side of the conversation with the
  --trace-server or --trace-client flags:


       ______________________________________________________________________
                  webmon -a -n -p 5080 --trace-server -t webmon.session
                  webmon -a -n -p 5080 --trace-client -t webmon.session
       ______________________________________________________________________




  To do everything as above, but connect to the server using SSLv3 and
  cipher suite 04 (SSL_RSA_WITH_RC4_128_MD5):




  ______________________________________________________________________
             webmon -a -n -p 5080 -S SSLV3:04 -v webmon.session
  ______________________________________________________________________




  or start webmon with the --ssl-server option (use SSL), and ...

  The script file examples/run.webmon is included to give you examples
  of how to use webmon.  You can use these scripts directly, if you
  like.  Take a look at them to see how they are used.





  44..55..  SSttaattiissttiiccss RReeppoorrtteedd bbyy WWeebbmmoonn


  webmon prints the following statistics to its console and to the
  report file.  The existence of a report file (-r option) tells webmon
  to calculate and print these timing statistics:



     rreessppoonnssee ttiimmee
        elapsed time from connect until all data for the indicated
        request to arrive. This time does *not* include time spent to
        resolve host names into IP addresses.


     ffiirrsstt ddaattaa rreessppoonnssee ttiimmee
        time from connect until the first message containing body data
        is received.


     ccoonnnneecctt ttiimmee
        time required to establish a connection to the host for this
        request.  If you are using SSL, it includes the SSL connect
        time.


     hheeaaddeerr ddeellaayy
        time from when the request was sent to the host until the HTTP
        header was received.


     bbyytteess ttrraannssffeerrrreedd
        number of header and body bytes received.


     ttrraannssffeerr ttiimmee
        time from when the header was received until the last byte of
        body was received.


  In addition, the following statistics are placed in the report file,
  for each request, if SSL is enabled:


     SSSSLLCCoonnnneeccttOOvvhhdd
        The time spent in the SSL code running on the local host.  This
        time is indicative of CPU cycles spent on the client that are
        independent of any processing occurring on the server.

     NNeettwwoorrkk DDeellaayy CCoonnnneecctt
        The time spent in TCP/IP and waiting for the response from the
        host during an SSL connection handshaking & negotiation.


     SSSSLLHHeeaaddeerrOOvvhhdd
        The time spent in local SSL code during the processing for the
        HTTP header.


     NNeettwwoorrkk DDeellaayy HHeeaaddeerr
        The time spent in TCP/IP and waiting for the HTTP header to be
        returned.


     SSSSLLTTrraannssffrrOOvvhhdd
        The time spent in SSL during transfer of the body of the
        message.


     NNeettwwoorrkk DDeellaayy TTrraannssffeerr
        The time spent in TCP/IP and waiting for the body to be
        returned.

  The following invariants are true about the various statistics:



       ______________________________________________________________________

       response time = connect time + header delay + transfer time

       connect time  = SSLConnectOvhd + Network Delay Connect

       header delay  = SSLHeaderOvhd + Network Delay Header

       transfer time = SSLTransfrOvhd + Network Delay Transfer
       ______________________________________________________________________
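
  A quick numeric check of these invariants, using made-up millisecond
  values (the numbers are illustrative only, not typical measurements):

```shell
#!/bin/sh
# Hypothetical timings (milliseconds) plugged into the invariants above.
ssl_connect_ovhd=12;  net_delay_connect=88
ssl_header_ovhd=3;    net_delay_header=47
ssl_transfr_ovhd=9;   net_delay_transfer=141

connect_time=$(( ssl_connect_ovhd + net_delay_connect ))    # 100
header_delay=$(( ssl_header_ovhd + net_delay_header ))      # 50
transfer_time=$(( ssl_transfr_ovhd + net_delay_transfer ))  # 150
response_time=$(( connect_time + header_delay + transfer_time ))

echo "response time = $response_time ms"                    # 300 ms
```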







  55..  PPllaayyiinngg BBaacckk SSeessssiioonnss wwiitthh WWeebbCClliieenntt

  WebClient is an HTTP submitter that simulates the network behavior of
  a web browser (or other HTTP client).  It can be used to measure the
  performance of a web server.  In normal operation, WebClient will read
  a session's worth of URL's from a file, submit these to a webserver
  host, and time the responses.  Statistics are computed and shown for
  one or more repeated session runs.



  55..11..  FFeeaattuurreess

  WebClient provides the following features and functions:



  +o  Handles POST, HEAD and GET HTTP commands.

  +o  Supports SSL encrypted connections, including cipher suite
     negotiation.

  +o  Will work through both HTTP Proxy and SOCKS firewalls, using either
     encrypted or unencrypted connections.

  +o  A session of URL's can be repeated multiple times in order to
     generate averages, standard deviations, worst-case and best-case
     timing statistics.

  +o  Blocks of URL's in a session can be assigned a fractional weight,
     causing that block to be fetched only a fraction of the time.  This
     feature is useful for simulating a variety of "typical" user paths
     through a web site, where the user might visit different parts of
     the site on different occasions.

  +o  WebClient can pause either a fixed amount of time, or a random
     amount of time (exponentially distributed) between URL fetches.
     This "think time" can be used to simulate a user's pauses at the
     keyboard, and is particularly useful in generating a realistic
     multi-user load upon a webserver.

  +o  Different think-times can be specified at different points of the
     script.

  +o  Think-time distributions of exponential, gaussian or fixed may be
     specified.

  +o  Will automatically follow a chain of redirects issued by the web
     server.  That is, if the web server replies with the 30x series
     status codes, webclient will fetch the web page from the new
     redirected location.

  +o  If enabled, will automatically analyze the web page for embedded
     GIF's, JPEG's, audio files, etc., and will fetch these.  The times
     required to fetch GIF's are recorded.

  +o  Will emulate the caching behavior of a browser with regard to the
     GIF/JPEG/etc. files.  That is, if an image has already been
     fetched, it will not be fetched again, thus more accurately
     representing client load and leading to better statistics.

  +o  Statistics can be reported for blocks of URL's.  This is
     particularly useful if the block is normally treated as a logical
     "whole" by the web browser: for example, a collection of frames
     (framesets), or multiple separate script files that are fetched
     with a page.  (Redirects and GIF's are already handled
     automatically, framesets and scripts/applets are not.)

  +o  Automatically handles "cookies", accepting and returning cookies as
     appropriate.

  +o  Can use checksums to validate the page that the webserver returned.
     If the webserver returns a webpage whose checksum does not match
     the previously computed checksum, an error is reported.

  +o  Provides a two-phase startup protocol.  Clients can be started at
     staggered intervals, so that web traffic is randomized; however, no
     statistics are gathered until a synchronize flag is written.  This
     allows accurate steady-state multi-user measurements to be made, by
     excluding the ramp-up phase from the statistics measurements.

  +o  Avoids printing session fragment statistics.  A session fragment
     can occur if webclient is interrupted in the middle of a session.
     Rather than including this fragmentary session in the statistics,
     webclient will roll back to the last complete session when
     computing summary statistics, and when reporting the completion
     time, it will report the time of completion of the last whole
     session (rather than the current time).   Similarly, fragmentary
     sessions can also occur during ramp-up.  Statistics gathering does
     not start until the first whole session after full ramp-up.

  +o  Provides ability to substitute text in submitted URL's.  Typically,
     the substitution is for username or password values, and is used
     during multi-client runs so that a single input file can be used
     for all clients, while assigning each client a unique
     username/pin/password.


  +o  A timeout alarm can be specified which will generate an error
     report if the webserver does not respond within the indicated
     period.

  +o  Ability to report timestamps for each request issued to the
     webserver, also, the ability to report detailed statistics for each
     request.

  +o  Allows for graceful cleanup in case of error or interruption.  If
     webclient detects an HTTP error, or if it is interrupted (e.g. by a
     control-C), it can issue the HTTP requests in a cleanup file to
     (for example) log off a user (using that user's current cookie
     cache).

  +o  WebClient does *not* support KeepAlive (although some preliminary
     work has been done to enable this.  Contact the author for more
     details and the availability schedule).

  +o  WebClient and WebMon do *not* handle framesets automatically,
     although they should be modified to do so.
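
  The text substitution described in the feature list above (driven by
  the --substitute and -u flags) amounts to simple key replacement in
  the request text.  A sketch using sed, with the <<USER>>/<<PIN>>/
  <<PASS>> keys that the -u flag maps to (the credential values are
  placeholders):

```shell
#!/bin/sh
# Sketch of the key/value substitution webclient applies to URL's and
# POST bodies; the credential values here are made-up examples.
body='login=<<USER>>&pin=<<PIN>>&pw=<<PASS>>'
out=$(echo "$body" | sed -e 's|<<USER>>|alice|' \
                         -e 's|<<PIN>>|1234|' \
                         -e 's|<<PASS>>|s3cret|')
echo "$out"
```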



  55..22..  CCoommmmaanndd LLiinnee SSuummmmaarryy

  By typing webclient -h or webclient --help at the command line, a
  summary of command line options will be printed, as shown below.  More
  detailed documentation for each of these flags is presented in the
  section ``Webclient Command Line Flags'' below.


  ______________________________________________________________________
  Usage: webclient [options]

  -A, --alarm=<time>       turns on timeout alarms (delay 'time')
  -c, --cache-emulation    turns on the gif cache emulation (requires -g flag)
  -d, --debug              print basic debugging messages
  -D, --Debug              turns on verbose debugging
  -E, --Debug-debug        turns on extra verbose debugging
  -e, --print-each         print individual response time observations
  -f, --input-file=<file>  specifies the url list file to be run
  -g, --fetch-gifs         turns on fetching of gif files
  -h, --help               print this message
  -i, --ignore-checksums   don't validate web pages with checksums
  -L, --log-file=<file>    specify name of error log file
  -m, --shmem=<child:shmkey> defines common shared memory segment
  -P, --proxy=<proxy:port> specifies the proxyserver and port
  -q, --debug-time         turns on debugging output for timing statistics
  -Q, --Debug-time         turns on verbose timing debug output
  -R, --random-seed=<seed> specify seed used to generate random think times
  -r, --report-file=<file> specifies the report file name
  -t, --trace-file=<file> write HTTP traces to file
  -u, --user-pin-pw=<username:userpin:userpasswd> same as specifying
          --substitute=<<USER>>:username
          --substitute=<<PIN>>:userpin
          --substitute=<<PASS>>:userpasswd

  -U, --user-agent=<string> specify value of 'User-Agent:' in HTTP header
          string must be enclosed in single quotes if it contains [{( etc...
  -v, --new-url-file=<file>     recalculate checksums, write new input file
  -W, --wait-interval=<seconds> pause after each trial, before starting next
  -w, --webserver=<server:port> specifies the webserver and port
  -x, --timestamps              write request start and end timestamps

  Flags without short-form equivalents:
  --access-log=<file>           write webserver-style access log
  --clean-exit=<file>           specifies urls to run on error or interrupt
  --cookie-path=<URL>           make sure that cookie was set for path
  --fork                        run in the background after validating args
  --handle=<field>              replace handle values in header from returned page data
  --header-add=<field:val>      substitute or add field-value to HTTP header
  --header-file=<file>          use the HTTP header found in this file
  --header-subst=<field:val>    substitute field-value to HTTP header
  --http-version=<float>        use HTTP/1.0 or 1.1 protocol
  --no-bug-compat               enable strict conformance to HTTP standard
  --no-keep-alive               do not use keep-alive sockets to fetch gifs
  --num-threads=<int>           number of threads to use for gif fetching
  --num-sessions=<int>          override number of times to replay session
  --quiet                       minimize messages written to stdout
  --skip-stats                  don't collect or print performance statistics
  --show-progress               write out each URL as it's fetched
  --substitute=<key:value>      replace "key" with "value" in POST or URL
  --think-time=<float>          override the think time in the input file
  --think-fixed                 think time specifies a fixed time interval
  --think-exponential           think time is random exponential distribution
  --think-gaussian              think time is random gaussian distribution
  --version                     print version info and exit
  --warn-checksums              print warnings when page checksums are bad
  SSL options:

        --ssl-server           use SSL to connect to server
    -S, --server-cipher=NUM    force use of a specific cipher
        --server-timeout=NUM   set SSL session cache timeout (seconds)
        --ssl-browser          use SSL to connect to browser
        --browser-cipher=NUM   force use of a specific cipher
        --browser-timeout=NUM  set SSL session cache timeout (seconds)
        --certif=<filename>    specify file containing certificates
    -k, --keyring=<filename>   specify file containing keys
    -K, --password=<passwd>    specify keyring password
    -N, --distinguish=<name>   specify a distinguished name

  Cipher suites may be specified in the form SSLV2:N with N = 1,2,3,...7,
  or SSLV3:NN for NN = 00, 01, 02 ... 06 or 0A.  If a cipher suite has
  been specified, then no cipher suite negotiation is performed.

  ______________________________________________________________________








  55..33..  EExxaammppllee UUssaaggee

  The shell script run.webclient in the examples directory provides a
  brief introduction to the most commonly used options.  It is
  reproduced below:


  ______________________________________________________________________
  #!/bin/sh
  #
  # run.webclient
  #
  # This is a simple script that makes it easier to run the "webclient"
  # tool.  You should modify this script to suit your installation,
  # tastes and style.
  #
  # To run this script, simply say "run.webclient"
  #
  #----------------------------------------------------------
  # set the environment so that the actual binaries can be found
  export PATH=$PATH:../bin
  export LIBPATH=$LIBPATH:../lib
  #
  #----------------------------------------------------------
  # Flags:
  # -A 10: log an alarm if the server doesn't respond within 10 seconds
  #
  # -i: ignore checksum errors when comparing checksums recorded in the
  #     input file to the checksums of the returned web pages.  This is
  #     useful when the web page has a date embedded in it, and the date
  #     has changed since the input file has been created.
  #
  # -f webclient.input: specify the input file containing the URL's to test
  #
  # -w webbank.com:8080:  use the web server on webbank.com, port 8080
  #
  # -r webclient.report: write a report to the file "webclient.report"
  #
  # -t webclient.trace: write a trace of all data exchanged with the
  #    webserver to the file "webclient.trace".
  #
  # webclient -A 10 -i \
  #       -f webclient.input \
  #       -w webank.com:8080 \
  #       -r webclient.report \
  #       -t webclient.trace
  #
  #----------------------------------------------------------
  # Same as above, except that SSL encryption should be used on the
  # connection
  #
  # --ssl-server:      Negotiate the SSL cipher to use
  #                    Note the double-dash: this is a "long-form" option.
  #                    Not all long-form options have single-letter equivalents;
  #                    there are not enough letters in the alphabet.
  #
  # webclient -A 10 -i \
  #       -f webclient.input \
  #       --ssl-server \
  #       -w webank.com:443 \
  #       -r webclient.ssl.report \
  #       -t webclient.trace
  #
  #----------------------------------------------------------
  # Same as above, but the client is told to claim that it is a browser
  # of a particular variety (in this case, Netscape version 4 for AIX,
  # with english-language defaults).
  #
  # webclient -A 10 -i \
  #       -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)' \
  #       -f webclient.input \
  #       --ssl-server \
  #       -w webank.com:443 \
  #       -r webclient.ssl.report \
  #       -t webclient.trace
  #
  # Some other User-Agent strings:
  # -U 'Mozilla/3.0 (Win95; I)'     Netscape Version 3 for Windows 95
  # -U 'Mozilla/3.04 (Win95; U)'    Netscape Version 3 for Windows 95
  # -U 'Mozilla/2.02 (OS/2; U)'     Netscape Version 2 for OS/2
  # -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)'           NS for AIX
  # -U 'Mozilla/4.05 [en] (X11; U; Linux 2.0.32 i586)'      NS for Linux
  #
  # Note that Internet Explorer tries hard to be compatible in every way:
  # -U 'Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)'    MSIE for W95
  #
  # Mozilla is not the only possible User-Agent name:
  # -U 'Konqueror/1.0'             KDE File Manager desktop client
  # -U 'Lynx/2.7.1 libwww-FM/2.14' Lynx command line browser
  #
  #----------------------------------------------------------
  # As above, except that the host to be accessed lies on the far side
  # of a proxy bastion firewall.
  #
  # -P proxy.server.com:80  The proxy server listens on port 80
  #
  # webclient -A 10 -i \
  #       -P proxy.server.com \
  #       -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)' \
  #       -f webclient.input \
  #       --ssl-server \
  #       -w webank.com:443 \
  #       -r webclient.ssl.report \
  #       -t webclient.trace
  #
  #----------------------------------------------------------
  # As above, except that the URL's in "cleanup.input" will be sent to
  # the webserver if webclient is interrupted (with a control-C, or with
  # a "kill -USR1 <pid-of-webclient>"), or if webclient detects an error
  # (such as a URL that is not found, a bad checksum, or other HTTP
  # problem).  The goal of cleanup is to log the user out gracefully
  # if/when an error occurs.
  #
  # -P proxy.server.com:80  The proxy server listens on port 80
  #
  # webclient -A 10 -i \
  #       -P proxy.server.com \
  #       -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)' \
  #       -f webclient.input \
  #       --ssl-server \
  #       -w webank.com:443 \
  #       --clean-exit=cleanup.input \
  #       -r webclient.ssl.report \
  #       -t webclient.trace
  #
  #----------------------------------------------------------
  #
  # Putting it all together:
  # the command below should allow a connection to be made to a real
  # server on the net.  To get past the home page you will need a real
  # account and password, but the command should at least reach the
  # home page.
  #
  webclient -A 120 -i \
          -P proxy.asdf.com:80 \
          -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)' \
          -f nb.client.input \
          --ssl-server \
          --clean-exit=cleanup.input \
          -w server.nationonline.com:443 \
          -r nb.ssl.report -t nb.trace
  #
  #
  #----------------------------------------------------------
  #----------------------------------------------------------
  #----------------------------------------------------------
  # Some further flags & examples:
  #
  # -S SSLV3:04:       Use SSLV3 cipher suite 04 (128-bit RC4)
  #
  # -k mykeydb.kdb:    use the keyring 'mykeydb.kdb'.
  #
  # -K linas:          the password to the keyring file is 'linas'
  #
  # -b mykeystash.sth: a password stash file can be used instead of
  #                    specifying the password on the command line.
  #
  #----------------------------------------------------------
  # As above, except that the SOCKS-ified version of webclient is used.
  # The socks-ified version is able to reach external hosts from
  # behind a SOCKS-style firewall.  Note that SOCKS must be correctly
  # configured on your host machine.  Use SOCKS only if you do not have
  # a proxy server, or cannot get access to a NAT/Masquerading firewall.
  #
  # export SOCKS_SERVER=socksy.lady.com
  export SOCKS_SERVER=socks.hosiery.com
  #
  # rwebclient -A 10 -i \
  #       -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)' \
  #       -f webclient.input \
  #       -S SSLV3:04 -k mykeydb.kdb -K linas \
  #       -w webank.com:443 \
  #       -r webclient.ssl.report -t webclient.trace
  #
  #----------------------------------------------------------
  #----------------------------------------------------------
  #----------------------------------------------------------
  ______________________________________________________________________






  55..44..  EExxaammppllee IInnppuutt FFiilleess

  Webclient supports two styles of input files: 'oldstyle' and
  'newstyle'.  The new format was chosen to allow for greater
  flexibility.  The old format is supported for backwards compatibility.
  More detailed documentation for the input file formats is provided in
  the section ``Webclient Input File Format'' below.

  The following is an example of the old file format:













  ______________________________________________________________________
  # this is an example comment
  # a session must start with a <<START>> delimiter
  <<START>>
  #
  # the HTTP method is followed by a run count and checksum information
  GET /logon 1.00 983210 324602 152 110 491
  GET /logon/validate.js -1.00 11358531 4156403 1740 286 117
  #
  # data for the body of a post appears on the same line, before the run count.
  POST /proclogin.ns UserID=<<USER>>&Pin=<<PIN>>&Password=<<PASS>>&PageGood=%2Fbank%2Finit.html&PageBad=%2Flogon%2Flogon.html%3FStatus_Code%3Dlogon_error&ProductName=Netscape&ReleaseVersion=4.04+%5Ben%5D&ClientOS=X11&CountryCode=USA&Language=ENU&x=25&y=9 1.00 1439849 597641 193 184 286
  GET /bank/history.html?phase=SHOW_FORM&account_index=5 1.00 1910842 913002 228 216 541
  POST /bank/history.html transaction_type=++1&transaction_source=&transaction_period=&start_date=&end_date=&low_amount=&high_amount=&low_check_number=&high_check_number=&maximum_number_of_transactions=99&enter.x=45&enter.y=19 1.00 2106710 969606 260 248 541
  GET /bank/acctxdet.html?account_index=5&reference_number=119115157977 1.00 1506792 691736 186 174 541
  GET /bank/history.html 1.00 2106710 969606 260 248 541
  POST /bank/history.html done.x=47&done.y=5 1.00 1439832 597624 193 184 286
  GET /proclogoff.ns 1.00 276785 105313 38 23 621
  #
  # note that fully qualified URL's can be used as well:
  GET http://bank.com/homepage.html 1.00 456982 10188 42 73 12
  #
  # The end of the session is delimited by <<END>> followed
  # by the playback count (5) and the think time (12.0 seconds)
  <<END>> 5 12.0
  ______________________________________________________________________





  An example of a new style input file follows below.  Note that the
  mark-up is more verbose and more XML-like.


































  ______________________________________________________________________
  # this is an example comment
  # a session must start with a <<START>> delimiter
  <<START>>
  #
  # The details of a request are broken out into multiple lines
  <<REQUEST>> GET /logon
  <<COUNT>> 1.00
  <<CKSUM>> 983210 324602 152 110 491
  <<REQUEST>> GET /logon/validate.js
  <<COUNT>> -1.00
  <<CKSUM>>11358531 4156403 1740 286 117
  #
  # <<MARK>> means that summary statistics should be generated
  # for both of the above URL's taken together as one.
  <<MARK>>
  #
  # <<THINK>> implies that this is where the client should pause to think.
  <<THINK>>
  #
  <<REQUEST>>POST /proclogin.ns
  # The body of the post occurs between BODY tags
  <<BODY>>
  UserID=<<USER>>&Pin=<<PIN>>&Password=<<PASS>>&PageGood=%2Fbank%2Finit.html&PageBad=%2Flogon%2Flogon.html%3FStatus_Code%3Dlogon_error&ProductName=Netscape&ReleaseVersion=4.04+%5Ben%5D&ClientOS=X11&CountryCode=USA&Language=ENU&x=25&y=9
  <</BODY>>
  <<COUNT>> 1.00
  <<CKSUM>> 1439849 597641 193 184 286
  <<MARK>>
  <<THINK>>
  #
  <<REQUEST>>GET /bank/history.html?phase=SHOW_FORM&account_index=5
  <<COUNT>> 1.00
  <<CKSUM>> 1910842 913002 228 216 541
  <<REQUEST>> POST /bank/history.html
  <<BODY>>
  transaction_type=++1&transaction_source=&transaction_period=&start_date=&end_date=&low_amount=&high_amount=&low_check_number=&high_check_number=&maximum_number_of_transactions=99&enter.x=45&enter.y=19
  <</BODY>>
  <<COUNT>> 1.00
  <<CKSUM>> 2106710 969606 260 248 541
  <<MARK>>
  <<THINK>>

  <<REQUEST>> GET /bank/acctxdet.html?account_index=5&reference_number=119115157977
  <<COUNT>> 1.00
  <<CKSUM>> 1506792 691736 186 174 541
  <<MARK>>
  <<THINK>>

  <<REQUEST>> GET /bank/history.html
  <<COUNT>> 1.00
  <<CKSUM>> 2106710 969606 260 248 541
  <<MARK>>
  # for this interval only, the think time will be 45 seconds
  <<THINK>> 45.0

  <<REQUEST>> POST /bank/history.html
  <<BODY>>
  done.x=47&done.y=5
  <</BODY>>
  <<COUNT>> 1.00
  <<CKSUM>> 1439832 597624 193 184 286
  <<MARK>>
  <<THINK>>

  <<REQUEST>> GET /proclogoff.ns
  <<COUNT>> 1.00
  <<CKSUM>> 276785 105313 38 23 621
  <<MARK>>
  <<THINK>>
  #
  # The end of the session is delimited by <<END>> followed
  # by the playback count (5) and the think time (12.0 seconds)
  <<END>> 5 12.0
  ______________________________________________________________________









  66..  WWeebbcclliieenntt CCoommmmaanndd LLiinnee FFllaaggss

  This section reviews some of the features controlled by command line
  flags.


  66..11..  CCooookkiiee HHaannddlliinngg

  There are two types of "cookies" in common usage on the web.  The
  first kind follows the HTTP Cookie specification, and is sent as part
  of the HTTP header.  This section discusses this type of cookie.
  Another type of "cookie" is a string that is embedded in the URL
  itself, and is passed from server to client by embedding it directly
  into the body of a web page (usually in some url-encoded form).  This
  second type of cookie is also supported by webclient, and the
  mechanisms for dealing with it are discussed in further detail in the
  section ``URL-embedded State'' below.


  webclient will automatically accept and cache any and all cookies
  returned by the server.  The cookies will then be handled following
  the usual cookie semantics for a browser: if path names match, then
  the cookie will be returned to the server.  webclient does *not* age
  cookies, and thus, they will not expire from the cache in that
  fashion.  Also, webclient does not maintain a persistent store of
  cookies: once webclient exits, any cookies it had are lost.


  In order to make webclient a more realistic multi-user stress tool, it
  will flush the cookie cache at the end of a session.  That is, each
  new session is started with an empty cookie cache, simulating a new
  user with a freshly restarted browser.


  In order to help verify correct operation, webclient can be made to
  check for the presence of cookies on certain paths, and to print an
  error and exit if the server did not return a cookie for that path.
  The --cookie-path flag can be specified any number of times to add a
  path to the error checking code.


  Some web and application servers refer to a state maintenance
  technique called url-encoding in connection with a discussion of
  cookies.  Note that url-encoding does not use cookies in the sense
  that the HTTP spec implies; rather, the server embeds unique, long
  strings directly into the urls in the body of the web page.  These
  long strings are used by the server to provide a cookie-like function.
  webclient provides support for these types of "url-cookies", and is
  able to track them with the --handle flag described in the section
  ``URL-embedded State'' below.


  By default, in order to maintain backwards compatibility, webclient
  will check for the presence of a cookie on the path /proclogin.ns.
  This is the same as specifying the flag --cookie-path=/proclogin.ns.
  If any cookie path is explicitly specified, then the default
  /proclogin.ns is not set.




  66..22..  GGIIFF FFeettcchhiinngg


  The -g and the -c flags enable the fetching and caching of images.
  webclient is able to scan a web page and fetch any images that it
  finds embedded in the page.  It does so by scanning the returned page
  for references of the form IMG SRC= and extracting and fetching the
  specified URL.  It uses a fairly sophisticated pattern-matching
  algorithm to find the URL, and is able to pick its way through some
  of the more obscure quotation-mark and white-space combinations, such
  as those that might occur in JavaScript.  Note, however, that
  webclient does *not* provide a JavaScript interpreter, and therefore
  it can get confused by more complex image-fetching JavaScript applets.
  It does not support images fetched with client-side Java applets.
  Images are fetched only if the -g option is set; by default,
  image-fetching is disabled.
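  The scanning step can be approximated with ordinary text tools.  The
  sketch below is only a rough model of what webclient's matcher does,
  run over a made-up page fragment; the real matcher handles more
  quoting and whitespace cases:

```shell
# Pull the SRC URLs out of IMG tags, quoted or unquoted.
# (Rough sketch only; not webclient's actual algorithm.)
page='<IMG SRC="/icons/logo.gif"> some text <img src=/icons/dot.gif>'
echo "$page" | grep -oiE 'img[[:space:]]+src="?[^" >]+' \
             | sed -E 's/^[^=]*="?//'
```

  This prints one URL per line (/icons/logo.gif and /icons/dot.gif),
  which would then be fetched individually.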

  Emulation of a browser's gif-cache is supported with the -c flag.
  That is, if webclient notices that it has previously fetched a given
  gif url this session, it will not fetch that url again.  The result is
  that the number of gif files fetched by webclient should match the
  number of gif files fetched by the browser during an entire session,
  assuming that the gif cache was empty when the user requested the
  server's logon page.  If the -c option is not specified, every gif is
  fetched every time the page is requested.

  By default, webclient uses four threads and the HTTP/1.1 Persistent
  Connection protocol for fetching gif files in parallel over four
  sockets.  The number of threads and the protocol used can be changed
  as explained below.


  Note:

  1. The gif cache is cleared every time a session replay starts (e.g.,
     when the logon request is issued).

  2. gifs are fetched on subsequent requests only if they have not yet
     been fetched during the current session replay.

  3. All open sockets are closed when the session ends.  This helps
     maintain the appearance that each session comes from a different
     web browser.



  66..33..  HHTTTTPP//11..00,, HHTTTTPP//11..11,, KKeeeeppAAlliivvee aanndd MMuullttii--TThhrreeaaddiinngg

  By default, webclient uses four threads and the HTTP/1.1 Persistent
  Connection protocol for fetching gif files in parallel over four
  sockets.  This behaviour can be modified with three flags: --no-keep-
  alive, --num-threads=nnn and --http-version=1.x.


  The HTTP/1.1 protocol specifies that Persistent Connections are to
  be used by default when a browser talks to the web server.  What
  this means is that once the browser has opened a socket to the
  server, it keeps that socket open for further URL requests.  This
  helps eliminate the overhead of negotiating a new socket for each
  request.  By default, webclient does the same, in order to better
  emulate a real web user.  However, this behaviour can be disabled by
  specifying the --no-keep-alive flag.  This flag causes the
  Connection: Close header field to be added to the HTTP header, and
  the socket to be closed after all of the data has been received.


  The de facto industry-standard Netscape extensions to the HTTP/1.0
  protocol had a similar concept, called Keep-Alive.  webclient can be
  made to use this protocol by using the --http-protocol=1.0 flag.
  Currently, there are only two valid values that this flag can take:
  HTTP/1.0 and HTTP/1.1.  By specifying HTTP/1.0, webclient will try to
  use Keep-Alive by including the header field Connection: Keep-Alive
  with each request (and keeping the socket open).  This can again be
  disabled by using the --no-keep-alive flag.

  To further improve performance, browsers open a number of sockets to
  the web server for fetching gifs in parallel.  The default number of
  sockets is four for both Netscape(TM) Navigator and Microsoft(TM)
  Internet Explorer, although users can adjust this value from the
  control panel or preferences dialog.  To emulate this behaviour,
  webclient maintains a pool of four threads for gif fetching.  Each
  thread handles the i/o on one socket.  The number of threads (and thus
  the number of sockets) that are used can be changed with the --num-
  threads=nnn flag.

  Note that once webclient has opened a socket to the server, it will
  keep it open indefinitely (as long as the --no-keep-alive flag wasn't
  specified).  However, webservers have only a limited pool of
  connections, and busy webservers will routinely close the socket on
  unsuspecting browsers.  webclient does notice when this occurs, and
  keeps statistics on how often it was able to reuse an open socket,
  and how often an open socket was unexpectedly closed by the server.
  These stats are printed as part of the normal stats output.

  Note:

  1. All open sockets are closed at the end of the session.  Sockets
     are *not* kept alive across sessions.  This helps maintain the
     appearance that each session comes from a different web browser.






  66..44..  SSuubbssttiittuuttiioonn aanndd RRee--WWrriittiinngg

  webclient supports a number of substitution and re-writing modes.
  These include:

  +o  ``Substitution or addition of key-value pairs to the HTTP request
     header''.  This allows keys, such as Authorization, to be added to
     the header on a per-client basis, without requiring multiple
     copies of an input file.

  +o  ``Automated tracking and substitution'' of "url-cookies" or
     variable strings embedded in URL's.  The variable part of a URL is
     automatically extracted from earlier web pages, and substituted
     into the request URL.

  +o  ``Substitution of text in a request URL or POST body'' with text
     supplied on the command line.  This allows webclient to be used in
     complex scripts while retaining a single input file.  It is
     typically used to substitute for logon ID's and passwords that
     might be embedded in the request URL.

  Each of these is discussed in greater detail below.



  66..55..  HHeeaaddeerr MMooddiiffiiccaattiioonn aanndd KKeeyy--VVaalluuee SSuubbssttiittuuttiioonn

  The HTTP headers generated and sent by webclient can be fully
  customized and rewritten.   By default, webclient sends a simple,
  basic HTTP header.  A fully customized header can be specified with
  the --header-file flag, or alternately, the header can be placed in
  the input file, using the ``<<HEADER>> directive''.

  Whether or not a custom header has been specified, key-value pairs in
  the header can be substituted for or added to the header with the
  --header-subst and --header-add flags.  These flags are particularly
  useful when creating multi-user scripts, where each running copy of
  webclient needs to send a slightly different header.  In particular,
  this is needed in order to perform HTTP-style authentication.


  The default header that webclient currently sends should resemble:


       ______________________________________________________________________
       User-Agent: webclient/WebLoad v4.0beta3 (Linux OpenSSL 0.9)
       Host: webby.com:80
       Referer: webby.com/page.html
       Accept: */*
       Accept-Language: en
       Accept-Charset: iso8859-1, *, utf-8
       ______________________________________________________________________




  The User-Agent value will reflect the current actual version of web-
  client.  It can be modified with the -U flag ``described below'', or
  by specifying a custom header.  It can be omitted by using a custom
  header which does not contain it.

  The Host value is automatically generated and updated by webclient
  depending on the server being contacted.  If this tag is present in
  the header, then webclient will always update its value as
  appropriate.  It can be omitted by using a custom header which does
  not contain it.

  The Referer tag will be automatically added and updated based on the
  most recent URL that webclient had requested.  There is currently no
  way to disable the presence or automatic update of this tag.




  66..55..11..  TThhee ----hheeaaddeerr--ffiillee  FFllaagg

  A fully customized HTTP header can be specified with the --header-file
  command-line flag (for example, webclient --header-file=some.file.name).
  This header will be used for all fetches, including the fetching of
  gifs.  A typical header file might look like the following:

       ______________________________________________________________________
       Accept: image/gif, image/x-bitmap, image/jpeg, image/png
       Accept-Language: en
       Pragma: no-cache
       Authorization: Basic amFtZXM6amQpMrT=
       ______________________________________________________________________




  Note that the header file should *not* contain the HTTP method (viz.
  GET, POST); this is handled separately.  Nor should the header file
  contain the body for a POST request; that is handled separately with
  the ``<<POSTDATA>> input file directive''.  The header file should
  not contain blank lines or comment lines.  It will be parsed into
  key-value pairs, which can be substituted for with the --header-subst
  and --header-add flags.
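  A minimal header file obeying these rules can be written like this
  (the file name my.header and the header values are just examples):

```shell
# Create a small custom header file: key-value pairs only, no method
# line, no POST body, no blank or comment lines.
cat > my.header <<'EOF'
Accept: image/gif, image/jpeg
Accept-Language: en
Pragma: no-cache
EOF
grep -c ':' my.header    # every line is a key-value pair; prints 3
```

  The file would then be passed to webclient with
  --header-file=my.header.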



  66..55..22..  TThhee ----hheeaaddeerr--ssuubbsstt  aanndd ----hheeaaddeerr--aadddd  FFllaaggss

  Values in the HTTP header can be substituted for with the --header-
  subst flag.  For example, webclient --header-subst="Accept-Language:
  fr" will change the value of the Accept-Language tag in the header to
  fr.  The substitution is made only if the tag already appears in the
  header; if the tag does not appear, no substitution is made.

  The --header-add flag can be used to make a substitution for an
  existing value, or to add the tag-value pair if it is not already
  present.



  66..66..  EExxaammppllee:: HHTTTTPP AAuutthhoorriizzaattiioonn

  Some web sites require authentication using the HTTP 401 response code
  in conjunction with the Authorization header field.  That is, the web
  server will deny access to a web page unless the browser (webclient)
  supplied a field of the form


       ______________________________________________________________________
       Authorization: Basic amFtZXM6amQpMrT=
       ______________________________________________________________________




  in the header sent with the URL request.  The string of seemingly
  random letters is an encoded username-password pair.  Appropriate
  values for the encoded string can be obtained by using the webmon
  tool with tracing enabled.  These values can be placed in the
  webclient request header file.  Alternately, it might be more
  convenient to specify these on the command line, using the
  --header-add flag.  This is particularly the case when multiple
  copies of webclient must run, each with its own login.  The following
  can be used to add the above line to the header:



       ______________________________________________________________________
       webclient --header-add="Authorization: Basic amFtZXM6amQpMrT="
       ______________________________________________________________________

  The difference between the --header-subst and the --header-add flags
  is that the former will make the substitution only if the key is
  already present in the header, whereas the latter will either
  substitute or will add the key-value pair if it is not present.
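  The string following Basic is simply the base64 encoding of
  username:password, so a suitable value can also be generated directly
  on a Unix system (the credentials shown here are made up):

```shell
# Base64-encode a (made-up) username:password pair for Basic auth.
# printf avoids the trailing newline that echo would add.
printf 'james:s3cret' | base64
# prints amFtZXM6czNjcmV0
```

  The output is what would follow "Authorization: Basic" in the header
  passed to --header-add.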



  66..77..  EExxaammppllee:: SSeettttiinngg tthhee UUsseerr--AAggeenntt TTaagg

  By default, webclient sets the User-Agent tag in HTTP headers sent to
  the server to webclient 4.0pre0 (Linux) or similar.  However, some web
  servers (in particular, the Netscape Enterprise Server) check the
  User-Agent type, and respond differently to different browser types.
  Sometimes the differences are subtle, and yet they can change overall
  behavior dramatically: things like redirects, socket close semantics
  and returned headers can change, and sometimes bugs will even be
  exhibited for some cases but not others.

  To get webclient to trick the webserver into behaving more
  appropriately, the -U flag can be used to change the value of the
  User-Agent field.


  Before you start, you must figure out what the browser you are trying
  to impersonate is sending.  To do this, use webmon with the -t (trace)
  option.  Run a few requests and then stop webmon.  Look in the trace
  file for a line that begins with User-Agent:.  The string that follows
  this is the string that must be specified on the -U option.  For
  example, with Netscape 4.04, under AIX, the string is:




       ______________________________________________________________________
       User-Agent: Mozilla/4.04 [en] (X11; AIX 4.1; Nav)
       ______________________________________________________________________




  You would then pass this flag to webclient as shown below. Note the
  use of the single quote marks to delimit the string.  The quotes are
  needed whenever there is embedded whitespace in the string, and also
  to delimit shell special characters, such as "(".

  (_N_o_t_e_: The DOS shell under Windows95/98/NT cannot use quote marks to
  delimit a string.  In order to prevent the embedded blanks from
  causing a problem, convert them to hash marks (# signs).  webclient
  will automatically convert them back into spaces).



       ______________________________________________________________________
        webclient -U 'Mozilla/4.04 [en] (X11; AIX 4.1; Nav)'
       ______________________________________________________________________
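  On DOS, where the quoting above is unavailable, the same string would
  be typed with hash marks in place of the blanks; tr illustrates the
  conversion (webclient turns the hashes back into spaces internally):

```shell
# Convert embedded blanks to hash marks for the DOS command line.
echo 'Mozilla/4.04 [en] (X11; AIX 4.1; Nav)' | tr ' ' '#'
# prints Mozilla/4.04#[en]#(X11;#AIX#4.1;#Nav)
```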





  Some other User-Agent strings:





  ______________________________________________________________________
  -U 'Mozilla/3.0 (Win95; I)'     Netscape Version 3 for Windows 95
  -U 'Mozilla/3.04 (Win95; U)'    Netscape Version 3 for Windows 95
  -U 'Mozilla/2.02 (OS/2; U)'     Netscape Version 2 for OS/2
  -U 'Mozilla/4.04 [en] (X11; U; AIX 4.2; Nav)'           NS for AIX
  -U 'Mozilla/4.05 [en] (X11; U; Linux 2.0.32 i586)'      NS for Linux
  ______________________________________________________________________





  Note that the -U flag is entirely equivalent to the longer, more
  verbose --header-add flag.  The previous example could equally be
  written as:


       ______________________________________________________________________
        webclient --header-add="User-Agent: Mozilla/4.04 [en] (X11; AIX 4.1; Nav)"
       ______________________________________________________________________






  66..88..  UURRLL--eemmbbeeddddeedd SSttaattee ((UURRLL CCooookkiieess))

  Some web-site designs embed customer-specific information into URL's
  as an alternative mechanism to "cookies" for maintaining state
  information.  webclient can track this state information in an
  automated fashion, generating the appropriate URL's dynamically as it
  traverses a web site.  There is a restriction: webclient assumes that
  the state information is url-encoded as a key-value pair in the URL.

  This is best illustrated with an example.  Suppose that when a user
  visits a website, they request the URL /cgi-bin/firstpage, and that
  the page issued in response contains the URL /cgi-
  bin/secondpage?this=that&token=qwertyuiop&up=down, where the string
  qwertyuiop is generated dynamically and differs for every visitor to
  the site.  Then webclient can be configured to navigate this site by
  using an input file similar to the following:



       ______________________________________________________________________
       GET /cgi-bin/firstpage
       GET /cgi-bin/secondpage?this=that&token=xxx&up=down
       ______________________________________________________________________




  and using the command line



       ______________________________________________________________________
       webclient --handle=token
       ______________________________________________________________________




  This will cause webclient to scan each web page it receives for new
  values of the key "token", and substitute for its value in any
  subsequent GET or POST requests, including POST data bodies.  The
  particular value "xxx" used in the input file does not matter.
  Substitutions for multiple handles can be done by specifying as many
  --handle= flags as needed.

  Note that if a token appears multiple times on the same page with
  different values, webclient will record only the last value that it
  finds on the page.  This may not be the desired behavior in some
  cases.  Note that an ampersand (&), white space, a (single or
  double) quote-mark, or a right angle bracket (>) is assumed to
  delimit the end of the token.
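  The tracking rule can be mimicked with grep over a made-up page
  fragment (a rough sketch of what webclient does internally; the
  single-quote delimiter is omitted from the character class here only
  for shell-quoting simplicity):

```shell
# Find the current value of the "token" handle in a fetched page,
# treating &, double quotes, whitespace and '>' as delimiters.
page='<a href="/cgi-bin/secondpage?this=that&token=qwertyuiop&up=down">'
echo "$page" | grep -o 'token=[^&" >]*' | cut -d= -f2
# prints qwertyuiop
```

  webclient would then substitute this value into any later request
  that names the token handle.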





  66..99..  SSuubbssttiittuuttiinngg iinn GGEETT RReeqquueessttss aanndd PPOOSSTT BBooddiieess

  The flag --substitute can be used to make generic substitutions in the
  request URI and in the POST body.  Thus, for example, if the input
  file to webclient contains a URL of the form GET
  /some/where/blort.html, and the client is started as webclient
  --substitute=blort:page001, then the actual URL that will be
  requested is /some/where/page001.html.

  This substitution mechanism allows webclient to be used in perl and
  shell scripts, where different urls need to be fetched by different
  clients, but maintaining dozens or hundreds of client-specific URL
  files is not desired.  Typically, this flag is used to substitute for
  user-names and passwords (see below).  Substitutions are carried out
  in both the URL's and the POST bodies.  As many --substitute flags
  can be specified as needed.



  66..1100..  EExxaammppllee:: SSuubbssttiittuuttiioonn ffoorr <<<<UUSSEERR>>>>,, <<<<PPIINN>>>>,, aanndd <<<<PPAASSSSWWDD>>>>

  When benchmarking password-protected web sites, each copy of webclient
  will typically need to use its own username/password pair.
  Authentication by web sites is usually handled in one of two different
  ways: either by using the HTTP Authorization mechanism or by embedding
  the username and password into the request or post data.  The former
  approach was ``discussed above''; the latter approach can be handled
  with the -u flag.  Rather than creating a unique input file for each
  client, with a username/password hard-coded into the input file, a
  substitution can be performed.  Thus, for example, if the input file
  contains the request



       ______________________________________________________________________
       GET /path/to/cgi?login=<<USER>>&idcode=<<PIN>>&pwd=<<PASSWD>>
       ______________________________________________________________________




  and you wanted to substitute the values linas, 1234 and r00tp4ssw0rd
  for the login, idcode and pwd, you could specify







  ______________________________________________________________________
  webclient --substitute=<<USER>>:linas \
            --substitute=<<PIN>>:1234 \
            --substitute=<<PASSWD>>:r00tp4ssw0rd
  ______________________________________________________________________




  on the command line.  Alternately, you can use the abbreviated form
  with the -u flag, by merely specifying



       ______________________________________________________________________
       webclient -u linas:1234:r00tp4ssw0rd
       ______________________________________________________________________
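  The effect of the three substitutions (whichever form is used) can be
  sketched with sed over the request line from the input file;
  webclient performs the equivalent rewriting internally:

```shell
# What the three --substitute flags (or the -u shorthand) effectively
# do to the request line before it is sent:
req='GET /path/to/cgi?login=<<USER>>&idcode=<<PIN>>&pwd=<<PASSWD>>'
echo "$req" | sed -e 's/<<USER>>/linas/' \
                  -e 's/<<PIN>>/1234/' \
                  -e 's/<<PASSWD>>/r00tp4ssw0rd/'
# prints GET /path/to/cgi?login=linas&idcode=1234&pwd=r00tp4ssw0rd
```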






  66..1111..  EErrrroorr DDeetteeccttiioonn aanndd RReeppoorrttiinngg

  webclient contains a number of facilities to simplify error detection
  and reporting.  Some of these are described below.


  +o  Errors can be logged to a separate file with the -L or --log-file
     flag.  This eliminates the need to search the report files for an
     error message.  Note that multiple clients can safely specify the
     same log file: all writes to this log file are serialized.  The
     log file will remain empty if no error occurs.

  +o  A hung or extremely slow webserver can be detected with the -A or
     --alarm flag.  The value following this flag should be a timeout
     expressed in seconds.  If the webserver fails to respond to a
     request after this length of time, an error will be logged in the
     log file, and a message will be printed to stdout.

  +o  If a website is password protected, a fatal error that kills
     webclient can leave a user logged in and unable to log in a second
     time.  To avoid this situation, the --clean-exit flag can be used
     to specify a sequence of URL's that will be run when webclient
     encounters a fatal error.  ``See below for more details.''

  +o  In order to make sure that the web site is returning the web page
     that was actually asked for, webclient will compute a checksum for
     the page, and compare it to the checksum stored in the input file.
     This checking can be disabled on a page by page basis, or globally
     with the -i flag.  Checksums in the input file can be recomputed
     with the -v flag.  ``See below for more details.''




  66..1122..  CClleeaann EExxiitt OOnn EErrrroorr


  Some web site designs prevent a user from logging on more than once at
  the same time.  There are a variety of reasons to design a web site in
  this way, and many websites enforce this.  When using webclient to
  access such a site, it becomes desirable to log the user off in the
  case of an error, so that the user is not blocked from making future
  logins.


  Note that simply logging off by running webclient a second time may
  not be an option, because websites that enforce logins usually use
  cookies to keep track of the user.  That is, a user cannot log off
  unless they also present the right cookie.  When webclient exits for
  any reason, the current cookie(s) are lost, and thus it can become
  impossible to log off after webclient has exited.  In order to work
  in this environment, a log-off script can be specified with the
  --clean-exit flag.


  In the case of an error, or if it is interrupted, webclient can be
  made to send a series of HTTP requests by using the --clean-exit flag
  to specify a file containing the HTTP requests to run.  The format
  for the clean-exit file is the same as the input file.  Errors that
  can trigger a clean exit include any unexpected HTTP errors (such as
  404 Not Found, 500 Server Error, etc.), timeouts (due to the use of
  the -A flag), or an interrupt (ctrl-C from the terminal, or SIGUSR1
  or SIGINT from a controlling shell script).  Note that this last
  usage simplifies the management of multiple copies of webclient via
  controlling scripts.
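
  For example, a minimal clean-exit file might contain nothing but the
  site's logoff request.  (The /logoff path below is hypothetical;
  substitute whatever URL your site actually uses.)


  ______________________________________________________________________
       <<START>>
       GET /logoff
       <<END>> 1 0
  ______________________________________________________________________


  Because this file is run by the same webclient process, the current
  cookies are still present, and so the logoff request will be honored
  by the server.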




  66..1133..  PPaaggee VVaalliiddaattiioonn wwiitthh CChheecckk SSuummss


  Webclient is designed to check the validity of the data that is
  returned for a particular request by calculating a check sum for that
  page.  It then compares the check sum to the one that is stored in the
  session request file.  If the check sum does not match, then webclient
  assumes that an error has occurred.  Checksum mismatches normally
  cause webclient to print a detailed error message and trace
  information, and then stop.  If instead, you want it to continue, and
  just print a warning message, then specify the --warn-checksums
  option.


  However, checksums can be troublesome when a web page includes
  variable, changing data, such as the current date or time, or a
  rotating banner advertisement, or other data that changes daily and/or
  every time the web page is fetched.


  To work around such pages, validation can be disabled in one of two
  ways: on a per-URL basis, or for the entire run.  Validation can be
  disabled on a per-url basis simply by editing the input file and
  setting the checksum value equal to "-1".  This will cause
  validation for that page to be skipped.  Validation of checksums for
  the entire run can be disabled by specifying the -i option to
  webclient.  In general, it is important not to disable checksums
  globally: if you do, the server could return completely bogus data,
  and you would never find out that you are timing a bogus page.


  The HTTP header is not included in the checksum calculation; therefore
  variations in the header due to cookies, expiration date pragmas, or
  server versions will not affect the checksum.


  If the web pages are changing only infrequently, the -v flag can be
  used to recompute the check sums, and output a new session file with
  the new checksums in it.  Alternately, the -v flag can be used to
  create checksums for a request file that does not already have them.
  (In normal operation, the session file will have been created by
  webmon, and webmon will have computed and written out the appropriate
  checksum.  This is the preferred mode of operation, as the correctness
  of the web page can be visually inspected with webmon.)


  66..1144..  TThhiinnkk TTiimmee DDiissttrriibbuuttiioonnss

  Webclient supports the concept of 'think time' in order to better
  simulate multi-user loads on a server.  The think time is the amount
  of time that webclient pauses between URL requests, simulating a user
  who has stopped to read a web page before clicking on the next
  hyperlink.  The think time may be specified either in the ``input
  file'', or with the --think-time=<float> flag.  The <float> parameter
  specifies the time, in seconds, as a floating point number.

  Think-times that are fixed or randomly distributed may be specified.
  By default, an exponential random distribution is used, although a
  gaussian or a fixed distribution may also be specified.  One of
  these mutually-exclusive distributions may be selected with the
  --think-fixed, --think-exponential or --think-gaussian flags.  The
  image below shows both random distributions, for a mean think time
  of ten seconds.





  The exponential distribution is given by

       P(t;m) = (1/m) exp(-t/m)


  where m is the mean.  The standard deviation of the exponential
  distribution is also m.

  The gaussian distribution is given by

       P(t;m) = 2 L t exp(-L t^2)


  where L = pi / (4 m^2), pi = 3.14..., and m is the mean.  Note that
  the standard deviation is given by m sqrt(4/pi - 1) = 0.5227... m.

  The exponential distribution has long been accepted as an
  appropriate model for typing behaviour at a computer terminal
  keyboard.  The gaussian distribution, which assigns only a small
  probability to very short think times, might more accurately
  describe web browser users.
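
  As a sanity check on these formulas, the two distributions can be
  sampled and their moments compared against the stated mean and
  standard deviation.  (This is a standalone Python sketch, not part
  of webclient; note that the "gaussian" think-time density above is,
  mathematically, a Rayleigh distribution.)

```python
import math
import random

random.seed(42)
m = 10.0          # mean think time, seconds
N = 200_000

# Exponential distribution: P(t;m) = (1/m) exp(-t/m)
exp_samples = [random.expovariate(1.0 / m) for _ in range(N)]

# "Gaussian" think-time density: P(t;m) = 2 L t exp(-L t^2), with
# L = pi / (4 m^2).  This is a Rayleigh distribution with
# sigma = m * sqrt(2/pi), sampled here by inverse transform.
sigma = m * math.sqrt(2.0 / math.pi)
ray_samples = [sigma * math.sqrt(-2.0 * math.log(1.0 - random.random()))
               for _ in range(N)]

def mean(xs):
    return sum(xs) / len(xs)

def stdev(xs):
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

print(mean(exp_samples), stdev(exp_samples))  # both close to m = 10
print(mean(ray_samples), stdev(ray_samples))  # mean ~ 10, std ~ 5.227
```

  Both sample means come out near 10 seconds; the exponential standard
  deviation is near 10, while the gaussian one is near 0.5227 * 10, in
  agreement with the formulas above.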



  66..1155..  OOtthheerr FFllaaggss

  The following flags are not documented above, but are still very
  important and useful:


  66..1155..11..  DDeebbuuggggiinngg && TTrraacciinngg


     --hh,, ----hheellpp
        Print a command summary and exit.

     ----vveerrssiioonn
        Print webclient version info and exit.


     --rr,, ----rreeppoorrtt--ffiillee==<<ffiillee>>
        Specify the file to which the webclient report will be written.
        What is written to the report file is nearly identical to what
        webclient writes to standard out, unless the --quiet flag has
        been specified.

     ----aacccceessss--lloogg==<<ffiillee>>
        Write a webserver-style access log.  The industry-standard
        logfile format is used.  Note that the ip address written in the
        logfile is that of the server that was contacted. The result
        code is the result code that webclient received, and the length
        is the length (including the header) that was received.  The
        timestamp that is printed is taken after the entire message has
        been received.

     --tt,, ----ttrraaccee--ffiillee==<<ffiillee>>
        Write a trace of all of the HTTP traffic to the indicated file.

     ----sskkiipp--ssttaattss
        Don't collect or print summary performance statistics.

     --ee,, ----pprriinntt--eeaacchh
        Print individual response time observations.

     --xx,, ----ttiimmeessttaammppss
        Print request start and end timestamps.

     ----qquuiieett
        Minimize the number of messages written to standard out.

     ----sshhooww--pprrooggrreessss
        Write out each URL as it is fetched. Useful for visually
        inspecting the forward progress of webclient through a series of
        requests.  Note that this can generate a lot of output on a fast
        system.

     ----nnoo--bbuugg--ccoommppaatt
        Disable bug-compatability mode.  Currently, this flag disables
        only one bug: the 'Content-Length-off-by-two' bug.  In this bug,
        the Netscape browser will send a POST body with an appended
        CRLF, and then will set the Content-Length in the header two
        bytes shorter then the actual message.  Unfortunately, some
        servers, notably Sun url-decoding Java Servlets depend on this
        incorrect length being set, generating parse errors or
        NullPointerExceptions if not.  The correct HTTP protocol for the
        ContentLength is documented in RFC2616 Paragraph 4.4.  Note that
        by default, bug-compatibility is enabled, and a warning message
        willl be printed whenever the bug occurs.



  66..1155..22..  MMuullttiiUUsseerr OOppttiioonnss


     ----nnuumm--sseessssiioonnss==<<iinntt>>
        Override number of times that the session will be run.
        Normally, the number of times that a session will be played is
        specified in the request input file.  The value specified with
        this flag will override that value.

     --WW,, ----wwaaiitt--iinntteerrvvaall==<<sseeccoonnddss>>
        Pause after each session trial, before starting the next
        session.  Normally, once a session has been completed, a new
        session is started immediately.  You can use this flag to
        specify a delay between sessions.  Alternately, you can specify
        a think-time after the last URL of the session (or before the
        first URL of the session), leading to the same effect.

     --RR,, ----rraannddoomm--sseeeedd==<<sseeeedd>>
        Specify a seed value to be used with the random-number generator
        used to generate random think times.  This flag is useful for
        getting repeatable think times and thus repeatable results.

     ----ffoorrkk
        Fork this process to run in the background after validating all
        of the command-line arguments.  This is a handy feature for
        starting webclient from a shell script:  if some obvious startup
        error occurs, the shell can deal the failing client in the
        foreground.  Otherwise, once past the initial startup, the
        client will move to background, freeing the shell to start
        another client.

     --mm,, ----sshhmmeemm==<<cchhiilldd::sshhmmkkeeyy>>
        Specify the common shared memory location for webclient to use.
        This flag is required when synchronizing multiple copies of
        webclient; it allows the ramp-up and statistics gathering phases
        to be appropriately synchronized, and allows some basic
        reporting back to the controlling program.





  77..  WWeebbcclliieenntt RReeqquueesstt FFiillee FFoorrmmaatt

  The list of URL's that webclient will fetch is read from an input or
  request file.  These requests must include a checksum to validate
  the returned data, and may also be interspersed with a variety of
  directives controlling the think times, the request header and body,
  and the way in which summary statistics are generated.  Several
  examples of an input file are shown in a ``previous section''.




  77..11..  RReeqquueesstt FFiillee BBaassiiccss

  The request file is formatted with a quasi-XML-like markup language.
  It normally starts with a <<START>> tag and ends with an <<END>>
  tag.  In between, the lines specify the HTTP requests, indicate any
  additional POST data, tell how often each request should be run and
  whether it is part of a group block that must run together, and give
  the checksum information.  Lines beginning with the hash # sign are
  treated as comments.


  The collection of requests between <<START>> and <<END>> is termed a
  "session", and is meant to model a typical user session, from logon,
  to doing some work, to logging off.  When a session is played back
  with webclient, the requests are processed in the sequence specified
  in the file.


  A session may be run multiple times.  In order to simulate a real
  user's pauses between web pages, the playback can be adjusted so
  that there is a pause between requests.  The pause can be specified
  to be (exponentially) random or a fixed amount of time.


  The basic elements of the syntax are described below:


     <<<<SSTTAARRTT>>>>
        Indicates the start of a session description.  All URL's up to
        the <<END>> input line are read and saved and submitted as a
        session.  Currently, only one such tag may appear in a request
        file.



     <<<<EENNDD>>>> ccoouunntt tthhiinnkk__ttiimmee
        Denotes the end of a session description.  It is followed by two
        parameters:

        ccoouunntt: integer, the number of times that the session will be
        replayed.   Note that this value can be overridden with the
        --repeat-count command-line flag.

        tthhiinnkk__ttiimmee:  floating point, number of seconds.  If zero, then
        webclient will play back the URI's in as rapid succession as
        possible.  If negative, then webclient will wait exactly
        -think_time seconds between each URI.  If positive, then
        webclient will wait a random, exponentially-distributed amount
        of time between each URI.  The average distribution of think
        times will equal think_time.  Note that this value can be
        overridden with the command-line --think-time option (See the
        ``Think Time Distributions'' section).


        Currently, only one such tag may appear in a request file.



     GGEETT uurrll ffrraaccttiioonn cchheecckkssuummiinnffoo

     PPOOSSTT uurrll bbooddyy ffrraaccttiioonn cchheecckkssuummiinnffoo
        Requests may be formatted into a single input line each, as
        shown here.  Alternately, the data, fraction and checksum info
        can be broken up into multiple lines by using the <<REQUEST>>
        etc. directives below.  Depending on the style and nature of
        the session, the single-line approach may be simpler and
        easier to read than the multi-line approach; this is left as a
        matter of taste.

        If the single-line approach is used, then each request must
        really be a single input line.  If the url or data gets long,
        do not insert a CR or NL to split the line -- the CR/NL would
        be interpreted as white space, and the url would end at that
        point.  If the url includes a query string (data after the ?),
        you must include it as part of the url, without spaces.

        The fraction and checksuminfo fields have the same meaning as
        those described below.  Besides just GET and POST, any valid
        HTTP method may be specified, including OPTIONS, HEAD, PUT,
        DELETE, TRACE and CONNECT.  Note that a POST must have the
        post body appearing as shown in the second form.



     <<<<RREEQQUUEESSTT>>>> mmeetthhoodd uurrll
        Request the indicated url using method from the web server.  The
        method should be a valid HTTP method; _v_i_z_. one of GET, HEAD,
        POST, _e_t_c_.  The url may be a fully qualified URI (such as
        http://server.com/some/page.html) or a path fragment (such as
        /some/page.html).  The former form allows a session to stretch
        across multiple servers; webclient will resolve and address each
        server in turn.  If these two types of requests are intermixed,
        the most recently specified server will be used for the
        fragmentary requests.  Alternately an initial web server can be
        specified with the -w or --webserver flag (this initial server
        will be over-ridden if/when a fully-qualified URL appears in the
        input file).

        Note that POST requests should be followed by a <<BODY>> tag
        specifying the body of the POST request.


     <<<<BBOODDYY>>>>  <<<<//BBOODDYY>>>>

     <<<<PPOOSSTTDDAATTAA>>>> <<<<//PPOOSSTTDDAATTAA>>>>
        These two tags are synonyms.  They are used to specify an HTTP
        body that is sent to the webserver along with the HTTP header.
        The end of the body should be marked with a <</BODY>> or
        <</POSTDATA>> tag appearing on a new line.  This tag allows
        webclient to emulate not only the use of HTML forms, but also to
        support non-HTML-based HTTP protocols, such as OFX.


     <<<<BBOODDYYFFIILLEE>>>> ffiilleennaammee
        The body to a POST may be specified out-of-line, as a separate
        file.  The indicated file should contain the text of the POST
        body.  This form is especially convenient when working with HTTP
        protocols that have large bodies (such as OFX), or the bodies
        are available through an independent test suite (such as OFX).



     <<<<CCOOUUNNTT>>>> ffrraaccttiioonn
        The fraction controls randomization of the session.  If the
        fraction is 1.0, then this URL request is submitted as part of
        every session that is run.  If the fraction is between 0.0 and
        1.0, then that value indicates the fraction of sessions for
        which this request will be run.  For example, to run the request
        on 50% of all sessions, use a fraction of 0.50.  For each time
        through the session, webclient will generate a random number to
        determine if the request should be run, using fraction as the
        probability.

        Values greater than one are interpreted as run counts.  Thus,
        a value of 1.5 will be interpreted to mean that the request
        should be run at least once, and possibly twice.  (Note,
        however, that the summary timing statistics report might not
        be generated in the form you expect when using values greater
        than 1.0.  This may change in future versions.)

        A block of repeated URL's may be specified by using a negative
        number for the fraction.  This usage is discussed in greater
        detail below.


     <<<<CCKKSSUUMM>>>> iinntt iinntt iinntt iinntt iinntt
        Specifies a checksum against which the returned page will be
        validated.  This checksum is normally computed by webmon when a
        session is being recorded; the checksum can also be recomputed
        by specifying the -v flag.  To disable the use of checksum
        validation for the current request, specify a -1 for the first
        integer.


     <<<<HHEEAADDEERR>>>> <<<<//HHEEAADDEERR>>>>
        Text in between the <<HEADER>> and the <</HEADER>> tags will  be
        used to form the HTTP header.  This header will remain in effect
        for the current and subsequent requests until a new header is
        specified (with either the <<HEADER>> or the <<HEADERFILE>>
        directive).


     <<<<HHEEAADDEERRFFIILLEE>>>> ffiilleennaammee
        Specify a file that contains the request header to use.  This
        tag performs the same function as the --header-file flag, except
        that it applies on a per-url basis.  For greater detail, review
        the section on ``Header Substitution''.  The specified header
        will remain in effect for the current and subsequent requests
        until a new header is specified (with either the <<HEADER>> or
        the <<HEADERFILE>> directive).



     <<<<TTHHIINNKK>>>>

     <<<<TTHHIINNKK>>>> sseeccoonnddss
        Specifies that a pause should occur between this and the next
        request.  Used to emulate a user pausing to "think" between page
        fetches.  If the number of seconds is specified, then the pause
        will be for that length; otherwise, the default think-time
        (specified on the <<END>> tag or with the --think-time flag)
        will be used.  Note that, as elsewhere, times specified as a
        negative number denote a fixed think time, while those specified
        with a positive number denote an average for a random
        exponential distribution.


     <<<<MMAARRKK>>>>
        Specifies that performance statistics should be rolled up
        between this and the previous <<MARK>> and reported as a unit.
        This is useful when a single page view might appear as multiple
        URL's in the request file, and means and deviations are wanted
        for the combination.
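
  To make these directives concrete, here is a small hypothetical
  session file combining the single-line and multi-line request forms.
  (The paths and POST data are invented; the checksums are omitted, so
  no validation is performed.)


  ______________________________________________________________________
  <<START>>
  GET /index.html
  <<THINK>> 5.0
  <<REQUEST>> POST /login
  <<BODY>>
  user=guest&pw=guest
  <</BODY>>
  <<THINK>>
  GET /logout
  <<END>> 10 8.0
  ______________________________________________________________________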



  77..22..  SSppeecciiffyyiinngg BBlloocckkss ooff RReeqquueessttss

  Oftentimes, a number of URL requests need to be run as a group, with
  the same random choice being made for all members of the group.  For
  example, suppose you want to run the "purchase product" transaction
  in 35% of all sessions.  Well, in a real session, the user wouldn't
  be able to jump into the middle of the set of web pages that form
  the transaction.  Instead, they must first pull up a services page,
  then select a service/product, then fill out a form, then verify
  their order before submission, etc.  This sequence of pages must be
  treated as a block in order for it to make sense.  If the first page
  of the block is randomly chosen to be run, then the whole block will
  be run.  If the first page is randomly rejected, then none of the
  block is run.


  Members of a block can be indicated by specifying the fraction -1.0.
  If the fraction is negative, then that web page will be treated as
  part of a block that begins with the most recent non-negative URI.
  Thus, to group a series of pages together, you do something like the
  following:








  ______________________________________________________________________
       GET /page1 0.75
       GET /page2 -1.0
       GET /page3 -1.0
       GET /page4 -1.0
  ______________________________________________________________________




  This will cause pages 1-4 to be run on 75% of all sessions.  Note that
  the group ends with the next page with a non-negative value for the
  fraction.

  (Extensions are planned for nested hierarchical blocks but have not
  been implemented).
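
  The block-selection logic described above can be sketched as follows
  (a standalone Python illustration of the semantics, not webclient's
  actual source; the paths are invented):

```python
import random

def select_requests(session, rng=random):
    """Return the requests to run for one session.

    `session` is a list of (url, fraction) pairs.  A non-negative
    fraction starts a new block and gives the probability that the
    whole block runs; a negative fraction marks the request as a
    member of the block begun by the most recent non-negative entry.
    """
    chosen = []
    run_block = True
    for url, fraction in session:
        if fraction >= 0.0:                  # block leader
            run_block = rng.random() < fraction
        if run_block:
            chosen.append(url)
    return chosen

random.seed(1)
session = [("/page1", 0.75), ("/page2", -1.0),
           ("/page3", -1.0), ("/page4", -1.0),
           ("/always", 1.0)]
trials = 100_000
counts = sum(len(select_requests(session)) == 5 for _ in range(trials))
print(counts / trials)   # close to 0.75
```

  Over many sessions, all four pages of the block (plus the final,
  unconditional request) appear together about 75% of the time, and
  none of the block appears in the remaining 25%.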




  77..33..  CCuussttoomm TThhiinnkk TTiimmeess

  Think times are used to emulate a user pausing between requests to
  "think" about what they are doing before issuing the next request.
  webclient has a number of ways of specifying the think time that
  should be applied between requests.

  By default, the think time specified on the <<END>> line applies to
  each gap between input URL's.  That is, there will be a pause before
  each URL is issued.  This value can be overridden by specifying the
  --think-time flag; but again, the value specified applies to each
  gap.  The location and the length of the gaps can be overridden by
  using the <<THINK>> directive.

  The think-times used can be fixed or randomly generated.  When the
  think-time is randomly generated, it is done so with an exponential
  distribution whose average is the specified time.  The exponential
  distribution provides a more realistic model of actual human behavior,
  with some pauses taking longer than others, but all tending towards a
  mean.  In all cases where a think time is specified to webclient, a
  positive number implies that the exponential distribution should be
  used, and a negative number implies that a fixed length of time should
  be used.  A gaussian distribution may also be specified.  Further
  details are presented in the ``Think Time Distributions'' section.


  For example, the input file



















  ______________________________________________________________________
  <<START>>
  GET /pageone.html
  GET /pagetwo.html
  <<THINK>>
  GET /pagethree.html
  GET /pagefour.html
  GET /more.html
  <<THINK>>
  GET /another.html
  GET /andmore.html
  <<THINK>>
  GET /last.html
  <<THINK>>
  <<END>> 50 8.3
  ______________________________________________________________________




  will result in no pause between the fetch of /pageone.html and
  /pagetwo.html, and a think-time of 8.3 seconds between the fetch of
  /pagetwo.html and /pagethree.html.  That is, the think-time will be
  non-zero only where the <<THINK>> directive appears.


  Optionally, independent think times can be specified, like so:



       ______________________________________________________________________
       <<START>>
       GET /pageone.html
       GET /pagetwo.html
       <<THINK>>  5.2
       GET /pagethree.html
       GET /pagefour.html
       GET /more.html
       <<THINK>>
       GET /another.html
       GET /andmore.html
       <<THINK>>  49.0
       GET /last.html
       <<END>> 50 8.3
       ______________________________________________________________________





  The first pause will last 5.2 seconds, the second will last 8.3
  seconds (the default), and the last pause will last 49 seconds.  In
  this example, there is no pause between the fetch of /last.html and
  /pageone.html in the next go-around.  A final <<THINK>> is needed if
  you want to avoid immediately going back to the beginning.


  If the keyword <<THINK>> never appears in the file, then the default
  think-time will be applied between each and every url.  Thus, the
  input file






  ______________________________________________________________________
  <<START>>
  GET /pageone.html
  GET /pagetwo.html
  GET /pagethree.html
  <<END>> 50 8.3
  ______________________________________________________________________





  is identical to




       ______________________________________________________________________
       <<START>>
       GET /pageone.html
       <<THINK>> 8.3
       GET /pagetwo.html
       <<THINK>> 8.3
       GET /pagethree.html
       <<THINK>> 8.3
       <<END>> 50 8.3
       ______________________________________________________________________





  Additional flexibility is provided by the --think-time flag.  If this
  flag is used, it overrides the default value specified with the
  <<END>> tag.
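
  The sign convention used throughout -- a negative value means a
  fixed pause, a positive value means the mean of an exponential
  distribution, and zero means no pause -- can be sketched as follows
  (a standalone Python illustration, not webclient's actual source):

```python
import random

def next_pause(think_time, rng=random):
    """Return the pause, in seconds, before the next request.

    Negative values mean a fixed pause of -think_time seconds;
    positive values give the mean of an exponential distribution;
    zero means no pause at all.
    """
    if think_time == 0.0:
        return 0.0
    if think_time < 0.0:
        return -think_time
    return rng.expovariate(1.0 / think_time)

random.seed(3)
print(next_pause(-8.3))              # always exactly 8.3
samples = [next_pause(8.3) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to 8.3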





  88..  WWeebbcclliieenntt SSttaattiissttiiccss RReeppoorrttss

  This section ... is under construction ...


  88..11..  DDeeffiinniittiioonn ooff SSttaattiissttiiccss


  The statistics reported by webclient are basically the same as those
  reported by webmon, with the following major difference: statistics
  reported by webclient are based on each page fetched (including its
  gifs), whereas statistics reported by webmon are based on each HTTP
  request.  Thus, for a login page, webclient will report the total
  time to fetch the html page and its related gifs, whereas webmon
  will report the time to fetch the page and the time to fetch each
  gif.  Since webmon sits between the browser and the server, webmon
  basically does not know when the requests related to this page end
  and the requests related to the next page begin.  (Well, it can make
  a pretty good guess as to when this occurs, but the current version
  of webmon does not do this.)  Also, webclient will treat a redirect
  as being part of a request, while webmon treats a redirect as a new
  and unrelated request.


  With this distinction in mind, here is the description of the
  statistics that webclient prints:

     rreessppoonnssee__ttiimmeess
        elapsed time from just before the TCP/IP connection to the
        server, until last byte of data for the last gif is received.
        This time includes that needed for redirects, SSL negotiation,
        and SSL encryption/decryption.  The time needed to resolve a
        server address is *not* included.  (Since DNS/YP/resolv.conf
        name resolution tends to be highly variable and subject to OS
        caching, and a variety of other "noisy" factors).


     hhttmmll__rreessppoonnssee__ttiimmeess
        time from first connect until the last byte of html data is
        received.  Redirects are processed and counted as part of the
        delay.


     ggiiff__rreessppoonnssee__ttiimmeess
        sum of times (over all gifs fetched) from connect through last
        data byte.


     tthhiinnkk__ttiimmee
        the amount of idle time spend between the end of the request,
        and the dispatch of the next request. This statistic is useful
        to find out what the actual think times were for a run.  This is
        useful when using a randomized think time, because the actual
        average will never exactly equal the requested average (although
        would eventually get close).  That is, if a 60 second think time
        is requested, the actual average think time might come out to be
        58.27 or 60.835 after statistical variation.


     rreeqquueesstt__ttiimmee
        the sum, more or less, of the response time plus the think time.
        The sum is not exact because the request time is measured from
        before that start of the request, until after the end of the
        think period, while the response times are measured at the
        socket level, and do not include much of the client overhead.



        The sum of html and gif times equals the total response time;
        that is, response_times = html_response_time + gif_response_time



     ffiirrsstt__ddaattaa__rreessppoonnssee__ttiimmee
        time from first connect until first message containing html body
        data arrives.  If a file is redirected, the first data response
        timer does not stop until the first message containing body data
        in the first non-redirected file is received.



     ccoonnnneecctt__ttiimmeess
        sum of time required to establish a connection with the host,
        summed over all files fetched this page.  If SSL is enabled,
        this time includes the SSL connection negotiation.  Note that
        webclient will automatically attempt to reuse SSL session_id
        handles for all subsequent connects.  Webclient will reset the
        SSL session_id at the end of a webclient session.  Thus,
        typically the very first fetch will take a time (about a second)
        while the cipher suite is negotiated, while subsequent connects
        will occur much more quickly (50 to 100 mS) as the SSL
        session_id is reused.

     hheeaaddeerr__ddeellaayyss
        sum of time required to fetch all headers, timed from just
        before the first byte of the request is sent, to just after the
        last byte of the header has arrived.


     ttrraannssffeerr__ttiimmeess
        sum of times required to transfer the bodies of the files
        (excluding header).



        The sum of connect delays, header delays, and transfer times
        equals the total response time. That is,


        response_time = connect_times + header_delays + transfer_times;



  If SSL is enabled, the following statistics are also printed:


     nneett__ddeellaayy
        the time spent in TCP/IP code, sending data, and waiting for
        responses from the server, summed over all files and gifs/jpegs
        for this page.


     ssssll__oovvhhdd
        the time spent on the local host in the SSL routines, summed
        over all files this page.  Encryption and decryption can require
        significant amounts of CPU cycles; this provides a measure of
        the time spent on the client only.


     hhttmmll__nneett__ddeellaayy
        the time spent in TCP/IP code, sending data, and waiting for
        response from the server, from connect until the last byte of
        html on the first non-redirected page is received


     hhttmmll__ssssll__oovvhhdd
        the time spent in SSL routines on the client, from first connect
        until the last byte of html on the first non-redirected page is
        received


     ggiiff__nneett__ddeellaayy
        the time spent in TCP/IP code, sending data, and waiting for
        response from the server, from connect to last byte of data,
        summed over all gifs fetched this page


     ggiiff__ssssll__oovvhhdd
        the time spent in SSL routines on the client, from connect to
        end of gif, summed over all gifs this page


     ttccpp__ccoonnnneecctt__ttiimmeess
        time required to do TCP/IP connects, summed over all files this
        page.  This time includes the time needed to create the socket,
        bind a name to it, and return from the system connect() call.
        It does *not* include any SSL negotiation time.


     ssssll__nneett__ddeellaayy__ccoonnnneecctt
        time spent in TCP/IP code, sending data, and waiting for
        responses while negotiating an SSL connection summed over all
        files this page.  This is a measure of how long the SSL
        negotiation takes, not counting the CPU overhead in the
        webclient SSL processing code.


     ssssll__ccoonnnneecctt__oovvhhdd
        time spent in SSL routines on the client, while negotiating an
        SSL connection, summed over all files this page. This measures
        the efficiency/performance of the particular SSL implementation
        used by webclient.


     ssssll__nneett__ddeellaayy__hheeaaddeerr
        time spent in TCP/IP code, sending data, and waiting for
        responses while asking for and obtaining the HTTP header, summed
        over all files this page


     ssssll__hheeaaddeerr__oovvhhdd
        time spent in SSL routines on the client, while requesting and
        receiving the HTTP header, summed over all files this page


     ssssll__nneett__ddeellaayy__ttrraannssffeerr
        time spent in TCP/IP code, sending data, and waiting for
        responses while transferring the body of the message, summed
        over all files this page


     ssssll__ttrraannssffeerr__oovvhhdd
        time spent in SSL routines on the client, while transferring the
        body of the file, summed over all files this page



  The following invariants are true for these statistics:


  response_times =  net_delay + ssl_ovhd;


  html_response_times = html_net_delay  + html_ssl_ovhd;


  gif_response_times = gif_net_delay + gif_ssl_ovhd;


  connect_times = tcp_connect_times + ssl_connect_ovhd +
  ssl_net_delay_connect;


  header_delays = ssl_header_ovhd + ssl_net_delay_header;


  transfer_times = ssl_transfer_ovhd + ssl_net_delay_transfer;
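These decompositions can be checked numerically.  The sketch below uses
invented per-page figures (all values hypothetical, in seconds) to show
how the two breakdowns -- network time vs. SSL overhead, and
connect/header/transfer -- sum to the same total:

```python
# Hypothetical per-page statistics; the field names match the report,
# but the values are invented for illustration.
stats = {
    "tcp_connect_times":      0.020,
    "ssl_net_delay_connect":  0.150,
    "ssl_connect_ovhd":       0.080,
    "ssl_net_delay_header":   0.060,
    "ssl_header_ovhd":        0.010,
    "ssl_net_delay_transfer": 0.200,
    "ssl_transfer_ovhd":      0.030,
}

# The connect/header/transfer breakdown, per the invariants above.
connect_times  = (stats["tcp_connect_times"] +
                  stats["ssl_connect_ovhd"] +
                  stats["ssl_net_delay_connect"])
header_delays  = stats["ssl_header_ovhd"] + stats["ssl_net_delay_header"]
transfer_times = stats["ssl_transfer_ovhd"] + stats["ssl_net_delay_transfer"]
response_time  = connect_times + header_delays + transfer_times

# The network/SSL breakdown: the invariants imply that the pure TCP
# connect time counts as network time, alongside the ssl_net_delay_*.
net_delay = (stats["tcp_connect_times"] +
             stats["ssl_net_delay_connect"] +
             stats["ssl_net_delay_header"] +
             stats["ssl_net_delay_transfer"])
ssl_ovhd  = (stats["ssl_connect_ovhd"] +
             stats["ssl_header_ovhd"] +
             stats["ssl_transfer_ovhd"])

# Both breakdowns account for the same total response time.
assert abs(response_time - (net_delay + ssl_ovhd)) < 1e-9
```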








  88..22..  SSSSLL TTiimmiinngg AAnnoommaalliieess


  When an SSL connection is first established, a negotiation is
  performed so that the client and server can agree on private keys to
  use for the selected encryption algorithm.  This negotiation uses
  public key encryption and is hence quite slow.  Once this negotiation
  has been performed, subsequent HTTP requests from the client to that
  same server use a cached session ID to reestablish the connection.


  Hence, we would expect the first connect times to be larger than
  subsequent connect times.  To make sure that the connect times from
  each repetition of a session are comparable, the SSL session_id is
  discarded at the end of each execution of the session and it must be
  renegotiated at the time of the next logon.



  88..33..  RReeppoorrttiinngg SSuummmmaarryy SSttaattiissttiiccss ffoorr BBlloocckkss ooff UURRLL''ss

  Summary statistics for a collection of URL's can be obtained by using
  the <<MARK>> keyword to delimit a block of URL's.  Thus, for example,
  the input file




       ______________________________________________________________________
       <<START>>
       GET /pageone.html
       GET /pagetwo.html
       <<MARK>>
       GET /pagethree.html
       GET /pagefour.html
       GET /more.html
       <<MARK>>
       GET /another.html
       GET /andmore.html
       <<MARK>>
       GET /last.html
       <<END>>
       ______________________________________________________________________





  will print statistics not only for each of the individual URL's, but
  also for four "blocks": the first two URL's combined, then the next
  three, then the next two, and the final lone URL.
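The same block sums can be reproduced by post-processing the per-URL
response times.  A minimal sketch, using invented per-URL times for the
eight URL's in the example input file:

```python
# Hypothetical response times, one per URL, in the order they appear in
# the example input file above.
times = [0.21, 0.34, 0.18, 0.25, 0.40, 0.30, 0.22, 0.27]

# Positions of the <<MARK>> keywords: they split the list into the
# blocks [0:2], [2:5], [5:7], and [7:8].
marks = [2, 5, 7]

blocks, start = [], 0
for end in marks + [len(times)]:
    blocks.append(sum(times[start:end]))
    start = end

# Four blocks: 2 URLs, 3 URLs, 2 URLs, and the final lone URL.
assert len(blocks) == 4
```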




  88..44..  OObbttaaiinniinngg IInnddiivviidduuaall TTrraannssaaccttiioonn CCoommpplleettiioonn TTiimmeess

  Although webclient is designed to run a series of sessions, and then
  compute statistical timing information from the resulting
  measurements, information about each individual transaction can be
  printed out.  The -x flag will cause request timestamps to be printed;
  the -e flag will cause per-request broken-down statistics to be
  printed.



  The -x flag causes timestamps signifying the beginning and end of each
  transaction to be printed into the report file.  The timestamps are in
  the form of floating point numbers, representing seconds since the
  beginning of the epoch (Jan 1, 1970).


  The actual format is as follows:




       ______________________________________________________________________
       TPUT <<START>> try="try" date = "session_start_date"
       TPUT <URL>   try="try" req="nreq" url = "url"
       TPUT <START> try="try" req="nreq" date= "request_start_date"
       TPUT <END>   try="try" req="nreq" date= "request_completion_date"
       TPUT <SKIP>  try="try" req="nreq"
       TPUT <OBS>   try="try" req="nreq" datum="obs_name" value="value"
       TPUT <<END>> try="try" date = "session_completion_date"
       TPUT <<INTERRUPTED>>
       ______________________________________________________________________





  where:


       ______________________________________________________________________
       try                     is the number of the session trial,
       nreq                    is the number of the request,
       url                     is the url of the request,

       session_start_date      is the current date/time just before the
                               first request is issued,

       session_completion_date is the current date/time just after the
                               response to the last request has been
                               received,

       request_start_date      is the current date/time just before the
                               URL is sent to the web server,

       request_completion_date is the current date/time just after the
                               last byte of the last webpage or
                               gif/image associated with this URL has
                               been received,

       obs_name                is the name of the observed (measured)
                               item,
       value                   is the value of the observed item,

       <<START>>               denotes the beginning of a session (trial),
       <<END>>                 denotes the end of a session (trial),
       <START>                 denotes the beginning of a request
                               observation,
       <END>                   denotes the end of a request observation,
       <SKIP>                  denotes that the indicated request was
                               not issued,
       <OBS>                   delimits a detailed timing observation,
       <<INTERRUPTED>>         denotes that a signal was caught.

       ______________________________________________________________________
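A small script can turn the -x output into per-request elapsed times by
pairing up the <START> and <END> lines.  The TPUT lines below are
invented, but follow the format shown above:

```python
import re

# A few invented TPUT lines in the per-request format shown above;
# dates are seconds since the epoch, as floating point numbers.
report = """\
TPUT <START> try="1" req="0" date= "946944000.125"
TPUT <END>   try="1" req="0" date= "946944001.250"
TPUT <START> try="1" req="1" date= "946944003.000"
TPUT <END>   try="1" req="1" date= "946944003.750"
"""

starts, elapsed = {}, {}
pat = re.compile(
    r'TPUT <(START|END)>\s+try="(\d+)" req="(\d+)" date= "([\d.]+)"')
for line in report.splitlines():
    m = pat.match(line)
    if not m:
        continue                     # session-level <<...>> lines, etc.
    kind, key, stamp = m.group(1), (m.group(2), m.group(3)), float(m.group(4))
    if kind == "START":
        starts[key] = stamp
    else:
        elapsed[key] = stamp - starts[key]

# elapsed now maps (try, req) -> end-to-end time for that request.
```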







  99..  HHooww TToo DDoo MMuullttii--UUsseerr RRuunnss


  Multiuser runs are performed by running multiple copies of webclient
  under the control of the perl program run.workload.  This perl program
  provides various convenience functions, the most important of which
  are its ability to perform a clean ramp-up and shutdown of the
  workload, and to exclude the ramp-up and ramp-down periods from
  statistics gathering.  The
  control program uses a special shared-memory area to coordinate the
  clients; the control features are not reproducible with a simple
  "roll-your-own" shell script.



  The file run.workload is primarily a configuration file, with the
  actual work being done by the program workload.pl.  You should review
  the contents of run.workload and make sure that reasonable defaults
  and values have been chosen for your run.  Typically, one starts the
  program as follows:




       ______________________________________________________________________
       run.workload  url.session.input  30   5.0  3600 prefix_
       ______________________________________________________________________





  This will launch 30 copies of webclient, run for 3600 seconds, using
  url.session.input as its input file, and setting the default think
  time to 5.0 seconds (overriding any think times specified in the
  input file).  The runtime of 3600 seconds is exclusive of the startup
  and shutdown times (which can be significant, especially when a large
  number of clients are specified).  The output reports are placed in a
  directory named prefix_3600_30.  This directory name is built up by
  using the prefix, the length of the run, and the number of clients.
  The directory is automatically created.


  Be sure to pick a run duration that is long enough to collect a
  reasonable amount of data.  For complex sites or complex workloads,
  runs of less than an hour can lead to choppy and uneven statistics.
  Note that sessions with non-zero think times can take minutes or tens
  of minutes to run, depending on the complexity of the session. Thus,
  you want to pick a run duration that allows at the very least 4 or 5
  sessions to play through.  Remember that no statistics are collected
  until a new session is started after ramp-up is complete, and
  statistics from partially-completed sessions at the end of the run are
  also discarded.  Thus, typically 2 or 3 sessions are lost to ramp-up
  and ramp-down, which is why a minimum of 4-5 sessions is recommended.
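A back-of-envelope check that a planned run is long enough can be done
with a little arithmetic.  All of the figures below are hypothetical;
plug in the session length you actually observe:

```python
# Assumed (hypothetical) session shape: 12 URLs per session, a 5-second
# think time per URL, and a 1.2-second average response time.
urls_per_session = 12
think_time = 5.0          # seconds per URL
avg_response = 1.2        # seconds per URL, assumed

session_time = urls_per_session * (think_time + avg_response)

# Per the text: allow 2-3 sessions lost to ramp-up/ramp-down, and aim
# for at least 4-5 whole sessions of usable data.
min_sessions = 5
lost_sessions = 3

# Shortest run duration worth passing to run.workload, in seconds.
min_run = (min_sessions + lost_sessions) * session_time
```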




  99..11..  WWoorrkkllooaadd PPaarraammeetteerrss

  Here are some of the things that can be tuned or controlled by
  modifying run.workload:



  1. $options -- set to specify options to webclient.  Options such as
     -A or -S SSLV3:4 are placed here.
  2. $webserver -- specifies the webserver host being used.

  3. $sleep_interval -- specifies how long run.workload waits between
     printing statistics.

  4. $nservers -- number of webservers on the host to use.  Each
     webserver on the host is assumed to have a distinct port number.
     Set this to one to use only one webserver on the host.

  5. $port[] array -- this is the array of port numbers.  If $nservers
     is set to 1 then $port[0] is the only entry that is used.

  6. $nstart -- run.workload starts the child webclients in groups.
     This number specifies the size of each group.

  7. $dest -- the name of the directory where the results will be
     placed.

  8. $custid[] array -- provides the customer id's for the child runs.

  9. $pin[] array -- provides the pin numbers for the child runs.

  10. $passwd[] array -- provides the passwords for the child runs.

  11. $seed[] array -- provides the random seeds used by the child
      runs.  Distinct seeds are required to get distinct think times
      and randomization values in each child.

  12. $shmkey -- used to identify the shared memory area to the
      children.  The value does not matter and should not need to be
      changed unless another program (perhaps another instance of
      run.workload) that uses the same shared memory key exists on the
      local system.


  In certain places in run.workload, the program updshm is used to
  update the shared memory area instead of the PERL routine shmwrite().
  This is because shmwrite() was causing segmentation faults when it
  should not have, so it was replaced with the updshm program.



  99..22..  NNootteess oonn tthhee ""RRaammpp--UUpp"" PPrroocceessss


  The run.workload script goes through a sequence of steps to ramp up
  the workload on the server and start statistics gathering.  The steps
  below enumerate that sequence.



  1. As many copies of webclient as specified are all started.  As each
     one starts, it goes through some basic initialization, after which
     it forks itself off into the background.  If any of these fail
     during initialization, run.workload stops and all clients are
     killed.  Once each client has finished initializing, it increments
     a flag in shared memory, indicating that it's ready to start
     running requests.  It will not actually run any requests until
     run.workload sets the 'start' flag.

  2. run.workload waits until all children have indicated that they have
     initialized.


  3. run.workload then increments a second global variable by $nstart.
     This causes the first $nstart children to begin submitting
     requests.  run.workload waits at least 12 seconds (by default), or
     longer, until each of the children has actually completed at least
     one request, before starting another group of $nstart children.

  4. When all children have completed at least one request, run.workload
     sets a flag in global memory indicating that all children are now
     actively submitting requests.  At this point, ramp up is complete.

  5. The children examine this "ramp-up complete" flag to determine
     whether to start collecting statistics.  They do not actually start
     collecting statistics until the start of a new session after the
     flag is set.  Thus, statistics are always for whole sessions.  Data
     for sessions that started before ramp-up completed are discarded.

  6. When the run duration expires, each copy of webclient is sent a
     SIGUSR1 signal, telling it to shut down.  Data collected for the
     partially-completed session that was running when the signal was
     caught is discarded.  Then each client will print out the
     statistics that it has collected.  By discarding the partially
     completed run at the tail end, the statistics that are presented
     are always for a whole number of session repetitions.

  7. run.workload takes a snapshot of the current values for the
     statistics in the global memory area and prints these to stdout, in
     order to simplify the live monitoring of a run.  Note that the
     printed statistics do exclude any data gathered during ramp-up, and
     thus should present an accurate measure of throughput.  However,
     they will not match exactly the summary statistics generated by
     sumstats, since sumstats uses only whole sessions, with partially
     completed sessions trimmed from the beginning and end of the run.
     Thus, sumstats will typically report a smaller number of completed
     URL's over a smaller amount of time.  The average
     throughput should be the same, though, within statistical
     fluctuations.
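The group-release bookkeeping in steps 1 through 4 can be sketched in a
few lines.  This is an illustration of the counting logic only, not the
actual shared-memory code in workload.pl:

```python
# A minimal sketch of the ramp-up release logic: children are released
# in groups of $nstart, and ramp-up is complete once every child has
# been released (and, in the real script, has completed at least one
# request before the next group is released).
def release_groups(nclients, nstart):
    """Yield the number of children allowed to run after each release."""
    allowed = 0
    while allowed < nclients:
        allowed = min(allowed + nstart, nclients)
        yield allowed

groups = list(release_groups(30, 10))   # e.g. 30 clients, groups of 10
# groups == [10, 20, 30]; ramp-up is complete after the last release
```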




  99..33..  IInntteerrpprreettiinngg tthhee OOuuttppuutt


  Below is an example of the output from run.workload:




       ______________________________________________________________________
       T  ELAP STP STL SESS  REQS  CONNS  GIFS  KBYTES GIF-KB  REQ   FIRST  END to
       I   for this interval:                   KB-RATE        RATE   DATA   END
       --------------------------------------------------------------------------
       T    30  0  0    74     74   370    370   1736    702   2.47  0.135  1.135
       I                                         57.87         2.47  0.135  1.135
       T    60  0  0   145    145   725    725   3402   1376   2.42  0.145  1.152
       I                                         55.52         2.37  0.155  1.169
       T    90  0  0   219    219  1095   1095   5138   2078   2.43  0.187  1.151
       I                                         57.87         2.47  0.270  1.149
       ______________________________________________________________________




  Notice that there are two lines, labelled T and I.  The T line shows
  totals, the I line shows stats for the interval.  In the above, the
  interval is 30 seconds long.

  This output is primarily useful in keeping an eye on the run; more
  detailed and complete statistics can be obtained by post-processing
  the report files after the end of the run.  Later sections discuss
  the post-processing stage.

  The columns in the output are labelled as follows:


     EELLAAPP
        Elapsed time.  The amount of time, in seconds, since the
        beginning of the run. In the above example, we see three
        entries: 30, 60 and 90 seconds into the run.


     SSTTPP
        Stopped runs. The number of clients that have completely halted.
        Clients will halt when they encounter a variety of differnt
        errors, such as unresponsive web servers, missing web pages, bad
        checksums, and other reasons.


     SSTTLL
        Stalled runs. The number of clients that have made no progress
        in the last two intervals.  A client is "stalled" if the number
        of KB of HTML fetched and the number of gif files fetched has
        not changed for two $nsleep intervals.  In this example, $nsleep
        is 30 seconds.  There are no stalled runs.


     SSEESSSS
        Sessions.  The total number of sessions that have been completed
        by all of the clients since the begining of the run. Here, we
        see that 219 sessions were completed in 90 seconds.


     RREEQQSS
        Requests.  The total number of requests that have been completed
        by all of the clients since the begining of the run. Here, we
        note that the number of requests equals the number of sessions:
        that's because the 'session' consisted of a single web page.
        If, for example, a session consisted of two pages, then the
        number of completed requests would be about double the number of
        completed sessions.


     CCOONNNNSS
        Connections.  The total number of connections made to the web
        server.  These include sockets opened to fetch gifs.  If a
        server is running with Keep-Alive/Persistant connections
        enabled, then the connection count will typically stay low.


     GGIIFFSS
        Gifs. The total number of image (or audio) files fetched by all
        clients.  Since one web page typically contains a lot of images,
        this number will typically be much larger than the number of
        completed requests.  However, since webclient emulates gif
        caching, this number will stay low if the same set of gifs
        appear on each page.  In the above example, we see that the web
        page has five gifs on it, and thus the number of fetches is five
        times the number of requests.


     KKBBYYTTEESS
        Kilobytes Fetched. The total amount of data downloaded,
        including header bytes, body bytes, and gif bytes.  Header bytes
        are  the part of the HTTP header that isn't the displayed HTML:
        even for a simple and can contribute significantly to the total
        network traffic.


     KKBB--RRAATTEE
        Kilobyte Rate.  Shown in the same column as KYBTES, but down one
        row, this is the rate of data transfer, in KBytes per second.
        This figure is only for the interval; it is _n_o_t an average since
        the begining.


     GGIIFF--KKBB
        Image KBytes Fetched. The total amount of KBytes that were
        image/audio/ embedded graphics.


     RREEQQ RRAATTEE
        Request Rate.  The number of pages per second being fetched.
        There are two numbers in the column, on alternating rows.  On
        the T row, we have the average request rate, since the begining
        of the run.  On the I row, we have the request rate in the last
        (30 second) interval.


     FFIIRRSSTT DDAATTAA
        First Data Response Time. The elapsed time, in seconds, between
        when a URL request is made, and the first non-header response
        byte is received.  (Some web servers send a partial header
        immediately; this statistic measures how long it took until data
        actually started flowing.)

        There are two numbers in the column, on alternating rows.  On
        the T row, we have the average response time, since the begining
        of the run.  On the I row, we have the response time in the last
        (30 second) interval.


     EENNDD ttoo EENNDD
        End to End Response Time. The elapsed time, in seconds between
        when a URL request is made, and the very last byte of the
        response is received.  This number is always greater than the
        First Data Response Time, because it includes the additional
        overhead of delivering the rest of the page.  Some cgi-
        bins/interctive web sites start streaming a web page beck to the
        browser even before the entire web page has been dynamically
        generated.  Thus, back end delays can cause long delays between
        the start of a web page transmission, and its end.

        There are two numbers in the column, on alternating rows.  On
        the T row, we have the average response time, since the begining
        of the run.  On the I row, we have the response time in the last
        (30 second) interval.


  Note that these numbers are useful primarily in keeping an eye on the
  running test.  More detailed and complete statistics are obtained by
  post-processing the reports generated by the clients.  Later sections
  discuss the post-processing tools.
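For the cumulative columns, the I-row figures can be cross-checked by
differencing successive T rows.  A sketch, using the SESS/REQS/CONNS/
GIFS numbers from the sample output above:

```python
# Cumulative T-row values from the sample run.workload output above.
t_rows = [
    {"elap": 30, "sess": 74,  "reqs": 74,  "conns": 370,  "gifs": 370},
    {"elap": 60, "sess": 145, "reqs": 145, "conns": 725,  "gifs": 725},
    {"elap": 90, "sess": 219, "reqs": 219, "conns": 1095, "gifs": 1095},
]

# Difference each cumulative column against the previous row to recover
# the per-interval counts that the I rows report.
keys = ("sess", "reqs", "conns", "gifs")
intervals = []
prev = dict.fromkeys(keys, 0)
for row in t_rows:
    intervals.append({k: row[k] - prev[k] for k in keys})
    prev = {k: row[k] for k in keys}

# e.g. sessions completed per 30-second interval: 74, 71, 74
```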







  99..44..  GGaatthheerriinngg OOtthheerr SSttaattiissttiiccss

  It is useful to gather system performance statistics on the tested web
  server.  In particular, the vmstat command can be used to capture CPU
  usage.  Other commands provide network traffic statistics, and the
  cwsmon tool will provide GOLD, MQ Series and IPC statistics.  Some
  tools are provided to simplify the handling of some of these other
  statistics.  In particular, note the following:



     ccppuuttoottaallss
        Takes the output of the vmstat command, and computes the average
        CPU usage, number of page faults, I/O & etc.  Handles multiple
        files at once, printing a one-line summary per file.  This
        script is critical for understanding what the cpu usage was on
        the server during a run.


     cchhoopplloogg
        Chops up one large vmstat.out file into pieces, suitable for
        input to the cputotals command.  To determine how to chop up the
        vmstat input file into pieces, it parses the output of
        run.workload to determine what times a run started and ended.
        It then uses these start and end times to chop a corresponding
        chunk out of the run.vmstat output.  It can handle multiple
        run.workload.out files simultaneously to minimize typist
        fatigue.


     ttiimmeecchhoopp
        Perl script that chops off the beginning and end of an input
        file, printing only the middle to stdout, as specified by a
        start and end-time.  Handles run.vmstat files by default, but
        can handle cwsmon or other file formats with some fiddling.  Can
        be used to create the input files needed by cputotals, by
        stripping off garbage data at the beginning and the end of a
        vmstat measurement.
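The core of timechop's filtering can be sketched in a few lines.  The
timestamp-first line format below is an assumption for illustration; the
real Perl script understands run.vmstat output specifically:

```python
from datetime import datetime, time

def timechop(lines, start, end):
    """Keep only lines whose leading HH:MM:SS timestamp lies in
    [start, end].  A sketch of the idea, not the actual Perl script."""
    kept = []
    for line in lines:
        try:
            stamp = datetime.strptime(line.split()[0], "%H:%M:%S").time()
        except (ValueError, IndexError):
            continue                 # skip headers and garbage lines
        if start <= stamp <= end:
            kept.append(line)
    return kept

# Invented vmstat-style input: a header line plus timestamped samples.
sample = [
    "procs  memory  cpu",            # header, dropped
    "10:00:05  1 0  93",             # before the run, dropped
    "10:30:00  2 0  88",             # inside the run, kept
    "11:10:00  1 1  90",             # after the run, dropped
]
mid = timechop(sample, time(10, 15), time(11, 0))
```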






  99..55..  PPoosstt--PPrroocceessssiinngg aanndd DDaattaa RReedduuccttiioonn


  Each client of a multi-user run produces its own report file when the
  run terminates.  The report file contains statistics for each
  individual web page, as well as various sorts of averages and roll-
  ups.  There's a lot of data in there, probably more than you want.  To
  get averages for the whole run (averages of all of the client
  statistics), some post-processing of the data needs to be done.  This
  section describes the scripts and the process for this data reduction.

  The output of the process is a report file that very much resembles
  the report file of each individual client, except that it contains the
  averages over all clients.   The report breaks out the response times
  and delays for each web page, for collections of web pages, and for
  the session as a whole.  The last few lines of the report show the
  grand total page rate and response time.

  The instructions below describe how to create a summary report.



  1. You need to summarize the response time outputs from all of the
     webclient report files.  The way that run.workload is configured at
     present, these end up in the reports directory under run_dd_nn.  To
     reduce the data, run sumstats run_dd_nn/reports/*


     This extracts the response time per URL and the overall response
     time for the run from each of the report files.


     Note that if randomization is used (i.e. the "fraction" field of
     the URL input line is less than 1.0, which causes the associated
     URL to be run on that fraction of the sessions attempted), then the
     per-URL response times printed by sumstats will not add up to the
     overall response time at the bottom of the report.  This is because
     the response times listed are averaged over all runs where the
     particular URL was submitted, whereas the total at the bottom is
     averaged over all sessions attempted.  To get the totals to add up,
     you must weight the intermediate numbers by the fraction of
     sessions where those URL's were actually submitted.



  2. You need to fix up the vmstat output files so they contain data
     only from the time period associated with the measurement run.
     This can be done manually, or with the aid of several tools.
     These fixed-up
     files are then used by the cputotals command to report the average
     CPU usage, memory size, page faults, etc. for the run.


     The choplog tool automates the following manual procedure:



     a. View the run.workload.out file that contains the output text
        from the run.workload command. Find the line "rampup complete at
        time:" and record the time (in hours and minutes) that is given
        there.

     b. Find the line "run complete at time:" and record the time given
        there.  These times represent the start and end time of the run,
        exclusive of startup and shutdown times.

     c. Edit the file vmstat.out that was produced by the run.vmstat
        command.  Delete the lines of the file that correspond to
        observations taken outside of the experimental interval.


     The choplog tool allows you to specify a vmstat.out and multiple
     workload.out files on the command line.  It will produce one
     chopped vmstat.out file for each workload.out file that was
     specified.  This is very handy when digesting output from multiple
     overnight runs.


     Alternately, the timechop utility is a more primitive tool: given
     an input, a start time, and an end time, it will print out only the
     portion of the input that lies between the start and end times.


     After the properly cleaned up vmstat.out files have been created,
     the cputotals command can be used to report the average CPU busy &
     idle percentages, as well as the average number of context switches
     and system call rates per second.  The cputotals command will
     accept multiple input files at once, and will print a separate
     summary line for each.
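The fraction-weighting described in step 1 above can be illustrated
numerically.  The fractions and response times below are invented:

```python
# Hypothetical per-URL averages from a sumstats-style report: each entry
# is (fraction of sessions the URL ran in, average response time over
# the sessions where it actually ran), in seconds.
urls = [
    (1.00, 0.30),    # always fetched
    (0.50, 0.80),    # randomized: fetched in half the sessions
    (0.25, 1.20),    # randomized: fetched in a quarter of the sessions
]

# Naively summing the listed averages over-counts the randomized URLs:
naive_total = sum(t for _, t in urls)

# Weighting each average by its fraction gives the per-session total,
# averaged over all sessions attempted -- this is the number that
# matches the overall figure at the bottom of the report.
weighted_total = sum(f * t for f, t in urls)
```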
  99..66..  SSttrreessss TTeessttiinngg

  Of course, this toolset can be used to perform stress-testing of web
  servers.  Below we describe some additional features that may be
  useful for such testing.


  In the real world, network errors occur, and clients occasionally drop
  or disconnect sockets.  These tools can simulate some of these
  behaviours in a simplistic fashion.  webclient has been written so
  that if it catches a SIGHUP during a network operation (i.e. a system
  call) it will repeat that operation.  This can induce some network
  abnormalities while still allowing webclient to continue functioning.
  The perl script smack can be used to send SIGHUP signals to all
  running copies of webclient.
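The repeat-on-SIGHUP behaviour amounts to a retry loop around the
interrupted system call.  A minimal Python sketch of the idea (not
webclient's actual C code); the retry limit is an invented safeguard:

```python
import errno

def retry_on_interrupt(op, retries=5):
    """Repeat a network operation if a signal interrupts it,
    mimicking webclient's SIGHUP handling (a sketch, not its code)."""
    for _ in range(retries):
        try:
            return op()
        except InterruptedError:     # EINTR: a signal arrived mid-call
            continue
    raise OSError(errno.EINTR, "interrupted too many times")

# Simulate an operation that is interrupted twice before succeeding.
calls = {"n": 0}
def flaky_recv():
    calls["n"] += 1
    if calls["n"] < 3:
        raise InterruptedError
    return b"HTTP/1.0 200 OK"

assert retry_on_interrupt(flaky_recv) == b"HTTP/1.0 200 OK"
```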




  99..77..  LLaarrggee NNuummbbeerrss ooff CClliieennttss

  Several changes have been made in run.workload to accommodate larger
  numbers of webclient children:


  1. The runs are now started in groups of 10 rather than all at once.
     Each time a group of 10 is started, run.workload waits until they
     run at least 1 request each before it starts the next 10.  That way
     we avoid convoy problems (e.g. where all of the children run the
     same request at the same time).  This part of the run is called
     rampup.  You can revert to the old behavior by setting the $nstart
     variable to $number_of_users.  This will cause all of the runs to
     start at once.

  2. Statistics are not collected by either run.workload or webclient
     until after the end of rampup.

  3. The output no longer prints the status of each run.  A message
     about a particular run is printed only when that run stops (or
     when it becomes or ceases being "stalled" -- see the next item).

  4. run.workload now checks for "stalled" or "lack of progress" runs.
     A run is stalled if the number of html KBytes and gif files it has
     fetched does not change for two print intervals (currently set to
     30 seconds).  Messages are printed when a run becomes stalled and
     when it resumes progress (if ever).

  5. Since throughput statistics are not recorded until after the end of
     rampup, the throughput rates should not require adjustment due to
     startup.

  6. The perl5 shmwrite() function appears to have a bug in it that
     causes it to fail with a segment violation if it is used more than
     once per run.  To get around this, there is a little program called
     updshm that updates shared memory and is used instead of shmwrite.

  7. The number_of_users value supported by run.workload is limited
     only by the size of the password file.  Runs with up to 350 users
     have been successfully completed.



  1100..  TTrroouubblleesshhoooottiinngg



  1100..11..  EErrrroorr MMeessssaaggeess

  A brief guide to some of the error messages, and their workarounds.


     RRuunn__xxxx__yyyy//rreeppoorrttss hhaass ssoommee oodddd llooookkiinngg rreeppoorrttss..
        all stats are 0.00 for several runs, then jump to a very high
        value, (200-300 seconds), then a very low value, then to 0.00
        again.  What does this mean?"  I dunno


     II ggeett tthhee eerrrroorr mmeessssaaggee::
        ....cannot connect ..."  You are behind a firewall.  Use either
        the socks versions of the binaries, or specify the proxy server
        with the -p flag.





  1100..22..  KKnnoowwnn PPrroobblleemmss oorr LLiikkeellyy TTrroouubbllee SSppoottss



  1100..22..11..  CChheecckkssuummss

  If there is a current date or some similar thing in a page, the page
  will be different every day it is fetched, and so its checksum will
  never match the recorded one.

  Or, you COULD run with the -i flag and tell webclient to ignore
  checksums.  This is not a good idea for the following reason:  your
  server could fail and send more or less meaningless web pages to
  webclient in the middle of a long performance run, and you will never
  know the difference.  It has happened to me....  That is the reason
  the checksum stuff is in there in the first place!

  If you do get a mysterious checksum error, look in the webclient
  error log, and then go look at the trace file generated when you
  retrained your script file.  (You did use the shell script "retrain"
  to retrain your webclient input file, didn't you?)  Locate the
  offending page in the trace file and in the webclient error log.
  Split each out into a separate file and diff them to find out why
  they are different.  This will often point to something like a date
  field or the infamous handles (see the "Handles" section below),
  which you will have to figure out how to kludge around somehow.
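  The diff step can be sketched like this, using two toy "page" files
  standing in for the pages split out of the trace file and the
  webclient error log (the file names and page contents here are made
  up for the sketch):

```shell
# Two stand-in "pages", as split out of the trace file and the
# webclient error log; they differ only in a date field.
printf 'GET /index.html\nDate: Mon, 03 Jan 2000\n<html>hello</html>\n' > page-trace.txt
printf 'GET /index.html\nDate: Tue, 04 Jan 2000\n<html>hello</html>\n' > page-log.txt

# diff pinpoints the mismatching line -- here, the date field.
diff page-trace.txt page-log.txt

rm -f page-trace.txt page-log.txt
```

  On real pages the diff output is usually just a line or two, which
  makes the offending date field or handle jump right out.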


  1100..22..22..  HHaannddlleess

  XXXXXX TThhiiss sseeccttiioonn iiss oobbssoolleettee aanndd nnoo lloonnggeerr aapppplliiccaattbbllee.  A handle is a
  value manufactured by the servlet code and sent out in the page, that
  the server expects the browser to hand back to the server on one of
  the requests in the page.  The server uses this to store data between
  pages.  What happens is that the handle value is somehow associated
  with permanent storage in the server; when a new request comes along,
  it supplies the handle value given to the browser in the last page.
  The server code then goes and fetches the saved data.

  This is problematic for webclient in that webclient runs a saved list
  of URLs.  Those URLs can have handle values in them that were good
  only for the particular run when the URLs were saved, and are
  meaningless in a subsequent run.  So webclient will fail to submit a
  plausible request to the server.

  Handles are also problematic for webclient in that if webclient is
  checking hash values (to make sure that the pages sent are the correct
  ones) and there is a handle value on the page, then every time the
  page is run, the handle values are different so they never checksum
  the same.

  How this problem is solved: in the simple_check_sum routine in
  webclient and webmon, the handle values are found and overwritten
  with X's before the hash is calculated.  This should make the pages
  checksum the same.  This is still a problem, though, if handle
  values are not always of the same length.
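  The idea behind the masking can be sketched with standard shell
  tools.  (The "handle=" marker and the page text are made-up
  examples; the real pattern matching lives in the simple_check_sum
  source.)

```shell
# Two "pages" that are identical except for the handle value
# (same length in both).
printf '<a href="/app?handle=12345&page=2">next</a>\n' > p1.html
printf '<a href="/app?handle=98765&page=2">next</a>\n' > p2.html

# Overwrite the handle value with X's before checksumming; after
# masking, both pages are byte-identical and checksum the same.
sed 's/handle=[^&"]*/handle=XXXXX/' p1.html | cksum
sed 's/handle=[^&"]*/handle=XXXXX/' p2.html | cksum

rm -f p1.html p2.html
```

  Note the fixed-length replacement: if a later run hands out a handle
  of a different length, the masked pages no longer match byte for
  byte, which is exactly the remaining length problem described above.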


  1111..  LLiicceennssee


  1111..11..  LLiicceennssee

  The documentation is covered under the GNU FDL:


       ______________________________________________________________________
             Copyright (c)  2000  Linas Vepstas.
             Permission is granted to copy, distribute and/or modify this document
             under the terms of the GNU Free Documentation License, Version 1.1
             or any later version published by the Free Software Foundation;
             with the Invariant Sections being LIST THEIR TITLES, with the
             Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
             A copy of the license is included in the section entitled "GNU
             Free Documentation License".
       ______________________________________________________________________




  The source code is covered by the GNU GPL, and also by the following
  (which are compatible with the GPL):


  ______________________________________________________________________
  This software was derived, in part, from software created by Silicon
  Graphics, Inc. for public use.  The base software was from WebStone
  version 2.0b4.  The WebStone software is copyright 1995, Silicon Graphics,
  Inc.  Use of WebStone is governed by the following license statement
  as well as the heading given below.  This heading is present in all of
  the WebStone source files:

      This file and all files contained in this directory are
      copyright 1995, Silicon Graphics, Inc.

      This software is provided without support and without any obligation on the
      part of Silicon Graphics, Inc. to assist in its use, correction, modification
      or enhancement. There is no guarantee that this software will be included in
      future software releases, and it probably will not be included.

      THIS SOFTWARE IS PROVIDED "AS IS" WITH NO WARRANTIES OF ANY KIND INCLUDING THE
      WARRANTIES OF DESIGN, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE,
      OR ARISING FROM A COURSE OF DEALING, USAGE OR TRADE PRACTICE.

      In no event will Silicon Graphics, Inc. be liable for any lost revenue or
      profits or other special, indirect and consequential damages, even if
      Silicon Graphics, Inc. has been advised of the possibility of such damages.


      /**************************************************************************
       *                                                                        *
       *          Copyright (C) 1995 Silicon Graphics, Inc.                     *
       *                                                                        *
       *  These coded instructions, statements, and computer programs were      *
       *  developed by SGI for public use.  If any changes are made to this code*
       *  please try to get the changes back to the author.  Feel free to make  *
       *  modifications and changes to the code and release it.                 *
       *                                                                        *
       **************************************************************************/

  This software was derived, in part, from software created by Sverre H. Huseby
  <sverrehu@online.no> and covered under the "Artistic License"
  http://ibiblio.org/pub/Linux/LICENSES/artistic.license
  ______________________________________________________________________


