Newsgroups: comp.sys.encore
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!cunixf.cc.columbia.edu!melissa
From: melissa@cunixf.cc.columbia.edu (Melissa Metz)
Subject: Re: dump really slow
Message-ID: <1991Apr17.193235.22135@cunixf.cc.columbia.edu>
Keywords: slowdumps
Reply-To: melissa@cunixf.cc.columbia.edu (Melissa Metz)
Organization: Columbia University
Date: Wed, 17 Apr 1991 19:32:35 GMT


I asked:
> My question: why does dump run so slow and use so many CPU seconds?

I got various responses.  Thanks for all the ideas!  Sorry for not
sending thanks sooner, but we've been busy trying out some of the
suggestions, and it takes a while to get results.  Unfortunately, we
are still suffering, though I haven't tried everything yet.

phil@pex.eecs.nwu.edu (William LeFebvre) suggested:
> One thing that we do when dumping our Multimax is we restrict the
> number of sub-processes that it starts.  This is done with the
> Encore-specific option "-N", as in "dump -N 4 ..."  We use an
> N of 8 and get adequate performance on a machine with 4 APCs.

In benchmarking, I found that reducing the number of dump processes to
4 (by setting -N 2 -- two kids, a parent and a grandparent) increased
real time by a minimal amount and total CPU time used by 50% (!).  I
couldn't judge the load impact from the benchmark.  We tried this in
production last night, and got complaints that the load was even
worse!  (Though one person commented that it was 100% better -- perhaps
that user logged in while the operator was changing tapes!)  (I'm not
sure whether we'll stick with this for another night, attributing the
high load to a looping sendmail or something :-).)


wtm@bu-it.bu.edu (W. Thomas Meier) suggested:
> 1.  The network:  It sounds like you may be up against a bad connection
>     to the network or a lot of load on the network.  

We have been experiencing network problems off and on, and our network
group has been running around fixing things, but it is not a constant
problem (as the dumps are).  I haven't looked into this suggestion
further.

>  2.  Running dumps concurrently on two file systems means that a lot
>      of inodes and file descriptors are in use.  You may have to double
>      those numbers.  Also, how much physical memory do you have? In 
>      order to run concurrent backups, you may need more physical memory
>      or at least more swap space on disk (it should be no more than 
>      twice physical ).				 ^^^^^^^

We have 128 meg of physical memory, and about 500 meg of swap (1054464
blocks, on five separate disks -- 2 have 262848, 2 have 131808 and the
last 265152 blocks).

Did you really mean "no more"?  Am I losing in some obscure way by
having too much swap space?
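For what it's worth, the block counts quoted above do add up, assuming
these are 512-byte blocks (my assumption -- I haven't confirmed the
block size):

```python
# Sanity-check the swap totals quoted above.
# Assumption: sizes are in 512-byte disk blocks (typical BSD-era units).
BLOCK_SIZE = 512  # bytes -- assumed, not confirmed

swap_blocks = [262848, 262848, 131808, 131808, 265152]  # five partitions
total_blocks = sum(swap_blocks)
total_mb = total_blocks * BLOCK_SIZE / (1024 * 1024)

print(total_blocks)        # 1054464, matching the quoted total
print(round(total_mb))     # 515 -- i.e. "about 500 meg"

twice_physical_mb = 2 * 128  # the suggested cap of "twice physical"
print(twice_physical_mb)     # 256 -- so we are well over it
```

Which is exactly why I'm asking whether exceeding "twice physical"
actually costs anything.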

>      Also, check the number of large and small ethernet frame packets
>      in your Umax.param file.  I set mine to 300 for each.

We tried this.  In fact, we modified a few things in sysparam -- set
these to 300, as you suggested (they were 120 before).  We also set
tcpackdelay to "true" (and discussed why such a big system would ever
want it false...).  And the number of file system buffers had not been
updated since we increased our physical memory, so we fixed that
number.
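(For the buffer count, I used the classic 4.3BSD rule of thumb -- 10%
of the first 2 MB of physical memory plus 5% of the rest.  I'm assuming
Umax sizes its cache similarly, which may well be wrong:)

```python
# Rough buffer-cache sizing via the 4.3BSD heuristic:
# 10% of the first 2 MB of physical memory, 5% of the remainder.
# Assumption: Umax follows a similar rule -- unverified.
phys_mb = 128
cache_mb = 0.10 * min(phys_mb, 2) + 0.05 * max(phys_mb - 2, 0)
print(cache_mb)                    # 6.5 MB of buffer cache
print(int(cache_mb * 1024 // 8))   # 832 buffers at 8 KB each (8 KB assumed)
```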

Preliminary results were good -- I benchmarked a backup at *twice* the
former speed.  Production backups also ran much faster for the first
few days -- but then reverted, and became (by some reports) even worse
than before :-(.

I am currently supposing that the sysparam changes didn't have as much
of an effect as rebooting the system itself -- the reboot may have
cleared some sort of memory leak or some other "software rot".


Wytze van der Raay <wytze@encore.nl> says:
> You don't specify your OS, but I presume you are using UMAX 4.3.
> By the way, are you running the same OS on the 310 and 510 ??

Oops, I knew I forgot something!  Yes, we're running Umax 4.3
("R4.1.0") on both.

> I don't know how UMAX 4.3 dump is implemented, but my suspicion is
> that your 510x4 case is suffering from spinlocking processes being
> rescheduled. If the parallel processes employed by dump synchronize
> with each other using spinlocks, and there are also other users on
> the system consuming CPU's (you quote a load of 5 on a 4-CPU system),
> severe performance degradation is possible.
> [he also suggests reducing the number of processes]

> CPU time isn't a very good measure for the effectiveness of dump
> though, you should look at the real time spent (that's what dump
> is supposedly optimized for).

I don't think I'm trying to measure effectiveness, I'm trying to
measure load impact.  I know that the straight numbers of CPU seconds
used won't exactly tell me this, but it's easy to measure and should
at least be (distantly?) related to load impact.  It's much harder to
measure the difference in load average and perceived slowness.
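Load average is what I'd really like to track over a dump run; the
crude approach is to sample uptime periodically and log the numbers.
A sketch (the exact output format of uptime varies by system, so the
parsing here is an assumption):

```python
# Sample the load averages by parsing `uptime` output.
# Assumption: the line ends with "load average: X.XX, Y.YY, Z.ZZ",
# the common BSD format -- Umax may differ.
def parse_load(uptime_line):
    # Keep everything after "load average:" and split on commas.
    _, _, tail = uptime_line.partition("load average:")
    return [float(x) for x in tail.split(",")]

sample = "7:32pm  up 3 days, 4:11, 12 users, load average: 5.02, 4.87, 4.50"
print(parse_load(sample))  # [5.02, 4.87, 4.5]
```

Sampling this every minute during a dump would give a much better
picture than total CPU seconds.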


Andrew T. Como <como@max.bnl.gov> says:
> There was a tremendous amount of fragmented space on the file
> systems.
> My suggestion is to force yourself to take a good dump of a file
> system, make a newfs, and rebuild it.  It sounds like a lot of work,
> but in the long run it will save you time, because otherwise your
> backups will become increasingly longer.

Can I pass on this suggestion for a little while?  With 12 disks, this
would in fact be an awful lot of work...


Terence Kelleher <terryk@encore.com> says:
> 1- What kind of tape drive is the destination device?  The density of
> 54000 and size of 6000 ft does not sound reasonable. 

My "benchmark drive" is an 8mm tape drive hung off a Sun 4/280.  The
54000/6000 is supposed to sound unreasonable, since the 8mm tapes hold
an awful lot of data.  I believe the numbers are fudged, since the
tape is shorter and denser, but dump won't believe how dense it
actually is.  (The "production drives" are a pair of 9 track drives
hanging off a different 4/280.)
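The fudge works because dump estimates tape capacity roughly as
density times length (its internal model is a bit more involved --
it also accounts for inter-record gaps -- so this is only the rough
idea):

```python
# dump's rough capacity model: density (BPI) x length (feet x 12 in/ft).
# With the fudged numbers it believes the "tape" holds:
density_bpi = 54000   # the fudged density
length_ft = 6000      # the fudged length
capacity_bytes = density_bpi * length_ft * 12
print(capacity_bytes)        # 3888000000
print(capacity_bytes / 1e9)  # about 3.9 GB -- in the ballpark of an 8mm cartridge
```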

> 2- Are the 2 machines connected to the same network?  To the same
> cable?  Is there any hardware on the net between the 310 and 510 and
> the SUN 4? (repeaters, gateways, etc.).

The two machines sit side by side.  They are hooked into the same
cable, though they may be on adjacent DELNI's (multiport repeaters).


Dennis Forgione <dennisf@encore.com> says:
> What is the configuration of your machines? 
> 510 -   What kind of tape drives?
>         Where are they physically connected?  EMC? MSC? Same SCSI channels?

As I said above, it is an 8mm tape drive connected to our Sun 4/280.
As far as the Encore is concerned, it is connected to the network.

>         What kind of disk drives?
>         How are they connected on the MSC?

They are 12 CDC Wren V's.  They are connected to the MSC in three
groups of four.

> If this is getting to be a problem and I can't give you a quick
> answer, I would suggest placing a service call to the Technical
> Assistance Center at 1-800 TECH-AID.

We are in touch with Encore, but have not yet gotten a satisfactory
answer.

					--Melissa Metz
					  Unix Systems Group
