Newsgroups: news.software.b
Path: utzoo!henry
From: henry@utzoo.uucp (Henry Spencer)
Subject: Re: Dynamic "smart" expiration?
Message-ID: <1989Dec29.020109.16829@utzoo.uucp>
Organization: U of Toronto Zoology
References: <1989Dec27.033817.9953@smsc.sony.com> <1989Dec28.063932.13720@robohack.UUCP> <1989Dec28.171830.13130@smsc.sony.com>
Date: Fri, 29 Dec 89 02:01:09 GMT

In article <1989Dec28.171830.13130@smsc.sony.com> dce@Sony.COM (David Elliott) writes:
>>I would rather still have expire do the expiring, rather than rnews.
>>This allows more flexibility, not to mention archive support, etc.  I
>>would definitely not want relaynews to do expiring too!
>
>Actually, I was thinking more in terms of having newsrun doing the
>expiring as part of its loop.

Folks have done that with C News, although it's not something we support
officially.  Possibly we should, but the obvious technique -- dynamically
generating expire's control file and cranking down the numbers until space
is adequate -- interacts awkwardly with some of the fancier things you
can do in the control file.  If I can think of some graceful way to deal
with this, I'll probably make it available as an option.
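
For concreteness, the "cranking down the numbers" loop might look roughly
like the sketch below. This is not C News code. The names crank_down,
free_blocks and run_expire are hypothetical, and the two callables stand
in for "measure free space" and "write a control file with this retention
and run expire". It deliberately uses a single retention number, which is
exactly why it sits badly with per-group entries in a real control file.

```python
def crank_down(start_days, min_days, needed_blocks, free_blocks, run_expire):
    """Lower one global retention period until free space is adequate.

    free_blocks: hypothetical callable returning current free space.
    run_expire:  hypothetical callable taking a retention in days; in a
                 real setup it would regenerate the control file and
                 invoke expire.
    Returns the last retention tried, or None if no expiry was needed.
    """
    days = start_days
    used = None
    # Stop as soon as space is adequate, or when the floor is reached.
    while free_blocks() < needed_blocks and days >= min_days:
        run_expire(days)
        used = days
        days -= 1
    return used
```

The floor (min_days) matters: without it, a disk filled by something other
than news would drive the retention to zero.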

>The big problem as I see it is that expire is slow (at least the B
>news version was), especially if you start adding special heuristics
>based on usefulness and group size and file age and number of
>subscribers and so forth.

C News expire is almost entirely I/O-bound and dbm-bound (I haven't
yet run detailed timings with dbz, although I'll do it soon), so adding
a *little* complexity to the decision process would not be disastrous.

We were very close to adding the size of the file as another subfield
in the history file's middle field, so that it could be used as input
for decision making.  Alas, it's *not* easy to define exactly how such
policies should work in the presence of complications like per-group
expiry settings, and we tend to believe in the theory that you should
not collect data until you have some idea what you're going to do with it.

>If expire generated a list of files to expire once a day, you could
>still archive the files, and maintain flexibility, but when it's time
>for them to go to make room for other files, it's easy and fast,
>and until that time comes, they're still available.

I thought a bit about breaking expire into a decision part and an
implementation part, so to speak, like this.  I wasn't convinced that
it offered enough advantages to be worth the effort and possible
problems.  *However*... note that expire's -t option does almost exactly
what the decision module would do:  it prints a description of what
expire would do, but doesn't do it.  The output is *almost* an executable
shell file -- at one point it was one, until I noticed that there are some
complications like creating directories that are hard to deal with simply --
and picking out the file names would not be hard.  I will write up the
format in the documentation, so folks can depend on it.
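
As a rough sketch of the "implementation part" under that split: assume a
decision pass has already produced an ordered list of file names (how they
are picked out of the -t output is not shown here, since the format is not
given in this posting). The function reclaim and the callables size_of and
remove are hypothetical names, not anything in C News.

```python
def reclaim(candidates, needed_bytes, size_of, remove):
    """Remove pre-selected files in list order until enough space is freed.

    candidates: file names chosen earlier by the decision pass.
    size_of:    hypothetical callable, file name -> size in bytes.
    remove:     hypothetical callable that deletes (or archives) one file.
    Returns (bytes freed, candidates left untouched).
    """
    freed = 0
    remaining = list(candidates)  # do not modify the caller's list
    while remaining and freed < needed_bytes:
        name = remaining.pop(0)
        freed += size_of(name)
        remove(name)
    return freed, remaining
```

Articles left in the returned list stay on disk and readable until the
next time space runs short, which is the property David was after.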
-- 
1972: Saturn V #15 flight-ready|     Henry Spencer at U of Toronto Zoology
1989: birds nesting in engines | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
