21.6. Expiring News

In B News, expiration needs to be performed by a program called expire, which took a list of newsgroups as arguments, along with a time specification after which articles had to be expired. To have different hierarchies expire at different times, you had to write a script that invoked expire for each of them separately. C News offers a more convenient solution. In a file called explist, you may specify newsgroups and expiration intervals. A command called doexpire is usually run once a day from cron and processes all groups according to this list.

Occasionally, you may want to retain articles from certain groups even after they have been expired; for example, you might want to keep programs posted to comp.sources.unix. This is called archiving. explist permits you to mark groups for archiving.

An entry in explist looks like this:
grouplist perm times archive

grouplist is a comma-separated list of newsgroups to which the entry applies. Hierarchies may be specified by giving the group name prefix, optionally appended with all. For example, for an entry applying to all groups below comp.os, enter either comp.os or comp.os.all.

When expiring news from a group, the name is checked against all entries in explist in the order given. The first matching entry applies. For example, to throw away the majority of comp after four days, except for comp.os.linux.announce, which you want to keep for a week, you simply have an entry for the latter, which specifies a seven-day expiration period, followed by an expiration period for comp, which specifies four days.

The perm field details if the entry applies to moderated, unmoderated, or any groups. It may take the values m, u, or x, which denote moderated, unmoderated, or any type.

The third field, times, usually contains only a single number. This is the number of days after which articles expire if they haven't been assigned an artificial expiration date in an Expires: field in the article header. Note that this is the number of days counting from its arrival at your site, not the date of posting.

The times field may, however, be more complex than that. It may be a combination of up to three numbers separated from one another by dashes. The first denotes the number of days that have to pass before the article is considered a candidate for expiration, even if the Expires: field would have it expire already. It is rarely useful to use a value other than zero. The second field is the previously mentioned default number of days after which it will be expired. The third is the number of days after which an article will be expired unconditionally, regardless of whether it has an Expires: field or not. If only the middle number is given, the other two take default values. These may be specified using the special entry /bounds/, which is described a little later.

The fourth field, archive, denotes whether the newsgroup is to be archived and where. If no archiving is intended, a dash should be used. Otherwise, you either use a full pathname (pointing to a directory) or an at sign (@). The at sign denotes the default archive directory, which must then be given to doexpire by using the –a flag on the command line. An archive directory should be owned by news. When doexpire archives an article from say, comp.sources.unix, it stores it in the directory comp/sources/unix below the archive directory, creating it if necessary. The archive directory itself, however, will not be created.

There are two special entries in your explist file that doexpire relies on. Instead of a list of newsgroups, they have the keywords /bounds/ and /expired/. The /bounds/ entry contains the default values for the three values of the times field described previously.

The /expired/ field determines how long C News will hold onto lines in the history file. C News will not remove a line from the history file once the corresponding article(s) have been expired, but will hold onto it in case a duplicate should arrive after this date. If you are fed by only one site, you can keep this value small. Otherwise, a couple of weeks is advisable on UUCP networks, depending on the delays you experience with articles from these sites.

Here is a sample explist file with rather tight expiry intervals:
# keep history lines for two weeks. No article gets more than three months
/expired/                       x       14      -
/bounds/                        x       0-1-90  -
# groups we want to keep longer than the rest
comp.os.linux.announce          m       10      -
comp.os.linux                   x       5       -
alt.folklore.computers          u       10      -
rec.humor.oracle                m       10      -
soc.feminism                    m       10      -
# Archive *.sources groups
comp.sources,alt.sources        x       5       @
# defaults for tech groups
comp,sci                        x       7       -
# enough for a long weekend
misc,talk                       x       4       -
# throw away junk quickly
junk                            x       1       -
# control messages are of scant interest, too
control                         x       1       -
# catch-all entry for the rest of it
all                             x       2       -

Expiring presents several potential problems. One is that your newsreader might rely on the third field of the active file described earlier, which contains the number of the lowest article online. When expiring articles, C News does not update this field. If you need (or want) to have this field represent the real situation, you need to run a program called updatemin after each run of doexpire. (In older versions of C News, a script called upact did this.)

C News does not expire by scanning the newsgroup's directory, but simply checks the history file if the article is due for expiration.[1] If your history file somehow gets out of sync, articles may be around on your disk forever because C News has literally forgotten them.[2] You can repair this by using the addmissing script in /usr/lib/news/maint, which will add missing articles to the history file or mkhistory, which rebuilds the entire file from scratch. Don't forget to become user news before invoking it, or else you will wind up with a history file unreadable by C News.

Notes

[1]

The article's date of arrival is kept in the middle field of the history line and given in seconds since January 1, 1970.

[2]

I don't know why this happens, but it does from time to time.