1. What is the Usenet?

1.1. Discussion groups

The Usenet is a huge worldwide collection of discussion groups. Each discussion group has a name, e.g. comp.os.linux.announce, and a collection of messages. These messages, usually called articles, are posted by readers like you and me who have access to Usenet servers, and are then stored on the Usenet servers.

This ability to both read and write into a Usenet newsgroup makes the Usenet very different from the bulk of what people today call ``the Internet.'' The Internet has become a colloquial term to refer to the World Wide Web, and the Web is (largely) read-only. There are online discussion groups with Web interfaces, and there are mailing lists, but Usenet is probably more convenient than either of these for most large discussion communities. This is because the articles get replicated to your local Usenet server, thus allowing you to read and post articles without accessing the global Internet, something which is of great value for those with slow Internet links. Usenet articles also conserve bandwidth because they do not come and sit in each member's mailbox, unlike email based mailing lists. This way, twenty members of a mailing list in one office will have twenty copies of each message copied to their mailboxes. However, with a Usenet discussion group and a local Usenet server, there's just one copy of each article, and it does not fill up anyone's mailbox.

Another nice feature of having your own local Usenet server is that articles stay on the server even after you've read them. You can't accidentally delete a Usenet articles the way you can delete a message from your mailbox. This way, a Usenet server is an excellent way to archive articles of a group discussion on a local server without placing the onus of archiving on any group member. This makes local Usenet servers very valuable as archives of internal discussion messages within corporate Intranets, provided the article expiry configuration of the Usenet server software has been set up for sufficiently long expiry periods.

1.2. How it works, loosely speaking

Usenet news works by the reader first firing up a Usenet news program, which in today's GUI world will highly likely be something like Netscape Messenger or Microsoft's Outlook Express. There are a lot of proven, well-designed character-based Usenet news readers, but a proper review of the user agent software is outside the scope of this HOWTO, so we will just assume that you are using whatever software you like. The reader then selects a Usenet newsgroup from the hundreds or thousands of newsgroups which are hosted by her local server, and accesses all unread articles. These articles are displayed to her. She can then decide to respond to some of them.

When the reader writes an article, either in response to an existing one or as a start of a brand-new thread of discussion, her software posts this article to the Usenet server. The article contains a list of newsgroups into which it is to be posted. Once it is accepted by the server, it becomes available for other users to read and respond to. The article is automatically expired or deleted by the server from its internal archives based on expiry policies set in its software; the author of the article usually can do little or nothing to control the expiry of her articles.

A Usenet server rarely works on its own. It forms a part of a collection of servers, which automatically exchange articles with each other. The flow of articles from one server to another is called a newsfeed. In a simplistic case, one can imagine a worldwide network of servers, all configured to replicate articles with each other, busily passing along copies across the network as soon as one of them receives a new articles posted by a human reader. This replication is done by powerful and fault-tolerant processes, and gives the Usenet network its power. Your local Usenet server literally has a copy of all current articles in all relevant newsgroups.

1.3. About sizes, volumes, and so on

Any would-be Usenet server administrator or creator must read the "Periodic Posting about the basic steps involved in configuring a machine to store Usenet news," also known as the Site Setup FAQ, available from ftp://rtfm.mit.edu/pub/usenet/news.answers/usenet/site-setup or ftp://ftp.uu.net/usenet/news.answers/news/site-setup.Z. It was last updated in 1997, but trends haven't changed much since then, though absolute volume figures have.

If you want your Usenet server to be a repository for all articles in all newsgroups, you will probably not be reading this HOWTO, or even if you do, you will rapidly realise that anyone who needs to read this HOWTO may not be ready to set up such a server. This is because the volumes of articles on the Usenet have reached a point where very specialised networks, very high end servers, and large disk arrays are required for handling such Usenet volumes. Those setups are called ``carrier-class'' Usenet servers, and will be discussed a bit later on in this HOWTO. Administering such an array of hardware may not be the job of the new Usenet administrator, for which this HOWTO (and most Linux HOWTO's) are written.

Nevertheless, it may be interesting to understand what volumes we are talking about. Usenet news article volumes have been doubling every fourteen months or so, going by what we hear in comments from carrier class Usenet administrators. In the beginning of 1997, this volume was 1.2 GBytes of articles a day. Thus, the volumes should have roughly done five doublings, or grown 32 times, by the time we reach mid-2002, at the time of this writing. This gives us a volume of 38.4 GBytes per day. Assume that this transfer happens using uncompressed NNTP (the norm), and add 50% extra for the overheads of NNTP, TCP, and IP. This gives you a raw data transfer volume of 57.6 GBytes/day or about 460 Gbits/day. If you have to transfer such volumes of data in 24 hours (86400 seconds), you'll need raw bandwidth of about 5.3 Mbits per second just to receive all these articles. You'll need more bandwidth to send out feeds to other neighbouring Usenet servers, and then you'll need bandwidth to allow your readers to access your servers and read and post articles in retail quantities. Clearly, these volume figures are outside the network bandwidths of most corporate organisations or educational institutions, and therefore only those who are in the business of offering Usenet news can afford it.

At the other end of the scale, it is perfectly feasible for a small office to subscribe to a well-trimmed subset of Usenet newsgroups, and exclude most of the high-volume newsgroups. Starcom Software, where the authors of this HOWTO work, has worked with a fairly large subset of 600 newsgroups, which is still a tiny fraction of the 15,000+ newsgroups that the carrier class services offer. Your office or college may not even need 600 groups. And our company had excluded specific high-volume but low-usefulness newsgroups like the talk, comp.binaries, and alt hierarchies. With the pruned subset, the total volume of articles per day may amount to barely a hundred MBytes a day or so, and can be easily handled by most small offices and educational institutions. And in such situations, a single Intel Linux server can deliver excellent performance as a Usenet server.

Then there's the internal Usenet service. By internal here, we mean a private set of Usenet newsgroups, not a private computer network. Every company or university which runs a Usenet news service creates its own hierarchy of internal newsgroups, whose articles never leave the campus or office, and which therefore do not consume Internet bandwidth. These newsgroups are often the ones most hotly accessed, and will carry more internally generated traffic than all the ``public'' newsgroups you may subscribe to, within your organisation. After all, how often does a guy have something to say which is relevant to the world at large, unless he's discussing a globally relevant topic like ``Unix rules!''? If such internal newsgroups are the focus of your Usenet servers, then you may find that fairly modest hardware and Internet bandwidth will suffice, depending on the size of your organisation.

The new Usenet server administrator has to undertake a sizing exercise to ensure that he does not bite off more than he, or his network resources, can chew. We hope we have provided sufficient information for him to get started with the right questions.