The aim of this article is to introduce new Linux users to an invaluable resource, Larry Wall's patch program. Patch is an interface to the GNU diff utility, which is used to find differences between files; diff has a multitude of options, but it's most often used to generate a file which lists lines which have been changed, showing both the original and changed lines and ignoring lines which have remained the same. Patch is typically used to update a directory of source code files to a newer version, obviating the need to download an entire new source archive. Downloading a patch in effect is just downloading the lines which have been changed.
Patch originated in the nascent, bandwidth-constrained internet environment of a decade ago, but like many Unix tools of that era it is still much-used today. In the February issue of the programmer's magazine Dr. Dobb's Journal Larry Wall had some interesting comments on the early days of patch:
DDJ: By the way, what came first, patch or diff? LW: diff, by a long ways. patch is one of those things that, in retrospect, I was totally amazed that nobody had thought of it sooner, because I think that diff predated patch by at least ten years or so. I think I know why, though. And it's one of these little psychological things. When they made diff, they added an option called -e, I think it was, and that would spit out an ed script, so people said to themselves, "Well, if I wanted to automate the applying of a diff, I would use that." So it never actually occurred to someone that you could write a computer program to take the other forms of output and apply them. Either that, or it did not occur to them that there was some benefit to using the context diff form, because you could apply it to something that had been changed and still easily get it to do the right thing. It's one of those things that's obvious in retrospect. But to be perfectly honest, it wasn't really a brilliant flash of inspiration so much as self defense. I put out the first version of rn, and then I started putting out patches for it, and it was a total mess. You could not get people to apply patches because they had to apply them by hand. So, they would skip the ones that they didn't think they needed, and they'd apply the new ones over that, and they'd get totally messed up. I wrote patch so that they wouldn't have this excuse that it was too hard. I don't know whether it's still the case, but for many years, I told people that I thought patch had changed the culture of computing more than either rn or Perl had. Now that the Internet is getting a lot faster than it used to be, and it's getting much easier to distribute whole distributions, patches tend to be sent around only among developers. I haven't sent out a patch kit for Perl in years. I think patch has became less important for the whole thing, but still continues to be a way for developers to interchange ideas. But for a while in there, patch really did make a big difference to how software was developed.
Larry Wall's assessment of the diminishing importance of patch to the computing community as a whole is probably accurate, but in the free software world it's still an essential tool. The ubiquity of patch makes it possible for new users and non-programmers to easily participate in alpha- and beta-testing of software, thus benefiting the entire community.
It occurred to me to write this article after noticing a thread which periodically resurfaces in the linux-kernel mailing list. About every three months someone will post a plea for a split Linux kernel source distribution, so that someone just interested in, say, the i386 code and the IDE disk driver wouldn't have to download the Alpha, Sparc, etc. files and the many SCSI drivers for each new kernel release. A series of patient (and some not-so-patient) replies will follow, most urging the original poster to use patches to upgrade the kernel source. Linus Torvalds will then once again state that he has no interest in undertaking the laborious task of splitting the kernel source into chunks, but that if anyone else wants to, they should feel free to do so as an independent project. So far no-one has volunteered. I can't blame the kernel-hackers for not wanting to further complicate their lives; I imagine it would be much more interesting and challenging to work directly with the kernel than to overhaul the entire kernel distribution scheme! Downloading an eleven megabyte kernel source archive is time-consuming (and, for those folks paying by the minute for net access, expensive as well) but the kernel patches can be as small as a few dozen kilobytes, and are hardly ever larger than one megabyte. The 2.1.119 development kernel source on my hard disk has been incrementally patched up from version 2.1.99, and I doubt if I'd follow the development as closely if I had to download each release in its entirety.
Patch comes with a good manual-page which lists its numerous options, but 99% of the time just two of them will suffice:
The -p1 option strips the left-most directory level from the
filenames in the patch-file, as the top-level directory is likely to vary on
different machines. To use this option, place your patch within the directory
being patched, and then run patch -p1 < [patchfile] from within that
directory. A short excerpt from a Linux kernel patch will illustrate
diff -u --recursive --new-file v2.1.118/linux/mm/swapfile.c linux/mm/swapfile.c --- v2.1.118/linux/mm/swapfile.c Wed Aug 26 11:37:45 1998 +++ linux/mm/swapfile.c Wed Aug 26 16:01:57 1998 @@ -489,7 +489,7 @@ int swap_header_version; int lock_map_size = PAGE_SIZE; int nr_good_pages = 0; - char tmp_lock_map = 0; + unsigned long tmp_lock_map = 0;
Applying the patch from which this segment was copied with the -p1 switch effectively truncates the path which patch will seek; patch will look for a subdirectory of the current directory named /mm, and should then find the swapfile.c file there, waiting to be patched. In this excerpt, the line preceded by a dash will be replaced with the line preceded by a plus sign. A typical patch will contain updates for many files, each section consisting of the output of diff -u run on two versions of a file.
Patch displays its output to the screen as it works, but this output usually scrolls by too quickly to read. The original, pre-patch files are renamed *.orig, while the new patched files will bear the original filenames.
One possible source of problems using patch is differences between various versions, all of which are available on the net. Larry Wall hasn't done much to improve patch in recent years, possibly because his last release of the utility works well in the majority of situations. FSF programmers from the GNU project have been releasing new versions of patch for the past several years. Their first revisions of patch had a few problems, but I've been using version 2.5 (which is the version distributed with Debian 2.0) lately with no problems. Version 2.1 has worked well for me in the past. The source for the current GNU version of patch is available from the GNU FTP site, though most people will just use the version supplied with their distribution of Linux.
Let's say you have patched a directory of source files, and the patch didn't apply cleanly . This happens occasionally, and when it does patch will show an error message indicating which file confused it, along with the line numbers. Sometimes the error will be obvious, such as an omitted semicolon, and can be fixed without too much trouble. Another possibility is to delete from the patch the section which is causing trouble, but this may or may not work, depending on the file involved.
Another common error scenario: suppose you have un-tarred a kernel source archive, and while exploring the various subdirectories under /linux/arch/ you notice the various machine architecture subdirectories, such as alpha, sparc, etc. If you, like most Linux users, are running a machine with an Intel processor (or one of the Intel clones), you might decide to delete these directories, which are not needed for compiling your particular kernel and which occupy needed disk space. Some time later a new kernel patch is released and while attempting to apply it patch stalls when it is unable to find the Alpha or PPC files it would like to patch. Luckily patch allows user intervention at this point, asking the question "Skip this patch?" Tell it "y", and patch will proceed along its merry way. You will probably have to answer the question numerous times, which is a good argument for allowing the un-needed directories to remain on your disk.
Many Linux users use patch mainly for patching the kernel source, so a few tips are in order. Probably the easiest method is to use the shell-script patch-kernel, which can be found in the /scripts subdirectory of the kernel source-tree. This handy and well-written script was written by Nick Holloway in 1995; a couple of years later Adam Sulmicki added support for several compression algorithms, including *.bz, *.bz2, compress, gzip, and plain-text (i.e., a patch which has already been uncompressed). The script assumes that your kernel source is in /usr/src/linux,, with your new patch located in the current directory. Both of these defaults can be overridden by command-line switches in this format: patch-kernel [ sourcedir [ patchdir ] ]. Patch-kernel will abort if any part of the patch fails, but if the patch applies cleanly it will invoke find, which will delete all of the *.orig files which patch leaves behind.
If you prefer to see the output of commands, or perhaps you would rather keep the *.orig files until you are certain the patched source compiles, running patch directly (with the patch located in the kernel source top-level directory, as outlined above) has been very reliable in my experience. In order to avoid uncompressing the patch before applying it a simple pipe will do the trick:
gzip -cd patchXX.gz | patch -p1
bzip2 -dc patchXX.bz2 | patch -p1
After the patch has been applied the find utility can be used to check for rejected files:
find . -name \*.rej
At first the syntax of this command is confusing. The period indicates that find should look in the current directory and recursively in all subdirectories beneath it. Remember the period should have a space both before and after it. The backslash before the wildcard "*" "escapes" the asterisk in order to avoid confusing the shell, for which an asterisk has another meaning. If find locates any *.rej files it will print the filenames on the screen. If find exits without any visible output it's nearly certain the patch applied correctly.
Another job for find is to remove the *.orig files:
find . -name \*.orig -print0 | xargs -0r rm -f
This command is sufficiently cumbersome to type that it would be a good candidate for a new shell alias. A line in your ~/.bashrc file such as:
alias findorig 'find . -name \*.orig -print0 | xargs -0r rm -f'
will allow just typing findorig to invoke the above command. The single quotes in the alias definition are necessary if an aliased command contains spaces. In order to use a new alias without logging out and then back in again, just type source ~/.bashrc at the prompt.
While putting this article together I upgraded the version of patch on my machine from version 2.1 to version 2.5. Both of these versions come from the current FSF/GNU maintainers. Immediately I noticed that the default output of version 2.5 has been changed, with less information appearing on the screen. Gone is Larry Wall's "...hmm" which used to appear while patch was attempting to determine the proper lines to patch. The output of version 2.5 is simply a list of messages such as "patching file [filename]", rather than the more copious information shown by earlier versions. Admittedly, the information scrolled by too quickly to read, but the output could be redirected to a file for later perusal. This change doesn't affect the functionality of the program, but does lessen the human element. It seems to me that touches such as the old "...hmm" messages, as well as comments in source code, are valuable in that they remind the user that a program is the result of work performed by a living, breathing human being, rather than a sterile collection of bits. The old behavior can be restored by appending the switch --verbose to the patch command-line, but I'm sure that many users either won't be aware of the option or won't bother to type it in. Another difference between 2.1 and 2.5 is that the *.orig back-up files aren't created unless patch is given the -b option.
Patch is not strictly necessary for an end-user who isn't interested in trying out and providing bug-reports for "bleeding-edge" software and kernels, but often the most interesting developments in the Linux world belong in this category. It isn't difficult to get the hang of using patch, and the effort will be amply repaid.