6. Good development practice

Most of these practices are concerned with ensuring portability across all POSIX and POSIX-like systems. Being widely portable is not just a worthy form of professionalism and hackerly politeness; it's also valuable insurance against future changes in Linux.

Other people will also try to build your code on non-Linux systems; portability reduces the amount of support email you receive.

6.1. Choose the most portable language you can

Choose a development language which minimizes the differences of the underlying environments in which it will run. C/C++ programs are likely to be more portable than Fortran. Java is preferable to C/C++, and a high-level scripting language such as Perl, Python or Ruby is the best choice of all (since each of these languages has a single dominant cross-platform implementation).

Scripting languages that qualify include Python, Perl, Tcl, Emacs Lisp, and PHP. Historic Bourne shell (/bin/sh) does not qualify; there are too many different implementations with subtle idiosyncrasies, and the shell environment is subject to disruption by user customizations such as shell aliases.

If you choose a compiled language such as C/C++, do not use compiler extensions (such as allocating variable-length arrays on the stack, which is an extension in C++ and C89 and optional since C11, or doing arithmetic on void pointers). Regularly build using as many different compilers and test machines as you can.
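As a rough illustration of what avoiding such extensions can look like, the sketch below uses malloc where a stack-allocated variable-length array might be tempting, and converts a void pointer to unsigned char * before doing arithmetic on it. The function names sum_of_squares and byte_offset are invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: sum the squares 0*0 + 1*1 + ... + (n-1)*(n-1)
 * using a heap buffer, so the code builds as C89, C99, C11 or C++.
 * Returns -1 if the buffer cannot be allocated. */
long sum_of_squares(size_t n)
{
    long total = 0;
    size_t i;
    long *buf;

    if (n == 0)
        return 0;
    if (n > (size_t)-1 / sizeof *buf)   /* guard against size overflow */
        return -1;

    buf = (long *)malloc(n * sizeof *buf);   /* instead of: long buf[n]; */
    if (buf == NULL)
        return -1;

    for (i = 0; i < n; i++)
        buf[i] = (long)(i * i);
    for (i = 0; i < n; i++)
        total += buf[i];

    free(buf);
    return total;
}

/* Hypothetical helper: return a pointer `offset` bytes into a buffer.
 * Standard C defines no arithmetic on void *, so convert first. */
void *byte_offset(void *base, size_t offset)
{
    return (unsigned char *)base + offset;   /* instead of: base + offset */
}
```

Building with options such as GCC's -std=c99 -pedantic is one way to have the compiler point out uses of its own extensions.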

6.2. Don't rely on proprietary code

Don't rely on proprietary languages, libraries, or other code. In the open-source community this is considered rude. Open-source developers don't trust code for which they can't review the source.

6.3. Build systems

A significant advantage of open-source distributions is that they allow each source package to adapt at compile time to the environment it finds. One of the choices you need to make is that of a "build system", the toolkit you (and other people) will rely on to transform your source into executables.

One thing your build script cannot do is ask the user for system information at compile-time. The user installing the package will not necessarily know the answers to your questions, so this approach is doomed from the start. Your software, calling the build system tools, must be able to determine for itself any information that it may need at compile- or install-time.

Community notions of best practice in build systems are now (early 2010) in a state of some flux.

6.3.1. GNU autotools: fading but still standard

Previous versions of this HOWTO urged using GNU autotools to handle portability issues, do system-configuration probes, and tailor your makefiles. This is still the standard and most popular approach, but it is becoming increasingly problematic because GNU autotools is showing its age. Autotools was always a pile of crocks upon hacks upon kluges, implemented in a messy mixture of languages and with some serious design flaws. The resulting mess was tolerable for many years, but as projects become more complex it is increasingly failing.

Still, people building from C sources expect to be able to type "./configure; make; make install" and get a clean build. If you choose a non-autotools system, you will probably want to emulate this behavior (which should be easy).

There is a good tutorial on autotools here.

6.3.2. SCons: leading a crowded field

The race to replace autotools does not yet have a winner, but it has a front runner: SCons. SCons abolishes makefiles; it combines the "configure" and "make" parts of the autotools build sequence into one step. It offers cross-platform builds with a single recipe on Unix/Linux, Mac OS X, and Windows. It is written in Python, is extensible in Python, and is to some extent riding the increasing popularity of that language.

6.3.3. CMake and others

SCons is still a minority choice, and has stiff competition from several others, of which CMake and Waf are probably the most prominent. Fairly even-handed cross-comparisons, considering the source, can be found on the SCons wiki.

6.4. Test your code before release

A good test suite allows the team to buy inexpensive hardware for testing and then easily run regression tests before releases. Create a strong, usable test framework so that you can incrementally add tests to your software without having to train developers up in the intricacies of the test suite.

Distributing the test suite allows the community of users to test their ports before contributing them back to the group.

Encourage your developers to use a wide variety of platforms as their desktop and test machines so that code is continuously being tested for portability flaws as part of normal development.

6.5. Sanity-check your code before release

If you're writing C/C++ using GCC, test-compile with -Wall and clean up all warning messages before each release. Compile your code with every compiler you can find; different compilers often find different problems. Specifically, compile your software on a true 64-bit machine. Underlying data types can change size on 64-bit machines, and you will often find new problems there. Find a UNIX vendor's system and run the lint utility over your software.
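Two common 64-bit surprises are pointers that no longer fit in an int and size_t values printed with a format meant for unsigned int. The following is a minimal sketch of code that avoids both, assuming a C99 library (for uintptr_t, snprintf and the %zu format); the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Round-trip a pointer through an integer. On typical 64-bit systems a
 * pointer is 64 bits wide while int is 32, so storing it in an int
 * would truncate it; uintptr_t is wide enough wherever it is provided. */
int pointer_round_trips(const void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (const void *)bits == p;
}

/* Format a size_t without assuming it has the width of unsigned int.
 * Returns what snprintf returns: the number of characters needed. */
int format_size(char *out, size_t outlen, size_t value)
{
    return snprintf(out, outlen, "%zu", value);   /* not "%u" or "%d" */
}
```

Warnings about format mismatches and pointer-to-integer conversions are exactly the kind that tend to show up only when the same source is built on a machine with different type widths.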

Run tools that check for memory leaks and other run-time errors; Electric Fence and Valgrind are two good ones available in open source.

For Python projects, the PyChecker program can be a useful check. It's not out of beta yet, but nevertheless often catches nontrivial errors.

If you're writing Perl, check your code with perl -c (and maybe -T, if applicable). Use perl -w and 'use strict' religiously. (See the Perl documentation for further discussion.)

6.6. Sanity-check your documentation and READMEs before release

Spell-check your documentation, README files and error messages in your software. Sloppy code, code that produces warning messages when compiled, and spelling errors in README files or error messages lead users to believe the engineering behind it is also haphazard and sloppy.

6.7. Recommended C/C++ portability practices

If you are writing C, feel free to use the full ANSI features. Specifically, do use function prototypes, which will help you spot cross-module inconsistencies. The old-style K&R compilers are history.
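To make the point about prototypes concrete, here is a small sketch; the function rectangle_area and the header name in the comment are hypothetical. With the prototype in scope, a call such as rectangle_area(3, 4) has its int arguments converted to double, and a call with the wrong number of arguments is rejected at compile time. With an old-style declaration like "double rectangle_area();" neither check happens, and the same call silently misbehaves.

```c
/* shape.h (hypothetical): the prototype every calling module includes */
double rectangle_area(double width, double height);

/* shape.c (hypothetical): the definition, checked against the prototype */
double rectangle_area(double width, double height)
{
    return width * height;
}
```

Keeping the prototype in a single header that both the defining file and its callers include is what lets the compiler spot cross-module inconsistencies.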

Do not assume compiler-specific features such as the GCC "-pipe" option or nested functions are available. These will come around and bite you the second somebody ports to a non-Linux, non-GCC system.

Code required for portability should be isolated to a single area and a single set of source files (for example, an "os" subdirectory). Compiler, library and operating system interfaces with portability issues should be abstracted to files in this directory. This includes variables such as "errno", library interfaces such as "malloc", and operating system interfaces such as "mmap".

Portability layers make it easier to do new software ports. It is often the case that no member of the development team knows the porting platform (for example, there are literally hundreds of different embedded operating systems, and nobody knows any significant fraction of them). By creating a separate portability layer it is possible for someone who knows the port platform to port your software without having to understand it.

Portability layers simplify applications. Software rarely needs the full functionality of more complex system calls such as mmap or stat, and programmers commonly configure such complex interfaces incorrectly. A portability layer with abstracted interfaces ("__file_exists" instead of a call to stat) allows you to export only the limited, necessary functionality from the system, simplifying the code in your application.
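A portability-layer wrapper of this kind might look roughly like the sketch below, which assumes a POSIX stat. The name port_file_exists is a stand-in chosen for this example; it avoids a leading double underscore because such identifiers are reserved for the C implementation.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical portability-layer function: returns 1 if `path` names
 * an existing file system object, 0 otherwise (including when path is
 * NULL). Callers never see struct stat or its platform-specific fields. */
int port_file_exists(const char *path)
{
    struct stat sb;

    if (path == NULL)
        return 0;
    return stat(path, &sb) == 0;
}
```

Application code then calls port_file_exists("config.ini") and never needs to include the stat headers itself; a port to a system without stat only has to replace this one function.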

Always write your portability layer to select based on a feature, never based on a platform. Trying to create a separate portability layer for each supported platform results in a multiple-update maintenance nightmare. A "platform" is always selected on at least two axes: the compiler and the library/operating system release. In some cases there are three axes, as when Linux vendors select a C library independently of the operating system release. With M vendors, N compilers and O operating system releases, the number of "platforms" quickly scales out of reach of any but the largest development teams. By using language and systems standards such as ANSI and POSIX 1003.1, the set of features is relatively constrained.

Portability choices can be made along either lines of code or compiled files. It makes no difference whether you select alternate lines of code on a platform or one of a few different files. A rule of thumb is to move portability code for different platforms into separate files when the implementations diverge significantly (shared memory mapping on UNIX vs. Windows), and to leave portability code in a single file when the differences are minimal (using gettimeofday, clock_gettime, ftime or time to find out the current time of day).
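The time-of-day case can be sketched in a single file that selects on features rather than platforms. The macros HAVE_CLOCK_GETTIME and HAVE_GETTIMEOFDAY are assumed to be defined by the build system's configuration probes (the names follow the autoconf convention, but nothing here defines them), and port_time_now is a hypothetical name.

```c
#include <time.h>
#if defined(HAVE_GETTIMEOFDAY)
#include <sys/time.h>
#endif

/* Hypothetical portability-layer function: seconds since the epoch as
 * a double, using the best clock the build system reported. Note that
 * no test mentions an operating system or compiler name. */
double port_time_now(void)
{
#if defined(HAVE_CLOCK_GETTIME)
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#elif defined(HAVE_GETTIMEOFDAY)
    struct timeval tv;
    if (gettimeofday(&tv, NULL) == 0)
        return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
#endif
    /* Fallback available in every hosted C library: one-second resolution. */
    return (double)time(NULL);
}
```

If neither macro is defined the function still compiles and falls back to time, so a new platform gets a working (if coarse) implementation before anyone has ported anything.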

Avoid using system-dependent types such as "off_t" and "size_t". They vary in size from system to system, especially on 64-bit systems. Limit your usage of "off_t" to the portability layer, and use "size_t" only for the length of a string in memory, and nothing else.
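One way to keep off_t inside the portability layer is to convert it to a fixed-width type at the boundary. The sketch below assumes a POSIX stat and the C99 int64_t type; port_file_size is a hypothetical name, not an existing interface.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical portability-layer function: size of the file at `path`
 * in bytes, or -1 if it cannot be examined. off_t appears only here;
 * callers see a type whose width is the same on every system. */
int64_t port_file_size(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    return (int64_t)sb.st_size;   /* st_size has type off_t */
}
```

The rest of the application can then store and print file sizes without caring whether the system was built with a 32-bit or 64-bit off_t.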

Never step on the namespace of any other part of the system (including file names, error return values and function names). Where the namespace is shared, document the portion of the namespace that you use.

Choose a coding standard. The debate over the choice of standard can go on forever; regardless, it is too difficult and expensive to maintain software built using multiple coding standards, and so some coding standard must be chosen. Enforce your coding standard ruthlessly, as consistency and cleanliness of the code are of the highest priority; the details of the coding standard itself are a distant second.