Almost every computer user needs a method of preparing documents. In the world of personal computers, word processing is the norm: editing and manipulating text in a ``What-You-See-Is-What-You-Get'' (WYSIWYG) environment and producing printed copies of the text, complete with graphics, tables, and ornamentation.
Commercial word processors from Corel, Applix, and Star Division are available in the UNIX world, but text processing, which is quite different conceptually, is more common. In text processing systems, text is entered in a typesetting language, which describes how the text should be formatted. Rather than enter text within a special word processing environment, you can modify text with any editor, like vi or emacs. Once you finish entering the source text (in the typesetting language), a separate program converts the source to a format suitable for printing. This is somewhat analogous to programming in a language like C and ``compiling'' the document into printable form.
Many text processing systems are available for Linux. One is groff, the GNU version of the classic troff text formatter originally developed by Bell Labs and still used on many UNIX systems worldwide. Another modern text processing system is TeX, developed by Donald Knuth of computer science fame. Dialects of TeX, like LaTeX, are also available.
Text processors like TeX and groff differ mostly in the syntax of their formatting languages. The choice of one formatting system over another is based upon what utilities are available to satisfy your needs, as well as personal taste.
Many people consider groff's formatting language to be a bit obscure and find TeX more readable. However, groff produces ASCII output which can be viewed on a terminal more easily, while TeX is intended primarily for output to a printing device. Various add-on programs are required to produce ASCII output from TeX-formatted documents, or to convert TeX input to groff format.
Another program is texinfo, an extension to TeX developed by the Free Software Foundation and used for software documentation. texinfo can produce either printed output or an online-browsable hypertext ``Info'' document from a single source file. Info files are the main documentation format for GNU software like emacs.
Text processors are used widely in the computing community for producing papers, theses, magazine articles, and books. (This book is produced using LaTeX.) The ability to process source language as a text file opens the door to many extensions of the text processor itself. Because a source document is not stored in an obscure format that only one word processor can read, programmers can write parsers and translators for the formatting language, and thus extend the system.
What does a formatting language look like? In general, a formatted source file consists mostly of the text itself, with control codes to produce effects like font and margin changes, and list formatting.
Consider the following text:
Mr. Torvalds:
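We are very upset with your current plans to implement post-hypnotic suggestions in the Linux terminal driver code. We feel this way for three reasons:
1. Planting subliminal messages in the kernel driver is not only immoral, it is a waste of time;
2. It has been proven that ``post-hypnotic suggestions'' are ineffective when used upon unsuspecting UNIX hackers;
3. We have already implemented high-voltage electric shocks, as a security measure, in the code for login.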
We hope you will reconsider.
This text might appear in the LaTeX formatting language as the following:
\begin{quote}
Mr. Torvalds:
We are very upset with your current plans to implement
{\em post-hypnotic suggestions\/} in the {\bf Linux} terminal
driver code. We feel this way for three reasons:
\begin{enumerate}
\item Planting subliminal messages in the kernel driver is not only
immoral, it is a waste of time;
\item It has been proven that ``post-hypnotic suggestions''
are ineffective when used upon unsuspecting UNIX hackers;
\item We have already implemented high-voltage electric shocks, as
a security measure, in the code for {\tt login}.
\end{enumerate}
We hope you will reconsider.
\end{quote}
The author enters the text using any text editor and generates formatted output by processing the source with LaTeX. At first glance, the typesetting language may appear to be obscure, but it's actually quite easy to understand. Using a text processing system enforces typographical standards when writing. All the enumerated lists within a document will look the same, unless the author modifies the definition of an enumerated list. The goal is to allow the author to concentrate on the text, not typesetting conventions.
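As a minimal sketch of how an author might modify the definition of an enumerated list, the following complete LaTeX document relabels top-level items as (a), (b), (c). The choice of label style and the placeholder item text are assumptions made only for illustration; \labelenumi and \alph are standard LaTeX commands.

```latex
\documentclass{article}

% Assumption for illustration: label top-level enumerated items
% (a), (b), (c) instead of the default 1., 2., 3.
\renewcommand{\labelenumi}{(\alph{enumi})}

\begin{document}

\begin{enumerate}
\item First reason.
\item Second reason.
\end{enumerate}

Some intervening text.

% This list picks up the same labels without any extra markup.
\begin{enumerate}
\item Another list, labelled the same way.
\end{enumerate}

\end{document}
```

Because the change is made once in the preamble, every enumerated list in the document follows it, and the body text needs no per-list formatting.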
When writing with a text editor, one generally does not think about how the printed text will appear. The writer learns to visualize the finished text's appearance from the formatting commands in the source.
WYSIWYG word processors are attractive for many reasons. They provide an easy-to-use visual interface for editing documents. But this interface is limited to aspects of text layout which are accessible to the user. For example, many word processors still provide a special format language for producing complicated expressions like mathematical formulae. This is text processing, albeit on a much smaller scale.
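For comparison, here is roughly how a mathematical formula is written in LaTeX itself. The quadratic formula is chosen purely as an illustration; it does not come from any particular word processor's equation language.

```latex
\documentclass{article}
\begin{document}

The roots of $ax^2 + bx + c = 0$ are
\[
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\]

\end{document}
```

The formula is described by its structure (a fraction, a square root, superscripts), and the formatter decides the spacing and sizes.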
A not-so-subtle benefit of text processing is that you specify exactly which format you need. In many cases, the text processing system requires a format specification. Text processing systems also allow source text to be edited with any text editor, instead of relying on format codes which are hidden beneath a word processor's opaque user interface. Further, the source text is easily converted to other formats. The tradeoff for this flexibility and power is the lack of WYSIWYG formatting.
Some programs let you preview the formatted document on a graphics display device before printing. Under X, the xdvi program displays the ``device independent'' file generated by the TeX system. Applications like xfig and gimp provide WYSIWYG graphics interfaces for drawing figures and diagrams, which are subsequently converted to text processing language for inclusion in your document.
Text processors like troff were around long before WYSIWYG word processing was available. Many people still prefer their versatility and independence from a graphics environment.
Many text-processing-related utilities are available. The powerful METAFONT system, which is used to design fonts for TeX, is included in the Linux port of TeX. Other programs include ispell, an interactive spelling checker and corrector; makeindex, which generates indices in LaTeX documents; and many groff and TeX-based macro packages which format many types of technical and mathematical texts. Conversion programs that translate TeX or groff source to a myriad of other formats are also available.
A newcomer to text formatting is YODL, written by Karel Kubat. YODL is an easy-to-learn language with filters to produce various output formats, like LaTeX, SGML, and HTML.