Pinky Gone Driving Little tidbits not meant for human consumption

ReStructuredText meets ODT: almost heaven

Wed, 2009-04-15 13:20

I have long used reStructuredText (reST) for writing simple documents that I might later want to turn into something formatted: committee minutes, project TODO files, memos, etc. I also use reST when contributing documentation to projects such as Bazaar, NumPy and SciPy, since it is a commonly used format for Python-related documentation. I've recently come to see it as an ideal format for writing longer documents that will eventually be formatted, but where I don't want to worry about formatting initially.

What is it?

reST is a system of markup for text, similar to HTML or LaTeX. Markup is extra notation added to text to indicate something about its meaning. For example, in the sentence "You really should check your work.", which might appear in the instructions of an exam, the word "really" should be emphasized. To indicate this emphasis, you might italicize the word in your word processor. Using markup, you would instead put indicators around the word that suggest emphasis: <em>really</em> (HTML), \emph{really} (LaTeX) or *really* (reST).

As you can see from the example, reST tries to be a lightweight markup language that doesn't detract too much from the original text representation. In the example, emphasis in plain text media such as email is *already* indicated with asterisks, so most readers know from the textual representation what it means.

Why Markup?

One of the distinguishing features of using markup to format a document is that it separates meaning or semantics from presentation. The markup formats above indicate a meaning for the word "really". They don't specify a presentation for the word, as you might in your word processor with italics. Markup is useful when the indicated semantics can then be translated into presentation formats that consumers understand. For HTML, that means rendering the page in a web browser. For LaTeX, it means typesetting the document into a PDF file.
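To make the idea concrete, here is a toy sketch in Python that takes one piece of semantic markup (reST-style *emphasis*) and translates it into two different presentation formats. It is only an illustration: the regular expression is a simplification I am assuming for the example, not the real reST inline-markup rules, the function names are made up, and nothing is escaped for HTML or LaTeX.

```python
import re

# Toy pattern: *text* with no spaces hugging the asterisks.
# The real reST inline-markup rules are stricter than this.
EMPHASIS = re.compile(r"\*(\S(?:.*?\S)?)\*")


def to_html(text):
    """Render *emphasis* as an HTML <em> element."""
    return EMPHASIS.sub(r"<em>\1</em>", text)


def to_latex(text):
    """Render *emphasis* as a LaTeX \\emph{} command."""
    return EMPHASIS.sub(r"\\emph{\1}", text)


source = "You *really* should check your work."
print(to_html(source))
print(to_latex(source))
```

The source text says only that "really" is emphasized. Whether that ends up as an <em> tag or an \emph command is decided later, by whichever translator is run.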

This separation between semantics and presentation is a good thing for me as an author because it means I am not distracted by the formatting of my writing while I write. Just today I heard someone say "I never get anything done when I'm writing a test because of all the time I spend formatting". When I heard this, I thought how much I enjoyed writing my exams using LaTeX's exam class, which takes care of equations, problem numbering and points per problem without me doing any formatting. This feature of markup was also incredibly useful when writing a PhD thesis in LaTeX, where I could focus on finishing the content without having to tweak the formatting to insane ruler-nazi specifications at every stage. When I was finished with the LaTeX file, I tweaked the setup *once* so that the formatting (all of it, consistently) would match the specifications and then it was done. Those who have thesis-ified in Microsoft Word know the pain of doing it the other way.

Why reST?

For reST, the process of translating semantics into presentation is done by a collection of programs called rst2something.py. On my Mac, with the latest version of Python's docutils I have the following:

  • rst2html.py
  • rst2odt.py
  • rst2s5.py
  • rst2latex.py
  • rst2xml.py
  • rst2newlatex.py
  • rst2pseudoxml.py

rst2latex.py produces very nice PDF output when typeset, if not very good LaTeX code. rst2newlatex.py produces better LaTeX code, suitable for further editing or inclusion into another LaTeX document. rst2s5.py is particularly interesting in that it produces PowerPoint-style presentations, directly from plain-text reST files, that work in any standards-compliant browser. At times, I've used many of these to distribute the type of notes I mentioned before.

reST meets ODT

The newest member of this family is rst2odt.py, which translates reST files into the OpenDocument format pioneered by OpenOffice.org. The magic of this is that OpenOffice.org can export ODT documents in Microsoft Word's DOC format. Now it's possible for me to edit a document in plain text, using a powerful, keyboard-centered text editor (Vim), and then send the document on to my collaborators in their chosen format, Microsoft's DOC. The markup of reST is lightweight enough that integrating further changes made by my co-authors is easy with a bit of copying and pasting from Word into Vim.
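The first step of that workflow, reST to ODT, can be scripted. The following is a minimal sketch, assuming rst2odt.py is on the PATH and accepts a source file followed by a destination file, as the docutils front ends generally do. The helper names and the file names minutes.txt and minutes.odt are hypothetical, and the export from ODT to DOC is still done by hand in OpenOffice.org.

```python
import shutil
import subprocess


def rst2odt_command(source, destination, tool="rst2odt.py"):
    """Build the argument list for the docutils ODT front end."""
    return [tool, source, destination]


def convert_to_odt(source, destination, tool="rst2odt.py"):
    """Run the converter if it is on PATH; return True if it ran."""
    if shutil.which(tool) is None:
        return False  # front-end script not found, nothing was run
    subprocess.run(rst2odt_command(source, destination, tool), check=True)
    return True


# Hypothetical file names, shown only to illustrate the command line.
print(rst2odt_command("minutes.txt", "minutes.odt"))
```

Some installations may name the script without the .py suffix, which is why the tool name is a parameter rather than hard-coded.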

So there you have it: a lightweight markup system (with all of its inherent benefits) that I can now share with the (benighted) rest of the world as Microsoft Word files. Thanks, reST.
