Background
DITA is an impressive design feat. It defined a way to link many files together in flexible ways, while ensuring type-safety in the process. In other words, it guarantees that a text segment that is "transcluded" into a procedural topic will conform to the structural restrictions defined for that kind of topic. The ability to divide material into reusable chunks, and to guarantee that structural restrictions are honored, is a compelling feature of DITA.
At the same time, DITA adoption suffers from the need for expensive tools. The DITA processing tool (the DITA Open Toolkit) generates passable sample code, but falls short of production-quality output. Meanwhile, the need to include stylistic information renders the transformations more complex. Tools that solve the processing problem are expensive, because of the time required to create proprietary fixes for issues in Open Toolkit processing.
Another area of enormous expense is the need for a Content Management System. A collection of DITA documents is nothing so much as a mass of interconnected links. But when a file name changes, every file that links to it needs to change. And when a file changes locations, relative links within the file need to change, along with all of the files that link to it. And if, heaven forbid, a file is split into two, all of the links that refer to the original file need to be inspected, to see whether it is appropriate to link to the first file, the second, or both.
The RuDi project was created to address such problems. (It has been dormant for quite a while. But the problems it was designed to address are still present, and it does represent significant strides towards a solution, hence its continued availability.) To help achieve that goal, it is primarily built using the Ruby programming language.
Ruby's claim to fame is its ability to create domain-specific languages--mini-languages that are designed and customized for a specific purpose. Some of the better known examples are:
- Rake
The Ruby-based build tool. It looks like Make, but avoids Make's whitespace weirdness, where spaces and tabs act differently, and where indenting, or failing to do so, changes the way the instructions are interpreted. Because it avoids XML tags, it is easier to write and read than the Ant build tool. At the same time, it lets you write full Ruby-language procedures whenever you need to. And because you can define your own dependencies, it is superior to either of its main competitors. (For more, see Rake Rocks.)
This project takes it for granted that Rake will be the build tool of choice. (It need not be, but it is recommended.)
- RSpec
RSpec is the Ruby-language testing tool. It lets you write tests that have the form "expect X. result = (run some function). test succeeds if result is as expected." It's called RSpec, rather than RTest, because the collection of tests reads like a specification. In other words, it's a testing tool that lets you create readable, writeable, and runnable (executable) scripts--scripts that are effectively behavioral specifications that you run as unit tests.
Those tools make it easy to express the solution for the problem you are trying to solve. The ease of expression translates directly into rapid construction of new solutions, and ready comprehension of existing ones. (For more, see Ruby Rocks.)
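To make the idea of a mini-language concrete, here is a minimal sketch of a Rake-like task language written in plain Ruby. It is not how Rake itself is implemented, and the `TinyBuild` class with its `define`, `task`, and `run` methods is a hypothetical name set invented for this illustration. The point is only that a Ruby block, evaluated against a small object, reads like a purpose-built language.

```ruby
# A toy build language (hypothetical; not Rake's actual implementation).
class TinyBuild
  def initialize
    @tasks = {}   # task name => [prerequisite names, action block]
    @done  = []   # names of tasks that have already run
  end

  # Evaluate a block of task definitions written in the mini-language.
  def self.define(&block)
    build = new
    build.instance_eval(&block)
    build
  end

  # Language keyword: task(:name) or task(:name => [:prerequisite, ...])
  def task(spec, &action)
    name, deps = spec.is_a?(Hash) ? spec.first : [spec, []]
    @tasks[name] = [Array(deps), action]
  end

  # Run a task after its prerequisites, each at most once.
  def run(name)
    return if @done.include?(name)
    deps, action = @tasks.fetch(name)
    deps.each { |dep| run(dep) }
    action.call if action
    @done << name
  end
end

log = []
build = TinyBuild.define do
  task(:clean) { log << "clean" }
  task(:html => [:clean]) { log << "html" }
  task(:default => [:html, :clean]) { log << "default" }
end
build.run(:default)
puts log.join(", ")   # prints "clean, html, default"
```

Because the definitions are ordinary Ruby, any task body can contain loops, conditionals, or calls into other libraries, which is the property the descriptions of Rake and RSpec above rely on.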
Project Goals
The overarching goal is to create a DITA-processing system that produces professional-quality results, affordably. To do that, it addresses:
- Document Generation
The first goal is to make it easy to transform DITA files into HTML-based output, by separating stylistic design from code processing. Using DreamWeaver templates lets a professional designer work with visual tools. It also separates the design task from code transformations, which makes the transformations simpler. And after the transformations are complete, DreamWeaver will automatically apply template-changes to existing files. So the system achieves both automation and a desirable separation of concerns.
- Link Management
The second goal is to automate link processing, to ensure that links remain accurate when files change names or locations--and to do so without requiring a mega-expensive content management system. The idea is to automatically generate and run a link-processing script when such changes have been made in a change-management system like Subversion. (To prevent links from being automatically adjusted, the changes can be made outside the system.)
- File System Storage
Using a file system for file storage has significant advantages over a database, primarily in the ability to create and run automated scripts on the collection of files. With the problems of document generation and link management solved, the need for an expensive Content Management system dissipates, leaving the file system as a far less expensive choice.
Note:
A Unix-based system is ideal for this purpose. In particular, it allows the construction of symlinks that act as a local stand-in for a remote file. That capability is needed to share common topics across DITA maps. Apparently the DITA Open Toolkit has a bug that surfaces if you try to use a conref to link in a file that resides outside the root of the map.
- Proofreading
The proofreading task can be made much simpler and easier with a list-based search-and-replace tool. That way, a list of common problems can be inspected, and changes can be made selectively. (Ideally, it will integrate with the authoring tool, so that surrounding text can be modified at the same time.)
- Improved automation for software-documentation systems.
For software documentation, there is tremendous scope for automation that has gone largely untapped. For one thing, it should be possible to run tests on sample programs, to be sure they still work as the product changes. Then, when the sample program is revised, it should be possible to automatically replace any sections of that code that were included in a document (so that the sample will be guaranteed correct) and, at the same time, alert the writer to places where changes have been made, so the surrounding text can be inspected for accuracy.
Another automatable task surfaces for a tutorial program that is built up in stages. Such programs generally start from a simple version and build out to a more complex final version, introducing the reader to new concepts at each stage. It is helpful to maintain a single source copy for such a program, and to generate each version from it.
A third area that is ripe for automation is UI integration. At a minimum, the documentation should be referencing files used in the UI, to ensure that labels are correct. Ideally, those files should also define structural paths, so that instructions like "Click {menu} > {choice} > {tab} > {button}" are guaranteed to be accurate.
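The link-management goal above comes down to one small computation per link: resolve the relative link against the document that contains it, apply any rename or move, and express the result relative to the document's (possibly new) location. The following is a minimal sketch of that step using Ruby's standard `Pathname` class. The `rewrite_link` function, the `moves` table, and the sample file names are all hypothetical, and the sketch assumes every path is given relative to a common document root and that links carry no `#fragment` suffix.

```ruby
require "pathname"

# Recompute one relative link after files have been renamed or moved.
# `moves` maps old locations to new ones, relative to the document root.
# (Hypothetical helper; not part of the project's code.)
def rewrite_link(doc_path, href, moves)
  old_doc = Pathname.new(doc_path)
  # 1. Normalize: resolve the link against the document's old directory.
  target = (old_doc.dirname + href).cleanpath.to_s
  # 2. Convert: apply any rename or move to the target and to the document.
  new_target = Pathname.new(moves.fetch(target, target))
  new_doc    = Pathname.new(moves.fetch(doc_path, doc_path))
  # 3. Re-relativize: express the new target from the document's new directory.
  new_target.relative_path_from(new_doc.dirname).to_s
end

moves = { "common/notes.dita" => "shared/legal/notes.dita" }
puts rewrite_link("guide/install.dita", "../common/notes.dita", moves)
# prints "../shared/legal/notes.dita"
```

A batch run would call such a function for every link in every file, with the `moves` table built from the renames recorded by the change-management system.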
Existing Contents
- rXSLT - Ruby-version of XSLT.
In the tradition of Rake and RSpec, rXSLT is a Ruby-based language that looks and feels like a familiar language (XSLT). That makes it familiar to many. And it has XSLT's strength--the ease with which you can set up static transformations, so you don't have to write "procedures" to do simple things. But at the same time, you can write pure Ruby code for conditionals and processing loops, whenever you need to, which is its primary advantage over XSLT, where such things are very difficult to do.
- Ruby-based "fluent XML" module
A module that illustrates the power of Ruby. It lets you make function calls to output HTML, without worrying about closing tags, and without having to output a collection of strings. And you don't need to define functions for every HTML tag. Any undefined method name automatically generates an XML/XHTML tag with that name. Hash-map arguments to the call provide name/value pairs that become attributes for the tag.
- Manpage processing tools
Included in the project mostly because it was a reasonable place to put them.
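The "fluent XML" technique described above, where any undefined method name becomes a tag and hash arguments become attributes, can be sketched in a few lines with Ruby's `method_missing` hook. This is an illustration of the technique only, not the project's module: the `FluentXml` class and its `build`, `text`, and `to_xml` methods are hypothetical names. The sketch inherits from `BasicObject` so that built-in methods such as `p` do not shadow tag names.

```ruby
# A minimal "fluent XML" builder (hypothetical; not the project's module).
class FluentXml < BasicObject
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  # Evaluate the block with a builder as self and return the markup.
  def self.build(&block)
    builder = new
    builder.instance_eval(&block)
    builder.to_xml
  end

  def initialize
    @out = +""
  end

  def to_xml
    @out
  end

  # Append escaped character data.
  def text(string)
    @out << escape(string)
  end

  # Any undefined method name becomes a tag with that name.
  # A Hash argument supplies the attributes; a block supplies child content.
  def method_missing(tag, content = nil, attrs = {}, &block)
    content, attrs = nil, content if content.is_a?(::Hash)
    @out << "<#{tag}"
    attrs.each { |name, value| @out << " #{name}=\"#{escape(value)}\"" }
    @out << ">"
    text(content) if content
    instance_eval(&block) if block
    @out << "</#{tag}>"   # the closing tag is never the caller's problem
    self
  end

  private

  def escape(value)
    value.to_s.gsub(/[&<>"]/, ESCAPES)
  end
end

html = FluentXml.build do
  div class: "note" do
    p "Fish & chips"
    a "RuDi", href: "rudi.html"
  end
end
puts html
# prints <div class="note"><p>Fish &amp; chips</p><a href="rudi.html">RuDi</a></div>
```

One trade-off of this design is that a misspelled method name silently produces a tag instead of raising an error.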
Future Contents
- Sample DreamWeaver-based templates that allow CSS and styling to be modified by a designer, with changes automatically applied to all files in the site, so styling can be done using a visual tool, without having to code the DITA Open Toolkit to do it.
- A set of rXSLT transforms to transform simple HTML generated by the DITA Open Toolkit into template-based HTML pages.
- A Rake build script that runs rXSLT scripts on a set of sample files.
- A list-based search-and-replace engine (spec)
- Run interactively, it can be used to step through a list of search-replace pairs, searching or skipping each item, and doing individual replacements, current-file replacements, or global replacements on each.
- Run in batch mode, it is used to fix links when files change their name or location.
(One of the trickier aspects is to normalize relative file paths to absolute form, do the conversion, and then replace the original link with the new relative path.)
- Subversion checkin code that creates a list of changes and runs the replace-tool when a file name or location has changed.
- Processing Tool for tutorial programs
The original version of this tool used standard XML "processing" instructions to identify added and removed material for each version of a tutorial program. (Since the file is treated as one long bit of text, the processing instructions are pretty much the only things in the file, other than the code and the standard XML header.) The tool generates the code for each version of the program. It also generates HTML segments that display the changes in each version, using bold for added material and strikethrough for removed material. So running it once generates file segments that can be inserted into the tutorial.
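As a rough illustration of the tutorial-processing idea, the sketch below treats a single source file as plain text with processing-instruction markers and generates both the code for a given version and an HTML view of what changed in that version. The marker syntax (`<?add N?>`, `<?remove N?>`, `<?end?>`) and the function names are assumptions made for this example; the original tool's actual instruction names are not given here. Nested markers are not handled.

```ruby
# Hypothetical marker syntax (the original tool's syntax may differ):
#   <?add N?> ... <?end?>     text that first appears in version N
#   <?remove N?> ... <?end?>  text that is dropped starting with version N
MARKER = /<\?(add|remove)\s+(\d+)\?>(.*?)<\?end\?>/m

# Yield [kind, version, text] for every stretch of the source, in order.
def each_segment(source)
  # split keeps the captures: [plain, kind, version, text, plain, ...]
  source.split(MARKER).each_slice(4) do |plain, kind, version, text|
    yield :plain, 0, plain
    yield kind.to_sym, version.to_i, text if kind
  end
end

# Is a segment part of version n?
def present?(kind, version, n)
  case kind
  when :plain  then true
  when :add    then version <= n
  when :remove then n < version
  end
end

# The program text for version n.
def code_for(source, n)
  out = +""
  each_segment(source) { |kind, v, text| out << text if present?(kind, v, n) }
  out
end

# HTML for version n: bold for material added in n, struck out for removed.
def changes_html(source, n)
  out = +""
  each_segment(source) do |kind, v, text|
    safe = text.gsub(/[&<>]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
    if v == n && kind == :add
      out << "<b>#{safe}</b>"
    elsif v == n && kind == :remove
      out << "<del>#{safe}</del>"
    elsif present?(kind, v, n)
      out << safe
    end
  end
  out
end

source = <<~'CODE'
  puts "Hello"
  <?add 2?>puts "How are you?"
  <?end?><?remove 3?>puts "Goodbye"
  <?end?><?add 3?>puts "See you soon"
  <?end?>puts "Done"
CODE

puts code_for(source, 3)
puts changes_html(source, 3)
```

Keeping every version in one marked-up file means a fix to shared code is made once and appears in every generated stage.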