Chapter 24
SGML Presentations
Ever since computer technology became available on multiple
platforms, users have been plagued by the difficulty, or downright
impossibility, of transferring documents
from one computer system or software program to another.
The beginning of the cure for this plague came about in 1986 when
the Standard Generalized Markup Language (SGML) was adopted by
the International Organization for Standardization (ISO) as a
standard for the exchange of data worldwide. Since that time,
use of SGML has increased rapidly, particularly among the defense,
aerospace, automotive, electronics, and telecommunications industries.
SGML is a standard language for "marking" or coding
electronic documents and files that allows users to access information
regardless of the system or platform they are using. SGML works
by treating the content, format, and structure of a document as
three distinct elements, as shown in Figures 24.1 and 24.2.
Figure 24.1: SoftQuad's Panorama, showing an SGML document.
Figure 24.2: The SGML source code for the same document.
Note the similarities to HTML code.
Content is the actual information, such as text and images, in
the document. The format determines how the words and images appear
on the screen or on paper (for example, font, point size, italics,
and bold). The structure of a document indicates the relationships
among the various pieces of content, such as paragraphs, headings,
subheadings, or lists. SGML is designed to preserve the structure
and content of the document.
For example, in this book you know that you are now reading Chapter
24. From the Table of Contents and from the section heading you
also know that Chapter 24 is in Part III. In the same way that
the editors have organized this book into logical sections and
subsections of information, SGML organizes and divides electronic
documents into a recognizable and retrievable structure.
Because it is an international standard, all computer platforms
can become capable of interpreting this code regardless of the
document's source. The universality of SGML allows for the efficient
and accurate transfer of all content and structural information
from one computer system to another, while still allowing individual
users to modify the format of a document to suit their needs and
requirements. No matter how technology changes in the future,
SGML will keep documents durable and exchangeable.
The capability to preserve the structural integrity of a document
is what makes SGML so revolutionary. For SGML to preserve the
document structure, the document must contain discrete markers
that identify the structural elements. These markers, called tags,
are located at the beginning and end of each structural element.
For example, suppose you have a paragraph such as the following:
This is the first paragraph of my document.
You can tell any computer with SGML capabilities to preserve this
information as a paragraph element by marking it like this:
<par>
This is the first paragraph of my document.
</par>
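Larger structural units can be marked in the same way, with elements
nested inside one another. The following is only a sketch: the element
names section, heading, list, and item are hypothetical and are not
taken from any particular DTD.

```sgml
<section>
<heading>My First Section</heading>
<par>This is the first paragraph of my document.</par>
<list>
<item>First item in a list</item>
<item>Second item in a list</item>
</list>
</section>
```

The tags record what each piece of content is (a heading, a paragraph,
a list item) and which pieces belong together. They say nothing about
fonts or point sizes.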
It's no coincidence that this looks a lot like HyperText Markup
Language (HTML), the language used to create documents for the
World Wide Web. HTML is simply an application of SGML. Thanks
to the original SGML research, Web browsers on different platforms
can all understand the same HTML files.
The specifics of the structural designations of HTML, and of every
other application of SGML, are defined in its Document Type Definition,
or DTD.
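An HTML file can name the DTD it follows in a document type declaration
on its first line. The sketch below assumes the HTML 2.0 DTD and its
public identifier; other versions of HTML use different identifiers,
and the title text is made up for illustration.

```sgml
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>My Document</title>
</head>
<body>
<p>This is the first paragraph of my document.</p>
</body>
</html>
```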
If you regularly use documents that have an exact uniform structure,
such as Web documents, time sheets, or product specification forms,
you will want to use a template to ensure that these documents
remain identical in structure as they pass from one computer
system to another and as they are updated and re-created.
The template or framework for the various elements in an SGML
application is the DTD. The DTD not only preserves the structure
of a specific type of electronic file, but also enforces the rules
of that structure. For example, if you had to submit a specific
report, the DTD for that type of report could specify that the
report must contain sections A, B, and C, and that each of these
sections must contain at least one paragraph. In this way, the
DTD helps ensure that documents have a uniform, logical structure.
A document whose content has been tagged to conform to a particular
DTD is called a document instance.
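The report example can be sketched as a small DTD followed by a document
instance that conforms to it. This is an illustration under stated
assumptions, not a DTD from any real system: the element names report,
secta, sectb, sectc, and par are hypothetical, and the declarations are
placed inside the document itself only to keep the example compact (a
DTD is commonly kept in a separate file and shared by many documents).

```sgml
<!DOCTYPE report [
<!-- A report must contain sections A, B, and C, in that order. -->
<!ELEMENT report - - (secta, sectb, sectc)>
<!-- Each section must contain at least one paragraph. -->
<!ELEMENT (secta | sectb | sectc) - - (par+)>
<!-- A paragraph holds plain text. -->
<!ELEMENT par - - (#PCDATA)>
]>
<report>
<secta>
<par>This is the first paragraph of section A.</par>
</secta>
<sectb>
<par>This is the first paragraph of section B.</par>
<par>This is the second paragraph of section B.</par>
</sectb>
<sectc>
<par>This is the first paragraph of section C.</par>
</sectc>
</report>
```

In each declaration, the commas require the listed elements to appear
in that order, the plus sign means "one or more," and the two hyphens
state that neither the start tag nor the end tag may be omitted. A
validating SGML parser checking this instance against the declarations
would reject a report that left out section C, or a section that
contained no par element.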
The best SGML software programs allow you to tag information by
clicking on pull-down menus. When working within the confines
of a particular DTD, the pull-down menus will list only those
tags that are valid at the cursor's current position in the document.
Therefore, you cannot diverge from the structure of the DTD even
if you want to.
Different industries and companies obviously require different
types of DTDs to facilitate and manage their information. Many
SGML systems offer a set of preprogrammed DTDs that comes with
the software. Relatively few people will want to write their own
DTDs, because this process can be difficult. However, a high-quality
SGML product lets users define their own document types, which
gives you the ability to create user-defined DTDs when you need them.
The primary benefit of SGML is that it dramatically increases
the ease with which people can access the information that your
company creates. However, other benefits can be found in improved
information collection, compilation, and dissemination, as well
as in increased cost efficiency.
Using SGML for document creation gives everyone who needs the
information efficient, accurate access to it. At a time when different
computers, operating systems, and applications abound, only a
language that is hardware- and software-independent, like SGML,
will allow all your users to exchange documents with ease. This
way, if your art department creates a file on a Mac, your CEO,
who uses a PC, can view and edit it before it goes to press.
Structured guidelines for the creation of new documents increase
productivity by eliminating the time spent formatting new documents.
SGML also improves data integrity by reducing the need to filter data
from one format to another, and it lengthens the period of time that
stored information can be used by ensuring that the data will
be retrievable regardless of future changes in hardware or software.
Remember all those files you have on 5 1/4-inch floppies that
are in some ancient DOS word processing format nobody uses anymore?
With SGML, your files will have a much longer shelf life.
With electronic publishing sweeping the world, SGML enables you
to translate information that was prepared for traditional printing
methods into a wide variety of formats suitable for publishing
on everything from CD-ROM to the World Wide Web. SGML also can
improve information dissemination by allowing users to share whole
documents or sections of documents without the need for wasteful
hard copy reproduction and duplication.
All the previously mentioned benefits of SGML can translate into
direct and indirect cost savings by providing greater information
accessibility, improved data integrity, increased life span of
archival information, and a reduced need for printed products.
Think about all the money your company spends on printing human
resources documents, corporate directories, and internal memos
every year. These costs can be all but eliminated by creating
and duplicating your documents in electronic format using SGML.
And any changes mean a few keystrokes, not thousands of dollars
in printing and distribution costs.
When using SGML, you might find the enforced structure of the
DTD somewhat limiting. If you have a small number of DTDs available
to you and do not have the ability to write new DTDs, working
in SGML can be frustrating.
The universality of SGML is great, as long as your systems know
the code. SGML translators are not native to most computer systems,
so the ability to create and read documents with SGML tags requires
an investment in systems and software.
Although SGML is an English-based language, it is not always intuitive
and can be as complex as the documents you are trying to tag.
Thorough knowledge of SGML does not come easily. For industries
that are document intensive, the use of SGML must be considered
part of an entire information management strategy. Although standardizing
on SGML may require significant time and investment, the benefits
(as explained in the preceding section) can make the transition
worthwhile.
When deciding whether to standardize some of your company's
information using SGML, you should weigh several factors.
The following questions can help you define your current information
management needs:
- Do you need to exchange documents across
different computer environments?
- Do you produce documents that follow a
standard, uniform structure?
- Do many different departments or divisions
use the same types of information?
- Does your information have a long life
span? (That is, will it be useful for many years?)
- Does your information change frequently,
and is it used often?
- Is keeping your information up-to-date
expensive and time-consuming?
- Could a significant value be gained by
distributing updated information electronically?
- Do you produce information that must comply
with specific industry guidelines?
How you answer these questions will help you determine how SGML
fits into your information management strategy. Remember that
not all your documents will need to be standardized on SGML, only
those that have a definable structure and need to last.
Over the years, the expansion and application of the SGML system
has affected many areas of the digital information explosion.
SGML and its execution are often complex and cryptic processes
requiring well-trained and informed professionals who can best
utilize and tailor SGML to specific needs. This chapter is only
an introduction to SGML that should help you understand its uses
and inspire a more in-depth investigation of this technology.