A.1 What is XML?

XML is the Extensible Markup Language. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification.

It is called extensible because it is not a fixed format like HTML (a single, predefined markup language). Instead, XML is actually a `metalanguage' (a language for describing other languages), which lets you design your own customized markup languages for limitless different types of documents. XML can do this because it's written in SGML, the international standard metalanguage for text markup systems (ISO 8879).
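As a rough illustration of what designing your own markup language can look like, the sketch below invents a tiny recipe vocabulary and reads it with Python's standard xml.etree.ElementTree module. The element and attribute names (recipe, ingredient, step, quantity, unit) and the helper ingredient_names are assumptions made up for this example; they are not part of XML or of any published document type.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: every element and attribute name here is
# invented for illustration. XML itself defines none of them.
RECIPE_XML = """<recipe name="soda bread" serves="4">
  <ingredient quantity="450" unit="g">flour</ingredient>
  <ingredient quantity="1" unit="tsp">baking soda</ingredient>
  <step>Mix the dry ingredients.</step>
  <step>Bake for 40 minutes.</step>
</recipe>
"""

def ingredient_names(xml_text):
    """Return the text of each <ingredient> child of the root element."""
    root = ET.fromstring(xml_text)
    return [item.text for item in root.findall("ingredient")]

if __name__ == "__main__":
    for name in ingredient_names(RECIPE_XML):
        print(name)
```

A generic XML parser can read this markup without knowing anything about recipes in advance, because the tags themselves describe the information.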

A.2 What is XML for?

XML is intended `to make it easy and straightforward to use SGML on the Web: easy to define document types, easy to author and manage SGML-defined documents, and easy to transmit and share them across the Web.'

It defines `an extremely simple dialect of SGML which is completely described in the XML Specification. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.'

`For this reason, XML has been designed for ease of implementation, and for interoperability with both SGML and HTML.'

A.3 What is SGML?

SGML is the Standard Generalized Markup Language (ISO 8879:1985), the international standard for defining descriptions of the structure of different types of electronic document. There is an SGML FAQ at http://lamp.man.deakin.edu.au/sgml/sgmlfaq.txt which is posted every month to the comp.text.sgml newsgroup, and the SGML Web pages are at http://xml.coverpages.org/.

A.4 What is HTML?

HTML is the HyperText Markup Language (RFC 1866), a small application of SGML used on the Web.

It defines a very simple class of report-style documents, with section headings, paragraphs, lists, tables, and illustrations, with a few informational and presentational items, and some hypertext and multimedia. See the question on extending HTML. There is also an XML version of HTML.

A.5 Aren't XML, SGML, and HTML all the same thing?

Not quite; SGML is the mother tongue, and has been used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients' clinical records to musical notation. SGML is very large and complex, however, and probably overkill for most common applications.

XML is an abbreviated version of SGML, to make it easier for you to define your own document types, and to make it easier for programmers to write programs to handle them. It omits all the options, and most of the more complex and less-used parts of SGML, in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. But it is still SGML, and XML files may still be processed in the same way as any other SGML file (see the question on XML software).
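One concrete instance of an omitted option is tag minimization: SGML can be configured to let end-tags be left out, whereas XML requires every element to be closed explicitly. The sketch below, which assumes Python's standard xml.etree.ElementTree parser and uses invented element names (list, item) and a made-up helper parses_as_xml, shows an XML parser rejecting the minimized form and accepting the explicit one.

```python
import xml.etree.ElementTree as ET

# SGML-style minimization: the end-tags of <item> are left out.
MINIMIZED = "<list><item>one<item>two</list>"

# The same content with every end-tag written out, as XML requires.
EXPLICIT = "<list><item>one</item><item>two</item></list>"

def parses_as_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    print("minimized:", parses_as_xml(MINIMIZED))
    print("explicit:", parses_as_xml(EXPLICIT))
```

Removing this kind of choice is what makes an XML parser simpler to write: it never has to infer where an element ends.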

A.6 Who is responsible for XML?

XML is a project of the World Wide Web Consortium (W3C), and the development of the specification is being supervised by its XML Working Group. A Special Interest Group of co-opted contributors and experts from various fields contributed comments and reviews by email.

XML is a public format: it is not a proprietary development of any company. The v1.0 specification was accepted by the W3C as a Recommendation on February 10, 1998.

A.7 Why is XML such an important development?

It removes two constraints which were holding back Web developments:

  1. dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for;
  2. the complexity of full SGML, whose syntax allows many powerful but hard-to-program options.

XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for.
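To make the storage-and-transmission point concrete, here is a minimal sketch that builds a small record, serializes it to bytes, and parses it back. It assumes Python's standard xml.etree.ElementTree module; the element names (walk, peak, height), the sample values, and the helpers build_record and round_trip are all invented for illustration. It checks only that the data survives the round trip, not that the document is valid against any document type.

```python
import xml.etree.ElementTree as ET

def build_record():
    """Build a small, made-up record as an element tree."""
    record = ET.Element("walk", attrib={"region": "example-region"})
    ET.SubElement(record, "peak").text = "Example Peak"
    ET.SubElement(record, "height", attrib={"unit": "m"}).text = "1000"
    return record

def round_trip(element):
    """Serialize an element to UTF-8 bytes, then parse those bytes back."""
    payload = ET.tostring(element, encoding="utf-8")  # what would be stored or sent
    return ET.fromstring(payload)

if __name__ == "__main__":
    copy = round_trip(build_record())
    print(copy.tag, copy.find("peak").text, copy.find("height").text)
```

The serialized bytes are plain text, so they can be written to a file, sent over a network, or inspected by hand without any proprietary tool.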

A.8 Why not just carry on extending HTML?

HTML is already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information.

XML allows groups of people or organizations to create their own customized markup applications for exchanging information in their domain (music, chemistry, electronics, hill-walking, finance, surfing, petroleum geology, linguistics, cooking, knitting, stellar cartography, history, engineering, rabbit-keeping, mathematics, genealogy, etc.).

HTML is at the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure.