To represent data in a consistent format, you need to give it a meaningful structure. A well-formed XML document does not necessarily have a meaningful structure. Anyone can create a well-formed structure, but that structure is specific to the XML document in which it is created and cannot be applied consistently across multiple documents.
By defining the role of each element in a formal model, known as a Document Type Definition (DTD), users of XML can check that each component of a document appears in a valid place in that document.
Document Type Definition
A DTD defines the structure of the content of an XML document, thereby allowing you to store data in a consistent format. It specifies the elements that can be present in the XML document, attributes of these elements, and their arrangement with relation to each other. It also allows you to specify whether an element is optional or mandatory.
Creating a DTD is similar to creating a table in a database. In DTDs, you specify the structure of data by declaring elements to denote the data. This is similar to creating columns in a table. You can also specify whether providing a value for the element is mandatory or optional. You can then store the data in an XML document that conforms to the DTD for that application. This is similar to adding records in a table.
XML allows you to create your own DTDs for applications. This gives you complete control over the process of checking the content and structure of XML documents created for an application. This checking process is called validation. XML documents that conform to a DTD are considered valid documents.
Because a DTD allows you to specify the structure and type of data elements, you can create one to specify the structure of a document.
Declaring Elements in a DTD:
After you identify the elements that can be used for storing structured data, you can declare them in a DTD. The XML document can then be checked against the DTD. In a DTD, elements are declared using the following syntax:
<!ELEMENT elementname content-type-or-content-model>
In the given syntax,
elementname specifies the name of the element. The content type or content model specifies whether the element contains textual data or other elements. A content type is a keyword such as EMPTY or ANY; a content model is a parenthesized list such as (#PCDATA) or (Id, Name).
While declaring elements or attributes, you must consider some naming rules. These rules are discussed below.
Rules for naming Elements and attributes in XML
A name must begin with a letter (a to z, or A to Z) or an underscore (_).
One or more letters, digits, hyphens, underscores, or full stops can follow the initial character.
Spaces and tabs are not allowed in names, and the only punctuation signs allowed are the hyphen (-), the underscore (_), and the period (.). The colon (:) is reserved for namespaces.
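As a quick illustration of these naming rules, the following sketch lists a few element declarations. The element names are hypothetical and were chosen only to show which forms are legal.

```xml
<!-- Legal names: each starts with a letter or an underscore -->
<!ELEMENT product (#PCDATA)>
<!ELEMENT _internal-note (#PCDATA)>
<!ELEMENT unit.price2 (#PCDATA)>

<!-- Illegal names, listed inside a comment so that this fragment stays parsable:
     2ndItem     starts with a digit
     -discount   starts with a hyphen
     unit price  contains a space
-->
```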
An element can be an empty, an unrestricted, or a container element. An empty element has no content. An unrestricted element can contain text and any element declared in the DTD. A container element contains character data or other specified elements.
Declaring Empty Elements
An empty element can be declared by specifying the content type as EMPTY. Consider the following example:
<!ELEMENT emptyelement EMPTY>
In the given example, an element called emptyelement is declared and the content type is specified as EMPTY. In this case, emptyelement can contain attributes. However, it cannot contain textual content or other elements.
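A minimal sketch of a document that uses this declaration is shown here. It assumes, purely for illustration, that emptyelement is also the root element and that the DTD is embedded in the document.

```xml
<?xml version="1.0"?>
<!DOCTYPE emptyelement [
  <!ELEMENT emptyelement EMPTY>
]>
<!-- Valid: the element has no content. <emptyelement></emptyelement> is also allowed. -->
<emptyelement/>
```

Placing any text or child element between the start and end tags would make this document invalid.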
Declaring Unrestricted Elements
An unrestricted element can be declared by specifying the content type as ANY. Consider the following example:
<!ELEMENT anyelement ANY>
In the given example, an element called anyelement is declared and its content type is specified as ANY. In this case, anyelement can contain any type of data including other elements that are declared elsewhere in the DTD.
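The following sketch shows one document that would be valid against such a declaration. The note element is a hypothetical addition; it is declared in the same DTD so that it may appear inside anyelement.

```xml
<?xml version="1.0"?>
<!DOCTYPE anyelement [
  <!ELEMENT anyelement ANY>
  <!ELEMENT note (#PCDATA)>
]>
<anyelement>
  Free text can be mixed with declared elements.
  <note>note is declared in the DTD, so it may appear here.</note>
</anyelement>
```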
Declaring Container Elements
Using element declarations in a DTD, you can precisely specify which other elements are allowed inside an element, how often they may appear, and in what order. You do this by specifying an element content model.
Consider an XML document that stores employee details in an Employee element, which contains Id and Name child elements.
For such a document to be valid, you need to create a DTD that contains declarations for three elements: Employee, Id, and Name. In addition, you need to decide whether Id and Name are mandatory or optional, whether they can be in any order or have to be in a specific order, and the number of times they can appear in an XML document. You can write element declarations for these decisions. The element declaration will differ for each of them.
For example, if both Id and Name have to be specified and Id should be followed by Name, the DTD would look as follows:
<!ELEMENT Employee (Id, Name)> <!--Element content -->
<!ELEMENT Id (#PCDATA)> <!--Character content -->
<!ELEMENT Name (#PCDATA)> <!--Character content -->
In the given code, the Employee element is declared with the content model Id followed by Name. The Id and Name elements have the content type #PCDATA. PCDATA stands for Parsed Character Data and is used to represent character content. To prevent you from confusing this keyword with a normal element name, the keyword is prefixed by the hash (#) symbol.
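Put together, a complete document that is valid against these declarations might look like the sketch below. It assumes the DTD is embedded in the document, and the Id and Name values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE Employee [
  <!ELEMENT Employee (Id, Name)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Name (#PCDATA)>
]>
<Employee>
  <Id>E001</Id>
  <Name>Sample Name</Name>
</Employee>
```

Swapping the order of Id and Name, or omitting either one, would make the document invalid because the content model requires exactly one Id followed by exactly one Name.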
In a DTD, different symbols are used to specify whether an element is mandatory or optional and whether it can occur more than once. The following list displays the symbols used while specifying the element content in a DTD.
, => "and", in the specified order. Eg: Id, Name
| => "or". Eg: Id | Name
? => Optional (zero or one occurrence). Eg: Id?
* => Zero or more occurrences of the element. Eg: (Id, Name)*
+ => One or more occurrences of the element. Eg: Id+
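The sketch below combines these symbols in one DTD. The Employees, Phone, Home, Mobile, and Skill elements are hypothetical additions to the Employee example, introduced only to exercise each symbol.

```xml
<?xml version="1.0"?>
<!DOCTYPE Employees [
  <!-- One or more Employee elements -->
  <!ELEMENT Employees (Employee+)>
  <!-- Id, then Name, then an optional Phone, then zero or more Skill elements -->
  <!ELEMENT Employee (Id, Name, Phone?, Skill*)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Name (#PCDATA)>
  <!-- Phone contains exactly one of Home or Mobile -->
  <!ELEMENT Phone (Home | Mobile)>
  <!ELEMENT Home (#PCDATA)>
  <!ELEMENT Mobile (#PCDATA)>
  <!ELEMENT Skill (#PCDATA)>
]>
<Employees>
  <Employee>
    <Id>E001</Id>
    <Name>First Sample</Name>
    <Phone><Mobile>000-0000</Mobile></Phone>
    <Skill>XML</Skill>
    <Skill>DTD</Skill>
  </Employee>
  <Employee>
    <Id>E002</Id>
    <Name>Second Sample</Name>
  </Employee>
</Employees>
```

The second Employee is still valid because Phone is optional and Skill may occur zero times.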
Identify the method for declaring attributes.
In addition to declaring elements, attributes too can be declared in a DTD. These declarations are used during the process of validation. The syntax for declaring attributes in a DTD is as follows.
<!ATTLIST elementname attributename valuetype [attributetype] ["default"]>
The attributename valuetype [attributetype] ["default"] section is repeated as often as necessary to create multiple attributes for any given element. Each attribute declaration must include the attribute name and value type, followed by an attribute type, a default value, or (for a fixed attribute) both.
You can assign values to attributes. To do so, you need to know the different types of values that can be assigned to attributes.
The following list discusses the different value types that can be specified for an attribute in a DTD:
CDATA – represents plain text (character data) values
ID – used to assign a unique value to each element in the document. An ID value must begin with a letter or an underscore
(enumerated) – used to restrict the attribute to a specific list of values, which are specified in parentheses and separated by the | symbol
In addition to specifying the value type of an attribute, you also need to specify whether the attribute is optional or mandatory. You can do so by setting the attribute type to one of the following types in a DTD:
#REQUIRED – a value for the attribute must be provided
#FIXED – the value of the attribute is set in the DTD and cannot be changed
#IMPLIED – the attribute is optional
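In a DTD, each of these keywords is written with a leading hash (#). The sketch below shows a fixed attribute and an optional attribute; the PRICE element and its CURRENCY and NOTE attributes are hypothetical names used only for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE PRICE [
  <!ELEMENT PRICE (#PCDATA)>
  <!ATTLIST PRICE
    CURRENCY CDATA #FIXED "USD"
    NOTE     CDATA #IMPLIED>
]>
<!-- CURRENCY may be omitted or given as "USD"; any other value is invalid.
     NOTE is optional and has no default value. -->
<PRICE NOTE="list price">10.00</PRICE>
```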
Consider the following example:
<!ATTLIST PRODUCT PRODID ID #REQUIRED>
In the given example, an attribute called PRODID is declared for the PRODUCT element. The value type of this attribute is set to ID, which indicates that the value of PRODID is unique for each appearance of the PRODUCT element in the XML document. In addition, the attribute type is specified as #REQUIRED. This indicates that the PRODID attribute is mandatory for each PRODUCT element in the XML document.
You can also specify the default value for an attribute. Consider the following example:
<!ATTLIST PRODUCT CATEGORY (TOY|BOOK) "TOY">
In the above example, the CATEGORY attribute is declared for the PRODUCT element. The value type for this attribute is an enumerated list, which specifies that the value of the CATEGORY attribute can be set to either TOY or BOOK. If the user does not explicitly provide the value for the CATEGORY attribute in the XML document, the default value for the attribute will be taken as TOY.
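The two PRODUCT attribute declarations can be combined in one document, as in the following sketch. The PRODUCTS root element and the attribute values are assumptions made for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE PRODUCTS [
  <!ELEMENT PRODUCTS (PRODUCT+)>
  <!ELEMENT PRODUCT (#PCDATA)>
  <!ATTLIST PRODUCT
    PRODID   ID         #REQUIRED
    CATEGORY (TOY|BOOK) "TOY">
]>
<PRODUCTS>
  <!-- CATEGORY is omitted here, so the default value TOY applies -->
  <PRODUCT PRODID="P001">Sample toy</PRODUCT>
  <PRODUCT PRODID="P002" CATEGORY="BOOK">Sample book</PRODUCT>
</PRODUCTS>
```

Repeating a PRODID value, or setting CATEGORY to anything other than TOY or BOOK, would make the document invalid.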
Validate the structure of data.
To validate the structure of data stored in an XML document against the DTD, you need to use parsers. Parsers are software programs that check the syntax used in an XML file. There are two types of parsers. They are:
Non-validating parsers
A non-validating parser checks whether a document follows the XML syntax rules. It builds a tree structure from the tags used in an XML document and returns an error only when there is a problem with the syntax of the document. Non-validating parsers process a document faster because they do not have to check every element against a DTD. In other words, these parsers check whether the XML document adheres to the rules of well-formed documents. The expat parser is an example of a non-validating parser.
Validating parsers
A validating parser checks the syntax, builds the tree structure, and compares the structure of the XML document with the structure specified in the DTD associated with the document. In other words, in addition to checking whether an XML document is well-formed, validating parsers also check whether the XML document adheres to the rules in the DTD used by the XML document. The Microsoft MSXML parser is an example of a validating parser.
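The difference between the two kinds of parsers is easiest to see with a document that is well-formed but not valid. The sketch below reuses the Employee declarations from earlier; the Id value is a placeholder.

```xml
<?xml version="1.0"?>
<!DOCTYPE Employee [
  <!ELEMENT Employee (Id, Name)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Name (#PCDATA)>
]>
<!-- Well-formed: every tag is closed and properly nested.
     Not valid: the DTD requires Id followed by Name, and Name is missing. -->
<Employee>
  <Id>E001</Id>
</Employee>
```

A non-validating parser would accept this document without complaint, whereas a validating parser would be expected to report that the content of Employee does not match its declaration.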
Declare elements and attributes.
Internal and external DTDs
You can declare elements and attributes in a DTD. A DTD can be a part of an XML document or can be a separate file containing declarations of elements and attributes. Thus, a DTD can be classified into two types, internal and external DTDs.
The following list discusses the differences between internal and external DTDs:
An internal DTD is part of the XML document, whereas an external DTD is maintained in a separate file.
An internal DTD can be used only by the document in which it is declared; it cannot be shared by multiple documents. An external DTD can be used by multiple documents.
To ensure that the structure of an XML document conforms to the DTD, you must associate the DTD with the XML document. The <!DOCTYPE> declaration is used to associate a DTD with an XML document. It can be used to define an internal DTD. It can also be used to reference an external DTD.
The syntax for defining an internal DTD in an XML document is as follows:
<!DOCTYPE rootelement [element and attribute declarations]>
The syntax for referencing an external DTD in an XML document is as follows:
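<!DOCTYPE rootelement SYSTEM "path-of-file">
<!DOCTYPE rootelement PUBLIC "public-identifier" "path-of-file">
In the above declarations,
rootelement represents the name of the root element.
The SYSTEM keyword specifies that the DTD is located by the path or URL given in path-of-file; it is typically used for a DTD that is private to an author or organization. The PUBLIC keyword specifies that the DTD is a publicly known one, named by public-identifier; it is still followed by path-of-file so that the parser can locate the DTD if it does not recognize the public identifier.
path-of-file represents the name of the DTD file along with the path of the file. If the DTD file is stored in the same folder as the XML file, the file name alone is sufficient. If the DTD file is in a different folder, you need to specify a relative or absolute path, or a URL, to the file.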
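As a sketch of the external form, the Employee declarations could be moved into a separate file and shared by several documents. The file names employee.dtd and employee.xml are hypothetical, and both files are assumed to sit in the same folder.

```xml
<!-- employee.dtd: contains only declarations, no DOCTYPE line -->
<!ELEMENT Employee (Id, Name)>
<!ELEMENT Id (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
```

```xml
<?xml version="1.0"?>
<!-- employee.xml: the DTD is in the same folder, so the file name alone is enough -->
<!DOCTYPE Employee SYSTEM "employee.dtd">
<Employee>
  <Id>E001</Id>
  <Name>Sample Name</Name>
</Employee>
```

Any other document with an Employee root element can reference the same employee.dtd file, which is what makes an external DTD reusable.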