Part 1, Basic structure of a document

Each HTML document consists of a few main sections.

  1. The DOCTYPE. The first line of your HTML file should be a DOCTYPE declaration. Browsers don't check your page against it, and will still display a page that leaves it out, but many use its presence and exact form to decide how strictly to render the page, so as we're writing good HTML we'll include it. In theory this declaration tells user agents (browsers and any other program which reads the HTML) which formal definition of HTML you are writing to. For now we'll stick to a single declaration:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    This indicates that our HTML complies with the HTML 4.01 Transitional DTD (Document Type Definition), which is fine for most web pages.
    NOTE: the quoted part of the DOCTYPE is case sensitive, so copy it exactly as shown.
  2. The HTML element. This is the parent element for the whole document. It doesn't contain any content directly but does contain the next two sections.
  3. The HEAD element. This goes at the start of the document and contains information about the document in general. It has one compulsory child element: the TITLE element.
  4. The BODY element. This follows the HEAD and contains all the actual content of the document.

So our basic HTML document is as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE>The title of your page...</TITLE>
 </HEAD>
 <BODY>
The contents of your page....
 </BODY>
</HTML>

In HTML terms an element is the whole assembly <FOO>content</FOO>. Note that the content may contain other elements. It's bad HTML to have overlapping, as opposed to nested, elements. For example:
Good: <X>some content <Y>some more</Y> and some more</X>
Bad: <X>some content <Y>some more</X> and some more</Y>
Browsers tend to cope with overlapping elements rather well, but there are exceptions, and since we are trying to write good HTML we'll always nest our elements properly.
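
To make the X and Y placeholders concrete, here is a small sketch using two real elements as stand-ins: B (bold text) and I (italic text). They are chosen purely for illustration and are not otherwise covered in this part. The overlapping version would be <B>bold text <I>bold italic text</B> more text</I>, and the properly nested version looks like this:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE>Nesting example</TITLE>
 </HEAD>
 <BODY>
  <!-- Properly nested: I opens and closes entirely inside B -->
  <B>bold text <I>bold italic text</I> more bold text</B>
 </BODY>
</HTML>
```

A handy rule of thumb: the element you opened last is the one you close first.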

The pieces of code in angle brackets are called tags. <X> is a start or opening tag and </X> is an end or closing tag. Many elements have optional end tags, but due to browser bugs and a desire for neatness we'll always include them. Some elements have no end tags at all: they are standalone elements with no content, such as BR (a line break). The DOCTYPE has no end tag either, but strictly speaking it is a declaration rather than an element.
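
As a rough illustration of the two cases, the sketch below uses P (a paragraph), whose end tag is optional in HTML 4.01, and BR (a line break), which has no end tag at all. Both elements are picked here only as examples:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE>End tag example</TITLE>
 </HEAD>
 <BODY>
  <!-- P has an optional end tag; we write it out anyway -->
  <P>This paragraph has its end tag included.</P>
  <!-- BR has no content and no end tag -->
  <P>This line is broken here<BR>by a standalone element.</P>
 </BODY>
</HTML>
```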

Many elements have what are known as attributes, written like <FOO BAR="some value">, which go on the opening tag and give additional information about the nature or presentation of that element. There are rules regarding which attribute values must be enclosed in quotes and which don't need to be. However, it is never wrong to include the quotes, so we'll always do that.
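
Here is a minimal sketch of attributes on a real element. It uses A (a link) with its HREF and TITLE attributes; the file name page2.html is made up for the example:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE>Attribute example</TITLE>
 </HEAD>
 <BODY>
  <!-- Two attributes on the opening tag, both values in quotes -->
  <A HREF="page2.html" TITLE="Go to the second page">Next page</A>
 </BODY>
</HTML>
```

Note that the closing tag </A> carries no attributes; they only ever appear on the opening tag.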

Tag names are case insensitive, so <BODY>, <body> and <BodY> are all the same. The case of opening and closing tags does not need to match, so <BODY>....</body> is perfectly okay. (This changes in XML-derived languages like WML and XHTML, but we're not going to worry about them for now.) Attribute names are also case insensitive, but attribute values are sometimes case sensitive.
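
The following sketch mixes cases on purpose to show what is and isn't significant. The file names Page2.html and page2.html are hypothetical; whether they refer to the same file depends on the server or file system, which is one way an attribute value can turn out to be case sensitive:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
 <HEAD>
  <title>Mixed case example</TITLE>
 </HEAD>
 <BoDy>
  <!-- Tag and attribute names: case does not matter -->
  <A href="Page2.html">first link</A>
  <!-- Attribute values: these two may point at different files -->
  <a HREF="page2.html">second link</a>
 </body>
</html>
```

This is only to demonstrate the rule. In practice it is much easier to read a page that sticks to one case throughout.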


Copy the following code into a text editor:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE>The title of your page...</TITLE>
 </HEAD>
 <BODY>
The contents of your page....
 </BODY>
</HTML>

Now change the page title and the content to whatever you like. Then save the file to your desktop with the extension .html (.htm is also widely used but only exists because some dumb old operating systems could only cope with three-letter extensions). Open the file, usually by double-clicking it, and it should launch in your web browser.

Notice anything? If your content was spread over several lines, it has all been run together. For example, if you write this:

<BODY>
Hello!

This is my first web page!
</BODY>

It will appear in your browser as 'Hello! This is my first web page!'. This is because in HTML, each run of white space characters (spaces, tabs, line breaks) is collapsed down to a single space (or to nothing in some situations).
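
The same goes for extra spaces within a line, not just line breaks. As a quick sketch, the page below has its text scattered about with spaces and blank lines, yet it should display as the same single line, 'Hello! This is my first web page!':

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
 <HEAD>
  <TITLE>White space example</TITLE>
 </HEAD>
 <BODY>
Hello!        This     is
   my first


web page!
 </BODY>
</HTML>
```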


Next: Part 2, Adding structure to the content.