Where do we go from here?

 


The latest version of HTML, version 4, doesn't change the language very radically: it's more of a consolidation version. It introduces several more tags, adds some useful attributes to existing tags, and removes a few tags that most people haven't used since version 1.0 anyhow. It does, however, promise support for various ideas that were developed over the past two years or so but didn't make it into version 3.2. These are: cascading style sheets, the ability to add SGML tags, and support for XML.

We don't learn any of this in this course. That's because this course is designed to enable educators, schools and colleges to produce good, functional, well-designed Web sites that are accessible by a wide range of people, most of them without top-of-the-line equipment that can run the latest browsers. For that purpose, you certainly don't need any of the above, and a lot of your users wouldn't be able to handle it. But in a year or so you may want to take advantage of the extra flexibility of HTML 4 (by then it'll probably be version 5!), so here's a brief explanation and some links to where you can find more information.

 


SGML

SGML stands for Standard Generalized Markup Language.

SGML is a standard for defining formatting (markup) languages. HTML is simply a set of tags defined as a markup language under the rules of SGML. HTML 4 paves the way for widening HTML to allow more of the functionality -- but also the complexity -- of SGML.

Think of it this way: by using SGML, you can create your own tags. Say I want the tag <ZORK> to represent text that is bold, italic, and in the Arial font. By using SGML code, I can define a tag that will do what I want.

If you want to learn more about SGML, see On SGML and HTML, SGML: Getting Started, An Introduction to SGML, or SoftQuad's SGML primer.


XML

XML stands for Extensible Markup Language.

XML is like a very simplified version of SGML, a version that people can understand. What it does is allow you to set up your own tags, for things like mathematical equations, the fields of a database, objects you want to talk about, and so on. The SGML standard is defined in 300 pages; the XML standard, in only 50.

If you want more than that, see the Frequently Asked Questions about the Extensible Markup Language page.

It will take at least a year or two for people to actually start using these languages. By then, there will probably be software that will do most of the XML work for you. And in any case, it's more for special needs, like creating hypertext versions of complicated documents, or hypertext databases, rather than for "normal" Web sites.


So what's new in HTML 4?

Most of the information below came from the World Wide Web Consortium's pages on HTML 4.0. You can go there and get it straight from the horse's mouth. But they're not for the faint of heart: too much, and too technical.

Basically, what interests us in HTML 4 are: new tags, new attributes of existing tags, and tags that still exist but are no longer recommended (there are better ways of doing the same thing). A few tags were also cancelled, but I'm not listing them since they were so old that nobody used them any more anyhow.


  New Tags

     The following tags are "new" in HTML 4.0:

The Tag  What It Does 
<ABBR>  This indicates an abbreviated form of a word. Example:

  <ABBR TITLE="United Nations">UN</ABBR>

The TITLE attribute produces a rollover title like the ALT attribute does on pictures. 

<ACRONYM>  This works the same way as <ABBR> except it denotes an acronym. Example:

  <ACRONYM TITLE="Self-Contained Underwater Breathing Apparatus">SCUBA</ACRONYM>

<BDO>  ("Bi-Directional Override") This is for pages that mix languages written in different directions, for instance Hebrew and English, or Arabic and French. The BDO tag tells the browser to display the enclosed text in the direction given by the dir attribute, overriding the direction it would otherwise choose. dir=ltr is for left-to-right text, e.g. English or French; dir=rtl is for (you guessed it!) right-to-left text, e.g. Hebrew or Arabic. It is most often used in the PRE tags. Example:

 <PRE>
<BDO DIR="LTR">hello</BDO>
</PRE>

<BUTTON>  With this tag we come to the really useful. This will become standard code for creating link buttons, like in a form. Example:

 <BUTTON name="submit" value="submit" type="submit">Submit</BUTTON>

This makes it easy to widen the types of buttons you can have, and put them anywhere on a page. Till now, only "submit" and "reset" buttons were supported in forms; type="link" worked but was undocumented; and if you wanted a button outside a form you either had to put it in its own little form or create a graphic for it.

 What's more, this format will easily allow for an image to be placed on the button. 
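
To show what that might look like, here is a rough sketch of a form with an image on its button. The file name send.gif, the form address and the button wording are all made up for the illustration; use your own.

```html
<!-- Sketch only: "send.gif" and "/cgi-bin/feedback" are invented placeholders -->
<FORM ACTION="/cgi-bin/feedback" METHOD="post">
  <P>
    <!-- Whatever sits between the BUTTON tags appears on the button face -->
    <BUTTON NAME="send" VALUE="send" TYPE="submit">
      <IMG SRC="send.gif" ALT="Send"> Send your comments
    </BUTTON>
    <!-- A plain text button that clears the form -->
    <BUTTON TYPE="reset">Start again</BUTTON>
  </P>
</FORM>
```

Browsers that don't know the BUTTON tag should simply show the picture and the text without the button around them, so test on older browsers before relying on it.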

<COLGROUP>  Another really useful command: gets over the problem that tables are defined row by row and therefore cannot address columns as entities. This command allows for an entire column of data in tables to be affected by one command rather than using a separate command for each cell. Example:

  <COLGROUP width="30%" align="center"></COLGROUP> 
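
Here is a slightly fuller sketch of COLGROUP sitting inside a table. The timetable contents are invented, and I'm assuming a browser that already supports the tag; older ones will just ignore the COLGROUP lines and draw the table as usual.

```html
<!-- Sketch only: the timetable data is invented for the example -->
<TABLE BORDER="1" WIDTH="100%">
  <!-- First group: two columns, each 30% wide, contents centred -->
  <COLGROUP SPAN="2" WIDTH="30%" ALIGN="center"></COLGROUP>
  <!-- Second group: one column taking the remaining 40% -->
  <COLGROUP WIDTH="40%"></COLGROUP>
  <TR><TD>Monday</TD><TD>Tuesday</TD><TD>Notes</TD></TR>
  <TR><TD>Maths</TD><TD>History</TD><TD>Bring a calculator</TD></TR>
</TABLE>
```

The COLGROUP lines come straight after the TABLE tag, before any rows, and SPAN says how many columns each group covers.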

<DEL>  Surrounding something with this command will provide a strike-through over what was deleted. Example:  Version <DEL>3</DEL><INS>4</INS>
<FIELDSET>  This allows people to group controls on a page together, like grouping buttons that affect a certain JavaScript so there won't be any interaction between other scripts on the same page or sections of a guestbook. It works together with the LEGEND command below. I'll give an example there.
<FRAME>  This works the same way as the FRAME command we have today, but has been much extended. It's intended for use with Style Sheets and SGML format styles. 
<FRAMESET>  This works the same way as the FRAMESET command we have today, but has been much extended. It's intended for use with Style Sheets and SGML format styles. For instance, you have a page with four frame cells. You want only the ones on the left to have green borders. You use this command to set aside those two vertical frames and assign traits to just that section.
<IFRAME>  This again works much the same way as the In-Line frames we currently use. Again, the reason this is listed is that it will be a specific subset of commands for use with SGML format styles. 
<INS>  This sets something aside as having been added or "inserted" at a later time. It is denoted through an underline. 
<LABEL>  This command attaches a label to form tags. Example:

 <FORM ACTION="--">
<LABEL for="email">Email Address</LABEL>
<INPUT type="text" name="email_address" id="email">
</FORM> 

<LEGEND>  Now we get to the example denoted above from the command FIELDSET. FIELDSET groups form items together. LEGEND denotes those sections. Example:

 <FIELDSET>
<LEGEND>Personal Information</LEGEND>
Name: [Input Text Box]
EMAIL: [Input Text Box]
AGE: [Input Text Box]
</FIELDSET>

 It keeps it all straight for the computer. 
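
With real input boxes in place of the bracketed placeholders, the same idea might look something like the sketch below. The field names, the IDs and the form address are my own inventions for the example.

```html
<!-- Sketch only: "/cgi-bin/register" and the field names are invented -->
<FORM ACTION="/cgi-bin/register" METHOD="post">
  <FIELDSET>
    <LEGEND>Personal Information</LEGEND>
    <!-- Each LABEL points at the ID of the input box it describes -->
    <LABEL FOR="fullname">Name:</LABEL>
    <INPUT TYPE="text" NAME="fullname" ID="fullname"><BR>
    <LABEL FOR="email">Email:</LABEL>
    <INPUT TYPE="text" NAME="email" ID="email"><BR>
    <LABEL FOR="age">Age:</LABEL>
    <INPUT TYPE="text" NAME="age" ID="age" SIZE="3">
  </FIELDSET>
  <P><INPUT TYPE="submit" VALUE="Send"></P>
</FORM>
```

Note that the LEGEND has to be the first thing inside the FIELDSET.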

<NOFRAMES>  This denotes text content that displays if the user does not have frame capabilities. We learnt it in the lesson on frames; but it wasn't officially part of the HTML standard till now.
<OBJECT>  This command will become a replacement command for IMG, ISMAP, APPLET, SCRIPT, and a myriad of other "objects" that appear on the page. This one command will represent that something is going to be placed on the page. The browser then works out what kind of object it is from the type you give it or from the file itself. Example:

  <OBJECT data="image.gif" type="image/gif"></OBJECT>
  ~or~
  <OBJECT classid="java:applet.class"></OBJECT>
  ~or~
  <OBJECT data="movie.avi" type="video/x-msvideo"></OBJECT>

<SPAN>  This is a container tag that defines a certain division of the page or span of text that can then be altered to your heart's content. Example:  <SPAN CLASS="green">This would be green text</SPAN>.

The main difference between <SPAN> and <DIV> is that <DIV> is intended for larger sections of the page, containing several paragraphs, and its attributes are about alignment and text flow. <SPAN> is intended for small bits of text and its attributes are about the properties of that text.
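
A small sketch may make the difference clearer. The class names "green" and "notice" and the sample text are invented, and I'm assuming the STYLE block sits in the HEAD of the page and that the browser handles style sheets.

```html
<!-- Sketch only: class names and text are invented for the example -->
<!-- This part belongs in the HEAD of the page -->
<STYLE TYPE="text/css">
  .green  { color: green }
  .notice { text-align: center }
</STYLE>

<!-- DIV wraps a whole section of several paragraphs -->
<DIV CLASS="notice">
  <P>The library is closed on Friday.</P>
  <!-- SPAN picks out a few words inside one paragraph -->
  <P>It reopens on <SPAN CLASS="green">Monday morning</SPAN>.</P>
</DIV>
```

Both paragraphs are centred because of the DIV; only the two words inside the SPAN turn green.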

<TBODY>  This command will surround a block of table cells so that you can affect just that section. Keep reading... 
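<TFOOT>  This will allow you to place a footer below the TBODY sections of a table. Notice that TFOOT, like TBODY, wraps whole rows (TR) rather than single cells (TD), and that HTML 4 wants it written before the TBODY even though it is displayed below it. Here's an example for both TBODY and TFOOT:

 <TABLE>
<TFOOT><TR><TD>The above cells...</TD></TR></TFOOT> 
<TBODY bgcolor="white">
<TR><TD>text</TD></TR>
<TR><TD>text</TD></TR>
</TBODY> 
</TABLE> 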

<THEAD>  This is header information for a group of cells - used the exact same way as the TFOOT above - except the text is displayed above the group of cells set apart by the TBODY command. Like this: 

<TABLE>
<THEAD><TR><TD>The following cells...</TD></TR></THEAD> 
<TFOOT><TR><TD>The above cells...</TD></TR></TFOOT> 
<TBODY bgcolor="--">
<TR><TD>text</TD></TR>
<TR><TD>text</TD></TR>
</TBODY> 
</TABLE> 

<Q>  A short-form companion to the <BLOCKQUOTE> command. <Q> marks a brief quotation inside a sentence, while <BLOCKQUOTE> remains the tag for longer quotations that are set off as a block of their own.
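
As a rough sketch of how the two quotation tags are meant to be used in HTML 4 (the quoted words are invented sample text):

```html
<!-- Sketch only: the quotations are invented sample text -->
<!-- Q: a short quotation that stays inside the sentence -->
<P>The head teacher said <Q>the new site goes live in May</Q> and left it at that.</P>

<!-- BLOCKQUOTE: a longer quotation set off as its own block -->
<BLOCKQUOTE>
  <P>The new site goes live in May. Every class will have its own page,
  and pupils will help to keep those pages up to date.</P>
</BLOCKQUOTE>
```

The browser is supposed to add the quotation marks around <Q> text itself, so don't type them in; not every browser does this yet.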

 


New Attributes

    Remember that attributes are not tags. They cannot occur on their own but only within the angle brackets of a tag. You can't write <DIR="ltr">; you can write <P dir="ltr">

The Attribute  What It Does 
CLASS  This is already in use in Explorer versions 3 and 4. First you set up a class with a Style Sheet command. Then you call for the style sheet using the class command. Example:  <SPAN CLASS="purple">Affected text</SPAN>
DIR  Defines whether the text is to be read LTR (Left to Right) or RTL (Right to Left). You can use it in <DIV>, <P>, or the new <BDO> and <SPAN> tags. To some extent it replaces align="right", but I'm not sure Netscape recognises it. People who don't need all this don't know how lucky they are!
ID  ID can be used in the same manner as CLASS above, however in HTML 4.0 it is also being used to denote sections of the page. In short - it acts like a label you can jump to with a hypertext link. Put it in <SPAN> or <DIV> and you can give a name to a section of your page, instead of just to a single point in it as a label does. Example:

  <A HREF="#sectionone">Jump to Section One</A>

The command above will jump to this:

  <SPAN ID="sectionone">section One</SPAN>

LANG  Again, this is an attribute for a tag denoting a section of text, such as <SPAN> or <DIV> or <P>. It allows a search engine to recognise that section as in a particular language rather than misspelled English. For example:

  <SPAN LANG="fr">Comment ca va?</SPAN>

 The LANG attribute does not translate. You must still write the text in the appropriate language. The LANG attribute just allows the search engines to recognize that section as French (or whatever) text.

Here are some codes: ar (Arabic), de (German), el (Greek), es (Spanish), fr (French), he (Hebrew), hi (Hindi), ja (Japanese), it (Italian), nl (Dutch), pt (Portuguese), ur (Urdu), ru (Russian), sa (Sanskrit), zh (Chinese).

 There is even a code set aside if you wish to denote a language that doesn't really exist, like Pig-Latin or Klingon. Follow the same rule as above, but add x- before the name, with no spaces. Like this: LANG="x-klingon". The "X" means it's an experimental language. 

TITLE  Another in the really useful category. This works the same way as the ALT attribute in an IMG tag. It allows you to give a title to just about anything so that when the mouse remains stationary for a second - a text box pops up. Example:

  <SPAN title="National Football League">NFL</SPAN>

Now every time someone places their mouse on that set of initials, the box will pop up saying "National Football League". It can be very helpful. 

 


Outdated Tags

These tags are still supported, but are no longer recommended since their function has been taken over by a newer tag.

The Tag  Replaced By 

<APPLET>  The <OBJECT> tag 
<BASEFONT>  Style Sheet Commands.
<CENTER>  In fact this never was supported, since the align="center" attribute had already been introduced in HTML 2; but Netscape insisted on it and it was already a de facto standard, so people kept using it. The HTML standard recommends that you use align="center" or Style Sheet Commands instead. Strangely enough, Internet Explorer version 4 has at last given in and started to recognise the <CENTER> tag after refusing to do so for 3 versions.
<FONT>  Style Sheet Commands. In my opinion style sheets are a complicated way of doing things; if you don't have an alternative, they're a necessary evil. But if you can do something with a simple <FONT> command, why write a style sheet?
On the other hand, if you want to format many pieces of text the same way (e.g. set up heading formats) it's easier to define a style than write out the format every time. 
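
To make the trade-off concrete, here is a rough sketch of the same two headings done both ways. The font, the colour and the heading text are my own invented examples, and the STYLE block is assumed to sit in the HEAD of the page.

```html
<!-- Sketch only: font, colour and heading text are invented -->

<!-- Way 1: FONT. The format is written out again at every heading -->
<H2><FONT FACE="Arial" COLOR="navy">Lesson One</FONT></H2>
<H2><FONT FACE="Arial" COLOR="navy">Lesson Two</FONT></H2>

<!-- Way 2: a style sheet. The format is defined once, in the HEAD... -->
<STYLE TYPE="text/css">
  H2 { font-family: Arial, sans-serif; color: navy }
</STYLE>

<!-- ...and every H2 on the page picks it up automatically -->
<H2>Lesson One</H2>
<H2>Lesson Two</H2>
```

For one heading, the FONT version is less typing. For twenty, the style sheet wins, and changing navy to green later means editing one line instead of twenty.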
<ISINDEX>  An old command to create an input box, from the days before there were forms. There's been no reason to use it for the past two years. Use <INPUT> commands instead.
<MENU>  This was an old command to create plain (neither bulleted nor numbered) lists. People rarely used it since the list looked no different from just writing plain text. HTML 4 recommends that you create lists through the <UL> command.

 

 



Written by J. Koren for Unesco
©1998