Some Notes on the UTF-8 Tag



First of all, as promised, the pages on this site (except the charts) should display correctly under Netscape once the following line is added to the header of the HTML:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
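For anyone who wants to check whether a saved copy of a page already carries this line, here is a minimal sketch using Python's standard `html.parser` module. The names `MetaCharsetFinder` and `declared_charset` are made up for this illustration, and the sketch only looks at the HTTP-EQUIV form of the tag shown above.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remembers the charset named in a <META HTTP-EQUIV="Content-Type"> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # CONTENT looks like "text/html; charset=UTF-8".
        for part in attributes.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                self.charset = value.strip()


def declared_charset(html_text):
    """Return the declared charset, or None if no such tag is present."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = (
    "<html><head>"
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">'
    "</head><body></body></html>"
)
print(declared_charset(page))  # UTF-8
```

A page without the tag simply yields `None`, which is the case for most of the sheets on this site.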

Now, here's why I'm not adding this tag to these sheets myself...

Internet Explorer 5.0 treats UTF-8 as a character set, but does not have any provision for assigning a specific font for UTF-8. (IE 4.x did have such a provision.)

The best way to view these pages is with Internet Explorer 5.0 with the encoding set to "User Defined". I'm sorry that this is so, as I do not wish to devise "browser-specific" web pages.

When the [ charset="UTF-8" ] tag is used, Internet Explorer 5.0 behaves unpredictably, ignoring the user's preferred font settings and any font-face tags specified by the web page designer.

What this means is that the (IE 5.0) user cannot compare the performance of two different Unicode fonts on a multilingual page without getting a misleading picture of the fonts' character repertoire.

To illustrate:

[Screenshot: comparison of two fonts showing a false picture]


The picture above was made using a sample of HTML calling for various scripts. Internet Explorer is set to " View - Encoding - User Defined ".

Under " Tools - Internet Options - Fonts - User Defined ", the Code2000 font was selected to make the left half of the picture, the Bitstream Cyberbit font was selected to make the right half...

If the UTF-8 tag had been used, in order to switch to Cyberbit, I had guessed that it would be necessary to:


" Tools - Internet Options - Fonts - Latin " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Cyrillic " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Armenian " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Hebrew " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Thai " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Japanese " select Bitstream Cyberbit...
" Tools - Internet Options - Fonts - Chinese " select Bitstream Cyberbit...

...manually change the font for each script, which seemed like an excessive amount of work.

Of course, when I tested this theory, it didn't pan out. Just changing the Latin-based setting to Bitstream Cyberbit seemed to do the trick. After that change, IE 5.0 displayed Cyrillic, Thai, etc. in the Cyberbit face even though my system has a different font specified for Cyrillic and Thai.

Using script/lang tags in the HTML doesn't seem to make any difference.

Unfortunately, without the charset=UTF-8 tag, these pages are not viewable on some of the other platforms/browsers.

Netscape, for example, seems to require the UTF-8 tag even when decimal numeric character references (NCRs) are used.

NCRs are one method of expressing Unicode. UTF-8 is an encoding scheme for Unicode. NCRs are not UTF-8, so it seems odd that a UTF-8 tag would be required for a document which does not contain any UTF-8 material.
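To make the difference concrete, here is a small sketch in Python that writes one character from the test string on this page both ways. The helper name `to_decimal_ncr` is invented for this example; it relies on Python's built-in `xmlcharrefreplace` error handler, which produces decimal references.

```python
import html


def to_decimal_ncr(text):
    """Write every non-ASCII character as a decimal NCR such as &#1046;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "\u0416"  # Ж, CYRILLIC CAPITAL LETTER ZHE (U+0416)

as_ncr = to_decimal_ncr(sample)
as_utf8 = sample.encode("utf-8")

print(as_ncr)   # &#1046;
print(as_utf8)  # b'\xd0\x96'

# The NCR form is plain 7-bit ASCII; only the UTF-8 form has bytes above 0x7F.
print(all(byte < 0x80 for byte in as_ncr.encode("ascii")))  # True
print(any(byte >= 0x80 for byte in as_utf8))                # True

# Both forms name the same character once they have been interpreted.
print(html.unescape(as_ncr) == as_utf8.decode("utf-8"))     # True
```

A page written entirely with NCRs therefore contains nothing but ASCII bytes, which is why a UTF-8 declaration should not, in principle, be needed to read it.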

Some of the pages (linked on the scriptlinks page) are encoded in UTF-8 and use the UTF-8 tag in the HTML header. These sheets should display properly...




Here's a test. The following lines are the same string of characters, each using a font-face tag for one of the larger Unicode-based TTFs. The UTF-8 tag is present in the header of this page, so for IE 5.x users this test will not display correctly.

(There is a faction which is pushing for the removal of the font-face tag from HTML. Hopefully this won't happen, as the tag is the only way that a web page author can specify a font. Not to suggest that web developers should always make font-specific pages, but there remain many instances where specific fonts are absolutely required. Perhaps this is properly a decision to be made by individual web authors rather than a committee.)

As soon as IE 5.0 finds the UTF-8 tag, it automatically switches the user's preset encoding preference to UTF-8. The user must switch it back to "Latin based", "Chinese - Traditional", "User Defined", or whatever else was in use...

In order for the following to display properly, IE 5.0 users must select [ View - Encoding - User Defined ]. And this only works briefly. As soon as the IE 5.0 screen is refreshed, or the browser reads another page with the UTF-8 tag, it will once again automatically switch the user's preferred encoding to UTF-8. (Note: if the target page is encoded in UTF-8, changing the IE 5.0 encoding to User Defined will result in gibberish.)
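The gibberish mentioned in the note is what happens whenever UTF-8 bytes are read as if each byte were one character. Here is a rough Python sketch of the effect. It assumes Latin-1 as a stand-in for the single-byte interpretation; which code page IE 5.0 really applies under "User Defined" is not something this page establishes. The function name `misread_utf8` is hypothetical.

```python
def misread_utf8(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with a single-byte encoding.

    latin-1 is only an assumed stand-in for the browser's behavior.
    """
    return text.encode("utf-8").decode(wrong_encoding)


original = "\u0531\u0532"  # ԱԲ, two Armenian letters from the test string

garbled = misread_utf8(original)

print(len(original), len(garbled))  # 2 4
print(garbled)                      # Ô±Ô²

# Nothing is lost, only misread: undoing the wrong step recovers the text.
print(garbled.encode("latin-1").decode("utf-8") == original)  # True
```

Each Armenian letter occupies two bytes in UTF-8, so two letters come back as four unrelated Latin characters.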

So, because of IE 5.0's erratic behavior with the UTF-8 tag, I generally do not use the tag in web pages unless the page is actually encoded in UTF-8.

font face = MingLiu
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = Code2000
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = Bitstream Cyberbit
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = GulimChe
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = MS Hei
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = MS Song
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = Tahoma
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = Lucida Sans Unicode
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = Times New Roman
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと


font face = My Default Font
CHINESE: 凗凘凙凚 ŊŒŮǕʃẼ ЖИРѦӤ ԱԲԳԴԵ א ב ג ד ה ฒณดตถ たちつてと



On my system, because of the UTF-8 tag, the Chinese glyphs appear identical for Code2000, Bitstream Cyberbit, MS Song, Tahoma, Lucida Sans Unicode, and Times New Roman.

The line of text which is supposed to be in Lucida Sans Unicode is using Code2000 for Armenian, MS Song for Chinese, Tahoma for Thai, Code2000 for Hiragana, and is mixing typefaces for the Latin even though Lucida Sans Unicode has all but one of the Latin characters in its repertoire.

Bitstream Cyberbit has an attractive Chinese character set, yet my system is displaying the Chinese portion of the supposed Cyberbit text using MS Song.

Because of the UTF-8 tag, IE 5.0's display for each of the specified fonts above is similarly incorrect.

I am grateful to Jaap Pranger for alerting me to several errors in the HTML of this page.
