
Help:Special characters: Difference between revisions

From Meta, a Wikimedia project coordination wiki
move lots of outdated text to talk page in preparation for rewriting
Line 55: Line 55:
For the purpose of searching, a word with a special character is best written using the first method. If the second method is used, a word like Odiliënberg can only be found by searching for Odili, euml and/or nberg; this is actually a bug that should be fixed: the entities should be folded into their raw character equivalents so that all searches on them are equivalent. See also [[Help:Searching]].


== ISO-8859-1 Characters ==


The following [[Wikipedia:en:extended ASCII|extended ASCII]] characters are safe for use in all Wiki pages. The table below shows the character itself, lists the code for each character in hexadecimal and decimal, shows the HTML entity name, and gives the common name of the character.

<table border=1 cellpadding=5 cellspacing=0>
<tr><th>Literal<th>Hex<th>Dec<th>Entity<th>Common name
<tr><td>&nbsp; <td>00A0 <td>0160 <td>&amp;nbsp; <td>[[w:no-break space|no-break space]]
<tr><td>&iexcl; <td>00A1 <td>0161 <td>&amp;iexcl; <td>[[w:inverted exclamation|inverted exclamation]]
<tr><td>&cent; <td>00A2 <td>0162 <td>&amp;cent; <td>[[w:cent sign|cent sign]]
<tr><td>&pound; <td>00A3 <td>0163 <td>&amp;pound; <td>[[pound sign]]
<tr><td>&curren;<td>00A4 <td>0164 <td>&amp;curren;<td>[[intl. currency sign]]
<tr><td>&yen; <td>00A5 <td>0165 <td>&amp;yen; <td>[[yen sign]]
<tr><td>&sect; <td>00A7 <td>0167 <td>&amp;sect; <td>[[section sign]]
<tr><td>&uml; <td>00A8 <td>0168 <td>&amp;uml; <td>[[diaeresis]] (umlaut)
<tr><td>&copy; <td>00A9 <td>0169 <td>&amp;copy; <td>[[copyright sign]]
<tr><td>&ordf; <td>00AA <td>0170 <td>&amp;ordf; <td>[[feminine ordinal]]
<tr><td>&laquo; <td>00AB <td>0171 <td>&amp;laquo; <td>[[left double-angle quote]]
<tr><td>&not; <td>00AC <td>0172 <td>&amp;not; <td>[[not sign]]
<tr><td>&reg; <td>00AE <td>0174 <td>&amp;reg; <td>[[registered trademark sign]]
<tr><td>&macr; <td>00AF <td>0175 <td>&amp;macr; <td>[[macron]]
<tr><td>&deg; <td>00B0 <td>0176 <td>&amp;deg; <td>[[degree sign]]
<tr><td>&plusmn;<td>00B1 <td>0177 <td>&amp;plusmn;<td>[[plus-minus sign]]
<tr><td>&acute; <td>00B4 <td>0180 <td>&amp;acute; <td>[[acute accent]]
<tr><td>&micro; <td>00B5 <td>0181 <td>&amp;micro; <td>[[micro sign]]
<tr><td>&para; <td>00B6 <td>0182 <td>&amp;para; <td>[[Wikipedia:en:pilcrow|pilcrow]] (paragraph) sign
<tr><td>&middot;<td>00B7 <td>0183 <td>&amp;middot;<td>[[middle dot (Georgian comma)]]
<tr><td>&cedil; <td>00B8 <td>0184 <td>&amp;cedil; <td>[[cedilla]]
<tr><td>&ordm; <td>00BA <td>0186 <td>&amp;ordm; <td>[[masculine ordinal]]
<tr><td>&raquo; <td>00BB <td>0187 <td>&amp;raquo; <td>[[right double-angle quote]]
<tr><td>&iquest;<td>00BF <td>0191 <td>&amp;iquest;<td>[[inverted question]]
<tr><td>&Agrave;<td>00C0 <td>0192 <td>&amp;Agrave;<td>[[A grave]]
<tr><td>&Aacute;<td>00C1 <td>0193 <td>&amp;Aacute;<td>[[A acute]]
<tr><td>&Acirc; <td>00C2 <td>0194 <td>&amp;Acirc; <td>[[A circumflex]]
<tr><td>&Atilde;<td>00C3 <td>0195 <td>&amp;Atilde;<td>[[A tilde]]
<tr><td>&Auml; <td>00C4 <td>0196 <td>&amp;Auml; <td>[[A diaeresis]]
<tr><td>&Aring; <td>00C5 <td>0197 <td>&amp;Aring; <td>[[A ring]]
<tr><td>&AElig; <td>00C6 <td>0198 <td>&amp;AElig; <td>[[AE ligature]]
<tr><td>&Ccedil;<td>00C7 <td>0199 <td>&amp;Ccedil;<td>[[C cedilla]]
<tr><td>&Egrave;<td>00C8 <td>0200 <td>&amp;Egrave;<td>[[E grave]]
<tr><td>&Eacute;<td>00C9 <td>0201 <td>&amp;Eacute;<td>[[E acute]]
<tr><td>&Ecirc; <td>00CA <td>0202 <td>&amp;Ecirc; <td>[[E circumflex]]
<tr><td>&Euml; <td>00CB <td>0203 <td>&amp;Euml; <td>[[E diaeresis]]
<tr><td>&Igrave;<td>00CC <td>0204 <td>&amp;Igrave;<td>[[I grave]]
<tr><td>&Iacute;<td>00CD <td>0205 <td>&amp;Iacute;<td>[[I acute]]
<tr><td>&Icirc; <td>00CE <td>0206 <td>&amp;Icirc; <td>[[I circumflex]]
<tr><td>&Iuml; <td>00CF <td>0207 <td>&amp;Iuml; <td>[[I diaeresis]]
<tr><td>&Ntilde;<td>00D1 <td>0209 <td>&amp;Ntilde;<td>[[N tilde]]
<tr><td>&Ograve;<td>00D2 <td>0210 <td>&amp;Ograve;<td>[[O grave]]
<tr><td>&Oacute;<td>00D3 <td>0211 <td>&amp;Oacute;<td>[[O acute]]
<tr><td>&Ocirc; <td>00D4 <td>0212 <td>&amp;Ocirc; <td>[[O circumflex]]
<tr><td>&Otilde;<td>00D5 <td>0213 <td>&amp;Otilde;<td>[[O tilde]]
<tr><td>&Ouml; <td>00D6 <td>0214 <td>&amp;Ouml; <td>[[O diaeresis]]
<tr><td>&Oslash;<td>00D8 <td>0216 <td>&amp;Oslash;<td>[[O stroke]]
<tr><td>&Ugrave;<td>00D9 <td>0217 <td>&amp;Ugrave;<td>[[U grave]]
<tr><td>&Uacute;<td>00DA <td>0218 <td>&amp;Uacute;<td>[[U acute]]
<tr><td>&Ucirc; <td>00DB <td>0219 <td>&amp;Ucirc; <td>[[U circumflex]]
<tr><td>&Uuml; <td>00DC <td>0220 <td>&amp;Uuml; <td>[[U diaeresis]]
<tr><td>&szlig; <td>00DF <td>0223 <td>&amp;szlig; <td>[[sharp s (ess-zed)]]
<tr><td>&agrave;<td>00E0 <td>0224 <td>&amp;agrave;<td>[[a grave]]
<tr><td>&aacute;<td>00E1 <td>0225 <td>&amp;aacute;<td>[[a acute]]
<tr><td>&acirc; <td>00E2 <td>0226 <td>&amp;acirc; <td>[[a circumflex]]
<tr><td>&atilde;<td>00E3 <td>0227 <td>&amp;atilde;<td>[[a tilde]]
<tr><td>&auml; <td>00E4 <td>0228 <td>&amp;auml; <td>[[a diaeresis]]
<tr><td>&aring; <td>00E5 <td>0229 <td>&amp;aring; <td>[[a ring]]
<tr><td>&aelig; <td>00E6 <td>0230 <td>&amp;aelig; <td>[[ae ligature]]
<tr><td>&ccedil;<td>00E7 <td>0231 <td>&amp;ccedil;<td>[[c cedilla]]
<tr><td>&egrave;<td>00E8 <td>0232 <td>&amp;egrave;<td>[[e grave]]
<tr><td>&eacute;<td>00E9 <td>0233 <td>&amp;eacute;<td>[[e acute]]
<tr><td>&ecirc; <td>00EA <td>0234 <td>&amp;ecirc; <td>[[e circumflex]]
<tr><td>&euml; <td>00EB <td>0235 <td>&amp;euml; <td>[[e diaeresis]]
<tr><td>&igrave;<td>00EC <td>0236 <td>&amp;igrave;<td>[[i grave]]
<tr><td>&iacute;<td>00ED <td>0237 <td>&amp;iacute;<td>[[i acute]]
<tr><td>&icirc; <td>00EE <td>0238 <td>&amp;icirc; <td>[[i circumflex]]
<tr><td>&iuml; <td>00EF <td>0239 <td>&amp;iuml; <td>[[i diaeresis]]
<tr><td>&ntilde;<td>00F1 <td>0241 <td>&amp;ntilde;<td>[[n tilde]]
<tr><td>&ograve;<td>00F2 <td>0242 <td>&amp;ograve;<td>[[o grave]]
<tr><td>&oacute;<td>00F3 <td>0243 <td>&amp;oacute;<td>[[o acute]]
<tr><td>&ocirc; <td>00F4 <td>0244 <td>&amp;ocirc; <td>[[o circumflex]]
<tr><td>&otilde;<td>00F5 <td>0245 <td>&amp;otilde;<td>[[o tilde]]
<tr><td>&ouml; <td>00F6 <td>0246 <td>&amp;ouml; <td>[[o diaeresis]]
<tr><td>&divide;<td>00F7 <td>0247 <td>&amp;divide;<td>[[divide sign]]
<tr><td>&oslash;<td>00F8 <td>0248 <td>&amp;oslash;<td>[[o stroke]]
<tr><td>&ugrave;<td>00F9 <td>0249 <td>&amp;ugrave;<td>[[u grave]]
<tr><td>&uacute;<td>00FA <td>0250 <td>&amp;uacute;<td>[[u acute]]
<tr><td>&ucirc; <td>00FB <td>0251 <td>&amp;ucirc; <td>[[u circumflex]]
<tr><td>&uuml; <td>00FC <td>0252 <td>&amp;uuml; <td>[[u diaeresis]]
<tr><td>&yuml; <td>00FF <td>0255 <td>&amp;yuml; <td>[[y diaeresis]]
</table>

These characters are a subset of the most common extended ASCII character set in use on the [[Wikipedia:en:Internet|Internet]], [[Wikipedia:en:ISO 8859-1|ISO 8859-1]]. MediaWiki pages are identified by the server as containing ISO-8859-1 text. The characters above are a subset selected to improve compatibility with other machines.

For example, the [[Wikipedia:en:Apple Macintosh|Apple Macintosh]] is in common use on the Internet, is not limited to any specific language, and its native character set (which is not ISO-8859-1) contains many of the common international characters. Many Macintosh browsers will correctly translate ISO text into the native character set, as long as the characters used are available. So the table above is the subset of ISO-8859-1 characters that are also available on the native Macintosh character set. (This is the situation up through [[Wikipedia:en:Mac OS 9|Mac OS 9]].x, at any rate; [[Wikipedia:en:Mac OS X|Mac OS X]] appears to use Unicode as its native encoding.)
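The subset described above can be approximated programmatically. The following is a minimal sketch, assuming Python's built-in <code>mac_roman</code> codec as a stand-in for the native Macintosh character set; that codec may reflect a later revision of the set than the one this page had in mind, so its result can differ slightly from the table. The function name is hypothetical.

```python
# List the ISO-8859-1 characters in the range 0xA0-0xFF that cannot be
# encoded in Mac OS Roman (as modelled by Python's "mac_roman" codec).
def missing_from_mac_roman():
    missing = []
    for code in range(0xA0, 0x100):
        ch = chr(code)
        try:
            ch.encode("mac_roman")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

missing = missing_from_mac_roman()
# Print the 4-digit decimal codes, matching the "Dec" column of the table.
print(" ".join(f"{ord(c):04d}" for c in missing))
```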

[[Wikipedia:en:Microsoft Windows|Microsoft Windows]] standard code page 1252 set is a superset of ISO-8859-1, so these characters will be readable as is on Windows machines. The most common Latin character sets other than ISO-8859-1 are MS-DOS (pre-Windows) code page 437, Macintosh Roman, and other ISO sets such as ISO-8859-2. The number of pre-Windows MS-DOS machines with web browsers is small and they are often dedicated-purpose machines that wouldn't be using MediaWiki anyway, so it is reasonably safe to sacrifice compatibility with them for the sake of needed foreign characters. Other ISO sets are generally intended to be read by other browsers using those same sets in the same country, and so those pages should use a language-specific set.
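The relationship between the two sets can be checked with a short sketch, assuming Python's <code>latin-1</code> and <code>cp1252</code> codecs as models of ISO-8859-1 and code page 1252. It compares the printable upper range and then decodes a few bytes that are printable only in code page 1252.

```python
# Bytes 0xA0-0xFF decode to the same characters in both encodings.
high = bytes(range(0xA0, 0x100))
same = high.decode("latin-1") == high.decode("cp1252")
print(same)

# Bytes below 0xA0 are where code page 1252 adds printable characters
# (euro sign, curly double quotes, em dash in this sample).
extra = bytes([0x80, 0x93, 0x94, 0x97]).decode("cp1252")
print([f"U+{ord(c):04X}" for c in extra])
```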

These characters can be entered either as HTML named character entity references such as '''&amp;agrave;''', directly from foreign keyboards, or with whatever facilities are available to the Wiki author for entering these characters. For example, Wiki authors using Windows machines can enter these by holding down the Alt key while typing the 4-digit decimal code of the character on the numeric pad of the keyboard. It is important that all 4 digits (including the leading 0) be typed; typing a 3-digit code will enter characters from the obsolete code page 437. Wiki authors using Macintosh machines should take care to either use special facilities to enter these in ISO-8859-1 format rather than with the native character set, or else use HTML named character entity references. Note that some Windows users may have trouble with versions of Microsoft Internet Explorer that use "Alt-Left-Arrow" and "Alt-Right-Arrow" for page movement. These will interfere with entering codes that contain the digits 4 and 6. Use HTML named character entity references in this case.
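The effect of the leading zero can be illustrated with a simplified model. This sketch assumes that a 4-digit code with a leading 0 selects code page 1252 and that any other code selects code page 437; the function <code>alt_code</code> is hypothetical, and real systems may use a different legacy code page depending on locale.

```python
def alt_code(digits):
    """Return the character a given Alt+digits sequence would enter
    under the simplified two-code-page model described above."""
    codec = "cp1252" if digits.startswith("0") else "cp437"
    return bytes([int(digits)]).decode(codec)

print(alt_code("0224"))  # a grave, from code page 1252
print(alt_code("224"))   # Greek alpha, from code page 437
```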

The characters from the table above can be used directly as 8-bit characters in all Wiki pages, and are sufficient for all pages primarily in English, Spanish, French, German, and languages that require no more special characters than those (such as Catalan). These are also generally safe to use in titles, except for a few characters like double quotes, less than and greater than, and a few others.

=== Unsafe characters ===

Note especially what is missing here from the full ISO-8859-1 set: The broken bar (<code>0166=&amp;brvbar;</code> [&brvbar;]¹), soft hyphen (<code>0173=&amp;shy;</code> [&shy;]¹), superscript digits (<code>0178=&amp;sup2;</code> [&sup2;]¹, <code>0179=&amp;sup3;</code> [&sup3;]¹), vulgar fractions (<code>0188=&amp;frac14;</code> [&frac14;]¹, <code>0189=&amp;frac12;</code> [&frac12;]¹, <code>0190=&amp;frac34;</code> [&frac34;]¹), Old English (and [[Wikipedia:en:Icelandic language|Icelandic]] and [[Wikipedia:en:Old Norse language|Old Norse language]]) eth and thorn (<code>0208=&amp;ETH;</code> [&ETH;]¹, <code>0240=&amp;eth;</code> [&eth;]¹, <code>0222=&amp;THORN;</code> [&THORN;]¹, <code>0254=&amp;thorn;</code> [&thorn;]¹), and multiply sign (<code>0215=&amp;times;</code> [&times;]¹). These should be considered unsafe (and adequate substitutes are available for most of them).

Special care should be taken with characters that do exist in the native character set of popular machines but not in the above set. These are not safe, even though they may display correctly to you when you use them. Characters from Windows code page 1252 not in ISO-8859-1 include the euro sign (<code>&amp;euro;</code> [&euro;]¹), dagger and double dagger (<code>&amp;dagger;</code> [&dagger;]¹, <code>&amp;Dagger;</code> [&Dagger;]¹), bullet (<code>&amp;bull;</code> [&bull;]¹), trade mark sign (<code>&amp;trade;</code> [&trade;]¹), typeset-style punctuation (see below), per mille sign (<code>&amp;permil;</code> [&permil;]¹), some Eastern European caron-accented letters, and the oe/OE ligatures. Characters from the Macintosh Roman set not in ISO-8859-1 include dagger and double dagger, bullet, trade mark sign, a few math symbols such as infinity (<code>&amp;infin;</code> [&infin;]¹) and not equal (<code>&amp;ne;</code> [&ne;]¹), a few commonly-used Greek letters such as pi (<code>&amp;pi;</code> [&pi;]¹), ligatures like oe/OE and fl, typeset-style punctuation, per mille sign, and lone accents such as the breve, [[Wikipedia:en:ogonek|ogonek]], and caron.

[http://www.w3.org/TR/html4/ HTML 4.0] defines named character entities for some Latin characters not in ISO-8859-1 that are used by popular languages, such as OE ligature (<code>&amp;OElig;</code> [&OElig;]¹, <code>&amp;oelig;</code> [&oelig;]¹), uppercase Y with diaeresis (<code>&amp;Yuml;</code> [&Yuml;]¹), and some Eastern European accented characters like <code>&amp;scaron;</code> [&scaron;]¹. These are also unsafe, though if they are entered as HTML named character entity references, they may display on some machines.

In short, don't assume that it is safe to use a special character just because it looks correct on your machine. Use the ones from the table above, and read and understand how to use others shown below.

:<small>¹ sample in square brackets to see if they work on your configuration</small>

== Possibly usable non-ISO characters ==

Some characters not listed as safe above may still be usable when entered as named HTML character entity references, because web browsers will recognize them and render them correctly, perhaps by switching to alternate fonts as needed. All of these should be considered less safe to use than those above, but only in the sense that they may not display properly; in the form of HTML character entity references they are unambiguous and preserve data integrity.

For many of these, adequate substitutes and workarounds are available, and should be used when the value of making the text available to users of older computers and software exceeds the value of good presentation to those with newer software (in the judgment of the author or editor).

=== Typeset-style Punctuation ===

Absent from the ISO-8859-1 character set, but commonly used and present in both Macintosh Roman and Windows code page 1252 character sets, are proper English quotation marks and dashes. These can be entered as character entity references, and should appear correctly on most machines running recent software. Even on ISO-based machines such as [[Wikipedia:en:Unix|Unix]]/[[Wikipedia:en:X Window System|X]], browsers should be able to interpret these references and make appropriate substitutes using plain ASCII straight quotes and hyphens. ([[Wikipedia:en:Mozilla|Mozilla]] does this correctly, for example.) These references were not present in older versions of HTML, so may not be recognized by older software. Since using these characters maintains data integrity even on those machines that may not display them correctly, it should be considered safe to use these unless proper display on old software is critical. German "low-9" quotation marks are a similar case, but are less commonly translated by browsing software, and so are not quite as safe. The table below shows these characters next to a capital letter "O" for better visibility:

<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&lsquo;O</td><td>&amp;lsquo;</td><td>left single quote</td>
<td>&mdash;O</td><td>&amp;mdash;</td><td>em dash</td></tr>
<tr><td>&rsquo;O</td><td>&amp;rsquo;</td><td>right single quote</td>
<td>&ndash;O</td><td>&amp;ndash;</td><td>en dash</td></tr>
<tr><td>&ldquo;O</td><td>&amp;ldquo;</td><td>left double quote</td>
<td>&sbquo;O</td><td>&amp;sbquo;</td><td>single low-9 quote</td></tr>
<tr><td>&rdquo;O</td><td>&amp;rdquo;</td><td>right double quote</td>
<td>&bdquo;O</td><td>&amp;bdquo;</td><td>double low-9 quote</td></tr>
</table>

Many web sites targeted at a Windows-using audience use code page 1252 references for these characters: for example, using <code>&amp;#151;</code> for the em dash. This is not a recommended practice. To ensure future data integrity and maximum compatibility, recode these as named references such as <code>&amp;mdash;</code>. If you really want to use a number, you can use the Unicode value <code>&amp;#8212;</code>.
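The recoding recommended above could be automated along the following lines. This is only a sketch: the function name <code>recode_cp1252_refs</code> is hypothetical, it handles decimal references only, and it assumes Python's <code>cp1252</code> codec and the HTML 4 entity table in <code>html.entities</code>.

```python
import re
from html.entities import codepoint2name

def recode_cp1252_refs(text):
    """Rewrite decimal references in the range 128-159 as named
    references, or as Unicode decimal references if no name exists."""
    def repl(match):
        n = int(match.group(1))
        if not 128 <= n <= 159:
            return match.group(0)      # outside the ambiguous range
        try:
            ch = bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)      # byte is unassigned in cp1252
        cp = ord(ch)
        name = codepoint2name.get(cp)
        return f"&{name};" if name else f"&#{cp};"
    return re.sub(r"&#(\d+);", repl, text)

print(recode_cp1252_refs("1914&#150;1918 &#151; a war"))
```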

Be aware that if you edit text in a separate [[Wikipedia:en:word processor|word processor]] or other program to cut and paste into your browser, and it "automatically" converts quotes to the left and right "smart quotes" for you, you may unknowingly mangle markup, either your own or already existing, by replacing the standard quotes in HTML tags &amp; properties with the smart quotes, which will cause the tags to fail in various ways. Furthermore, some people consider the extra encoding of smart quotes, fancy "&amp;rsquo;" apostrophes used in possessives and contractions, etc., to be a waste of bytes that could be put to better use, and will replace them with the standard single characters at will.
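If text has already been pasted with "smart quotes" in it, the damage can be undone mechanically. A minimal sketch, assuming only the four curly quote characters need to be mapped back (the helper name is hypothetical):

```python
# Map curly single and double quotes back to straight ASCII quotes.
STRAIGHT = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

def straighten_quotes(text):
    return text.translate(STRAIGHT)

# A tag whose attribute quotes were replaced by a word processor.
print(straighten_quotes("<td colspan=\u201c3\u201d>"))
```

Note that this also straightens quotes that were intentionally typographic, so it is best applied to markup rather than to running prose.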

Set your word processor options, such as Auto Edit and Auto Correction, so that undesired replacements do not occur.

== Greek letters and math symbols ==

Compare &amp;nabla; and <nowiki><math>\nabla</math></nowiki>, giving &nabla; and <math>\nabla</math>, respectively. Depending on [[Help:Preferences#Rendering_math|preferences]], the second may be the same as the first (HTML rendering), or an image. The HTML symbol depends on the font size and type, the image has a fixed size in terms of pixels. The color of symbol and background in the first case are those of text in general, according to the settings, and for the image they are black on white.

:''Note: much of the text below regarding mathematical symbols is obsolete now that MediaWiki supports embedded [[Wikipedia:en:TeX|TeX]] within pages. Non-trivial mathematical equations are probably best notated in TeX using the MediaWiki math tags. See the page [[MediaWiki User's Guide: Editing mathematical formulae]] for more on this.''

Web standards for writing about mathematics are very recent (MathML 2.0 was released only in February 2001), so many browsers made before these standards were in place try to compensate by at least allowing characters commonly used in mathematics, including most of the [[Wikipedia:en:Greek alphabet|Greek alphabet]]. These are necessarily entered as character entity references. Browsers might render these by switching to a "Symbol" font or something similar.

Upper- and lowercase Greek letters simply use their full names for character entities. These should, of course, only be used for occasional Greek letters in primarily-Latin text. (Large quantities of Greek-language text should be written using an editor with native [[Wikipedia:en:UTF-8|UTF-8]] [[Wikipedia:en:Unicode|Unicode]] support to facilitate editing and reduce page bloat). Here are a few samples:

<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&alpha;</td><td>&amp;alpha;</td><td>&Gamma;</td><td>&amp;Gamma;</td></tr>
<tr><td>&beta;</td><td>&amp;beta;</td><td>&Lambda;</td><td>&amp;Lambda;</td></tr>
<tr><td>&gamma;</td><td>&amp;gamma;</td><td>&Sigma;</td><td>&amp;Sigma;</td></tr>
<tr><td>&pi;</td><td>&amp;pi;</td><td>&Pi;</td><td>&amp;Pi;</td></tr>
<tr><td>&sigma;</td><td>&amp;sigma;</td><td>&Omega;</td><td>&amp;Omega;</td></tr>
<tr><td>&sigmaf;</td><td colspan="3">&amp;sigmaf; (final sigma, lowercase only)</td></tr>
</table>
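The mapping from entity names to characters can be looked up rather than memorized. A small sketch, assuming the HTML 4 entity table shipped in Python's <code>html.entities</code> module:

```python
from html.entities import name2codepoint

# Print the entity, its Unicode code point, and the character itself.
for name in ("alpha", "Gamma", "sigmaf"):
    cp = name2codepoint[name]
    print(f"&{name};  U+{cp:04X}  {chr(cp)}")
```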

Other common math symbols:

<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&ne;</td><td>&amp;ne;</td><td>&prime;</td><td>&amp;prime;</td></tr>
<tr><td>&le;</td><td>&amp;le;</td><td>&Prime;</td><td>&amp;Prime;</td></tr>
<tr><td>&ge;</td><td>&amp;ge;</td><td>&part;</td><td>&amp;part;</td></tr>
<tr><td>&equiv;</td><td>&amp;equiv;</td><td>&int;</td><td>&amp;int;</td></tr>
<tr><td>&asymp;</td><td>&amp;asymp;</td><td>&sum;</td><td>&amp;sum;</td></tr>
<tr><td>&infin;</td><td>&amp;infin;</td><td>&prod;</td><td>&amp;prod;</td></tr>
<tr><td>&radic;</td><td>&amp;radic;</td><td colspan="2">&nbsp;</td></tr>
</table>

It was once customary to use the Adobe Symbol character set to render Greek letters and mathematical symbols. Both Macintosh and Windows operating systems provided a Symbol font using this set; a compatible Symbol font was included in most laser printers, along with external TrueType or PostScript versions for computer use; and public domain TrueType and PostScript symbol fonts using this set were easily found. However, in web use, characters greater than hex 7F often did not transfer consistently between operating systems.

However, all of these characters were included in Unicode from the beginning and all are now firmly part of Unicode. Also many browsers no longer support separate Symbol fonts as their encoding methods break HTML rules. Accordingly use of the Symbol character set is strongly discouraged. Some products such as [[Wikipedia:en:TtH|TtH]] still use a special hacked Symbol font to render equations which can be viewed on such browsers as do not support a normal Symbol font, but you should be aware that if you create text requiring such a font, you are restricting your audience to users who also have this font. (Whether or not that's acceptable is a judgement you will have to make as an author.)

== Other common symbols ==

Some characters such as the bullet, [[Wikipedia:en:Euro|Euro]] currency sign, and trade mark sign are special cases. They are likely to be understood and rendered in some way by many browsers. Because they are important for international trade, many computers specifically add them to fonts at some non-standard location and render them when requested, or else render them in special ways that don't require them to be present in a font. See below for how your browser renders these:

<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&bull;</td><td>&amp;bull;</td><td>[[Wikipedia:en:Bullet_(typography)|bullet]]</td></tr>
<tr><td>&euro;</td><td>&amp;euro;</td><td>euro currency sign</td></tr>
<tr><td>&trade;</td><td>&amp;trade;</td><td>trade mark sign</td></tr>
</table>

Other somewhat less commonly used symbols include these:

<table border="1" cellspacing="0" cellpadding="3">
<tr><td>&dagger;</td><td>&amp;dagger;</td><td>[[Wikipedia:en:dagger_(typography)|dagger]]</td>
<td ROWSPAN="7">&#12288;</td>
<td>&spades;</td><td>&amp;spades;</td><td>black spade suit</td></tr>
<tr><td>&Dagger;</td><td>&amp;Dagger;</td><td>[[Wikipedia:en:dagger_(typography)|double dagger]]</td>
<td>&clubs;</td><td>&amp;clubs;</td><td>black club suit</td></tr>
<tr><td>&loz;</td><td>&amp;loz;</td><td>lozenge</td>
<td>&hearts; ''or'' <font face="sans-serif" color="red">&hearts;</font></td><td>&amp;hearts; (see below)</td><td>red heart suit</td></tr>
<tr><td>&larr;</td><td>&amp;larr;</td><td>leftward arrow</td>
<td>&diams; ''or'' <font face="sans-serif" color="red">&diams;</font></td><td>&amp;diams; (see below)</td><td>red diamond suit</td></tr>
<tr><td>&uarr;</td><td>&amp;uarr;</td><td>upward arrow</td>
<td>&lsaquo;</td><td>&amp;lsaquo;</td><td>single left-pointing angle quote</td></tr>
<tr><td>&rarr;</td><td>&amp;rarr;</td><td>rightward arrow</td>
<td>&rsaquo;</td><td>&amp;rsaquo;</td><td>single right-pointing angle quote</td></tr>
<tr><td>&darr;</td><td>&amp;darr;</td><td>downward arrow</td>
<td>&permil;</td><td>&amp;permil;</td><td>per mille sign</td></tr>
</table>

These should be considered unsafe to use except perhaps on pages intended for a specific audience likely to have very up-to-date software on popular machines. Even then, in some cases, [[Wikipedia:en:Internet Explorer|IE]] 6.0 does not show the diamond symbol above. The regular diamond &diams; displays in IE 5 but not 6. The alternative code for the red diamond <font face="sans-serif" color="red">&diams;</font>, which works in IE 6 but not 5, is <nowiki><font face="sans-serif" color="red">&amp;diams;</font></nowiki>.

== Unicode ==

The official [[Wikipedia:en:character set|character set]] of [http://www.w3.org/TR/html4/charset.html HTML 4.01] is the [[Wikipedia:en:ISO 10646|ISO 10646]] [[Wikipedia:en:UCS|Universal Character Set]], which is equivalent to the character set defined by [[Wikipedia:en:Unicode|Unicode]]. Many browsers, though, are only capable of displaying a small subset of the full UCS repertoire.

Numeric character entity references are the only way to enter these characters into a Wiki page at present.

There are two ways:
*decimal, e.g. <code><b><font style="font-size:120%"> &amp;#1049;</font></b></code> giving <b><font style="font-size:120%"> &#1049;</font></b> on your browser
*hexadecimal, in this case <code><b><font style="font-size:120%"> &amp;#x419;</font></b></code> giving <b><font style="font-size:120%"> &#x419;</font></b>.

These should be the same. However, decimal encoding will increase the number of browsers on which they will work. [http://unicode.coeurlumiere.com/] shows for all possible values whether they work and how they look in your browser, using decimal code.
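Both forms can be generated from the character itself. A minimal sketch (the helper <code>char_refs</code> is hypothetical) that also confirms the two references decode to the same character:

```python
import html

def char_refs(ch):
    """Return the decimal and hexadecimal references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"

decimal, hexadecimal = char_refs("\u0419")  # Cyrillic capital short I
print(decimal, hexadecimal)
print(html.unescape(decimal) == html.unescape(hexadecimal))
```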

For example, the codes <code>&amp;#1049; &amp;#1511; &amp;#1605;</code> display on your browser as '''&#1049;''', '''&#1511;''', and '''&#1605;''', which ideally look like the [[Wikipedia:en:Cyrillic alphabet|Cyrillic]] letter "Short I", the [[Wikipedia:en:Hebrew alphabet|Hebrew]] letter "Qof", and the [[Wikipedia:en:Arabic alphabet|Arabic]] letter "Meem", respectively. It is unlikely that your computer has all of those fonts and will display them all correctly unless you have a Macintosh or have installed the fonts, though it may display a subset of them. Because they are encoded according to the standard, though, they ''will'' display correctly on any system that is compliant and has the characters available.
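Whether or not a font is available to display them, the identity of these code points can be checked against the Unicode character database. A short sketch using Python's <code>unicodedata</code> module:

```python
import unicodedata

# Look up the official Unicode names of the three decimal codes above.
names = [unicodedata.name(chr(n)) for n in (1049, 1511, 1605)]
for n, name in zip((1049, 1511, 1605), names):
    print(f"&#{n};  U+{n:04X}  {name}")
```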

These characters should not be used in MediaWiki pages unless they make no difference to the understanding of the text, and are just extra information.

See [[Wikipedia:en:Unicode and HTML|Unicode and HTML]] for character entities tables.

Most Wikimedia wikis have now switched to UTF-8, allowing direct entry of Unicode text. However, care must still be taken to avoid overusing unusual Unicode characters in places where readers are unlikely to be able to see them.

== Advanced Entities ==

The following additional entities are available. On some browsers, these are converted to Unicode equivalents.

''[table missing]''

Special Note: The Del symbol ("&amp;nabla;"), among others, is not supported on Windows 95 or 98. On the English Wikipedia it has been uploaded as an image, and can there be referenced as <nowiki>[[Image:Del.gif]]</nowiki>, or here and some other projects as <nowiki>http://en.wikipedia.org/upload/d/db/Del.gif</nowiki>, and looks like this: http://en.wikipedia.org/upload/d/db/Del.gif. On projects where this does not work, upload a copy of the image to that project.

However, the del symbol is usually found in formulæ which are better facilitated using [[MediaWiki User's Guide: Editing mathematical formulae]].

==Egyptian Hieroglyphs==

E.g. <nowiki><hiero>P2</hiero></nowiki> gives <hiero>P2</hiero>. See [[Help:WikiHiero syntax]].

This is not dependent on browser capabilities, because it uses images on the servers.

==Browser differences==

Not all characters are displayed in all browsers. Also, since the font in the edit box may well be different from that of the rendered page, the browser may show the characters properly in one of the two areas and not in the other. For each, try to choose fonts which show all characters you need.

In the case of ISO-8859-1 encoding, special characters in the edit box are converted to code that consists of the common characters &, #, digits and a semi-colon, which are always displayed properly.

At any rate, the HTML source code shows the codes of both the characters that are displayed and those that are not. The HTML source code of a preview webpage also shows these for the wikitext.

Note that as a reader, it is best to use a browser with maximum capabilities, but as an author the least capable of the common browsers is a better guideline.

Alternatives include using a similar, more common symbol, or using an image, e.g. [[eo:&#348;ablono:El]]: http://eo.wikipedia.org/upload/d/db/Ikono_tero_malgranda.png.

Also you can describe the character.


==See also==

Revision as of 14:39, 17 August 2005

Systems for character encoding

From MediaWiki 1.5, all projects use Unicode (UTF-8) character encoding.

Until the end of June 2005, when this new version came into use on Wikimedia projects, the English, Dutch, Danish, and Swedish Wikipedias used windows-1252. (They declared themselves to be ISO-8859-1, but in practice browsers treat the two as synonymous, and the MediaWiki software made no attempt to prevent the use of characters from windows-1252.) Pre-upgrade wikitext in their databases remains stored in windows-1252 and is converted on load. Edits made since the upgrade are stored as UTF-8 in the database. This conversion-on-load process is invisible to users.

  • Unicode (UTF-8)
    • a variable number of bytes per character
    • special characters, including CJK characters, can be treated like normal ones; not only the webpage, but also the edit box shows the character; in addition it is possible to use the multi-character codes; they are not automatically converted in the edit box.
  • ISO 8859-1
    • one byte per character
    • special characters that are not available in the limited character set are stored in the form of a multi-character code; there are usually two or three equivalent representations, e.g. for the character € the named character reference &euro; and the decimal character reference &#8364; and the hexadecimal character reference &#x20AC;. The edit box shows the entered code, the webpage the resulting character. Unavailable characters which are copied into the edit box are first displayed as the character, and automatically converted to their decimal codes on Preview or Save.
    • the most common special characters, such as é, are in the character set, so code like &eacute;, although allowed, is not needed.
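The differences listed above can be observed directly. A minimal sketch, assuming Python's <code>utf-8</code> and <code>latin-1</code> codecs; the <code>xmlcharrefreplace</code> error handler mimics the automatic conversion of unavailable characters to decimal references, though it is not the mechanism browsers or MediaWiki actually use.

```python
# UTF-8 uses a variable number of bytes per character.
sizes = {ch: len(ch.encode("utf-8")) for ch in ("e", "\u00e9", "\u20ac", "\u4e2d")}
print(sizes)

# In ISO 8859-1, e acute fits in one byte, while the euro sign is not
# in the set and is replaced by its decimal character reference.
stored = "5 \u20ac for a caf\u00e9".encode("latin-1", errors="xmlcharrefreplace")
print(stored)
```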

Note that Special:Export exports using UTF-8 even if the database is encoded in ISO 8859-1; at least, this was already the case for the English Wikipedia when it used version 1.4.

To find out which character set applies in a project, use the browser's "View Source" feature and look for a line such as this:

<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />

or

<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
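The same check can be scripted against saved page source. A rough sketch, assuming the declaration appears in a meta tag of the form shown above; the function name is hypothetical, and a regular expression is not a full HTML parser.

```python
import re

# Capture the value following "charset=" inside a <meta ...> tag.
META_CHARSET = re.compile(r'<meta[^>]*charset=["\']?([\w-]+)', re.IGNORECASE)

def declared_charset(html_source):
    match = META_CHARSET.search(html_source)
    return match.group(1).lower() if match else None

page = '<meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" />'
print(declared_charset(page))
```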

Esperanto

MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when a page is edited, the text is converted to a form that is designed to be easier to edit with a standard keyboard.

The characters for which this applies are: Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ. You may enter these directly in the edit box if you have the facilities to do so. However, when you edit the page again you will see each of them encoded as the base letter followed by x (for example, Ŝ appears as Sx). This form is referred to as "x-sistemo" or "x-kodo". In order to preserve round-trip capability when one or more x's follow these characters or their non-accented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the actual stored article text.

in edit box    in database and output
S              S
Sx             Ŝ
Sxx            Sx
Sxxx           Ŝx
Sxxxx          Sxx
Sxxxxx         Ŝxx


For example, the interlanguage link [[en:Luxury car]] to en:Luxury car has to be entered in the edit box as [[en:Luxxury car]] on eo:. This has caused problems with interwiki update bots in the past.
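The round-trip rule can be sketched in a few lines. This is an illustration of the rule as described here, not MediaWiki's actual implementation; the function names are hypothetical, and only lowercase x is handled.

```python
import re

ACCENTED = "ĈĜĤĴŜŬĉĝĥĵŝŭ"
BASE = "CGHJSUcghjsu"
TO_BASE = dict(zip(ACCENTED, BASE))
TO_ACCENTED = dict(zip(BASE, ACCENTED))

def stored_to_edit(text):
    """Stored text -> edit box: double following x's; an accented
    letter becomes its base letter plus one extra x."""
    def repl(m):
        letter, xs = m.group(1), m.group(2)
        if letter in TO_BASE:
            return TO_BASE[letter] + "x" + xs * 2
        return letter + xs * 2
    return re.sub(f"([{ACCENTED}{BASE}])(x*)", repl, text)

def edit_to_stored(text):
    """Edit box -> stored text: halve the run of x's; an odd count
    means the base letter carries an accent."""
    def repl(m):
        letter, xs = m.group(1), m.group(2)
        half = "x" * (len(xs) // 2)
        if len(xs) % 2:
            return TO_ACCENTED[letter] + half
        return letter + half
    return re.sub(f"([{BASE}])(x*)", repl, text)

print(stored_to_edit("Luxury car"))   # Luxxury car
print(edit_to_stored("Luxxury car"))  # Luxury car
```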

Ways to enter special characters

Many characters not in the repertoire of standard ASCII will be useful—even necessary—for wiki pages, especially for foreign language textbooks. This page contains recommendations for which characters are safe to use and how to use them. There are three ways to enter a non-ASCII character into the wikitext:

  1. Enter the character directly from a foreign keyboard, or by cut and paste from a "character map" type application, or by some special means provided by the operating system or text editing application. Some browsers will change characters outside the charset of the wiki into HTML numeric character entities (see below).
  2. Use an HTML named character entity reference like &agrave;. This is unambiguous even when the server does not announce the use of any special character set, and even when the character does not display properly on some browsers. However, it may cause difficulties with searches (see below).
  3. Use an HTML numeric character entity reference like &#161;. Unfortunately, some old browsers incorrectly interpret these as references to the native character set. It is, however, the only way to enter Unicode values for which there is no named entity, such as the Turkish letters. Note that because the code points 128 to 159 are unused in both ISO-8859-1 and Unicode, character references in that range such as &#131; are illegal and ambiguous, though they are commonly used by many web sites. (They are not technically unused, but they map to rare control codes that are illegal in HTML.) Also note that almost all browsers treat ISO-8859-1 as windows-1252, which does have printable characters in that range, and these often find their way into article titles on the English Wikipedia, which causes real confusion when trying to create interwiki links to those pages.
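The ambiguity of the range 128 to 159 can be seen by looking at one such code point from both sides. A small sketch using Python's <code>unicodedata</code> module and <code>cp1252</code> codec; the variable names are illustrative.

```python
import unicodedata

cp = 131  # the code used in the reference &#131;

# In Unicode (and ISO-8859-1) this code point is a control code.
category = unicodedata.category(chr(cp))
print(category)

# In windows-1252 the same byte value is a printable character.
as_cp1252 = bytes([cp]).decode("cp1252")
print(f"U+{ord(as_cp1252):04X}", unicodedata.name(as_cp1252))
```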

Generally speaking, Western European languages such as Spanish, French, and German pose few problems. For specific details about other languages, see: Turkish. (More will be added to this list as contributors in other languages appear.)

For the purpose of searching, a word with a special character is best written using the first method. If the second method is used, a word like Odiliënberg can only be found by searching for Odili, euml and/or nberg; this is actually a bug that should be fixed: the entities should be folded into their raw character equivalents so that all searches on them are equivalent. See also Help:Searching.
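The folding proposed above could look roughly like this. It is a sketch of the idea only, not a description of how MediaWiki's search works; <code>fold_entities</code> is a hypothetical helper built on Python's <code>html.unescape</code>, which also decodes entities such as &amp;amp; that a real indexer might want to treat separately.

```python
import html

def fold_entities(wikitext):
    """Replace character entity references with the raw characters
    so that both spellings index to the same word."""
    return html.unescape(wikitext)

indexed = fold_entities("Odili&euml;nberg lies in Limburg")
print("Odili\u00ebnberg" in indexed)
```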

