Plain text

The [[Character encoding|encoding]] has traditionally been either [[ASCII]], one of its many derivatives such as [[ISO/IEC 646]], or sometimes [[EBCDIC]]. [[Unicode]]-based encodings such as [[UTF-8]] and [[UTF-16]] are gradually replacing the older ASCII derivatives, which are limited to 7- or 8-bit codes.
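The difference in range between the older encodings and the Unicode-based ones can be illustrated with a short Python sketch. It assumes Python's built-in codec names (<code>ascii</code>; <code>latin-1</code> for [[ISO/IEC 8859-1]], an 8-bit ASCII derivative; <code>utf-8</code>; <code>utf-16-le</code>), and the sample word is arbitrary. The same four characters yield different byte sequences, and 7-bit ASCII cannot represent the accented letter at all.

```python
# Sketch: one short string under a 7-bit code, an 8-bit ASCII derivative,
# and two Unicode encodings. Codec names are Python's built-in ones.
text = "café"

# ISO/IEC 8859-1 (8-bit): "é" fits in a single byte (0xE9).
latin1_bytes = text.encode("latin-1")

# UTF-8: every Unicode code point is encodable; "é" (U+00E9) takes two bytes.
utf8_bytes = text.encode("utf-8")

# UTF-16 little-endian without a byte-order mark: two bytes per code point here.
utf16_bytes = text.encode("utf-16-le")

# ASCII covers only the 7-bit codes 0-127, so "é" raises an error.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
print(utf16_bytes)   # b'c\x00a\x00f\x00\xe9\x00'
print(ascii_ok)      # False
```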
 
==Plain text and rich text==
Files that contain [[markup language|markup]] or other [[meta-data]] are generally considered plain text, as long as the entirety remains in directly [[human-readable]] form, as in [[HTML]], [[XML]], and so on. (As Coombs, Renear, and DeRose argue,<ref>{{cite journal|date=November 1987|title=Markup systems and the future of scholarly text processing|journal=[[Communications of the ACM]]|publisher=[[Association for Computing Machinery|ACM]]|volume=30|issue=11|pages=933–947|doi=10.1145/32206.32209|url=http://xml.coverpages.org/coombs.html|first1=James H.|last1=Coombs|first2=Allen H. |last2=Renear|first3=Steven J. |last3=DeRose}}</ref> punctuation is itself markup.) Using plain text rather than bit-streams to express markup enables files to survive much better "in the wild", in part by making them largely immune to computer architecture incompatibilities.
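The architecture point can be made concrete with a minimal Python sketch, assuming an arbitrary value stored as a 32-bit unsigned integer. The binary form depends on byte order, so a reader that assumes the wrong order recovers a different number, whereas the same value written as ASCII decimal digits reads back identically on any machine.

```python
import struct

value = 1987  # arbitrary example value

# Binary form: the byte order is a property of the machine or file format.
little_endian = struct.pack("<I", value)  # least significant byte first
big_endian = struct.pack(">I", value)     # most significant byte first
print(little_endian.hex(), big_endian.hex())  # c3070000 000007c3

# A reader that assumes the wrong byte order silently gets another number.
misread = struct.unpack(">I", little_endian)[0]
print(hex(misread))  # 0xc3070000

# Plain-text form: ASCII decimal digits are interpreted the same way everywhere.
as_text = str(value).encode("ascii")
recovered = int(as_text.decode("ascii"))
print(as_text, recovered)  # b'1987' 1987
```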
 
According to The Unicode Standard,
* «''Plain text'' is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes.»
* ''Styled text'', also known as ''rich text'', is any text representation consisting of plain text plus added information such as a language identifier, font size, color, hypertext links, and so on.<ref>
The Unicode Standard, version 6.1, General Structure, page 14
[http://www.unicode.org/versions/Unicode6.1.0/]</ref>
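For instance, SGML, RTF, HTML, XML, and TeX are rich-text formats that are themselves represented as plain text, interspersing the content with character sequences that encode the formatting. Wiki markup is another such example.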
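Since such formats are themselves plain text, stripping the markup recovers the underlying content. The following Python sketch uses the standard library's <code>html.parser.HTMLParser</code> to keep only the character data of a small HTML fragment; the class name <code>TextExtractor</code> and the sample fragment are illustrative, and a real converter would also need to handle whitespace, scripts, and block-level layout.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Illustrative parser: collects character data and drops the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; the tags themselves are ignored.
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)


# The rich-text source is itself an ordinary string of readable characters.
rich = "<p>Plain text is <b>public</b>, <i>standardized</i>, and universally readable.</p>"

parser = TextExtractor()
parser.feed(rich)
parser.close()  # flush any buffered character data
plain = parser.text()
print(plain)  # Plain text is public, standardized, and universally readable.
```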
 
According to The Unicode Standard, plain text has two main properties with regard to rich text:
* «Plain text is the underlying content stream to which formatting can be applied.»
* «Plain text is public, standardized, and universally readable.»<ref>
The Unicode Standard, version 6.1, General Structure, page 14
[http://www.unicode.org/versions/Unicode6.1.0/]</ref>
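The first property, plain text as the stream to which formatting is applied, can be sketched with a toy rich-text model in Python. The <code>StyledText</code> class is hypothetical and does not correspond to any particular format: it keeps the plain text unchanged, records formatting separately as (start, end, style) runs, and produces HTML only when rendered.

```python
from dataclasses import dataclass, field


@dataclass
class StyledText:
    """Hypothetical rich-text model: plain text plus separate style runs."""
    text: str
    runs: list = field(default_factory=list)  # (start, end, style) tuples

    def apply(self, start, end, style):
        # Formatting is recorded alongside the text; the text itself is untouched.
        self.runs.append((start, end, style))

    def to_html(self):
        # Minimal renderer: assumes runs do not overlap and does not escape
        # special characters such as "<" or "&".
        out, pos = [], 0
        for start, end, style in sorted(self.runs):
            out.append(self.text[pos:start])
            out.append(f"<{style}>{self.text[start:end]}</{style}>")
            pos = end
        out.append(self.text[pos:])
        return "".join(out)


doc = StyledText("Plain text is public")
doc.apply(14, 20, "b")  # bold the word "public"
print(doc.to_html())    # Plain text is <b>public</b>
print(doc.text)         # Plain text is public
```

Dropping the runs returns exactly the original string, which is the sense in which the plain text underlies the formatting.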
 
==Plain text, the Unicode definition==
 
* «Plain text represents the basic, interchangeable content of text.»
* «Plain text represents character content only, not its appearance.»
* «It can be displayed in a variety of ways and requires a rendering process to make it visible with a particular appearance.»
* «If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each instance should have the same appearance.»
* «Instead, the disparate rendering processes are simply required to make the text legible according to the intended reading.»
* «This legibility criterion constrains the range of possible appearances.»
* «The relationship between appearance and content of plain text may be summarized as follows: Plain text must contain enough information to permit the text to be rendered legibly, and nothing more.»
* «The Unicode Standard encodes plain text.»
* «The distinction between plain text and other forms of data in the same data stream is the function of a higher-level protocol and is not specified by the Unicode Standard itself.»<ref>
The Unicode Standard, version 6.1, General Structure, page 15
[http://www.unicode.org/versions/Unicode6.1.0/]</ref>
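As an illustration of the last point, a higher-level protocol such as [[MIME]] labels which part of a message is plain text and which character encoding it uses. The Python sketch below parses a hypothetical message with the standard <code>email</code> package; once the declared charset is applied, the body is simply a sequence of Unicode code points.

```python
import email

# Hypothetical MIME message: the headers (the higher-level protocol) declare
# that the body is plain text and that its bytes are UTF-8.
raw = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xc3\xa9\r\n"
)

msg = email.message_from_bytes(raw)
print(msg.get_content_type())           # text/plain
charset = msg.get_content_charset()     # the declared charset, lowercased
body = msg.get_payload(decode=True).decode(charset)

# The decoded plain text is just a sequence of Unicode code points.
code_points = [f"U+{ord(c):04X}" for c in body.rstrip("\r\n")]
print(code_points)  # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']
```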
 
 
{{refnec|More formally, the fundamental distinction of "plain text" is that no information is lost if the file is translated into a completely different [[character encoding]], or into ''no'' encoding at all by simply printing it out (provided the printer's font distinguishes all the characters). No information is conveyed by the fact that an "A" in the printout was originally stored as a byte with value 65 (as in [[ASCII]]) or with value 193 (as in [[EBCDIC]]); nor was that byte meant to be read as, for example, half of the bits of a 16-bit binary integer.{{Clarify|post-text=(complicated jargon)|date=April 2012}} }}
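The transcoding argument above can be checked with a short Python sketch. It assumes code page 037 (<code>cp037</code>), one of several EBCDIC variants, for which Python ships a codec; the sample sentence is arbitrary. The letter "A" is byte 65 in ASCII and byte 193 in this EBCDIC code page, yet converting between the two encodings at the character level loses nothing.

```python
text = "A plain line of text."

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # EBCDIC code page 037 (US/Canada)

# Same character, different stored byte values.
print(ascii_bytes[0], ebcdic_bytes[0])  # 65 193

# Transcode ASCII -> characters -> EBCDIC, then back again.
transcoded = ascii_bytes.decode("ascii").encode("cp037")
round_trip = transcoded.decode("cp037").encode("ascii")

# Both byte sequences decode to the same characters: nothing was lost.
assert transcoded == ebcdic_bytes
assert round_trip == ascii_bytes
assert ascii_bytes.decode("ascii") == ebcdic_bytes.decode("cp037") == text
```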