Arithmetic coding
{{Use dmy dates|date=December 2019}}
{{more footnotes|date=September 2016}}
'''Arithmetic coding''' is a form of [[entropy encoding]] used in [[lossless data compression]]. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the [[American Standard Code for Information Interchange|ASCII]] code. When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and rarely occurring characters are stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding, such as [[Huffman coding]], in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, an [[arbitrary-precision arithmetic|arbitrary-precision]] fraction ''q'' where 0.0 ≤ ''q'' < 1.0. It represents the current information as a range, defined by two numbers.<ref name="LiDrew2014">{{cite book|author1=Ze-Nian Li|author2=Mark S. Drew|author3=Jiangchuan Liu|title=Fundamentals of Multimedia|url=https://books.google.com/books?id=R6vBBAAAQBAJ|date=9 April 2014|publisher=Springer Science & Business Media|isbn=978-3-319-05290-8}}</ref> A recent family of entropy coders called [[asymmetric numeral systems]] allows faster implementations because it operates directly on a single natural number representing the current information.<ref name=PCS2015>[http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7170048 J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp, ''The use of asymmetric numeral systems as an accurate replacement for Huffman coding''], Picture Coding Symposium, 2015.</ref>
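The interval-narrowing idea can be illustrated with a minimal sketch (not part of the article; the function name <code>encode_interval</code> and the use of exact fractions are choices made here for clarity). It uses the fixed three-symbol model from the example figure (A: 50%, B: 33%, C: 17%) and returns the final sub-interval of [0, 1); any number ''q'' inside that interval identifies the message, given the model and the message length.

```python
# Illustrative sketch of arithmetic-coding interval narrowing
# (hypothetical helper, not taken from any particular codec).
from fractions import Fraction

# Fixed model: each symbol owns a sub-interval of [0, 1).
MODEL = {
    "A": (Fraction(0), Fraction(1, 2)),        # [0.00, 0.50)
    "B": (Fraction(1, 2), Fraction(83, 100)),  # [0.50, 0.83)
    "C": (Fraction(83, 100), Fraction(1)),     # [0.83, 1.00)
}

def encode_interval(message):
    """Return the final [low, high) interval for the message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = MODEL[symbol]
        width = high - low
        # Zoom into the sub-interval the model assigns to this symbol.
        low, high = low + width * s_low, low + width * s_high
    return low, high
```

Encoding the single symbol "B" yields the interval [0.5, 0.83) from the figure; appending further symbols narrows the interval, and the narrower it gets, the more bits are needed to name a number inside it.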
 
[[File:Arithmetic coding example.svg|400px|thumb|right|An arithmetic coding example assuming a fixed probability distribution of three symbols "A", "B", and "C". Probability of "A" is 50%, probability of "B" is 33% and probability of "C" is 17%. Furthermore, we assume that the recursion depth is known in each step. In step one we code "B" which is inside the interval [0.5,&nbsp;0.83): The binary number "0.10''x''" is the shortest code that represents an interval that is entirely inside [0.5,&nbsp;0.83). "''x''" means an arbitrary bit sequence. There are two extreme cases: the smallest ''x'' stands for zero which represents the left side of the represented interval. Then the left side of the interval is dec(0.10)&nbsp;=&nbsp;0.5. At the other extreme, ''x'' stands for a finite sequence of ones which has the upper limit dec(0.11)&nbsp;=&nbsp;0.75. Therefore, "0.10''x''" represents the interval [0.5,&nbsp;0.75) which is inside [0.5,&nbsp;0.83). Now we can leave out the "0." part since all intervals begin with&nbsp;"0." and we can ignore the "''x''" part because no matter what bit-sequence it represents, we will stay inside [0.5,&nbsp;0.75).]]
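The caption's code-selection step can also be sketched in code (a hypothetical helper, <code>shortest_code</code>, written for this illustration). For a target interval [''low'',&nbsp;''high''), it searches for the shortest bit string ''b'' such that the dyadic interval [0.''b'',&nbsp;0.''b''&nbsp;+&nbsp;2<sup>−''k''</sup>) lies entirely inside the target, where ''k'' is the number of bits: for each length ''k'' it takes the smallest ''k''-bit fraction that is ≥&nbsp;''low'' and checks whether its interval still ends at or before ''high''.

```python
# Sketch of the caption's rule: shortest binary fraction "0.b1...bk"
# whose interval [0.b, 0.b + 2**-k) fits inside the target [low, high).
# Hypothetical helper written for illustration, using exact arithmetic.
import math
from fractions import Fraction

def shortest_code(low, high):
    """Return the shortest bit string b with [0.b, 0.b + 2**-k) ⊆ [low, high)."""
    k = 0
    while True:
        k += 1
        scale = 2 ** k
        # Smallest k-bit dyadic fraction v/scale that is >= low.
        v = math.ceil(low * scale)
        if Fraction(v, scale) + Fraction(1, scale) <= high:
            # Zero-pad to k bits, e.g. v=2, k=2 -> "10".
            return format(v, "0{}b".format(k))
```

For the figure's target [0.5,&nbsp;0.83) this yields "10", matching the caption: "0.10''x''" covers exactly [0.5,&nbsp;0.75), which is inside [0.5,&nbsp;0.83).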