Binary and text files
Computer files can be divided into two broad categories: binary and text. Text files contain ordinary textual characters with essentially no formatting; binary files are all other files. Strictly speaking, text files are a special case of binary files, since every file is fundamentally a sequence of bits, and many computer components (for example, all hard disk circuitry and most system software) make no distinction between file types. However, many application programs can read and use text files in some way, whereas typically only a few programs can interpret the contents of a particular binary file. The distinction is therefore useful to computer users.
Text files
Text files (or plain text files) are files in which most bytes (or short sequences of bytes) represent ordinary readable characters such as letters, digits, and punctuation (including space), together with a few control characters such as tabs, line feeds and carriage returns. This simplicity allows a wide variety of programs to display their contents.
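The definition above suggests a rough heuristic for guessing whether some data is text. The following is only a sketch: the function name `looks_like_text` and the 95% threshold are illustrative assumptions, and the check covers ASCII only, so text in other encodings would need a different test.

```python
def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Guess whether data is ASCII text (hypothetical helper, ASCII only)."""
    if not data:
        return True  # an empty file is trivially text
    allowed_controls = {9, 10, 13}  # tab, line feed, carriage return
    readable = sum(
        1 for b in data if 32 <= b <= 126 or b in allowed_controls
    )
    # The threshold is an arbitrary assumption, not a standard value.
    return readable / len(data) >= threshold


print(looks_like_text(b"Hello,\tworld!\r\n"))  # True
print(looks_like_text(bytes(range(256))))      # False
```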
The similar term plaintext is most commonly used in a cryptographic context. The similarity sometimes causes confusion, especially among those new to computers, cryptography, or data communications.
Generally, a text file contains characters in an ASCII-based encoding, or much less commonly an EBCDIC-based encoding, without any embedded information such as font information, hyperlinks or inline images. Text files are often encoded in an extension of ASCII; these include ISO 8859, EUC, the Windows code pages (such as Windows-1252), Mac OS encodings (such as Mac OS Roman), and the Unicode encoding scheme UTF-8. UTF-16, another Unicode encoding scheme common on many platforms, is also widely used for text, although it is not byte-compatible with ASCII.
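As an illustration of how the choice of encoding changes the bytes stored in a text file, the sketch below encodes one short string in several ways. The sample string is an arbitrary choice; the codec names are those of Python's standard library.

```python
text = "café"

latin1_bytes = text.encode("latin-1")   # ISO 8859-1: one byte per character
utf8_bytes = text.encode("utf-8")       # UTF-8: 'é' takes two bytes
utf16_bytes = text.encode("utf-16-le")  # UTF-16, little-endian, no byte order mark

print(latin1_bytes)      # b'caf\xe9'
print(utf8_bytes)        # b'caf\xc3\xa9'
print(len(utf16_bytes))  # 8

# Reading bytes with the wrong encoding produces garbled characters.
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```

The ASCII letters "caf" are stored identically in Latin-1 and UTF-8, which is what makes both encodings extensions of ASCII; only the non-ASCII character differs.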
Although most text files are meant for humans to read, some are (also) used for data storage by computer programs. Text files are sometimes advantageous even for data storage because they avoid certain problems with binary files, such as endianness, padding bytes, or differences in the number of bytes in a machine word.
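The endianness problem can be seen in a small sketch using Python's `struct` module. The value 1025 is an arbitrary example; the point is that the same integer has two possible binary layouts but only one decimal text form.

```python
import struct

value = 1025  # 0x00000401

little = struct.pack("<I", value)     # 32-bit unsigned, little-endian
big = struct.pack(">I", value)        # 32-bit unsigned, big-endian
as_text = str(value).encode("ascii")  # decimal digits as ASCII text

print(little)   # b'\x01\x04\x00\x00'
print(big)      # b'\x00\x00\x04\x01'
print(as_text)  # b'1025'

# Reading big-endian bytes as little-endian silently gives a wrong number.
misread = struct.unpack("<I", big)[0]
print(misread)  # 17039360

# The text form has no byte order to get wrong.
print(int(as_text))  # 1025
```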
Plain text is often used as a readable representation of other data that is not itself purely textual: for example, a formatted webpage is not plain text, but its HTML source is. Similarly, source code for computer programs is usually stored in text files, but is compiled into a binary form for execution.
Text files usually have the MIME type "text/plain", usually with additional information indicating an encoding. Prior to the advent of Mac OS X, the Mac OS system regarded the content of a file (the data fork) as a text file when its resource fork indicated that the type of the file was "TEXT". The Windows system regards a file as a text file if the suffix of its name is "txt". Source code for computer programs is also text, however, and usually has a file name suffix indicating which programming language the source is written in.
Unix, Macintosh, Microsoft Windows, and DOS differ not only in which character encodings are common on the platform, but also in which line ending convention is most common on the platform. See new line for a discussion of this.
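As a rough sketch of what the differing line ending conventions mean in practice, the helpers below count and normalize the three common forms: LF (Unix), CR LF (DOS and Windows) and CR (classic Mac OS). The function names are illustrative, not part of any standard library.

```python
def count_line_endings(data: bytes) -> dict:
    """Count each line ending convention in raw file data (hypothetical helper)."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # line feeds not preceded by a carriage return
    cr = data.count(b"\r") - crlf  # carriage returns not followed by a line feed
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def normalize_to_lf(data: bytes) -> bytes:
    """Convert CR LF and bare CR line endings to LF."""
    # CR LF must be replaced first so it does not become two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = b"dos\r\nunix\nmac\rend"
print(count_line_endings(sample))  # {'CRLF': 1, 'LF': 1, 'CR': 1}
print(normalize_to_lf(sample))     # b'dos\nunix\nmac\nend'
```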
Binary files
Binary files, in contrast, may contain any data whatsoever (including plain text, since binary file is a more general concept), and usually mostly contain bytes that should not be directly interpreted as characters. Compiled computer programs are typical examples, as the data and CPU instructions they contain can — in principle — be any binary value. As a result, compiled applications (object files) are sometimes referred to as binaries. But binary files can also be image files, sound files, compressed versions of other files (of either type), etc. — in short, any file content whatsoever. Many binary file formats contain parts that are plain text.
To send binary files through certain systems (such as e-mail) that do not allow all data values, they are often translated into a plain text representation (using, for example, Base64). This encoding has the disadvantage of increasing the file's size by approximately 33% during the transfer, since Base64 emits four text characters for every three bytes of input, as well as requiring translation back into binary after receipt. See ASCII armor for more on this subject.
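The size overhead of Base64 can be checked with a short sketch using Python's standard `base64` module. The 768-byte payload is an arbitrary example chosen to cover every possible byte value.

```python
import base64

raw = bytes(range(256)) * 3        # 768 bytes, including non-text values
encoded = base64.b64encode(raw)    # plain ASCII representation
decoded = base64.b64decode(encoded)

print(len(raw))      # 768
print(len(encoded))  # 1024
print(decoded == raw)  # True

# Four output characters for every three input bytes.
overhead = len(encoded) / len(raw) - 1
print(round(overhead * 100))  # 33
```

Encoders that insert line breaks, as e-mail software commonly does, add slightly more on top of this figure.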
Binary is simply a numeral system. Binary files are usually thought of as a sequence of bytes, meaning that the binary digits (bits) are grouped in eights. If you open a binary file in a text editor, each group of eight bits is typically interpreted as a single character, and you see a (probably unintelligible) display of textual characters. If you open it in some other application, that application has its own use for each byte: it might treat each byte as a number and output a stream of numbers between 0 and 255, or interpret the bytes as colors and display the corresponding picture. If the file is treated as an executable and run, the computer attempts to interpret it as a series of instructions in its machine language.
See binary numeral system to understand how you can convert eight bits into a "normal" decimal number.
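The idea that the same bytes can be read in several ways can be sketched as follows. The four byte values are an arbitrary example, and Latin-1 is used for the text view only because it assigns a character to every byte value.

```python
data = bytes([72, 105, 0, 255])

# View 1: a stream of numbers between 0 and 255.
as_numbers = list(data)
print(as_numbers)  # [72, 105, 0, 255]

# View 2: one character per byte, as a text editor might show it.
as_text = data.decode("latin-1")
print(repr(as_text))  # 'Hi\x00ÿ'

# View 3: the underlying groups of eight bits.
as_bits = [format(b, "08b") for b in data]
print(as_bits)  # ['01001000', '01101001', '00000000', '11111111']

# Converting eight bits back into a decimal number.
print(int("01001000", 2))  # 72
```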