Big endian, little endian, endianness. What do these terms mean and how do they affect engineers?
What Is an Endian?
It turns out, this is not the right question to ask. "Endian" is not a standalone term when discussing data. Rather, the terms "big-endian" and "little-endian" refer to formats of byte arrangement.
The terms originate from Jonathan Swift's Gulliver's Travels, in which a civil war breaks out between those who favor breaking boiled eggs on the big end ("big-endians") and those who favor breaking them on the little end ("little-endians").
In 1980, Israeli computer scientist Danny Cohen wrote a paper ("On Holy Wars and a Plea for Peace") in which he presented a tongue-in-cheek explanation of a similarly petty "war" caused by a single question:
"What is the proper byte order in messages?"
To explain this issue, he borrowed the terms big endian and little endian from Swift to describe the two opposing sides of the debate surrounding what he called "endianness".
When Swift was writing Gulliver’s Travels sometime in the first quarter of the eighteenth century, he certainly had no idea that his work would one day serve as inspiration for twentieth-century neologisms that specify the arrangement of digital data in memory and communication systems. But such is life—often strange, and always unpredictable.
Why We Need Endianness
Despite Cohen's satirical treatment of big endians versus little endians, the question of endianness is actually very important for how we deal with data.
A unit of digital information is a sequence of ones and zeros. These ones and zeros span from the least significant bit (LSb—note the lowercase “b”) to the most significant bit (MSb).
This seems simple enough; consider the following hypothetical scenario:
A 32-bit processor is ready to store data, and consequently it transfers 32 bits of data into 32 corresponding memory units. These 32 memory units are collectively assigned an address, say 0x01. The data bus in the system is designed such that there is no chance of mixing up the LSb with the MSb, and all operations use 32-bit data, even if the numbers involved could easily be represented by 16 or even 8 bits. When the processor needs to access the stored data, it simply reads out 32 bits from memory address 0x01. This system is robust, and there is no need to introduce the concept of endianness.
You may have noticed that the word “byte” was never mentioned anywhere in the description of this hypothetical processor. Everything is based on 32-bit data—why is there any need to divide this data into 8-bit portions, if all of the hardware is designed for processing 32-bit data? Well, this is where theory and reality diverge. Real digital systems, even those that can directly process 32-bit or 64-bit data, make extensive use of the 8-bit data segment known as a byte.
Endianness in Memory
The process of storing digital data is a convenient means of demonstrating endianness in action—and of explaining the difference between big endian and little endian. Imagine that we’re using an 8-bit microcontroller. All of the hardware in this device is designed for 8-bit data, including the memory locations. Thus, memory address 0x00 can store one byte, address 0x01 stores one byte, and so forth.
This diagram depicts 11 bytes of memory—i.e., 11 memory locations, with each location storing 8 bits of data.
Let’s say that we decide to program this microcontroller using a C compiler that allows us to define 32-bit (i.e., 4-byte) variables. The compiler needs to store these variables in memory. It makes sense to store them in contiguous memory locations, but what’s not so clear is whether the most significant byte (MSB—note the uppercase “B”) or the least significant byte (LSB) should be stored in the lowest memory address.
In other words, should the system use a big-endian memory arrangement or a little-endian memory arrangement?
Big-endian data storage vs. little-endian data storage. “D” refers to the 32-bit data word, and the subscript numbers indicate the individual bits, from MSb (D31) to LSb (D0).
There really is no right or wrong answer here—either arrangement can be perfectly effective. The decision between big endian and little endian might be based, for example, on maintaining compatibility with previous versions of a given processor, which of course raises the question of how the engineers made the decision for the first processor in that product family. I don’t know; maybe the CEO flipped a coin.
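To make the two arrangements concrete, here is a small sketch in Python (chosen purely for illustration; the value 0x0A0B0C0D is a made-up example). Packing the same 32-bit value both ways shows which byte would land in the lowest memory address:

```python
import struct

value = 0x0A0B0C0D  # hypothetical 32-bit data word

# Big-endian: the most significant byte (0x0A) occupies the lowest address.
big = struct.pack(">I", value)

# Little-endian: the least significant byte (0x0D) occupies the lowest address.
little = struct.pack("<I", value)

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a
```

The data are identical; only the order in which the four bytes appear in consecutive addresses differs.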
Big Endian vs. Little Endian
Big endian indicates an organization of digital data that begins at the “big” end of the data word and continues toward the “little” end, where “big” and “little” correspond to the more-significant bits and the less-significant bits, respectively.
Little endian indicates organization that begins at the “little” end and continues toward the “big” end.
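If you are curious which arrangement your own machine uses, here is a quick Python sketch (again, illustrative only): `sys.byteorder` reports the host CPU's native order, and packing a value in native order lets you verify it by inspecting the first byte.

```python
import sys
import struct

# The interpreter reports the host CPU's native byte order.
print(sys.byteorder)  # "little" on x86 and most ARM systems

# Equivalently: pack a value in native byte order and inspect
# the byte at the lowest address.
native = struct.pack("=I", 0x0A0B0C0D)
if native[0] == 0x0D:
    print("little-endian: LSB stored at the lowest address")
else:
    print("big-endian: MSB stored at the lowest address")
```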
The decision between big-endian formatting and little-endian formatting goes beyond memory arrangements and 8-bit processors. The byte is a universal unit in digital systems. Just think about personal computers: hard-drive space is measured in bytes, RAM is measured in bytes, USB transfer speeds are reported in bytes per second (or bits per second)—despite the fact that 8-bit personal computing is completely obsolete. The question of endianness comes into play whenever a digital system combines byte-based storage or data transfer with numerical values that are longer than 8 bits.
Engineers need to be cognizant of endianness when data is being stored, transferred, or interpreted. Serial communication is especially susceptible to endian issues, because it is inevitable that the bytes contained in a multi-byte data word will be transferred sequentially, usually either MSB to LSB or LSB to MSB.
Endianness in the context of serial data transfer.
Parallel buses are not immune to endian confusion, though, because the bus width might be narrower than the data width, in which case a big-endian or little-endian order must be chosen for the one-byte-at-a-time parallel data transfers.
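The two transfer orders can be sketched with simple bit shifts. The following Python functions (the names are my own, for illustration) split a 32-bit word into the sequence of bytes that would go out on the wire MSB-first versus LSB-first:

```python
def to_bytes_msb_first(word, num_bytes=4):
    # Extract bytes starting from the most significant byte (big-endian order).
    return [(word >> (8 * i)) & 0xFF for i in range(num_bytes - 1, -1, -1)]

def to_bytes_lsb_first(word, num_bytes=4):
    # Extract bytes starting from the least significant byte (little-endian order).
    return [(word >> (8 * i)) & 0xFF for i in range(num_bytes)]

word = 0x0A0B0C0D
print([hex(b) for b in to_bytes_msb_first(word)])  # ['0xa', '0xb', '0xc', '0xd']
print([hex(b) for b in to_bytes_lsb_first(word)])  # ['0xd', '0xc', '0xb', '0xa']
```

The sender and receiver must agree on which of these two orders is in use; nothing in the bytes themselves reveals it.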
An example of endian-based interpretation occurs when bytes of data are transferred from a sensor module to a PC via the “serial port” (which nowadays almost certainly means a USB connection being used as a serial port). Let’s say that all you need to do is plot this data using some MATLAB code. When you incorporate the bytes into your MATLAB environment and convert them to normal variables, you have to interpret the individual byte values according to the order in which the bytes are stored in memory.
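The article's plotting example uses MATLAB, but the interpretation step is language-neutral. As a sketch in Python, `int.from_bytes` makes the consequence of a wrong byte-order assumption explicit: the same four received bytes produce two very different values.

```python
# Four bytes as received from a hypothetical sensor module.
raw = bytes([0x12, 0x34, 0x56, 0x78])

# Interpret the byte at the lowest position as the MSB (big-endian).
as_big = int.from_bytes(raw, byteorder="big")      # 0x12345678

# Interpret the byte at the lowest position as the LSB (little-endian).
as_little = int.from_bytes(raw, byteorder="little")  # 0x78563412

print(hex(as_big), hex(as_little))
```

If the sensor sent the reading big-endian and the receiving code assumes little-endian (or vice versa), the plotted values will be wildly wrong, even though every byte arrived intact.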
It’s really too bad that a universal endian scheme wasn’t established way back at the beginning of the digital age. I don’t even want to know how many collective hours of human life have been dedicated to sorting out problems caused by mismatched endianness.
In any event, we can’t change the past, and we also are not likely to convince every semiconductor and software company on earth to overhaul their product lines in order to achieve uniform endianness. What we can do is seek consistency in our own designs and provide clear documentation if there is the possibility of conflict between two portions of a system.