Voyager 2


Less morose and more present. Dwell on my gifts for a second

A moment one solar flare would consume, so why not

Spin this flammable paper on the film that's my life


The date is July 9th, 1979. You are the Voyager 2 spacecraft, and for the past 23 months, you have been voyaging towards the outer planets of Earth's solar system. Cape Canaveral is over 440 million miles away and growing about half a million miles more distant every "day," but mankind has never been closer to Europa. Europa looks dope btw.

As with your twin craft on an exploratory mission of the same name, you are outfitted with numerous instruments for measuring the phenomena of the furthest reaches of our planetary system, including magnetic fields, low energy charged particles, cosmic rays, plasma, and plasma waves. You're also equipped with several cameras for capturing images of celestial bodies:

– Oh, and a Golden Record containing, amongst other things, a playlist curated by Carl Sagan himself. The images you capture will remain unparalleled in quality until the launch of the Hubble telescope, which won't happen for another decade. You are the king of space exploration.

However, considering the technological constraints of your mission's birth in 1972, your brain is reaaally small. So small that you can barely remember the first sentence of this exposition. Perhaps that's an exaggeration, but you're only working with about 70 kB of memory, which isn't even enough to store a novel. It's no help that despite the success of your predecessors, like the Apollo 11 mission, Tricky Dicky has cut funding to your program.

NASA's Response to Richard Nixon Slashing Funding

All this to say that you're firing on all cylinders, transmitting observed measurements and images in real time, as you encounter them, lest you forget.

Telemetry takes place via a 3.7-meter (12-foot) high-gain antenna (HGA), which can send and receive data at a smooth 160 bits/second. Transmission is pretty much all-or-nothing, though, since your memory-span is about as competitive as a triceratops', or another Maastrichtian creature of comparable synaptic ELO. There's no retry protocol, you've already forgotten your name, and you're hurtling through space faster than any prior man-made object. No takesies backsies. No do-overs. And certainly no exponential backoff.

Among the numerous tried and true scapegoats which software engineers so often leverage when their code inevitably malfunctions are 🔥 cosmic ray bit flips 🔥. "How could I, in my infinite wisdom, have produced illogical behavior within my program. Nay, 'twas the sun who hath smited mine server farm and flipped the bits just so." (The anthropomorphisation of inanimate objects isn't going away any time soon). And, of course, there's the obligatory relevant xkcd:

But in OUTER SPACE, where part of the goal of the mission is to measure cosmic rays, said cosmic rays are slightly more prevalent. Bit flips are bound to happen at some point during the lengthy transit from the HGA back to Houston. And the engineers at NASA knew this to be the case before sending their half-a-billion dollar reconnaissance babies out into the wild. How, then, is the integrity of the data preserved?

Information Theory: Forward Error Correction

Luckily, with their backs against the wall, and Nixon's cronies looking for a reason to scrap the entire program, NASA did not also have to invent the wheel of Error Correcting Codes. Much of the groundwork of this realm of Information Theory had already been laid by Claude Shannon, Richard Hamming, and other bright minds of Bell Labs and the Signal Corps Laboratories. I have a few other ever-growing posts on Information Theory which house the supporting research for this post in particular, and which go into more depth about what error correcting codes are and how they work. But the relevant bits have been included here, since that's kind of the whole purpose.

In order to ensure that the information being beamed across the galaxy didn't get nuked in transit like leftovers in a microwave, the signals were encoded with additional data which offered some guarantees about allowable rates of error and mechanisms for correcting those errors. They employed a Golay Encoding, an extension of a Hamming Code, which is perhaps the most famous of binary error correction mechanisms. To understand how they work, let's start with some definitions and terminology.


Error Control Codes

Error Control Codes comprise error detection and error correction, where an error is any bit in a message sequence which differs from its intended value. For example, our message m might be [1, 0, 1, 0], and an error in this message might manifest as a pesky bit flip in the 3rd position: m' = [1, 0, 0, 0].


Error is intrinsically related to Entropy, which is a measure of average uncertainty about random variables.

  • For a discrete random variable X ∈ {0, 1}, the information conveyed by observing an outcome x is -log₂ P(X = x) bits.
  • If P(X = 1) = 1, an outcome of 1 is certain, so observing X = 1 yields -log₂(1) = 0 bits of information.
  • Conversely, observing X = 0 in this case yields -log₂(0) = ∞: total surprise at observing an impossible outcome.
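These quantities are easy to play with directly. A minimal sketch in Python (the function names here are mine, not from any particular library):

```python
import math

def information(p):
    # Bits of information conveyed by observing an outcome of probability p.
    return -math.log2(p)

def entropy(p):
    # Entropy of a Bernoulli(p) random variable, in bits.
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

assert information(1.0) == 0  # observing a certain outcome: 0 bits
assert information(0.5) == 1  # a fair coin flip: 1 bit
assert entropy(0.5) == 1      # maximal uncertainty for a binary variable
```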

Noiseless Coding Theorem

Shannon's Noiseless Coding Theorem states that given n independent and identically distributed random variables, each with entropy H(X):

  • the sequence can be compressed into n(H(X) + ε) bits with negligible risk of information loss
  • but if it is compressed into n(H(X) − ε) bits, loss is virtually certain

The noisy-channel counterpart concerns a message of k bits output by a source coder (like our HGA) sent over a noisy communication channel: every such channel has a precisely defined, real-numbered capacity that quantifies the maximum rate of reliable communication.

In other words, provided that the rate R of transmission is less than the channel capacity C, there exists a code such that the probability of error can be made arbitrarily small.


An error correcting code C of length n over a finite alphabet Σ is a subset of Σⁿ.

  • Elements of C are the codewords of C
  • The alphabet of C is Σ, and if |Σ| = q, we say that C is q-ary (binary, ternary, etc.)
  • The length n of codewords in C is called the block length of C
  • Associated with a code is also an encoding map E which maps the message set M, identified in some canonical way with {1, 2, ..., |C|}, to codewords in Σⁿ
    • The code is then the image of the encoding map


Given a noisy channel with capacity C, if information is transmitted at rate R (meaning k = nR message bits are transmitted in n uses of the channel), then

  • if R < C, there exist coding schemes (encoding and decoding pairs) that guarantee negligible probability of miscommunication,
  • whereas if R > C, then, regardless of the coding scheme, the probability of error at the receiver is bounded below by a constant (which increases with R, and the probability of miscommunication approaches 1 exponentially in k)

The rate of a code can also be understood in terms of the amount of redundancy it introduces.

R(C) = \frac{\log |C|}{n \log |\Sigma|}

which is the amount of non-redundant information per symbol in codewords of a code C, not to be confused with capacity.

  • The dimension of C is given by \frac{\log |C|}{\log |\Sigma|}
  • A q-ary code of dimension ℓ has q^ℓ codewords


The Hamming Distance between two strings x, y of the same length over the alphabet Σ is denoted Δ(x, y) and defined as the number of positions at which x and y differ:

\Delta(x,y) = |\{ i \mid x_i \neq y_i \}|

The fractional or relative Hamming distance between x, y ∈ Σⁿ is given by

\delta(x,y) = \frac{\Delta(x,y)}{n}

The distance of an error correcting code measures its error resilience, quantified in terms of how many errors need to be introduced to confuse one codeword with another.

  • The minimum distance of a code C, denoted Δ(C), is the minimum Hamming distance between two distinct codewords of C:

\Delta(C) = \min\limits_{\substack{c_1, c_2 \in C \\ c_1 \neq c_2}} \Delta(c_1, c_2)

  • For every pair of distinct codewords in C, the Hamming distance is at least Δ(C)
  • The relative distance of C, denoted δ(C), is the normalized quantity Δ(C)/n, where n is the block length of C. Thus, any two codewords of C differ in at least a δ(C) fraction of their positions
  • For example, the parity check code, which maps k bits to k + 1 bits by appending the parity of the message bits, is a distance-2 code with rate k/(k+1)
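To make the definitions concrete, here's a quick Python sketch (hypothetical helper names) that computes Hamming distance, Hamming weight, and the minimum distance of a small code:

```python
from itertools import combinations, product

def hamming_distance(x, y):
    # Number of positions at which x and y differ.
    return sum(1 for a, b in zip(x, y) if a != b)

def hamming_weight(x):
    # Number of non-zero symbols; equals the distance from the zero string.
    return sum(1 for a in x if a != 0)

def minimum_distance(code):
    # Minimum pairwise Hamming distance over all distinct codewords.
    return min(hamming_distance(c1, c2) for c1, c2 in combinations(code, 2))

# The parity check code on k = 3 message bits: append the mod-2 sum.
parity_code = [(a, b, c, (a + b + c) % 2) for a, b, c in product((0, 1), repeat=3)]

assert hamming_distance((1, 0, 1, 0), (1, 0, 0, 0)) == 1
assert hamming_weight((1, 0, 0, 1)) == 2
assert minimum_distance(parity_code) == 2  # a distance-2 code with rate 3/4
```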

Hamming Weight

The Hamming Weight is the number of non-zero symbols in a string x over alphabet Σ:

\begin{aligned} w(x) &= |\{ i \mid x_i \neq 0 \}| \\ w(x-y) &= \Delta(x, y) \end{aligned}

Hamming Ball

A Hamming Ball of radius r around a string x ∈ Σⁿ is the set

\{ y \in \Sigma^n \mid \Delta(x,y) \leq r \}

Properties of Codes

  1. C has a minimum distance of 2t + 1
  2. C can be used to correct all t symbol errors
  3. C can be used to detect all 2t symbol errors
  4. C can be used to correct all 2t symbol erasures – that is, when up to 2t symbols are erased and the locations of all erasures are known, the missing symbols can be recovered from the filled positions and the code's redundancy

An Aside About Galois Fields

A significant portion of the discussion surrounding error correcting codes takes place within the context of Fields, which sound more imposing than they actually are.

A finite or Galois Field is a finite set on which addition, subtraction, multiplication, and division are defined and behave as the same operations on the rational and real numbers do.

  • A field is denoted F_q, where q is the order or size of the field, given by the number of elements within
    • A finite field of order q exists if and only if q is a prime power p^k, where p is prime and k is a positive integer
    • In a field of order p^k, adding p copies of any element always results in zero, meaning that the characteristic of the field is p
    • All fields of order q = p^k are isomorphic, so we can unambiguously refer to them as F_q
  • F_2, then, is the field with the least number of elements (unique up to isomorphism), with additive and multiplicative identities denoted 0 and 1 respectively

These identities should be pretty familiar from bitwise logic, as they correspond to modulo-2 arithmetic and the boolean exclusive OR operation:


Other properties of binary fields:

  • Every element x of F_2 satisfies x + x = 0, so −x = x
  • Every element x of F_2 satisfies x² = x
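In code, arithmetic in F_2 is just bitwise logic; a tiny sketch in Python:

```python
# Arithmetic in the binary field F2: addition is XOR, multiplication is AND.
def f2_add(x, y):
    return x ^ y

def f2_mul(x, y):
    return x & y

for x in (0, 1):
    assert f2_add(x, x) == 0  # x + x = 0, hence -x = x
    assert f2_mul(x, x) == x  # x * x = x
```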

Error Correction and Detection: Naive Approaches

There are numerous trivial means of encoding information in an effort to preserve its integrity. Some are better than others.

Repetition Code

A repetition code, denoted (n, k) where n is odd, is a code obtained by repeating the k-bit input n times in the output codeword. That is, the codeword representing the input 0 is a block of n zeros, and likewise for the codeword representing the input 1.

The code C consists of the set of two codewords:

C = \{ [0, 0, \ldots, 0], [1, 1, \ldots, 1] \} \subset \mathbb F_2^n

If m is our desired message, the corresponding codeword is then

c = [\underbrace{m, m, \ldots, m}_{n \text{ copies}}]

The rate of such a code is abysmal.
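A sketch of the binary (n, 1) case in Python (function names are mine), showing why majority-vote decoding survives multiple errors:

```python
def repetition_encode(bit, n=5):
    # (n, 1) repetition code: repeat the single message bit n times (n odd).
    return [bit] * n

def repetition_decode(word):
    # Majority vote: tolerates up to (n - 1) // 2 bit flips.
    return 1 if sum(word) > len(word) // 2 else 0

codeword = repetition_encode(1, n=5)
codeword[0] ^= 1  # two bit flips in transit...
codeword[3] ^= 1
assert repetition_decode(codeword) == 1  # ...still decodes correctly
```

Five channel uses to move one message bit: a rate of 1/5, hence "abysmal."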

Parity Check

A simple Parity-bit Check code appends to the message an additional bit denoting the even or odd parity of the message's Hamming Weight, serving as a mechanism for error detection (but not correction, since it provides no means of determining the position of an error).

  • Such methods are inadequate for multi-bit errors of matching parity, e.g. 2 bit flips might still pass a parity check, despite a corrupted message


  • Suppose Alice wants to send a message m = 1001
  • She computes the even parity of her message: p(m) = 1 + 0 + 0 + 1 ≡ 0 (mod 2)
  • And appends it to her message: m' = m ‖ p(m) = 10010
  • And transmits it to Bob
  • Bob receives m' with questionable integrity and computes the parity for himself:

1 + 0 + 0 + 1 + 0 ≡ 0 (mod 2)

and observes the expected even-parity result.

If, instead, the message were corrupted in transit such that

m' = 10110

then Bob's parity check would detect the error:

1 + 0 + 1 + 1 + 0 ≡ 1 ≢ 0 (mod 2)

and Bob would curse the skies for flipping his bits. The same process works under an odd-parity scheme as well, by checking the mod-2 sum of the message's Hamming Weight against 1 instead of 0.
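Alice and Bob's exchange can be sketched in a few lines of Python (hypothetical helpers), including the failure mode where two flips cancel out:

```python
def with_parity(bits):
    # Append an even-parity bit so the total Hamming weight is even.
    return bits + [sum(bits) % 2]

def parity_ok(word):
    # Detects any odd number of bit flips, but cannot locate them.
    return sum(word) % 2 == 0

sent = with_parity([1, 0, 0, 1])  # [1, 0, 0, 1, 0]
assert parity_ok(sent)

corrupted = list(sent)
corrupted[2] ^= 1                 # one flip: detected
assert not parity_ok(corrupted)

corrupted[0] ^= 1                 # a second flip restores parity: undetected!
assert parity_ok(corrupted)
```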


Interleaving is a means of distributing burst errors over a message. A burst error is an error which affects a neighborhood of bits in a message sequence, rather than just a single position.

For example, if we have a message:

m = [m_1, m_2, m_3, \ldots, m_{16}]

which is struck by a particularly beefy cosmic ray such that a range of positions is affected:

m' = [m_1, m_2, m_3, m_4', m_5', m_6', m_7', m_8, \ldots, m_{16}]

By arranging the message in a matrix, transposing it, then unwrapping the matrix back into a vector, an effect analogous to cryptographic diffusion is achieved:

M = \begin{bmatrix} m_1 & m_2 & m_3 & m_4 \\ m_5 & m_6 & m_7 & m_8 \\ m_9 & m_{10} & m_{11} & m_{12} \\ m_{13} & m_{14} & m_{15} & m_{16} \end{bmatrix} \implies M^T = \begin{bmatrix} m_1 & m_5 & m_9 & m_{13} \\ m_2 & m_6 & m_{10} & m_{14} \\ m_3 & m_7 & m_{11} & m_{15} \\ m_4 & m_8 & m_{12} & m_{16} \end{bmatrix}

We transmit the transposed order m^T = [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16], so a burst hitting the 4th through 7th transmitted symbols (13, 2, 6, 10) lands on scattered positions 2, 6, 10, and 13 once the message is rearranged back into its original order,

thus distributing the error across the message, making it easier to recover the complete message (lots of "small" errors are preferable to one gaping hole in our data as we'll see shortly).
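The transpose trick is a few lines of Python (a sketch with made-up names; real systems interleave across whole codewords):

```python
def interleave(msg, rows):
    # Write row-by-row into a rows x cols matrix, then read column-by-column.
    cols = len(msg) // rows
    return [msg[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(msg, rows):
    # Inverse permutation: read the columns back into rows.
    cols = len(msg) // rows
    return [msg[c * rows + r] for r in range(rows) for c in range(cols)]

m = list(range(1, 17))
tx = interleave(m, rows=4)
assert tx == [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
# A burst hitting tx[3:7] (values 13, 2, 6, 10) lands on scattered
# positions of the original message after deinterleaving:
assert deinterleave(tx, rows=4) == m
```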

Error Correction and Detection: Better Approaches

Hamming Codes

Hamming Codes are the most prevalent of the more sophisticated mechanisms for error correction and detection. A q-ary code block of length n and dimension k is referred to as an [n, k]_q code.

  • If the code has distance d, it may be expressed as [n, k, d]_q, and when the alphabet size is obvious or irrelevant, the subscript may be omitted

The simplest full Hamming Code is [3, 1, 3]_2 and can be represented geometrically as a cube where the two codewords are opposite vertices:

Opposite corners are necessarily three edges away from each other, per this code's distance. We take our two codewords (000, 111), representing our 1-bit message, to be accurate, and label all the other corners with what we might get if one of the bits is corrupted.

"But Peter," you might say, "that's not a sphere, that's a cube."

Go fuck yourself:

To decode, we take the correct message to correspond to the majority of the bits in our codeword. The corners adjacent to codeword 000 are 001, 010, 100. If we "round" according to the majority of bits present, we preserve our target message of 0.

This code is "perfect" since each corner is used as either a target (000, 111) or a correction vector. If the receiver of an encoded message observes a correction vector, the decoder selects the nearest codeword target.

The next full Hamming Codes are [7, 4, 3] and [15, 11, 3], the former of which is also perfect on a 7-dimensional hypercube. Note that the block lengths of all three of these codes, 3, 7, 15, ..., are of the form 2^r − 1, one less than a power of 2.

Linear Codes and Generator Matrices

General codes may have no structure, as we've seen above, but of particular interest to us are codes with an additional structure, called Linear Codes.

If Σ is a field and C ⊂ Σⁿ is a subspace of Σⁿ, then C is said to be a Linear Code. As C is a subspace, there exists a basis c_1, c_2, ..., c_k, where k is the dimension of the subspace. Any codeword can be expressed as a linear combination of these basis vectors. We can write these vectors as the columns of an n × k matrix, which is called a Generator Matrix.

Hamming Codes can be understood by the structure of their corresponding parity-check matrices which allow us to generalize them to codes of larger lengths.

C_\text{Ham} = [7, 4, 3]_2 is a Hamming Code with Generator Matrix G and parity-check matrix H (columns of H indexed 1 through 7):

G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}

and we can see that HG = 0, which means that if a message x encoded with G as Gx yields a non-zero product against the parity-check matrix, we know that we have an error; better yet, we can not only detect the error, but also deduce its location and correct it.
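We can check HG = 0 and encode a message with a few lines of Python (helper names are mine):

```python
G = [[1,0,0,0], [0,1,0,0], [0,0,1,0], [0,0,0,1],
     [0,1,1,1], [1,0,1,1], [1,1,0,1]]          # 7x4 generator

H = [[0,0,0,1,1,1,1],
     [0,1,1,0,0,1,1],
     [1,0,1,0,1,0,1]]                          # 3x7 parity check

def mat_vec(M, v):
    # Matrix-vector product over F2.
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

def encode(x):
    return mat_vec(G, x)   # 4 message bits -> 7-bit codeword Gx

def syndrome(y):
    return mat_vec(H, y)   # Hy = 0 exactly when y is a codeword

assert encode([1, 0, 1, 1]) == [1, 0, 1, 1, 0, 1, 0]
# HG = 0: the image of every unit vector lies in the null space of H.
for i in range(4):
    e = [int(j == i) for j in range(4)]
    assert syndrome(encode(e)) == [0, 0, 0]
```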

Linear Construction

The original presentation of the Hamming code was offered by guess-who in 1950 to correct 1-bit errors using fewer redundant bits than repetition. Chunks of 4 information bits x_1, x_2, x_3, x_4 get mapped to ("encoded" as) a codeword of 7 bits:

x_1, x_2, x_3, x_4, \\ x_2 \oplus x_3 \oplus x_4, \\ x_1 \oplus x_3 \oplus x_4, \\ x_1 \oplus x_2 \oplus x_4

which can be equivalently represented as a mapping from x to Gx (operations performed mod 2), where x is the column vector [x_1 x_2 x_3 x_4]^T and G is the matrix

G=[1000010000100001011110111101]G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}

Two distinct 4-bit vectors x, y get mapped to codewords Gx, Gy which differ in at least 3 of the 7 positions. This code can correct all single bit-flip errors since, for any 7-bit vector, there is at most one codeword within one bit flip of it, and for any y ∈ {0, 1}^7 which is not a codeword, there is exactly one codeword which can be obtained by a single bit flip.

Graph Construction

A Wolf trellis for a Hamming Code is a graph associated with a block code of a given length. Paths through the graph correspond to vectors v that satisfy the parity check condition vH^T = 0. The trellis states at the k-th stage are obtained by taking all possible binary linear combinations of the first k columns of H. The Viterbi Algorithm for decoding this representation finds the best path through the graph.

This is the Wolf trellis for the above [7, 4, 3]_2 Hamming Code:


I'm just going to come out and say it. John Conway was a lizard man with an unparalleled case of chronic pattern-matching-brain. The man was of a different breed; the opposite of a maastrichtian.

Legend has it that a collegial IEEE Fellow of Conway's saw him dominating the schoolyard children in various games of Turning Turtles and Dots and Boxes and said "I bet you can't turn that into a binary lexicode of [n ≤ 44, k, d ≤ 10]_2" to which Conway famously responded "Hold my beer."

Attached for the author's amusement is an image of this absurd paper:

Madlad. Conway and Sloane showed that winning positions of the game Mogul can be used to construct a Golay Code.

A position in Mogul is a row of 24 coins. Each turn consists of flipping from one to seven coins such that the leftmost of the flipped coins goes from heads to tails. The losing positions are those with no legal move. If heads are interpreted as 1 and tails as 0, then moving to a codeword from the extended binary Golay code guarantees it will be possible to force a win.

Correcting Single Errors with Hamming codes

Suppose that y is a corrupted version of some unknown codeword c ∈ C with a single error (bit flip). We know that, by the distance property of C_\text{Ham}, c is uniquely determined by y. We could determine c by flipping each bit of y and checking whether the resulting vector is in the null space of H.

  • Recall that the nullspace of a matrix A is the set of all n-dimensional column vectors x such that Ax = 0

However, a more efficient correction can be performed. We know that y = c + e_i, where e_i is the column vector of all zeros except for a single 1 in the i-th position. Note that Hy = H(c + e_i) = Hc + He_i = He_i, the i-th column of H, which is the binary representation of i; this recovers the location i of the error. We say that Hy is the syndrome of y.
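Syndrome decoding in a self-contained Python sketch (the index arithmetic works because column i of H spells i in binary, most significant bit in the first row):

```python
H = [[0,0,0,1,1,1,1],
     [0,1,1,0,0,1,1],
     [1,0,1,0,1,0,1]]  # column i is the binary representation of i

def syndrome(y):
    return [sum(h * b for h, b in zip(row, y)) % 2 for row in H]

def correct_single_error(y):
    # Read the syndrome as a binary number: it names the flipped position.
    s = syndrome(y)
    i = 4 * s[0] + 2 * s[1] + s[2]
    y = list(y)
    if i:                 # nonzero syndrome: flip position i (1-indexed) back
        y[i - 1] ^= 1
    return y

c = [1, 0, 1, 1, 0, 1, 0]            # a valid codeword: Hc = 0
assert syndrome(c) == [0, 0, 0]

y = list(c)
y[4] ^= 1                            # corrupt position 5
assert syndrome(y) == [1, 0, 1]      # 101 in binary = 5
assert correct_single_error(y) == c
```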

Examples of Hamming Codes

  • [n, n − 1, 2]_2 corresponds to the binary parity check code consisting of all vectors in F_2^n of even Hamming Weight
  • [n, 1, n]_2 is the binary repetition code consisting of the two vectors 0ⁿ and 1ⁿ
  • [7, 4, 3]_2 is the linear Hamming Code discussed above

While fascinating, these codes can only correct 1 error, so a more robust code may be needed.

Perfect Codes

There are a few perfect binary codes:

  • The Hamming Codes C^{(r)}_\text{Ham}
  • The Golay Code
  • The trivial codes consisting of just one codeword, the whole space, or the repetition code \{0^n, 1^n\} for odd n

The Golay Code

Like Hamming Codes, the Golay Code can be constructed in a number of ways. A popular construction of \text{Gol}_{23} consists of the vectors

(c_0, c_1, \ldots, c_{22}) \in \{0,1\}^{23}

for which the polynomial c(X) = c_0 + c_1X + \cdots + c_{22}X^{22} is divisible by

g(X) = X^{11} + X^9 + X^7 + X^6 + X^5 + X + 1

in the ring \mathbb F_2[X]/(X^{23}-1). "The ring" here is just a generalization of a Galois field with some additional properties; for our purposes it is sufficient to think of it as a field with mod-2 operations.

The seminal paper on this code was published by Marcel Golay in 1949 in an effort to devise a lossless binary coding scheme under a no-repetition constraint (i.e. all or nothing, no retries). It extends Claude Shannon's 7-block example to the case of any 2^n − 1 binary string. When encoding a message, the n redundant symbols x_m are determined in terms of the message symbols Y_k from the congruence relations:

E_m \equiv x_m + \sum_{k=1}^{(p^n-1)/(p-1)-n} a_{mk} Y_k \equiv 0 \pmod p

In decoding, the E's are recalculated with the received symbols, and their ensemble forms a number in base p which determines unequivocally the transmitted symbol and its correction.

This approach can be generalized from n → n + 1 via a matrix of n rows and (p^n − 1)/(p − 1) columns, formed with the coefficients of the X's and Y's in the expression above, repeated p times horizontally, while an added (n + 1)-st row consists of (p^n − 1)/(p − 1) zeroes followed by as many ones, up to p − 1; an added column of n zeroes with a one for the lowest term completes the matrix for n + 1.

The coding scheme below, for 23 binary symbols and a maximum of 3 transmission errors, yields a power saving of 1.5 dB for vanishing probabilities of error, approaching 3 dB for increasing n's of blocks of 2^n − 1 binary symbols and decreasing probabilities of error, though a loss is always encountered for n = 3.

\begin{array}{r|cccccccccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\
\hline
1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
2 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\
3 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
4 & 1 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
5 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 \\
6 & 1 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
7 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
8 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 \\
9 & 1 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 \\
10 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
11 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\end{array}

Using Golay Codes

The Golay Code consists of 12 information bits and 11 parity bits, with a minimum Hamming distance of 7: [23, 12, 7]_2.

Given a data stream, we partition our information into 12-bit chunks and encode each as a codeword using a generator polynomial (either of the following works):

\begin{aligned} X^{11} + X^9 + X^7 + X^6 + X^5 + X + 1 &= 101011100011 \\ X^{11} + X^{10} + X^6 + X^5 + X^4 + X^2 + 1 &= 110001110101 \end{aligned}

We'll define our data as a trivial message of convenient length 12:

m = 101010101010

We append 11 zero bits to the right of our data block:

m' = m ‖ 0^{11} = 101010101010 00000000000

and perform division on the polynomial via the bitwise XOR operation:

Aligning the divisor under the leading 1 of the running remainder at each step, and bringing down the next dividend bits:

101010101010 ⊕ 101011100011 = 000001001001
100100100000 ⊕ 101011100011 = 001111000011
111100001100 ⊕ 101011100011 = 010111101111
101111011110 ⊕ 101011100011 = 000100111101
100111101000 ⊕ 101011100011 = 001100001011

leaving the 11-bit remainder 01100001011.


Thus, m' / g (mod 2) yields the remainder 01100001011: the syndrome of the message. The 11-bit zero vector is replaced with the syndrome, and if the result is divided by the same polynomial, the resultant syndrome (remainder) will be 0:

101010101010 01100001011 / 101011100011 ≡ 0 (mod 2)

A 1-bit error in transmission among the parity bits, such as the second parity bit flipping,

101010101010 00100001011 / 101011100011 = 01000000000

would have a syndrome of weight 1, with the single 1 marking the corrupted position.

However, syndrome weight should not be construed as proportionate to the magnitude of the error. A 1-bit error in the information bits (here, the 4th) yields a syndrome of weight 7:

101110101010 01100001011 / 101011100011 = 11000111011
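The whole encode/check cycle is compact when codewords are held as Python integers; this sketch (my function names) reproduces the worked example above:

```python
G_POLY = 0b101011100011  # X^11 + X^9 + X^7 + X^6 + X^5 + X + 1

def golay_remainder(word):
    # Divide a 23-bit word by the generator polynomial over F2
    # (long division by repeated XOR); the 11-bit remainder is the syndrome.
    for shift in range(11, -1, -1):
        if (word >> (shift + 11)) & 1:
            word ^= G_POLY << shift
    return word

m = 0b101010101010                     # 12 message bits
padded = m << 11                       # append 11 zero bits
syn = golay_remainder(padded)
assert syn == 0b01100001011            # the remainder from the division above

codeword = padded | syn                # replace the zeros with the syndrome
assert golay_remainder(codeword) == 0  # valid codewords leave remainder 0

# A single flipped parity bit yields a weight-1 syndrome...
assert bin(golay_remainder(codeword ^ (1 << 9))).count("1") == 1
# ...while a single flipped information bit yields a weight-7 syndrome.
assert bin(golay_remainder(codeword ^ (1 << 19))).count("1") == 7
```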

An algorithm for decoding a received Golay codeword is as follows:

  1. Compute the syndrome ŝ of the codeword. If it is 0, no errors exist; proceed to (5).
  2. If w(ŝ) ≤ 3, the syndrome matches the error pattern bit-for-bit and can be used to XOR the errors out of the codeword. Remove the errors, restore any previously toggled trial bit, and proceed to (5).
  3. Toggle a trial bit in the codeword (restoring any previously toggled trial bit) in an attempt to cancel one bit error. If untried bits remain, return to (1); otherwise proceed to (4).
  4. Rotate the codeword cyclically left by 1 bit and go to (1).
  5. Rotate the codeword back to its initial position.

How to Actually Beam Them Up, Scotty?

"Okay but how do these encodings actually get transmitted? I mean we're talking about space here, not an IRC channel."

Binary Phase-Shift Keying is a technique for mapping a binary sequence to a waveform for transmission over a channel. Given a binary sequence \{\ldots, b_{-2}, b_{-1}, b_0, b_1, b_2, \ldots\}, we map the set \{0, 1\} \rightarrow \{-1, 1\} via

b~i=(2bi1) or b~i=(2bi1)\tilde{b}_i = (2b_i - 1) \text{ or }\tilde{b}_i = -(2b_i - 1)

We also have ai=Eb(2bi1)=Eb(1)bi=Ebb^ia_i = \sqrt{E_b}(2b_i -1) = -\sqrt{E_b}(-1)^{b_i} = \sqrt{E_b}\hat{b}_i, a mapping of a bit bib_i or (b^i\hat{b}_i) to a signal amplitude that multiplies a waveform φ1(t)\varphi_1(t) which is a unit-energy signal:

$$\int^\infty_{-\infty} \varphi_1(t)^2 \, dt = 1$$

over $[0, T)$. $E_b$ is the amount of energy required to transmit a single bit across the channel. A bit $b_i$ arriving at time $iT$ can be represented as the signal $a_i\varphi_1(t - iT)$, where the energy required to transmit it is given by

$$\int^\infty_{-\infty} (a_i \varphi_1(t))^2 \, dt = E_b$$

The transmitted signals $\pm \sqrt{E_b}\varphi_1(t)$ are the points $\pm \sqrt{E_b}$ in a (for our purposes: one-dimensional) signal space on the unit-energy $\varphi_1$ axis,

such that a sequence of bits $[1,1,0,1,0]$ can be expressed as a waveform6:

We can expand this representation to two dimensions by introducing another signal $\varphi_2(t)$ and modulating the two to establish a signal constellation. Let $M = 2^m,\; m \in \mathbb{Z}$, be the number of points in a constellation. $M$-ary transmission is obtained by placing $M$ points $\mathcal{S} = \{(a_{1k}, a_{2k}),\; k = 0, 1, \dots, M-1\}$ in this signal space and assigning each point a unique pattern of $m$ bits.
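As a sketch (names my own, and assuming a rectangular unit-energy pulse $\varphi_1(t) = 1/\sqrt{T}$ on $[0, T)$), the BPSK amplitude mapping and the resulting sampled waveform look like:

```python
from math import sqrt


def bpsk_amplitudes(bits, Eb=1.0):
    """Map bits {0,1} -> amplitudes a_i = sqrt(Eb) * (2*b_i - 1)."""
    return [sqrt(Eb) * (2 * b - 1) for b in bits]


def waveform(bits, Eb=1.0, T=1.0, samples_per_bit=4):
    """Sample sum_i a_i * phi1(t - iT), with rectangular phi1 = 1/sqrt(T)."""
    phi1 = 1 / sqrt(T)   # unit-energy over [0, T)
    out = []
    for a in bpsk_amplitudes(bits, Eb):
        out.extend([a * phi1] * samples_per_bit)
    return out


# The bit sequence from the text maps to alternating +/- sqrt(Eb) levels.
assert bpsk_amplitudes([1, 1, 0, 1, 0]) == [1.0, 1.0, -1.0, 1.0, -1.0]
```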

Once our message has been encoded for error correction, and further transformed via the signal constellation $\mathcal{S}$ into a waveform, it must be read by a receiver somewhere. We call the transmitted value $\mathbf{S} \in \mathcal{S}$, chosen with prior probability $P(\mathbf{S} = s)$. The receiver uses the received point $\mathbf{R} = r$ to determine what value to ascribe to the signal. This estimated decision is denoted

$$\hat{s} = \begin{bmatrix} \hat{a}_1 \\ \hat{a}_2 \end{bmatrix} \in \mathcal{S}$$

and is used to determine the likelihood of observing a signal given a received data point, $P(s \mid r)$. The decision rule which minimizes the probability of error is to choose $\hat{s}$ to be the value of $s$ which maximizes $P(s \mid r)$, where the possible values of $s$ are those in the signal constellation:

$$\begin{aligned} \hat{s} &= \arg \max_{s \in \mathcal{S}} \; P(s \mid r) \\ &= \arg \max_{s \in \mathcal{S}} \; \frac{p_{R \mid S}(r \mid s)P(s)}{p_R(r)} \qquad \text{(Bayes' rule)} \end{aligned}$$

Since the denominator of this expression does not depend on $s$, we can further simplify:

$$= \arg \max_{s \in \mathcal{S}} \; p_{R \mid S}(r \mid s)P(s)$$

leaving us with the Maximum A Posteriori decision rule. In the case that all priors are equal (as we assume with our binary code), this can be further simplified to a Maximum Likelihood decision rule:

$$\hat{s} = \arg \max_{s \in \mathcal{S}} \; p_{R \mid S}(r \mid s)$$


Once the decision $\hat{s}$ is made, the corresponding bits are determined by the constellation; the output of the receiver is a maximum likelihood estimate of the actual bits transmitted. $\hat{s}$ is selected to be the closest point according to Euclidean distance $\| r - \hat{s} \|$.

The "raw" signal distributions are dueling Gaussians about the corresponding mean measurements of $b \in \{0, 1\}$ under our projection to $\hat{b} \in \{-1, 1\}$:

$$p(r \mid s = \sqrt{E_b}) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2 \sigma^2}(r - \sqrt{E_b})^2} \\ p(r \mid s = -\sqrt{E_b}) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2 \sigma^2}(r + \sqrt{E_b})^2}$$

and the weighted conditional densities of the Maximum A Posteriori decision rule distribution resemble:

where $\tau$ is the threshold at which

$$p(r \mid s = \sqrt{E_b})P(s = \sqrt{E_b}) = p(r \mid s = -\sqrt{E_b})P(s = -\sqrt{E_b})$$

so that our decision rule becomes

$$\hat{s} = \begin{cases} -\sqrt{E_b} \implies b_i = 0 & \text{if } r < \tau \\ \sqrt{E_b} \implies b_i = 1 & \text{if } r > \tau \end{cases}$$

The threshold $\tau$ can be computed explicitly by solving for the intersection point:

$$\tau = \frac{\sigma^2}{2\sqrt{E_b}}\ln \frac{P(s = -\sqrt{E_b})}{P(s = \sqrt{E_b})}$$
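The MAP threshold and decision rule translate directly into code; a sketch under hypothetical names, with `p0` denoting the prior $P(s = -\sqrt{E_b})$:

```python
from math import log, sqrt


def map_threshold(Eb: float, sigma: float, p0: float) -> float:
    """tau = sigma^2 / (2 sqrt(Eb)) * ln(P(s=-sqrt(Eb)) / P(s=+sqrt(Eb)))."""
    return sigma ** 2 / (2 * sqrt(Eb)) * log(p0 / (1 - p0))


def decide(r: float, Eb: float, sigma: float, p0: float = 0.5) -> int:
    """MAP estimate of the transmitted bit given the received point r."""
    return 1 if r > map_threshold(Eb, sigma, p0) else 0


# With equal priors the threshold sits at 0 (the ML rule); a prior
# favoring bit 0 pushes the threshold toward the bit-1 side.
assert map_threshold(1.0, 1.0, 0.5) == 0.0
assert map_threshold(1.0, 1.0, 0.9) > 0.0
```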


Detection errors in signal reception can still occur when noise carries a received point $R$ across the threshold $\tau$; the probability of such an event can be computed with the complementary Cumulative Distribution Function of the Gaussian noise:

$$Q(x) = P(N > x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-n^2/2} \, dn$$

which we will use in measuring the performance of various coding schemas.

In the special case where the two signals are transmitted with equal probability (so the threshold sits at their midpoint), the overall probability of error is given by:

$$P(\varepsilon) = Q\Big(\frac{|b-a|}{2\sigma}\Big)$$

where $a, b$ are the two signal amplitudes.
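The $Q$-function has no closed form, but it is expressible via the complementary error function as $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; a small sketch of this and the two-signal error probability (names my own):

```python
from math import erfc, sqrt


def Q(x: float) -> float:
    """Gaussian tail probability P(N > x) for N ~ N(0, 1)."""
    return 0.5 * erfc(x / sqrt(2))


def p_error(a: float, b: float, sigma: float) -> float:
    """P(error) = Q(|b - a| / (2 sigma)) for two equiprobable signals a, b."""
    return Q(abs(b - a) / (2 * sigma))


# Sanity checks: Q(0) = 1/2, and the error probability falls as the
# noise shrinks relative to the signal separation.
assert abs(Q(0.0) - 0.5) < 1e-12
assert p_error(-1.0, 1.0, 0.5) < p_error(-1.0, 1.0, 1.0)
```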

All together, our Binary Phase-Shift Keying transmission system from HGA to Houston looks something like this:

Performance of Error Correcting Codes

How "expensive" is it to add these redundant bits to our transmission? Is it worth it?

What, you didn't think Europa looked cool?

Recall that in our system, $k$ input bits result in $n$ output bits where $n > k$. $R = k/n$ is the rate of our correcting code, and for some arbitrary transmission budget we have $E_b$ joules/bit for the unencoded $k$ bits of data, which is spread over the $n$ encoded bits. This relationship can be expressed as $E_c = RE_b$, where $E_c$ is understood as the "energy per coded bit," and necessarily $E_c < E_b$, meaning that unencoded transmission performs better in terms of energy per bit, but we risk error by omitting the encoding process.

At the receiver, the detected coded bits are passed to the decoder to attempt to correct errors. In order to be of any use to us, the corrective power of the code must be strong enough to compensate for the lower energy per bit in the channel.

Repetition Code

For a repetition code with a transmission rate of 1 data bit/second, we must send $n$ coded bits/second, and with $n$ times as many bits to send, there is still only a constant amount of power shared by all bits: $E_c = E_b/n$. Thus there is less energy available for each bit to convey information, and the probability of error is

$$Q(\sqrt{2E_c/N_0}) = Q(\sqrt{2E_b/nN_0})$$

Notice that the probability of error is higher as a result of using such a code! 💩
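A quick numeric check of this penalty, with an arbitrarily chosen $E_b/N_0$ (a sketch, not from the original):

```python
from math import erfc, sqrt


def Q(x: float) -> float:
    """Gaussian tail probability, Q(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * erfc(x / sqrt(2))


Eb_over_N0 = 4.0                       # arbitrary channel quality
p_uncoded = Q(sqrt(2 * Eb_over_N0))    # Q(sqrt(2 Eb/N0))

# n-fold repetition spreads Eb over n coded bits, so each individual
# coded bit is MORE likely to be flipped by the channel.
for n in (3, 5, 7):
    p_coded = Q(sqrt(2 * Eb_over_N0 / n))
    assert p_coded > p_uncoded
```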

Golay Code

As a concrete example, let's compare the relative performance of a Golay Code against a pure stream of data, that is, a message with 0 redundant bits (but high risk of data loss):

  • System $\mathscr{S}_1$ transmits data bits directly
  • System $\mathscr{S}_2$ transmits a total of 23 bits for every 12 data bits

We assume a constant unit of power is available to each system. System $\mathscr{S}_2$ employs a Golay Code, transmitting roughly twice as many bits as $\mathscr{S}_1$, so the energy available to each bit is halved. $a$ is the amplitude of the detected output, which will be $1/\sqrt{2}$ since $E \propto A^2$,7 but the cost of these penalty bits is necessary to ensure corrective ability. $P(\varepsilon)$ is the measure of penalty in terms of error for decreasing the power of transmission by a factor of 2.

| System | $a$ | $P(\varepsilon)$ |
| --- | --- | --- |
| $\mathscr{S}_1$ | $1$ | $1/2000$ |
| $\mathscr{S}_2$ | $1/\sqrt{2}$ | $1/100$ |

Thus, $\mathscr{S}_2$ is twenty times more susceptible to bit error because of the reduced energy per bit, meaning that the receiver will have more difficulty correctly measuring which codeword is being transmitted. However, the redundancies which the Golay Code introduces allow for guaranteed recovery of the complete message, whereas one in every two thousand bits in our pure stream of data from $\mathscr{S}_1$ is simply lost forever like the Bitcoin I mined on the family computer ten years ago.
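We can sanity-check the table with the $Q$-function: choose the noise level so that $\mathscr{S}_1$'s error rate comes out to about $1/2000$, then scale the amplitude by $1/\sqrt{2}$ for $\mathscr{S}_2$ (a sketch; the SNR value is my own choice):

```python
from math import erfc, sqrt


def Q(x: float) -> float:
    """Gaussian tail probability, Q(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * erfc(x / sqrt(2))


# An amplitude-to-noise ratio a/sigma of ~3.29 gives Q(a/sigma) ~ 1/2000.
snr = 3.29
p1 = Q(snr)               # S1: full-amplitude bits
p2 = Q(snr / sqrt(2))     # S2: amplitude scaled by 1/sqrt(2), since E ∝ A²

assert 1 / 2100 < p1 < 1 / 1900    # ≈ 1/2000
assert 1 / 110 < p2 < 1 / 90       # ≈ 1/100, the table's twenty-fold penalty
```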

The Hadamard Code

The dual of the Hamming Code $C^{(r)}_\text{Ham}$ has a generator matrix $G_r = (H_r)^T$, which is a $(2^r-1) \times r$ matrix whose rows are all non-zero bit vectors of length $r$. This yields a $[2^r-1, r]_2$ code called a simplex code.

The Hadamard Code $\text{Had}_r$ is obtained by adding an all-zeros row to $G_r$.

$\text{Had}_r$ is a $[2^r, r]_2$ linear code whose $2^r \times r$ generator matrix has all $r$-bit vectors as its rows. Its encoding map encodes $x \in \mathbb{F}_2^r$ by a string in $\mathbb{F}_2^{2^r}$ consisting of the dot product $\langle x, a \rangle$ for every $a \in \mathbb{F}_2^r$. The Hadamard Code can also be defined over $\mathbb{F}_q$, by encoding a message in $\mathbb{F}_q^r$ with its dot product with every vector in $\mathbb{F}_q^r$.

It is the most redundant linear code in which no two codeword symbols are equal in every codeword, implying a robust distance property:

  • The binary Hadamard Code has a minimum distance of $2^{r-1}$.
  • The $q$-ary Hadamard Code of dimension $r$ has a distance of $(1-\frac{1}{q})q^r$.

Binary codes cannot have a relative distance $\delta(C)$ of more than $\frac{1}{2}$, unless they only have a fixed number of codewords. Thus, the relative distance of Hadamard Codes is optimal, but their rate is necessarily poor.
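The encoding map above is short enough to sketch directly; for small $r$ we can also verify the minimum-distance claim by brute force (names my own):

```python
# The binary Hadamard encoding map: the codeword for a message x in F_2^r
# lists the dot product <x, a> (mod 2) for every vector a in F_2^r.
from itertools import product


def hadamard_encode(x: tuple) -> list:
    r = len(x)
    return [sum(xi * ai for xi, ai in zip(x, a)) % 2
            for a in product([0, 1], repeat=r)]


r = 3
codewords = [hadamard_encode(x) for x in product([0, 1], repeat=r)]

# Every non-zero codeword has weight exactly 2^(r-1): for x != 0, the dot
# product <x, a> is 1 for exactly half of all a. Since the code is linear,
# the minimum distance of this [2^r, r] code is therefore 2^(r-1).
weights = {sum(c) for c in codewords if any(c)}
assert weights == {2 ** (r - 1)}
```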

Limits of Error Correcting Codes

The Hamming and Hadamard codes exhibit two extreme trade-offs between rate and distance:

  • Hamming Codes' rates approach 1 (optimal), but their distance is only 3
  • Conversely, Hadamard Codes have an optimal relative distance of $\frac{1}{2}$, but their rate approaches 0

The natural question that follows is: is there a class of codes with both good rate and good relative distance, such that neither approaches 0 for large block lengths? To answer this question, we consider the asymptotic behavior of code families:

Define a code family $\mathcal{C}$ to be an infinite collection $\{ C_i \mid i \in \mathbb{N} \}$, where $C_i$ is a $q_i$-ary code of block length $n_i$, with $n_i > n_{i-1}$ and $q_i \geq q_{i-1}$.

The Familial Rate and Distance of a category of codes are denoted:

$$\mathcal{R}(\mathcal{C}) = \lim_{i \rightarrow \infty} \frac{k_i}{n_i}$$
$$\delta(\mathcal{C}) = \lim_{i \rightarrow \infty} \frac{\Delta(C_i)}{n_i}$$

A $q$-ary family of codes is said to be asymptotically good if its rate and relative distance are bounded away from zero, i.e., there exist constants

$$\begin{aligned} \mathcal{R}_0 &> 0, \quad \delta_0 > 0 \quad \text{s.t.} \\ \mathcal{R}(\mathcal{C}) &> \mathcal{R}_0, \quad \delta(\mathcal{C}) > \delta_0 \end{aligned}$$

While a good deal is known about the existence, and even explicit construction, of good codes, as well as their limitations, the best possible asymptotic trade-off between rate and relative distance for binary codes remains a fundamental open question which has seen little advance since the late 70s, in part due to the fundamental tension between large rate and large relative distance.

The Hamming and Hadamard Codes help to highlight the flexibility and main takeaways of the Channel Coding Theorem as outlined by Wiley:

  • As long as $R < C$, arbitrarily reliable transmission is possible
  • Code lengths may have to be long to achieve the desired reliability. The closer $R$ is to $C$, the larger we would expect $n$ to need to be in order to obtain some specified degree of performance
  • The theorem is general, based on ensembles of random codes, and offers little insight into what the best code should be. We don't know how to design the best codes, just that they exist
  • Random codes have a high probability of being good, so we can reliably just pick one at random

What then, is the problem? Why the need for decades of research into the field if random selection of a code might be our best bet?

The answer lies in the complexity of representing and decoding a "good" code. To represent a random code of length $n$, there must be sufficient memory to store all associated codewords, which requires $n2^{Rn}$ bits. To decode a received word $y$, Maximum Likelihood decoding for a random code requires that the received vector be compared with all $2^{Rn}$ possible codewords. For a middling rate of $R = 1/2$ with block length $n = 1{,}000$ (still relatively modest), $2^{500}$ comparisons must be made for each received signal vector... This is prohibitively expensive, beyond practical feasibility for even massively parallelized computing systems, let alone our 70kb, Fortran-backed integrated circuit.
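For a sense of scale, the magnitudes in the paragraph above can be checked directly:

```python
# Back-of-the-envelope check for a random code with rate R = 1/2
# and block length n = 1,000.
n, R = 1000, 0.5
num_codewords = 2 ** int(R * n)        # 2^500 codewords to choose among
storage_bits = n * num_codewords       # n * 2^(Rn) bits just to store them

# 2^500 is a ~151-digit number; comparing each received vector against
# every codeword is hopeless at any conceivable rate of computation.
assert len(str(num_codewords)) == 151
```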


Today Voyager 2 is 12 billion miles away, traveling over 840,000 miles further each day. We're still actively transmitting and receiving data (intact) from both of the Voyager spacecraft.

It's somewhat difficult to appreciate the absurd amount of genius which makes that feat possible. I find the difference in orders of magnitude between the 30,000-odd words' worth of computing power available to the spacecraft and the billions of miles of distance separating the scientists from their experiment, which they expertly designed to outlive them, to be mind-boggling.

Pretty sick if you ask me.

Happy Discovery of Adrastea day.

References and Footnotes


  1. Um, so yeah. I'm going to be using imperial measurements. If only there were a convenient way to quickly convert between metric and imperial, say the 42nd and 43rd numbers of the Fibonacci sequence? Easily, we recall that 433,494,437 precedes 701,408,733 in the sequence and thus, 440 million miles is roughly 700 million kilometers.

  2. Nixon was actually a large proponent of the space program, but this story needs an antagonist, and he's as good as any.

  3. More lizard brain pattern matching at work here.

  4. Marcel was a crackhead for thinking this is a sufficient explanation for the construction. The Paper, if it can even be called that, could fit on a napkin. Mathematicians of a certain flavor exhibit the same kind of stoney-Macgyver behavior, see Conway.

  5. These diagrams took me several hours. Please clap.

  6. This diagram was taken from Error Correction Coding: Mathematical Methods and Algorithms which is a good book.

  7. Energy in a wave is proportional to Amplitude squared.