Wait, what? What do you mean, "what is [math]\pi[/math]"? It's the ratio of the circumference of a circle to its diameter. Or it's the area of a circle of radius 1, or something like that. Everyone knows that.

*Wrong*.

Well, ok, it's not entirely wrong. It is true that [math]\pi[/math] has these properties. But that's not its essence, it's not what it fundamentally *is*. The stuff with the circles is just one aspect of [math]\pi[/math], and not the most profound one.

Consider electricity. It's called "electricity" because the Greek word for *amber* is *ἤλεκτρον*. Why amber? Because when you rub amber with fur it attracts small things. Historically, this was one of humanity's first brushes (haha) with electricity, so we named the whole concept after it. Today we know so much more about electricity, nobody would consider saying "electricity is the capacity of amber to attract small things". It is that, too, but that's not the essence of it.

Yet we still – for historical reasons, and because it's so simple and intuitive – often *define* [math]\pi[/math] as the ratio of circumference to diameter. That's how we teach it to children, and it may be inevitable (I'll argue that it's not). But it certainly causes much confusion.

How?

Because of this association, many people are led to believe that [math]\pi[/math] somehow depends on the geometry of the universe, or the way we measure angles, or lengths, or areas, or whatnot (see this question on Quora, or this one, or that one. There are tons, believe me). That's false, of course. [math]\pi[/math] doesn't care at all about the structure of the universe, or real-life circles. In fact it's not really about circles at all.

That's the second source of confusion: because [math]\pi[/math] is "about circles", we are shocked to find it hanging out in places that are decidedly non-circular, like [math]1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7}+\ldots=\frac{\pi}{4}[/math] (do you see any circles here?), or the normal distribution (where's the circle?), or the granddaddy of them all, Euler's formula [math]e^{i\pi}+1=0[/math].
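To make the first of those surprises concrete, here's a quick numerical sketch in Python (my illustration, not part of the original argument) of the partial sums of that alternating series creeping toward [math]\pi/4[/math]:

```python
import math

# Partial sums of 1 - 1/3 + 1/5 - 1/7 + ... approach pi/4.
# Convergence is famously slow; this is purely illustrative.
def leibniz(n_terms):
    return sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

approx = 4 * leibniz(200_000)
print(approx, math.pi)  # the two agree to roughly 5 decimal places
```

No circle is drawn anywhere in that computation, and yet out comes [math]\pi[/math].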

That's like being shocked that your cell-phone charges despite there being no amber around.

It's not really surprising, right? Your cell-phone charges because electricity, and rubbed amber attracts little things because electricity, and both these things are caused by what electricity actually is, which is a profound aspect of the physical universe. You wouldn't try to explain cell-phone charging by arguing from amber, and you shouldn't need to explain the presence of [math]\pi[/math] in the definition of the normal distribution by hunting for circles.

What, then, *is* [math]\pi[/math]? And why is it so ubiquitous? What is its "electricity", the true underlying principle?

The answer lies with a profound, incredible, and beautiful object: **the (complex) exponential function**. You may be used to thinking of "the exponential" as a slightly boring curve that rises faster and faster. In fact, the exponential function looks like this near [math]0[/math]:

Now, the essence of the exponential function, and the source of its ubiquity (and with it, the ubiquity of *both* [math]e[/math] and [math]\pi[/math]), is the simplest and most basic of all differential equations:

[math]\displaystyle f' = f.[/math]

That formula is so crucial that I'm going to write it again, really big. You should stare at it just to see how simple, natural, and profound it is.

Isn't that pretty? A function whose derivative is itself. But why is it so important? And what on earth does it have to do with [math]e[/math] or [math]\pi[/math]?

Let's take these questions in order. Actually, let's take them in reverse order. I'll first show how the exponential function relates to [math]e[/math] and [math]\pi[/math], and then I'll try to talk about why it shows up all over the place in math and science. Let me just say this first so we keep our head straight: there is only one solution to the equation [math]f'=f[/math] (once we normalize it to have [math]f(0)=1[/math], removing an unimportant degree of freedom), and this solution is **the most important function in mathematics**.
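To see how tightly [math]f'=f[/math] with [math]f(0)=1[/math] pins the function down, here's a small numerical sketch (mine, for illustration): it integrates the equation with naive Euler steps, never mentions powers or [math]e[/math], and still recovers [math]e[/math] at [math]x=1[/math].

```python
import math

# Sketch: integrate f' = f with f(0) = 1 by simple Euler steps.
# The equation alone pins the function down; powers and e are never
# mentioned, yet the value at x = 1 comes out as 2.71828...
def solve_f_prime_equals_f(x_end, steps):
    h = x_end / steps
    f = 1.0                  # the normalization f(0) = 1
    for _ in range(steps):
        f += h * f           # the rule f' = f, applied one small step at a time
    return f

print(solve_f_prime_equals_f(1.0, 200_000))  # ≈ 2.718..., i.e. e
print(math.e)
```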

Don't take my word for it. Here's how Walter Rudin chooses to open the inimitable "Real and Complex Analysis":

So, let's start.

What Rudin does in the very next line is define the exponential function via a power series, but I'd like to postpone this for a little bit. Power series are slightly intimidating, and more importantly, I'd like to show how the properties of [math]\exp[/math] are a direct consequence of [math]f'=f[/math], so as to justify my claim that it's that beautiful equation which underlies all those seemingly unrelated surprises.

So, let's think about what we can deduce about a function [math]f[/math] which satisfies [math]f'=f[/math].

First, there's an obvious and uninteresting solution to that equation: [math]f(x)=0[/math] everywhere. Since [math]0[/math] is constant and the derivative of a constant is [math]0[/math], the [math]0[/math] function satisfies [math]f'=f[/math]. We're curious about whether there are other, more interesting, solutions.

Suppose that we've found some solution [math]f[/math]. We can use it to manufacture other solutions. For instance, [math]f(x+c)[/math] will also be a solution, by which I mean that the function [math]g(x)=f(x+c)[/math] must also satisfy [math]g'=g[/math]. This is an immediate consequence of the chain rule for derivatives. Similarly the function [math]kf(x)[/math] will also be a solution, for any constant [math]k[/math].

So if we have one solution, we seem to have many. But are they really different? It's a fact that when a differential equation like this has a solution, that solution is actually *unique* once we fix its value somewhere. That's pretty intuitive: the differential equation tells you how the function changes. If you know the value of the function at any particular point, and you know how it changes, then you know its value at any other point.

So now start with any solution [math]f(x)[/math], and suppose that it's not [math]0[/math] at [math]0[/math]: [math]f(0) \neq 0[/math]. (In fact, if it is [math]0[/math] at [math]0[/math] then, by uniqueness, it must be the uninteresting constant [math]0[/math] solution we've already discussed). We can multiply that solution by some constant to ensure that [math]f(0)=1[/math]. That's a pretty harmless normalization: all other solutions are just this one multiplied by whatever it takes to get them the right value at [math]0[/math].

Now let's pick an arbitrary number [math]c[/math] and compare the two modified solutions [math]f(c+x)[/math] and [math]f(c)f(x)[/math]. Those two functions are both solutions, and they both have the value [math]f(c)[/math] at [math]x=0[/math]. Ergo, they must be equal, by the uniqueness property. *Boom*: [math]f(c+x)=f(c)f(x)[/math], and since the constant [math]c[/math] was completely arbitrary we can rewrite this in the more familiar form

[math]\displaystyle f(x+y)=f(x)f(y)[/math] for all [math]x, y[/math].

If you're familiar with the exponential function, you should recognize this as the well-known rule [math]e^{x+y}=e^xe^y[/math]. But we've just seen that this isn't just some arbitrary rule dictated by your teachers, and its importance doesn't just stem from the fact that it turns addition into multiplication, useful as this might be. It is simply a *consequence* of the differential equation [math]f'=f[/math], plus the intuitively clear (and rigorously provable) uniqueness property.
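Here's a quick numerical spot-check of the addition law using Python's built-in complex exponential (a sanity check, not a proof), including complex arguments:

```python
import cmath

# Spot-check the addition law exp(x + y) = exp(x) * exp(y)
# at arbitrarily chosen complex points.
x, y = 0.7 + 2.1j, -1.3 + 0.4j
lhs = cmath.exp(x + y)
rhs = cmath.exp(x) * cmath.exp(y)
print(abs(lhs - rhs))  # ~0, up to floating-point error
```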

Now here's an interesting consequence of this law. If it ever happens that [math]f(P)=1[/math] for any particular number [math]P[/math], it follows that

[math]f(x+P) = f(x)f(P) = f(x)[/math], for *every* [math]x[/math].

so the number [math]P[/math] is a *period* of the solution [math]f[/math]. A "period" simply means a value which you can shift your variable by and the function stays the same.

As you can imagine, being periodic is a pretty basic feature of a function, and if the solution to [math]f'=f[/math] happens to be periodic with some period [math]P[/math], that number [math]P[/math] is surely one heck of an important number. The equation [math]f'=f[/math], after all, is "dimensionless": it doesn't have any constants in it, and it doesn't seem to dictate any particular measure or order of magnitude. Wouldn't it be surprising if it turned out to be periodic? And what could that magic period be? Surely, the period isn't going to come out as 17.29 or some such random number. If anything, it should be 1.

Only it's not.

The unique solution [math]f[/math] of [math]f'=f[/math] with the initial condition [math]f(0)=1[/math] is called the exponential function, and it is, in fact, periodic. Its period is (drumroll) [math]P=2\pi i[/math]. That's the magic number, and **this is how we define [math]\pi[/math]**.

That period is a fundamental constant of nature, coming to us as it does without any reference to lengths, units of measurement, circles, arbitrary scales or any sort of human intervention. It's an imaginary number, pure and simple, and the exponential function (which by now we might as well call by its proper name [math]\exp[/math]) satisfies

[math]\exp(x+2\pi i) = \exp(x)[/math] for *every* [math]x[/math], real or complex.

And this is the correct, modern definition of [math]\pi[/math]: it's the smallest positive real number [math]p[/math] for which [math]2 p i[/math] is a period of the exponential function. (Why "smallest"? Because if [math]P[/math] is a period, so are [math]2P[/math] and [math]3P[/math] and so on. Every continuous, non-constant periodic function has a truly fundamental period, which is the smallest positive one.)
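Here is the periodicity in action, checked numerically with Python's complex exponential (a sketch, not a proof):

```python
import cmath

# The claimed period: shifting the argument by 2*pi*i changes nothing.
P = 2j * cmath.pi
for x in (0.0, 1.0, -2.5 + 0.3j, 3 + 4j):
    print(abs(cmath.exp(x + P) - cmath.exp(x)))  # ~0 each time
print(cmath.exp(P))  # ≈ 1, since a full period returns us to exp(0) = 1
```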

Look up at the image of [math]\exp[/math] I showed. See how it repeats as you move up the picture? That's the periodicity. And the wavelength of this periodic behavior is none other than [math]2\pi i[/math].

Oh, by the way. What's [math]\exp(1)[/math]? This must also be an important number. And, of course, that number is none other than [math]e[/math]. You may have been told, at some point in your mathematical education, that [math]e[/math] is "the value of an investment of $1 after 1 year, at 100% interest, compounded continuously". This, begging your pardon, is bullshit. I mean, of course it's true, but that's an awful, awful way to define that number. Who the hell cares about the value of $1 after a year of continuously compounding 100% interest? No banker in the world, that's for sure. Even the associated explicit equation

[math]\displaystyle e = \lim_{n \to \infty}\left(1+\frac{1}{n}\right)^n[/math]

is sort of the worst way of thinking about this number. Sure, yes, it's sometimes helpful, but it falls far short of explaining why this number *matters*. Why would that limit matter? You can invent a dozen limits like this in ten minutes. The reason, the real reason, why the number [math]e[/math] has any utility – and it has plenty – is because it is the special value at [math]x=1[/math] of the exponential function, the most important function in mathematics.
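For what it's worth, the limit does converge, just sluggishly; a two-line check (illustrative only, and certainly not how one should *define* [math]e[/math]):

```python
import math

# (1 + 1/n)^n creeps toward e, with error roughly e/(2n) -- slow.
for n in (10, 1_000, 100_000):
    print(n, (1 + 1 / n) ** n)
print("e =", math.e)
```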

In fact it's not hard to show that the exponential function can also be interpreted as that particular number [math]e[/math] raised to the power [math]x[/math]. And by the way, from [math]\exp(2\pi i) = 1[/math] (and the minimality of the choice of period) it follows quickly that [math]e^{\pi i} = -1[/math], and here we have Euler's formula falling out like it's the most natural thing in the world. This is where *it* comes from, and it's not really that mysterious or deep. It's just another way of expressing the periodicity of the exponential function.
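The half-period fact checks out numerically as well (again a sketch):

```python
import cmath

# exp(pi*i) is the value half-way through the period 2*pi*i:
# squaring it gives exp(2*pi*i) = 1, and the value itself is -1.
z = cmath.exp(1j * cmath.pi)
print(z)      # -1, up to a tiny floating-point imaginary part
print(z * z)  # ≈ 1, i.e. exp(2*pi*i)
```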

Because I just pitched [math]2\pi i[/math] as the really fundamental constant, I should probably say something about [math]2\pi[/math] which some people propose to call [math]\tau[/math] and use it to depose [math]\pi[/math] from its perch. It is [math]\tau[/math], the claim goes, that is the true magic constant, and so "[math]\pi[/math] is wrong". Well, I agree that [math]2\pi[/math] is the more natural constant, though in fact I actually think it's [math]2\pi i[/math] that's the queen of all numbers. But honestly, it doesn't matter that much, and arguing over convention is usually futile. A factor of 2 (or [math]2 i[/math]) isn't what makes math better or worse.

Now, finally, I need to explain why the equation [math]f'=f[/math] is so goddamn important. We've seen that it naturally breeds the two constants [math]e[/math] and [math]\pi[/math], and we've argued that it does so in a very natural and pure way. But why should we care about this equation? (*Note: I previously touched on these questions in **this answer**, but for completeness I include them here.*)

So.

Differentiation is one of the most fundamental ideas we rely on to understand the universe, both the physical one and the mathematical one. The laws of electromagnetism, the laws of Newtonian mechanics, the laws of quantum mechanics, the laws of economic behavior, the laws of heat and complex functions and smooth manifolds and many, many other things are described in terms of differential operators. And we find, not surprisingly, that the fundamental solution of the fundamental equation [math]f'=f[/math] plays a crucial role in all of those investigations.

For example, Newton's law [math]F=ma[/math] says that acceleration is proportional to the force exerted on an object. In a very important instance, that force is itself proportional to the displacement, with a negative sign: [math]F = -kx[/math]. This applies to vibrating springs, but more importantly it is the first approximation of almost any physical system near its equilibrium, and it's hard to overstate how important that is. Harmonic oscillators are everywhere.

So we get [math]ma = -kx[/math], or [math]m\ddot{x} = -kx[/math], or [math]\ddot{x} = -\omega x[/math], where [math]\omega[/math] is some positive constant and [math]\ddot{x}[/math] denotes the second derivative of the position [math]x[/math] with respect to time. How do we solve this? That's easy once you know how to solve [math]f'=f[/math]. Take [math]g(x) = f(cx)[/math], where [math]c[/math] is any constant you care to choose, and so, by the chain rule,

[math]g'(x) = cf'(cx) = cf(cx) = cg(x)[/math]

and it follows that [math]g'' = c^2 g(x)[/math]. Now pick the constant [math]c[/math] such that [math]c^2 = -\omega[/math], change the variable from [math]x[/math] to [math]t[/math], and you have your solutions for the motion of a harmonic oscillator (there are two such fundamental solutions, corresponding to the two possible choices of [math]c[/math]).
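The derivation above can be spot-checked numerically: take [math]g(t)=\exp(ct)[/math] with [math]c^2=-\omega[/math], approximate the second derivative by a central finite difference, and compare it with [math]-\omega\, g(t)[/math] (a sketch, with an arbitrarily chosen [math]\omega[/math] and sample point):

```python
import cmath

# Verify that g(t) = exp(c*t), with c^2 = -omega, satisfies the
# oscillator equation g'' = -omega * g.  The second derivative is
# approximated by a central finite difference.
omega = 4.0
c = 1j * omega ** 0.5               # one root of c^2 = -omega
g = lambda t: cmath.exp(c * t)

t, h = 0.8, 1e-4
g_second = (g(t + h) - 2 * g(t) + g(t - h)) / h ** 2
print(abs(g_second - (-omega) * g(t)))  # ~0, finite-difference error only
```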

This example is important because it shows how the exponential function doesn't limit itself to processes of "exponential growth" or "exponential decline". It does these things too, of course (radioactive decay is governed by the same function, for instance), but its complex form hides inside of it also the periodic functions [math]\sin[/math] and [math]\cos[/math], whose periodicity follows immediately from the periodicity of [math]\exp[/math] itself.

Physicists and engineers are well aware of the central importance of Fourier analysis, the decomposition of functions into ingredients that look like [math]\sin(\omega x)[/math] and [math]\cos(\omega x)[/math], for discrete or continuous values of [math]\omega[/math]. It's hard to move one inch in signal processing, or in the physics of heat or the Schrödinger equation or what have you, without utilizing Fourier.

For example, Heisenberg's Uncertainty Principle can be seen as a fairly simple result in Fourier analysis, and like all such results it can be cast pretty naturally in terms of the exponential function. The momentum-space density function of a particle is the Fourier transform of its position-space density. As with any pair of so-called "conjugate variables", this relationship limits how localized the two can simultaneously be. If the position is sharply localized, the momentum is dispersed, and vice versa. Uncertainty. Exponential.

To move (slightly) away from physics, in probability and statistics there is one all-important distribution called the Normal distribution, or Gaussian, or "the bell curve", or some such names. You've seen it countless times, and you've observed it without knowing it even more times.

Why is this distribution so important? Because, up to rescaling, it is its own convolution with itself, is why. If you average many similar things, what you get has to be a fixed point of the "averaging" operation, which for a probability density function amounts to a convolution with itself (followed by a rescaling). And that fixed point is the normal distribution (under some technical conditions), and it also maximizes the entropy for the same reason, and so on.
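Here's a numerical sketch of that fixed-point property (my illustration, on a coarse grid): convolving the standard normal density with itself yields the normal density with variance 2, which is the same bell shape, merely rescaled.

```python
import math

def normal(x, var=1.0):
    # Normal density with mean 0 and the given variance.
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

# Discrete convolution of N(0,1) with itself at a sample point:
# the result matches N(0,2), the same bell shape with variance 2.
step = 0.01
x = 1.3
conv = sum(normal(t) * normal(x - t) * step
           for t in (-10 + k * step for k in range(2001)))
print(abs(conv - normal(x, var=2.0)))  # ~0
```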

This feature of the distribution can be formalized as, of course, a differential equation: [math]f'(x)+xf(x) = 0[/math]. And that equation, of course, can be solved using the exponential function, and so the formula for the normal distribution – by far the most important function in probability and statistics and huge parts of engineering, economics, and social studies – comes out to be

[math]\displaystyle N(x) = \frac{1}{\sqrt{2 \pi}}\exp(-x^2/2)[/math].

(This is for mean [math]0[/math] and standard deviation [math]1[/math]; it's easy to shift and scale that formula for any mean and standard deviation).
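Both facts – the defining differential equation and the appearance of [math]\sqrt{2\pi}[/math] in the normalizing constant – are easy to spot-check numerically (a sketch, with an arbitrarily chosen sample point):

```python
import math

# The standard normal density N(x) = exp(-x^2/2) / sqrt(2*pi).
def N(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Check the defining ODE f'(x) + x*f(x) = 0 with a central difference.
x, h = 0.7, 1e-5
deriv = (N(x + h) - N(x - h)) / (2 * h)
print(abs(deriv + x * N(x)))  # ~0

# Crude Riemann sum over [-8, 8]: total probability is 1.  The sqrt(2*pi)
# is exactly what makes this come out right -- pi, with no circle in sight.
total = sum(N(-8 + k * 0.001) * 0.001 for k in range(16000))
print(total)  # ≈ 1.0
```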

So here, again, is the exponential function, stemming from its unique property as the fixed point of the operator [math]\frac{d}{dx}[/math]. And here, again, is our friend [math]\pi[/math], always accompanying its master, of which it is the period (times [math]2i[/math]). And this, finally, is where [math]\pi[/math] comes from, and what it is, and there are no circles to be found.

Happy [math]\pi[/math] day.

I love the feeling of having a new way to think about the world. I especially love when there’s some vague idea that gets formalized into a concrete concept. Information theory is a prime example of this.

Information theory gives us precise language for describing a lot of things. How uncertain am I? How much does knowing the answer to question A tell me about the answer to question B? How similar is one set of beliefs to another? I’ve had informal versions of these ideas since I was a young child, but information theory crystallizes them into precise, powerful ideas. These ideas have an enormous variety of applications, from the compression of data, to quantum physics, to machine learning, and vast fields in between.

Unfortunately, information theory can seem kind of intimidating. I don’t think there’s any reason it should be. In fact, many core ideas can be explained completely visually!