• Chemical Wires

    Working with Mahala Wanner and Gus Thomas, Niklas Manz and I recently published an article, “Chemical wires: reaction-diffusion waves as analogues of electron drift”, in the journal Transport Phenomena. Mahala began the work during our summer 2022 REU, and Gus continued it for his 2025 Senior IS.

    We used chemical reaction-diffusion waves in narrow channels to model electron drift in wires. By varying the initial conditions of an excitable Belousov–Zhabotinsky (BZ) medium, we achieved careful, quantitative control of BZ wave speeds in the range of electron drift speeds in conductors, a few millimeters per minute. We compared the speeds of the easily observable BZ waves and their computer simulations with theoretical electron drift speeds to explore the effects of wire radius, electric current, and material composition. Such BZ waves are compelling visual analogues of electron drift.

    The slow effective speed of typical electrical currents, despite the large quantum and thermal speeds of electrons in common wires, contributes to misconceptions about the nature of information transfer via current; individual electron trajectories do not transmit electrical signals at high speeds, but perturbations in the accompanying electromagnetic fields do. To build better intuition, below is an animation of a 10 mm/min electron drift, represented by the filling bar, with a penny for scale.

    Electron drift simulation to scale
    Electron drift simulation to scale.
    (You may need to click or tap to see the animation.)
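
    As a rough check on those magnitudes, here is a back-of-the-envelope Python sketch (an illustration, not our published analysis) using the standard drift relation v = I/(nqA), with an assumed 1 A current, 1 mm wire radius, and copper’s free-electron density n ≈ 8.5×10^28 m⁻³:

        # Back-of-the-envelope drift speed v = I/(n q A) for assumed values:
        # 1 A through a copper wire of radius 1 mm.
        from math import pi

        I = 1.0        # current, A
        r = 1.0e-3     # wire radius, m
        n = 8.5e28     # copper free-electron density, 1/m^3
        q = 1.60e-19   # elementary charge, C

        v = I / (n * q * pi * r**2)      # drift speed, m/s
        print(v * 1e3 * 60, 'mm/min')    # about 1.4 mm/min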

  • e is Transcendental

    The Euler-Napier-Bernoulli constant e = 2.71828\ldots is not just irrational, it is transcendental, as first proved by Charles Hermite in 1873. Inspired by the work of Mathologer (Burkard Polster with Marty Ross), here I offer an elementary proof of e's transcendence. As a warmup, I first present a well-known proof of its irrationality while hinting at the proof of its transcendence.

    The infinite series expansion of the exponential function

    e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^m}{m!} + \cdots

    implies the Euler-Napier-Bernoulli constant

    e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{m!} + \cdots .

    By way of contradiction, assume e can be written as a root of the linear polynomial

    be-a = 0

    for positive integers a,b \in \mathbb{Z}^+ or, equivalently, that e is the rational number

    e = \frac{a}{b}.

    Use b as a cutoff to separate e's infinite series expansion into a \color{blue}\text{large body}\color{black} and a \color{red}\text{small tail}\color{black}

    \frac{a}{b} = e = \color{blue} 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{b!} \color{red}+ \frac{\delta}{b!}\color{black},

    where

    \begin{aligned}0 < \delta &= \frac{b!}{(b+1)!} + \frac{b!}{(b+2)!} + \frac{b!}{(b+3)!} + \cdots \\&= \frac{1}{b+1} + \frac{1}{(b+1)(b+2)} + \frac{1}{(b+1)(b+2)(b+3)} + \cdots \\&< \frac{1}{b+1} + \frac{1}{(b+1)^2} + \frac{1}{(b+1)^3} + \cdots \\&= \frac{1/(b+1)}{1-1/(b+1)} = \frac{1}{b} \le 1\end{aligned}

    using the sum formula for an infinite geometric series. But

    \frac{a}{b} = e = \frac{b!\left(1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{b!}\right) + \delta}{b!} \color{red} = \frac{N^\prime + \delta}{N}

    and so

    a\color{red}N\color{black}-b\color{red}N^\prime\color{black}-b\color{red}\delta\color{black}=0

    and

    a(b-1)!-b!\left(1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{b!}\right)-\delta = 0,

    which is impossible, as all terms are integers except \delta, where 0 < \delta < 1. Hence e is not the root of a linear polynomial with integer coefficients and, equivalently, is an irrational number.
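
    As a quick sanity check (not part of the proof), the following Python sketch, assuming the mpmath library, confirms numerically that 0 < \delta < 1/b for several cutoffs b:

        # Check 0 < delta < 1/b, where delta = b!(e - sum_{k=0}^{b} 1/k!).
        from mpmath import mp, e, mpf, factorial

        mp.dps = 50  # work with 50 decimal digits
        for b in [2, 5, 10, 20]:
            partial = sum(mpf(1) / factorial(k) for k in range(b + 1))
            delta = factorial(b) * (e - partial)
            print(b, delta, 0 < delta < mpf(1) / b)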

    Now for transcendence. First consider the polynomial

    f(x) = \color{red}x^{p-1}\color{black}(x-1)^p(x-2)^p \cdots (x-n)^p,

    where n > 1 and p is a large prime. As |x-m|\le n for each 0\le m\le n on the interval x\in [0,n], f is bounded by

    |f| \le \color{red}n^{p-1}\color{black}n^p n^p \cdots n^p = n^{np+p-1},

    and its expansion begins

    \begin{aligned}f(x) &= (-1)^p(-2)^p\cdots (-n)^p \color{red}x^{p-1}\color{black} + \mathcal{O}(x^{p})\\&= (-1)^{np}(n!)^p \color{red}x^{p-1}\color{black} + \mathcal{O}(x^{p}).\end{aligned}

    Use the factorial integral

    \int_0^\infty \hspace{-0.8em}\text{d}x \,e^{-x} x^m = m!

    to expand powers of e like

    e^m = \frac{ \int_0^\infty \hspace{-0.4em}\text{d}x \,e^{m-x} f(x)}{ \int_0^\infty \hspace{-0.4em}\text{d}x \,e^{-x} f(x)} \color{red}=\ \frac{N_m + \delta_m}{N}\color{black},

    where

    \begin{aligned}N &=\frac{1}{(p-1)!} \int_0^\infty \hspace{-0.8em}\text{d}x \,e^{-x} f(x)\\&= (-1)^{np}(n!)^p + M p,\end{aligned}

    where M \in \mathbb{Z} and so N \in \mathbb{Z}, and if p > n then N\neq 0, as p divides the second term but not the first,

    \begin{aligned}N_m &=\frac{1}{(p-1)!} \int_m^\infty \hspace{-0.8em}\text{d}x \,e^{m-x} f(x)\\&= \frac{1}{(p-1)!} \int_0^\infty \hspace{-0.8em}\text{d}y \,e^{-y} f(y+m) = M^\prime p,\end{aligned}

    where M^\prime \in \mathbb{Z}, as f(y+m) has a factor of y^p enabling the integral to absorb the denominator, and

    \begin{aligned}\delta_m &=\frac{1}{(p-1)!}\int_0^m \hspace{-0.8em}\text{d}x \,e^{m-x}f(x)\\&\le \frac{1}{(p-1)!} \color{red}e^m\int_0^\infty \hspace{-0.8em}\text{d}x \,e^{-x}\color{blue} n^{np+p-1}\color{black}\\&=\frac{1}{(p-1)!} \color{red}e^m\color{blue}\frac{n^{(n+1)p}}{n}\color{black}\\&\le \frac{1}{(p-1)!} \color{red}e^n\color{blue} n^{(n+1)p}\color{black} = \frac{c\,d^p}{(p-1)!} \rightarrow 0,\end{aligned}

    as p\rightarrow \infty, where c = e^n and d = n^{n+1}>1.
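
    For small n and p, these integrals can be evaluated exactly; here is a sympy sketch (a check, not part of the proof) verifying for n = 2 and p = 5 that N is an integer that p does not divide, that p does divide each N_m, and that e^m N = N_m + \delta_m exactly:

        # Exact check of N, N_m, delta_m for f(x) = x^(p-1)(x-1)^p...(x-n)^p.
        import sympy as sp

        x, y = sp.symbols('x y', positive=True)
        n, p = 2, 5  # small test case; the proof takes p to be a large prime
        f = x**(p - 1) * sp.prod([(x - k)**p for k in range(1, n + 1)])
        fact = sp.factorial(p - 1)

        N = sp.integrate(sp.exp(-x) * f, (x, 0, sp.oo)) / fact
        print('N =', N, ' N mod p =', N % p)  # nonzero mod p when p > n
        for m in range(1, n + 1):
            Nm = sp.integrate(sp.exp(-y) * f.subs(x, y + m), (y, 0, sp.oo)) / fact
            dm = sp.integrate(sp.exp(m - x) * f, (x, 0, m)) / fact
            print(m, 'N_m mod p =', Nm % p,
                  'identity:', sp.simplify(Nm + dm - sp.exp(m) * N) == 0)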

    Now, by way of contradiction, assume e can be written as a root of the polynomial

    a_n e^n+a_{n-1}e^{n-1}+\cdots+a_0 = 0

    of lowest degree with a_m\in\mathbb{Z}, for m=0,\ldots,n, and a_0\neq 0. Replace powers of e to get

    a_n \left(\frac{N_n+\delta_n}{N}\right)+a_{n-1}\left(\frac{N_{n-1}+\delta_{n-1}}{N}\right)+\cdots+a_0 = 0

    and so

    a_0 \color{red}N\color{black}+(a_1 \color{red}N_1\color{black} + \cdots + a_n \color{red}N_n\color{black}) + (a_1 \color{red}\delta_1\color{black} + \cdots + a_n\color{red}\delta_n\color{black}) = 0,

    which is impossible, as the N and N_m terms are integers but the \delta_m terms are arbitrarily small. (For large enough p >|a_0| and p >n, p does not divide a_0 N but does divide N_m, and the integer parts cannot vanish.) Hence e is not the root of a polynomial with integer coefficients and, therefore, is a transcendental number.

  • My First Patent

    In last month’s blog post, I described my second patent, which raises the question, What was my first patent?

    In 1998, my colleagues and I were issued United States Patent No. US 5 789 961, “Noise- and coupling-tuned signal processor with arrays of nonlinear elements”. The work began during my 1994-1995 sabbatical with the Applied Chaos Lab at Georgia Tech and continued for several years, including the two publications listed below.

    Already well-known was the phenomenon of stochastic resonance, whereby the ability of a bistable system to detect a weak periodic signal can be optimized by non-zero noise, which might seem counter-intuitive. In simulations I performed on a NeXT computer, we explored the effects of coupling such stochastic resonators into arrays and found that signal detection was further enhanced by intermediate coupling, even when the coupling was local and linear.

    The March 1996 Physics Today cover was an illustration, created in my Wooster office, summarizing our results: with coupling increasing rightward and noise increasing upward, each square represents the spatiotemporal evolution of the array at that coupling and noise, with space increasing rightward and time increasing upward, and Doppler colors indicating movement between the two stable states (coded red and blue). Note the regular alternating bands at the optimal coupling and noise, \{10^2,10^5\}.

    This Physics Today cover prompted vehement letters-to-the-editor, not about the figure itself, but about the vertical split in the cover, which opened to an advertisement! I believe this is the only such cover Physics Today has ever published.

    Physics Today cover
    Physics Today cover I created in my Wooster office based on our array enhanced stochastic resonance research, for which we were awarded a US patent. The remarkable cover features a split that opens to an advertisement!

    Array Enhanced Stochastic Resonance and Spatiotemporal Synchronization, J. F. Lindner, B. Meadows, W. Ditto, M. Inchiosa, A. Bulsara, Physical Review Letters, volume 75, pages 3-6 (3 July 1995) doi.org/10.1103/PhysRevLett.75.3

    Scaling Laws for Spatiotemporal Synchronization and Array Enhanced Stochastic Resonance, J. F. Lindner, B. Meadows, W. Ditto, M. Inchiosa, A. Bulsara, Physical Review E, volume 53, pages 2081-2086 (March 1996) doi.org/10.1103/PhysRevE.53.2081

  • My Second Patent

    Today, about six years after beginning the relevant research, my colleagues and I were issued United States Patent No. US 12 450 468 B2, “Physics augmented neural networks configured for operating in environments that mix order and chaos”. The work began during my 2019-2020 sabbatical at the Nonlinear Artificial Intelligence Lab at North Carolina State University and continued for two years, across the four publications listed below, accompanied by years of legal work by NCSU.

    Neural networks are powerful computational tools but they can be confounded by the mix of order and chaos in natural and artificial phenomena. Our solution to this “chaos blindness” exploits an elegant and deep structure of everyday motion discovered by William Rowan Hamilton, who remarkably re-imagined such motion as an incompressible energy-conserving flow in an abstract, higher-dimensional space of positions and momenta. In this phase space, motions are unique trajectories confined to constant-energy surfaces, and regular motions are further confined to donut-like hypertori. This structure constrains our Hamiltonian neural networks to better forecast systems that mix order and chaos, such as the Hénon-Heiles motion of stars in galaxies.

    Henon-Heiles flows
    Two sample Hénon-Heiles flows for different initial conditions forecast by conventional neural network (left), Hamiltonian neural network (center), and Hénon-Heiles differential equations (right), for small (bottom), medium (middle), and large (top) bounded energies. Hues code momentum magnitudes, from red to violet; orbit tangents code momentum directions. Orbits fade into the past.
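
    A minimal Python sketch of the core idea (an illustration, not our published code): learn a scalar H(q,p) and generate the motion from Hamilton’s equations \dot q = \partial H/\partial p, \dot p = -\partial H/\partial q, so the forecast flow conserves the learned energy. Here a toy quadratic H stands in for the trained network, and finite differences stand in for backpropagation:

        # Hamilton's equations from a scalar "network" H(q, p).
        def H(q, p):  # toy oscillator energy standing in for a trained model
            return p**2 / 2 + q**2 / 2

        def field(q, p, h=1e-6):  # (qdot, pdot) via centered differences
            dHdq = (H(q + h, p) - H(q - h, p)) / (2 * h)
            dHdp = (H(q, p + h) - H(q, p - h)) / (2 * h)
            return dHdp, -dHdq

        print(field(0.3, 0.7))  # about (0.7, -0.3): circulation in phase space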

    Physics enhanced neural networks learn order and chaos, A. Choudhary, J. F. Lindner, E. G. Holliday, S. T. Miller, S. Sinha, W. L. Ditto, Physical Review E, volume 101, pages 062207(1-8) (June 2020) doi.org/10.1103/PhysRevE.101.062207

    The scaling of physics-informed machine learning with data and dimensions, S. T. Miller, J. F. Lindner, A. Choudhary, S. Sinha, W. L. Ditto, Chaos, Solitons & Fractals: X, volume 5, pages 100046(1-7) (2020) doi.org/10.1016/j.csfx.2020.100046

    Forecasting Hamiltonian dynamics without canonical coordinates, A. Choudhary, J. F. Lindner, E. G. Holliday, S. T. Miller, S. Sinha, W. L. Ditto, Nonlinear Dynamics, volume 103 number 2, pages 1553-1562 (2021) doi.org/10.1007/s11071-020-06185-2

    Negotiating the separatrix with machine learning, S. T. Miller, J. F. Lindner, A. Choudhary, S. Sinha, W. L. Ditto, Nonlinear Theory and Its Applications, volume 12, number 2, pages 1-9 (April 2021) doi.org/10.1587/nolta.12.134

  • Extreme SI Prefixes

    In the spring of 2020, on my NC State sabbatical, during the initial lock-down for the worst pandemic of my lifetime, I stayed busy in part by writing a text called g = 2: A Gentle Introduction to Relativistic Quantum Mechanics. In the text, I tried using SI prefixes to simply express small quantities but was disappointed that they were inadequate. Not until recently did I notice that four new extreme prefixes were added to the Système international d’unités in 2022, as included in the table below.

    quecto   q    10^{-30}        quetta   Q    10^{30}
    ronto    r    10^{-27}        ronna    R    10^{27}
    yocto    y    10^{-24}        yotta    Y    10^{24}
    zepto    z    10^{-21}        zetta    Z    10^{21}
    atto     a    10^{-18}        exa      E    10^{18}
    femto    f    10^{-15}        peta     P    10^{15}
    pico     p    10^{-12}        tera     T    10^{12}
    nano     n    10^{-9}         giga     G    10^{9}
    micro    μ    10^{-6}         mega     M    10^{6}
    milli    m    10^{-3}         kilo     k    10^{3}
    centi    c    10^{-2}         hecto    h    10^{2}
    deci     d    10^{-1}         deca     da   10^{1}

    I am now pleased to report the following data, to 3 significant figures.

    Earth’s mass is 5.97 ronnagrams,

    M_E = 5.97~\text{Rg} = 5.97\times 10^{24}~\text{kg}.

    An electron’s mass is 911 quectograms,

    m_e = 911~\text{qg} = 9.11\times 10^{-31}~\text{kg}.

    An elementary charge is 160 zeptocoulombs,

    e = 160~\text{zC} = 1.60\times 10^{-19}~\text{C}.

    Light speed is 300 meters per microsecond,

    c = 300~\text{m}/\text{μs} = 3.00\times 10^{8}~\text{m}/\text{s}.

    The quantum of action is 105 yoctojoules per terahertz,

    \hbar = 105~\text{yJ}/\text{THz} =1.05\times 10^{-34}~\text{J}/\text{Hz}.

    Vacuum permittivity is 8.85 microcoulombs squared per joule per meter,

    \epsilon_0 = 8.85~\text{μC}^2/(\text{J}~\text{m} )= 8.85\times 10^{-12}~\text{C}^2/(\text{J}~\text{m}).

    Thus, the fine structure constant

    \alpha = \frac{e^2}{4\pi\epsilon_0 \hbar c} = \frac{(160\,\text{zC})^2}{4\pi\, 8.85\frac{\text{μC}^2}{\text{J}\,\text{m} } 105\frac{\text{yJ}}{\text{THz}}  300\frac{\text{m}}{\text{μs}} } \approx \frac{1}{137}.
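
    In plain Python, the prefixed values above indeed reproduce \alpha (a quick check):

        # Fine structure constant from the prefixed SI values quoted above.
        from math import pi

        e = 160e-21            # 160 zC in C
        eps0 = 8.85e-12        # 8.85 (uC)^2/(J m) in C^2/(J m)
        hbar = 105e-24 / 1e12  # 105 yJ/THz in J/Hz = J s
        c = 300 / 1e-6         # 300 m/us in m/s

        alpha = e**2 / (4 * pi * eps0 * hbar * c)
        print(alpha, 1 / alpha)  # about 0.00730, that is, 1/137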

  • Wooster Physicists

    I recently discovered that the College’s yearbooks, The Index, are now online, and I spent several days extracting some physics history, supplemented by the Alumni Catalogue 1870-1925 and several Annual Catalogues, also online, as well as the Physics Department’s web site, which for many years I helped build and maintain. I cross-checked the online information against an eBay paper copy of the Alumni Directory 2000 and attempted to resolve inconsistencies. The result is the Wooster Physics Faculty Timeline below. (Click or press for a much larger version.) I hope to continue improving it, and I welcome corrections, additions, and suggestions.

    Wooster physics timeline

    Beneath the timeline is an approximate plot of the effective number of physics faculty versus academic year, with half-filled squares counting as 1/2 (usually because of semester leaves, administrative work, or split positions) and empty squares as 0 (usually because of yearlong leaves).

    History is hard, and I sometimes struggled to decide who to include as “physics faculty”. The timeline omits Lyman Knight and other instructors of science in the Wooster Preparatory School, which prepared students for the University of Wooster’s liberal arts college during 1872-1918. It also does not include lab assistants or adjuncts. For its first 70 years, starting with Samuel Kirkwood, one of the College’s mathematicians also taught astronomy part-time, presumably at the Wooster Observatory, but these faculty are likewise omitted.

    Among the 79 included faculty, the timeline (but not the number plot) does feature some nineteenth-century natural scientists, including Orange Stoddard (who was recruited from Miami of Ohio in 1870 when Wooster welcomed its first class). Other faculty include the American Physical Society’s first president Henry Rowland (who supervised the discovery of the Hall effect at JHU), Wooster graduate Karl Compton (who became president of MIT), Chia-Hua Chang, Stanley Shepherd, and Shila Garg.

    Some Wooster physics faculty photos

  • Growing Neural Networks

    Artificial neural networks are increasingly important in society, technology, and science, and they are increasingly large and energy hungry. Indeed, the escalating energy footprint of large-scale computing is a growing economic and societal burden. Must we always use brute force, or can we get by with less?

    I just co-authored an article in Proceedings of the Royal Society A describing feed-forward neural networks that are small and dynamic, whose nodes can be added (or subtracted) during training. A single neuronal weight in the network controls the network’s size, while the weight itself is optimized by the same gradient-descent algorithm that optimizes the network’s other weights and biases, but with a size-dependent objective or loss function. We trained and evaluated such Nimble Neural Networks on nonlinear regression and classification tasks where they outperform the corresponding static networks, partly because the growing networks have fewer local minima to frustrate the gradient descent.

    After explaining conventional neural networks, I provide here an example of one of our growing neural networks.

    Conventional Neural Networks

    Imagine feed-forward neural networks as interconnected nodes organized in layers, with an input layer, one or more hidden layers, and an output layer. The neurons’ nonlinear activation functions \sigma act on linear combinations of their inputs with the vectorized nested structure

    \hat y(x) \overset{\text{vec}}{=} \cdots w^3\sigma(w^2\sigma(w^1 x+b^1 )+b^2)+b^3 \cdots,

    where w^l and b^l are the weight matrices and bias vectors of layer l. The weights and biases are free parameters that are tuned during the optimization process.

    For example, a neural network with 1 input, 1 output, and a single hidden layer of 3 neurons outputs

    \begin{aligned}\hat y(x) = &\color{blue}+w_{11}^2\,\color{red} \sigma\left(w_{11}^1 x + b_1^1 \right) \\ &\color{blue}+w_{12}^2\,\color{red} \sigma \left(w_{21}^1 x + b_2^1 \right) \\ &\color{blue} +w_{13}^2\,\color{red} \sigma \left(w_{31}^1 x + b_3^1 \right) \color{blue}+b_1^2,\end{aligned}

    where the weights and biases w^l_{nm} and b^l_n are real numbers. If \sigma(x) = \tanh(x), the special case

    \hat y(x) = \sigma(x-6)-\sigma(x-8)

    generates the blip of height 2\tanh(1)\approx 1.5 centered at x = 7, and combining multiple such blips at different locations with different heights can approximate any reasonable function arbitrarily well, hence the expressive power of neural networks.
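
    A quick numpy check of the blip (a sketch):

        # The blip tanh(x-6) - tanh(x-8): a bump of height ~1.5 near x = 7.
        import numpy as np

        x = np.linspace(0, 14, 1401)
        blip = np.tanh(x - 6) - np.tanh(x - 8)
        print(blip.max(), 2 * np.tanh(1))  # both about 1.52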

    An objective or error function, often called a cost or loss function (perhaps from financial applications), \mathcal{L}(x_n,y_n,\hat y(x_n)), quantifies the performance of the network, where \{x_n,y_n\} are the training data. Training attempts to minimize the loss function by repeatedly adjusting the network’s weights and biases. Gradient descent of a neural network weight w can be modeled by a tiny particle of mass m at position x sliding with viscosity \gamma down a potential energy surface V(x), where Newton’s laws imply

    m \ddot x = F_x = -\frac{dV}{dx}-\gamma \dot x.

    Assuming the inertial term is negligible, an Euler update is

    x \leftarrow x + dx = x + \dot x\, dt \sim x- \frac{1}{\gamma} \frac{dV}{dx} dt.

    With position x = w, height V = \mathcal{L}, and learning rate \eta = dt/\gamma, a neural network weight evolves like

    w \leftarrow w-\eta \frac{\partial \mathcal{L}}{\partial w},

    and similarly for a bias. While such gradient descent is not guaranteed to find a loss global minimum, it often finds good local minima.
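
    Here is a minimal numpy sketch (an illustration, not our published code) that trains the 1-input, 3-hidden-neuron, 1-output tanh network above by exactly this gradient-descent rule, with hand-coded gradients of the mean-square error, on toy data y = sin x:

        # Gradient descent w <- w - eta dL/dw for a 1-3-1 tanh network.
        import numpy as np

        rng = np.random.default_rng(0)
        w1, b1 = rng.normal(size=3), np.zeros(3)  # hidden weights and biases
        w2, b2 = rng.normal(size=3), 0.0          # output weights and bias

        x = np.linspace(-2, 2, 41)
        y = np.sin(x)      # toy training data {x_n, y_n}
        eta = 0.05         # learning rate

        for epoch in range(2000):
            h = np.tanh(np.outer(x, w1) + b1)  # hidden activations, (41, 3)
            r = h @ w2 + b2 - y                # residuals yhat - y
            # backpropagated gradients of L = mean(r^2)
            gw2, gb2 = 2 * h.T @ r / len(x), 2 * r.mean()
            gh = 2 * np.outer(r, w2) * (1 - h**2) / len(x)
            gw1, gb1 = gh.T @ x, gh.sum(axis=0)
            w1, b1 = w1 - eta * gw1, b1 - eta * gb1
            w2, b2 = w2 - eta * gw2, b2 - eta * gb2

        print('final loss:',
              np.mean((np.tanh(np.outer(x, w1) + b1) @ w2 + b2 - y)**2))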

    Growing Neural Networks

    A size-dependent loss function can drive the network size via gradient descent if the size is identified with an auxiliary weight. An identity activation function with a zero bias converts prepended 1s to the leading weight w^0_{11} = N, identified with the network size, which gradient descent naturally adjusts along with the other weights and biases.

    To illustrate the auxiliary-weight algorithm, implement a network with a single hidden layer of up to 9 neurons, with input \mathcal{I}=\{1,x\} and output \mathcal{O} = \hat y, by

    \begin{aligned}\hat y(x) = &\color{blue}+w_{11}^2 \color{teal}\sigma_1\big( w_{11}^1 \color{red} I(w_{11}^0 \color{black}1\color{red}+0)\color{teal}+ w_{12}^1 \color{black}x\color{teal}+ b_1^1\big) \\ &\color{blue}+w_{12}^2 \color{teal}\sigma_2\big( w_{21}^1 \color{red} I(w_{11}^0 \color{black}1\color{red}+0)\color{teal}+ w_{22}^1 \color{black}x\color{teal}+ b_2^1\big) \\ &\color{blue}+\cdots \\ &\color{blue}+w_{19}^2 \color{teal}\sigma_9\big( w_{91}^1 \color{red} I(w_{11}^0 \color{black}1\color{red}+0)\color{teal}+ w_{92}^1 \color{black}x\color{teal}+ b_9^1\big)\color{blue}+b_1^2 \\[0.5em] = &\color{blue}+w_{11}^2 \color{teal}\sigma_{0-N}\big( w_{11}^1 \color{red} N\color{teal}+ w_{12}^1 \color{black}x\color{teal}+ b_1^1\big) \\ &\color{blue}+w_{12}^2 \color{teal}\sigma_{1-N}\big( w_{21}^1 \color{red} N\color{teal}+ w_{22}^1 \color{black}x\color{teal}+ b_2^1\big) \\ &\color{blue}+\cdots \\ &\color{blue}+w_{19}^2 \color{teal}\sigma_{8-N}\big( w_{91}^1 \color{red} N\color{teal}+ w_{92}^1 \color{black}x\color{teal}+ b_9^1\big)\color{blue}+b_1^2,\end{aligned}

    where the possible activation functions

    \sigma_r(x) = \theta_r \sigma(x) = \theta_r \tanh(x),

    where the C^1 step (down) function

    \theta_r = \begin{cases} 1, &\phantom{-1\le} r<-1, \\ \sin^2\left(r \pi / 2\right), & -1\leq r\leq 0, \\ 0, &\phantom{-} 0<r \end{cases}

    effectively adds and deletes neurons from the network.

    Start with N=0 hidden neurons, so \hat y(x) = b^2_1, and choose a loss function

    \mathcal{L} = \mathcal{L}_0+\delta \mathcal{L},

    where the base loss varies as the mean-square error

    \mathcal{L}_0 = \frac{1}{\mathcal{N}}\sum_{n=1}^{\mathcal{N}} \big(y_n-\hat y(x_n)\big)^2 = \bigg\langle (y-\hat y)^2 \bigg\rangle,

    which vanishes for perfect agreement y = \hat y, and the size loss

    \delta \mathcal{L} = \lambda (N-N_\infty)^2,

    which encourages the network to grow to a final size of N = N_\infty hidden neurons. Update the weights and biases, including N = w^0_{11}, via gradient descent.
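
    The essential new ingredient is the gate \theta_r. Here is a short numpy sketch (an illustration, not our published code) of how the C^1 step converts a continuous size N into a roster of active neurons, taking N = 2.5 as an assumed snapshot mid-training:

        # The C^1 step gate theta_r that grows or prunes hidden neurons.
        import numpy as np

        def theta(r):
            return np.where(r < -1, 1.0,
                   np.where(r > 0, 0.0, np.sin(np.pi * r / 2)**2))

        N = 2.5                 # auxiliary size weight w^0_11, assumed value
        r = np.arange(9) - N    # gate arguments 0-N, 1-N, ..., 8-N
        print(theta(r))         # [1. 1. 0.5 0. ...]: two neurons on, one half on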

  • Logic with Nonlinear Maps

    In 1999, Bryan Prusha ’98 and I wrote an article for Physics Letters A illustrating why logic requires nonlinearity. Recently, with Bill Ditto, I revisited this theme by demonstrating how to encode all 16 binary boolean (true-false) functions in single iterations of a unimodal map, in an article just published in Physica D. These encodings may facilitate the next generation of dynamics-based computation.

    More specifically, we encode binary logic gates in single iterations of step functions of unimodal maps of affine transformations of inputs. Define a boolean function family from \{0,1\}\times\{0,1\}\rightarrow \{0,1\} by the composition

    g_{\vec p} = H \circ f\circ a_{\vec p},

    or equivalently

    g_{\vec p}\,(x,y) = H\bigg(f\Big(a_{\vec p}\,(x,y)\Big)\bigg),

    where

    H(f) = \mathbf{1}_{f\ge 0} = \left\{\begin{array}{ll} 0, & f < 0, \\ 1, & f\geq 0, \end{array}\right.

    is the Heaviside step function,

    f(z) = 4 z (1-z)

    is the nonlinear, unimodal, logistic map, and

    a_{\vec p}\,(x,y) = p_1 + p_2 x + p_3 y

    is an affine transformation of the boolean input variables (x,y) \in\{0,1\}^2 and a linear function of the real parameters \vec p = (p_1,p_2,p_3) \in [-1,1]^3.

    For example, if p_3 = 1/2, then there exist open sets of p_1, p_2 that encode all 6 common logic gates AND, OR, XOR, NAND, NOR, XNOR, as in the figure below.

    Logic encoding
    Parameters p_1,p_2 that realize 6 common binary logic gates for p_3=1/2.
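
    The figure’s regions can be recovered by a brute-force Python scan (a sketch, not the paper’s code) over p_1, p_2 at p_3 = 1/2:

        # Scan p1, p2 for the truth tables of the 6 common gates at p3 = 1/2.
        import numpy as np

        def gate(p1, p2, p3=0.5):
            table = []
            for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
                z = p1 + p2 * x + p3 * y                  # affine map a_p
                table.append(int(4 * z * (1 - z) >= 0))   # H(f(z))
            return tuple(table)

        names = {(0, 0, 0, 1): 'AND', (0, 1, 1, 1): 'OR', (0, 1, 1, 0): 'XOR',
                 (1, 1, 1, 0): 'NAND', (1, 0, 0, 0): 'NOR', (1, 0, 0, 1): 'XNOR'}
        found = {}
        for p1 in np.linspace(-1, 1, 401):
            for p2 in np.linspace(-1, 1, 401):
                name = names.get(gate(p1, p2))
                if name and name not in found:
                    found[name] = (round(p1, 3), round(p2, 3))
        print(found)  # one example (p1, p2) for each of the 6 gates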

    My favorite sentence in the article is, “Chaotic dynamics can obfuscate computation for increased security, but like Shakespeare’s engineer ‘Hoist with his own petard’, are chaogates subverted by chaos itself?” You don’t always get to quote Shakespeare in a physics article! (But you’ll need to read the article for the answer.)

  • Barred Warped Wobbly Spiral

    Advances in astronomy can rewrite even introductory textbooks. Although no spacecraft have yet exited our Milky Way galaxy to image it from the outside, the Gaia astrometry space telescope recently completed a dozen years of accurately measuring the positions, distances, and motions of billions of Milky Way stars from a Lissajous orbit about the Earth-Sun L_2 Lagrange point. This data confirmed the Milky Way’s barred, warped, wobbly, spiral disk shape, as in the reconstructions below. A collision with a smaller galaxy may have caused the warp and wobble.

    Milky Way Top View
    Gaia space telescope reconstruction of the Milky Way, top view. (ESA, Gaia, DPAC, Stefan Payne-Wardenaar.)

    Milky Way Side View
    Gaia space telescope reconstruction of the Milky Way, side view. (ESA, Gaia, DPAC, Stefan Payne-Wardenaar.)

  • Weak Prime Number Theorem

    As a child, I was inspired by Arthur C. Clarke’s 1956 science fiction novel The City and the Stars to search for patterns in prime numbers. Chapter 6 begins:

    Jeserac sat motionless within a whirlpool of numbers. The first thousand primes, expressed in the binary scale that had been used for all arithmetical operations since electronic computers were invented, marched in order before him. Endless ranks of 1’s and 0’s paraded past, bringing before Jeserac’s eyes the complete sequences of all those numbers that possessed no factors except themselves and unity. There was a mystery about the primes that had always fascinated Man, and they held his imagination still. Jeserac was no mathematician, though sometimes he liked to believe he was. All he could do was to search among the infinite array of primes for special relationships and rules which more talented men might incorporate in general laws. … He knew the laws of distribution that had already been discovered, but always hoped to discover more.

    The City and the Stars novel
    My 95-cent childhood copy of The City and the Stars.

    Indeed, I remember manually plotting the smallest primes on graph paper, searching for patterns. Today, for example, I know that the prime number theorem describes the asymptotic distribution of the primes, which become less common as they become larger. If \pi(x) is the number of primes less than or equal to x, then \pi(x) \sim x / \log x, for large x.

    Here I present an elementary proof of a weak form of the prime number theorem based on a proof by Jan-Hendrik Evertse, which itself was based on work by Paul Erdős.

    First recall the binomial theorem

    (a+b)^n = \sum_{m=0}^n \binom{n}{m}a^{n-m}b^m,

    where the binomial coefficients

    \binom{n}{m} = \frac{n!}{m!(n-m)!}.

    Also recall the floor function

    \lfloor x\rfloor =\max\{n\in \mathbb {Z} \mid n\leq x\},

    so

    x-1<\lfloor x \rfloor\le x.

    Write \log x = \log_e x for the natural logarithm, as is common in pure mathematics and theoretical physics.

    Introductory Lemmas

    Lemma A: Number of times a prime divides a factorial

    \nu_p(n!)=\sum_{m=1}^\infty \left\lfloor \frac{n}{p^m} \right\rfloor

    To count the number of times that prime p divides n!, note that each multiple of p\le n contributes a factor of p, each multiple of p^2\le n contributes another factor of p, each multiple of p^3\le n contributes yet another factor of p, and so on, so that

    \nu_p(n!) = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots = \sum_{m=1}^\infty \left\lfloor \frac{n}{p^m} \right\rfloor,

    where all terms in the sum vanish when p^m > n.
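
    Lemma A is easy to check by brute force (a quick Python sketch):

        # Compare Legendre's formula with direct factorization of n!.
        from math import factorial

        def nu_p_factorial(p, n):  # sum of floor(n / p^m), Lemma A
            s, q = 0, p
            while q <= n:
                s += n // q
                q *= p
            return s

        def nu_p(p, m):  # multiplicity of prime p in integer m
            s = 0
            while m % p == 0:
                m //= p
                s += 1
            return s

        p, n = 3, 100
        print(nu_p_factorial(p, n), nu_p(p, factorial(n)))  # both 48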

    Lemma B: Prime product divides binomial coefficient

    0<2b\le a \Rightarrow \prod_{a-b+1\le p\le a} \hspace{-1em}p\hspace{1em} \Bigg| \binom{a}{b}

    Since

    \binom{a}{b} = \frac{a!}{b!(a-b)!} = \frac{a(a-1)\cdots (a-b+1)}{b(b-1) \cdots 1},

    each prime b<b+1\le a-b+1\le p\le a divides the numerator but not the denominator, and thus so does their product.

    Lemma C: Prime power divides binomial coefficient

    a > b > 0,\ p^m\Big| \binom{a}{b}\ \Rightarrow\ p^m \le a

    By Lemma A, the number of times a prime divides a binomial coefficient

    \begin{aligned}\nu_p\left(\binom{a}{b}\right)&=\nu_p\left(\frac{a!}{b!(a-b)!}\right)\\ &= \sum_{m=1}^\infty \left( \left\lfloor \frac{a}{p^m} \right\rfloor-\left\lfloor \frac{a-b}{p^m} \right\rfloor-\left\lfloor \frac{b}{p^m} \right\rfloor \right).\end{aligned}

    Each summand is a non-negative integer bounded by

    \begin{aligned}&\left\lfloor \frac{a}{p^m} \right\rfloor-\left\lfloor \frac{a-b}{p^m} \right\rfloor-\left\lfloor \frac{b}{p^m} \right\rfloor \\& < \frac{a}{p^m}-\left( \frac{a-b}{p^m}-1 \right)-\left( \frac{b}{p^m}-1\right)=2,\end{aligned}

    and so must be either 0 or 1. Furthermore, since each summand with p^m>a>b is 0, the sum can grow no larger than the largest m for which p^m \le a.

    Lemma D: Bounds on the symmetric binomial coefficient

    n\in\mathbb{N}_1\ \Rightarrow\ \frac{2^n}{n+1} \le \binom{n}{\lfloor n /2 \rfloor }\le 2^{n-1}

    For the lower bound, since the central binomial coefficient (corresponding to the middles of the rows of Pascal’s triangle) is largest,

    2^n=(1+1)^n=\sum_{m=0}^n\binom{n}{m}1^m 1^{n-m}\le(n+1) \binom{n}{\lfloor n/2 \rfloor}.

    For the upper bound, first let n=2k+1 be odd, so

    \binom{2k+1}{k}=\binom{2k+1}{k+1}

    and

    \begin{aligned}\binom{n}{\lfloor n/2 \rfloor} &=\binom{2k+1}{k}=\frac{1}{2}\left( \binom{2k+1}{k}+\binom{2k+1}{k+1} \right) \\& \le \frac{1}{2}\sum_{m=0}^{2k+1} \binom{2k+1}{m}=(1+1)^{2k}=2^{2k}=2^{n-1}.\end{aligned}

    Next let n=2k be even, so

    \binom{2k}{k-1}=\binom{2k}{k+1}=\binom{2k}{k}\frac{k}{k+1}\ge \frac{1}{2}\binom{2k}{k}

    and

    \begin{aligned}\binom{n}{\lfloor n/2 \rfloor} &=\binom{2k}{k}\le\frac{1}{2}\left( \binom{2k}{k-1}+\binom{2k}{k}+\binom{2k}{k+1} \right) \\& \le \frac{1}{2}\sum_{m=0}^{2k} \binom{2k}{m}=(1+1)^{2k-1}=2^{2k-1}=2^{n-1}.\end{aligned}

    Pascal's triangle.
    Pascal’s triangle and binomial coefficients.
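
    Both bounds of Lemma D are easy to confirm in integer arithmetic (a quick Python sketch):

        # Check 2^n/(n+1) <= C(n, floor(n/2)) <= 2^(n-1) for n = 1..199.
        from math import comb

        print(all(2**n <= (n + 1) * comb(n, n // 2)
                  and comb(n, n // 2) <= 2**(n - 1) for n in range(1, 200)))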

    Theorem

    The number of primes less than or equal to x\ge 3 is bounded above and below by multiples of x/\log x, specifically

    \frac{1}{2} \frac{x}{\log x} \le \pi(x) \le 2 \frac{x}{\log x}.

    Equivalently, in Donald Knuth’s notation, \pi(x) = \Theta(x / \log x), and in G. H. Hardy’s notation, \pi(x) \asymp x/\log x. (The stronger relation \pi(x) \sim x / \log x is harder to prove.)

    Bounding the prime counting function.
    Bounding the prime counting function.

    By direct calculation, both bounds hold for 3\le x\le 500, as in the above graph. Let n=\lfloor x \rfloor, so n\le x < n+1.
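
    That direct calculation is a short sieve away (a Python sketch checking the bounds at integer x):

        # Verify x/(2 log x) <= pi(x) <= 2x/log x for integers 3 <= x <= 500.
        from math import log

        N = 500
        sieve = [True] * (N + 1)
        sieve[0] = sieve[1] = False
        for i in range(2, int(N**0.5) + 1):
            if sieve[i]:
                sieve[i*i::i] = [False] * len(sieve[i*i::i])

        pi, count = [0] * (N + 1), 0
        for i in range(N + 1):
            count += sieve[i]
            pi[i] = count

        print(all(x / (2 * log(x)) <= pi[x] <= 2 * x / log(x)
                  for x in range(3, N + 1)))  # expect True, per the graph above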

    Lower bound

    Prime factor the central binomial coefficient

    \binom{n}{\lfloor n/2 \rfloor} = p_1^{k_1}\cdots p_t^{k_t}.

    By Lemma C, 1<p_m \le p_m^{k_m}\le n, and so p_m \le n for m=1, \ldots,t. Thus the number of prime factors t \le \pi(n), which combined with Lemma D gives

    \frac{2^n}{n+1} \le \binom{n}{\lfloor n/2 \rfloor} \le n^{\pi(n)},

    and so

    \begin{aligned}\pi(x)&=\pi(n)\ge \frac{n \log 2-\log(n+1)}{\log n}\\ & \ge \frac{(x-1)\log 2-\log(x+1)}{\log x}\\ & \ge \frac{1}{2}\frac{x}{\log x},\end{aligned}

    where the final inequality is by Appendix Lemma E.

    Upper bound

    Proceed by induction. If n=2m+1 is odd, then by Lemmas D and B,

    2^{2m} \ge \binom{2m+1}{m}\ge \prod_{m+2\le p \le 2m+1} \hspace{-1.8em}p\hspace{0.8em} \ge (m+2)^{\pi(2m+1)-\pi(m+1)},

    and so

    \pi(2m+1)-\pi(m+1)\le \frac{2m \log 2}{\log(m+2)}.

    Use the induction hypothesis on \pi(m+1) with m+2>m+1>n/2 to find

    \begin{aligned}\pi(n)=\pi(2m+1)&\le\frac{2m \log 2}{\log(m+2)}+\pi(m+1) \\ &\le \frac{2m \log 2}{\log(m+2)}+2\frac{m+1}{\log(m+1)} \\ & < \frac{2m\log 2 + 2m+2+\log 2}{\log(m+1)}\\ & < \frac{n(\log 2 + 1)+1}{\log(n/2)}\\ & < 2\frac{n}{\log n},\end{aligned}

    where the final inequality is by Appendix Lemma F. If n is even, then n-1 is odd and

    \pi(n)=\pi(n-1)\le 2\frac{n-1}{\log(n-1)}<2\frac{n}{\log n},

    as x/\log x is increasing for x>e.

    Appendix Lemmas

    Lemma E: For lower bound

    L(x)=\frac{(x-1)\log 2-\log(x+1)}{\log x}\ge \frac{1}{2}\frac{x}{\log x}=R(x)\Leftarrow x>100

    Left and right functions
    The left and right functions don’t intersect after x ≈ 19.1.

    If the difference

    \Delta(x) = L(x)-R(x),

    then the derivative

    \Delta^\prime(x) = N(x)/D(x),

    where

    N=N_0+N_1 x+N_2 x^2,

    and

    \begin{aligned}N_0 &= 2 \log 2 + 2\log(1+x)\\&>0 \Leftarrow x>-1/2,\\ N_1 &=1-3\log x+ 2\log 2\log x+2\log(1+x) \\ &> 1-3\log x+2\log 2\log x + 2\log x \\&= 1+(2\log 2-1)\log x\\&>(2\log 2-1)\log x\\&> 0 \Leftarrow x>1, \\N_2 &= 1-2\log 2-\log x + 2\log 2\log x\\&= (1-2\log 2)(1-\log x)\\&>0 \Leftarrow x>e,\\D& =2x(1+x)(\log x)^2\\&>0\Leftarrow x>1.\end{aligned}

    Hence, \Delta'(x) >0 for x>e, L(x) and R(x) continue separating after crossing at x\approx 19.1, and the lemma follows.

    Lemma F: For upper bound

    L(x)= \frac{x(\log 2 + 1)+1}{\log(x/2)} < 2\frac{x}{\log x}=R(x)\Leftarrow x>500

    Left and right functions
    The left and right functions don’t intersect after x ≈ 106.

    If the difference

    \Delta(x) = L(x)-R(x),

    then the derivative

    \Delta^\prime(x) = N(x)/D(x),

    where

    N=N_0+N_2 (\log x)^2,

    and

    \begin{aligned}N_0&=2\log 2\,(m_0\log x+b_0)\,x <0 \\ &\Leftrightarrow \log x > -\frac{b_0}{m_0}\Leftrightarrow x > \exp\left(-\frac{b_0}{m_0}\right) \Leftrightarrow x > \exp\left(\frac{1}{1+2/\log 2}\right)\approx 1.29,\\ N_2&=-1+(m_2\log x+b_2)\,x < 0 \\ &\Leftarrow \log x > -\frac{b_2}{m_2}=-\frac{1+2\log 2-(\log 2)^2}{\log 2-1} \\ &\Leftrightarrow x >\exp\left(\log 2-1-\frac{2}{\log 2-1}\right)\approx 498,\end{aligned}

    and

    D=x(\log x)^2(\log x-\log 2)^2>0 \Leftarrow x>2.

    Hence, generously, \Delta'(x) <0 for x>500, L(x) and R(x) continue separating after crossing at x\approx 106, and the lemma follows.
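
    The crossings quoted in Lemmas E and F are easy to locate numerically (a bisection sketch in Python):

        # Bisect L - R to find where the lemma bounds begin to hold.
        from math import log

        def LE(x): return ((x - 1) * log(2) - log(x + 1)) / log(x)
        def RE(x): return x / (2 * log(x))
        def LF(x): return (x * (log(2) + 1) + 1) / log(x / 2)
        def RF(x): return 2 * x / log(x)

        def crossing(f, g, lo, hi, steps=60):
            for _ in range(steps):
                mid = (lo + hi) / 2
                if (f(lo) - g(lo)) * (f(mid) - g(mid)) <= 0:
                    hi = mid
                else:
                    lo = mid
            return (lo + hi) / 2

        print(crossing(LE, RE, 3, 100))   # about 19.1 (Lemma E)
        print(crossing(LF, RF, 10, 500))  # about 106 (Lemma F)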
