Machine Learning Turns PDE Singularity Search Into Computer-Assisted Proof

Yixuan WangMicrosoft ResearchTuesday, May 26, 202613 min read

Caltech applied math PhD candidate Yixuan Wang argues that high-precision computation can make singularity questions in nonlinear PDEs tractable only when it is tied to stability analysis and rigorous verification. In a Microsoft Research seminar on Navier-Stokes blowup and weak-solution nonuniqueness, Wang presents machine-learning tools such as PINNs, neural operators, and Kolmogorov–Arnold Networks as ways to discover candidate singular structures, not as substitutes for proof. His broader case is that numerics, analytical a posteriori estimates, and interval-certified computation must work together if singularities in systems such as Navier-Stokes are to be identified and verified.

The Navier-Stokes question splits into blowup and weak continuation

Yixuan Wang framed the mathematical stakes of the three-dimensional incompressible Navier-Stokes equations as two related but distinct problems: whether smooth solutions can develop finite-time singularities, and whether physically admissible weak solutions are unique once singular behavior is allowed.

The equation under discussion was the incompressible velocity-pressure form:

u_{t} + u \cdot \nabla u = - \nabla p + ν Δ u, \nabla \cdot u = 0.

Wang called it “THE equation” for fluid flows, with applications including turbulence modeling, aerospace, weather forecasting, and blood flow. But the seminar stayed on the mathematical question. For nonlinear PDEs such as Navier-Stokes, explicit solutions are generally unavailable, so the basic questions become existence and uniqueness.

The Clay Millennium problem concerns global well-posedness versus finite-time blowup for smooth initial data in the whole space. In Wang’s formulation, a blowup of a quantity of interest such as velocity or vorticity means that its $L^{\infty}$ norm becomes unbounded as time approaches a finite terminal time:

t \to T^{-} lim sup ∥ f (t) ∥_{L^{\infty}} = \infty, T < + \infty.

He distinguished this from a second, related issue: uniqueness of Leray-Hopf weak solutions. Leray’s construction gives global weak solutions that satisfy the equation weakly and obey energy constraints, but uniqueness of such solutions had remained open since 1934. Wang said that he, Tom Hou, and Chenhen Yang established nonuniqueness of Leray-Hopf solutions to the unforced incompressible 3D Navier-Stokes equation by computer-assisted proof, emphasizing Yang’s role as “a driving force” in that project.

The conceptual picture Wang offered was careful rather than sweeping: smooth initial data produce a unique strong solution for some time; after singularity, if one relaxes to the Leray-Hopf weak class, nonunique weak continuations can occur. That is a continuation-after-singularity picture, not a settlement of the Clay problem and not a direct proof of physical post-singularity branching. The Clay question remains about strong classical solutions and whether they blow up in finite time.

The open-problems slide made the separation explicit. One line stated the Millennium Prize problem as “Global well-posedness or finite time blowup” for smooth data on whole space. Another identified “Uniqueness of Leray-Hopf (global) weak solutions” as a continuation-after-singularity question, with Wang, Hou, and Yang’s computer-assisted nonuniqueness result listed as its resolution.

Self-similarity makes singularity searchable, but it can also be the wrong ansatz

A central strategy in Wang’s work is to make singularity formation structured enough to compute. A general singularity can be too hard to locate or characterize directly, so mathematicians often search for self-similar blowup: a solution whose singular behavior is governed by a time-independent profile after rescaling space and amplitude.

The self-similar form Wang displayed was:

f (t, x) = (T - t)^{c_{f}} F (\frac{x}{( T - t ) ^{c_{l}}}) .

Here $T$ is the blowup time, $c_{l}$ is a blowup rate, and $F$ is the profile. Wang’s simple intuition was a family of rescaled Gaussians concentrating toward a Dirac mass: the profile is fixed, while the scale changes with time. The exponents encode how quickly the spatial and amplitude rescalings occur.

Wang situated this approach in a line of work on Euler and related models. He cited Hou and Luo’s 2013 numerical evidence for self-similar blowup in the 3D axisymmetric Euler equation with boundary, subsequent mathematical work by Elgindi and by Chen-Hou-Huang, and the 2022 rigorous proof by Jiajie Chen and Tom Hou of self-similar blowup for Euler in a boundary setting. He also mentioned his own 2024 work with Hou on a modified one-dimensional Hou-Li model intended to mimic interior blowup of Euler or Navier-Stokes.

But self-similarity is not merely a useful simplification; it can also be too restrictive. Wang’s slide stated that the original 3D Euler/Navier-Stokes equations “might not have self-similar singularity under assumptions,” citing work by Chae, Necas-Ruzicka-Sverak, and Hou. Wang’s conclusion was not that all singularity routes are ruled out, but that two questions become central once the standard ansatz is uncertain: how to propose a good ansatz for singularity, and how to compute it accurately enough that analysis can follow.

Numerics become proof only after the problem is turned into an a posteriori stability estimate

Wang’s account of computer-assisted proof was deliberately modular. The numerical computation does not by itself prove a singularity. It provides an approximate profile with very small residual. The proof then shows that a true profile exists near that approximate object and that the relevant dynamics are stable in the profile variables.

The punchline is that we transform the goal of such a priori estimate that you need to show the profile exist. And with the numerics, it sort of help us to translate that task into an a posteriori type estimate.

Yixuan Wang · Source

Instead of proving from scratch that a profile exists, one computes a high-precision candidate and proves stability around it. The governing estimate has the schematic form:

\partial_{t} E \leq - c E + ϵ + C E^{2} .

In this picture, $E$ is an energy measuring the error or perturbation, $- c E$ is the stabilizing linear contribution, $C E^{2}$ is the nonlinear term, and $ϵ$ is the numerical residual. If the residual is sufficiently small, the estimate can be closed, for example by solving the associated quadratic inequality to obtain a uniform bound.

Step	Role in the proof pipeline
High-precision approximate profile	Compute a candidate profile with a small residual.
Stability analysis	Show nearby data remain controlled in the profile equation; Wang called this the hardest, mostly analytical part.
Energy estimate	Close an inequality of the form partial_t E <= -cE + epsilon + C E^2 when the residual is sufficiently small.
Interval verification	Replace floating-point quantities by upper and lower interval bounds and certify the finite computations.

Wang’s slide presented computer-assisted proof as a pipeline from numerical profile discovery to interval-certified estimates.

The last step is rigorous numerical verification. Since the approximate profile may contain thousands or tens of thousands of basis coefficients, Wang said the computation must be represented using interval arithmetic: each floating-point value is replaced by an upper and lower bound, and all verification is performed on intervals rather than trusted floating-point outputs.

The nonuniqueness proof for Leray-Hopf solutions used the same general philosophy. Earlier work had connected nonuniqueness to forward self-similar solutions with unstable eigenvalues. Wang said the project required a sophisticated numerical search using finite elements and spectral methods, including substantial effort to build divergence-free spectral bases. The proof then used a perturbative argument: linearized operators were decomposed into a coercive stable part and a compact part, with finite-rank approximations used in the verification. The rigorous step again relied on interval arithmetic.

The division of labor was clear: numerics produce a viable candidate; analysis supplies the stability mechanism; interval algorithms certify the finite computations needed to make the argument rigorous.

Machine learning enters as high-precision search, not as a substitute for analysis

Wang’s motivation for machine learning was not that neural networks replace PDE analysis. It was that the profiles of interest are often not explicitly computable, and in some cases even accurate candidates are unstable under time-marching. The search problem has the flavor of an inverse problem: the blowup law, profile, and scaling may all have to be inferred.

He described several machine-learning directions in that context: Kolmogorov-Arnold Networks for symbolic discovery, neural operators for learning families of solutions, and high-precision physics-informed neural networks for singularity computation.

KANs came from a symbolic-search motivation. Wang said the architecture was inspired by the Kolmogorov-Arnold representation theorem: instead of using fixed activation functions as in standard multilayer perceptrons, KANs make nonlinearities learnable, parameterizing one-dimensional functions through trainable coefficients. He described this as combining a basis-function viewpoint with the connectionist structure of neural networks, making the models more interpretable and sometimes giving better scaling behavior.

The slide on KANs described the work as an “interpretable and accurate alternative,” noting broad attention since the May 2024 preprint release, including more than 3,000 citations and 16,000 GitHub stars. Wang also listed follow-up work on KAN 2.0, expressiveness and spectral bias, initialization schemes, and a Kolmogorov-Arnold neural operator.

But Wang did not present KANs as the only route to high-precision PDE computation. He separated several ingredients that matter when neural networks are used to solve singularity problems: architecture, hard physical constraints, optimizers, and auxiliary techniques such as boosting.

Hard constraints encode physical or structural information directly into the model: parity, far-field vanishing conditions, nondegeneracy conditions, and other constraints implied by the equation or ansatz. Wang argued that these inductive biases are important because the task differs sharply from computer vision or language modeling. PDE losses often contain derivatives, and backpropagation through those losses requires higher-order derivative calculations, making both optimization and formulation more delicate.

Optimizers were another major emphasis. Wang said stochastic first-order methods such as Adam may not be ideal when the goal is to drive a residual to high precision with moderate batch sizes. His own work used a self-scaled BFGS method, a second-order deterministic optimizer. But full second-order methods are difficult to scale, so he pointed to curvature-informed alternatives such as Shampoo, SOAP, and MUON. A current ICML submission with Anima Anandkumar’s group, he said, improves SOAP through adaptively self-scaled preconditioning and seems to offer a good tradeoff between accuracy and training speed.

10^-8

training accuracy Wang reported on a simple Burgers singularity example using Adam with hierarchical feature encoding

Wang also described ongoing work borrowing hierarchical feature encoding ideas from computer vision. On a simple Burgers singularity example, he said, that architecture reached $1 0^{- 8}$ training accuracy even with Adam. He treated this as evidence that architecture can matter substantially, while also stressing that the example was simple.

Continuation and dynamic rescaling are ways to avoid guessing the hardest profile outright

One proposed machine-learning strategy was to avoid solving the hardest equation directly at first. Wang described a continuation approach: if the target equation is represented as $L (u) = 0$ , one can introduce a family of modified equations $L^{α} (u) = 0$ and let $α \to 0$ , moving gradually toward the true Euler or Navier-Stokes equation. Alternatively, one can solve $L (u) = f$ over a distribution of right-hand sides with $f \to 0$ , learning a family of nearby solutions rather than a single instance.

The motivation is empirical and structural. Wang suggested that one can modify terms to make blowup more pronounced—for example, by suppressing terms that suppress singularity—then continue from the more robust blowup regime toward the original equation. This plays to the strength of neural operators and related models: learning not just one solution, but a family of solutions over equations, parameters, or forcing terms.

Wang’s analytical framework beyond self-similarity centered on dynamic rescaling of time and space. The point is to avoid assuming the exact asymptotic form in advance while still obtaining stability in a rescaled profile equation. The displayed example included a logarithmic correction:

f (t, x) = (T - t)^{c_{f}} F (\frac{x}{( T - t ) ∣ lo g ( T - t ) ∣}) .

This is not the standard polynomial self-similar scaling. Wang described the logarithmic correction as subtle: as $t$ approaches $T$ , the correction is smaller than the dominant scale but still asymptotically important. Identifying it numerically can be delicate.

The “Proofs beyond self-similarity” slide listed nonlinear heat, complex Ginzburg-Landau, and Keller-Segel equations as settings for this approach. It emphasized four properties: the method goes beyond self-similarity, gives a clear characterization of stability, aligns with numerical profile discovery, and is meant to be amenable to computer-assisted proofs even when there is no spectral information or explicit profile.

In response to Yuanqi Du’s question about how to understand the profile $F$ and local orthogonality, Wang explained that the ansatz still treats the physical variable as a rescaled profile, but the scaling law need not be a simple polynomial. The profile is independent of time to leading order, while the rescaling may include corrections such as the logarithmic factor. By working directly in the profile equation and enforcing stability through a carefully chosen rescaling—what he called local orthogonality—the method can infer the scaling law at the same time as it computes the singular profile.

That is why Wang described the method as both numerical discovery and proof mechanism. In Keller-Segel, he said, the singularity is not unconditionally stable; there are unstable directions, and the method can characterize them precisely.

The broader advantage, in Wang’s view, is that the proof does not require too much detailed information about the approximate profile. In some cases one may know theoretically that a profile exists without knowing it explicitly. For Navier-Stokes, the situation is harder: one may not know whether the desired profile exists at all. A non-intrusive stability framework that uses only limited information about a numerical candidate could therefore be more amenable to computer-assisted proof.

Scaling PINNs changes what overfitting means

Du pressed Wang on whether optimizer tuning is less concerning in this setting because the goal is existence: if one finds a candidate and verifies it, does it matter whether the optimizer was specially tuned? Wang agreed with that framing, but used the question to distinguish PDE-solving neural networks from ordinary supervised learning.

In physics-informed neural networks, there is often no labeled ground truth. The loss is evaluated at collocation points where the PDE residual is computed. “Overfitting” therefore means something different: the model may fit the sampled collocation points without satisfying the equation across the full domain. Since only finitely many points are sampled at each step, the hope is that resampling over enough epochs leads to generalization across the domain. Wang said the consensus and intuition are yes, but proving it is another matter.

He offered an analogy with traditional numerical methods. In many traditional discretizations, the trial and test spaces are closely tied, and increasing resolution increases the dimension of a system that may require expensive matrix inversions, with costs scaling like $N^{3}$ in the simplified comparison he gave. In neural-network methods, the trial space is the class of functions representable by the architecture, potentially large because of compositional nonlinearities, while the test space is the set of sampled points. Training samples modest sets of test points repeatedly, rather than solving one large coupled discretized system at each step.

Wang compared this to a powerful generator tested by an ensemble of modest discriminators. The hope is that the result can be a faithful candidate, but the right way to scale remains unclear. Should one scale architecture size, training time, sample size, residual connections, Transformer-like structures, or sampling strategy? Wang said that, in academia, people often report that scaling architecture is not beneficial for PINNs, but he found that counterintuitive in light of broader AI scaling trends.

He was explicit that industrial simulation experts often still prefer traditional methods because they are more faithful and tailored to domain needs such as bridge design or oil extraction. Neural-network methods are more versatile and adaptable across equations, but the high-precision regime remains underdeveloped.

More general ansätze widen the search while making verification harder

Du also asked whether one could avoid committing to a self-similar form and instead let Monte Carlo sampling or a flexible neural representation absorb the more complicated singular behavior. Wang’s response was cautious. A more general representation can search for more general similarities, but he said he would avoid directly simulating something infinite. His preference is to separate the singular part from the regular computation: choose a good ansatz, normalize the output in that ansatz, and then solve for a controlled profile.

That philosophy also applied to traveling-wave and multiple-scale singularities. In the basic self-similar ansatz, the singularity can be thought of as centered at the origin after translation. A traveling-wave-type singularity has a moving center, effectively replacing $x$ with something like $x$ minus a traveling center. That introduces another scale or degree of freedom. Wang noted that interacting structures, such as two soliton-like objects, could introduce still more scales.

A more general ansatz may make singularities easier to represent if they are actually present, but it makes numerical optimization harder. More variables create more constraints, and machine-learning formulations already require balancing multiple loss terms from equations and side conditions. Wang identified multiple-scale and traveling-wave singularities as a future direction, not as a solved extension of the current framework.

Euler with boundary is a real breakthrough, but not the Clay problem

In the final exchange, Du asked about progress toward the Millennium Prize problem. Wang clarified that the 1934-era result concerns weak solutions, while the Clay problem concerns strong classical solutions. Strong solutions are known to exist for short time, but whether they develop singularities in finite time remains open.

Du then asked about the 2022 breakthrough by Jiajie Chen and Tom Hou. Wang explained that it established finite-time singularity for the Euler equation, not Navier-Stokes, and in the presence of a boundary. The boundary plays an important role in the formation of the singularity. In that regime, Wang said, one has finite-time singularity for strong solutions.

That still leaves the central Navier-Stokes problem: no boundary, and positive viscosity. Du noted that in many settings viscosity helps stability, as in diffusion-like equations. Wang agreed. Precisely because viscosity suppresses singularity, it makes blowup harder to find. He speculated—carefully labeling it as speculation—that Euler singularity in the whole space may be more realistic to tackle than Navier-Stokes blowup, but he also said that such a result would not by itself solve the Clay problem because the prize statement is specifically about Navier-Stokes.

AI Research Methods