The constraint topology that transforms discrete selection from optimization-dependent exploration into systematic mathematical cartography
Most neural networks struggle with basic arithmetic. They approximate, they fail to extrapolate, and they're inconsistent. But what if there were a way to make them systematically reliable at discrete selection tasks? And is neural arithmetic, as we know it, a discrete selection task?
When understood and used properly, the constraint W = tanh(Ŵ) ⊙ σ(M̂) (introduced in NALU by Trask et al., 2018) creates a unique parameter topology in which optimal weights for discrete operations can be calculated rather than learned. During training, the constrained weights converge toward these optimal solutions with remarkable speed and reliability.
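To make the constraint concrete, here is a minimal sketch in NumPy (the function name `hill_weight` is mine, not from the paper). It shows how the product of a tanh and a sigmoid bounds each effective weight, and how saturated parameters snap to the discrete values −1, 0, and +1:

```python
import numpy as np

def hill_weight(w_hat, m_hat):
    # W = tanh(W_hat) * sigmoid(M_hat): tanh bounds the value to (-1, 1),
    # while the sigmoid acts as a gate in (0, 1).
    return np.tanh(w_hat) * (1.0 / (1.0 + np.exp(-m_hat)))

# At saturation, W snaps to the discrete values {-1, 0, +1}:
print(hill_weight(20.0, 20.0))   # ≈ +1 (selected, positive sign)
print(hill_weight(-20.0, 20.0))  # ≈ -1 (selected, negative sign)
print(hill_weight(0.0, -20.0))   # =  0 (gate closed: deselected)
```

Because both tanh and sigmoid flatten out far from zero, large regions of (Ŵ, M̂) space map to essentially the same discrete weight, which is exactly the plateau structure discussed below.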
It's difficult to imagine that neural arithmetic has such a simple solution. Play with these widgets to see how setting just a few weights to specific values creates reliable mathematical operations. Each primitive demonstrates machine-precision mathematics through discrete selection.
How matrix multiplication with specific weights performs mathematical operations
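The additive case can be sketched in a few lines (an illustration of the idea, not the widget's implementation): a plain matrix product y = Wx with weights drawn from {−1, 0, 1} selects a signed subset of the inputs, so addition and subtraction are exact to machine precision.

```python
import numpy as np

# Additive primitive: y = W @ x. With W restricted to {-1, 0, 1},
# each output row selects a signed subset of the inputs, exactly.
x = np.array([3.5, 1.25])
W_add = np.array([[1.0, 1.0]])   # selects a + b
W_sub = np.array([[1.0, -1.0]])  # selects a - b
print(W_add @ x)  # [4.75]
print(W_sub @ x)  # [2.25]
```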
How exponential primitives with specific weights perform multiplicative operations
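A hedged sketch of the multiplicative idea (assuming positive inputs; real implementations typically guard against non-positive values): computing in log space, y = exp(W · log x), turns the same discrete weight selection into products and quotients.

```python
import numpy as np

# Multiplicative primitive: compute in log space, y = exp(W @ log(x)).
# The same discrete weights now select products and quotients.
x = np.array([6.0, 3.0])          # assumes strictly positive inputs
W_mul = np.array([[1.0, 1.0]])    # exp(log a + log b) = a * b
W_div = np.array([[1.0, -1.0]])   # exp(log a - log b) = a / b
print(np.exp(W_mul @ np.log(x)))  # ≈ [18.]
print(np.exp(W_div @ np.log(x)))  # ≈ [2.]
```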
How projecting inputs onto the unit circle allows for trigonometric operations
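As an illustrative sketch of the unit-circle idea (this assumed feature map, and the name `unit_circle`, are mine, not necessarily the widget's exact construction): mapping a scalar input t to the point (cos t, sin t) places it on the unit circle, after which discrete selection weights pick out individual trigonometric values.

```python
import numpy as np

# Illustrative sketch: project a scalar input t onto the unit circle
# as the point (cos t, sin t).
def unit_circle(t):
    return np.array([np.cos(t), np.sin(t)])

p = unit_circle(np.pi / 3)
print(np.hypot(p[0], p[1]))  # 1.0 — the point lies on the unit circle

# Discrete selection weights pick out individual trig values:
W_cos = np.array([1.0, 0.0])
W_sin = np.array([0.0, 1.0])
print(W_cos @ p, W_sin @ p)  # cos(pi/3) ≈ 0.5, sin(pi/3) ≈ 0.866
```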
How four fundamental trigonometric products enable trigonometric operations
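Taking the four fundamental products to be sin a·sin b, sin a·cos b, cos a·sin b, and cos a·cos b (an assumption on my part; the ordering and helper name `trig_products` are illustrative), discrete weights in {−1, 0, 1} over these features recover the classical angle-sum identities exactly:

```python
import numpy as np

def trig_products(a, b):
    # The four fundamental products (assumed ordering):
    # [sin a sin b, sin a cos b, cos a sin b, cos a cos b]
    return np.array([np.sin(a) * np.sin(b), np.sin(a) * np.cos(b),
                     np.cos(a) * np.sin(b), np.cos(a) * np.cos(b)])

a, b = 0.7, 0.4
f = trig_products(a, b)

# Discrete weights select the classical identities:
W_sin_sum = np.array([0.0, 1.0, 1.0, 0.0])   # sin(a+b) = sin a cos b + cos a sin b
W_cos_sum = np.array([-1.0, 0.0, 0.0, 1.0])  # cos(a+b) = cos a cos b - sin a sin b
print(W_sin_sum @ f - np.sin(a + b))  # ≈ 0
print(W_cos_sum @ f - np.cos(a + b))  # ≈ 0
```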
Now that you've seen discrete weight configurations producing perfect mathematics, let's understand why this is remarkable. There's a fundamental tension between what neural network optimizers do naturally and what discrete selection requires.
Mathematical operations require specific weight values:
Stable operations emerge from specific weight configurations.
Gradient descent learns unbounded weights:
Optimizers need freedom to follow gradients anywhere.
Hill Space, the constraint topology created by W = tanh(Ŵ) ⊙ σ(M̂), maps the unbounded learned parameters Ŵ and M̂ to weights in the [-1, 1] range, where stable plateaus naturally guide optimization toward discrete selections.
Optimizers learn any values they need: -47.2, 156.8, 0.001
tanh bounds to [-1,1], sigmoid provides gating
Maps to [-1,1] range, naturally converging toward discrete selections
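The resolution of this tension can be sketched with a minimal training loop (an assumed setup with hand-derived gradients, not the paper's training code): gradient descent runs freely on the unbounded parameters Ŵ and M̂, yet the constrained weights settle onto the discrete solution, here W* = [1, −1] for subtraction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal sketch: learn subtraction y = a - b. The optimal constrained
# weights are the discrete pair W* = [1, -1].
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(64, 2))
y = X[:, 0] - X[:, 1]

w_hat = np.full(2, 0.1)   # unconstrained parameters, free to grow anywhere
m_hat = np.full(2, 0.1)
lr = 0.1

for _ in range(2000):
    tw, sm = np.tanh(w_hat), sigmoid(m_hat)
    W = tw * sm                                # constrained weights in [-1, 1]
    err = X @ W - y
    grad_W = 2.0 * (X * err[:, None]).mean(axis=0)
    # Chain rule through the constraint:
    w_hat -= lr * grad_W * (1.0 - tw**2) * sm
    m_hat -= lr * grad_W * tw * sm * (1.0 - sm)

print(np.tanh(w_hat) * sigmoid(m_hat))  # approaches [1, -1]
```

Note that Ŵ and M̂ drift to large magnitudes, exactly as the left column describes, while the constrained W they produce saturates onto the discrete selection.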
Dive into the paper for a detailed explanation of the Hill Space learning dynamics, a systematic framework for exploring new primitives and spaces, comprehensive experiments, and implementation details.