
Drawing from a diverse array of mathematical theories and computer science principles, machine learning algorithms are not just changing entire industries and fields; they're revolutionising our approach to problem-solving itself. In this op-ed, Anil Ananthaswamy, an award-winning science writer and acclaimed author of “Why Machines Learn: The Elegant Math Behind Modern AI”, challenges the oversimplified narrative that reduces ML to "glorified statistics". Instead, he sheds light on the intricate mathematical foundations that elevate ML far beyond traditional statistical methods.





Machine learning is not “just” statistics



Guest Author: Anil Ananthaswamy

August 14, 2024


“Once you encounter all these algorithms, methods and models, you’ll be hard-pressed to dismiss machine learning as just glorified statistics.”



You might have heard arguments that machine learning is nothing but glorified statistics. I beg to differ. My perspective comes from being a former software engineer, and from having researched and written my book, Why Machines Learn: The Elegant Math Behind Modern AI. For the book, I had to relearn coding after a 20-year hiatus. Two decades ago, I was a distributed-systems software engineer, in the pre-ML/AI days. As I learned Python and ML, I was intrigued by the change in thinking that machine learning demands when it comes to solving problems: ML-based techniques are distinctly different from non-ML methods.


From a software engineering perspective, you have to flip your instincts about how to solve problems: from thinking algorithmically to learning how to pose questions of the data you have in hand, and using machine learning to, well, learn the model that represents the data, which can then be used for inference, prediction or generation. You have to learn how to see data differently: as a repository of answers to your questions.


Machine learning-based techniques are not just statistics. Yes, statisticians build sophisticated models of the patterns that exist in data and use these models to infer/predict. But ML is so much more than simply writing code to automate what statisticians do.


My hope is that Why Machines Learn will help those of us who know some high-school or first-year undergraduate math, and who maybe even did some old-style software engineering (and, of course, it should be of use to those coming to ML entirely untainted by the old ways), to appreciate the technological changes happening under our feet. I took what I felt was a representative sample of ML algorithms to illustrate the math that undergirds this new way of thinking, while also providing a somewhat curated historical account.


Let’s say you wanted to write a piece of software to recognise images of cats and dogs. In the non-ML way of thinking about this problem, you’d first need to identify the kinds of features you think are characteristic of cats and dogs. For example, features might focus on the shape and size of ears, the length and width of bodies, the shape of tails, and so on. Then you’d write software that recognises such features in images and, depending on the features found in any given image, tags the image as that of a cat or a dog. One can well imagine the intractable nature of the problem: you can never come up with an exhaustive list of features, nor can you anticipate the ways in which such features will be visible or occluded in any given image.
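
To make the brittleness concrete, here is a minimal sketch of such a hand-coded classifier in Python. The feature names and thresholds are invented purely for illustration, and extracting them reliably from raw pixels is precisely the part that makes the approach intractable.

```python
# A deliberately naive rule-based classifier of the kind described above.
# The features (ear_pointiness, body_length_cm, tail_curl) and thresholds
# are hypothetical; computing them from raw pixels is the hard part.

def classify_rule_based(features: dict) -> str:
    """Tag an image as 'cat' or 'dog' from hand-picked features."""
    score = 0
    if features.get("ear_pointiness", 0) > 0.7:   # pointier ears suggest a cat
        score += 1
    if features.get("body_length_cm", 0) > 60:    # longer bodies suggest a dog
        score -= 1
    if features.get("tail_curl", 0) > 0.5:        # curled tails are common in dogs
        score -= 1
    return "cat" if score > 0 else "dog"

print(classify_rule_based({"ear_pointiness": 0.9, "body_length_cm": 40, "tail_curl": 0.2}))
# -> cat
```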


But what if you had a large dataset of images of cats and dogs that had been annotated as such by humans? Well, then you could feed the images as inputs to a machine learning algorithm (such as an artificial neural network) and ask it to categorise each one as either a dog or a cat. Because the images are already annotated, the algorithm knows the correct answer. So, if the ML model makes a mistake, the algorithm modifies the model’s parameters such that the error it makes when given the same image again is reduced a little. And you keep doing this until the ML model makes minimal errors across all the images in the training dataset.
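
Here is a minimal sketch of that training loop, assuming the annotated images have already been flattened into a matrix `X` of pixel values with labels `y` (0 for cat, 1 for dog), and substituting simple logistic regression for a full neural network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.1, epochs=100):
    """Repeatedly nudge the parameters to reduce the error on the training set."""
    w = np.zeros(X.shape[1])              # model parameters, initially zero
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # the model's current predictions
        error = p - y                     # how wrong it is on each image
        w -= lr * (X.T @ error) / len(y)  # adjust parameters to shrink the error
        b -= lr * error.mean()
    return w, b
```

Given trained parameters `(w, b)`, a previously unseen image `x` is tagged by checking whether `sigmoid(x @ w + b)` exceeds 0.5, which is the inference step described next.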


Now, if you give the model a previously unseen image, it can in all likelihood tag it correctly as that of a dog or a cat. Internally, the model has, one hopes, figured out the relevant features that distinguish cats from dogs. Crucially, we didn’t have to identify such features a priori.


It’s true that the ML model is learning about the statistics of such patterns in the data; but it’s also true that to build the ML system, one has to go beyond simply thinking about the statistics.

In Why Machines Learn, we develop an intuition for ML-thinking, starting with Frank Rosenblatt’s perceptron algorithm (the first artificial neuron that learned, developed in 1958) and the Widrow-Hoff least mean squares (LMS) algorithm (1959), which can lay claim to being the true precursor of the “backpropagation” algorithm used today to train deep neural networks. There’s also Bayes’ theorem, and the Optimal and Naïve Bayes classifiers (you can't do ML without really appreciating the role probability and statistics play in it).
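
Rosenblatt’s learning rule is compact enough to sketch in full. This version assumes numeric feature vectors and labels coded as +1 and −1; if the classes are linearly separable, the loop is guaranteed to converge.

```python
import numpy as np

def train_perceptron(X, y, epochs=50):
    """Rosenblatt's perceptron: nudge the weights whenever a point is misclassified."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:  # misclassified (or on the boundary)
                w += y_i * x_i                   # move the boundary toward the point
                b += y_i
    return w, b
```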


Then there’s the seminal k-nearest neighbour algorithm. It’s a wonderful way to develop a sense for how data is represented in vector spaces, for the notion of similarity in Euclidean space, and for how all this falls apart when you move to high-dimensional spaces.
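
A minimal sketch of k-NN, assuming the training data lives in NumPy arrays: a new point is labelled by a majority vote among its k nearest training points.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest neighbours."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

In very high-dimensional spaces these distances tend to concentrate, with every point nearly equidistant from every other, which is exactly the failure mode alluded to above.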


One of the coolest tools one can borrow from statisticians is principal component analysis (PCA). It’s a way to appreciate the power of matrices, and how they can help bring high-dimensional data down to lower dimensions, for computational efficiency and easier visualisation, among other things. Once high-dimensional data is projected down to lower dimensions, one can use standard ML algorithms to learn about inherent patterns, say, for classification.
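
Here is a sketch of PCA via the eigenvectors of the covariance matrix, assuming the data sits in the rows of a NumPy array; it projects the data onto the top k principal components:

```python
import numpy as np

def pca(X, k=2):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                            # centre the data
    cov = np.cov(Xc, rowvar=False)                     # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigh: covariance is symmetric
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # directions of greatest variance
    return Xc @ top_k                                  # the low-dimensional projection
```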


Sometimes you have to move your data into higher dimensions to optimally classify it into different categories (à la support vector machines), but the algorithm has to stay grounded in lower dimensions for computational efficiency. This can be done using kernel methods, which are so much more than just statistics.
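
A sketch of that idea using the widely used RBF (Gaussian) kernel: the similarity it computes corresponds to an inner product in an implicit, infinite-dimensional feature space, yet the arithmetic never leaves the original low-dimensional one.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Pairwise RBF kernel values: inner products in an implicit feature space."""
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * (X @ X.T)  # squared Euclidean distances
    return np.exp(-gamma * sq_dists)                      # all an SVM needs to train
```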


And then, of course, there’s the revolution that's happened in artificial neural networks (ANNs) and deep learning. Some key ideas include the universal approximation theorem, the backpropagation algorithm and specific architectures of ANNs, such as convolutional neural networks for image classification. And let’s not forget Hopfield networks, which give us insights into dynamical networks that may one day rule the roost.
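
To see backpropagation as nothing more than the chain rule applied layer by layer, here is a sketch for a one-hidden-layer network with a squared-error loss; the tanh activation and the shapes are my choices for illustration.

```python
import numpy as np

def forward_backward(x, y, W1, W2):
    """One forward and backward pass through a tiny two-layer network."""
    h = np.tanh(W1 @ x)                    # hidden-layer activations
    y_hat = W2 @ h                         # network output
    loss = 0.5 * np.sum((y_hat - y) ** 2)  # squared-error loss
    # Backward pass: chain rule, from the output back to each weight matrix.
    d_y = y_hat - y                        # dL/dy_hat
    dW2 = np.outer(d_y, h)                 # gradient for the output weights
    d_h = (W2.T @ d_y) * (1 - h ** 2)      # backprop through tanh (tanh' = 1 - tanh^2)
    dW1 = np.outer(d_h, x)                 # gradient for the hidden weights
    return loss, dW1, dW2
```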


Once you encounter all these algorithms, methods and models, you’ll be hard-pressed to dismiss machine learning as just glorified statistics.
