A radical new technique lets AI learn with practically no data


Machine learning usually requires tons of examples. For an AI model to recognize a horse, you must show it thousands of horse pictures. This makes the technology computationally intensive – and very different from human learning. A child often needs to see only a few examples of an object, or even just one, before being able to recognize it for life.

In fact, sometimes children don't need any examples to identify something. Shown photos of a horse and a rhinoceros, and told that a unicorn is something in between, they can recognize the mythical creature in a picture book the first time they see it.

Hmm … ok, not exactly.


A new paper from the University of Waterloo in Ontario now suggests that AI models should also be able to do this, a process the researchers call "less than one"-shot, or LO-shot, learning. In other words, an AI model should be able to accurately recognize more objects than the number of examples it was trained on. This could be a big deal for a field that has grown increasingly expensive and inaccessible as the datasets used keep getting larger.

How "less than one"-shot learning works

The researchers first demonstrated this idea when experimenting with the popular MNIST computer vision dataset. MNIST, which contains 60,000 training images with handwritten digits from 0 to 9, is widely used to test new ideas in the field.

In a previous paper, MIT researchers had introduced a technique to "distill" huge data sets into tiny ones, and as a proof of concept, they had compressed MNIST into just 10 images. The images were not selected from the original data set but were carefully engineered and optimized to contain an amount of information equivalent to that of the full set. As a result, an AI model trained only on the 10 images can achieve almost the same accuracy as one trained on all of the MNIST images.

Figure: Sample images from the MNIST data set, showing handwritten digits from 0 to 9.


Figure: The 10 nonsensical-looking images "distilled" from MNIST, which can train an AI model to achieve 94% recognition accuracy on handwritten digits.


The Waterloo researchers wanted to push the distillation process further. If it's possible to shrink 60,000 images down to 10, why not five? They realized that the trick was to create images that blend multiple digits together and then feed them into an AI model with hybrid, or "soft," labels. (Think back to a horse and a rhinoceros each having partial features of a unicorn.)

"When you think of the number 3, it also looks like the number 8, but not the number 7," says Ilia Sucholutsky, a doctoral student at Waterloo and lead author of the paper. "Soft labels try to capture these shared features. Instead of telling the machine, 'This picture is the number 3,' we say, 'This picture is 60% the number 3, 30% the number 8 and 10% the number 0.'"

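The difference between a hard label and a soft label can be sketched in a few lines of Python. This is only an illustration, assuming labels are stored as NumPy probability vectors over the ten digits; the helper names `hard_label` and `soft_label` are hypothetical, and the 60/30/10 split is the one from the quote above.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0 through 9


def hard_label(digit):
    """One-hot vector: all of the probability mass sits on a single digit."""
    label = np.zeros(NUM_CLASSES)
    label[digit] = 1.0
    return label


def soft_label(weights):
    """Probability vector that spreads mass over several digits.

    `weights` maps digit -> share, and the shares must sum to 1.
    """
    label = np.zeros(NUM_CLASSES)
    for digit, weight in weights.items():
        label[digit] = weight
    if not np.isclose(label.sum(), 1.0):
        raise ValueError("soft label weights must sum to 1")
    return label


hard = hard_label(3)                          # "This picture is the number 3"
soft = soft_label({3: 0.6, 8: 0.3, 0: 0.1})   # 60% a 3, 30% an 8, 10% a 0
print(hard)
print(soft)
```

Both vectors still say the image is most likely a 3, but the soft label also carries information about the digits 8 and 0 in the same training example.
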
The Limits of LO-Shot Learning

After the researchers successfully used soft labels to achieve LO-shot learning on MNIST, they wondered how far this idea could actually go. Is there a limit to the number of categories you can teach an AI model to identify from a small number of examples?

Surprisingly, the answer seems to be no. With carefully designed soft labels, even two examples could theoretically encode any number of categories. "With two points you can separate a thousand classes or 10,000 classes or a million classes," says Sucholutsky.

Figure: Apples (green and red dots) and oranges (orange dots) plotted by weight and color.


The researchers demonstrate this in their latest paper through a purely mathematical exploration. They play out the concept with one of the simplest machine-learning algorithms, known as k-nearest neighbors (kNN), which classifies objects using a graphical approach.

To understand how kNN works, take the task of classifying fruits as an example. If you want to train a kNN model to understand the difference between apples and oranges, the first thing you need to do is select the features that you want to use to represent each fruit. Maybe you choose color and weight. For each apple and each orange, you give the kNN a data point with the color of the fruit as the x value and the weight as the y value. The kNN algorithm then plots all of the data points on a 2D chart and draws a boundary line in the middle between the apples and the oranges. At this point, the chart is neatly split into two classes, and the algorithm can decide whether a new data point represents one or the other based on which side of the line it falls on.
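
The fruit example can be sketched as a tiny kNN classifier. This is a minimal illustration, not the researchers' code: the data points are invented, and both features are assumed to be pre-scaled to a 0-to-1 range so that neither dominates the distance. Note that a plain kNN never draws the boundary explicitly; it compares a new point with its nearest stored neighbors, and the boundary line is simply where that vote flips.

```python
from collections import Counter

import numpy as np

# Hypothetical training data: (color, weight), both scaled to 0..1.
# Low color values stand for green/red, high values for orange.
points = np.array([
    [0.10, 0.40], [0.20, 0.35], [0.15, 0.50],   # apples
    [0.80, 0.55], [0.90, 0.60], [0.85, 0.45],   # oranges
])
labels = ["apple", "apple", "apple", "orange", "orange", "orange"]


def knn_predict(query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = np.linalg.norm(points - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    votes = [labels[i] for i in nearest]
    return Counter(votes).most_common(1)[0][0]


first = knn_predict((0.12, 0.45))   # falls on the apple side of the chart
second = knn_predict((0.88, 0.50))  # falls on the orange side of the chart
print(first, second)
```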

To study LO-shot learning with the kNN algorithm, the researchers created a series of tiny synthetic datasets and carefully engineered their soft labels. Then they had the kNN plot the boundary lines it was seeing and found that it successfully divided the chart into more classes than there were data points. The researchers also had a great deal of control over where the boundary lines fell. By applying various tweaks to the soft labels, they could get the kNN algorithm to draw precise patterns in the shape of flowers.
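
A toy version of this result fits in a few lines: two data points whose soft labels separate three classes. The sketch below assumes a distance-weighted variant of kNN in which each neighbor's soft label is weighted by the inverse of its distance and the class with the largest total wins. The coordinates and label values are invented for illustration, and the exact rule used in the paper may differ.

```python
import numpy as np

# Two prototype points on a 2D chart (hypothetical coordinates).
prototypes = np.array([[0.0, 0.0],
                       [1.0, 0.0]])

# One soft label per prototype: a distribution over THREE classes.
# Class 1 never has the largest share at either point, yet it still
# wins in the region between them.
soft_labels = np.array([[0.6, 0.4, 0.0],
                        [0.0, 0.4, 0.6]])


def predict(query, k=2, eps=1e-12):
    """Inverse-distance-weighted vote over the k nearest soft labels."""
    distances = np.linalg.norm(prototypes - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    weights = 1.0 / (distances[nearest] + eps)   # eps avoids division by zero
    scores = weights @ soft_labels[nearest]      # one total score per class
    return int(np.argmax(scores))


queries = [(0.1, 0.0), (0.5, 0.0), (0.9, 0.0)]
predictions = [predict(q) for q in queries]
print(predictions)  # [0, 1, 2]
```

Along the line between the two points, class 0 wins near the first point, class 2 wins near the second, and class 1 takes the middle third, so the chart ends up with three regions from only two examples.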

Figure: Using soft-labeled examples, the researchers trained a kNN algorithm to encode increasingly complex boundary lines, dividing the chart into far more classes than data points. Each of the colored areas in the plots represents a different class, while the pie charts next to each plot show the distribution of soft labels for each data point.


Of course, these theoretical explorations have some limitations. While the idea of LO-shot learning should carry over to more complex algorithms, the task of engineering the soft-labeled examples becomes much more difficult. The kNN algorithm is interpretable and visual, which makes it possible for humans to design the labels. Neural networks are complicated and impenetrable, which means the same may not be true for them. Data distillation, which is used to design soft-labeled examples for neural networks, also has a major drawback: you have to start with a huge data set in order to shrink it down to something more efficient.

Sucholutsky says he's currently working on other ways to construct these tiny synthetic datasets, whether that means designing them by hand or using a different algorithm. Despite these additional research challenges, however, the paper provides the theoretical foundations for LO-shot learning. "The conclusion is that, depending on what kind of data sets you have, you can probably make massive gains in efficiency," he says.

This is what most interests Tongzhou Wang, an MIT graduate student who led the earlier research on data distillation. "The paper builds on a really novel and important goal: learning powerful models from small amounts of data," he says of Sucholutsky's contribution.

Ryan Khurana, a researcher at the Montreal AI Ethics Institute, echoes this sentiment: "Most importantly, 'less than one'-shot learning would radically reduce the data requirements for creating a working model." This could make AI more accessible to companies and industries that have so far been hampered by the field's data requirements. It could also improve data privacy, because less information would have to be extracted from individuals in order to train useful models.

Sucholutsky emphasizes that the research is still at an early stage, but he's excited. Every time he presents his work to other researchers, their first reaction is that the idea is impossible, he says. When they suddenly realize that it isn't, a whole new world opens up.


Steven Gregory