Of all the AI models in the world, OpenAI’s GPT-3 has most captured the public’s imagination. It can spew poems, short stories, and songs with little prompting, and has been demonstrated to fool people into thinking its outputs were written by a human. But its eloquence is more of a parlor trick, not to be confused with realintelligence.

Nonetheless, researchers believe that the techniques used to create GPT-3 could contain the secret to more advanced AI. GPT-3 trained on an enormous amount of text data. What if the same methods were trained on both text and images?

Now new research from the Allen Institute for Artificial Intelligence, AI2, has taken this idea to the next level. The researchers have developed a new text-and-image model, otherwise known as a visual-language model, that can generate images given a caption. The images look unsettling and freakish—nothing like the hyperrealistic deepfakes generated by GANs—but they might demonstrate a promising new direction for achieving more generalizable intelligence, and perhaps smarter robots as well.

Fill in the blank

GPT-3 is part of a group of models known as “transformers,” which first grew popular with the success of Google’s BERT. Before BERT, language models were pretty bad. They had enough predictive power to be useful for applications like autocomplete, but not enough to generate a long sentence that followed grammar rules and common sense.

BERT changed that by introducing a new technique called “masking.” It involves hiding different words in a sentence and asking the model to fill in the blank. For example:

The woman went to the ___ to work out.They bought a ___ of bread to make sandwiches.

The idea is that if the model is forced to do these exercises, often millions of times, it begins to discover patterns in how words are assembled into sentences and sentences into paragraphs. As a result, it can better generate as well as interpret text, getting it closer to understanding the meaning of language. (Google now uses BERT to serve up more relevant search results in its search engine.) After masking proved highly effective, researchers sought to apply it to visual-language models by hiding words in captions, like so:

A giraffe standing near a tree.
A ____ stands on a dirt ground near a tree.AI2

This time the model could look at

Read More



By: Karen Hao
Title: These weird, unsettling photos show that AI is getting smarter
Sourced From: www.technologyreview.com/2020/09/25/1008921/ai-allen-institute-generates-images-from-captions/
Published Date: Fri, 25 Sep 2020 16:01:58 +0000