a picture is worth a thousand megabytes

Ivan Mehta reports in The Next Web that Samsung’s new AI can create talking avatars from a single photo.

Egor Zakharov, Aliaksandra Shysheya, Egor Burkov and Victor Lempitsky of the Skolkovo Institute of Science and Technology and the Samsung AI Center, both in Moscow, Russia, envisioned a system that…

“…performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators. Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters.”

But why did the researchers set out to do this?

They wanted to make better avatars for Augmented and Virtual Reality:

“We believe that telepresence technologies in AR, VR and other media are to transform the world in the not-so-distant future. Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the World. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by the realistic semblances of themselves, and creating such avatars should be easy for the users. This application and scientific curiosity is what drives the research in our group.”

Read their research paper.

My take: surely this will mostly mean more deepfakes? The one aspect I find fascinating is the potential to bring old paintings and photographs to life, which would be a highly creative application of the technology. With which famous portrait would you like to interact?

Tero Karras, Samuli Laine and Timo Aila of Nvidia have just published breakthrough work on Generative Adversarial Networks and images:

“We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation.”

I’ve blogged about Nvidia’s GANs and image generation before, but this improvement in quality is remarkable.

If I understand it correctly, the breakthrough is using one picture as a “style,” or filter, applied to another picture. In the paper’s example grid, applying the filters in the left column to the pictures across the top yields the AI-generated pictures in the middle.
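For the curious, here is a rough sketch of how that "style" idea might look in code. This is my own toy illustration, not Nvidia's implementation: I am assuming the generator takes one style vector per layer, and that mixing two pictures means taking the early-layer styles from one and the later-layer styles from the other. The names `random_styles` and `mix_styles`, the layer count and the vector size are all made up for the example.

```python
import random

NUM_LAYERS = 18  # assumption: number of generator layers, for illustration only
STYLE_DIM = 4    # assumption: tiny style vector so the example stays readable


def random_styles(seed, num_layers=NUM_LAYERS, dim=STYLE_DIM):
    """Hypothetical stand-in for a learned mapping: one style vector per layer."""
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # The same vector is copied to every layer, one copy per layer.
    return [list(vec) for _ in range(num_layers)]


def mix_styles(styles_a, styles_b, crossover):
    """Use picture A's styles for layers before `crossover`, picture B's after."""
    if len(styles_a) != len(styles_b):
        raise ValueError("both pictures need the same number of layers")
    if not 0 <= crossover <= len(styles_a):
        raise ValueError("crossover must lie between 0 and the layer count")
    head = [list(v) for v in styles_a[:crossover]]
    tail = [list(v) for v in styles_b[crossover:]]
    return head + tail


picture_a = random_styles(seed=1)
picture_b = random_styles(seed=2)
# Early layers from A, remaining layers from B.
mixed = mix_styles(picture_a, picture_b, crossover=4)
```

In a real system the mixed list would be fed to the generator network to render a new face; here it is only a list of numbers, which is enough to show where the "filter" gets swapped in.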

Read the scientific paper for full details.

Of course, we’ve seen something similar before. Way back in 1985, Godley & Creme released a music video for their song Cry; the evocative black and white video used analogue wipes and fades to blend a myriad of faces together, predating digital morphing. Gayngs later recorded a cover version and video remake, including a cameo by Kevin Godley.

My take: Definitely scary. But if that’s the current state of the art, I think it means we are _not_ living in the Simulation — yet, even though Elon Musk says otherwise.

Michael Korican Thinks

Thoughts on Independent Filmmaking and the Emerging Economic System


Samsung’s new AI can bring photos to life

AI-generated photos now life-like