Disney scientists perfect deep fakes

We propose an algorithm for “fully automatic neural face swapping in images and videos.”

So begins a startling revelation by Disney researchers Jacek Naruniec, Leonhard Helminger, Christopher Schroers and Romann M. Weber in a paper delivered virtually at the 31st Eurographics Symposium on Rendering in London recently.

Here’s the abstract:

“In this paper, we propose an algorithm for fully automatic neural face swapping in images and videos. To the best of our knowledge, this is the first method capable of rendering photo-realistic and temporally coherent results at megapixel resolution. To this end, we introduce a progressively trained multi-way comb network and a light- and contrast-preserving blending method. We also show that while progressive training enables generation of high-resolution images, extending the architecture and training data beyond two people allows us to achieve higher fidelity in generated expressions. When compositing the generated expression onto the target face, we show how to adapt the blending strategy to preserve contrast and low-frequency lighting. Finally, we incorporate a refinement strategy into the face landmark stabilization algorithm to achieve temporal stability, which is crucial for working with high-resolution videos. We conduct an extensive ablation study to show the influence of our design choices on the quality of the swap and compare our work with popular state-of-the-art methods.”
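The heart of the method, as I read the abstract, is that multi-way “comb” network: one shared encoder that reads any face, plus a separate decoder branch for each person it has been trained on. Swapping is then just a matter of encoding person B’s frame and decoding it with person A’s branch. Here is a bare-bones PyTorch-flavoured sketch of that structure only; the layer sizes and names are my own illustration (not Disney’s actual model), and the progressive training, blending and landmark stabilization steps are omitted.

```python
# Bare-bones sketch of a multi-way "comb" network: one shared encoder feeding
# a separate decoder branch per identity. NOTE: layer sizes, names and the
# training procedure here are illustrative only, not Disney's actual model.
import torch
import torch.nn as nn

class CombNetwork(nn.Module):
    def __init__(self, num_identities: int, latent_channels: int = 512):
        super().__init__()
        # Shared encoder maps any face crop into a common latent space.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, latent_channels, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # One decoder branch ("tooth") per identity, hence the comb shape.
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose2d(latent_channels, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
            )
            for _ in range(num_identities)
        ])

    def forward(self, image: torch.Tensor, identity: int) -> torch.Tensor:
        latent = self.encoder(image)             # expression/pose information
        return self.decoders[identity](latent)   # rendered in the chosen identity

# The swap: encode a frame of person B, decode with person A's branch.
net = CombNetwork(num_identities=2)
frame_of_b = torch.randn(1, 3, 256, 256)
swapped = net(frame_of_b, identity=0)            # A's face with B's expression and pose
```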

Got that?

My advice: just watch the video and be prepared to be wowed.

My take: Deep fakes were concerning enough. However, this technology actually has production value. I envision a (very near) future where “substitute actors” (sub-actors?) are the ones who give the performances on set, and then this Disney technology replaces their faces with those of the “stars” they represent. In fact, if I were an agent, I’d be looking for those sub-actors now so I could package the pair. A star who didn’t want to mingle with potential COVID-19 carriers could send their doubles to any number of projects at the same time. All that would be left is a high-resolution 3D scan and some ADR work. Of course, Jimmy Fallon already perfected this technique five years ago:

TikTok emerges as worthy Vine replacement

Joshua Eferighe posits on OZY that The Next Big Indie Filmmaker Might Be a TikToker.

Joshua’s key points:

  • “The social media platform is shaping the future of filmmaking.
  • Novice filmmakers are using the platform’s sophisticated editing tools to learn the trade and test their work.
  • Unlike Instagram, TikTok’s algorithm allows users without many followers to go viral, adding to its popularity.”

What is TikTok? The Chinese app claims to be “the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy.”

Why is TikTok valuable to filmmakers? The hashtag #cinematics alone has 3.7 billion views.

See these risks and this safety guide.

My take: Shorter is better! Remember Vine?

Some Smart TVs to become obsolete

Catie Keck reports in Gizmodo: Here’s Why Netflix Is Leaving Some Roku and Samsung Devices.

She says,

“Select Roku devices, as well as older Samsung or Vizio TVs, will soon lose support for Netflix beginning in December…. With respect to Roku devices in particular, the issue boils down to older devices running Windows Media DRM. Since 2010, Netflix has been using Microsoft PlayReady. Starting December 2, older devices that aren’t able to upgrade to PlayReady won’t be able to use the service.”

Netflix says,

“If you see an error that says: ‘Due to technical limitations, Netflix will no longer be available on this device after December 1st, 2019. Please visit netflix.com/compatibledevices for a list of available devices.’ It means that, due to technical limitations, your device will no longer be able to stream Netflix after the specified date. To continue streaming, you’ll need to switch to a compatible device prior to that date.”

Antonio Villas-Boas writes on Business Insider:

“This has surfaced one key weakness in Smart TVs — while the picture might still be good, the built-in computers that make these TVs ‘smart’ will become old and outdated, just like a regular computer or smartphone. That was never an issue on ‘dumb’ TVs that are purely screens without built-in computers to run apps and stream content over the internet.”

He concludes, “You should buy a streaming device like a Roku, Chromecast, Amazon Fire TV, or Apple TV instead of relying on your Smart TV’s smarts.”

My take: does this happen to cars as well?

The Internet turns 50!

Last Tuesday, October 29, 2019, the Internet turned 50 years old.

We’ve grown from the 1970 topology:

to this in 2019:


Okay, here’s a real representation of the Internet.

What’s next? The Interplanetary Internet, of course.

My take: It’s important to note that the World Wide Web is not the same thing as the Internet. (The Web wouldn’t be invented for another 20 years!) The Internet is the all-important backbone for the numerous networking protocols that traverse it, HTTP(S) being only one.

Meet the world’s smallest stabilized camera

Insta360 has released the world’s smallest action camera, the GO. It is so small it’s potentially a choking hazard.

They call it the “20 gram steady cam.”

Here are some specs:

  • Standard, Interval Shooting, Timelapse, Hyperlapse, and Slow Motion modes
  • 8GB of on-board storage
  • iPhone or Android connections
  • IPX4 water resistance
  • Charge time: approx. 20 min for the GO, approx. 1 hr for the charger case
  • MP4 files exported via app at 1080p@25fps; Timelapse and Hyperlapse at 1080p@30fps; Slow Motion captured at 1600×900@100fps and output at 1600×900@30fps

Some sample footage:

See some product reviews.

You can buy it now for $270 in Canada.

My take: this is too cool! My favourite features are the slow motion and the barrel roll you can add in post. This technology sparks lots of storytelling ideas!

Inside a Virtual Production

BBC Click has revealed glimpses of the virtual production techniques Jon Favreau harnessed before the “live action” Lion King was digitally animated.

The discussion of virtual production technology starts at 0:40. Details begin flowing about the Technicolor Virtual Production pipeline at 1:38.

Director Favreau explains further at 8:01 below:

My favourite line is: “We’d move the sun if we had to.”

Here’s Technicolor’s pitch for virtual production:

More here.

My take: Am I the only one who thinks it’s absurd for photo-realistic animals to talk and sing? I can buy the anthropomorphism in most animation, as the techniques they use are suitably abstracted, but this just looks too real. Maybe thought balloons?

AI Portraits can paint like Rembrandt

In the week that FaceApp went viral, Mauro Martino updated AI Portraits to convert your photos into fine art.

This web-based app uses a GAN (generative adversarial network) trained on 54,000 fine art paintings to “paint” your portrait in a style it chooses.

Mauro explains:

“This is not a style transfer. With AI Portraits Ars anyone is able to use GAN models to generate a new painting, where facial lines are completely redesigned. The model decides for itself which style to use for the portrait. Details of the face and background contribute to direct the model towards a style. In style transfer, there is usually a strong alteration of colors, but the features of the photo remain unchanged. AI Portraits Ars creates new forms, beyond altering the style of an existing photo.”

Some samples:

He continues:

“You will not see smiles among the portraits. Portrait masters rarely paint smiling people because smiles and laughter were commonly associated with a more comic aspect of genre painting, and because the display of such an overt expression as smiling can seem to distort the face of the sitter. This inability of artificial intelligence to reproduce our smiles is teaching us something about the history of art. This and other biases that emerge in reproducing our photos with AI Portraits Ars are therefore an indirect exploration of the history of art and portraiture.”

My take: This is a lot of fun! I would love to be able to choose the “artist,” though, rather than letting the AI choose based on the background. One thing that does NOT work is to feed it fine art; I tried the Mona Lisa and was terribly disappointed!

1000 episodes for BBC’s Click

This week the BBC celebrated the 1000th episode of their technology magazine show Click with an interactive issue.

Access the show and get prepared to click!

One of the pieces that caught my eye was an item in the Tech News section about interactive art, called Mechanical Masterpieces by artist Neil Mendoza.

The exhibit is a mashup of digitized high art and Rube Goldberg-esque analogue controls that let the participants prod and poke the paintings. Very playful! I’ve scoured the web to find some video. This is Neil’s version of American Gothic:

Getting ready for the weekend with another piece from Neil Mendoza’s Mechanical Masterpieces, part of #ToughArt2018. pittsburghkids.org/exhibits/tough-art

Posted by Children's Museum of Pittsburgh on Friday, September 28, 2018

And here is his version of The Laughing Cavalier:

Check out Neil’s latest installation/music video.

My take: I love Click and I love interactive storytelling. But I’m not sure the BBC’s experiment was entirely successful. What I thought was missing was an Index, a way to quickly jump around the show. For instance, it was tortuous trying to find this item in the Tech News section. Of course, Click is in love with their material and expects viewers to patiently lap up every frame, even as they click to choose different paths through it. But this is documentary/news content, not narrative fiction, and I found myself wanting to jump ahead or abandon threads. On the other hand, my expectations of a narrative audience looking for A-B interactive entertainment are that they truly are motivated to explore various linear paths through the story, and an Index would reveal too much of what’s up ahead. But I wonder if that’s just me, as a creator, speaking. Perhaps interactive content is best relegated to the hypertext/website side of things, versus stories that swallow you up as they twist and turn on their way to revealing their narratives.

Coming soon: fix it in Post with text editing

Scientists working at Stanford University, the Max Planck Institute for Informatics, Princeton University and Adobe Research have developed a technique that synthesizes new video frames from an edited interview transcript.

In other words, soon we’ll be able to alter speech in video clips simply by typing in new words:

“Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript.”
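Reading between the lines, the pipeline is roughly: annotate the source video, diff the transcript, pick matching source segments, stitch the face parameters together, then neurally render the result. Here is a rough Python sketch of that flow; every helper and field name below is hypothetical (the researchers have not released their tooling), and each step stands in for a substantial subsystem.

```python
# Rough sketch of the text-based editing flow described above.
# Every helper below (annotate_frame, diff_transcripts, etc.) is hypothetical;
# the paper's actual components are not public. This only shows the data flow.
from dataclasses import dataclass
from typing import List

@dataclass
class FrameAnnotation:
    phoneme: str             # spoken sound aligned to this frame
    viseme: str              # corresponding mouth-shape class
    pose: List[float]        # 3D head pose
    expression: List[float]  # parametric face-model coefficients
    lighting: List[float]    # per-frame scene illumination estimate

def edit_talking_head(frames, transcript, edited_transcript):
    # 1. Annotate every input frame with phonemes, visemes, pose and lighting.
    annotations = [annotate_frame(f, transcript) for f in frames]

    # 2. Find what changed in the transcript (moved, deleted, or new words)
    #    and pick source segments whose visemes best cover the new words.
    changes = diff_transcripts(transcript, edited_transcript)
    segments = [find_matching_segment(annotations, c) for c in changes]

    # 3. Blend the selected segments' parameters into a smooth sequence and
    #    render the lower face with the parametric face model.
    params = blend_parameters(annotations, segments)
    intermediate = render_parametric_face(frames, params)

    # 4. A recurrent neural renderer turns that intermediate composite into
    #    photorealistic frames matching the edited transcript.
    return neural_render(intermediate)
```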

Why do this?

“Our main application is text-based editing of talking-head video. We support moving and deleting phrases, and the more challenging task of adding new unspoken words. Our approach produces photo-realistic results with good audio to video alignment and a photo-realistic mouth interior including highly detailed teeth.”

Read the full research paper.

My take: Yes, this could be handy in the editing suite. But the potential for abuse is very concerning. The ease of creating Deep Fakes by simply typing new words means that we would never be able to trust any video again. No longer will a picture be worth a thousand words; rather, one word will be worth a thousand pixels.

Samsung’s new AI can bring photos to life

Ivan Mehta reports in The Next Web that Samsung’s new AI can create talking avatars from a single photo.

Egor Zakharov, Aliaksandra Shysheya, Egor Burkov and Victor Lempitsky of the Skolkovo Institute of Science and Technology and the Samsung AI Center, both in Moscow, Russia, envisioned a system that…

“…performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators. Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters.”
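In other words, the expensive meta-learning over many videos happens once; a new, never-seen person then needs only a handful of frames and a quick adversarial fine-tune. Below is a toy PyTorch sketch of that few-shot stage only. The Embedder and Generator are trivial placeholders rather than the actual Samsung/Skolkovo architecture, and I have swapped the adversarial loss for a simple reconstruction loss to keep it short.

```python
# Toy sketch of few-shot talking-head fine-tuning. The Embedder and Generator
# are trivial placeholders, not the actual Samsung/Skolkovo models, and the
# adversarial (discriminator) loss is replaced by a reconstruction loss.
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Maps a handful of reference frames to a person-specific embedding."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames).mean(dim=0, keepdim=True)  # average over the few shots

class Generator(nn.Module):
    """Renders a face image from facial landmarks, conditioned on the embedding."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Linear(dim + 68 * 2, 3 * 64 * 64)

    def forward(self, landmarks: torch.Tensor, embedding: torch.Tensor) -> torch.Tensor:
        x = torch.cat([landmarks.flatten(1), embedding.expand(landmarks.size(0), -1)], dim=1)
        return self.net(x).view(-1, 3, 64, 64)

embedder, generator = Embedder(), Generator()   # in the paper, these are meta-learned first
few_shots = torch.randn(8, 3, 64, 64)           # 8 frames of a previously unseen person
landmarks = torch.randn(8, 68, 2)               # matching facial landmarks
embedding = embedder(few_shots).detach()        # person-specific initialization

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
for _ in range(10):                             # a few quick fine-tuning steps
    fake = generator(landmarks, embedding)
    loss = nn.functional.l1_loss(fake, few_shots)  # stand-in for the GAN + content losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```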

But why did the researchers set out to do this?

They wanted to make better avatars for Augmented and Virtual Reality:

“We believe that telepresence technologies in AR, VR and other media are to transform the world in the not-so-distant future. Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities. It will distribute jobs more fairly and uniformly around the World. It will better connect relatives and friends separated by distance. To achieve all these effects, we need to make human communication in AR and VR as realistic and compelling as possible, and the creation of photorealistic avatars is one (small) step towards this future. In other words, in future telepresence systems, people will need to be represented by the realistic semblances of themselves, and creating such avatars should be easy for the users. This application and scientific curiosity is what drives the research in our group.”

Read their research paper.

My take: surely this only means more Deepfakes? The one aspect of this that I think is fascinating is the potential to bring old paintings and photographs to life. I think this would be a highly creative application of the technology. With which famous portrait would you like to interact?