Disney software is a virtual fountain of youth

The fountain of youth is a spring said to restore the youth of anyone who drinks or bathes in its waters. The idea appears in many cultures throughout history, often as a symbol of eternal youth and rejuvenation. In some stories, the fountain is guarded by a powerful being, such as a nymph or a fairy, and must be sought out by brave adventurers. Despite centuries of searching, it has never been found and is generally considered mythical.

Until now.

Disney Research has created production-ready face re-aging for visual effects.

Andrew Liszewski, writing on Gizmodo, explains their approach:

“To make an age-altering AI tool that was ready for the demands of Hollywood and flexible enough to work on moving footage or shots where an actor isn’t always looking directly at the camera, Disney’s researchers, as detailed in a recently published paper, first created a database of thousands of randomly generated synthetic faces. Existing machine learning aging tools were then used to age and de-age these thousands of non-existent test subjects, and those results were then used to train a new neural network called FRAN (face re-aging network). When FRAN is fed an input headshot, instead of generating an altered headshot, it predicts what parts of the face would be altered by age, such as the addition or removal of wrinkles, and those results are then layered over the original face as an extra channel of added visual information. This approach accurately preserves the performer’s appearance and identity, even when their head is moving, when their face is looking around, or when the lighting conditions in a shot change over time. It also allows the AI generated changes to be adjusted and tweaked by an artist, which is an important part of VFX work: making the alterations perfectly blend back into a shot so the changes are invisible to an audience.”
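That delta-based compositing is easy to picture in code. Here’s a minimal sketch of the idea under my own assumptions (this is not Disney’s implementation; the blending is simplified and the network that predicts the delta is omitted):

```python
import numpy as np

def apply_reaging_delta(frame: np.ndarray, delta: np.ndarray,
                        strength: float = 1.0) -> np.ndarray:
    """Layer a predicted re-aging delta over the original frame.

    `frame` is the original RGB plate in [0, 1]; `delta` is the network's
    per-pixel prediction of what aging would add or remove (wrinkles and
    the like). Because only the difference is predicted, the performer's
    identity in `frame` is preserved, and an artist can dial `strength`
    up or down per shot.
    """
    aged = frame + strength * delta      # layer the predicted change over the plate
    return np.clip(aged, 0.0, 1.0)       # keep values in a displayable range
```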

At five seconds per frame, FRAN can age or de-age one minute of footage in two hours.
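(Assuming 24 fps footage, the arithmetic checks out: 60 seconds × 24 frames = 1,440 frames, and 1,440 frames × 5 seconds per frame = 7,200 seconds, which is exactly two hours.)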

That’s got to be cheaper than Hollywood VFX.

My take: Imagine if they had this technology for The Curious Case of Benjamin Button!

AI-assisted time travel

Open Culture invites us to See 21 Historic Films by Lumière Brothers, Colorized and Enhanced with Machine Learning (1895-1902).

They highlight a collection of films originally created by the Lumière Brothers and now digitally enhanced by Denis Shiryaev.

Shot and projected at 16 frames per second, the footage has had its original frame rate restored and has then been stabilized, upscaled to 4K, interpolated to 240 fps, colourized and given AI face enhancement, before finally being output at 60 fps.
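Shiryaev hasn’t published his exact toolchain, but to give a sense of what frame-rate interpolation involves, here’s a minimal sketch using ffmpeg’s built-in motion-interpolation filter. The filenames and settings are illustrative assumptions, and a pipeline like his almost certainly uses neural-network interpolators rather than this filter:

```python
import subprocess

# Illustrative only: interpolate 16 fps scanned footage up to 60 fps using
# ffmpeg's motion-compensated interpolation filter. AI pipelines use learned
# interpolators instead, but the idea is the same: synthesize new in-between
# frames from motion estimates.
subprocess.run([
    "ffmpeg",
    "-i", "lumiere_16fps.mp4",                 # hypothetical input file
    "-vf", "minterpolate=fps=60:mi_mode=mci",  # motion-compensated interpolation
    "lumiere_60fps.mp4",                       # hypothetical output file
], check=True)
```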

Denis details his process in the first four and a half minutes of the film and categorically states, “This is enhanced material and is not historically accurate.”

Nevertheless, the films are a fantastic view into the past. Travel back in time to France, England and Egypt, among other countries. The motion smoothing does impart a different feeling to the footage than the jerky black and white aesthetic we normally associate with old newsreels.

My take: for me, the best shot, at 13:44, is “Panorama of the Golden Horn, Turkey, Istanbul” because it’s one of the few shots that is truly “cinematic” imho. All the other shots are filmed from a tripod and are therefore static. This shot is also on a tripod, but because we’re on a boat the effect is to dolly to the right, resulting in magical movement with very pleasing foreground, middle ground and background action.

Darth Vader to be voiced by AI

Chance Townsend reports on Mashable that James Earl Jones signs over rights to voice of Darth Vader to be replaced by AI.


He refers to a Deadline article by Caroline Frost, titled James Earl Jones Signs Over Rights To Voice Of Darth Vader, Signalling Retirement From Legendary Role.

She in turn refers to a much more interesting Vanity Fair article by Anthony Breznican titled Darth Vader’s Voice Emanated From War-Torn Ukraine.

The real story here is about Respeecher, the tech company that has managed to make computer-generated voices sound human.

From their FAQ:

Why is STS (speech to speech) different from TTS (text to speech)?

The difference between the two is significant. Text to speech has a few important limitations:

  1. In most cases, TTS produces unnatural, robotic emotion. The AI doesn’t know where to take the emotion from, so it tries to generate it from the text alone. There is also very limited control over emotion: some TTS systems can make the converted voice sound sad or excited using text annotations, but it is hard to manually encode the intricacies of human acting with annotations alone.
  2. Words only. TTS is based on dictionaries, so unknown words and abbreviations pose a significant problem. Natural speech also contains lots of non-verbal content, which TTS struggles to render.
  3. Most TTS systems face challenges with low-resource languages due to higher data requirements.
  4. The Respeecher voice cloning system works solely in the acoustic domain. We convey all the emotions and sounds of the source speaker while converting their timbre and other subtle characteristics into those of the target speaker.
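To make the STS/TTS distinction above concrete, here’s a hypothetical sketch of the two interfaces (the function names are mine, not Respeecher’s API): TTS has to invent the performance from text, while STS keeps the source actor’s performance and only swaps the voice.

```python
import numpy as np

def text_to_speech(text: str, voice_id: str) -> np.ndarray:
    """TTS: audio is generated from text alone, so emotion, timing and
    non-verbal sounds must be guessed (or crudely hinted via annotations)."""
    ...

def speech_to_speech(source_audio: np.ndarray, target_voice_id: str) -> np.ndarray:
    """STS: the input already contains a human performance; only the timbre
    is converted to the target speaker, so emotion, pacing, breaths and
    laughs carry over automatically."""
    ...
```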

Audition the almost 70 voices in the Voice Marketplace.

They even have a program for Small Creators and will accept pitches from interesting projects.

Here’s a glimpse of their online interface:

My take: well, that’s it. Along with deep fakes, now you can’t trust anything you hear either. I guess that leaves “real life” as the one thing you can trust — most of the time, that is. Maybe we are living in a simulation after all….

Netflix Approves Sony FX3 Cinema Camera

Alyssa Miller reports on No Film School that The Sony FX3 Gets the Netflix Stamp of Approval and Why You Should Care.

She says:

“Netflix has just approved the Sony FX3 to be used for its 4K Netflix Originals. This approval is a result of the latest Firmware 2.0 that constitutes a major upgrade to the FX3 capabilities regarding cinematography and workflow.”

She goes on to explain Netflix’s minimum camera standard:

“Not only does the camera need the ability to record in 4K, but it also has to have a bit depth of 10-bit or higher, a data rate with a minimum of 240Mbps at 24FPS, a screen-referred color space, a scene-referred transfer function, and a timecode written as metadata, and it has to be capable of jamming to an external source. And this is just the start of the list, as ergonomics, durability, and usability also come into play.”
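As a rough illustration of what meeting that spec means in practice, here’s a small sketch that checks a camera’s recording settings against the minimums quoted above. The field names and example values are my own assumptions, not Netflix’s actual submission tooling:

```python
# Minimums quoted from the Netflix capture spec above (not an exhaustive list).
NETFLIX_MINIMUMS = {
    "resolution_px": (3840, 2160),   # true 4K UHD capture
    "bit_depth": 10,                 # 10-bit or higher
    "data_rate_mbps_at_24fps": 240,  # minimum recording data rate
    "timecode_metadata": True,       # timecode written as metadata
    "external_tc_jam": True,         # can jam sync to an external source
}

def meets_minimums(camera: dict) -> bool:
    """Return True if every quoted minimum is met (illustrative only)."""
    w, h = camera["resolution_px"]
    min_w, min_h = NETFLIX_MINIMUMS["resolution_px"]
    return (
        w >= min_w and h >= min_h
        and camera["bit_depth"] >= NETFLIX_MINIMUMS["bit_depth"]
        and camera["data_rate_mbps_at_24fps"] >= NETFLIX_MINIMUMS["data_rate_mbps_at_24fps"]
        and camera["timecode_metadata"]
        and camera["external_tc_jam"]
    )

# Hypothetical FX3-style settings after the 2.0 firmware update.
print(meets_minimums({
    "resolution_px": (3840, 2160),
    "bit_depth": 10,
    "data_rate_mbps_at_24fps": 240,
    "timecode_metadata": True,
    "external_tc_jam": True,
}))  # True
```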

And why you should care:

“Why? Standards. Not in the biblical sense, but in manufacturing. Most camera brands (save for maybe Sony) aren’t building their next camera with a specific exhibition in mind. Codecs are all over the place, not all sensors are the same, and sometimes you even have to worry about overheating. Those kinds of issues on a film set can break your film. So if an exhibitor sets some standards for camera manufacturers, we’re inclined to support it, whether or not we’re shooting for Netflix.”

Here’s Netflix’s “cheat sheet” for FX3 camera settings and the Sony firmware update.

My take: I haven’t checked the prices of all the cameras on the Netflix list, but this is probably the cheapest. Just sayin’.

UST stands for Ultra Short Throw

Chinese manufacturer XGIMI has released the Aura, a 4K Ultra Short Throw Laser Projector.

Claiming “Your Next (150″!) TV is Not a TV” they state:

“Simply put, AURA revolutionizes the home cinematic experience. This space-saving, stylish laser projector utilizes a laser-powered UST projection 17.3” from any wall, remarkable 4K UHD resolution, and insanely bright 2400 ANSI lumens to provide you a luxurious TV-like experience — without the TV.”

Buy it here: https://ca.xgimi.com/products/aura for only $4K!

My take: Hmm. If you have a spare four grand in your pocket, would you replace your current TV with this, or jet off to Mexico for an all-inclusive holiday? This unit sounds nice, but I’d have to see it in person to judge how large the picture is, and how bright it is in daylight — I’d want to be able to use it without having to close the curtains at noon.

Samsung’s new Freestyle digital projector

Samsung has just introduced a fantastic 1080p digital projector: the Freestyle.

Janko Roettgers of Protocol reports:

  • “The new Samsung Freestyle is a portable projector capable of projecting video from 30 inches to 100 inches. It offers access to the very same UI and apps as any of the company’s other 2022 smart TVs, but that’s pretty much where the similarities to a traditional TV end.
  • Weighing 830 grams, the Freestyle is designed for portability. “It’s about the same weight as a coconut or cauliflower,” Samsung Senior Director of Lifestyle TV Product Marketing Stephen Coppola told me recently. The projector can be powered via USB-C from a wall plug or external battery pack.
  • The Freestyle can be angled to use any free wall space as a screen, including the ceiling. It automatically calibrates the image to keep it in focus, level it and keystone it. “This is the magical feature on this device,” Coppola said.
  • The projector ships with a modified smart TV remote, but can also be controlled with voice commands via a far-field microphone after a voice assistant (Google, Alexa or Bixby) has been enabled.
  • The Freestyle ships with a lens cap that turns it into an ambient light projector, which is a pretty ingenious way of using a TV-like device for something that’s definitely not at all like a TV.
  • Later this year, Samsung wants to sell an optional light bulb socket adapter, further doubling down on this “my TV is a mood light in its spare time” idea.
  • There’s also a built-in speaker, which comes in handy in combination with far-field voice control. “There’ve been smart speakers, but never really a smart speaker with a 100-inch screen attached to it,” Coppola said.”

My take: I think the optional screw-in base is brilliant. Imagine using a goose-neck lamp in your living room to drive this! How long before they come up with a higher resolution? More uses:

Robot advances

Emma Roth reports on The Verge that robots are making advances: A humanoid robot makes eerily lifelike facial expressions; it’s interesting and a little scary.

She writes:

“Engineered Arts, a UK-based designer and manufacturer of humanoid robots, recently showed off one of its most lifelike creations in a video posted on YouTube. The robot, called Ameca, is shown making a series of incredibly human-like facial expressions.”

But wait, there’s more! Meet Mesmer, even more life-like:

This, of course, builds on the research of Dr. Paul Ekman and his exploration of expression.

His FACS (Facial Action Coding System) is used by major animation studios to bring emotion to their creations.
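For a flavour of how FACS works, here’s a tiny illustrative sketch: an expression is coded as a combination of numbered Action Units (AUs), each tied to a specific facial movement. The mapping below covers only a handful of textbook examples and is a simplification, not studio pipeline code.

```python
# A few FACS Action Units (AUs) and the facial movements they describe.
ACTION_UNITS = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    6: "cheek raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
    26: "jaw drop",
}

# Classic textbook combinations (simplified): an expression is a set of AUs.
EXPRESSIONS = {
    "happiness": {6, 12},        # Duchenne smile: cheeks raised + lip corners pulled
    "sadness": {1, 4, 15},
    "surprise": {1, 2, 5, 26},
}

def describe(expression: str) -> list[str]:
    """List the movements an animator would key for a given expression."""
    return [ACTION_UNITS[au] for au in sorted(EXPRESSIONS[expression])]

print(describe("happiness"))  # ['cheek raiser', 'lip corner puller']
```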

My take: I wonder if robots will ever develop to the point where we can cast them in movies. I mean, we’re halfway there with CG VFX.

New lightfield lens records depth info

John Aldred reports on DIYPhotography that the K|Lens One lens is about to be released on Kickstarter.

He says:

“The K|Lens One lens, teased earlier this year by German company K|Lens, is finally about to be released on Kickstarter. They say that this is the world’s first light field lens that can be used with regular DSLR and mirrorless cameras — and it works for both stills and video. Designed for full-frame cameras, the lens is a “ground-breaking mix of state-of-the-art lens and software technology” which K|Lens says will open up new worlds of creativity to users.”

The lens shoots nine images at once, with each taking up 1/9th the area of the sensor in a 3×3 grid. Custom software then manipulates those images into the desired result.
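As a rough sketch of what that first software step might look like (my own illustration, not K|Lens’s actual processing), splitting the sensor image back into its nine sub-views is simple; the small parallax between those views is what depth estimation then works from:

```python
import numpy as np

def split_subviews(sensor_image: np.ndarray, grid: int = 3) -> list[list[np.ndarray]]:
    """Split a full-frame capture into its grid x grid sub-views.

    Each sub-view sees the scene from a slightly different position, so
    comparing them (like a multi-camera stereo rig) yields per-pixel depth.
    Illustrative only; the real lens needs calibration and de-vignetting.
    """
    h, w = sensor_image.shape[:2]
    sh, sw = h // grid, w // grid
    return [
        [sensor_image[r * sh:(r + 1) * sh, c * sw:(c + 1) * sw] for c in range(grid)]
        for r in range(grid)
    ]

# Example: a fake 6000x4000 RGB frame yields nine sub-views.
frame = np.zeros((4000, 6000, 3), dtype=np.uint8)
views = split_subviews(frame)
print(views[1][1].shape)  # (1333, 2000, 3) -- the centre view
```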

Because this lens turns any camera into a 3D camera, it might have applications for specific tasks like Visual Effects, where having depth information is vital for compositing.

Aldred adds:

“Interestingly, while all of the software was developed in-house, the lens itself, they say, was developed in cooperation with Carl Zeiss Jena GmbH, who they say will also be doing all of the manufacturing. So, while K|Lens might be a company that few have heard of, it will essentially be a Zeiss lens. And not just their name stamped on somebody else’s product as Huawei did with Leica, as they’re actually making the thing.”

See the company website.

My take: I’ve blogged about the light field a few times in the last decade and I really like the promise. Could it be the end of out-of-focus shots forever? All we need is a similar “sound field” that would allow us to capture every sound source at once and later go into the soundscape to re-record those sources much closer. Right? (Hmm. Is this that?)

Colour Display AR Smart Glasses

Deirdre O’Donnell reveals on Notebookcheck some of the most advanced Smart Glasses yet.

She writes:

“Thunderbird is an augmented reality (AR) -focused start-up supported by the display-centric OEM TCL. Now, the two brands have unveiled something apparently three years in the making: the new Smart Glasses Pioneer Version, with a groundbreaking color micro-LED display geared toward an optimal AR experience. This pair of spectacles is, as the name suggests, the kind of ‘true’ smart glasses that integrate a working, partially transparent display capable of overlaying a mixed-reality display over the wearer’s real-world surroundings. Thunderbird and TCL make the new device sound like a blend of features from the Facebook Ray-Bans and Xiaomi’s own concept Smart Glasses. They do integrate a camera — obtrusively found on the nose-piece — and touch controls on the outside of the ear-hooks to interact with the glasses and the content, phone-like apps, smart-home and -car controls they are rated to sync with.”

My take: These are much better than Google Glass and Snap Spectacles. Still too nerdy for me, but they might appeal to someone wearing a Smart Watch. BONUS: here’s the excellent music from the Thunderbird video: Black Math’s Point Blank (Alternate).

Google AI can now enhance low res pix

Remember those laughable TV episodes in which someone asks, “Can you enhance that?”

Well, laugh no more. Google AI has mastered “high fidelity image generation.”

You can just about hear it: “HAL, unlock the enhancing algorithm.”

Google explains their new method:

“Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process. Running this reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced.”

Add noise to the picture, and then denoise it?
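Essentially, yes. Here’s a minimal sketch of the forward “corruption” step described above, using the standard DDPM-style formulation rather than Google’s specific code:

```python
import numpy as np

def forward_diffuse(x0: np.ndarray, alpha_bar_t: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Produce a noised sample x_t from a clean image x0.

    alpha_bar_t is the cumulative noise schedule at step t: close to 1
    early on (image mostly intact), close to 0 late (nearly pure noise).
    The model is then trained to run this process in reverse, denoising
    step by step until a clean sample emerges.
    """
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))              # stand-in "training image"
slightly_noisy = forward_diffuse(clean, 0.9, rng)
nearly_pure_noise = forward_diffuse(clean, 0.01, rng)
```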

Here is the Super-Resolution via Repeated Refinement paper.

And the Cascaded Diffusion Models for High Fidelity Image Generation paper.

My take: It was Arthur C. Clarke who said, “Any sufficiently advanced technology is indistinguishable from magic.” Google has just given us more magic. And we so smugly said those enhancing programs can’t add resolution back into a pixelated picture. Looks like we were wrong, yet again.