AI reads minds, makes pictures

As reported by Tristan Greene on The Next Web, scientists at Kyoto University in Japan have created a deep neural network that can decode brainwaves.

That’s right, AI that can read your mind.

Tristan summarizes:

“When these machines are learning to “read our minds” they’re doing it the exact same way human psychics do: by guessing. If you think of the letter “A” the computer doesn’t actually know what you’re thinking, it just knows what the brainwaves look like when you’re thinking it…. AI is able to do a lot of guessing though — so far the field’s greatest trick has been to give AI the ability to ask and answer its own questions at mind-boggling speed. The machine takes all the information it has — brainwaves in this case — and turns it into an image. It does this over and over until (without seeing the same image as the human, obviously) it can somewhat recreate that image.”
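In other words, it’s a guess-and-refine loop: start from noise, compare the guess’s features to the features decoded from the brain, nudge the guess, and repeat. Here is a toy sketch of that loop (the “feature extractor” below is a stand-in random linear map, not the deep network the Kyoto team actually used):

```python
import numpy as np

# Toy sketch of the "guess, compare, refine" loop described above.
# The feature extractor W is a stand-in linear map, not the Kyoto team's DNN.

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16 * 16))           # stand-in "feature extractor"

target_image = rng.random(16 * 16)           # what the subject actually saw
decoded_features = W @ target_image          # pretend these were decoded from brain activity

image = rng.random(16 * 16)                  # start from noise
lr = 1e-3
for step in range(2000):
    error = W @ image - decoded_features     # how far our guess's features are off
    image -= lr * (W.T @ error)              # nudge the image toward matching features
    image = np.clip(image, 0.0, 1.0)

print("remaining feature error:", np.linalg.norm(W @ image - decoded_features))
```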

Or, as Guohua Shen, Tomoyasu Horikawa, Kei Majima and Yukiyasu Kamitani illustrate:

To my eye, some of the results look awfully reminiscent of William Turner’s oil paintings, particularly Snow Storm.

See the full paper.

My take: Let’s be honest. This technology, as amazing as it is, is not yet ‘magical.’ (Arthur C. Clarke’s third law is, “Any sufficiently advanced technology is indistinguishable from magic.”) However, if we think about it a bit and mull over the possibilities, this might one day allow you to transcribe your thoughts, paint pictures with your mind or even become telepathic.

Google uses neural net to synthesize female voice

Research at Google is making huge advances in text-to-speech (TTS) technology. Check this out:

From their Twitter post:

“Building on TTS models like ‘Tacotron’ and deep generative models of raw audio like ‘Wavenet’, we introduce ‘Tacotron 2’ a neural network architecture for speech synthesis directly from text.”

How do they do it? From their blog post:

“In a nutshell it works like this: We use a sequence-to-sequence model optimized for TTS to map a sequence of letters to a sequence of features that encode the audio. These features, an 80-dimensional audio spectrogram with frames computed every 12.5 milliseconds, capture not only pronunciation of words, but also various subtleties of human speech, including volume, speed and intonation. Finally these features are converted to a 24 kHz waveform using a WaveNet-like architecture.”
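To make the shape of that pipeline concrete, here is a sketch with stand-in models (tacotron2_like and wavenet_like are placeholders I have made up, not real APIs); the only real numbers are the ones quoted above: 80 mel channels, a frame every 12.5 ms, and 24 kHz output.

```python
import numpy as np

# Shape-level sketch of the text-to-speech pipeline described above.
# tacotron2_like() and wavenet_like() are made-up stand-ins, not real APIs.

SAMPLE_RATE = 24_000          # output waveform rate (24 kHz, per the post)
FRAME_PERIOD_S = 0.0125       # one spectrogram frame every 12.5 ms
N_MEL = 80                    # 80-dimensional audio spectrogram

def tacotron2_like(text: str) -> np.ndarray:
    """Stand-in for the seq-to-seq model: characters -> mel spectrogram frames."""
    n_frames = 40 * len(text) // 10                      # made-up duration model
    return np.zeros((n_frames, N_MEL), dtype=np.float32)

def wavenet_like(mel: np.ndarray) -> np.ndarray:
    """Stand-in for the WaveNet-style vocoder: spectrogram -> waveform samples."""
    n_samples = int(mel.shape[0] * FRAME_PERIOD_S * SAMPLE_RATE)
    return np.zeros(n_samples, dtype=np.float32)

mel = tacotron2_like("Hello world.")
audio = wavenet_like(mel)
print(mel.shape, audio.shape)   # (frames, 80) and (frames * 300,) samples
```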

The results are amazing.

Want more? Here’s the full research paper.

The limitations? Certain complex words, conveying sentiment, and generating audio in real time. “Each of these is an interesting research problem on its own,” they conclude.

Listen to more samples.

My take: I’ve used TTS functionality to generate speech for songs and for voice-over. I love it! As the quality improves to the point where it becomes indistinguishable from a human voice, I admit I’m not quite sure what that will mean for a future in which we can’t tell whether the voice we’re hearing is human or robot.

Google wants you to have the best selfie

Building on last year’s GIF builder, Motion Stills, Google Research has just released two more ‘appsperiments’ in time for your holiday merriment: Scrubbies and Selfissimo!

Scrubbies lets you “shoot a video in the app and then remix it by scratching it like a DJ. Scrubbing with one finger plays the video. Scrubbing with two fingers captures the playback so you can save or share it.”

Selfissimo! lets you “tap the screen to start a photoshoot. The app encourages you to pose and captures a photo whenever you stop moving. Tap the screen to end the session and review the resulting contact sheet.”

Are you worried that taking so many selfies might give you “selfitis” and turn you into a narcissist? Well, don’t: Snopes has debunked that supposed disorder.

What I love about Selfissimo! is that by taking the photos for you, it gives you more of a true photo-session experience, heightened by the fact that it shoots only in black and white. Think of the photo shoot scene in ‘Austin Powers: The Spy Who Shagged Me’, which is itself an homage to the photo shoot scene in Michelangelo Antonioni’s masterful 1966 film ‘Blow-Up’.

My take: I highly recommend Selfissimo! because it’s so much fun! Here’s to a great 2018, everyone!

Battling AIs create new realities

The adage “Seeing is believing” is no longer true.

Three researchers, Ming-Yu Liu, Thomas Breuel and Jan Kautz, working for Nvidia, have created an AI that can generate lifelike images.

In their system, multiple neural networks learn together by trying to fool each other with better and better solutions to the problem at hand. These are generative adversarial networks or GANs.
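For a feel of how that adversarial game works, here is a bare-bones toy GAN in one dimension: a generator learns to turn noise into samples that pass for data drawn from N(3, 1), while a discriminator learns to call its bluff. This is my own minimal sketch, nothing like the image-translation networks Nvidia actually trained.

```python
import numpy as np

# Toy 1-D GAN: generator G(z) = a*z + b tries to imitate data from N(3, 1);
# discriminator D(x) = sigmoid(w*x + c) tries to tell real from fake.
# Bare-bones illustration only, not Nvidia's architecture.

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr = 0.01

for step in range(5000):
    x_real = rng.normal(3.0, 1.0, size=64)       # real data
    z = rng.normal(size=64)
    x_fake = a * z + b                           # generated data

    # --- discriminator step: push D(real) -> 1, D(fake) -> 0 ---
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    grad_w = np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    grad_c = np.mean(-(1 - d_real) + d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- generator step: push D(fake) -> 1, i.e. fool the discriminator ---
    d_fake = sigmoid(w * x_fake + c)
    dl_dxf = -(1 - d_fake) * w                   # non-saturating generator loss
    a -= lr * np.mean(dl_dxf * z)
    b -= lr * np.mean(dl_dxf)

print(f"generated samples ~ N({b:.2f}, {abs(a):.2f}); target was N(3, 1)")
```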

See their paper and GitHub. A sample below:

My take: this is kinda scary. It’s neat to think of “environmental” filters to add to genuine footage (think Nighttime, Winter, Rainy, etc.), but the fact that this technology can create genuine-looking unreal footage is downright Orwellian. How do we distinguish truth from fiction, real from fake? The only conclusion is that everything is now suspect. Sad.

Seeing is not believing

At the recent Adobe Max conference, one of the sneak peeks really caught my eye: Adobe Cloak.

This “content aware fill for video” is amazing and could be revolutionary if it ever sees the light of day in a product or service.

It’s powered by Adobe Sensei and it works by imagining what’s underneath the objects you want to remove.

By the way, if you want to do this today, you can use the Remove Module in Mocha Pro.

My take: the ease and speed of this are astounding. There were lots of great sneak peeks this year, including SonicScape for 360/VR sound editing. First come the tools, then comes the art.

Computational Video Editing may replace Assistant Editors

Eric Escobar writes on Film Independent about his trip to Siggraph 2017 and the one technology that blew his mind: Computational Video Editing.

Three researchers from Stanford University and one from Adobe demonstrated a system that:

“automatically selects the most appropriate clip from one of the input takes, for each line of dialogue, based on a user-specified set of film-editing idioms. Our system starts by segmenting the input script into lines of dialogue and then splitting each input take into a sequence of clips time-aligned with each line. Next it labels the script and the clips with high-level structural information (e.g., emotional sentiment of dialogue, camera framing of clip, etc.). After this pre-process, our interface offers a set of basic idioms that users can combine in a variety of ways to build custom editing styles. Our system encodes each basic idiom as a Hidden Markov Model that relates editing decisions to the labels extracted in the pre-process. For short scenes (< 2 minutes, 8-16 takes, 6-27 lines of dialogue) applying the user-specified combination of idioms to the pre-processed inputs generates an edited sequence in 2-3 seconds.”

That’s right. Three seconds. For a 90-second scene. Versus roughly 90 minutes for a human editor. If my math is correct, that makes this system about 1,800 times faster!

The idioms, from the research notes:

  • Avoid jump cuts
  • Change zoom gradually
  • Emphasize character
  • Intensify emotion
  • Mirror position
  • Peaks and valleys
  • Performance fast/slow
  • Performance loud/quiet
  • Short lines
  • Speaker visible
  • Start wide
  • Zoom consistent
  • Zoom in/out

Editors combine a number of these idioms and weight them to generate different assemblies of the rushes, automatically.
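Conceptually, that boils down to scoring every candidate clip against the weighted idioms and picking the cheapest sequence of takes, one decision per line of dialogue. Here is a toy sketch of that idea; the labels, penalty functions and weights are my own invented stand-ins, not the authors’ Hidden Markov Model.

```python
# Toy sketch of idiom-weighted take selection, one decision per line of dialogue.
# Labels, penalties and weights are invented stand-ins, not the paper's model.

# Hypothetical labels for each take of each dialogue line:
takes = [
    [{"speaker_visible": True,  "zoom": 1}, {"speaker_visible": False, "zoom": 3}],  # line 1
    [{"speaker_visible": True,  "zoom": 2}, {"speaker_visible": True,  "zoom": 3}],  # line 2
    [{"speaker_visible": False, "zoom": 2}, {"speaker_visible": True,  "zoom": 3}],  # line 3
]

# User-weighted idioms, expressed as penalties.
weights = {"speaker_visible": 1.0, "change_zoom_gradually": 2.0, "start_wide": 0.5}

def transition_cost(prev, cur):
    # "Change zoom gradually": penalize big jumps in framing between cuts.
    return weights["change_zoom_gradually"] * abs(cur["zoom"] - prev["zoom"])

def emission_cost(i, clip):
    cost = 0.0
    if not clip["speaker_visible"]:           # "Speaker visible"
        cost += weights["speaker_visible"]
    if i == 0:                                # "Start wide"
        cost += weights["start_wide"] * (clip["zoom"] - 1)
    return cost

# Viterbi-style dynamic programming over takes.
best = {j: (emission_cost(0, c), [j]) for j, c in enumerate(takes[0])}
for i in range(1, len(takes)):
    new_best = {}
    for j, clip in enumerate(takes[i]):
        cands = [(cost + transition_cost(takes[i - 1][pj], clip) + emission_cost(i, clip), path + [j])
                 for pj, (cost, path) in best.items()]
        new_best[j] = min(cands, key=lambda t: t[0])
    best = new_best

cost, edit = min(best.values(), key=lambda t: t[0])
print("chosen take per line:", edit, "total penalty:", cost)
```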

Of course, editors will then proceed to polish these rough cuts, tweaking the edits and finessing the sound.

My take: This promises to take the tedium out of editing and let editors focus on truly being creative. Eric envisions a client-side version of this in which every viewer’s version of a film is custom-generated for them, based on their favourite editing style. That may be going a little too far, but what I find fascinating about this system is that it starts with the script, once again highlighting how crucial it is.

OPA chips may one day replace optical lenses

Caltech researchers have created an optical phased array chip that can capture images.

The technological breakthrough has the potential to revolutionize photography.

Ali Hajimiri, Bren Professor of Electrical Engineering and Medical Engineering in the Division of Engineering and Applied Science at Caltech, claims:

“We’ve created a single thin layer of integrated silicon photonics that emulates the lens and sensor of a digital camera, reducing the thickness and cost of digital cameras. It can mimic a regular lens, but can switch from a fish-eye to a telephoto lens instantaneously — with just a simple adjustment in the way the array receives light.”

He continues:

“The ability to control all the optical properties of a camera electronically using a paper-thin layer of low-cost silicon photonics without any mechanical movement, lenses, or mirrors, opens a new world of imagers that could look like wallpaper, blinds, or even wearable fabric.”
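The “simple adjustment” is, at heart, the classic phased-array trick: give each element a programmed phase delay so that light arriving from the chosen direction adds up coherently. A minimal sketch of that textbook relation follows; the wavelength, spacing and element count are assumptions, not the Caltech chip’s specs.

```python
import numpy as np

# Illustrative only: the textbook phased-array relation, not Caltech's OPA design.
# To "look" in direction theta, element n (spacing d, wavelength lam) gets phase
# phi_n = 2*pi*n*d*sin(theta)/lam, so light from that direction adds coherently.

lam = 1.55e-6             # wavelength in metres (assumed)
d = 0.5 * lam             # element spacing (assumed)
theta = np.deg2rad(20)    # desired look angle

n = np.arange(8)                                       # an 8-element 1-D array
phase_shifts = 2 * np.pi * n * d * np.sin(theta) / lam

print(np.round(np.degrees(phase_shifts) % 360, 1))     # per-element phase settings, in degrees
```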

Read the PDF.

My take: This is perhaps the unforeseen conclusion of digitization. First film. Soon lenses. Both usurped by ones and zeroes. I wonder what the future of visual storytelling will look like when almost anything flat — walls, windows, ceilings — can become an image-capturing tool.

Snap Spectacles

Snap Inc. has sold more than 100,000 of its funky retro Spectacles.

Formerly Snapchat Inc., the social multimedia firm now considers itself…

“…a camera company. We believe that reinventing the camera represents our greatest opportunity to improve the way people live and communicate. Our products empower people to express themselves, live in the moment, learn about the world, and have fun together.”

In related news, a judge has ruled in Snap’s favour in a trademark infringement case brought by Eyebobs of Minnesota. They…

“…felt the similarity of the eyeball logos would lead a Spectacles user or Eyebobs customer to think the two companies were partners, or had collaborated.”

However, Snap…

“…denied these claims, adding that a crucial flaw in Eyebobs’s argument was the trademark in question. While Snap held a trademark on its eyeball logo, Eyebobs only had a trademark on its name, not its logo.”

Ouch!

You be the judge:

My take: I’m intrigued by the POV angle of the camera in these smartglasses, but a downside is the circular 1088 × 1088 resolution. And getting your video out of Snap and into something you can edit might take some finagling.

How to encode movies in cells using DNA

As reported widely last week, Seth Shipman, from Harvard Medical School, has used CRISPR-Cas technology to encode a 36 x 26 pixel movie into the DNA of living E. coli bacteria.

“The mini-movie, really a GIF, is a five-frame animation of a galloping thoroughbred mare named Annie G. The images were taken by the pioneering photographer Eadweard Muybridge in the late 1800s for his photo series titled ‘Human and Animal Locomotion.'”

They explain it all in a bigger movie:

They hope to turn cells into living recorders to store information from the immediate environment.
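For a rough sense of how image data can live in DNA at all, here is a toy encoder that packs two bits into each base, so one 8-bit pixel becomes four bases. This is just an illustration of the general idea, not the Shipman lab’s actual CRISPR-based scheme.

```python
# Toy illustration of packing pixel data into DNA bases: 2 bits per base.
# NOT the Shipman lab's actual encoding scheme, just the general idea.

BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {v: k for k, v in BASE_FOR_BITS.items()}

def pixels_to_dna(pixels):
    """Encode 8-bit pixel values as a DNA string, 4 bases per pixel."""
    bases = []
    for p in pixels:
        for shift in (6, 4, 2, 0):                  # high bits first
            bases.append(BASE_FOR_BITS[(p >> shift) & 0b11])
    return "".join(bases)

def dna_to_pixels(dna):
    """Decode the DNA string back into 8-bit pixel values."""
    pixels = []
    for i in range(0, len(dna), 4):
        p = 0
        for base in dna[i:i + 4]:
            p = (p << 2) | BITS_FOR_BASE[base]
        pixels.append(p)
    return pixels

frame = [0, 128, 255, 64]            # one tiny row of a 36 x 26 frame
encoded = pixels_to_dna(frame)
assert dna_to_pixels(encoded) == frame
print(encoded)                       # "AAAAGAAATTTTCAAA"
```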

Curiously, the scientists who did this in March of this year don’t seem to have received much coverage. And they accomplished much more: encoding, among other things, a gift card and a computer virus. Obviously, the Harvard brand has better publicists.

And similar feats have been done before. IBM spelled out its name in atoms in 1989.

My take: this is just a stunt to prove we can encode information in DNA, something Mother Nature has been doing for billions of years. But of course, let’s not forget the unintended consequences. When you mess around with Mother Nature, things don’t always go as planned. Imagine encoding ‘Godzilla’ — and then the DNA mutates!

Interactive video comes to Netflix

Casey Newton reports on The Verge that Netflix is testing interactive video with half its audience — kids.

The first title is Puss in Book: Trapped in an Epic Tale.

Carla Engelbrecht Fisher, Netflix’s director of product innovation, says:

“Kids are already talking to the screen. They’re touching every screen. They think everything is interactive.”

The result is a branching story with viewing lengths that vary from 18 to 39 minutes.

A second title will have a simpler structure with four endings:

Note that Netflix has not invented branching stories — the Choose Your Own Adventure series sold some 250 million gamebooks over two decades in the previous millennium.
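Under the hood, a branching title is really just a tree of video segments with choices hanging off each one, and the running time depends on the path you take. A minimal sketch (segment names and durations invented for illustration):

```python
# Minimal sketch of a branching-story structure: each node is a video segment
# with a running time and the choices that follow it. Names and durations are
# made up for illustration.

from dataclasses import dataclass, field

@dataclass
class Segment:
    title: str
    minutes: int
    choices: dict = field(default_factory=dict)   # choice label -> next Segment

ending_a = Segment("Happily ever after", 4)
ending_b = Segment("The ogre wins", 6)
detour   = Segment("A detour through the swamp", 9, {"fight": ending_a, "flee": ending_b})
opening  = Segment("Puss meets the ogre", 14, {"bold": detour, "cautious": ending_a})

def path_lengths(node, so_far=0):
    """Total running time of every possible path through the story."""
    total = so_far + node.minutes
    if not node.choices:
        return [total]
    return [t for nxt in node.choices.values() for t in path_lengths(nxt, total)]

print(sorted(path_lengths(opening)))   # [18, 27, 29] minutes, depending on choices
```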

My take: it’s interesting that Netflix only has the technology working on half of its platforms. Nevertheless, the potential is seductive. Think of the dramatic possibilities: “Feeling lucky, punk?” or “You take the blue pill — the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill — you stay in Wonderland, and I show you how deep the rabbit hole goes.” One thing writers will have to get used to is the cavalier ease with which the audience will be able to change the narrative — kind of like how print designers had to let go of exact specificity when the web came along.