Thursday, December 24, 2020

Virtual performance problems and solutions

 

Blithering horse shit in progress... Originally, I wanted to think of a way to create an AI-driven audience that can provide feedback to a performer, but ultimately, I think it would require a semantic and sentiment engine tantamount to the singularity.

Oh, well. Commence with other ideas attempting to solve virtual performance issues.

--

I love stand-up comedy, and one problem comedians are having is that they need audience feedback to develop their material. Virtual performances just aren't the same, and it's a unique problem compared to other performers like musicians. Musicians get audience feedback too, but comedians modify their material beat by beat. They need to be able to see and hear the audience, and make eye contact.

I saw an article a while back with some high-profile comedians talking about this problem.

The article discusses virtual performances with comedians, and specifically addresses how the audio alone is problematic because the audience needs to be muted to keep one viewer from ruining the audio for others with unwanted noise.

Listening to other interviews with comedians talking about different venues, I've heard that some places have a brick wall that gives them strong feedback, echoing when the laughs really register. The audio is essential feedback.

I had the pleasure of sitting down and speaking with one of my favorite comedians after a show, and he mentioned having the same problem: the shows are too empty due to COVID, so he's having difficulty developing material because he relies on audience feedback.

So, if he wants to reach audiences en masse, he'll have to go virtual unless the venues return to capacity. Regardless, I'm sure virtual experiences will remain popular and standard, unfortunately.

Here's an idea I'm exploring about how to get auditory feedback during a virtual performance without unwanted audio disturbances sabotaging the event.

The general idea is to create a video library of individual people laughing, match the laughter audio to the corresponding facial expressions in the video, then simulate real-time laughs based on video alone.

I read that machine learning has proven more accurate than human lip readers. I'm guessing it could be possible to simulate a person's voice based on video (no audio) of them talking.

Use facial recognition on audience video and machine learning libraries to generate a real-time laughter audio track.

1. Collect video of people laughing to create two separate libraries: 

A) Video library of people laughing

B) Audio library of people laughing

2. Train the learning engine to recognize which facial expressions likely match various laughter audio patterns. Allow this engine to serve as a foundation for generating laughter audio tracks to match laughter video (a rough training sketch follows this list).

A) From a specific person's library sample, the machine will be able to generate new voice-specific laughter audio to match that person's face-specific laughter video.

B) From the broader library, the machine will be able to generate non-specific laughter audio to match non-specific laughter video. However, the generated laughter audio may match pitch based on gender recognition. In other words, the machine should be able to create generic new audio to match any face.
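To make step 2 a bit more concrete, here's a minimal training sketch in Python/PyTorch. Everything in it is a hypothetical placeholder (the feature sizes, the window length, the tiny network); it only illustrates the idea of learning a mapping from per-frame facial-expression features to laughter audio features, such as a mel-spectrogram, that a vocoder could later turn into sound.

```python
# A minimal sketch, assuming each clip in the library has already been
# preprocessed into paired (facial-expression features per frame,
# mel-spectrogram frame) examples. The feature sizes, window length, and
# network are placeholders assumed for illustration.
import torch
import torch.nn as nn

FACE_DIM = 136   # assumption: 68 facial landmarks flattened to (x, y) pairs
MEL_DIM = 80     # assumption: 80 mel-spectrogram bins per audio frame
WINDOW = 5       # assumption: 5 video frames of context per audio frame

class LaughSynth(nn.Module):
    """Maps a short window of facial-expression features to one frame of
    laughter audio features (a vocoder would turn these into a waveform)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FACE_DIM * WINDOW, 256),
            nn.ReLU(),
            nn.Linear(256, MEL_DIM),
        )

    def forward(self, face_window):     # (batch, WINDOW * FACE_DIM)
        return self.net(face_window)    # (batch, MEL_DIM)

def train_step(model, optimizer, face_batch, mel_batch):
    """One gradient step: predict audio features from video features."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(face_batch), mel_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = LaughSynth()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Random tensors standing in for a real batch from the paired libraries.
    faces = torch.randn(32, FACE_DIM * WINDOW)
    mels = torch.randn(32, MEL_DIM)
    print("loss:", train_step(model, optimizer, faces, mels))
```

The voice-specific case (A) would fine-tune or condition this kind of model on one person's clips; the generic case (B) would train it across the whole library.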

Generate an audio track that represents the audience's laughter in real time, driven by facial recognition from each audience member's camera. That way a comedian has the auditory feedback needed to gauge audience response during a virtual performance, without the interference created by actual audio from audience members.
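Here's a rough sketch of what that real-time loop might look like, assuming one video stream per audience member and some pretrained laughter-intensity scorer. The scorer, the laughter "bed" clip, and the mixing are stand-ins I've made up for illustration, not a working product.

```python
# A rough sketch of the real-time loop, assuming one video stream per
# audience member and a pretrained laughter-intensity scorer. The scorer,
# the laughter "bed" clip, and the mixing are stand-ins for illustration.
import time
import numpy as np

def laughter_intensity(frame) -> float:
    """Placeholder: in practice this would run face detection plus the
    trained expression model and return a 0.0-1.0 laughter score."""
    return float(np.random.rand())  # stub

def mix_laughter(intensities, laugh_bed):
    """Scale a short laughter clip by crowd-wide intensity so the performer
    hears more laughter when more faces are laughing."""
    gain = float(np.clip(np.mean(intensities), 0.0, 1.0))
    return laugh_bed * gain

def run(audience_frames_source, laugh_bed, fps=10):
    """audience_frames_source yields one list of frames per tick (one frame
    per viewer); each tick produces an audio chunk for the performer."""
    for frames in audience_frames_source:
        intensities = [laughter_intensity(f) for f in frames]
        chunk = mix_laughter(intensities, laugh_bed)
        # here `chunk` would be sent to the performer's monitor speakers
        print(f"crowd intensity: {np.mean(intensities):.2f}")
        time.sleep(1.0 / fps)

if __name__ == "__main__":
    laugh_bed = np.zeros(4800)       # stand-in for a real laughter clip
    fake_ticks = [[None] * 3] * 5    # 5 ticks of 3 fake "viewer" frames
    run(iter(fake_ticks), laugh_bed)
```

A fancier version would synthesize per-person laughter with the model above instead of scaling a single clip, but even a crowd-intensity gain knob might give the performer the beat-by-beat signal he's missing.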

Another consideration is physical interaction.

Though the primary consideration might be the turning and direction of the performer's head, as well as eye gaze, in relation to audience members, another aspect may include motion.

Possibly, the audience would be using VR headsets/glasses for orientation within a volumetric video.

Perhaps the audience could be virtually 'seated', i.e. digitally arranged so that they appear to pivot in, and advance or recede on one screen as the performer walks in front of the monitor (left or right, forward or backward), such that he can appear to approach audience members and actually make eye contact.
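As a toy illustration of that virtual seating, here's a sketch where each audience tile has a horizontal position on the screen and scales up as the performer's tracked stage position gets close to it. The layout and falloff numbers are made up; it's only meant to show the kind of mapping involved.

```python
# A toy sketch of the virtual seating: each audience tile has a horizontal
# position on screen, and tiles scale up as the performer's tracked stage
# position gets close, so he appears to walk toward those viewers. The
# layout and the falloff numbers are made up for illustration.
from dataclasses import dataclass

@dataclass
class Seat:
    viewer_id: str
    x: float  # horizontal tile position on screen, 0.0 (left) to 1.0 (right)

def tile_scales(seats, performer_x, spread=0.15):
    """Return a display scale per tile: largest when the performer is right
    in front of it, shrinking with horizontal distance."""
    scales = {}
    for seat in seats:
        distance = abs(seat.x - performer_x)
        scales[seat.viewer_id] = max(0.5, 1.0 - distance / (2 * spread))
    return scales

seats = [Seat("viewer_1", 0.1), Seat("viewer_2", 0.5), Seat("viewer_3", 0.9)]
print(tile_scales(seats, performer_x=0.5))   # the middle tile looms largest
```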

Or, perhaps a wrap-around screen surrounds the performer, assuming the performer prefers to remain unencumbered by glasses or a headset.

Eventually, haptic feedback might be part of the interaction, depending on the nature of the performance.