Showing posts with label AI. Show all posts

Friday, March 14, 2025

Apple's AirPods AI + iOS 19 = real-time translation. July '25 release


Update

According to Perplexity.ai:

Apple AirPods Live Translation

The upcoming live translation feature for AirPods will leverage the earbuds' microphones and the iPhone's Translate app to facilitate seamless cross-language communication. Users will be able to hear translations through their AirPods while the original audio plays through the phone's speaker, eliminating the need to pass devices back and forth during conversations. This enhancement builds upon Apple's existing Translate app, introduced in iOS 14, by integrating it directly with AirPods for a more streamlined experience. The feature is expected to support multiple languages, though specific details on language availability have not yet been announced.


...After searching Perplexity, I found EZDubs and AI Phone Translator.

EZDubs offers voice emulation, as described in my requirements, and AI Phone Translator claims no lag and 99% accuracy. EZDubs' lag leaves room to hear the speaker's voice, followed by the emulated voice translation. The app offers a 15+ minute free trial and costs $14.99/mo. thereafter.

My Summary about Apple's Pending Releases

From what I understand, Apple's AirPods will achieve real-time translation using Apple's native Translate app and AI (upgraded Siri voice control to invoke the app) to communicate on a call (rather than out loud in person using the app on the phone).

It's the same real-time translation capability as an Android device working with Google Translate and earbuds for a more seamless translation experience (which allows for voice control via Google Assistant instead of invoking the app on the device by hand).

So, I guess this means on both devices, you can use Google Translate conversation mode or Apple's Translate app to speak and hear translations through the phone.

Currently, I can speak to Google Translate from my Airpods, but the translation output voice only goes to the device, not back to me. I've not tried using the app on a call.


My ideal future requirements (short-list):

  • AirPods or earbuds work with any device
  • AirPods or earbuds work with Google Translate as well as Apple's Translate app
  • Mute the speaker's words as the translated voice is expressed to prevent overlap
  • Send a text translation using voice only
  • Send a text translation using voice only while driving? E.g. "Hey Siri, translate text to Jose in Spanish"..."Okay, what do you want to say?"...and confirm the message in my language before sending.
  • Emulate the tone and rhythm of the person speaking (i.e., will Apple acquire or achieve ElevenLabs' technological capability?)
  • AirPods and earbuds eventually contain all voice functionality of the phone to enable platform-agnostic voice communication, translation, and AI assistance at any distance
  • Incorporate voice capability into glasses like Meta's Ray-Bans to add AR and screen capabilities
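The voice-only text translation in the wish list above amounts to a translate-then-confirm loop: translate the dictated message, then read a round-trip translation back to the sender before sending. A minimal sketch of that flow, where the phrase table and every function name are hypothetical stand-ins for a real translation service (not any actual Siri or Translate API):

```python
# Hypothetical sketch of a voice-driven "translate text" flow:
# dictate -> translate -> confirm in the sender's language -> send.
# The phrase table is a toy stand-in for a real translation service.

PHRASES = {
    ("en", "es"): {"see you at noon": "nos vemos al mediodía"},
    ("es", "en"): {"nos vemos al mediodía": "see you at noon"},
}

def translate(text: str, src: str, dst: str) -> str:
    """Toy lookup translation; a real assistant would call a translation service."""
    return PHRASES[(src, dst)][text.lower()]

def compose_translated_text(spoken: str, src: str, dst: str) -> dict:
    """Translate the dictated message and round-trip it back for confirmation."""
    translated = translate(spoken, src, dst)
    # Read the outgoing text back in the sender's own language before sending.
    confirmation = translate(translated, dst, src)
    return {"outgoing": translated, "confirm": confirmation}

msg = compose_translated_text("See you at noon", "en", "es")
print(msg["outgoing"])   # what Jose would receive
print(msg["confirm"])    # what the sender hears before confirming
```

The round-trip confirmation step is the key safety feature for a hands-free scenario like driving: the sender never has to look at the screen to verify the outgoing message.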

-

Apple's real-time translation with AirPods AI + iOS 19 (July '25 public beta release)

https://www.reuters.com/technology/apple-plans-airpods-feature-that-can-live-translate-conversations-bloomberg-news-2025-03-13/

https://www.cnet.com/tech/apple-reportedly-planning-airpods-that-can-translate-languages-live/

https://www.tomsguide.com/phones/iphones/ios-19

From Perplexity.ai:

Based on the reported information, Apple's upcoming live translation feature for AirPods will process translations on the iPhone, not within the AirPods themselves.

The system is expected to work as follows:

  1. The AirPods' microphones will capture the spoken audio.

  2. The iPhone will process and translate the audio using its Translate app and AI capabilities.

  3. The translated audio will be sent back to the AirPods for playback to the user.
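The three steps above form a capture–translate–play relay with the phone in the middle. A minimal sketch of that architecture, where the stub functions are stand-ins for the phone's real speech recognition and translation services (none of these names are actual Apple APIs):

```python
# Sketch of the reported AirPods/iPhone translation relay:
# earbuds capture audio -> phone transcribes and translates -> earbuds play result.
# transcribe() and translate() are stubs for the phone's real on-device services.

def transcribe(audio: bytes) -> str:
    """Stub speech-to-text; here the 'audio' is just UTF-8 text for illustration."""
    return audio.decode("utf-8")

def translate(text: str) -> str:
    """Stub Spanish-to-English lookup translation."""
    table = {"hola": "hello", "gracias": "thank you"}
    return " ".join(table.get(word, word) for word in text.lower().split())

def relay(captured_audio: bytes) -> str:
    """Phone-side hub: receives from the earbuds, processes, sends playback back."""
    text = transcribe(captured_audio)   # step 1: AirPods mic -> phone
    translated = translate(text)        # step 2: phone translates
    return translated                   # step 3: sent back to AirPods for playback
```

The design point the articles make is visible in the sketch: the earbuds are pure I/O, and all the heavy lifting sits in `relay()` on the phone.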

This approach leverages the iPhone's more powerful processing capabilities and existing translation technology. The AirPods will primarily serve as input and output devices, while the iPhone acts as the translation hub.

This integration is expected to be part of the iOS 19 update, creating a seamless experience between AirPods and iPhones for real-time language translation. By processing translations on the iPhone, Apple can potentially offer more accurate and faster translations, as well as support for a wider range of languages, compared to processing within the AirPods themselves.

-

Google's current real-time translation:

Google Translate conversation mode is processed on the device, not in the earbuds.

"Live Translate" (Released 2020): 

Samsung Galaxy S24 or Pixel Phone + Pixel Buds + Google Translate = Real-time translation.

Google Pixel Buds: The Live Translate feature is available on Google Pixel Buds Pro, Pixel Buds A-Series, and Pixel Buds 2 when used with an Android device running Android 6.0 or later.

https://www.amazon.com/Google-Pixel-Buds-Pro-Headphones/dp/B0B1N7Z8B3

From Perplexity.ai:

To use the real-time translation feature:

  1. Wear your Pixel Buds and connect them to your phone.

  2. Activate Google Assistant by saying "Hey Google" or pressing and holding the earbud.

  3. Say "Help me speak [language]" to launch the conversation mode in Google Translate.

  4. Press and hold the earbud to speak in your native language.

  5. Use the Google Translate app on your phone to have the other person respond.

It's important to note that while the Pixel Buds enable a more seamless translation experience, the actual translation processing occurs on the connected Android device, not within the earbuds themselves.



Wednesday, June 3, 2020

How to use SIRI voice-to-text, reference and rules for dictation


For me, texting manually is painfully slow and riddled with errors, so I rely on voice-to-text with some editing as needed. Overall, it's effective, but often it produces some bizarre results.

I often wonder what rules govern voice-to-text when using Siri, which clearly isn't contextually aware, seems far from true natural-language understanding (NLU), and makes some outright weird interpretations.

For example, I mean to say 'many' but Siri writes 'mini', though this isn't consistent either. I'm unclear whether it depends on how I pause or how I pronounce the word, though my own pronunciation seems consistent to me. I end up guessing how to trick Siri into doing what I want.

Regardless of pace of speech or enunciation, there are many words or phrases that it simply cannot handle, such as confusing 'I' with 'are' at the beginning of a sentence. Sometimes Siri puts random names in place of words, or injects popular phrases in place of standard language. I get the impression that collective jargon has a bigger influence than any standardized language reference. And, personally, I don't want to represent myself in this manner; I want to represent myself as an upright, civilized human being.

Here's some reference:

https://www.oreilly.com/library/view/iphone-the-missing/9781449372781/ch04.html


https://www.macworld.com/article/2048196/beyond-siri-dictation-tricks-for-the-iphone-and-ipad.html

https://www.imore.com/how-use-dictation-mac

This is more about voice control and the part on speech-to-text is brief:
https://www.apple.com/macos/catalina/docs/Voice_Control_Tech_Brief_Sept_2019.pdf


Saturday, May 2, 2020

ML AI music

OpenAI's Jukebox AI Produces Music in Any Style From Scratch -- Complete With Lyrics (venturebeat.com)

OpenAI this week released Jukebox, a machine learning framework that generates music -- including rudimentary songs -- as raw audio in a range of genres and musical styles. From a report: Provided with a genre, artist, and lyrics as input, Jukebox outputs a new music sample produced from scratch. The code and model are available on GitHub, along with a tool to explore the generated samples. Jukebox might not be the most practical application of AI and machine learning, but as OpenAI notes, music generation pushes the boundaries of generative models. Synthesizing songs at the audio level is challenging because the sequences are quite long -- a typical 4-minute song at CD quality (44 kHz, 16-bit) has over 10 million timesteps. As a result, learning the high-level semantics of music requires models to deal with very long-range dependencies.
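The timestep figure quoted above checks out with simple arithmetic: four minutes of 44.1 kHz audio is about 10.6 million samples per channel.

```python
# Verify the "over 10 million timesteps" claim for a 4-minute CD-quality song.
sample_rate_hz = 44_100   # CD-quality sampling rate
duration_s = 4 * 60       # 4-minute song
timesteps = sample_rate_hz * duration_s
print(timesteps)          # 10,584,000 samples per channel
```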

Monday, April 6, 2020

synthetic life - xenobots - virtual creatures


Scientists Create 'Xenobots' -- Virtual Creatures Brought to Life (nytimes.com)

"If the last few decades of progress in artificial intelligence and in molecular biology hooked up, their love child — a class of life unlike anything that has ever lived — might resemble the dark specks doing lazy laps around a petri dish in a laboratory at Tufts University."

The New York Times reports on a mind-boggling living machine that's programmable -- and biodegradable. Strictly speaking, these life-forms do not have sex organs — or stomachs, brains or nervous systems. The one under the microscope consisted of about 2,000 living skin cells taken from a frog embryo. Bigger specimens, albeit still smaller than a millimeter-wide poppy seed, have skin cells and heart muscle cells that will begin pulsating by the end of the day. These are all programmable organisms called xenobots, the creation of which was revealed in a scientific paper in January...

A xenobot lives for only about a week, feeding on the small platelets of yolk that fill each of its cells and would normally fuel embryonic development. Because its building blocks are living cells, the entity can heal from injury, even after being torn almost in half. But what it does during its short life is decreed not by the ineffable frogginess etched into its DNA — which has not been genetically modified — but by its physical shape. And xenobots come in many shapes, all designed by roboticists in computer simulations, using physics engines similar to those in video games like Fortnite and Minecraft...

All of which makes xenobots amazing and maybe slightly unsettling — golems dreamed in silicon and then written into flesh. The implications of their existence could spill from artificial-intelligence research to fundamental questions in biology and ethics. "We are witnessing almost the birth of a new discipline of synthetic organisms," said Hod Lipson, a roboticist at Columbia University who was not part of the research team. "I don't know if that's robotics, or zoology or something else."

An algorithm running for about 24 hours iterated through possible body shapes, after which the two researchers tried "to sculpt cellular figurines that resembled those designs." They're now considering how the process might be automated with 3-D cell printers, and the Times ponders other future possibilities the researchers have hinted at for their Xenobots. ("Sweep up ocean microplastics into a larger, collectible ball? Deliver drugs to a specific tumor? Scrape plaque from the walls of our arteries?")

Sharing the Times' story on Twitter, Vint Cerf summed it up with just three words:

"This is weird."

Sunday, August 18, 2019

deep fake for making music?


So, I understand that deepfake techniques can be applied to audio, not just video, and I now understand that one threat could be deepfake audio being sent to first responders.

I did search for 'deepfake music' but so far found nothing.

My question is whether deepfakes could be applied to music, so maybe we take one vocalist's voice and map it over another's, so Geddy Lee from Rush ends up singing Ice Cream Man by Van Halen. Not that I'm dying to do that, but it opens up some interesting possibilities.

For example, separate the tracks that make up a song by Van Halen and substitute the stylings of Rush. Or have AI 'listen' to the entire body of work by a particular drummer, so it can then play in that style.

I think there would be a breakdown at the point of writing lyrics unless it's open to non sequitur, which would probably be my approach: leave it to people's imagination and let it be poetic without concrete meaning, a kind of Mad Libs rule engine.

So, have Bach writing melodies to Neil's drumming so it stretches and blends more like a rock song. Now you have an original instrumental composition, just add lyrics, then add vocals and bass. How would Geddy's bassline fit in? Well, take Bach's classical rules that would drive a more proper bassline, but add in a rule engine for Geddy's style. Layer on a funk filter. Swap in a Steve Harris bassline from Iron Maiden. Adjust to fit a particular period, like more 80s Rush, or more 90s Rush. Not that we really want this necessarily, but it's a starting point. Ultimately, don't we want originality? Genuine emotion? Isn't that what art is? Can a machine do that? Music today is all about style, not substance. It's formulaic as hell.

What would also be really nice for me: I want to discover new music. Forget about Pandora and Spotify. Let's have AI identify the patterns of music I like, then find music of a similar ilk. I actually thought about this in 1999; the idea was to be able to visually search music. The mood and/or composition would be represented by a color, like a heat map, and you could zoom into various colors to quickly identify variants within that range, filtering by attribute.
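That heat-map idea could be prototyped by projecting audio features onto a color. A toy sketch, where the feature ranges and the mapping itself are entirely invented for illustration (a real version would extract features from audio analysis):

```python
# Toy mapping from two audio features to an RGB color for a visual music map.
# The feature ranges and the blend are invented purely for illustration.

def mood_color(tempo_bpm: float, energy: float) -> tuple:
    """Map tempo (40-200 BPM) and energy (0-1) to an RGB 'heat' color.
    Fast, energetic tracks trend red; slow, calm tracks trend blue."""
    t = min(max((tempo_bpm - 40) / 160, 0.0), 1.0)  # normalize tempo to 0..1
    heat = (t + energy) / 2                          # blend the two features
    red = round(255 * heat)
    blue = round(255 * (1 - heat))
    return (red, 64, blue)

print(mood_color(180, 0.9))  # an intense track: mostly red
print(mood_color(60, 0.2))   # a calm track: mostly blue
```

Each track becomes a dot of its mood color on a map, and zooming into a color region would surface tracks with similar feel, which is the visual-search behavior described above.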

Of course, the extension of this would be to create music of an ilk that I like. But not necessarily just by copying styles as I described previously and tracking them over other compositions. Let's zoom into why I like certain music. It's not just the rhythm or the instrumentation or composition or mood. That's what makes this question of creativity and originality interesting: what separates humans from AI. We're talking about personality and point of view; philosophy and context and emotion, not merely style.

 

Friday, June 14, 2019

AI personalized brain music

https://techcrunch.com/2019/06/12/we-wont-be-listening-to-music-in-a-decade-according-to-vinod-khosla/

https://www.creativedestructionlab.com/

We won’t be listening to music in a decade according to Vinod Khosla

Depending on who you ask, the advantage of technology based on artificial or machine intelligence could be a topsy-turvy funhouse mirror world — even in some very fundamental ways.
“I actually think 10 years from now, you won’t be listening to music,” is a thing venture capitalist Vinod Khosla said onstage today during a fireside chat at Creative Destruction Lab’s  second annual Super Session event.
Instead, he believes we’ll be listening to custom song equivalents that are automatically designed specifically for each individual, and tailored to their brain, their listening preferences and their particular needs.
Khosla noted that AI-created music is already making big strides — and it’s true that it’s come a long way in the past couple of years, as noted recently by journalist Stuart Dredge writing on Medium.
As Dredge points out, one recent trend is the rise of mood or activity-based playlists on Spotify  and channels on YouTube. There are plenty of these types of things where the artist, album and song name are not at all important, or even really surfaced. Not to mention that there’s a big financial incentive for an entity like Spotify to prefer machine-made alternatives, as it could help alleviate or eliminate the licensing costs that severely limit their ability to make margin on their primary business of serving up music to customers.
AI-generated chart toppers and general mood music is one thing, but a custom soundtrack specific to every individual is another. It definitely sidesteps the question of what happens to the communal aspect of music when everyone’s music-replacing auditory experience is unique to the person. Guess we’ll find out in 10 years.