Last Week in AI #278 – Apple eyes OpenAI board seat, OpenAI caught storing chats in plain text, Kyutai’s Moshi, and more!

Top News

Apple Poised to Get OpenAI Board Observer Role as Part of AI Pact

Apple Inc. is set to take an observer role on OpenAI’s board, with Phil Schiller, Apple’s App Store head and former marketing chief, chosen for the position. The move follows Apple’s announcement that it will integrate ChatGPT into its iPhone, iPad, and Mac devices. The observer role lets Apple attend OpenAI’s board meetings without voting rights, putting it on par with Microsoft, OpenAI’s biggest backer. Schiller’s appointment underscores the strategic importance of the partnership to Apple, though sharing the boardroom with Microsoft, a longtime rival, could create conflicts.

OpenAI’s ChatGPT Mac app was storing conversations in plain text

OpenAI’s ChatGPT macOS app was found to be storing user conversations in plain text, making them easily accessible to potential malicious actors. The issue was demonstrated by Pedro José Pereira Vieito, who created an app that could access and display these conversations. After being alerted by The Verge, OpenAI released an update that encrypts the chats, rendering Pereira Vieito’s app ineffective. The original issue was discovered when Pereira Vieito questioned why OpenAI had opted out of using Apple’s app sandbox protections, which are mandatory for software distributed via the Mac App Store but not for apps like ChatGPT that are distributed through their own websites.

Kyutai Open Sources Moshi: A Real-Time Native Multimodal Foundation AI Model that can Listen and Speak

Kyutai has open-sourced Moshi, a real-time native multimodal foundation AI model that can listen and speak simultaneously. Moshi, which reportedly surpasses some capabilities of OpenAI’s GPT-4o, is designed to understand and express emotions, and can handle two audio streams at once. The model was fine-tuned on 100,000 synthetic conversations and achieves an end-to-end latency of 200 milliseconds. Kyutai has also developed a smaller variant of Moshi that can run on consumer devices. It has incorporated watermarking to detect AI-generated audio and has committed to transparency and collaborative development by releasing Moshi as an open-source project. Future iterations of Moshi will be refined based on user feedback, and its licensing aims to foster widespread adoption and innovation.

Other News

Tools

This is Google AI, and it’s coming to the Pixel 9 – Google is introducing new AI features, including a Recall-like function for screenshots, to the upcoming Pixel 9 series, aiming to enhance user experience and privacy.

Mozilla Llamafile, Builders Projects Shine at AI Engineers World’s Fair – Mozilla’s Llamafile and Builders Projects were showcased at the AI Engineer World’s Fair, emphasizing democratized access to AI technology and the potential of local AI applications.

WhatsApp is developing an AI avatar generator – WhatsApp is developing a generative AI feature that lets users create personalized avatars of themselves for use in various settings, combining user-supplied images, text prompts, and Meta’s Llama AI model.

Suno launches iPhone app — now you can make AI music on the go – Suno’s new iPhone app lets users generate full songs from text prompts or sound, and offers in-app purchases for Pro and Premier plans.

Resemble AI’s next-generation AI audio detection model, Detect-2B, is 94% accurate

Perplexity’s ‘Pro Search’ AI upgrade makes it better at math and research – Perplexity’s Pro Search AI upgrade enhances its ability to provide in-depth answers to complex queries, but the company faces accusations of plagiarism.

Cloudflare rolls out feature for blocking AI companies’ web scrapers – Cloudflare Inc. has debuted a no-code feature for preventing artificial intelligence developers from scraping website content. The capability is available as part of the company’s flagship CDN, or content delivery network.

Business

Report: AI Video Startup Runway Looking to Raise $450 Million – AI video startup Runway is seeking to raise $450 million at a $4 billion valuation, offering software that generates videos from text prompts or images.

Exclusive-AI coding startup Magic seeks $1.5-billion valuation in new funding round, sources say – Magic is seeking a $1.5 billion valuation in a new funding round as it develops AI models for writing software and competes in the growing market for AI code assistants.

AI Firm ElevenLabs Sets Audio Reader Pact With Judy Garland, James Dean, Burt Reynolds and Laurence Olivier Estates – AI audio firm ElevenLabs has secured agreements with the estates of iconic celebrities to use their voices for reading books, articles, and other text material on its new Reader App, emphasizing the use of “Iconic Voices” for individual streaming use.

Let’s Get Agentic: LangChain and LlamaIndex Talk AI Agents – AI agents were the focus of two leading AI engineering startups, LangChain and LlamaIndex, at the AI Engineer World’s Fair, with LangChain offering a purpose-built agent architecture and LlamaIndex rebranding AI agents as “knowledge assistants” for enterprises.

Anthropic Pushes for Third-Party AI Model Evaluations – Anthropic is advocating for third-party AI model evaluations to assess capabilities and risks, focusing on safety levels, advanced metrics, and efficient evaluation development.

Playing the Waiting Game for ChatGPT’s Voice Assistant – An account of waiting for OpenAI’s updated voice assistant after a brief period of access.

Research

Mind-reading AI recreates what you’re looking at with amazing accuracy – AI can accurately recreate what someone is looking at based on brain activity, and the results improve greatly when the AI learns which parts of the brain to focus on.

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding – A new framework called OMG-LLaVA combines powerful pixel-level vision understanding with reasoning abilities, accepting various visual and text prompts for flexible user interaction and achieving image-level, object-level, and pixel-level reasoning and understanding in a single model.

Scaling Synthetic Data Creation with 1,000,000,000 Personas – A novel persona-driven data synthesis methodology leverages a large language model to create diverse synthetic data at scale, showcasing its versatility and potential impact on research and development.

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation – A new benchmark, MMEvalPro, addresses biases in evaluating Large Multimodal Models (LMMs) by introducing a trilogy evaluation pipeline and more rigorous metrics, making evaluations more challenging and trustworthy.

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents – OmniJARVIS is a novel Vision-Language-Action model that uses unified tokenization of multimodal interaction data to enable open-world instruction-following agents in Minecraft, demonstrating strong reasoning and efficient decision-making capabilities.

Revealing Fine-Grained Values and Opinions in Large Language Models – The paper uncovers biases and disparities in large language models by analyzing their responses to politically charged statements and the impact of demographic features on outcomes.

AI Agents That Matter – AI agents’ benchmarks and evaluation practices have shortcomings, such as a narrow focus on accuracy, leading to needlessly complex and costly agents, and a lack of standardization in evaluation practices, hindering their usefulness in real-world applications.

Magic Insert: Style-Aware Drag-and-Drop – A new method called Magic Insert allows for style-aware drag-and-drop of subjects from one image to another, addressing the challenges of style-aware personalization and realistic object insertion in stylized images.

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems – Challenging long-context LLMs and RAG systems with the “Summary of a Haystack” task, the article presents a new evaluation method for AI systems’ output quality on long-context tasks, highlighting the need for improved performance.

Study reveals why AI models that analyze medical images can be biased – AI models analyzing medical images can be biased, particularly against women and people of color, and while debiasing strategies can improve fairness, they may not generalize well to new patient populations.

Concerns

Deepfake Creators Are Revictimizing GirlsDoPorn Sex Trafficking Survivors – Deepfake creators are using videos of sex trafficking victims to create nonconsensual videos, revictimizing survivors and highlighting the need for laws to protect those targeted.

Policy

The US intelligence community is embracing generative AI – The US intelligence community is embracing generative AI for various classified uses, but is also cautious about the potential risks and is working with top officials to ensure responsible and secure implementation.

Analysis

Will generative AI transform robotics? – Whether generative AI will transform robotics is still debated. The piece highlights the need for substantial scaling and training data, as well as the challenge of reliability and trust in real-world interactions.
