Hello Engineering Leaders and AI Enthusiasts!
Another eventful week in the AI realm, with plenty of big news from major enterprises.
In today’s edition:
🧮 Google DeepMind’s LLM solves complex math
📘 OpenAI released its Prompt Engineering Guide
🤫 ByteDance secretly uses OpenAI’s Tech
🔥 OpenAI’s new ‘Preparedness Framework’ to track AI risks
🚀 Google Research’s new approach to improve performance of LLMs
🖼️ NVIDIA’s new GAvatar creates realistic 3D avatars
🎥 Google’s VideoPoet is the ultimate all-in-one video AI
🎵 Microsoft Copilot turns your ideas into songs with Suno
💡 Runway introduces text-to-speech and video ratios for Gen-2
🎬 Alibaba’s DreaMoving produces HQ customized human videos
💻 Apple optimises LLMs for Edge use cases
🚀 Nvidia’s biggest Chinese competitor unveils cutting-edge AI GPUs
🧚♂️ Meta’s Fairy can generate videos 44x faster
🤖 NVIDIA presents new text-to-4D model
🌟 Midjourney V6 has enhanced prompting and coherence
Let’s go!
Google DeepMind’s LLM solves complex math
Google DeepMind has used an LLM-based system called FunSearch to crack a previously unsolved math problem. FunSearch pairs the code-generating language model Codey with automated evaluators that score the programs it suggests. After several iterations, FunSearch produced a correct and previously unknown solution to the cap set problem.
The approach differs from DeepMind’s previous tools, and it has also shown promising results on the bin packing problem.
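The core loop behind FunSearch (an LLM proposes candidate programs, an evaluator scores them, and only improvements survive) can be sketched roughly as follows. This is a toy illustration, not DeepMind’s actual code: the “program” is just a pair of coefficients, and `propose` is a random perturbation standing in for a call to Codey.

```python
import random

TARGET = {x: 2 * x + 1 for x in range(-5, 6)}   # toy function to rediscover

def evaluate(coeffs):
    """Deterministic evaluator: negative total error (higher is better)."""
    a, b = coeffs
    return -sum(abs((a * x + b) - y) for x, y in TARGET.items())

def propose(parent):
    """Stand-in for the LLM proposer (Codey in the real system):
    perturb the best program found so far."""
    a, b = parent
    return (a + random.choice([-1, 0, 1]), b + random.choice([-1, 0, 1]))

def funsearch_loop(generations=300, seed=0):
    random.seed(seed)
    best, best_score = (0, 0), evaluate((0, 0))
    for _ in range(generations):
        candidate = propose(best)
        score = evaluate(candidate)
        if score > best_score:                  # keep only improvements
            best, best_score = candidate, score
    return best, best_score
```

The key property is that the evaluator is exact, so even though the proposer is unreliable, the loop only ever moves toward verifiably better programs.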
OpenAI released its Prompt Engineering Guide
OpenAI released its own Prompt Engineering Guide. It shares strategies and tactics for getting better results from LLMs like GPT-4, and the methods it describes can sometimes be combined for greater effect. OpenAI encourages experimentation to find what works best for you.
By following these strategies, users can improve the performance and reliability of the language models.
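Two of the guide’s tactics, giving the model a role via a system message and separating instructions from input with delimiters, can be combined like this. The article text and model name are placeholders; the actual API call is shown commented out since it needs an API key.

```python
# Combining two tactics from the guide: a system message that fixes the
# model's role, and triple-quote delimiters that separate the instruction
# from the text it applies to. The article text below is a placeholder.

article = "Google DeepMind used an LLM called FunSearch to solve the cap set problem."

messages = [
    {"role": "system",
     "content": "You are a concise technical editor. Answer in three bullet points."},
    {"role": "user",
     "content": f'Summarize the text delimited by triple quotes.\n"""{article}"""'},
]

# With the official Python SDK, the request would look like this:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(model="gpt-4", messages=messages)
```

The delimiters make it harder for content inside the article to be mistaken for an instruction, which is exactly the failure mode the guide’s tactic targets.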
ByteDance secretly uses OpenAI’s Tech
ByteDance, the parent company of TikTok, has been secretly using OpenAI’s technology to develop its own LLM called Project Seed. This violates OpenAI’s terms of service, which prohibit using the output of its models to develop competing AI models.
OpenAI’s new ‘Preparedness Framework’ to track AI risks
OpenAI published an updated “Preparedness Framework” to identify and manage catastrophic AI risks. Under the framework, models rated high-risk are prohibited from deployment, and a critical-risk rating halts further development.
Google Research’s new approach to improve LLM performance
Google Research released a new approach that improves LLM performance on complex natural-language questions. It combines knowledge retrieval with the LLM and uses a ReAct-style agent that can reason over, and act upon, external knowledge.
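The shape of a ReAct-style loop is simple: the model alternates reasoning steps with actions (here, lookups in external knowledge), feeding each observation back in before answering. In this toy sketch, `plan_next_step` is a hypothetical stand-in for the LLM’s reasoning, and the knowledge store is a hard-coded dict rather than a real retriever.

```python
# Toy ReAct-style agent: alternate thought -> action -> observation
# until the agent decides it has enough to answer.

KNOWLEDGE = {
    "capital of France": "Paris",
    "population of Paris": "about 2.1 million",
}

def retrieve(query):
    """Stand-in for a real retrieval system."""
    return KNOWLEDGE.get(query, "no result")

def plan_next_step(question, observations):
    """Hypothetical stand-in for the LLM's reasoning step."""
    if not observations:
        return ("act", "capital of France")
    if len(observations) == 1:
        return ("act", "population of Paris")
    return ("finish", f"Paris, {observations[-1]}")

def react_agent(question):
    observations = []
    while True:
        kind, payload = plan_next_step(question, observations)
        if kind == "finish":
            return payload
        observations.append(retrieve(payload))  # act, then observe
```

The point is the control flow: the agent decomposes a multi-hop question into retrievals it chooses one at a time, conditioned on what it has already observed.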
NVIDIA’s new GAvatar creates realistic 3D avatars
Nvidia has announced GAvatar, a new technology that allows for creating realistic and animatable 3D avatars using Gaussian splatting. Gaussian splatting combines the advantages of explicit (mesh) and implicit (NeRF) 3D representations.
GAvatar outperforms existing methods in terms of appearance and geometry quality and achieves fast rendering at high resolutions.
Google’s VideoPoet is the ultimate all-in-one video AI
To explore the application of language models in video generation, Google Research introduces VideoPoet, an LLM that is capable of a wide variety of video generation tasks, including:
Text-to-video
Image-to-video
Video editing
Video stylization
Video inpainting and outpainting
Video-to-audio
VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator.
Microsoft Copilot turns your ideas into songs with Suno
Microsoft has partnered with Suno, a leader in AI-based music creation, to bring their capabilities to Microsoft Copilot. Users can enter prompts into Copilot and have Suno, via a plug-in, bring their musical ideas to life. Suno can generate complete songs, including lyrics, instrumentals, and singing voices.
The experience will begin rolling out to users starting today, ramping up in the coming weeks.
Runway introduces text-to-speech and video ratios for Gen-2
Text-to-Speech: Users can now generate voiceovers and dialogue with simple-to-use, highly expressive text-to-speech. It is available on all plans starting today.
Ratios for Gen-2: Quickly and easily change the ratio of your generations to better suit the channels you’re creating for. Choose from 16:9, 9:16, 1:1, 4:3, 3:4.
We need your help!
We are working on a Gen AI survey and would love your input.
It takes just 2 minutes.
The survey insights will help us both.
And hey, you might also win a $100 Amazon gift card!
Every response counts. Thanks in advance!
Alibaba’s DreaMoving produces HQ customized human videos
Alibaba’s Animate Anyone saga continues, now with its research team’s release of DreaMoving. DreaMoving is a diffusion-based, controllable video-generation framework that produces high-quality customized human videos. Given a guidance sequence and a simple content description (e.g., text and a reference image) as input, it can generate high-quality, high-fidelity videos.
Apple optimises LLMs for Edge use cases
Apple has published a paper, ‘LLM in a flash: Efficient Large Language Model Inference with Limited Memory’, outlining a method for running LLMs on devices that surpass the available DRAM capacity.
These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x increase in inference speed on CPU and a 20-25x increase on GPU compared to naive loading approaches.
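The core idea, keeping only recently used parameter blocks in fast memory and loading the rest on demand from flash, can be sketched as a small LRU cache. This is an illustration of the memory-management principle only, not the paper’s implementation; block IDs and payloads are made up.

```python
from collections import OrderedDict

class FlashWeightCache:
    """Toy sketch: hold a limited window of weight blocks in 'DRAM',
    loading missing blocks from slow 'flash' storage on demand."""

    def __init__(self, dram_slots):
        self.dram = OrderedDict()          # block_id -> weights (LRU order)
        self.dram_slots = dram_slots
        self.flash_loads = 0               # counts slow-path loads

    def _load_from_flash(self, block_id):
        self.flash_loads += 1
        return f"weights[{block_id}]"      # placeholder payload

    def get(self, block_id):
        if block_id in self.dram:          # fast path: already resident
            self.dram.move_to_end(block_id)
            return self.dram[block_id]
        weights = self._load_from_flash(block_id)
        self.dram[block_id] = weights
        if len(self.dram) > self.dram_slots:
            self.dram.popitem(last=False)  # evict least-recently used
        return weights
```

Because transformer inference touches the same blocks repeatedly, most accesses hit DRAM and the expensive flash reads stay rare, which is where the reported speedups over naive loading come from.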
Nvidia’s biggest Chinese competitor unveils cutting-edge AI GPUs
Chinese GPU manufacturer Moore Threads announced the MTT S4000, its latest graphics card for AI and data-center compute workloads. The brand-new flagship will feature in the KUAE Intelligent Computing Center, a data center built from clusters of 1,000 S4000 GPUs each.
Moore Threads is also partnering with many other Chinese companies, including Lenovo, to get its KUAE hardware and software ecosystem off the ground.
Meta’s Fairy can generate videos 44x faster
Meta’s GenAI research team has introduced Fairy, a minimalist yet robust adaptation of image-editing diffusion models that enhances them for video editing. Fairy not only addresses the memory and processing-speed limitations of previous models but also improves temporal consistency through a unique data augmentation strategy.
Remarkably efficient, Fairy generates 120-frame 512×384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x.
NVIDIA presents a new text-to-4D model
NVIDIA research presents Align Your Gaussians (AYG) for high-quality text-to-4D dynamic scene generation. It can generate diverse, vivid, detailed and 3D-consistent dynamic 4D scenes, achieving state-of-the-art text-to-4D performance.
Midjourney V6 has improved prompting and image coherence
Midjourney has started alpha-testing its V6 models. Here is what’s new in MJ V6:
Much more accurate prompt following as well as longer prompts
Improved coherence, and model knowledge
Improved image prompting and remix
Minor text drawing ability
Improved upscalers, with both ‘subtle’ and ‘creative’ modes (increases resolution by 2x)
An entirely new prompting method has been developed, so users will need to re-learn how to prompt.
That’s all for now!
Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.
Thanks for reading, and see you on Monday. 😊
Read More in The AI Edge