Hello Engineering Leaders and AI Enthusiasts!

Another eventful week in the AI realm. Lots of big news from huge enterprises.

In today’s edition:

🧮 Google DeepMind’s LLM solves complex math
📘 OpenAI released its Prompt Engineering Guide
🤫 ByteDance secretly uses OpenAI’s Tech
🔥 OpenAI’s new ‘Preparedness Framework’ to track AI risks
🚀 Google Research’s new approach to improve performance of LLMs
🖼️ NVIDIA’s new GAvatar creates realistic 3D avatars
🎥 Google’s VideoPoet is the ultimate all-in-one video AI
🎵 Microsoft Copilot turns your ideas into songs with Suno
💡 Runway introduces text-to-speech and video ratios for Gen-2
🎬 Alibaba’s DreaMoving produces HQ customized human videos
💻 Apple optimises LLMs for Edge use cases
🚀 Nvidia’s biggest Chinese competitor unveils cutting-edge AI GPUs
🧚‍♂️ Meta’s Fairy can generate videos 44x faster
🤖 NVIDIA presents new text-to-4D model
🌟 Midjourney V6 has enhanced prompting and coherence

Let’s go!

Google DeepMind’s LLM solves complex math

Google DeepMind has used an LLM-powered method called FunSearch to crack an unsolved math problem. FunSearch pairs the code model Codey with an automated evaluator that checks and scores the programs it suggests. After many iterations of this generate-and-evaluate loop, FunSearch produced a correct and previously unknown solution to the cap set problem.

The approach differs from DeepMind’s previous math tools, and it has also shown promising results on the bin packing problem.
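To make the idea concrete, here is a minimal sketch of a FunSearch-style evolve-and-evaluate loop. The llm_propose function is a hypothetical stand-in for the code model, and the evaluator is a stub that assumes each candidate defines a solve() function; the real system runs a much larger, distributed version of this loop against a problem-specific scorer.

```python
import random

def evaluate(program_src: str) -> float:
    """Score a candidate program, e.g. the size of the cap set it constructs.
    Stubbed here; FunSearch plugs in a problem-specific evaluator."""
    try:
        namespace = {}
        exec(program_src, namespace)        # run the candidate code
        return float(namespace["solve"]())  # assumes it defines solve() returning a score
    except Exception:
        return float("-inf")                # broken programs score worst

def llm_propose(parent_src: str) -> str:
    """Hypothetical stand-in for the code LLM (Codey): given a parent program,
    return a mutated candidate. A real system would call the model here."""
    raise NotImplementedError

def funsearch_loop(seed_program: str, iterations: int = 1000) -> str:
    """Evolve a small population of programs, keeping the highest-scoring ones."""
    population = [(evaluate(seed_program), seed_program)]
    for _ in range(iterations):
        _, parent = random.choice(population)     # sample a parent to mutate
        child = llm_propose(parent)               # LLM suggests a new program
        population.append((evaluate(child), child))
        population = sorted(population, reverse=True)[:20]  # keep the best 20
    return population[0][1]                        # best program found
```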

Source

OpenAI released its Prompt Engineering Guide

OpenAI has released its own Prompt Engineering Guide. The guide shares strategies and tactics for getting better results from LLMs like GPT-4, and the methods it describes can sometimes be combined for greater effect. OpenAI encourages experimentation to find the methods that work best for you.

By following these strategies, users can improve the performance and reliability of language models.
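As an illustration (not taken from the guide itself), here is a minimal sketch of two of its tactics, writing clear instructions and using delimiters to mark off the input, using the OpenAI Python client; the model name and article text are placeholders.

```python
from openai import OpenAI  # assumes the official openai>=1.0 Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

article = "..."  # placeholder: the text you want summarized

# Tactics from the guide: give clear instructions, use delimiters to separate
# the input, and ask for a fixed output format so the result is easy to parse.
response = client.chat.completions.create(
    model="gpt-4",  # placeholder; use whichever model you have access to
    messages=[
        {"role": "system",
         "content": "You are a precise assistant. Summarize the text between "
                    "triple quotes in exactly three bullet points."},
        {"role": "user", "content": f'"""{article}"""'},
    ],
    temperature=0.2,  # lower temperature for more consistent summaries
)
print(response.choices[0].message.content)
```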

Source

ByteDance secretly uses OpenAI’s Tech

ByteDance, the parent company of TikTok, has been secretly using OpenAI’s technology to develop its own LLM, codenamed Project Seed. This violates OpenAI’s terms of service, which prohibit using its model outputs to develop competing AI models.

Source

OpenAI’s new ‘Preparedness Framework’ to track AI risks

OpenAI has published a new “Preparedness Framework” for tracking and managing catastrophic AI risks.

Under the framework, models judged to pose high risk cannot be deployed, and models judged to pose critical risk cannot be developed further.

Source

Google Research’s new approach to improve LLM performance

Google Research has released a new approach to improving the performance of LLMs at answering complex natural language questions. The approach combines knowledge retrieval with the LLM, using a ReAct-style agent that can reason over and act upon external knowledge.
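Google has not released code alongside the post, but the ReAct pattern itself is easy to sketch. Below is a minimal, hypothetical loop in which llm and retrieve are stand-ins for the language model and the knowledge-retrieval step.

```python
def llm(prompt: str) -> str:
    """Hypothetical language-model call returning the next reasoning step."""
    raise NotImplementedError

def retrieve(query: str) -> str:
    """Hypothetical retrieval step, e.g. a search over an external knowledge base."""
    raise NotImplementedError

def react_agent(question: str, max_steps: int = 5) -> str:
    """A ReAct-style loop: the model alternates between free-form reasoning
    ('Thought'), issuing a retrieval query ('Action'), and reading the result
    ('Observation') until it commits to a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                  # e.g. "Thought: ... Action: Search[query]"
        transcript += step + "\n"
        if "Final Answer:" in step:             # the model decided it knows enough
            return step.split("Final Answer:")[-1].strip()
        if "Search[" in step:                   # extract the proposed query
            query = step.split("Search[")[-1].split("]")[0]
            transcript += f"Observation: {retrieve(query)}\n"
    return llm(transcript + "Final Answer:")    # force an answer if steps run out
```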

Source

NVIDIA’s new GAvatar creates realistic 3D avatars

Nvidia has announced GAvatar, a new method for creating realistic, animatable 3D avatars using Gaussian splatting, a representation that combines the advantages of explicit (mesh) and implicit (NeRF) 3D representations.

GAvatar outperforms existing methods in terms of appearance and geometry quality and achieves fast rendering at high resolutions.
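For readers unfamiliar with the representation, here is an illustrative (not GAvatar-specific) sketch of what a single Gaussian-splatting primitive carries; an avatar or scene is simply a large collection of these, rasterized and alpha-blended into an image.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """One splatting primitive: an anisotropic 3D Gaussian with appearance.
    Illustrative only; GAvatar additionally attaches primitives to a body
    model and predicts their attributes with neural fields."""
    mean: np.ndarray      # (3,) center position in space
    rotation: np.ndarray  # (4,) unit quaternion orienting the ellipsoid
    scale: np.ndarray     # (3,) per-axis standard deviations
    opacity: float        # how much this primitive occludes what is behind it
    color: np.ndarray     # (3,) RGB (real systems often use spherical harmonics)

# A tiny "avatar" of one primitive; real models use hundreds of thousands.
avatar = [
    Gaussian3D(mean=np.zeros(3),
               rotation=np.array([1.0, 0.0, 0.0, 0.0]),
               scale=np.full(3, 0.01),
               opacity=0.9,
               color=np.array([0.8, 0.6, 0.5])),
]
```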

Source

Google’s VideoPoet is the ultimate all-in-one video AI

To explore the application of language models in video generation, Google Research introduces VideoPoet, an LLM that is capable of a wide variety of video generation tasks, including:

Text-to-video

Image-to-video

Video editing 

Video stylization

Video inpainting and outpainting

Video-to-audio

VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator.
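That claim is easiest to picture as a three-stage pipeline: encode video into discrete tokens, run ordinary next-token prediction, then decode the tokens back into frames. The sketch below is purely conceptual; the tokenizer, detokenizer, and lm.predict_next call are hypothetical stand-ins, not VideoPoet’s actual interfaces.

```python
def video_to_tokens(frames):
    """Hypothetical video tokenizer: maps raw frames to a sequence of
    discrete token IDs (VideoPoet uses learned tokenizers for this step)."""
    raise NotImplementedError

def tokens_to_video(token_ids):
    """Hypothetical detokenizer: maps token IDs back to pixels."""
    raise NotImplementedError

def generate_video(prompt_ids, num_new_tokens, lm):
    """Ordinary autoregressive decoding: the 'video generator' is just a
    language model predicting the next token over a shared vocabulary of
    text, image, video, and audio tokens."""
    ids = list(prompt_ids)
    for _ in range(num_new_tokens):
        ids.append(lm.predict_next(ids))  # hypothetical next-token call
    return tokens_to_video(ids)
```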

Source

Microsoft Copilot turns your ideas into songs with Suno

Microsoft has partnered with Suno, a leader in AI-based music creation, to bring Suno’s capabilities to Microsoft Copilot. Users can enter prompts into Copilot and have Suno, via a plug-in, bring their musical ideas to life. Suno can generate complete songs, including lyrics, instrumentals, and singing voices.

The experience will begin rolling out to users starting today, ramping up in the coming weeks.

Source

Runway introduces text-to-speech and video ratios for Gen-2

Text-to-Speech: Users can now generate voiceovers and dialogue with a simple-to-use, highly expressive text-to-speech tool. It is available on all plans starting today.

Ratios for Gen-2: Quickly and easily change the aspect ratio of your generations to better suit the channels you’re creating for. Choose from 16:9, 9:16, 1:1, 4:3, or 3:4.

Source

We need your help!

We are working on a Gen AI survey and would love your input.
It takes just 2 minutes. 
The survey insights will help us both.
And hey, you might also win a $100 Amazon gift card!

Every response counts. Thanks in advance!

Alibaba’s DreaMoving produces HQ customized human videos

Alibaba’s Animate Anyone saga continues, now with the release of DreaMoving from its research team. DreaMoving is a diffusion-based, controllable video generation framework for producing high-quality customized human videos. Given a guidance sequence and a simple content description (e.g., text and a reference image) as input, it can generate high-quality, high-fidelity videos.

Source

Apple optimises LLMs for Edge use cases

Apple has published a paper, ‘LLM in a flash: Efficient Large Language Model Inference with Limited Memory’, outlining a method for running LLMs whose size exceeds a device’s available DRAM capacity.

The methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x increase in inference speed on CPU and a 20-25x increase on GPU compared to naive loading approaches.
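The core trick is easier to see in a toy sketch. Assuming weights are memory-mapped from flash and that a hypothetical predictor estimates which FFN neurons will actually fire for a given token (the paper exploits this kind of activation sparsity), only the needed rows ever get pulled into DRAM; the file name, dimensions, and predictor below are all placeholders.

```python
import numpy as np

HIDDEN, FFN = 4096, 11008  # illustrative transformer FFN sizes

# The weight matrix stays on flash as a memory-mapped file; the OS only pulls
# the pages we actually index into DRAM.
w_up = np.memmap("ffn_up.bin", dtype=np.float16, mode="r", shape=(FFN, HIDDEN))

def predict_active_neurons(x: np.ndarray) -> np.ndarray:
    """Hypothetical low-cost predictor of which FFN neurons will fire for this
    token; stubbed out here."""
    raise NotImplementedError

def sparse_ffn_up(x: np.ndarray) -> np.ndarray:
    """Compute only the rows the predictor says matter, loading just those
    weights from flash instead of the full matrix."""
    active = predict_active_neurons(x)   # e.g. a few hundred indices
    w_active = np.asarray(w_up[active])  # reads only the needed rows
    out = np.zeros(FFN, dtype=np.float32)
    out[active] = w_active @ x           # partial matrix-vector product
    return out
```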

Source

Nvidia’s biggest Chinese competitor unveils cutting-edge AI GPUs

Chinese GPU manufacturer Moore Threads announced the MTT S4000, its latest graphics card for AI and data center compute workloads. Its brand-new flagship will feature in the KUAE Intelligent Computing Center, a data center built from clusters of 1,000 S4000 GPUs each.

Moore Threads is also partnering with many other Chinese companies, including Lenovo, to get its KUAE hardware and software ecosystem off the ground.

Source

Meta’s Fairy can generate videos 44x faster

Meta’s GenAI research team has introduced Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhanced for video editing applications. Fairy not only addresses the memory and processing-speed limitations of previous models but also improves temporal consistency through a unique data augmentation strategy.

Remarkably efficient, Fairy generates 120-frame 512×384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x.

Source

NVIDIA presents a new text-to-4D model

NVIDIA research presents Align Your Gaussians (AYG) for high-quality text-to-4D dynamic scene generation. It can generate diverse, vivid, detailed and 3D-consistent dynamic 4D scenes, achieving state-of-the-art text-to-4D performance.

Source

Midjourney V6 has improved prompting and image coherence

Midjourney has started alpha-testing its V6 models. Here is what’s new in MJ V6:

Much more accurate prompt following as well as longer prompts

Improved coherence, and model knowledge

Improved image prompting and remix

Minor text drawing ability

Improved upscalers, with both ‘subtle’ and ‘creative’ modes (each increases resolution by 2x)

An entirely new prompting method has been developed, so users will need to relearn how to prompt.

Source

That’s all for now!

Subscribe to The AI Edge and gain exclusive access to content enjoyed by professionals from Moody’s, Vonage, Voya, WEHI, Cox, INSEAD, and other esteemed organizations.

Subscribe now

Thanks for reading, and see you on Monday. 😊

Read More in The AI Edge