Image credit: Pixabay

Hey Everyone,

Forward

2023 has been the year when AGI, or artificial general intelligence, first felt attainable, or even near, in the history of human civilization.

This Newsletter article is made free by our sponsor, Contentful.

The future of intelligent composable content. With generative AI, the possibilities are as limitless as your team’s creativity.

As the leading intelligent composable content platform, Contentful enables developers and marketers alike to easily deliver compliant, on-brand experiences at speed and scale, all within one unified content system.

Get Started

The AGI Explosion

A lot of the debate around AGI in 2023 stems from the aggressive marketing of some Generative A.I. leaders.

OpenAI’s motto is “Creating safe AGI that benefits all of humanity”. Of course, we don’t have a valid definition of, or global consensus on, what AGI is, how to determine whether it will benefit humanity, or whom it might hurt. A.I. journalist Karen Hao recently summed it up well in her talk with Big Think (YouTube).

If you want to catch all my deep dives, support the channel.


I find AGI so interesting and important that this Newsletter has its own section called AGI. There has been a lot of debate about whether GPT-4 or GPT-5 would pass the Turing Test.

Read the Paper

While 2023 was a year characterized by FOMO with regards to LLMs and Generative A.I. integration in enterprises and firms, there’s another kind of sentiment emerging, a kind of FOASI, if you will.

Fear of Artificial Super Intelligence


FOASI refers to “fear of artificial super-intelligence”. There is also a lot of group-think around AGI that appears to be more marketing-based than in touch with reality.

My favorite recent interview with a CEO I actually respect (unlike Sam Altman, who tried to oust Helen Toner and played board members against each other) is the one with Jensen Huang. The interview is so good for understanding Nvidia that I listened to it twice in different sittings. [link or watch below]

While speculation around AGI in 2023 has been an assault on the senses (if you care to pay attention), many argue we’re not quite there yet. Society is more likely to get “cat-level” or “dog-level” AI years before human-level AI, Meta chief scientist Yann LeCun recently pointed out yet again. Yet in 2023 we’ve also had far-flung essays proclaiming “Artificial General Intelligence Is Already Here”.

However, the debate has even made its way into academic papers, some of which masquerade as thinly veiled public relations in the name of science (Microsoft). Speaking of AGI and academic papers, enter Google. Suddenly, in 2023, AI engineers are called “scientists”, and we have to ask serious questions about this AGI marketing trend. Still more A.I. papers on the theme continue to roll out.

I asked Abhinav from Confessions of a Code Addict, known for his great summaries of A.I. papers, to take a look at this paper.

“Levels of AGI”


Read the Paper

The paper may be important, too, as GDM (Google DeepMind) researchers Meredith Morris, Jascha Sohl-Dickstein, Noah Fiedel, Allan Dafoe, Aleksandra Faust, Clément Farabet, Shane Legg and Tris Warkentin propose a comprehensive framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) systems and their precursors.

Tris explained in a LinkedIn post that, similar to “Levels of Self-Driving,” GDM hopes this will provide a common language to compare models, assess risks, and measure progress along the path to AGI. But are Big Tech employees and researchers the best-placed researchers, “experts” and scientists to define AGI for the rest of the world?

That question may be beyond the scope of this article but it is worth considering. What follows is an examination of the actual paper in more detail.

December 2023.

Charting the Path to AGI: DeepMind’s Levels and Risks Framework

A year has passed since the release of ChatGPT, and AI has progressed rapidly in this short amount of time. This progress has also triggered discussions about artificial general intelligence (AGI). Some people believe that ChatGPT has shown sparks of AGI, while others believe that state-of-the-art large language models (LLMs) are already AGI. And some scientists believe that we are not yet at the level of AGI but are accelerating towards it at a rapid pace.

When we talk about AGI, it also brings up the debate around the risks it poses to society. A part of the industry is even asking for regulations to control the research and development of AI to mitigate the potential risks. However, everyone in this debate has their own definition of AGI, which makes the discussion very subjective.

Before we talk about regulations, it’s important that scientists come up with an objective definition of AGI, design a benchmark to test for AGI, and create a framework to assess the associated risks based on the capabilities of these models.

Until now, very little progress has been made on this front. However, a team at DeepMind recently released a paper which may lay the foundation for defining AGI and the associated risks. The paper outlines a gradual pathway toward AGI, going from “no AI” to “artificial superintelligence”, along with a framework for assessing the risks associated with each level of AGI.

This paper might be the start of an important discussion for formulating the risks of AGI. This article will unpack the paper for you and highlight the important insights. So, without wasting any more words, let’s dive in.

For those looking to stay on top of the latest developments in AI policy discussions, consider subscribing to “AI Policy Perspectives”, a Substack curated by a team of researchers at DeepMind. They publish a comprehensive monthly report highlighting the most significant news and updates around AI policy.

Six Key Principles for Defining Levels of AGI

Scientists and philosophers have long thought about AGI, and it is important to consider how they have approached it in the past. The authors analyze nine definitions of AGI proposed between 1950 and 2023 [Turing 1950, Searle 1980, Legg 2008, Shanahan 2015, OpenAI 2018, Marcus 2022, Suleyman and Bhaskar 2023, Norvig et al. 2023] and come up with six key principles that should form the basis of a framework for defining AGI and its risks. These six principles are as follows:

Focus on capabilities, not processes: Many AGI definitions focus on the mechanisms behind AGI, such as sentience, consciousness, or human-like thinking processes. Instead, the focus needs to be on the AI model’s ability to perform tasks, irrespective of the underlying processes which may drive it.

Focus on generality and performance: While there is an obvious focus on generality when defining AGI, it needs to be accompanied by a focus on performance. An AI can be deemed general only if it performs well on a benchmark composed of a diverse array of tasks.

Focus on cognitive and metacognitive tasks: The benchmark for testing AGI systems needs to include both cognitive and metacognitive tasks. Metacognitive tasks measure the ability of the AI system to learn, whereas cognitive tasks are non-physical tasks, i.e., a robotic embodiment is not needed to perform them. The authors believe that the ability to perform physical tasks increases a system’s generality, but that it should not be a requirement for being defined as an AGI.

Focus on potential, not deployment: The emphasis should be on an AI system’s potential for achieving a goal, as opposed to its actual deployment in the real world. This is important because real-world deployment can be time-consuming, ridden with regulatory hurdles, and risky. For instance, demonstrating that an AI has the potential to substitute for labor is sufficient, rather than requiring actual labor substitution in real-world scenarios.

Focus on ecological validity: Another aspect that should be considered when designing the tasks for benchmarking AGI systems is their real-world value, i.e., they should be meaningfully useful to society. Otherwise, there are many tasks that are easy to automate and quantify but have no alignment with the real world.

Focus on the path to AGI: Finally, instead of thinking of AGI as an end goal, we need to focus on the path for getting there. The road to AGI is going to be through small and incremental improvements, not through a sudden discovery in a lab. A well-defined, gradual pathway leading to AGI will help the discussion around policy and regulation of these systems.

Read more of this Author

Go here

Following are some of his articles which have made an impact on readers:

What Every Developer Should Know About GPU Computing

A Linear Algebra Trick for Computing Fibonacci Numbers Fast

Decoding the ACL Paper: Gzip and KNN Rival BERT in Text Classification

Understanding DeepMind’s AlphaDev Breakthrough in Optimizing Sorting Algorithms

How CPython Implements and Uses Bloom Filters for String Processing

Six Levels of AGI

Using these six key principles, the authors delineate six gradual levels of AGI. These levels are defined using ‘performance’ and ‘generality’ as the two dimensions to measure them. 

Here, performance refers to how well an AI system can perform a task in comparison to a skilled human, while generality is concerned with the breadth of the tasks that the AI can perform. For instance, is it skilled in a narrow domain such as protein folding, or is it capable of performing well on tasks from a wide array of domains?

The following table shows these six levels of AGI as defined in the paper. The two columns, Narrow and General, specify the generality of the AI, while the six rows define the six levels of AGI with gradually increasing levels of capabilities. 

Table 1 from the paper showing Levels of AGI with examples

Although these leveled definitions are straightforward, a few important points are worth highlighting:

To certify an AI model at a certain level, the model needs to perform well on most (not all) of the tasks at that level. For example, to be certified as a Competent AGI, it should be able to perform as well as the median skilled human on most of the tasks at that level.

In reality, the performance of AI systems on these benchmarks is going to be very uneven. For instance, a model may perform well on some tasks at the “Competent” and “Expert” levels and yet be certified as “Emergent”, because for most of the tasks its performance is at the “Emergent” level (see the sketch after these points).

The order in which these systems acquire skills can have safety implications. For instance, acquiring expertise in chemical engineering before learning about ethics can be dangerous.

Also, the progression from one level to another may not happen at a steady rate; it could accelerate. For instance, once an AI system acquires the ability to learn new tasks, it may be able to progress through the levels much faster than anticipated.

Finally, even if an AI system is capable of performing at a certain level as per this rubric, in reality it may not achieve that level of performance when deployed. This may happen due to the limitations of the environment or the interface. For instance, even though the DALL-E 2 model is superior to most humans in drawing skill, it is categorized as an expert-level narrow AI system rather than virtuoso or higher. This limitation is because of the prompting-based interface, which limits the quality of the model’s output.
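To make the certification rule above concrete, here is a minimal Python sketch of how such a rubric could be applied. This is my own illustration, not code from the paper: the percentile thresholds roughly follow the paper’s performance levels (Competent at about the 50th percentile of skilled humans, Expert at the 90th, Virtuoso at the 99th, Superhuman beyond all humans), while the scoring scheme and the “most tasks” cutoff are assumptions.

```python
# Illustrative sketch only (not code from the paper): assign a "Levels of AGI"
# performance label from hypothetical per-task scores. Each score is the
# percentile of skilled humans the model matches or exceeds on that task.
# The thresholds approximate the paper's table; the "most tasks" cutoff
# (a simple majority here) is my own assumption.

PERFORMANCE_LEVELS = [
    # (level name, minimum percentile of skilled humans matched or exceeded)
    ("Superhuman", 100),
    ("Virtuoso", 99),
    ("Expert", 90),
    ("Competent", 50),
    ("Emerging", 0),
]

def certify(task_percentiles: dict[str, float], majority: float = 0.5) -> str:
    """Return the highest level whose bar the model clears on most tasks."""
    scores = list(task_percentiles.values())
    for level, threshold in PERFORMANCE_LEVELS:
        cleared = sum(score >= threshold for score in scores)
        if cleared / len(scores) > majority:
            return level
    return "No AI"

# Uneven performance: a model can ace a few tasks yet still certify as "Emerging".
example = {"coding": 95, "math": 92, "planning": 30, "social": 20, "vision": 10}
print(certify(example))  # -> Emerging
```

Even with two tasks scored at the Expert threshold, the hypothetical model above is certified only at the lowest level, which is exactly the kind of uneven profile described in the points above.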

Defining a Benchmark for AGI

The 6 levels of AGI as defined in the paper give us a pathway to classify the progress of AI systems. However, the paper does not specify a benchmark that should be used to test these systems and certify them as belonging to one of these levels.

The tasks in such a benchmark need to be diverse, challenging, and relevant to real-world use cases. This is a serious undertaking which needs to include multiple perspectives. The benchmark needs to measure both cognitive and metacognitive abilities, and include tasks from diverse areas such as mathematical and logical reasoning, linguistics, coding, spatial reasoning, social intelligence, the ability to learn new skills, and creativity.

Exhaustively enumerating tasks for such a benchmark is a monumental undertaking and impossible to get right on the first attempt. Additionally, this benchmark needs to be a living piece of work which can be updated with new tasks as we understand more about AI systems and their capabilities. For these reasons, the authors leave the definition of a representative benchmark out of the paper. However, they note that this is an important goal for the AI community to strive for.
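As a rough illustration of what such a “living” benchmark could look like in practice (my own sketch, not something the paper specifies), the snippet below models a task registry that tags each task with a capability area and whether it is cognitive or metacognitive, so coverage gaps become visible as new tasks are added over time.

```python
# Hypothetical sketch of a living AGI benchmark registry (not from the paper):
# tasks are added over time and coverage across capability areas can be audited.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    area: str            # e.g. "mathematical reasoning", "coding", "social intelligence"
    metacognitive: bool   # True if the task measures the ability to learn new skills

@dataclass
class Benchmark:
    tasks: list[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> None:
        """Extend the benchmark as understanding of AI capabilities evolves."""
        self.tasks.append(task)

    def coverage(self) -> Counter:
        """Count tasks per capability area, to spot blind spots in the benchmark."""
        return Counter(task.area for task in self.tasks)

bench = Benchmark()
bench.add_task(Task("grade-school word problems", "mathematical reasoning", False))
bench.add_task(Task("learn a made-up board game from its rules", "learning new skills", True))
print(bench.coverage())
```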

A Framework for Assessing Risks of AGI 

Just defining levels of AGI is a job half done; it also needs to be accompanied by a framework for assessing the risks associated with each of these levels. To define this framework, the authors introduce the concept of autonomy.

The autonomy of an AI system depends on its capabilities and the environment in which it operates. The environment here means the interface which enables human-AI interaction. The authors introduce six levels of autonomy which are directly correlated with the levels of AGI: progressing through the levels of AGI unlocks higher levels of autonomy in the model. Because of this, the interface design of these systems is going to play a crucial role in the safe deployment of AGI in the real world.

The following table from the paper shows these six levels of AGI autonomy, and examples of some of the associated risks:

Table 2 from the paper showing autonomy levels of AI and associated risks

I would highlight a couple of points about this framework:

Each level of AGI opens up a new set of risks. However, this also means that the risks from the previous levels are no longer an issue. For instance, the “Expert” AGI might introduce risks of economic disruption and job replacement, but it also eliminates risks associated with “Emerging” and “Competent” AGI, such as incorrect task execution.

The paper lists the six levels of autonomy along with concrete examples of the associated risks. But these are just a few examples, not an exhaustive list. The interplay between human-AI interaction and the capabilities of the AI models is going to determine the exact set of risks we are dealing with. But a framework like this makes the discussion on AGI more constructive, and can help industry and governments in designing the course of policy around AI safety.
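To connect the capability levels to the autonomy levels just discussed, here is a rough Python sketch. The autonomy level names follow the paper, but the mapping from a certified AGI level to the autonomy levels it could unlock, and the idea of a hard “cap”, are my own simplifications for illustration.

```python
# Rough illustration (not the paper's table verbatim): the autonomy levels a
# deployment might reasonably consider, given the certified AGI level.
# Level names follow the paper; the mapping and the cap are simplified assumptions.

AUTONOMY_LADDER = [
    "No AI",                 # 0: human does everything
    "AI as a Tool",          # 1: human fully controls the task
    "AI as a Consultant",    # 2: AI invoked for substantial subtasks
    "AI as a Collaborator",  # 3: human and AI coordinate as peers
    "AI as an Expert",       # 4: AI drives, human provides oversight
    "AI as an Agent",        # 5: fully autonomous AI
]

# Assumed (simplified) cap: how far up the ladder each capability level could go.
MAX_AUTONOMY_FOR_LEVEL = {
    "No AI": 0,
    "Emerging": 2,
    "Competent": 3,
    "Expert": 4,
    "Virtuoso": 5,
    "Superhuman": 5,
}

def unlocked_autonomy(agi_level: str) -> list[str]:
    """Autonomy levels that could be considered, capability permitting.

    Which level is actually safe also depends on the interface and context of use.
    """
    cap = MAX_AUTONOMY_FOR_LEVEL[agi_level]
    return AUTONOMY_LADDER[: cap + 1]

print(unlocked_autonomy("Competent"))
# ['No AI', 'AI as a Tool', 'AI as a Consultant', 'AI as a Collaborator']
```

The point of the sketch is the dependency structure: capability sets an upper bound, while the interface and deployment context determine which autonomy level within that bound is responsible to use, which is where the risk examples in the table come in.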

Conclusion

The rate at which AI technologies are advancing demands a cohesive framework for recognizing AGI and assessing its potential impact. The DeepMind paper sets a foundation by charting a structured trajectory with clear AGI levels. What remains critical is the development of detailed benchmarks that will reliably measure AI capabilities across these levels. These benchmarks require input from a broad spectrum of fields to ensure they encompass the necessary complexities of real-world tasks.

Moreover, a nuanced risk assessment framework is essential for the development of advanced AI systems. Each incremental step of AI advancement brings with it distinctive challenges and risks that must be anticipated and managed. A well-constructed framework will dispel misconceptions and mitigate the rush to implement potentially stifling regulations that could hinder progress in AI research.

The AI community needs to extend the dialog started by the paper, filling in gaps with in-depth analyses and tools for testing AGI systems. Only through such concerted efforts will we be able to align the march toward AGI with prudent oversight and ethical considerations, ensuring that the transition to higher levels of intelligence is both beneficial and secure.

About the author

Abhinav is a seasoned software engineer with over a decade of experience in the industry, in roles ranging from DevOps and backend engineering to ML.

He is an explorer who likes to break things open to understand how they work from the inside. This passion for learning has led him down the path of writing, sharing an insider’s perspective with his audience.

On his Substack, “Confessions of a Code Addict”, he covers a myriad of topics including AI, programming languages, compilers, databases and many more. His in-depth exploration and insights offer readers a unique understanding of these subjects from a practitioner’s viewpoint, making complex concepts accessible and engaging.

Of late I’ve really enjoyed these A.I. paper summaries. Thanks for reading!

If you know someone interested in AGI, why not share it with them?


Read More in AI Supremacy