Hey Everyone,
This could be an important announcement. Whereas the last two years (2022-2024) of LLMs showed us an architecture built around predicting language (token prediction), the next few years (2025-2027) could be more about predicting computer use.
In an environment where so many companies are developing AI agents (Asana) and AI builder architectures (Salesforce), are agents the new chatbots?
AI agents and the intersection of AI and software engineering now become even more important, as task automation is the new focus of the Generative AI movement (2025-2030). Chatbots and Copilots had a limited impact on increasing individual worker productivity, so can AI agents improve the productivity of entire departments, teams and companies?
A new product-focused Enterprise AI era has arrived. While consumers flocked to ChatGPT, they didn't flock to much else. Anthropic's Computer Use is mostly targeting developers.
Enter ‘Computer Use’ by Anthropic
Yesterday, October 22nd, 2024, Anthropic introduced an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. They also introduced an experimental product called Computer Use, which I find just fascinating.
In a world where computers might learn how we use computers, will we still need to spend most of our lives behind one?
Developing a computer use model
Generative AI should improve our ability to do software automation at scale. But will it? A lot of software developers already use Claude.
Claude's computer use capabilities are, according to Anthropic, groundbreaking, even as they remain experimental. That framing sounds a lot like OpenAI's o1 model, released about five weeks ago.
If a frontier model can control your computer, is the era of pesky prompt-engineering going to be short-lived? Prompt literacy, some educators argue, is opening new avenues of learning and engagement for students.
Computer Use ironically allows Claude to control your computer screen based on a prompt and take actions on your behalf!
In this demo, Claude orchestrates a multi-step task by searching the web, using native applications, and creating a plan with the resulting information (see LinkedIn post).
Watch the Video
Computer use is being released early for feedback from developers, says Anthropic. A wide range of AI startups are working on products similar to Computer Use, so 2025 and the later 2020s might have some interesting surprises in store for us.
Anthropic’s new ‘computer use’ feature for Claude AI is now available to developers.
AI that can write software, and Generative AI's take on no-code, might become fairly important. There are some early glimmers in this area that are, frankly, fairly exciting.
Who are Anthropic?
Anthropic is OpenAI's most mature rival, is on track for a significant funding round soon, and is backed by Amazon and Google. It was started in 2021 by ex-OpenAI VPs and siblings Dario Amodei (CEO) and Daniela Amodei (President). Prior to launching Anthropic, Dario Amodei was the VP of Research at OpenAI, while Daniela was the VP of Safety & Policy. Computer use is exciting because we are talking about artificial intelligence agents that can use a computer to complete complex tasks like a human would.
It’s worth reading Anthropic’s recent announcements:
Read about Anthropic’s new models.
Read about Anthropic’s experimental computer use.
Anthropic's upgraded Claude 3.5 Sonnet and new Claude 3.5 Haiku are easy to understand, without the weird lingo OpenAI uses. Frontier models are improving fast: Claude 3.5 Haiku matches the performance of Claude 3 Opus.
Anthropic oddly said nothing about Claude 3.5 Opus.
“Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding. GitLab, which tested the model for DevSecOps tasks, found it delivered stronger reasoning (up to 10% across use cases) with no added latency, making it an ideal choice to power multi-step software development processes.” – Anthropic Team
OK, Computer
Anthropic's take on alignment is also notably careful about the ethics involved here:
Unlike OpenAI’s Code Interpreter mode, Anthropic are not providing hosted virtual machine computers for the model to interact with. You call the Claude models as usual, sending it both text and screenshots of the current state of the computer you have tasked it with controlling. It sends back commands about what you should do next. – Simon Willison
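For the curious, here is roughly what that request looks like with Anthropic's Python SDK. This is a minimal sketch, assuming the beta parameters Anthropic documented at launch (the computer_20241022 tool type, the computer-use-2024-10-22 beta flag, and the new Sonnet model string); the prompt and screen dimensions are purely illustrative.

```python
import anthropic

# Minimal sketch of a Computer Use request, assuming the parameters documented
# at launch. The task prompt and display size below are illustrative only.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",      # beta tool type for screen control
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user",
               "content": "Open the spreadsheet on my desktop and summarize it."}],
    betas=["computer-use-2024-10-22"],    # opt-in beta flag
)

# Claude replies with a mix of text blocks and tool_use blocks describing the next action.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```

The key point in Simon's description is that your code is the hands: Claude only sees the screenshots you choose to send and only suggests the next command.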
In recent months, Anthropic's Projects and Artifacts and OpenAI's Canvas have made interacting with these frontier models a bit more appealing and easier. Computer Use is a fascinating signal Anthropic is giving the world. Anthropic is likely saving Opus 3.5 to release to coincide with GPT-5.
So when Anthropic says Claude can now use computers, it means this in the most respectful of ways.
“A vast amount of modern work happens via computers. Enabling AIs to interact directly with computer software in the same way people do will unlock a huge range of applications that simply aren’t possible for the current generation of AI assistants.” – Anthropic Team
Why is this Important for Society?
It's not clear where this is heading exactly. Some suspect the use cases in agentic coding with automated debugging, customer support, and education could foster new kinds of innovation and entrepreneurship, and equip young people with a new set of AI skills that could transform society.
Access to your Screen is Sensitive Data!
Some of these companies clearly want your work data, though, as illustrated by Microsoft's sketchy Recall product, itself a limited clone of a popular AI startup's idea.
Microsoft’s Copilot Vision feature and OpenAI’s desktop app for ChatGPT have shown what their AI tools can do based on seeing your computer’s screen. But isn’t that sensitive and private data? We can expect Anthropic to be more ethically aligned than Microsoft, OpenAI and Google here. At least for now.
Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.
Anthropic does caution that computer use is still experimental and can be “cumbersome and error-prone.”
Simon Willison has also started a Substack:
Follow him on X here; I find him a solid source. Deirdre Bosa of CNBC talks about Computer Use here, a broad overview for those not into technical lingo.
How does it work?
Computer use appears to work by repeatedly taking static screenshots that are sent back to the API in near real-time.
Then Claude can move your cursor, click, and type text.
This beta version really is just a preview: it can take action for roughly 15 minutes at a time, it is limited by context windows, and, importantly, you can't train computer use on internal data in this experimental trial.
This capability is currently in public beta and enables developers to direct Claude to perform tasks such as moving a cursor, clicking buttons, and typing text based on visual input from a computer screen, presumably and eventually for the automation of useful tasks.
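To make that loop concrete, here is a rough sketch of how a developer's harness might turn Claude's tool calls into local actions and feed results back. This is a sketch only: the desktop_driver module and its take_screenshot and perform helpers are hypothetical stand-ins for whatever screen-capture and input-automation library you use, and the action names reflect the beta documentation at the time.

```python
import base64

# Hypothetical local helpers (not part of Anthropic's SDK): take_screenshot() returns
# the current screen as PNG bytes; perform() moves the mouse, clicks or types locally.
from desktop_driver import take_screenshot, perform

def handle_response(response):
    """Turn Claude's tool_use blocks into local actions, and collect tool_result
    blocks to send back as the next user message (simplified sketch of the loop)."""
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue  # plain text blocks are Claude narrating its plan
        action = block.input.get("action")  # e.g. "screenshot", "left_click", "type"
        if action == "screenshot":
            png_b64 = base64.b64encode(take_screenshot()).decode()
            content = [{"type": "image",
                        "source": {"type": "base64", "media_type": "image/png",
                                   "data": png_b64}}]
        else:
            perform(action, **{k: v for k, v in block.input.items() if k != "action"})
            content = "ok"
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": content})
    return results  # send back as {"role": "user", "content": results} and call the API again
```

Every screenshot, suggestion and action is another API round trip, which helps explain why sessions are time-boxed and why the context window becomes the bottleneck.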
Demos of this nature should really be taken with a grain of salt, as we have found out from many early players, e.g. Devin by Cognition Labs.
Is the next Frontier Computer Use?
Are people going to get more elaborate in their AI tool usage, prompting and combinations or are frontier models going to just get better than us?
Anthropic has commercial incentives to say that the next frontier is computer use: AI models that don't have to interact via bespoke tools, but that are instead empowered to use essentially any piece of software as instructed.
But isn’t the entire point that you want them to carry out the tasks more independently? Seems like a bit of a contradiction.
So in theory the computer use feature allows developers to automate repetitive tasks, build and test software, and conduct research by enabling Claude to interact with various applications directly. But what this preview makes clear is that Agentic AI is likely going to take at least a decade to get good (not months).
Some of the best voices in AI can help us understand whether models can reason and what the academic literature says about it. To chain together more complex software and tasks, frontier models would need significant breakthroughs, not just longer context windows (see Magic AI, backed by Eric Schmidt).
There’s room for another AI Startup winner in Computer Use
It's fairly probable OpenAI won't necessarily be the winner in frontier-model computer use ('Computer Use' is a fairly odd choice of name, by the way). Claude certainly is very human-aligned and able to follow instructions! So maybe Anthropic is the right custodian for such a problem. Is this just an Enterprise AI play?
Computer use is now available in the API: developers can direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking buttons, and typing text. You might just be helping to train the future of AI.
Anthropic needs funding badly, and in its recent manifesto are some signs of what it hopes will become possible. For the record, it's unbearably sci-fi and utopian. The company referred to Computer Use as a “next-gen algorithm for AI self-teaching”, one it believed could, if all goes according to plan, automate large portions of the economy someday. Do you need 15k words to convince investors that Frontier models are “Machines of loving grace”? Apparently the CEO of Anthropic does.
Anthropic will likely value itself at about $44 Billion for its next funding round, easily more than double its valuation from earlier this year. It's not clear how much Amazon or Google might be involved in this next round, which I think we can expect to be announced in early 2025.
Anthropic will want to try to keep up with the funding prowess of the likes of OpenAI and Elon Musk’s xAI.
Progress of Frontier Models will soon be measured on Doing, not Benchmarks
Anthropic clearly anticipates rapid improvements in the computer use functionality as feedback from developers is integrated into future updates. Computer use leans heavily on multi-modal LLMs and you can expect Google to be working hard on this as well. Operating computers involves the ability to see and interpret images—in this case, images of a computer screen. There are major data harvesting and prompt-injection risks with any technology working with images from your screen.
Generative AI’s answer to No-Code
What all of this illustrates, whether you call it Agentic AI, computer use or software automation with larger context windows, is that the intersection of AI and coding is the holy grail. The companies and AI startups that do this well will be huge winners. Those best at packaging it into products that anyone, including developers, can use will rapidly get more popular.
This is no longer just about frontier models or their capabilities but about building great products. Great products with a great UX, workflow and flexible control with trust and security built-in.
Developers can now direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking, and typing text, but it isn't as grand as it sounds. Many AI startups already offer, or will soon offer, such capabilities too, and most don't work very well.
Singapore-based Sam Witteveen has a recent short video intro to Anthropic's announcements that I found easy to listen to. I'll be covering great YouTube voices in AI soon in a separate article. But whether AI will win out over human prompt-engineering remains to be seen; props to the hilarious image by the team over at Wired here. Who is the puppet and who is the master in all of this anyway? Somebody tell Bill Gates it's not about personal assistants.
The Coding Automation race of AI heats up 🔥
Generative AI so far has been really about extending human agency. Anthropic notes that other companies such as Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company (think Arc) have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete. This is all very cumbersome, error prone and almost trivial today.
Computer Use is not exciting or new; it just formalizes a framework for how Anthropic is working towards more and better agents. Honestly, contrary to the PR, this new feature, called “Computer Use,” is highly unlikely to have far-reaching implications for industries that rely on repetitive tasks involving multiple applications and tabs. But the idea of AI predicting the next step in a computer task is important, not necessarily Claude taking over your screen.
That AIs can now physically control computers, moving cursors, scrolling through pages and even clicking buttons just like humans do, is not eerie or awesome; it's fairly basic. It's what they can do with it that is the important thing. It's a weird way of talking about software automation in a world where RPA sort of hit a wall and was too expensive to maintain.
Anthropic's computer use is likely to be a B2B and Enterprise AI product. Consider this: Replit is using Claude 3.5 Sonnet's capabilities with computer use and UI navigation to develop a key feature that evaluates apps as they're being built for their Replit Agent product, which they introduced in mid-September 2024. Their founder explains here.
Computer Use is about dozens of companies trying to use agents and make no-code a bit more mainstream. It also gives some of the flashier AI startups without significant products hope of a breakthrough, or an environment where progress might be accelerated and more probable. But that window for AI agents will likely peak in the 2025 to 2027 period, ushered in by BigTech and its PR machine. Microsoft also claims new autonomous AI agents, as we discussed in yesterday's post.
There’s a human somewhere who wants the data on your screen:
Anthropic was founded six years after OpenAI but in many ways is already their peer competitor. OpenAI is entangled with Microsoft, and Amazon has emerged as Anthropic's most serious backer. These research labs really are the handmaidens of BigTech in one way or another. But they are innovating real products based on their expensive models.
America Crowns yet another Duopoly
OpenAI and Anthropic's revenue growth dwarfs that of other AI startups in the world. They are the Generative AI duopoly.
These are different shades of VC backed Techno-optimism. Yet essentially how different are they really?
Anthropic claims Claude 3.5 Sonnet manifests industry-leading software engineering skills. How good at this will the models of Nvidia, Databricks, or Chinese AI startups like Alibaba Cloud's Qwen get? Look, I get it: the intersection of AI and code is, and will be, even more important in the coming years.
As these frontier-model builders go public via IPOs by the late 2020s, we'll have a better idea how much equity Microsoft, Amazon and Google will have in them, and how useful they actually are relative to the grandiose claims they have made. Most of the well-funded AI-in-coding startups won't exist by then, and consolidation will have crushed smaller players.
I’m not sure AI using my computer is the right metaphor. But it’s fairly entertaining to speculate upon.
“Computer use capabilities have the potential to change how tasks that require navigation across multiple applications are performed,” said Mike Krieger, Chief Product Officer at Anthropic.
Mike is actually one of the Instagram co-founders; Anthropic lags badly behind OpenAI in B2C product adoption and momentum.
Krieger, who was also chief technology officer of Meta-owned Instagram, grew the platform to 1 billion users and increased its engineering team to more than 450 people during his time there. Can he do it again with Claude? Claude has artifacts, projects and computer use – but pales in comparison to OpenAI’s number of AI products in development.
Anthropic does not seem to have a SearchGPT or Sora equivalent. Both could have significant revenue implications for OpenAI.
It's hard to resist the temptation to compare the o1 models with Computer Use and Sonnet 3.5. Who comes out more favorably?
B2C vs. B2B and Enterprise AI Growth
Anthropic are clearly more conscientious about their products and more B2B and Enterprise AI orientated so far. They don’t have ChatGPT fame to piggyback on. Most consumers have likely not yet heard of Claude, while some have heard of Perplexity.
Anthropic needs a Search related product, badly. While Perplexity suffers from a deluge of lawsuits, Claude is trustworthy. Google didn’t invest $2 Billion in Anthropic for nothing.
Anthropic also appears more customer-centric in terms of the partners it works with; this human alignment appears to resonate with Enterprise partners. OpenAI, meanwhile, is more hype-driven, trying to alter consumer opinions and usage.
Anthropic also had to balance its major investors Amazon and Google.
Developers can try Computer Use via Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI platform.
If OpenAI is outrageously risk-taking, Anthropic is very constrained in the AI battles it can (afford to) pick.
Anthropic has a likeable approach to Generative AI, but it will need to scale revenue substantially even to remain independent in the decade ahead. That's a doubtful task when the cost of talent and of building models is as high as it is today, and will soon climb higher.
Computer Use is a good buzzword for what frontier models acting as agents might look like. Generative AI is all about mimicking human behavior, and that's literally what Computer Use appears to be doing:
When a developer tasks Claude with using a piece of computer software and gives it the necessary access, Claude looks at screenshots of what’s visible to the user, then counts how many pixels vertically or horizontally it needs to move a cursor in order to click in the correct place. – Anthropic Blog
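In other words, the model hands back pixel coordinates and your own harness replays them. A trivial, hypothetical illustration using the pyautogui desktop-automation library (the coordinates are made up, not taken from a real Claude response):

```python
import pyautogui  # a common Python desktop-automation library; any equivalent works

# Suppose Claude's last tool call asked for a left click at screen position (412, 230),
# a coordinate it computed from the screenshot it was shown. The harness replays it:
x, y = 412, 230          # illustrative coordinates only
pyautogui.moveTo(x, y)   # move the cursor to the pixel position Claude chose
pyautogui.click()        # perform the click on Claude's behalf
```

Which is exactly why handing over screen control is a trust and security question as much as a technical one.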
Do I even want a frontier model to be clicking for me? How does this democratize AI?
A lot of developers are already using Claude alongside tools like GitHub Copilot or Cursor, so what now? Cognition uses the new Claude 3.5 Sonnet for autonomous AI evaluations, and experienced substantial improvements in coding, planning, and problem-solving compared to the previous version.
Is Computer Use just some of what B2B Coding Startups are going to be using in their frameworks?
The Browser Company, in using the model for automating web-based workflows, noted Claude 3.5 Sonnet outperformed every model they've tested before. Is navigating computer screens enough? Sounds rather AI-to-mouse to me. 🐭 Click, click. Will Computer Use even be useful for developers building computer applications and automations? See the video.
In their example it’s literally a fictional demo. Anthropic says that Computer Use requires reasoning about how and when to carry out specific operations in response to what’s on the screen.
Newer startups that do similar things, according to TechCrunch, include the likes of Relay, Induced AI, and Automat.
Just like AGI, everyone seems to define AI agents differently too. In 2024, Agentic AI also sounds like really wishful thinking. AI agents are built for productivity and to complete multistep, complex tasks on a user's behalf. Well, maybe one day they will be able to?
Important note: Claude 3.5 Sonnet's current ability to use computers is imperfect. Some actions that people perform effortlessly (scrolling, dragging, zooming) currently present challenges, so Anthropic encourages exploration with low-risk tasks. Anthropic said on X they expect this to improve rapidly in the coming months.
Top Tweets on Computer Use on October 23rd, 2024
Computer Use for orchestrating tasks
It’s this convenient
Is Claude the king of code?
We can at least speculate on Claude Opus 3.5.
Claude 3.5 Sonnet is the new Cursor
Thanks for reading! I try to cover AI news and its important implications for business, technology and society at large.
Read More in AI Supremacy