Beyond APIs: Empowering agents with Computer Use in Copilot Studio

While APIs have long been the go-to for integrating systems, many business processes still rely on user interfaces that aren’t easily accessible through traditional means. That’s where Computer-Using Agents (CUA) in Copilot Studio come in. If you want to understand how CUA works and, most importantly, how Computer-Using Agents differ from classic Robotic Process Automation (RPA), this blog post by my colleague Timo Pertilä is a must-read.

This blog post dives into a real-world business case that showcases how agents, form processing and UI automation can work together to automate tasks that were previously manual, repetitive or dependent on legacy systems.

What are Computer-Using Agents (CUA)?

Computer-Using Agents (CUA) are a new capability in Microsoft Copilot Studio that allows AI agents to interact with websites and desktop applications just like a human would: by visually navigating the interface, clicking buttons, typing into fields, and selecting menus.

Instead of relying on APIs or connectors, CUAs use a combination of vision-based AI and reasoning to simulate mouse and keyboard actions. This means they can automate tasks even in systems that don’t expose APIs, like legacy apps or third-party websites.

And how do they work?

  1. You describe the task in natural language (e.g. “Go to http://www.customersite.com and download the latest news”).
  2. The CUA interprets your instructions and performs the task on a configured Windows machine or hosted browser.
  3. It adapts to UI changes (like button positions or screen layouts), making it more resilient than traditional UI automation.

According to the Microsoft documentation, CUAs are well suited to use cases such as automated data entry into web forms or desktop apps, or document processing from portals without APIs. However, they are not intended for sensitive or high-risk scenarios, such as executing financial transactions or healthcare operations.

Use case: Processing rental agreement documents

Manual process

Imagine you work at a company that manages legal documents, including rental agreements. Every time you receive a rental agreement document by email, you need to do the following tasks manually:

  1. Download the attached document in the email, and store it in a repository (e.g. in SharePoint).
  2. Open the document, and extract some important information, like tenant, landlord, start date, end date, monthly rent and deposit.
  3. Verify that all the information is correct.
  4. Launch a new web browser instance, access a web application, and create a new record.
  5. Send a confirmation email to the person who submitted the rental agreement.

Below you can see a screenshot of the web app used to manage rental agreements (I developed this fake app with the invaluable help of GitHub Copilot):

Could we convert this manual process into an automatic one using agents, AI Builder, and computer use? Let’s find out!

New approach: Agents, AI Builder and CUA

We could try to redesign the process with this new approach:

This could be handled by an agent that performs the following actions:

  1. Process a rental agreement document when a new email is received.
  2. Save the attachment into a SharePoint document library.
  3. Extract information from the document using AI Builder and GPT Prompts.
  4. Using computer use, create a new record in a legacy system with the information extracted in the previous step.
  5. If everything is correct, send a notification to the document submitter that the rental agreement has been processed.
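
The five steps above can be sketched as a small orchestration function. This is only an illustration of the control flow, not how Copilot Studio implements it: the function name and the four callables are hypothetical stand-ins for the SharePoint save, the AI Builder extraction, the computer use tool, and the notification email.

```python
from typing import Callable, Dict


def process_rental_agreement(
    attachment: bytes,
    submitter: str,
    save_document: Callable[[bytes], str],
    extract_fields: Callable[[str], Dict[str, str]],
    create_record: Callable[[Dict[str, str]], bool],
    notify: Callable[[str, str], None],
) -> bool:
    """Run steps 2-5 for one email; step 1 (the trigger) is the call itself."""
    document_url = save_document(attachment)  # step 2: store the attachment
    fields = extract_fields(document_url)     # step 3: extract the key fields
    created = create_record(fields)           # step 4: computer use creates the record
    if created:                               # step 5: notify only on success
        notify(submitter, "Your rental agreement has been processed.")
    return created
```

Passing the steps in as callables keeps the sketch independent of any real connector, and makes it clear that the notification is sent only when the record creation reports success.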

Let’s build our agent using Copilot Studio!

Building the agent with Copilot Studio

You can read the general configuration of the agent below:

Let’s go into more detail for the different actions configured in the agent.

1. Create a new rental agreement (Computer use)

Before implementing the whole process, we can start by adding a new computer use Tool to the agent to create a new rental agreement record in the web application we showed before.

Important: Currently, computer use is in public preview and is available only for environments located in the US.

We need to fill in the following values:

  • Instructions: Describe, step-by-step, how to add a new rental agreement record using the web application.
  • Machine: We will use a hosted browser, so there is no need to worry about machine configuration, resource allocation, permissions, etc.
  • Inputs: Input parameters that the tool will need. In this case, we need to pass landlord, tenant, start date, end date, monthly rent and deposit.
  • Stored credentials: For simplicity, we don’t need to log in to the system in this case, so we don’t need to store any credentials. If we did need them, we would have to create a key vault in Azure to store them.
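
To make the six inputs concrete, here is a minimal Python sketch of them as a typed record with a few sanity checks. The class name, field names, validation rules, and string formats are assumptions made for illustration; Copilot Studio defines tool inputs in its own designer, not in code.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class RentalAgreementInputs:
    landlord: str
    tenant: str
    start_date: date
    end_date: date
    monthly_rent: Decimal
    deposit: Decimal

    def __post_init__(self) -> None:
        # Reject values that would produce a meaningless record.
        if not self.landlord.strip() or not self.tenant.strip():
            raise ValueError("landlord and tenant must not be empty")
        if self.end_date <= self.start_date:
            raise ValueError("end_date must be after start_date")
        if self.monthly_rent <= 0 or self.deposit < 0:
            raise ValueError("monthly_rent must be positive, deposit non-negative")

    def as_tool_inputs(self) -> dict:
        """Flatten to plain strings, the form in which values get typed into a UI."""
        return {
            "landlord": self.landlord,
            "tenant": self.tenant,
            "start_date": self.start_date.isoformat(),
            "end_date": self.end_date.isoformat(),
            "monthly_rent": f"{self.monthly_rent:.2f}",
            "deposit": f"{self.deposit:.2f}",
        }
```

Checking the values before they reach the computer use tool means a bad extraction fails early instead of being typed into the legacy system.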

Important: As Microsoft states here, Hosted browser is not recommended for production use, as usage may be throttled based on demand. You can read about its limitations here (e.g. a user can have only a single active hosted browser session at any given time).

If we want to test how it works, we can start a conversation with the agent so it triggers the computer use tool:

We can then watch computer use at work: the testing page shows what the agent is doing, along with screen captures of each step:

When the computer use task ends, we receive a notification message (step #6).

We can also check in the web application if a new record is created:

That’s amazing! Our computer use tool works as expected when it receives the correct parameters. And most importantly, we didn’t design any RPA flow here! It is enough to describe what the bot needs to do.

In the next sections, we will implement the agent flow to extract information from a rental agreement document, and finally the step to trigger the process.

2. Extract information from rental agreement (agent flow)

In Copilot Studio, we added a new agent flow as a Tool in our agent, as we want to parse the PDF document (the rental agreement), extract the fields we need, and send them to the agent.

We use a GPT Prompt that asks the model to extract specific fields from the document, and that also specifies the format for the date fields (start date, end date) and the currency fields (monthly rent, deposit).
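
Fixing the formats in the prompt pays off because the output can then be checked mechanically. The sketch below assumes, purely for illustration, that the prompt returns JSON with ISO dates (YYYY-MM-DD) and plain decimal numbers; the key names and the `parse_extraction` helper are hypothetical and not part of AI Builder.

```python
import json
from datetime import date
from decimal import Decimal, InvalidOperation

# Assumed key names for the six extracted fields.
REQUIRED_FIELDS = ("tenant", "landlord", "start_date", "end_date", "monthly_rent", "deposit")


def parse_extraction(raw: str) -> dict:
    """Parse the prompt output and fail loudly if a field is missing or malformed."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    parsed = {
        "tenant": str(data["tenant"]).strip(),
        "landlord": str(data["landlord"]).strip(),
    }
    for name in ("start_date", "end_date"):
        # fromisoformat raises ValueError for anything like "01/02/2025".
        parsed[name] = date.fromisoformat(data[name])
    for name in ("monthly_rent", "deposit"):
        try:
            parsed[name] = Decimal(str(data[name]))
        except InvalidOperation as exc:
            raise ValueError(f"{name} is not a plain number: {data[name]!r}") from exc
    return parsed
```

A value such as "1,200 EUR" is rejected here rather than passed on, which is the kind of inconsistency the format instructions in the prompt are meant to prevent.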

Importantly, those field values are then sent back to the agent:

3. Process email (Trigger)

Now we can add a trigger: when a new email arrives with a certain subject (Rental agreement document) and an attachment, we save the attachment in SharePoint:

You can find the whole solution in this GitHub repo.
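
The trigger condition described in this section (a specific subject plus an attachment) can be expressed as a simple predicate. The sketch below uses Python’s standard `email` package only to illustrate the logic. The real trigger is configured in Copilot Studio, and the case-insensitive "contains" match is an assumption.

```python
from email.message import EmailMessage

EXPECTED_SUBJECT = "Rental agreement document"


def should_trigger(message: EmailMessage) -> bool:
    """True when the subject matches and at least one attachment is present."""
    subject = (message["Subject"] or "").strip().lower()
    has_attachment = any(True for _ in message.iter_attachments())
    return EXPECTED_SUBJECT.lower() in subject and has_attachment
```

Requiring both conditions keeps the agent from running on replies or follow-up emails that share the subject but carry no document.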

Testing the agent

Finally, it’s time to test our agent. Of course, the most interesting part here is checking whether computer use works. To test the agent, we send an email with the following rental agreement in PDF format:

We can check all the steps the agent executed, and see how computer use received some input parameters and the rationale behind it:

Finally, if we open the web application, we can see a new record has been created with the information extracted from the PDF document:

What didn’t work (and why)?

UI controls

One of the most significant challenges we encountered during the development of our computer use tool was interacting with date picker controls. Unlike standard buttons or text fields, date pickers often rely on dynamic UI elements, such as pop-up calendars, nested selectors, or JavaScript-driven behaviours that can vary widely across applications.

As Microsoft states in this article about computer use known limitations, there could be difficulties with certain UI controls such as dropdowns, date pickers, or custom widgets.

That’s the main reason why the web application has no date picker for the start and end date fields (as this is a demo site, we were able to update the UI).

Instructions

Another key learning from our experience was the importance of precise and well-structured instructions. Like many AI-powered tools today, CUAs rely heavily on the clarity of the prompts they receive. We found that using the right terminology made a significant difference: for example, saying “fill in the form” worked far better than “submit the form”. The agent interprets tasks based on the language used, so vague or ambiguous phrasing can lead to unexpected behaviour or failed executions.
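
One way to keep instructions precise is to write them as short numbered steps with one UI action each. The helper below is a hypothetical sketch of that idea. The wording of the steps, the function name, and the example URL are illustrative assumptions, not the instructions used in this solution.

```python
from typing import Mapping


def build_instructions(app_url: str, values: Mapping[str, str]) -> str:
    """Render numbered, unambiguous steps, one UI action per line."""
    steps = [f"Go to {app_url}."]
    steps += [
        f'Fill in the "{label}" field with "{value}".'
        for label, value in values.items()
    ]
    steps.append("Select the button that saves the record, then stop.")
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


print(build_instructions("https://example.com/rentals", {"Tenant": "Tom Tenant"}))
```

Each step names a single field and a single value, and the last step says explicitly when to stop, which leaves the agent less room to interpret the task.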

While computer use is a powerful automation tool, it comes with security risks. Ambiguous instructions or unexpected screen content may cause the AI to take unintended actions, potentially affecting your device, data, or access to sensitive systems.

Conclusion: Promising potential with practical boundaries

The integration of agents, AI Builder, and Computer-Using Agents (CUA) in Copilot Studio opens up exciting possibilities for automating manual tasks, especially in web applications where traditional APIs fall short. The feature is genuinely impressive and shows strong potential, but it’s important to recognise that it’s still in its early stages of maturity and will need to evolve to handle more complex scenarios.

CUA currently performs best with basic tasks in web applications, where Microsoft observed a success rate of around 80%. However, when applied to desktop applications, the success rate drops significantly to about 35%, making it less reliable for those environments. It’s also worth noting that CUAs are not suitable for sensitive or high-risk domains, such as executing financial transactions or handling healthcare data, where precision and compliance are critical.
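
To get a rough feel for what these success rates mean in practice, the sketch below treats every attempt as an independent trial with a fixed success probability. That independence is a simplifying assumption for illustration only; real runs that fail on a given screen are likely to fail there again.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of attempts until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def success_within(success_rate: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries."""
    return 1 - (1 - success_rate) ** attempts


for label, rate in (("web", 0.80), ("desktop", 0.35)):
    print(label, round(expected_attempts(rate), 2), round(success_within(rate, 3), 3))
```

Under this model a web task needs about 1.25 attempts on average, while a desktop task needs nearly 3, and even three desktop attempts leave roughly a one-in-four chance that none succeeds.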

Finally, for development and testing, the hosted browser can be useful, but for production-grade reliability, it’s strongly recommended to use a Windows machine that you configure and control. This ensures better stability, visibility, and security throughout the automation lifecycle.

Hopefully, as the technology evolves over the coming months and years, we’ll see CUAs become more powerful, reliable, and capable of handling a broader range of business scenarios.
