
Optimizing custom copilot (agent) performance with Azure Load Testing: A comprehensive guide

As we move into the next phase of digital transformation, the role of custom copilots is set to become increasingly pivotal. By leveraging the advanced capabilities of custom copilots, companies can automate routine tasks, provide real-time insights, and facilitate seamless interactions across various platforms.

Ensuring the performance and reliability of custom-built copilots is crucial for delivering the best user experience. This blog post delves into the process of testing the performance of a custom copilot created with Copilot Studio, leveraging the robust capabilities of Azure Load Testing. We will explore the key steps involved, from setting up the testing environment to analyzing the results, and provide insights on how to optimize your copilot for peak performance.

Introduction: Power Platform Assistant copilot

We created a really simple custom copilot that provides support to makers about the Power Platform: training resources, best practices for Power Automate and Copilot Studio, DAX reference for Power BI, and Power Pages resources (a mix of everything!). In order to answer those questions, we used some knowledge sources but also built some custom topics.

Those custom topics are really simple, like the certifications one.

Therefore, depending on the question, the copilot will use either the knowledge sources or the custom topics to answer it. We can connect our Copilot Studio copilot to Azure Application Insights and check the logs for all the activity the copilot is undertaking, for example with a Kusto query that retrieves the messages sent and received by the copilot over the last 30 minutes:
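
As a reference, here is a minimal sketch of such a query, assuming the default Copilot Studio integration with Application Insights, which logs Bot Framework custom events named BotMessageReceived and BotMessageSend:

    customEvents
    | where timestamp > ago(30m)
    | where name in ("BotMessageReceived", "BotMessageSend")
    | project timestamp, name, customDimensions
    | order by timestamp desc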

Create an Apache JMeter test plan

In the Copilot Studio Samples GitHub repository you can find very useful resources; among them is the Copilot Studio load testing with JMeter tutorial. The repository includes a step-by-step guide on how to create and execute an Apache JMeter test plan using utterances located in different CSV files, one for each conversational flow.

In our case, we created 3 different CSV files that simulate conversation flows from users, asking questions about Power Automate, Copilot Studio, Power Pages, certification resources, and governance recommendations, like the following:
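
A hypothetical excerpt of one of those files, assuming one utterance per row (the utterances below are illustrative; the actual layout must match the CSV Data Set Config variables in the JMeter script):

    How do I get started with Power Automate?
    What are the best practices for error handling in cloud flows?
    Where can I find Copilot Studio training resources?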

This is the first time we are using Apache JMeter, but honestly, it is really simple to configure a test plan, as we only need to define the following (see the simplified JMX excerpt after this list):

  • Files that contain utterances with a conversational flow (you can see a sample file here).
  • Number of threads to simulate the number of concurrent virtual users (in Azure Load Testing it can be up to 250).
  • Ramp-up period to gradually start all threads, simulating users arriving over time. For instance, if this value is 3000 seconds and the number of threads is 250, the test will simulate a new user starting a conversation every 12 seconds.
  • Loop count to define how many times the conversational flows should be executed (1 in our sample).
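
For illustration, this is roughly how those three settings appear inside a JMX file (a simplified excerpt; the test name is hypothetical, and the real script contains one thread group per conversational flow):

    <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Power Automate flow">
      <!-- Number of concurrent virtual users -->
      <stringProp name="ThreadGroup.num_threads">250</stringProp>
      <!-- Ramp-up period in seconds -->
      <stringProp name="ThreadGroup.ramp_time">3000</stringProp>
      <!-- Loop count: how many times each flow is executed -->
      <elementProp name="ThreadGroup.main_controller" elementType="LoopController">
        <stringProp name="LoopController.loops">1</stringProp>
      </elementProp>
    </ThreadGroup>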

We recommend downloading the Multi Group Websocket script file that you will find in the repository, opening it, and modifying it according to your needs.

Finally, you can run the test using JMeter in GUI mode, or export the script so that it can be used afterwards in Azure Load Testing.

You will need to update the JMX script file: in the fetch token action, replace directline.botframework.com with any of the available regional endpoints. In our case, it is europe.directline.botframework.com. You can find an updated version of the script here.
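
Assuming the fetch token sampler stores the host in JMeter's standard HTTPSampler.domain property, the change looks like this:

    <!-- Before -->
    <stringProp name="HTTPSampler.domain">directline.botframework.com</stringProp>
    <!-- After: regional Direct Line endpoint -->
    <stringProp name="HTTPSampler.domain">europe.directline.botframework.com</stringProp>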

Create a performance test in Azure Load Testing

Azure Load Testing is a fully managed load-testing service that enables developers and testers to generate high-scale load and identify performance bottlenecks in their applications. The service supports running existing Apache JMeter scripts and provides rich dashboards for easy troubleshooting.

In order to create a load test using an existing Apache JMeter script file, we can follow this step-by-step guide. But before uploading the script and the CSV files with the conversational flows, we need to make some adjustments:

  • In the script file (JMX), remove any folder references from the CSV file locations (see the excerpt after this list).
  • In the CSV files, remove any header row.
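
For the first adjustment, assuming the script reads the utterances through JMeter's CSV Data Set Config element (the file name below is hypothetical):

    <!-- Before: path with a folder, which Azure Load Testing cannot resolve -->
    <stringProp name="filename">data/powerautomate.csv</stringProp>
    <!-- After: file name only, matching the file uploaded alongside the JMX script -->
    <stringProp name="filename">powerautomate.csv</stringProp>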

You can download the files used in this blog post here.

Once we have completed the previous steps, we are ready to create our test. In this sample, we uploaded the script file and 3 CSV files with the utterances.
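
If you prefer scripting the setup over the portal wizard, here is a sketch using the Azure CLI load extension (the resource and test names are hypothetical):

    # Requires the Azure Load Testing CLI extension
    az extension add --name load

    # Create the test from the JMX script (the CSV files can be uploaded as additional test files)
    az load test create \
      --load-test-resource copilot-loadtest \
      --resource-group rg-copilot \
      --test-id copilot-performance-test \
      --test-plan CopilotStudioLoadTest.jmx \
      --engine-instances 1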

In the Load section we will define how many engine instances we require to run our test. With a single instance, we would have 250 threads (we configured this value in the JMeter script file), but if we select 4 instances, the test will use 1,000 threads (4 × 250). We can also generate load from multiple regions, simulating access from different geographical areas.

In the Monitoring section we can configure components to monitor server-side metrics during the test. In this case, we selected the Azure Application Insights resource we are using to log all copilot events (we configured it when we created the custom copilot).

By default, when adding an Application Insights resource, the metrics measured are server response time (average), failed requests (count), and server requests (count).

Finally, we can also define some test criteria, which allow us to set conditions for a test to pass or fail. In this case, the test will be successful if the copilot response time is less than 8 seconds in 90% of the cases.

It is also important to mention that the test can stop automatically if there is a high percentage of errors (90%) within a time window (60 seconds), so we avoid spending money on tests that have been configured incorrectly. Both settings appear in the configuration sketch below.
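
For reference, the same settings expressed as an Azure Load Testing YAML test configuration (a sketch assuming the documented v0.1 schema; the IDs and file names are hypothetical, and the portal exposes the same options):

    version: v0.1
    testId: copilot-performance-test
    displayName: Copilot Studio load test
    testPlan: CopilotStudioLoadTest.jmx
    engineInstances: 1
    failureCriteria:
      # Fail the test if the 90th-percentile response time exceeds 8 seconds
      - p90(response_time_ms) > 8000
    autoStop:
      errorPercentage: 90
      timeWindow: 60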

Now we are ready to start the test and check the results!

Analyze the test results

After running the test (in this case, for 12 minutes), we can view the test results in a dashboard.

As we can see, we had 16 virtual users using the copilot at the same time (sending requests), and the copilot response time was 6.49 seconds or less in 90% of the cases. We can also see that there are a lot of WebSocket I/O errors, probably because the endpoint is overloaded, although we don't know the exact details.

When testing the copilot, be careful with the OpenAI usage limits, as it's very easy to reach them (8,000 requests per minute). When the copilot has reached this limit, the following message will be shown: “The OpenAI usage limit is reached. Try again later or contact your admin to increase the limit. Error Code: OpenAIRateLimitReached Conversation”.

During the test we reached the OpenAI usage limits, so it's important to take this into consideration: reduce the number of threads, increase the ramp-up period (so that requests are spread over more time), or reach out to Microsoft to adjust the limits.

What could we do to reduce the copilot response time? It's not an easy question, but one quick answer would be to use custom topics and provide fixed answers to the most frequently asked questions, instead of relying on generative answers (which need to evaluate all knowledge sources and generate an answer, which obviously requires some time, whether the information is on the web or in a document). Do you have any other ideas? Do not hesitate to add them in the comments!

Conclusions

The importance of performance testing for custom copilots cannot be overstated. Depending on the criticality and scope of the copilot, incorporating performance testing as an integral part of the project is essential for optimizing its performance. By leveraging tools like Azure Load Testing, businesses can identify potential bottlenecks, ensure scalability, and enhance the overall user experience.

In the end, usability, performance, reliability, security, and user feedback are must-haves in any custom copilot to ensure it meets the highest standards of functionality and user satisfaction.
