
What is the most effective method for searching through your documents in Copilot Studio?

Today, the ability to efficiently query internal documents is crucial for building a custom copilot that meets a company's specific needs. With options like Dataverse, SharePoint, and Azure OpenAI, makers have a rich toolkit at their disposal when building their copilots with Copilot Studio. Our recent experiment looked into the strengths and nuances of each option, and we are excited to share the results in this article. Read on to discover which platform can best enhance your copilot’s capabilities and speed up your document management process.

Context

Imagine that you want to build a custom copilot with Copilot Studio that needs to answer questions based on the documentation you have. Thanks to the Generative Answers feature, we can specify which data sources the copilot can use. The following list shows those data sources, with their advantages and drawbacks:

  • Use uploaded documents:
    • All uploaded documents are internally stored in Dataverse, with a maximum file size of 3 MB for each one.
    • There is no limitation on the number of files, as long as you have enough space in your Dataverse instance.
    • Copilot users will have access to all the files.
    • Uploaded documents become part of the copilot solution, so exporting and importing a copilot solution includes the documents.
    • Supported data types are doc, docx, xls, xlsx, pdf, txt, md, log, htm, html, odt, ods, odp, epub, rtf, json, yml, yaml, tex and Apple iWork (pages, key, numbers) files.
    • There are no extra costs.
  • Use content on SharePoint:
    • All documents are stored within a SharePoint site (in one or many document libraries).
    • It supports up to 4 different sites (URLs).
    • When using SharePoint as a data source, calls are made on behalf of the user chatting with the copilot, and therefore, permissions are applied on each document (if a user has no permission on a document, it will not be used in generative answers).
    • It can only use SharePoint files that are under 3 MB.
    • Copilot will search the provided URL (like contoso.sharepoint.com/sites/myteam), but the URL can be at most 2 levels deep (e.g. contoso.sharepoint.com/sites/myteam/mydocs would not be a valid URL).
    • Supported data types are aspx (modern pages), docx, pptx, pdf.
    • There are no extra costs.
  • Use a connection to Azure OpenAI:
    • Azure OpenAI supports different data sources: Azure Cosmos DB, Upload files (preview), Azure Blob Storage (preview).
    • Maximum file size is approximately 16 MB.
    • Supported data types are txt, md, html, docx, pptx and pdf.
    • It requires Azure OpenAI, Azure AI Search and Azure Blob Storage resources.
    • There are extra costs, which are based on consumption for each of the above-mentioned resources.
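
The limits listed above can be turned into a quick pre-flight check before uploading anything. The following is a minimal sketch, not an official API: the `SOURCES` dictionary, the function names and the choice to treat 1 MB as 1024 * 1024 bytes are all assumptions made for illustration, and the "2 levels deep" rule is interpreted here as "at most two path segments after the host name".

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MB = 1024 * 1024  # assumption: 1 MB is treated as 2**20 bytes

# Limits and file types copied from the list above.
# The dictionary layout itself is hypothetical, for illustration only.
SOURCES = {
    "dataverse": {
        "max_bytes": 3 * MB,
        "extensions": {
            "doc", "docx", "xls", "xlsx", "pdf", "txt", "md", "log", "htm",
            "html", "odt", "ods", "odp", "epub", "rtf", "json", "yml", "yaml",
            "tex", "pages", "key", "numbers",
        },
    },
    "sharepoint": {
        "max_bytes": 3 * MB,
        "extensions": {"aspx", "docx", "pptx", "pdf"},
    },
    "azure_openai": {
        "max_bytes": 16 * MB,  # the article says "approximately 16 MB"
        "extensions": {"txt", "md", "html", "docx", "pptx", "pdf"},
    },
}


def eligible_sources(filename: str, size_bytes: int) -> list[str]:
    """Return the data sources whose type and size limits the file satisfies."""
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    return [
        name
        for name, limits in SOURCES.items()
        if extension in limits["extensions"] and size_bytes <= limits["max_bytes"]
    ]


def sharepoint_url_depth_ok(url: str) -> bool:
    """Check that a SharePoint URL has at most two path segments (assumed rule)."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    return len(segments) <= 2


if __name__ == "__main__":
    print(eligible_sources("page_001.pdf", 900_000))
    print(sharepoint_url_depth_ok("contoso.sharepoint.com/sites/myteam"))
    print(sharepoint_url_depth_ok("contoso.sharepoint.com/sites/myteam/mydocs"))
```

A check like this only covers file type, file size and URL depth. Permissions, storage capacity and cost still need to be evaluated separately.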

Taking into consideration the different options that we have, which one will provide the best results in a cost-effective way? Let’s check it out!

The experiment

To compare the different options, we created several copilot projects in Copilot Studio that use the Generative Answers feature and get their data from the NASA Earth book. This book has 178 pages, and it contains amazing images taken from NASA satellites with some descriptions. In this GitHub repository you can find the book both as a single file (about 35 MB) and divided into 178 different files (1 file for each page, none of which exceeds 1 MB in size).

We created 4 different Copilot Studio projects using the following options:

  • Dataverse: We uploaded the 178 files within the project (remember, in Dataverse we can only store files up to 3 MB in size).
  • SharePoint with Copilot for Microsoft 365 licenses: We created a SharePoint site and uploaded the same 178 files to a single document library (we have the same file size limitation as in Dataverse). All users in the tenant have a Copilot for Microsoft 365 license.
  • SharePoint: We created a SharePoint site in a tenant where nobody has a Copilot for Microsoft 365 license, and uploaded the 178 files to a single document library. Why? Because we want to see whether the results differ depending on whether a user has a Copilot for Microsoft 365 license.
  • Azure OpenAI: We created an Azure OpenAI instance that generates answers based on content stored in Azure Blob Storage. This is the only case where we uploaded a single PDF file (35 MB in size).

It is important to note that when using SharePoint as a data source for generative answers, we need to enable authentication. You can learn how to do it by watching this excellent video by Dewain Robinson.

When configuring Azure AI Search to index the data stored in the PDF file, we hit the following error: Error detecting index schema from data source: “Document is 35352230 bytes, which exceed the maximum size 16777216 bytes for document extraction for your current service tier”. As we mentioned before, that is expected, as the maximum file size is 16 MB and our PDF file is 35 MB. Somewhat surprisingly, changing the Azure AI Search tier from Basic to Standard seemed to solve the problem, although the cost is radically different (70€/month for Basic vs 234€/month for Standard).
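
The numbers in that error message are easier to read once converted to megabytes. Below is a small arithmetic sketch using only the figures quoted above (the two byte counts from the error and the two monthly prices). The constant and function names are hypothetical, and the conversion assumes 1 MB = 1024 * 1024 bytes, which is the convention under which 16777216 bytes is exactly 16 MB.

```python
MIB = 1024 * 1024  # assumption: binary megabytes, so 16777216 bytes == 16 MB

# Values taken verbatim from the error message above.
EXTRACTION_LIMIT_BYTES = 16_777_216
DOCUMENT_BYTES = 35_352_230

# Monthly prices quoted above, in euros.
BASIC_EUR_PER_MONTH = 70
STANDARD_EUR_PER_MONTH = 234


def to_mib(size_bytes: int) -> float:
    """Convert a byte count to binary megabytes."""
    return size_bytes / MIB


def exceeds_limit(size_bytes: int, limit_bytes: int = EXTRACTION_LIMIT_BYTES) -> bool:
    """True when a document is too large for extraction under the given limit."""
    return size_bytes > limit_bytes


if __name__ == "__main__":
    print(f"limit:    {to_mib(EXTRACTION_LIMIT_BYTES):.2f} MB")
    print(f"document: {to_mib(DOCUMENT_BYTES):.2f} MB")
    print(f"too large for this limit: {exceeds_limit(DOCUMENT_BYTES)}")
    print(f"size / limit: {DOCUMENT_BYTES / EXTRACTION_LIMIT_BYTES:.2f}x")
    extra = STANDARD_EUR_PER_MONTH - BASIC_EUR_PER_MONTH
    print(f"extra cost for Standard: {extra} EUR/month")
```

In other words, the single-file book is a bit more than twice the extraction limit that triggered the error, and moving to the Standard tier adds 164€ per month at the prices quoted above.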

Besides that, we also configured all the Copilot Studio projects in the same way:

  • Copilot content moderation: Medium (Copilot generates more answers, but the responses may be less relevant).
  • Dynamic chaining with generative actions (preview): Off.
  • Boost conversational coverage with generative answers: On.
  • Instructions (preview): Empty (None).

With Medium content moderation and Dynamic chaining turned off, we got pretty decent results, and that is the main reason behind those settings.

The results

We tested our copilots by asking two very basic questions about content that is in the book, mainly at the beginning and at the end of it. The following screenshot shows the results of asking the first question:

As you can see, the most detailed answers are provided by the copilots using Dataverse and Azure OpenAI. Apart from that, we also see that the two copilots using SharePoint return exactly the same results, even though in only one of them all users have a Copilot for Microsoft 365 license assigned. It is important to note that the two SharePoint sites are in different tenants located in different regions.

What if we try another question? Let’s put it to the test:

In this case, the content needed to answer the question is located at the very end of the document, but whichever option is used, it is indexed and returned. And again, as with the first question, the best results are returned by the copilots using Dataverse and Azure OpenAI, and the two copilots using SharePoint return exactly the same result.

Conclusions

If efficiency is measured in terms of accuracy, cost and ease of integration, copilots using Dataverse could be considered the most efficient option: the results are satisfactory and, unlike the Azure OpenAI service, there are no additional costs.

We can certainly use SharePoint as a data source if the copilot needs to respect each user's document permissions, but that is the only case where we would recommend it. In our tests, it made no difference whether our users had a Copilot for Microsoft 365 license or not. If you want to leverage the Dataverse option instead, you can use this very smart solution to copy documents from SharePoint to Dataverse, and use those in a Copilot Studio project.

Finally, using a connection to Azure OpenAI in our Copilot Studio project could also be a good choice, but currently the costs are a serious drawback. We should also test that option with larger files, keeping in mind the file size restrictions: 16 MB for Azure OpenAI, and 3 MB for Dataverse and SharePoint. If you want to know how to use this approach to get data from a database instead of documents, you can read this post.
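
The recommendations in this section can be condensed into a simple decision rule. The sketch below is only one reading of the conclusions above, not an official guideline: the function name, its parameters and the order of the checks are assumptions made for illustration, and the 3 MB and 16 MB thresholds are the limits quoted in this article.

```python
def recommend_source(
    needs_per_user_permissions: bool,
    largest_file_mb: float,
    accepts_extra_cost: bool,
) -> str:
    """Suggest a data source based on the conclusions of this article."""
    # SharePoint is recommended only when document permissions matter.
    if needs_per_user_permissions:
        return "SharePoint"
    # Dataverse gave satisfactory answers at no extra cost, within 3 MB per file.
    if largest_file_mb <= 3:
        return "Dataverse"
    # Azure OpenAI accepts larger files, but its resources are billed.
    if largest_file_mb <= 16 and accepts_extra_cost:
        return "Azure OpenAI"
    # Otherwise the files have to be reduced to fit the Dataverse limit.
    return "Dataverse (after splitting or converting files to under 3 MB)"


if __name__ == "__main__":
    print(recommend_source(False, 1.0, False))
    print(recommend_source(True, 1.0, False))
    print(recommend_source(False, 10.0, True))
    print(recommend_source(False, 35.0, False))
```

Note that the SharePoint branch ignores file size for brevity. SharePoint files are also subject to the 3 MB limit, so larger files would still need to be split.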

At this point, someone is probably going to ask a question like: “The Dataverse option is really good, but I need to convert and/or split all my documents so that they are smaller than 3 MB. How can I do that?”. Surely we can use the Power Platform to do it, but let’s find out how in another blog post.
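
In the meantime, here is a rough idea of what splitting can look like for plain-text formats such as txt or md. This is a minimal sketch and not the Power Platform approach mentioned above: the function name is hypothetical, it only handles text that is already loaded in memory, and it splits at line boundaries. Binary formats such as PDF or docx would need a format-aware library instead.

```python
def split_text_by_size(text: str, max_bytes: int) -> list[str]:
    """Split text at line boundaries so each part is at most max_bytes in UTF-8."""
    parts: list[str] = []
    current: list[str] = []
    current_size = 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        if line_size > max_bytes:
            # A single oversized line cannot be placed without cutting it.
            raise ValueError("a single line exceeds max_bytes")
        if current and current_size + line_size > max_bytes:
            # The current part is full: close it and start a new one.
            parts.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        parts.append("".join(current))
    return parts


if __name__ == "__main__":
    three_mb = 3 * 1024 * 1024  # the Dataverse limit quoted in this article
    sample = "A line of documentation text.\n" * 200_000
    chunks = split_text_by_size(sample, three_mb)
    print(len(chunks), [len(chunk.encode("utf-8")) for chunk in chunks])
```

Joining the returned parts in order reproduces the original text, so nothing is lost by the split. Each part can then be saved as its own file and uploaded.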


One response to "What is the most effective method for searching through your documents in Copilot Studio?"

  1. Having been curious for a long time, I was eager to compare each of the options. However, due to stringent regulations within my organization, I was unable to test all these scenarios. I’m delighted to have come across your post. It saved me a significant amount of time and I thoroughly enjoyed the read.

    Best Regards,
    Vignesh
