As AI works its way into more and more business processes, it has become increasingly important for auditors to understand where, why, when and how organizations use it, and what impact it is having not only on the entity itself but on its various stakeholders as well.
Speaking at a virtual conference on AI and finance hosted by Financial Executives International, Ryan Hittner, an audit and assurance principal with Big Four firm Deloitte, noted that because the technology is still relatively new, it has not yet had time to significantly impact the audit process. However, given AI's rapid rate of development and adoption throughout the economy, he expects this will change soon, and it won't be long before auditors routinely examine AI systems as a natural part of the engagement. As auditors prepare for this future, he recommended that companies do the same.
"We expect lots of AI tools to inject themselves into multiple areas. We think most companies should be getting ready for this. If you're using AI and doing it in a way where no one is aware it is being used, or without controls on top of it, I think there is some risk for audits, both internal and external," he said.

Several risks are especially relevant to the audit process. The primary risk, he said, is accuracy: while models are improving in this area, they still tend to make things up, which might be fine for creative writing but is unacceptable for financial reporting. Second, AI tends to lack transparency, which is especially problematic for auditors because a model's decision-making process is often opaque; unlike a human, an AI may not be able to explain why it classified an invoice a particular way, or how it decided on a specific chart of accounts for that invoice. Finally, AI can be unpredictable. Auditors, he said, are used to processes with consistent steps and consistent results that can be reviewed and tested; AI, however, can produce wildly inconsistent outputs even from the same prompt, making it difficult to test.
This does not mean auditors are helpless, but that they need to adjust their approach. Hittner said an auditor will likely need to consider the impact of AI on the entity and its internal controls over financial reporting; assess the impact of AI on risk assessment procedures; consider an entity's use of AI when identifying relevant controls and AI technologies or applications; and assess the impact of AI on audit response.
To best assist auditors in evaluating AI, management should be able to answer relevant questions about their AI systems. Hittner said auditors might want to know how the entity assesses the appropriateness of AI for the intended purpose, what governance controls are in place around the use of AI, how the entity measures and monitors AI performance metrics, whether and how often it back-tests the AI system, what level of human oversight the model has, and what approach the entity takes for overriding outputs when necessary.
"Management should really be able to answer these kinds of questions," he said, adding that one of the biggest questions an auditor might ask is "how did the organization get comfortable with the result of what is coming out of this box. Is it a low-risk area with lots of review levels? … How do you measure the risk and how do you measure whether something is acceptable for use or not, and what is your threshold? If it's 100% accurate, that's pretty good, but no back-testing, no understanding of performance would give auditors pause."
He said it's important for organizations to be transparent about their AI use, not just with auditors but with stakeholders as well. Cases are already starting to appear, he said, where people were unaware that generative AI had produced the information they were reviewing.
Morgan Dove, a Deloitte senior manager within the AI and algorithmic assurance practice, stressed the importance of human review and oversight of AI systems, as well as documenting for auditors how that oversight works. When should there be human review? Anywhere in the AI lifecycle, according to Dove.
"Even the most powerful AIs can make mistakes, which is why human review is essential for accuracy and reliability," she said. "Depending on use case and model, human review may be incorporated in any stage of the AI lifecycle, starting with data processing and feature selection to development and training, validation and testing, to ongoing use."
But how does one perform this oversight? Dove said data control is a big part of it, as a model's quality and accuracy hinge on the data behind it. Organizations need to verify the quality, completeness, relevance and accuracy of any data they put into an AI: not just the training data, but also what is fed into the AI in its day-to-day functions.
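What that verification might look like in practice: the sketch below assumes the pandas library, and the checks, column names and thresholds are illustrative assumptions rather than a prescribed standard.

```python
# A minimal sketch of pre-ingestion data checks (illustrative only;
# assumes pandas, and the thresholds are arbitrary examples).
import pandas as pd

def validate_ai_input(df: pd.DataFrame, required_cols: list[str],
                      max_null_rate: float = 0.01) -> None:
    """Run basic completeness and quality checks on data before it is
    fed to an AI system, raising an error if any check fails."""
    # Completeness: every required column must be present.
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Quality: limit the share of null values in required columns.
    null_rate = df[required_cols].isna().mean().max()
    if null_rate > max_null_rate:
        raise ValueError(f"Null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")
    # Accuracy proxy: duplicate records often signal a broken feed.
    if df.duplicated().any():
        raise ValueError("Duplicate rows found in input data")
```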
She also said that organizations need to archive the inputs and outputs of their AI models. Without this documentation it becomes very difficult for auditors to review the system; with it, they can trace inputs to outputs to test consistency and reliability. When archiving data, organizations should include details like the name and title of the data set and its source. They should also document the prompts fed into the system, with timestamps, so they can be linked with the related outputs.
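As a concrete illustration, an archival record of that kind could be as simple as an append-only JSON Lines log. The function and field names below are illustrative assumptions, not a schema Dove prescribed.

```python
# A minimal sketch of archiving AI inputs and outputs for auditability.
# Field names are illustrative assumptions, not a prescribed schema.
import json
from datetime import datetime, timezone

def archive_ai_interaction(dataset_name: str, dataset_source: str,
                           prompt: str, output: str,
                           log_path: str = "ai_audit_log.jsonl") -> None:
    """Append one prompt/output pair, with a timestamp, to an append-only
    log so auditors can later trace inputs to outputs."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_name": dataset_name,
        "dataset_source": dataset_source,
        "prompt": prompt,
        "output": output,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

archive_ai_interaction(
    dataset_name="Q3 vendor invoices",
    dataset_source="ERP export, 2024-10-01",
    prompt="Classify this invoice to a chart-of-accounts code: ...",
    output="6520 - Office supplies",
)
```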
Dove added that effective change management is also essential, as even small changes to a model can create large variations in performance and outputs. It is therefore important to document any change to the model, along with the rationale for the change, the expected impact and the results of testing, all of which support a robust audit trail (a minimal sketch of such a change record follows the quote below). She said this should be done regardless of whether the organization is using its own proprietary models or a third-party vendor model.
"There are maybe two nuances," she said. "One is vendor solutions are proprietary so that contributes to the black box lack of transparency, and consequently does not provide users with the appropriate visibility … into the testing and how the given model makes decisions. So organizations may need to arrange for additional oversight in outputs made by the AI system in question. The second point is around the integration and adoption of a chosen solution. They need to figure out how they process data from existing systems. They also need to devote necessary resources to train personnel in using the solution and making sure there's controls at the input and output levels as well as pertinent data integration points."
When monitoring an AI, what exactly should people be looking for? Dove said many different metrics for AI performance have already been developed. These include SemScore, which measures how similar the meaning of the generated text is to a reference text; BLEU (bilingual evaluation understudy), which measures how many words or phrases in the generated text match the reference text; and ROC-AUC (receiver operating characteristic area under the curve), which measures the overall ability of an AI model to distinguish between positive and negative classes.
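Two of these metrics have off-the-shelf implementations in widely used open-source libraries. The sketch below assumes scikit-learn and NLTK are installed, and the data is purely illustrative.

```python
# Illustrative only: off-the-shelf implementations of two metrics above
# (assumes the scikit-learn and NLTK libraries are installed).
from sklearn.metrics import roc_auc_score
from nltk.translate.bleu_score import sentence_bleu

# ROC-AUC: how well a classifier's scores separate the two classes.
y_true = [0, 0, 1, 1, 1, 0]                # actual labels (e.g., invoice flagged or not)
y_score = [0.1, 0.4, 0.8, 0.9, 0.7, 0.3]   # model confidence scores
print(roc_auc_score(y_true, y_score))      # 1.0 here: perfect separation

# BLEU: n-gram overlap between generated text and a reference text.
reference = "net income increased 4 percent year over year".split()
generated = "net income rose 4 percent year over year".split()
print(sentence_bleu([reference], generated))  # between 0 and 1, higher is closer
```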
Mark Hughes, an audit and assurance consultant with Deloitte, added that humans can also monitor the Character Error Rate, which measures the exact accuracy of an output down to the character (important for processes like calculating the exact dollar amount of an invoice); the Word Error Rate, which is similar but does the evaluation at the word level; and the Levenshtein distance, the minimum number of single-character edits (insertions, deletions or substitutions) needed to transform an extracted text into the ground-truth text, which shows how far the output is from the truth.
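All three measures build on the same edit-distance computation. Here is a minimal pure-Python sketch; the function names are ours, for illustration.

```python
# Illustrative edit-distance-based error rates (function names are ours).
def levenshtein(a, b):
    """Classic dynamic-programming edit distance: the minimum number of
    single-item insertions, deletions and substitutions to turn a into b.
    Works on strings (characters) or lists (words)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(output: str, truth: str) -> float:
    """Character Error Rate: character-level edit distance,
    normalized by the length of the ground-truth text."""
    return levenshtein(output, truth) / max(len(truth), 1)

def wer(output: str, truth: str) -> float:
    """Word Error Rate: the same idea at the word level."""
    out_words, truth_words = output.split(), truth.split()
    return levenshtein(out_words, truth_words) / max(len(truth_words), 1)

print(levenshtein("$1,204.50", "$1,204.60"))  # 1: a single-character substitution
print(cer("$1,204.50", "$1,204.60"))          # 1/9, about 0.11
print(wer("total due 1204.50", "total due 1204.60"))  # 1/3 at the word level
```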
Hittner said that even if an organization is only experimenting with AI now, it is critical to understand where AI is used, what tools the finance and accounting function has at its disposal, and how it will impact the financial statement process.
"Are they just drafting emails, or are they drafting actual parts of the financial statements or management estimates or [are they] replacing a control? All these are questions we have to think about," he said.