AI in accounting: Who are the humans in the loop?

While "move fast and break things" might be OK for tech executives, accounting leaders have done the opposite when it comes to generative AI, with firms and the vendors that serve them eager to demonstrate the high level of oversight they exert to ensure quality and manage risk. 

Key to many of these efforts has been the "human in the loop," a phrase that's been repeated over and over whenever accountants talk about generative AI, whether at a conference, webinar or product demo, or even in a news article. Less common, however, has been an explanation of what, exactly, people mean when they say that. 

To many accounting leaders, the phrase represents less a specific set of policies and procedures and more a broad ethos of supervision and accountability embedded at all levels of the firm, such as that articulated by Aisha Tahirkheli, managing director of trusted AI at KPMG Dallas. 

"Human in the loop for us means the AI assistants aren't left to operate entirely on their own," she said. "Humans are actively involved across the end-to-end AI lifecycle, and continually reviewing and refining and guiding the decisions to ensure accuracy, trust and fairness." 


The Big Four firm takes what she called a human-centric approach that begins with the premise that humans play an indispensable role regardless of the AI tools used. Every firm professional is required to take training on KPMG's "Trusted AI" policy, which has 10 ethical pillars such as fairness, privacy, sustainability, explainability and accountability. 

These professionals are supported by a Trusted AI Council that provides an extra layer of oversight and creates guidelines and policies for the firm. The council also tests the firm's AI models and publishes report cards on each of them, covering not only performance but adherence to the ethical pillars of the Trusted AI framework. Council members stay informed via a dashboard that provides visibility into generative AI inputs and outputs across the firm, which the council reviews daily for improper use; it can also work proactively by reaching out to specific professionals about their AI use. Tahirkheli noted there is also a red team within the firm that conducts security and privacy tests, including checks for alignment with the framework. 

However, while these bodies play a large role in supervising the use of AI at KPMG, the foundation of the firm's efforts is the everyday professionals who are expected to use the technology responsibly and are held accountable if they do not. Every single professional, said Tahirkheli, is ultimately responsible for the outputs they produce from their AI assistants. If those outputs are applied to any business function, whether internal or external, humans are explicitly required to review them. 

"At the end of the day, everyone in our firm serves as a human in the loop when it comes to AI," she said. 

RSM's approach

Top 10 Firm RSM takes a similar approach. Sergio de la Fe, the firm's enterprise digital leader, said accountability is at the core of its AI policies. Any AI output is traceable to a specific professional, who takes ownership of whatever the AI produces, for good or for ill. 

"Our written policy is that all deliverables are the product of individuals who create them; there is no deliverable that is just created on its own." he said. "What AI does for us is accelerate their ability to create but the review, the verification, of the content for every deliverable must be reviewed and approved by a human. There is human accountability and human review to everything we send out to clients." 

So for example, an AI model may come up with recommendations for compliance controls. These recommendations are reviewed first by the direct user, and then by a senior on the project, and then by a manager, director or even partner. 

However, this process is a floor, not a ceiling. De la Fe noted there are individual products and specific use cases that will require extra layers of oversight, and possible intervention, from subject matter experts. For instance, he said RSM has certain models that can comb the Tax Code and the firm's own memos to create tax position papers. These papers are not only reviewed by the main user and their supervisors, but also by tax experts, who he said might declare, "This doesn't make sense, because they know the tax law." 

Armanino's AI Lab

Top 25 Firm Armanino uses a similar approach, according to Carmel Wynkoop, the partner-in-charge of AI, analytics and automation. She noted that the concept is no different between generative AI and more conventional automations the firm has always used.

"You have an RPA that does bank reconciliations," she said. "It pulls statements down, logs into the ERP systems, reconciles those transactions, creates a journal entry and could post it. But the human in the loop there then looks at what the RPA did and verifies that it did it correctly, so you still got human in the loop oversight over the AI. And you can say the same for any AI process,"

OJ Laos, director of Armanino's AI Lab, concurred, noting that "human in the loop" for his firm ultimately means there is human accountability, and no one is simply taking whatever an AI puts out as gospel. In this respect, he said it's not actually that special to say there is a human in the loop, noting there is not a single firm out there that does not include some level of human accountability in its AI processes. While having a human in the loop is important, he said, it's not really a differentiator. 

"Everyone claims, 'Oh, we do human in the loop.' Well, you have to. It is not in any way a special shining star you get, because [while] it depends on the tool, ultimately you have accountability and someone reviewing those items," he said, comparing the term to saying something is "all natural." 

Humans play their part not just in overseeing their AI outputs but also, through their everyday interactions with the models, in improving them over the long term. Laos pointed to the firm's 13-week cash flow analytics tool: It will take tens of thousands of transactions and categorize them as best it can, assigning a confidence score to each one. Humans not only review the outputs but look at how confident the software is that they're accurate. Entries with especially low confidence get extra scrutiny; if the software got it right, the user leaves the entry as is, which in turn trains the model to produce higher confidence scores going forward. 
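The workflow Laos describes lends itself to a short illustration: the model categorizes each transaction and scores its own confidence, low-confidence items are routed to a person, and confirmed or corrected entries become labeled examples for the next training pass. The sketch below is a hedged approximation; the 80% threshold, field names and retraining hook are assumptions, not Armanino's actual tool.

```python
# Hypothetical sketch of a confidence-scored, human-in-the-loop review queue,
# loosely modeled on the cash flow workflow described above. The threshold,
# field names and retraining step are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 0.80  # assumed cutoff below which a human takes a closer look

@dataclass
class Transaction:
    description: str
    predicted_category: str
    confidence: float                        # model's confidence in its own categorization
    reviewed_category: Optional[str] = None

def triage(transactions: list) -> tuple:
    """Split predictions into an auto-accepted pile and a human-review queue."""
    needs_review = [t for t in transactions if t.confidence < REVIEW_THRESHOLD]
    auto_accepted = [t for t in transactions if t.confidence >= REVIEW_THRESHOLD]
    return auto_accepted, needs_review

def apply_review(t: Transaction, corrected_category: Optional[str] = None) -> Transaction:
    """Leaving the entry as-is confirms the model; a correction overrides it."""
    t.reviewed_category = corrected_category or t.predicted_category
    return t

def training_examples(transactions: list) -> list:
    """Confirmed and corrected entries become labeled data for the next training run."""
    return [(t.description, t.reviewed_category) for t in transactions if t.reviewed_category]
```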

Laos emphasized this oversight and accountability is embedded throughout the whole firm, as everyone is responsible for their own AI outputs. 

"It's not just a discrete human," he said. "Any human touching this system is a human in the loop, extending to each and every user, the developer and the trainer. So it's not like 'those five guys in those cubicles over there are the humans.' At the end of the day it's part of everyone's role. It's using the tools to continuously evaluate how we could make it better." 

While the conversation was about AI, everyone agreed the rigorous oversight and review of AI is little different than the rigorous oversight and review of their human professionals. People are responsible for the work they produce — and this is the case whether they used AI or their own brain to do it. 

"So there is always the same review process we would have for manually generated deliverables or recommendations included in any times we use AI in our processes," said RSM's de la Fe. 

Accountability baked into the software itself

This is exactly the way that audit and advisory solutions provider Fieldguide encourages professionals to engage with its product. Like others, CEO Jin Chang said "human in the loop" to him effectively means that somewhere there is a human accountable, but he didn't find it to be radically different than telling firms to supervise their professionals too. 

"We say, 'Just like how you would review the work of your human practitioners, you should review the work of the AI outputs too.' That is how it feels familiar embedded into the workflow," said Chang. 

Josh Tong, Fieldguide's head of product, noted the technology won't even work without human input, and therefore accountability. The software can surface insights and make recommendations, but ultimately it is up to the user to approve or deny them. Otherwise, the program would just hang, waiting for direction. 

"We surface the information in the right panel [for] the producer to accept or reject the information, and maybe surface reasons to annotate it, giving it some rationale and score and putting the practitioner in the driver's seat to say, 'This is very accurate, this was accepted by me, the practitioner, and this other one I reject,' so we're trying to reinforce behavior versus blindly automating and sending [outputs] straight back to the client," said Tong. 


Thomson Reuters takes a similar human-centric approach when it comes to AI, which is why it too built human oversight and accountability into the core of its solutions, according to Carter Cousineau, vice president of responsible AI and data. 

"The AI model life cycle, from the moment we're creating or developing the creation of an AI model and the problem statement all the way through to deployment or decommissioning a specific model … human in the loop is in the development of any AI solution," she said. 

Part of this is overseen by the centralized team she is a part of, which is tasked with executing the company's "Responsible AI" and data practices across the entire organization. This involves developing and implementing policies and standards requirements for not just product teams, but anyone who is using AI either internally or externally, such as marketing or HR. For instance, she said Thomson Reuters spent time looking at specific ethical challenges that come with AI, such as bias or privacy, and then worked with engineering and product teams to ensure appropriate mitigation plans were in place.

Beyond this centralized oversight, she pointed to the decentralized efforts undertaken by experts across the company, particularly when it comes to evaluating and testing these models. These experts include attorneys, CPAs and chartered accountants, MBAs and even PhDs in a range of fields such as taxation. 

"And they have significant practical experience earned prior to coming to TR and then with many of our law firms and accounting firms or corporate and tax departments and government in academia," she said. "We combine those together in terms of the building, crafting, of ensuring the design, development and deployment of solutions have human in the loop involved and our content works with generative AI to reflect that deep expertise and diversity of experience to build the solutions in that front." 

Out of the loop?

While human accountability is important, Enrico Palmerino, founder and CEO of accounting automation solutions provider Botkeeper, cautioned against idealizing humans or assuming that, just because there is a "human in the loop," the AI is completely trustworthy. While such an arrangement might guard against AI error, it is by definition still vulnerable to human error, something he said many companies have already observed. 

"The problems companies tended to see is inherently people are lazy and will always take the path of least resistance. So if the AI is asking me something, it might take me five, 10, 15 minutes to correct it. So, more often than not, as a consumer, I'm just going to ignore it. I'm just going to let it go," he said.


David Wood, a Brigham Young University accounting professor, expressed similar views. While "at some point a human needs to take responsibility and own whatever is the final output," that output may not necessarily be trustworthy.

"Most people are worried about AI coming up with some crazy thing … The challenge is often, as humans, when we get a routine task, we just click and click and click and click and don't think. So it's not like humans in the loop will solve everything, especially if humans get into this mechanical [mindset of] 'Yup, this looks good,'" he said. "You see [that] model performance can regress." 

Beyond mistakes on a given accounting problem, human error can also degrade the model overall, as the way people interact with a model can end up training it on those behaviors. While some, such as Armanino's Laos, noted these everyday interactions can improve AI, Palmerino thinks they can go the other way as well. 

"[A human] is also going to potentially take action or do behaviors that reinforce or teach the AI bad things. Like, by not editing or correcting, or by putting in shorthand typing, I am teaching the AI to do shorthand things in improper ways. Or if I'm not responding to a client that is attempting to book an appointment with me, I'm showing the AI it takes multiple days," he said, comparing AIs to small children who learn from adults for both good and ill. 

Cousineau from Thomson Reuters sees this risk as well. Her own company works to mitigate it through constant monitoring and assessment even after deployment, using insights not just from AI specialists but also subject matter experts. 

"Once [problems] are flagged through our data impact assessment, we work with our teams to be able to put the appropriate detection and mitigation alerts in place … . In deployment you're making sure the performance and evaluation has not drifted so there's no performance drift or degradation of results," she said. 

Still, risks such as this are one reason that Palmerino said Botkeeper has slowly phased out humans in the loop. In his company's case, for many years there were humans involved in the majority of decisions made by their software, but as the solution improved and became more and more accurate, there was less and less need for human oversight. 

"When we started, I looked at [users degrading model performance] and said, 'We'll hire a team internally that will review every single recommendation an AI makes, not let the AI make any categorization on its own — any sort of coding, any sort of workflow, any sort of typing or editing or tweaking. Everything the AI wants to do, we treat it as a recommendation and then we'll have a human in the loop for every one of those recommendations," he said. After this, humans review any transaction where there is less than 99% confidence. "We kind of started to step down" gradually until, as of April of this year, "we don't have a 'human in the loop' anymore." 

This is not to say that Botkeeper does not have human accountability, nor does it mean the software has the ability to run rampant on its own. For Palmerino, "human in the loop" meant a specific thing: literally having a human involved in the AI's processes, in contrast with others who use the term more as a way to signal humans are generally overseeing AI. Removing this aspect does not negate how Botkeeper is built, with a "laundry list" of various guardrails governing what the AI can and cannot do, so "it can't just go wild." It simply means there are no longer humans evaluating every single thing the software does, trusting that, at this point, the models have been trained well enough. 

Wood, the BYU professor, voiced a similar understanding of the term "human in the loop," saying he viewed it as "somewhere embedded in the process there is human review for making significant judgments." Even if a company does not do this specific thing, he said, there is still a widespread ethos of accountability for AI outputs. So long as the buck stops with a human, this does not necessarily conflict with what he perceived to be the long-term goal of AI, which is to have little to no human involvement in the process itself.

He noted there is precedent for this, going back to the computer itself. When accounting firms first began using computers, people didn't entirely trust the outputs. For a while, humans recomputed everything to make sure it worked. "AI, I think, is moving in that same direction," said Wood. "As we move more and more, the human will be removed from more processes until they're not needed." 

Armanino's Wynkoop made a similar prediction, saying it would likely be a very gradual transition as the technology improves. However, because of the stakes involved, this shift will likely be even more gradual in the business world. 

"The future is going to be less human in the loop … . We'll continue to see [AI] evolve, but in the business world we won't see as much movement because a lot of the stuff deals with finances and success of the business," she said. "I think it's a long way off to start thinking about AI not having a human in the loop from a business perspective. [But] we'll see it more on the more personal side for sure."
