Can auditing eliminate bias from algorithms?

For more than a decade, journalists and researchers have been writing about the dangers of relying on algorithms to make weighty decisions: who gets locked up, who gets a job, who gets a loan — even who has priority for COVID-19 vaccines.

Rather than remove bias, one algorithm after another has codified and perpetuated it, while companies have continued to largely shield their algorithms from public scrutiny.

The big question ever since: How do we solve this problem? Lawmakers and researchers have advocated for algorithmic audits, which would dissect and stress-test algorithms to see how they work and whether they’re meeting their stated goals or producing biased outcomes. And there is a growing field of private auditing firms that purport to do just that. Increasingly, companies are turning to these firms to review their algorithms, particularly when they’ve faced criticism for biased outcomes, but it’s not clear whether such audits are actually making algorithms less biased or are simply good PR.

Algorithmic auditing got a lot of press recently when HireVue, a popular hiring software company whose clients include Walmart and Goldman Sachs, faced criticism that the algorithms it used to assess candidates through video interviews were biased.

HireVue called in an auditing firm to help and in January touted the results of the audit in a press release.

The audit found the software’s predictions “work as advertised with regard to fairness and bias issues,” HireVue said, quoting the auditing firm it hired, O’Neil Risk Consulting & Algorithmic Auditing (ORCAA).

But despite making changes to its process, including eliminating video from its interviews, HireVue was widely accused of using the audit — which looked narrowly at a hiring test for early career candidates, not HireVue’s candidate evaluation process as a whole — as a PR stunt.

Articles in Fast Company, VentureBeat, and MIT Technology Review called out the company for mischaracterizing the audit.

HireVue said it was transparent about the audit, making the report publicly available, and added that the press release specified that the audit covered only a specific scenario.

“While HireVue was open to any type of audit, including one that involved looking at our process in general, ORCAA asked to focus on a single use case to enable concrete discussions about the system,” Lindsey Zuloaga, HireVue’s chief data scientist, said in an email. “We worked with ORCAA to choose a representative use case with substantial overlap with the assessments most HireVue candidates go through.”


But algorithmic auditors were also displeased about HireVue’s public statements on the audit.

“In repurposing [ORCAA’s] very thoughtful analysis into marketing collateral, they’re undermining the legitimacy of the whole field,” Liz O’Sullivan, co-founder of Arthur, an AI explainability and bias monitoring startup, said.

And that is the problem with algorithmic auditing as a tool for eliminating bias: Companies might use audits to make real improvements, but they might not. And there are no industry standards or regulations that hold the auditors, or the companies that hire them, accountable.

What is algorithmic auditing — how does it work?

Good question. It’s a pretty undefined field. Generally, audits take two broad approaches: examining an algorithm’s code and the data from its results, or assessing an algorithm’s potential effects through interviews and workshops with employees.

Audits with access to an algorithm’s code allow reviewers to assess whether the algorithm’s training data is biased and create hypothetical scenarios to test effects on different populations.
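To make the idea of “hypothetical scenarios” concrete, here is a minimal Python sketch of one common technique, a counterfactual test: copy each applicant record, change only a protected attribute such as gender, and check whether the model’s decision changes. The `Applicant` record, the `toy_model` scoring rule, and its planted penalty are all hypothetical, invented for illustration; this is not a description of how any firm named in this article works.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Applicant:
    years_experience: int
    test_score: float
    gender: str  # protected attribute; the probe below swaps it

def toy_model(a: Applicant) -> bool:
    """Hypothetical screening model with a deliberately planted bias."""
    score = 0.5 * a.test_score + 2.0 * a.years_experience
    if a.gender == "female":
        score -= 5.0  # hidden penalty that the audit should surface
    return score >= 40.0

def counterfactual_flips(model, applicants, attr="gender", values=("female", "male")):
    """Return the applicants whose decision changes when only `attr` is swapped."""
    flipped = []
    for a in applicants:
        # Build a "twin" identical in every field except the protected attribute.
        other = values[1] if getattr(a, attr) == values[0] else values[0]
        twin = replace(a, **{attr: other})
        if model(a) != model(twin):
            flipped.append(a)
    return flipped

if __name__ == "__main__":
    applicants = [
        Applicant(10, 45.0, "female"),
        Applicant(10, 45.0, "male"),
        Applicant(2, 30.0, "female"),
    ]
    # The first two differ only in gender, so both decisions flip;
    # the third applicant is rejected either way.
    print(len(counterfactual_flips(toy_model, applicants)))  # prints 2
```

A real audit would run many such probes and also compare outcome rates across whole groups, since a model can discriminate through proxies, such as a zip code, without ever reading the protected attribute directly.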

There are only about 10 to 20 reputable firms offering algorithmic reviews, Rumman Chowdhury, Twitter’s director of machine learning ethics and founder of the algorithmic auditing company Parity, said. Companies may also have their own internal auditing teams that look at algorithms before they’re released to the public.

In 2016, an Obama administration report on algorithmic systems and civil rights encouraged the development of an algorithmic auditing industry. Hiring an auditor still isn’t common practice, though, since companies have no obligation to do so, and according to multiple auditors, companies don’t want the scrutiny or the potential legal issues that such scrutiny may raise, especially for products they market.

“Lawyers tell me, ‘If we hire you and find out there’s a problem that we can’t fix, then we have lost plausible deniability and we don’t want to be the next cigarette company,’ ” ORCAA’s founder, Cathy O’Neil, said. “That’s the most common reason I don’t get a job.”

For companies that do hire auditors, there are no standards for what an “audit” should entail. Even a proposed New York City law that would require annual audits of hiring algorithms doesn’t spell out how the audits should be conducted. A seal of approval from one auditor could reflect much more scrutiny than one from another.

And because audit reports are also almost always bound by nondisclosure agreements, auditing firms can’t compare one another’s work.

“The big problem is, we’re going to find as this field gets more lucrative, we really need standards for what an audit is,” said Chowdhury. “There are plenty of people out there who are willing to call something an audit, make a nice looking website and call it a day, and rake in cash with no standards.”

And tech companies aren’t always forthcoming, even with the auditors they hire, some auditors say.

“We get this situation where trade secrets are a good enough reason to allow these algorithms to operate obscurely and in the dark, and we can’t have that,” Arthur’s O’Sullivan said.

Auditors have been in situations where they don’t have access to the software’s code and so risk violating computer access laws, Inioluwa Deborah Raji, an auditor and a research collaborator at the Algorithmic Justice League, said. Chowdhury said she has declined audits when companies demanded that she let them review the audits before public release.

For HireVue’s audit, ORCAA interviewed stakeholders including HireVue employees, customers, job candidates, and algorithmic fairness experts, and identified concerns that the company needed to address, Zuloaga said.

ORCAA’s evaluation didn’t look at the technical details of HireVue’s algorithms, such as what data the algorithm was trained on or its code, though Zuloaga said the company did not limit auditors’ access in any way.

“ORCAA asked for details on these analyses but their approach was focused on addressing how stakeholders are affected by the algorithm,” Zuloaga said.

O’Neil said she could not comment on the HireVue audit.

Many audits are done before products are released, but that doesn’t mean the products won’t run into problems, because algorithms don’t exist in a vacuum. Take, for example, Tay, the Microsoft chatbot that quickly turned racist once it was exposed to Twitter users.

“Once you’ve put it into the real world, a million things can go wrong, even with the best intentions,” O’Sullivan said. “The framework we would love to get adopted is there’s no such thing as good enough. There are always ways to make things fairer.”

So some auditors also provide continuous monitoring after release, though the practice isn’t common. It is gaining momentum among banks and health care companies, O’Sullivan said.

O’Sullivan’s monitoring company installs a dashboard that looks for anomalies in algorithms as they are being used in real time. For instance, it would alert a company months after launch if its algorithm began rejecting a disproportionate share of women applying for loans.
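As a rough illustration of what such an alert could check, the sketch below keeps a rolling window of recent loan decisions and flags when one group’s approval rate falls below 80% of another’s. That 0.8 threshold is an assumption borrowed from the “four-fifths” rule of thumb in U.S. employment guidelines; we don’t know what thresholds or methods Arthur’s dashboard actually uses, and `ApprovalRateMonitor` is a hypothetical name.

```python
from collections import deque

class ApprovalRateMonitor:
    """Hypothetical rolling monitor comparing loan approval rates by group."""

    def __init__(self, window: int = 1000, min_ratio: float = 0.8, min_count: int = 50):
        # Only the most recent `window` decisions are kept; older ones drop off.
        self.decisions = deque(maxlen=window)  # (group, approved) pairs
        self.min_ratio = min_ratio  # assumed four-fifths threshold
        self.min_count = min_count  # minimum decisions per group before judging

    def record(self, group: str, approved: bool) -> None:
        self.decisions.append((group, approved))

    def approval_rate(self, group: str):
        outcomes = [ok for g, ok in self.decisions if g == group]
        if len(outcomes) < self.min_count:
            return None  # too few recent decisions to judge
        return sum(outcomes) / len(outcomes)

    def check(self, group: str, reference: str):
        """Return an alert if `group` is approved at under min_ratio of `reference`'s rate."""
        rate, ref_rate = self.approval_rate(group), self.approval_rate(reference)
        if rate is None or ref_rate is None or ref_rate == 0:
            return None
        ratio = rate / ref_rate
        if ratio < self.min_ratio:
            return f"ALERT: {group} approval rate is {ratio:.2f}x that of {reference}"
        return None

if __name__ == "__main__":
    monitor = ApprovalRateMonitor(window=200)
    for i in range(100):
        monitor.record("men", i < 60)    # 60% approved
    for i in range(100):
        monitor.record("women", i < 40)  # 40% approved
    print(monitor.check("women", "men"))  # prints ALERT: women approval rate is 0.67x that of men
```

The `min_count` guard is a design choice: with only a handful of recent decisions, a gap in approval rates is as likely to be noise as bias, so the monitor stays silent until it has enough data.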

And finally, there’s also a growing body of adversarial audits, largely conducted by researchers and some journalists, which scrutinize algorithms without a company’s consent. Take, for example, Raji and Joy Buolamwini, founder of the Algorithmic Justice League, whose work on Amazon’s Rekognition tool highlighted how the software had racial and gender bias, without the company’s involvement.

Do companies fix their algorithms after an audit?

There is no guarantee that companies will address the issues raised in an audit.

“You can have a quality audit and still not get accountability from the company,” said Raji. “It requires a lot of energy to bridge the gap between getting the audit results and then translating that into accountability.”

Public pressure can at times push companies to address algorithmic bias in their technology, as can audits that weren’t performed at the tech firm’s behest and aren’t covered by a nondisclosure agreement.

Raji said the Gender Shades study, which found gender and racial bias in commercial facial recognition tools, named companies like IBM and Microsoft in order to spark a public conversation about the problem.

But it can be hard to create buzz around algorithmic accountability, she said.

While bias in facial recognition is relatable — people can see photos and the error rates and understand the consequences of racial and gender bias in the technology — it may be harder to relate to something like bias in interest-rate algorithms.

“It’s a bit sad that we rely so much on public outcry,” Raji said. “If the public doesn’t understand it, there is no fine, there are no legal repercussions. And it makes it very frustrating.”

So what can be done to improve algorithmic auditing?

In 2019, a group of Democratic lawmakers introduced the federal Algorithmic Accountability Act, which would have required companies to audit their algorithms and address any bias issues the audits revealed before putting those algorithms into use.

AI For the People’s founder, Mutale Nkonde, was part of a team of technologists that helped draft the bill, and she said it would have created government mandates for companies to both conduct audits and follow through on them.

“Much like drug testing, there would have to be some type of agency like the Food and Drug Administration that looked at algorithms,” she said. “If we saw the disparate impact, then that algorithm wouldn’t be released to the market.”

The bill never made it to a vote.

Sen. Ron Wyden, a Democrat from Oregon, said he plans to reintroduce the bill with Sen. Cory Booker (D-NJ) and Rep. Yvette Clarke (D-NY), with updates to the 2019 version. It’s unclear if the bill would set standards for audits, but it would require that companies act on their results.

“I agree that researchers, industry, and the government need to work toward establishing recognized benchmarks for auditing AI, to ensure audits are as impactful as possible,” Wyden said in a statement. “However, the stakes are too high to wait for full academic consensus before Congress begins to take action to protect against bias tainting automated systems. It’s my view we need to work on both tracks.”

This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.