Important pre-read notes & definitions:
Some of this post is anecdotal, drawn from my experience working in this industry. Further research is needed to gather concrete quantitative evidence.
Some examples have been left intentionally vague due to NDAs or risks of litigation. I'm happy to answer queries with more detail in my inbox.
In a rush? See the tl;dr bullet lists “Key Points” and “Key Takeaways” at the end of the post.
“AI Safety” is used in this post with its public, not EA-specific, definition.
“High-Risk AI” is defined here as an AI system developed or used in a way which is capable of causing harm to large numbers of people or property if used incorrectly. Examples include surveillance, justice, military, space, or healthcare. Though x-risk obviously applies here, it is not the focus of this post.
Introduction
In industries where mistakes can cost lives or cause large-scale damage to people or the environment (think nuclear power or air travel), all parties involved must not only obey very strict rules but also collect large amounts of evidence about exactly how they obeyed them. This occurs at every stage of a process. These rules can come from laws, regulations, standards, and various other sources of varying strictness. The end result is much safer, more rigorously vetted systems and clusters of systems. This is called Compliance Monitoring (or just Compliance). If Law is the ‘what’, then Compliance is the ‘how’.
It also enables companies to demonstrate rule-following (compliance) if asked, for example during an audit or a court case, where detailed records are invaluable.
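To make “collecting evidence” concrete, here is a minimal, purely illustrative sketch of what a machine-readable compliance evidence record for an AI system might look like. The field names are my own assumptions rather than any formal standard, and the example values are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Purely illustrative: these field names are assumptions, not a formal standard.
@dataclass
class ComplianceEvidence:
    rule_id: str        # the clause of the law, regulation, or standard being satisfied
    activity: str       # what was actually done to satisfy it
    evidence_uri: str   # where the supporting record lives (report, log, sign-off)
    responsible: str    # named person or team accountable for the activity
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Hypothetical example: recording that a pre-deployment bias evaluation was performed.
record = ComplianceEvidence(
    rule_id="UK GDPR Art. 35 (DPIA)",
    activity="Pre-deployment bias evaluation of face-matching model v2.3",
    evidence_uri="reports/2024-03-bias-eval.pdf",
    responsible="Model Safety Team",
)
print(record)
```

In practice such records would be generated at every stage - procurement, training, testing, deployment - and retained so they can be produced on demand.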
Despite this, the policy landscape in AI is such that compliance monitoring is still haphazard and rarely undertaken seriously by organisations. Some organisations operate high-risk models but, not being in highly regulated industries, often lack both the requirement and the staffing to create proper processes. Others sell high-risk AI into other industries without much regulatory oversight.
Compliance monitoring is also not an area subject to much research, within EA or outside it. This post aims to explore the topic in a simple, jargon-free manner and to suggest useful further research.
Compliance Monitoring in Highly Regulated Industries
Different industries require different levels of compliance monitoring, and even within an industry the level expected can differ. Take the food industry as a simple example.
A local cafe will be expected to collect and keep fewer pieces of evidence than a large frozen food factory. The level is not set by size; it scales with the level of risk and the level of oversight. The cafe will keep records of accidents, deliveries, expiry dates, and hygiene inspections. The factory, however, will have more equipment, more employees, more frequent shipments, and more output, and its food ultimately reaches far more customers (and more remotely) than the cafe's, so its compliance recordkeeping will most likely be more burdensome. Its customers will also primarily be other businesses, so the factory is far more likely than the cafe to have its compliance evidence requested by those it sells to. The factory is, however, also more likely to have compliance support from specialised staff.
In highly regulated industries, for example the nuclear sector, compliance monitoring is nothing short of a colossal undertaking, often with entire teams of specialists working full-time on making sure everything is safe and legal. Inspections can come at any time, and the consequences of failing one are severe.
There are dozens of licences, inspections, reports, and more required by law. Sites such as nuclear power plants also require vast amounts of supplies and resources, many of which are controlled by their own regulatory regimes, so there is a great deal to keep track of. Even when buying in equipment, the external supplier must provide its own compliance evidence as part of the supply chain, which is then checked by the plant’s compliance team. On major projects, such as building space shuttles, debates between compliance teams can span years before a project even starts. Fortunately, the nuclear industry attracts many highly specialised people who are experts in such compliance, and it can afford them.
This is an important note, because the same is not the case when we consider AI.
Compliance Monitoring in AI
I have found that compliance monitoring in AI varies enormously in quality. I have spent a lot of time with organisations making high-risk use of AI, and part of that involves meeting and talking with those in charge of safety. The role may differ from the EA definition of AI Safety, but it is generally the person responsible for making sure the AI system is legal and safe to own and operate.
There is a vast range in how seriously these organisations take these duties, even within public institutions such as healthcare or policing. Some are really on the ball; some are the legal compliance equivalent of a child's lemonade stand. Most are somewhere in between.
Amongst those who struggle the most are commercial entities, particularly small or niche ones. Often their safety or regulatory personnel lack the powers or experience to undertake the role effectively. Frequently there is no-one in charge of this at all.
A major issue is that many (but not all) of the people in these roles have a great deal of experience in AI but little, if any, in a safety regulation role. It is not something you can learn as you go if no-one is there to teach you and the infrastructure does not already exist. Rapidly scaling AI start-ups are particularly prone to this.
Some AI labs do internal research only and release no products. These are the hardest to audit: unlike organisations selling software to outside customers, who are required to complete at least some compliance tasks, they are not required to disclose much about safety until something goes wrong. The less an organisation interacts with the outside world, the less compliance infrastructure it is likely to have.
There are exceptions to this. Sometimes big companies can go a long time without this becoming an issue - until it does. One high-risk AI system used by over 150 cities in the US was found, somewhat accidentally during a court case, to have had its accuracy rating produced by its marketing department rather than its engineers. As a result, its highly touted accuracy figure had no scientific basis at all. This had not been detected by any of those 150 cities or the hundreds of police forces using the system; no-one had checked that it was as accurate as claimed.
Presumably, not one of those end customers had ever undertaken quality or compliance monitoring on how this AI was developed, trained, or deployed before they bought it and integrated it into their own systems.
This is why the safest AI systems tend to be found among those selling such systems to regulated bodies. The AI systems used in, to return to our earlier example, nuclear power are extensively vetted by both the makers and the buyers from start to finish. I will caveat this by saying they are by no means perfect.
Why this is important
Compliance infrastructure is important for three reasons:
- A lack of compliance infrastructure makes it very hard for organisations to understand their risk areas (and therefore increases risk)
- A lack of compliance infrastructure makes it very hard for independent experts or regulators to check whether an AI system is being developed safely, and makes it harder for injured parties to take legal action
- A lack of compliance infrastructure makes it very easy for organisations to ignore safety whilst developing or using AI
Whilst some of us consider AI Safety to be focused on AGI/ASI, and others consider it to cover all AI systems on a spectrum, both camps would find such compliance useful for reducing risk.
Challenges & Mitigations
Resources
When implementing compliance procedures with organisations using high-risk AI, the number one difficulty is often resourcing. This includes human resourcing, but that is considered separately in the next section. Organisations have limited time and money, and implementing compliance monitoring can require a significant up-front investment of both.
Initial implementation can often require 300-500 hours of work, which is a significant time and money sink, especially if the process must be implemented quickly. This is worse for organisations that run on a set, externally allocated budget, such as government departments or ringfenced charities.
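As a rough back-of-the-envelope sketch of what those hours could mean in money terms (the hourly rates below are purely illustrative assumptions; only the 300-500 hour range comes from my experience):

```python
# Back-of-the-envelope cost of initial compliance set-up.
# The 300-500 hour range is from the text above; the hourly rates are illustrative assumptions.
hours_low, hours_high = 300, 500
rate_low, rate_high = 80, 150   # assumed GBP/hour for specialist compliance work

cost_low = hours_low * rate_low     # 24,000
cost_high = hours_high * rate_high  # 75,000

print(f"Estimated initial outlay: £{cost_low:,} - £{cost_high:,}")
```

Even the low end of that range is a serious outlay for a small team on a fixed budget.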
This funding is a tremendous barrier to getting started. Fortunately, some initiatives have taken place to assist with this. Some police forces in the UK were given grants to test and audit their systems, aided by external agencies such as the National Physical Laboratory and NIST, and were able to produce some useful reports.
Though the reports were far from perfect in my opinion, compliance monitoring was rapidly accelerated. This external expertise enabled the organisations to see where they were falling short and how to fix it.
A fund to assist with setting up compliance monitoring on a par with highly regulated industries would enable more AI manufacturers, retailers, and researchers to implement such measures, though it is uncertain who would fund this. One option is to draw on collected insurance premiums, which is how the Motor Insurers' Bureau protects those injured by uninsured drivers.
Expertise
Adequate expertise is incredibly hard to find, and even harder to keep. Many roles in complex areas of compliance require either a great deal of experience (making candidates expensive) or a specific mix of STEM and legal background (making them desirable, and therefore expensive). Training employees from scratch takes a long time due to the complexity of the role.
Very large organisations can typically offer high salaries and good perks; smaller organisations struggle to compete. This particularly affects healthcare, local councils, policing, and the military (but not military procurement).
Some organisations using high-risk AI that I have worked with have hired an AI specialist on a six-month temporary contract to set things up and get them running, and found this cost-effective. However, it often meant that once the consultant departed, the non-specialist staff who remained found it difficult to notice when the system was acting strangely, and even when they did notice they lacked the expertise to understand why, let alone correct the problem.
Even in AI labs full of technical experts, a lack of experience in policy and regulation implementation is a major failure point. This applies even when hiring those from a law background. There is a common joke in the legal field that goes along the lines of:
Law School: This is how it all works in theory
Student: What about in practice?
Law School: In what?
This is because there is often a significant difference between legal theory and the complex socio-legal realities of implementation. It’s a lot like the difference between AI in lab settings and AI in the real world.
As such, there is a considerable talent bottleneck here.
Safety-Washing
I recently read this post, which briefly mentioned safety-washing, and I think it’s worth touching on here.
Organisations can (and do) hire ‘safety’ or ‘compliance’ professionals whose roles are in practice largely ceremonial, usually in an effort to appear focused on safety. Unless you are inside those organisations, it is very difficult to tell whether safety professionals have the authority and scope to implement and audit compliance monitoring, or whether they are merely decorative. There is no quick fix for this, but good mitigations include membership of regulatory bodies, high levels of experience, and the presence of stringent external legal safeguards. The less well regulated an industry is, the higher the risk of scapegoating.
The Benefits
The benefit of a comprehensive, legislation-focused compliance monitoring infrastructure is essentially the reduction of risk. This includes not just the risk of harm from the AI system itself, but also the risk to the company, which can demonstrate the effort it has made towards compliance. It also improves marketability: products with good safety certifications will soon be preferred (and likely eventually legally required) now that AI risk is more prominent in the public and policy consciousness.
Consider Bridges v South Wales Police, where the court found in favour of Bridges on some elements not because the AI system was shown to be biased, but because an adequate Data Protection Impact Assessment (DPIA) had not been carried out. Put simply, SWP hadn't made sure the system wasn't biased. A DPIA is a foundation-level document in almost any compliance procedure.
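To give a sense of why, below is a minimal, illustrative sketch of the kinds of questions a DPIA-style checklist covers. The wording is my own paraphrase of ICO-style guidance, not an official template:

```python
# Illustrative DPIA-style checklist (a paraphrase of ICO-style guidance, not an official template).
dpia_checklist = [
    "Describe the nature, scope, context, and purpose of the processing",
    "Assess the necessity and proportionality of the processing",
    "Identify and assess risks to the rights of affected individuals",
    "Identify measures to mitigate those risks",
    "Record consultation with stakeholders and the data protection officer",
    "Obtain sign-off and set a review date",
]

# Track which items have documented evidence behind them.
completed = {item: False for item in dpia_checklist}

def outstanding(status: dict) -> list[str]:
    """Return checklist items that do not yet have evidence recorded."""
    return [item for item, done in status.items() if not done]

print(outstanding(completed))
```

The value is less in the code than in the discipline: every item needs documented evidence behind it, which is exactly what a regulator - or a court, as in Bridges - will ask to see.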
Summary
Compliance Monitoring is amongst the most powerful policy / regulation components in terms of impact, yet is largely neglected by AI Safety research. There is a clear and demonstrated need for it, but it is often passed over due to the high initial cost and a bottleneck of expertise.
We (as in the wider AI Safety field) can help mitigate this by:
- Publishing best practice use cases
- Encouraging compliance training in AI Policy / Governance early career researchers (ideally placements, but this would be difficult to organise in sufficient numbers)
- Writing longform, detailed guidance on AI-centric compliance monitoring (I have considered doing this but, ironically, it’s too costly for me right now)
- Having the above trialled with a complex, high-risk AI system (this would make a fantastic research project - even if just via stakeholder interviews)
- Encouraging AI Governance orgs to research and publish on frontline compliance more often
- Promoting more interdisciplinary socio-legal research in AI
- Launching an AI Safety journal aimed at in-industry safety professionals
Useful future research projects (especially for law or AI governance students):
- Undertaking quantitative research on the state of compliance in AI
- Undertaking qualitative research with AI stakeholders to assess compliance pain points
- Proposing potential compliance changes from upcoming AI law and policy
- Publication of a collection of best practice use cases
tl;dr
Key Points
- Compliance monitoring is amongst the most powerful policy / regulation components in terms of risk impact, yet is largely neglected by AI Safety research
- This is largely because it is a highly complex, specialist regulatory task
- Much effective compliance monitoring also takes place in complex industries that are not prone to sharing internal documents, which makes it hard for even enthusiastic AI orgs to imitate
- More public, less formalised institutions have sub-par compliance methods and monitoring
- Changes to compliance monitoring would be resource-intensive to undertake, but they are very achievable and offer desirable benefits for organisations
- The main bottlenecks to such a project are the scarcity of highly specialised skill sets, the lack of external regulatory pressure, and the sheer number of person-hours a first effort would require
- Once these elements are resolved, and given a generous timescale and good practice to work from, most AI producers (and, perhaps more importantly, purchasers) would be able to adapt such monitoring to their own circumstances
Key Takeaways
- More research is urgently needed in compliance methodologies
- This research should be undertaken by industry specialists in conjunction with academia due to the high levels of role experience required
- Research in this area should be focused on frontline impact and making that impact more reportable, so that lessons can be learned
- More research is also needed on international, cross-industrial, and intergovernmental collaborative projects and how these can inform future compliance (see here for some background)
This is an interesting anecdote.
It reminds me of how US medical companies having to go through the FDA's premarket approval process for software designed for prespecified uses keeps them from launching medical software willy-nilly on the market. If they release before FDA approval, they are quite likely to face regulatory action (and/or be held liable in court).
That's a good regulatory mechanism, and it isn't unlike many that exist UK-side for uses intended for security or nuclear applications. Surprisingly, there isn't a similar requirement for policing, although the above-mentioned case has drastically improved the willingness of forces to have such systems adequately (and sometimes publicly) vetted. It has certainly increased the seriousness with which AI safety is considered in a few industries.
I'd really like to see a system similar to the one you just mentioned for AI systems over a certain threshold, or for sale into certain industries. A licensing process would be useful, though it obviously faces challenges because AI can and does change over time. This is one of the big weaknesses of a NIST certification, and one I am careful to raise with those seeking regulatory input.
Another problem with the NIST approach is an overemphasis on solving for identified risks, rather than on the precautionary principle (just don’t use scaled tech that could destabilise society at scale) or on preventing, and ensuring legal liability for, designs that cause situationalised harms.