Regulating for Bias in AI
You have been working on developing standards for algorithmic bias considerations with the Institute of Electrical and Electronics Engineers (IEEE). What are the main issues you have to consider when developing these standards for algorithmic bias?
Ansgar: There aren’t any existing standards in the whole area of automated decision making, or AI as it is frequently referred to. So this is a space where both industry and governments don’t quite know yet what rules need to be set up or what the best practices are.
One of the big areas of concern is that AI-driven services are non-transparent. You don’t know what is happening inside, or what criteria the decisions are actually being made on. This can have an impact on human rights, or on other established rights. If we don’t know how a decision is being made, how do we know that the decision is following existing rules, regulations and rights?
So a big area of concern has been around bias and discrimination. A number of investigations have shown that systems already in use are actually violating existing anti-discrimination laws. For instance, systems using machine learning simply take in big data sets, find correlations and make decisions based on those correlations. But because the correlations aren’t made visible, and the decision rules aren’t explicitly communicated, algorithms can end up making decisions based on protected criteria that shouldn’t be allowed for decision making.
We do have laws which say you can’t make decisions based on gender differences, and you’re not allowed to discriminate based on racial differences. But algorithms simply find correlations between other attributes that end up acting as proxies, so the decisions are still effectively based on gender or race. And if the system isn’t transparent, we can’t see that this is happening. This is a big concern.
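As a rough illustration of that proxy effect, here is a minimal sketch using synthetic data and hypothetical feature names: the protected attribute is never given to the model, yet a correlated feature lets the learned decision rule reproduce it.

```python
# Sketch only: synthetic data, hypothetical feature names. Even though the
# protected attribute is never a model input, a correlated proxy feature
# (here "proxy") lets the learned decision rule track it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
protected = rng.integers(0, 2, n)              # protected group label (never a feature)
proxy = protected + rng.normal(0, 0.3, n)      # feature strongly tied to the group
income = rng.normal(50, 10, n)                 # legitimate feature
# Historical outcomes were themselves biased against group 1:
approved = ((income > 48) & (protected == 0)).astype(int)

X = np.column_stack([proxy, income])           # note: 'protected' is NOT included
model = LogisticRegression().fit(X, approved)
decisions = model.predict(X)

# The decisions still differ sharply by protected group, via the proxy:
print("approval rate, group 0:", decisions[protected == 0].mean())
print("approval rate, group 1:", decisions[protected == 1].mean())
print("correlation(decision, protected):", np.corrcoef(decisions, protected)[0, 1])
```

In a sketch like this the approval rates for the two groups diverge even though the protected attribute was excluded from training, which is exactly the kind of hidden correlation that non-transparent systems can conceal.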
How do you make the system transparent? What needs to be done?
Ansgar: One of the things we’re focusing on with the IEEE standard on algorithmic bias considerations is actually the process by which the system is built. So it’s really about ensuring that the people involved in creating or deploying these systems are aware of the issues, and that they build the systems with a focus on understanding what is actually going on.
To begin with, they need to evaluate the data sets that are being used, to make sure they are not violating anti-discrimination rules. When you’re developing these systems, you also need to ensure that you are involving the whole range of stakeholders in the development process, so that you’re not just building a system for yourself or for the small minority in control of these systems, but taking into account the communities that are affected by these technologies.
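As one concrete illustration of what evaluating a data set can mean in practice, here is a minimal sketch (hypothetical column names, not part of the IEEE standard itself) that compares historical outcome rates across groups before the data is used for training:

```python
# Minimal sketch (hypothetical column names): compare positive-outcome rates
# across groups in the historical data before training on it. A ratio far
# below 1.0 flags a disparity that needs to be justified or corrected.
import pandas as pd

def outcome_rates_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Share of positive outcomes per group in the historical data."""
    return df.groupby(group_col)[outcome_col].mean()

# Example usage with a hypothetical loans data set:
df = pd.DataFrame({
    "gender":   ["f", "f", "m", "m", "m", "f", "m", "f"],
    "approved": [0,   1,   1,   1,   0,   0,   1,   1],
})
rates = outcome_rates_by_group(df, "gender", "approved")
print(rates)
print("disparity ratio:", rates.min() / rates.max())
```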
There need to be processes in place to identify what the actual decision-making criteria are. So even if you use machine learning, which extracts rules without you explicitly imposing them, you can run tests.
You can manipulate your input data in controlled ways: when you have identical inputs that differ only in the race of the person, does the system still generate a different output? Is the system, in effect, discriminating based on race?
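A minimal sketch of that kind of controlled-input test, assuming a hypothetical model with a predict() interface and hypothetical field names:

```python
# Sketch only: hold every input fixed, change only the protected attribute,
# and check whether the model's output changes.
def counterfactual_flip_test(model, records, protected_key="race", values=("A", "B")):
    """Return the records whose prediction changes when only the protected
    attribute is swapped; a non-empty result signals possible discrimination."""
    flagged = []
    for record in records:
        outputs = []
        for value in values:
            variant = dict(record, **{protected_key: value})
            outputs.append(model.predict(variant))   # assumes a predict(dict) interface
        if len(set(outputs)) > 1:
            flagged.append(record)
    return flagged
```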
What we’re focusing on in the standard is to say: you need to provide clear justifications for each step of the process. How did you check your data sets? How did you collect your data sets? How did you arrive at the decision criteria? What are your justifications for why these criteria are appropriate for this situation? And if you are using or repurposing an existing system and deploying it in your space, how did you make sure that it works for your context? Because your input data and stakeholders are different.
It’s really about making sure that clear justifications are made, documented, and presented in a transparent manner for all stakeholders, especially government regulators, NGOs working on these issues, and the public in general. Another issue of concern around regulating these new technologies is the way the tech industry responds. They are saying that the end user needs to take responsibility for assessing whether these systems, this technology, are okay for them or not.
But this is simply unrealistic. You cannot ask every end user to evaluate whether each system, each platform is going to use their data in the right way and make decisions that may or may not be good for them. As an individual, I cannot really understand how a particular piece of data, say, allowing an app to follow the way I use public transport, could be aggregated with other pieces of my data, and with data from other people, to extract something that could potentially have negative consequences for me. It is not realistic to put the onus on the individual.
Therefore, this kind of information needs to be accessible to professionals – human rights groups, investigative journalists and so on. In fact, journalism has been one of the main sources of identifying issues of bias and unfair treatment. This has allowed activists to look into these issues in detail. It’s important that government regulators get access too.
In the UK, for instance, with the Cambridge Analytica case, the ICO, the UK’s data protection agency, didn’t have the powers to access the databases they needed. They couldn’t demand access to the Cambridge Analytica or Facebook databases to check whether the law had been broken. It is critical that government regulators are able to check whether citizens are being protected.
How are governments and regulators dealing with these challenges?
Ansgar: In both the EU and the US, there are concerns that the unregulated development and use of AI technology may result in systems that violate the rights of citizens. In the EU this led to the establishment of the High-Level Expert Group on AI, which produced a set of Ethics Guidelines for Trustworthy Artificial Intelligence.
In the US this has resulted in the proposal for an Algorithmic Accountability Act that was introduced in the Senate with sponsors from both parties. While both initiatives were motivated by similar concerns, their processes differ.
The EU report is not meant to be legally binding, but rather is designed to be an early step in a longer regulatory process towards developing a set of guidelines. These might, if deemed necessary, be followed later by an EU directive or regulation to provide legal enforcement. As such, it outlines core ethical principles with a set of specific recommended actions. The guidelines were co-developed with strong input from industry, and have, in fact, been criticized for having had too much industry involvement and too little involvement from civil society. At this stage of the process, the High Level Expert Group is looking for companies and organizations to pilot-test the recommendations in the development of their AI systems.
By contrast, the US initiative seeks to move directly towards setting legal requirements around algorithm and data impact assessments. While a legislative act would have more direct impact on industry than a set of Ethics Guidelines from an EU High-Level Expert Group, the general expectation is that the US Algorithmic Accountability proposal is unlikely to move through the legislative system and actually become an Act. Hence, it is unlikely to be enforceable.
Can There Really be an ‘AI for Good’?
There is a lot of hype around ‘AI for Good’ and ‘AI for All’. What are your thoughts on that? Is inclusive AI really possible? If it is, what needs to be done?
Ansgar: The underlying intent behind the AI for Good movement is obviously good. But I would say we have concerns about whether its applications are actually effective, and also about whether the way it is being implemented is okay.
For instance, the UN’s approach towards AI for Good is linked to the Sustainable Development Goals. When you look at how this is actually being done, a lot of it tends to basically be data collection, which is then accessed by big international corporations who promise to provide important services. But of course, as part of providing the service they’re also growing their markets, especially in developing countries. And they’re also using this data in more ways than one, to optimize their profits. We have concerns about the safeguards built into these systems.
It’s not necessarily the case that there is always bad intent. There have been discussions around using mapping, similar to Google Maps, and video recording of streets and so on, to provide data sets that could be of use after natural disasters: to see what the issues were, assess the extent of the damage, and identify what needs to be fixed. This is good. But the problem is that once this data is collected, it will not stay in that silo and only be used for this particular purpose. The same data set can be used in different ways, many of which invade our privacy. So the question is really, how do we make sure that a data set collected for one purpose doesn’t get repurposed for something else which ends up only benefiting the big tech corporations?
What is the way to ensure that data sets can be available when necessary but actually remain in the hands of the local community whose data it is?
Ansgar: There are discussions about setting up data commons, but I don’t have enough insight into how these are going to be controlled. So far I haven’t seen a clear framework. Within Europe there are some movements, like ‘MyData’, which is Scandinavian. It looks at ways of holding data wherein the individual gets control over who accesses the data and what it is used for. It’s really about not transferring the data into a central data set but instead having it at an accessible point where you can control when and for what the data is being used.
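As a rough sketch of that kind of arrangement, assuming a hypothetical personal data store rather than any specific MyData implementation, each request for the data could be checked against the purposes and parties the individual has consented to:

```python
# Sketch only (hypothetical structure): the data stays with the individual,
# and each request is checked against explicitly granted (requester, purpose)
# consents before anything is released.
from dataclasses import dataclass, field

@dataclass
class PersonalDataStore:
    data: dict
    consents: set = field(default_factory=set)   # {(requester, purpose), ...}

    def grant(self, requester: str, purpose: str) -> None:
        self.consents.add((requester, purpose))

    def request(self, requester: str, purpose: str):
        """Release data only for a consented (requester, purpose) pair."""
        if (requester, purpose) not in self.consents:
            raise PermissionError(f"No consent for {requester} / {purpose}")
        return self.data

# Example: transport data usable for damage assessment, but not for profiling.
store = PersonalDataStore(data={"transport_trips": []})
store.grant("disaster-response-ngo", "damage-assessment")
store.request("disaster-response-ngo", "damage-assessment")   # allowed
# store.request("ad-network", "profiling")                    # would raise PermissionError
```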
As to what extent it is possible to scale this to the global level, I don’t really know. What we need to explore is an alternative to aggregated central databases. Just because this is how things have been done in the past doesn’t mean this is the only way things need to be done. There are different ways of controlling and using data without centralized aggregation, especially aggregation held by big data monopolies.