Mint Primer: Will decoding AI ‘black boxes’ make them safe?

Artificial intelligence (AI) and generative AI (GenAI) are advancing at an incredible pace, but we cannot fully understand how these models arrive at their decisions, which is why they are called ‘black boxes’. Researchers say they are now able to peek under the AI hood. Will this make AI models safer?

Why are AI algorithms called ‘black boxes’?

AI algorithms, especially complex machine learning (ML) models such as deep neural networks, are loosely modelled on the human brain. They take in inputs and pass signals from layer to layer until a final output is produced. But their internal decision-making is complex, opaque and difficult to interpret, which is why they are called ‘black boxes’. For instance, if a self-driving car hits a pedestrian instead of applying the brakes, it is hard to trace the system’s decision-making to find out why it acted that way. This has significant implications for trust, transparency, accountability, bias and the ability to fix errors in such models.
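
To illustrate the ‘layer by layer’ idea, here is a minimal sketch in Python of a toy feedforward network. The layer sizes, random weights and NumPy implementation are assumptions for illustration only, not taken from any real system.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy feedforward network: 4 inputs -> 8 hidden -> 8 hidden -> 2 outputs.
# Layer sizes and random weights are purely illustrative.
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 2))]

def forward(x):
    """Pass an input vector layer by layer until the final output."""
    activation = x
    for w in weights:
        activation = np.tanh(activation @ w)  # each layer mixes all incoming signals
    return activation

x = rng.normal(size=4)    # e.g. sensor readings fed to a self-driving system
print(forward(x))         # final decision scores
# The intermediate activations exist, but no single number maps cleanly to a
# human concept like "pedestrian ahead" -- which is what makes the model a black box.
```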

How is this issue being addressed currently?

The effort involves improving model transparency, auditing model decisions and introducing regulatory measures that enforce explainability, alongside ongoing research and community collaboration in the field of explainable AI (XAI), which develops methods to make AI models more interpretable and draws on researchers, ethicists, legal experts and domain specialists. Google, Microsoft, IBM, OpenAI and credit-scoring firm Fair Isaac Corp. are developing XAI techniques, while governments in the EU, the US and elsewhere are actively promoting and regulating the ethical and transparent use of AI.
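
One widely used family of XAI techniques attributes a model’s behaviour to its inputs. The sketch below uses permutation importance from scikit-learn on a synthetic dataset; the data, model choice and parameters are assumptions for illustration, not a description of any company’s actual system.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for, say, credit-scoring data; the features are unnamed here.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle one feature at a time and measure how much accuracy drops:
# a model-agnostic way of explaining which inputs the model relies on.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```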
What’s the latest development?

Last October, Anthropic, an AI startup, said it had succeeded in breaking a neural network into parts that humans can understand, by applying a technique called ‘dictionary learning’ to a very small “toy” language model and decomposing groups of neurons into interpretable features. This May, it scaled the technique to a much larger production model and showed that these features can be used to influence the model’s outputs and behaviour.
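
As a rough illustration of what dictionary learning does, the sketch below decomposes a matrix of made-up neuron activations into sparse feature directions using scikit-learn. Anthropic’s actual work trains sparse autoencoders on real model activations at far larger scale, so the data, sizes and parameters here are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Pretend these are recorded activations of 32 neurons over 200 text snippets.
# Real interpretability work records activations from an actual language model.
activations = rng.normal(size=(200, 32))

# Dictionary learning rewrites each activation vector as a sparse combination
# of learned directions -- the candidate interpretable "features".
dl = DictionaryLearning(n_components=64, alpha=1.0, max_iter=50, random_state=0)
codes = dl.fit_transform(activations)   # sparse feature coefficients per snippet
features = dl.components_               # one learned direction per feature

print(codes.shape, features.shape)      # (200, 64) and (64, 32)
# Researchers then look at which snippets activate each feature most strongly to
# label it (e.g. 'legal language'), and can dial a feature up or down inside the
# model to nudge its outputs.
```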

Will this make AI safer and less scary?

Today, patients and doctors would not know how an AI algorithm arrived at its reading of an X-ray. Anthropic’s breakthrough will make such processes more transparent. But the features Anthropic has identified are only a small subset of the concepts the model has learned, and finding a full set with current techniques would require more computing power and money than was used to train the model in the first place. Besides, understanding the model’s representations does not tell us how the model uses them.
What more can big tech companies do?

As OpenAI, Microsoft, Meta, Amazon, Apple, Anthropic, Nvidia and others develop smarter language models, they must also strengthen the teams that align AI models with human values. However, some companies have scaled back their ‘Ethical AI’ teams over the past 2-3 years. For example, members of OpenAI’s “superalignment” team, including co-founder and chief scientist Ilya Sutskever, quit over differences with CEO Sam Altman. Microsoft, though, increased its Responsible AI team from 350 to 400 people last year.
