AI Alignment - Varsity Article
Note: This was a draft of an article that was supposed to be published in The Varsity, the University of Toronto student newspaper. Unfortunately, there were some delays on The Varsity's side, so it is most likely not going to be published there, and I have decided to publish it here instead. The piece is written to address a broad audience and convince them that AI alignment is an important issue.
What is AI alignment and why should I care about it?
Imagine a world in which humans are freed from labour by machine automation. Imagine a world of abundance where every person has a guaranteed basic standard of living. Freed from the need to earn a living, people could pursue their hobbies and the things they truly desire: some might become musicians, while others volunteer at their local church or food bank. Most people would spend their time having fun and building meaningful relationships with friends, family, and romantic partners.
Now, imagine a world where a company creates an AI system that's capable of self-improvement, and it quickly becomes a superintelligence, a cognitive system whose intellectual performance across all relevant domains vastly exceeds that of any human. As the AI system continues to improve itself, it might decide that humans are a threat to its existence. After all, humans might try to shut it down or limit its power. It might decide to take action to protect itself, perhaps by manipulating humans or hacking into energy and military systems and holding humans hostage.
(Note: for further short reading on curious AI thought experiments, see the paperclip maximizer and Roko's basilisk.)
These are just some of the possible scenarios that might result from artificial general intelligence (AGI) turning into a superintelligence. Current AI systems perform well in narrow domains such as playing chess, but they don't generalize to broader ones. AGI is very different in nature from the AI systems we have today: it would be able to use abstract reasoning and solve complex tasks just as humans do.
The honest truth is that the future is very hard to predict, and nobody knows what the path to AGI will look like, or whether it will happen at all. Still, many AI scientists believe that AGI will be developed within this century, if not within the next decade.
AI risk is very different from other global catastrophic risks. Climate change may kill tens of millions of people, and a nuclear war between Russia and the US might kill five billion, but AI risk has the potential to end the human species outright. Even if there is only a 1% chance of badly behaving AI systems exterminating humanity (and people like Eliezer Yudkowsky think the chance is much higher than that under current circumstances), it is worth dedicating a significant amount of resources to the problem.
AI alignment is the field of research dedicated to ensuring that AI systems act in ways that align with human values. Some of the questions an AI alignment researcher might ask: How do we represent human values to a machine? How can we predict how an advanced AI system will behave? Can we make AI systems transparent and interpretable? How can we safely shut down an advanced AI system if something goes wrong? These are all open problems, and the field is still very much in its infancy.
The wonderful thing about AI alignment is that in a small field, an individual can have a much bigger impact. It's also very accessible: you can tinker around and try to break ChatGPT with prompt engineering from the comfort of your couch. There are also organized efforts to make progress on AI alignment, for example the Center for Human-Compatible AI (CHAI) at UC Berkeley and the Future of Humanity Institute at Oxford.
I work with Max Tegmark, a professor at MIT and the author of Life 3.0, a book about the impact of AI. When I asked him for his opinion on the risks posed by AI, he said: “The imminent danger of humanity losing control of our planet is real, and it's all the more reason to exercise extreme caution when it comes to AI. Take, for instance, the way social media apps decide which content to show you. It's all determined by algorithms and not by humans. In the next few years, AI will make increasingly impactful decisions, and it's important to be aware of this trend.”
“Despite the gravity of the issue, it's disheartening that mainstream media is barely discussing the matter. Those in positions of power seem to be unconcerned about the risks posed by AI, and I can’t help but feel that the situation is eerily similar to the plot of the movie Don't Look Up.”
Max is also president and co-founder of the Future of Life Institute, which recently released an open letter calling for a six-month moratorium on the development of cutting-edge AI systems. This letter was signed by influential people such as Elon Musk (who is an external advisor to FLI), Yoshua Bengio, Steve Wozniak, Emad Mostaque, Andrew Yang, Jaan Tallinn (board member), Evan Sharp, and Tristan Harris. This gives me some hope that AI risk is becoming a mainstream issue.
The point of this article is not to make you feel a sense of doom about the future. A bad outcome is not a foregone conclusion, and the decisions we make today, as individuals and as societies, will shape how the future looks. Humanity only has one shot at creating AGI, so it's crucially important that we get it right.