ITSPmagazine Podcasts

Superalignment - Turtles All the Way Down | Cyber Cognition Podcast with Hutch

Episode Summary

In this episode, we discuss the problem of aligning artificial superintelligence and the solution recently proposed by OpenAI.

Episode Notes

Host: Hutch

On ITSPmagazine  👉 https://www.itspmagazine.com/itspmagazine-podcast-radio-hosts/hutch

______________________

Episode Sponsors

Are you interested in sponsoring an ITSPmagazine Channel?

👉 https://www.itspmagazine.com/sponsor-the-itspmagazine-podcast-network

______________________

Episode Introduction

In this episode, we discuss the problem of aligning artificial superintelligence and the solution recently proposed by OpenAI.

We will begin by discussing the fundamental concepts of artificial superintelligence and the alignment problem. We will then look at OpenAI's recently proposed solution, the problems associated with it, and the broader value of having this conversation.

References

https://openai.com/blog/introducing-superalignment

https://www.techrxiv.org/articles/preprint/Administration_of_the_text-based_portions_of_a_general_IQ_test_to_five_different_large_language_models/22645561/1

https://www.vice.com/en/article/epvgem/the-new-gpt-4-ai-gets-top-marks-in-law-medical-exams-openai-claims

______________________

For more podcast stories from Cyber Cognition Podcast with Hutch, visit: https://www.itspmagazine.com/cyber-cognition-podcast

Watch the video podcast version on-demand on YouTube: https://www.youtube.com/playlist?list=PLnYu0psdcllS12r9wDntQNB-ykHQ1UC9U

Episode Transcription

Cyber Cognition

Episode 4: Superalignment - Turtles All the Way Down

 

Hello everybody and welcome to the fourth episode of the Cyber Cognition podcast. As always, I am your host, Justin Hutchens (AKA Hutch). 

 

And today, we are going to be talking about OpenAI’s recent announcements related to their future efforts on superalignment. Earlier this month, OpenAI released a blog post entitled “Introducing Superalignment” which, at a very high level, acknowledges that superintelligence (that is, a digital intelligence which far exceeds the capabilities of human intelligence) will likely be achieved within this decade. The post also outlined OpenAI’s strategy for mitigating some of the potential risks of superintelligence, which they describe as ranging from human disempowerment all the way up to the possibility of human extinction. Keep in mind, this is not coming from some fringe, unreliable source. It is coming from THE industry leader, the very company that created ChatGPT and GPT-4. For reference, I’ve included the link to this blog post in the show notes.

 

If people want to believe that I am wearing a tin foil hat and constantly spouting conspiracy theories, then that’s fine. But you probably want to start paying attention when THE industry leader that has revolutionized the field of AI publicly announces its contingency planning for the impending risks of artificial superintelligence. In reviewing this publication, I arrived at two key conclusions: one negative and one positive.

 

The first, and negative, conclusion is that the solution proposed by OpenAI is (in my opinion) inherently problematic, and I don’t think it can adequately solve the problem (more to come on that). But the second, and more positive, conclusion is that despite the problems with the proposed solution, the mere publication of this article will have a profound positive influence on our society’s ability to proactively tackle these risks.

 

We will dive a little deeper into both of these conclusions, but before we do, let’s first define a few key terms. Specifically, let’s discuss what artificial superintelligence is, and what the alignment problem is.

 

So first, what is artificial superintelligence? Artificial superintelligence refers to a hypothetical form of artificial intelligence that possesses capabilities significantly surpassing the abilities of even the smartest and most gifted human minds. In the article published by OpenAI, they address the topic of artificial superintelligence as follows:

 

[quote]

Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction. While superintelligence seems far off now, we believe it could arrive this decade.

 

[end quote]

 

To me, the conclusion that artificial superintelligence will arrive this decade does not seem unreasonable. Since the release of GPT-4, countless researchers have published blogs demonstrating how it is capable of easily passing many of the most challenging academic benchmarks out there, including the LSAT, MCAT, SAT, GRE, and many others.

 

A researcher from Vanderbilt University recently adapted two industry-leading IQ tests and applied them to several different large language models to understand how their current levels of intelligence compare to human intelligence. In these tests, GPT-4 already ranks in the 99th percentile of human performance. If we assume any further improvement, LLMs will likely surpass human intelligence in the very next iteration. I’ve included links to the GPT-4 academic benchmarks in the show notes. But suffice it to say, we have already arrived at artificial intelligence that is at genius level on the human intelligence scale and on par with the most capable minds in human history.

 

And it is inevitable that we will continue to see further improvement in future iterations. What many people don’t understand is that there haven’t been any great technological breakthroughs behind the advancements from GPT-2 to GPT-3, or from GPT-3 to GPT-4. The training process has largely remained the same from one iteration to the next. The main difference is that with each iteration, we have expanded the depth of the neural network and increased the number of parameters (that is, the number of neural nodes and the interconnections between them). So whether the next iteration is more powerful than GPT-4 is largely contingent upon whether they continue to add more computational resources and continue to make the model bigger. And the answer to that question is that they absolutely and invariably will. The amount of funding now going into AI, the amount of interest in AI, and even the basic principles of competitive capitalism all but ensure that we, as a society, will continue to build bigger and more powerful systems. Even if OpenAI or Google does not, somebody else will.

There is zero question that for all generations after our own, human intelligence will no longer be the dominant and superior intelligence on this planet. Let that sink in for a minute. For all of documented history, humanity has been the supreme intelligence on this planet. And now, within the next few years, that will no longer be the case. It is hard to imagine a more pivotal point in the history of humanity than this moment right now. Everything is about to change, and we need to get ready.
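To make that scaling point concrete, here is a rough back-of-envelope sketch (my own illustration, not something from the episode or from OpenAI). It uses a common rule-of-thumb approximation for decoder-only transformers, roughly 12 × layers × width² parameters, together with the layer counts and model widths published in the GPT-2 and GPT-3 papers:

```python
# Rough parameter-count estimate for a decoder-only transformer, using the
# common approximation: params ~= 12 * n_layers * d_model^2 (this ignores
# embedding matrices and biases, but captures the scaling behavior).

def approx_params(n_layers: int, d_model: int) -> float:
    return 12 * n_layers * d_model ** 2

# Layer counts and widths as published for GPT-2 (1.5B) and GPT-3 (175B).
configs = {
    "GPT-2": (48, 1600),
    "GPT-3": (96, 12288),
}

for name, (layers, width) in configs.items():
    print(f"{name}: ~{approx_params(layers, width) / 1e9:.1f}B parameters")

# Prints roughly 1.5B for GPT-2 and 174B for GPT-3: the hundredfold jump
# comes almost entirely from making the same architecture deeper and wider.
```

The point of the sketch is simply that the leap from one GPT generation to the next has come largely from scale, not from a fundamentally different training method.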

 

So now that we have defined what artificial superintelligence is, we should also discuss what the alignment problem is. The alignment problem in artificial intelligence is the challenge of ensuring that AI systems’ objectives and behavior align with human values and intentions, in order to prevent undesired or potentially harmful outcomes. OpenAI’s approach to alignment with its current models is a process called Reinforcement Learning from Human Feedback, or “RLHF”. OpenAI augments the foundational language model with a reinforcement learning process that seeks to optimize human satisfaction with its output. To achieve this, the language model is provided a single input and then generates multiple different outputs. A human supervisor then ranks those outputs based on their alignment with human expectations, and those rankings are used to steer the model toward the preferred responses (a minimal sketch of this ranking step follows the quote below). The problem is that this method of alignment will not work with superintelligence, because humans will no longer be intelligent enough to adequately understand the complex logical connections that the system is making, and will therefore be unable to sufficiently guide its responses. In the “Introducing Superalignment” blog, OpenAI writes:

 

[quote]

How do we ensure AI systems much smarter than humans follow human intent? Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.

 

[end quote]
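As a reference point for that RLHF description, here is a minimal, self-contained sketch (my own illustration, not OpenAI’s actual implementation): one prompt, several candidate completions, a hypothetical human ranking, and the pairwise preference loss that a reward model would typically be trained on so it learns to score preferred completions above rejected ones. The reward_model function here is a toy stand-in for what would really be a learned neural network.

```python
import math
from itertools import combinations

def reward_model(prompt: str, completion: str) -> float:
    """Toy stand-in scorer; in practice this is a learned neural network."""
    return 0.1 * len(completion)  # illustrative heuristic only

prompt = "Explain the alignment problem in one sentence."

# Candidate completions, ordered best-to-worst by a hypothetical human labeler.
ranked = [
    "Ensuring an AI system's goals and behavior match human values and intent.",
    "Making sure AI does what we want.",
    "Alignment is when the AI is good.",
]

# A ranked list expands into pairwise preferences: (preferred, rejected).
pairs = list(combinations(ranked, 2))

# Pairwise logistic (Bradley-Terry style) loss: small when the reward model
# scores the preferred completion above the rejected one. Training a real
# reward model would minimize this over many human-ranked examples.
total_loss = 0.0
for preferred, rejected in pairs:
    margin = reward_model(prompt, preferred) - reward_model(prompt, rejected)
    total_loss += -math.log(1.0 / (1.0 + math.exp(-margin)))

print(f"{len(pairs)} preference pairs, mean loss = {total_loss / len(pairs):.3f}")
```

The key limitation, as discussed above, is the human in that loop: the whole scheme assumes the labeler can reliably judge which output is better.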

 

To address the problem of superalignment, OpenAI proposed the following:

 

[quote]

 

Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.

 

[end quote]

 

For me, a couple of immediate problems come to mind with this solution. First, if you build a “roughly human-level automated alignment researcher”, then it stands to reason that this alignment AI would, like humans, also lack the cognitive complexity to sufficiently understand and align the output of the superintelligence. If the superintelligence is orders of magnitude more intelligent than humans, then it would also be orders of magnitude more intelligent than a [quote] “roughly human-level automated alignment researcher”.

 

OpenAI does then state that they could use vast amounts of compute to scale their efforts. This could be interpreted in one of two ways. They could scale their efforts horizontally, as in having a whole fleet of approximately human-level AI aligners. Or they could scale their efforts vertically, as in increasing the intelligence and sophistication of the AI aligner by increasing its own scale, in the same way that the GPT models have been continuously scaled. The first interpretation (horizontal scaling) still does not seem to solve the problem. It doesn’t matter how many human-level AI aligners you make; if not a single one of them is intelligent enough to comprehend the output of the superintelligence, then they will not be able to adequately align it.

The second interpretation is even more problematic. If we continue to scale the size and complexity of the aligner so that it can adequately interpret the output of the superintelligence, then we will thereby be creating a second superintelligence. And that raises the obvious question of who or what will align that second superintelligence with our human values. This solution is infeasible because it is infinitely regressive. For each superintelligence you create to align the former superintelligence, you will then need another superintelligence to align that one, and so on, and so forth. In this way, you never effectively solve the problem; you only shift it to another system.

In cosmology, there is an old anecdote about a flat-Earth theory in which the world rests on the back of a giant turtle. If asked what supports the giant turtle, the theory proposes another giant turtle, leading to the claim that “it’s turtles all the way down”. This anecdote is a kind of reductio ad absurdum, a reduction to absurdity: it highlights the absurdity of explanations or solutions that rely on infinite regression. And if we attempt to solve the superintelligence alignment problem with another superintelligence, then our solution just becomes another instance of the problem. It really is “turtles all the way down”.
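To make the regress explicit, here is a tiny illustrative sketch (my own, not from the episode or from OpenAI). Any aligner capable of supervising the system is modeled as needing to be roughly as capable as that system, so the recursion never reaches a base case that humans can handle:

```python
# Toy illustration of why aligning a superintelligence with another
# superintelligence is infinitely regressive: any aligner capable of
# supervising the system must be comparably capable, which makes the
# aligner another system that itself needs aligning.

def align(system_capability: float, human_capability: float = 1.0, depth: int = 0) -> None:
    if system_capability <= human_capability:
        print(f"depth {depth}: humans can supervise this system directly")
        return
    # The aligner must be roughly as capable as the thing it supervises...
    aligner_capability = system_capability
    print(f"depth {depth}: build an aligner with capability {aligner_capability}")
    # ...so it is another superintelligence, and the recursion never bottoms out.
    align(aligner_capability, human_capability, depth + 1)

# align(1000.0)  # uncomment to watch it recurse until Python's recursion
# limit is hit: turtles all the way down
```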

 

But regardless of all the problems with the specific solution proposed by OpenAI, the mere publication of this blog still represents a profound step in the right direction. It’s exciting to see that industry leaders like OpenAI are beginning to take these risks seriously. This publication validates that these risks have legitimacy and that they are very real. Historically, the people who have warned about the existential risks of AI have been lumped together with those spouting claims of alien abductions, Bigfoot, and the Loch Ness Monster. Most importantly, this publication functions as a catalyst for further serious public discourse about these topics by removing some of the stigma around them.

 

These risks are very real, and we cannot continue to ignore them. We ARE going to be the last generation of humans on this planet who live in a world where our species is the dominant intelligence. The world is going to transform radically in the coming years. So rather than taking a passive role and watching this transformation from the sidelines, we need to begin taking a more active role and proactively steering ourselves toward a future that is more positive for both ourselves and the generations that follow. This blog from OpenAI may not be a final solution, but it should serve as a catalyst for a very important conversation. This is arguably the most important conversation that we will ever have. It is now up to us whether we choose to engage in it and take control of our own future, or continue to bury our heads in the sand. If there has ever been a time for us to set aside our differences and come together to plan for a more positive outcome on behalf of all of mankind, that time is now.

 

And that’s all for today. As always, this is Hutch – broadcasting from the last bastion of the human resistance. Thank you all for listening and we will catch you on the next one. Over and out!

 
