Responses to comments on our democratic control of AGI paper
On March 14th I published a co-written proposal on this blog, ‘Securing liberal democratic control of AGI through UK leadership’. In brief, it called for a multilateral effort to ensure advanced AI technologies are developed safely and in a democratically accountable way.
People in leadership roles across frontier labs, the AI investment community, and the policy space contributed. They shared deep concerns about the trajectory we may currently be on, and all saw a substantial mismatch between the private concerns of people in frontier labs and the public narrative.
We got a lot of extremely helpful and constructive feedback. Jack Clark, a co-founder of AnthropicAI, wrote about it here.
Things have changed substantially in just the past two weeks, and there were some points where we were not clear enough about what we intended, so I’m addressing that here. This follow-up is written by me and reflects my opinion only.
Image generated by DALLE 2.
Clarifying two points
The meaning of ‘liberal democratic control’
We said ‘Whilst we [the UK] cannot stop AGI development unilaterally, we must ensure we and allied liberal democracies are in a position to control it.’
By ‘liberal democratic control of AGI’, we did not mean that the proposed effort should ensure liberal democratic countries possess the most advanced AGI possible over non-democratic countries by creating a race. Rather, we meant ‘control’ in the sense of safety and alignment, with that alignment and safety set through governance mechanisms that have democratic accountability and oversight. We also meant it in the sense of ensuring that democratic institutions have access to systems as capable as those available to private actors, so that individual private actors do not develop greater power than the collective public. It is not difficult to imagine scenarios in which a highly advanced AI with the ability to write to the internet, create code, and replicate itself could disturb that balance irreversibly.
The use of the phrase ‘liberal democratic’ may have been a mistake, as ‘liberal democracies’ is often used in the context of competition with China. While China poses challenges of its own, on the matter of AI safety we believe it will be very important to engage with China (and other non-allies) to solve collective problems.
One challenge is that the words ‘safety’ and ‘alignment’ have, for some people, acquired the connotation of ‘giving AI particular political opinions’, and have therefore become somewhat politicised as terms. As others have highlighted, we need new words to bring out the different meanings. There may be consensus on some meanings and dispute on others.
Stopping the development of AGI
As quoted above, we said the ‘UK cannot stop AGI development unilaterally’, hinting that the idea should at least be considered in some form. However, the Overton window has now shifted significantly due to the Musk/Bengio/Wozniak et al letter, which calls for a pause on training of models more powerful than GPT-4.
I signed the letter. It’s imperfect, insufficient, and conflates some issues. But I think it’s an important step to shift the Overton window and highlight that deliberate slowing of some, not all, AI progress may be required. I’m relieved and happy the letter exists. My current view is that there is a case that we should never develop ‘agentic’ AI systems above a certain threshold, or at least not until we have very high confidence in alignment. I think that decision should be taken in a democratic manner, though it's not quite clear what form that takes, given it may be a very technical and international question.
I’m also sympathetic to many, though not all, aspects of the Yudkowsky article in Time. I encourage people to read it. There are a couple of quotes from it that stand out to me:
"Many researchers working on these systems think that we’re plunging toward a catastrophe, with more of them daring to say it in private than in public; but they think that they can’t unilaterally stop the forward plunge"
“"Some of my friends have recently reported to me that when people outside the AI industry hear about extinction risk from AGI for the first time, their reaction is “maybe we should not build AGI, then. Hearing this gave me a tiny flash of hope”
This aligns precisely with my own experiences of private conversations in recent months, some of which led directly to the proposal we put out.
The exact right path forward is very unclear to me, but the situation that existed barely a month ago, where major concerns about our trajectory with AGI were largely being raised only in private, was unsustainable. I think it’s better to have this debate and discussion earlier rather than later. It takes time for political systems to deal with issues, and the closer you get to trouble, the more panicked the decision making can get.
A potential lesson from COVID: it's worth considering what might have happened if, in January 2020, senior figures had written an open letter saying ‘no, it’s not wrong or panicky to take COVID extremely seriously, it's not racist to close the borders, and here are some things we should do right now’, even if some of those things were imperfect and uncertain. I know several senior people held this view. I view the ‘AI pause’ letter in the light cast by that hypothetical. I wasn’t in government until April 2020, but I deeply regret not flagging my concerns to people I knew who were in government, even though I didn’t have perfect solutions to offer. The desire not to be seen to panic must not prevent us from being honest.
I’ve heard some very intelligent, and also very complex, nuanced, and precise, arguments for why the ‘AI pause’ letter is too soon and what should instead be done somewhat later. In my experience, people who have not worked inside political structures intuitively overestimate the bandwidth, rationality, nuance, and complexity of argument that our political and governance systems can process. This is despite the evidence we see on our TV screens, and despite the same people, when asked, saying ‘yes, of course politics is irrational’. I made this mistake too until I had worked inside Number Ten during COVID-19, and I still have to consciously correct for it. Relatedly, there is a tendency to assume that somewhere behind the closed doors of government buildings there is a group who are on top of these issues and will make sure all is well when the time is right. In my experience this is often not the case, at least until concerns are raised externally. So you have to start the conversations as early as possible, erring on the side of acting too soon rather than too late, and know that it will take a lot of effort to steer to a good outcome.
There has been a lot of discussion behind the scenes in various very thoughtful, sincere, and publicly minded communities about AGI risks over the past decade. But I think it's hard to argue that this has had a major effect on policy to date, or that, had that thinking not happened, we would be in a worse position than the one we are in today. Hence so many previously established ‘guard rails’ have been blown past with little comment. This suggests to me a need to rethink approaches.
None of this is to say there are not truly amazing potential benefits to AI development, or to deny that the concerns about the difficulty of AGI alignment may prove to be misplaced, which is very possible. Rather, it is that the current incentives around development do not align with mitigating the serious risks in a responsible way, and that the potential downsides are so large that caution is warranted.
Objections to the proposal
We received many very thoughtful responses to the proposal, which improved our thinking and made us reflect, and for which we are thankful. Here are three common ones.
Objection 1: This will exacerbate a race
The idea of a ‘race to the bottom’ in developing advanced AI systems is a serious one, and was the central motivation for the piece.
An objection to our proposal for a multilateral effort is that such an effort could itself lead to a race between countries, with nation-state-level resources exceeding even the funding deployed by tech giants today, especially if the race develops between political systems locked in competition.
It's not impossible that concerned people taking steps toward such a multilateral effort would end up unintentionally on this trajectory. Certainly there are figures across many countries who would see the allure of international power through AGI and try to seize it, even if well-meaning in trying to defend political systems they believe in.
Whilst this is true, my view is that there is no better, or more principled, option than trusting democratic processes and trying to win the public argument. Democracy, to paraphrase Churchill, is by far the worst system of government. Apart from all the others.
The reality is that a race is already underway, and it is occurring in an unaccountable way. The authors of our proposal believe that a multilateral approach focussed on safety is the most likely way to defuse this race before other powerful actors take riskier action. As noted above, there is a strong need to coordinate with China, though I won’t go into what that might look like right now. Again, an AGI race to the bottom with China is likely to be a disaster. Both ‘sides’ in the geopolitical competition have a strong incentive to avoid AGI-induced chaos. The Chinese system highly values alignment of human views to Communist Party wishes: even in that sense, an unaligned, out-of-control AGI is very out of step with their priorities. In the near term, arguably the greater concern is a proliferation of private Western companies.
For our proposal, whether such a multilateral effort explicitly bans progress toward powerful agentic systems is a debate that needs to be had. Both the Musk/Bengio/Wozniak et al letter and the Yudkowsky piece highlight that progress in narrow AI, like AlphaFold for drug discovery, is different from progress toward powerful agentic systems, and that distinction will be important.
Objection 2: It’s better that a small number of private labs develop it safely than to have widespread adoption at the nation-state level
An argument we have heard several times is that, given the track record of political systems in dealing with major technological challenges, and the risk that they view advanced systems as a way to increase their power, it would be better to trust a small number of private tech leaders to do this sensibly. Without exception, the people who suggest this seem to me sincerely and deeply worried about the situation we are in, and motivated by trying to avert a disaster.
But there are also serious reasons to doubt this approach.
It’s true that the most advanced labs currently have a substantial lead over other actors, at least in highly capable LLMs. However, advanced and potentially dangerous models will soon be far more broadly available. The new NVIDIA H100 GPUs will bring the cost of training GPT-4-level technology within the range of many early-stage start-ups, as they reduce training cost by a factor of around 5-10. This, combined with hardware overhang, suggests that in the next 2-3 years technology more advanced than GPT-4 will be widespread, with a much richer diversity of algorithmic approaches available and testable. The continual lesson of recent years is that progress is unpredictable and often dramatic - we just don’t know what might arise. AGI progress might accelerate further, or halt for a while. We just don’t know. LLMs alone are probably not a path to AGI, but we also have no idea how much or how little needs to be added to them to make them very dangerous.
As well as this proliferation of technology, there are worrying signs that technologies are being pushed out in a race. The decision to connect GPT-4 to third-party APIs barely a few days after release surprised me, to put it mildly. Again, we do not know how much needs to be added, potentially in the form of a third-party API, to make GPT-4, or a subsequent GPT, more dangerous. Right now, GPT-4 appears to be a relatively safe technology with many benefits, but it really isn’t clear when that situation might change; the change could be sudden, leaving corrective action to come too late.
So ‘let's do nothing and it will stay safely in a few labs’ isn’t the trajectory we are on.
Even if it were, entrusting such a powerful technology to a small number of unelected people, however virtuous they may be, does not seem sensible in the long term. It introduces a small number of points of failure, subject to the flaws of human nature. It amounts to relying on a small number of people to act against their own narrow interests with the sole incentive of ‘doing the right thing’, which, as the FTX scandal has shown, can fail catastrophically. Sam Altman has recently made public suggestions that government representatives be placed into OpenAI, which should be welcomed and adopted.
We can address both of these concerns: supervision and limits on private-sector development, and a push toward a strongly safety-oriented multilateral effort, are not incompatible and could even be pursued simultaneously. Steps should be taken now to explore the advantages and disadvantages of both.
Objection 3: Framing around AGI X-risk
Several people agreed with the proposal but objected to its framing around AGI X-risk. I still don’t fully understand the objection, but I’ll just highlight that even people like Hinton are now beginning to say openly that it is plausible AGI could destroy humanity.
https://twitter.com/JMannhart/status/1641764742137016320
Most senior people in frontier labs have also publicly acknowledged that AGI could be extremely dangerous. In private, many are much more concerned.