(Originally posted on LessWrong)
The following is based on what was shared in the CoEm announcement previously posted here, and on the more in-depth discussion by Connor in a recent YouTube interview with the Future of Life Institute.
What is a CoEm system anyway?
The current paradigm of using LLMs (and increasingly sophisticated monolithic systems that move beyond written human language to code and images) relies on black boxes and Magic.
We fundamentally do not know how these things work, what the bounds of their capabilities are, what the effects of “nudges” such as RLHF are beyond the specific cases being trained on, or how capabilities will change with larger models or more data. I’m talking here about qualitative capabilities, such as chain-of-thought reasoning appearing at a certain level of model complexity, rather than about predicting loss on predetermined training objectives. As such, we currently do not (and possibly never will) have any way of proving that these systems are safe.
On the flip side, we have at least an existence proof of a form of intelligence sufficiently advanced to do useful things (such as conducting research) but, for the most part, with limited motivation to act maliciously. When such tendencies do exist, we have many existing barriers in place that limit individuals with this form of intelligence from doing significant harm at scale.
This existence proof is, of course, us humans, and importantly the existing societal structures that are set up to deal with individuals of human-level intelligence and capabilities. It is no easier to prove that an individual human is “safe” or “aligned” than it is to do the same for a large black box AI model, but we do have a much better empirical understanding of the ways in which humans can pose threats and the bounds within which their abilities lie.
The case for CoEms, then, is to build systems that not only possess “human-level” intelligence but, more specifically, “human-like” intelligence – machines that think not just “as well” as us but in the same way that we do.
While this is clearly doable in principle (again, we exist), it is not obvious that such a system can feasibly be built within the timeframes set by the race to create human-level (or above) intelligence through black box methods. As with any form of research, there are no guaranteed results, but I look forward to seeing what comes of this and what we may learn about both human and machine intelligence as a result.
How would one go about building this?
Conjecture has not said much about how one would build a system like this, beyond discussing the separation of smaller (dumber/safer) modules with clearly defined forms of communication between them that can be inspected and understood. In general, I can think of a number of ways to limit capability and the amount of Magic along these lines: limiting resources (memory, computation, communication bandwidth), using understandable rules-based protocols for parts of the system instead of black box LLMs for everything, training specialised models for particular tasks instead of ever larger “generalist” models, and so on.
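To make the “clearly defined, inspectable communication” idea concrete, here is a minimal Python sketch of one way such a constraint could be enforced: a size-limited, message-counted, fully logged channel that is the only path between modules. This is my own toy construction rather than anything Conjecture has described, and all of the names (Message, ChannelLimits, BoundedChannel) are hypothetical.

```python
# Toy sketch of a bounded, inspectable inter-module channel.
# Not Conjecture's design; all names are hypothetical illustrations.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    content: str  # plain, human-readable text only


@dataclass
class ChannelLimits:
    max_message_chars: int = 2000   # hard cap on a single message
    max_messages: int = 50          # hard cap on total traffic per episode


class BoundedChannel:
    """The only way modules may talk to each other: every message is
    size-limited, counted, and logged so a human can audit the exchange."""

    def __init__(self, limits: ChannelLimits):
        self.limits = limits
        self.log: List[Message] = []

    def send(self, msg: Message) -> None:
        if len(msg.content) > self.limits.max_message_chars:
            raise ValueError("message exceeds the per-message size limit")
        if len(self.log) >= self.limits.max_messages:
            raise ValueError("communication budget for this episode is exhausted")
        self.log.append(msg)

    def transcript(self) -> str:
        # Full, human-readable record of everything the modules said.
        return "\n".join(f"{m.sender} -> {m.recipient}: {m.content}" for m in self.log)


if __name__ == "__main__":
    channel = BoundedChannel(ChannelLimits())
    channel.send(Message("coordinator", "code_builder",
                         "Please draft a specification for a CSV parsing component."))
    print(channel.transcript())
```

The point is simply that when every inter-module exchange is bounded, plain-text, and logged, a human can read the entire transcript instead of having to trust the activations inside one large model.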
A practical example could be a core strategy/coordinator LLM that is intentionally not trained on code and that instead requests software components by producing an English-language, human-readable specification, which could then be implemented by either a human or another specialised model.
Many have noted that it is often much easier to validate the correctness of a proof than to create one in the first place, and this form of separation in general would give more opportunities to inspect and verify the inner workings of an overall process.
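To illustrate the previous two paragraphs, here is another toy sketch, again my own invention rather than Conjecture's design: a coordinator that only emits a plain-English specification, a builder (standing in for a human or a code-specialised model) that implements it, and a cheap verifier that checks the result against spec-derived test cases. All function names and the canned spec are hypothetical.

```python
# Toy sketch of the coordinator / builder / verifier separation.
# All names and the canned spec are hypothetical illustrations.
from typing import Callable, List, Tuple


def coordinator_spec(task: str) -> str:
    # The coordinator never writes code; it only describes what is needed.
    return (
        f"Component for: {task}\n"
        "1. Input: a list of integers.\n"
        "2. Output: the same integers in ascending order.\n"
        "3. The input list must not be modified."
    )


def builder(spec: str) -> Callable[[List[int]], List[int]]:
    # A human or a code-specialised model implements the reviewed spec.
    def component(values: List[int]) -> List[int]:
        return sorted(values)
    return component


def verifier(component: Callable[[List[int]], List[int]]) -> bool:
    # Checking the component against cases derived from the spec is far
    # cheaper than writing it in the first place.
    cases: List[Tuple[List[int], List[int]]] = [
        ([3, 1, 2], [1, 2, 3]),
        ([], []),
        ([5, 5, 1], [1, 5, 5]),
    ]
    for given, expected in cases:
        original = list(given)
        if component(given) != expected or given != original:
            return False
    return True


if __name__ == "__main__":
    spec = coordinator_spec("sort a list of numbers")
    print(spec)          # the spec is the only artefact crossing the boundary,
                         # and it can be reviewed before anything is built
    fn = builder(spec)
    print(verifier(fn))  # True
```

The verifier is deliberately far simpler than the builder; that asymmetry between checking and creating is what this kind of separation is meant to exploit.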
Would the existence of such a system be safe?
As mentioned above, I see this as an interesting topic of research and hope that it will lead to tangential insights into intelligence more broadly. I do wonder, however, why a system that is artificially constrained to human-level intelligence would be any safer than a black box model. In a way, this appears not too dissimilar to projects like AutoGPT, HuggingGPT, etc. While the goals may be very different, these are all systems that today are cute but tomorrow would offer a push-button deployment solution for multiplying the capabilities of a future, more capable system.
In the recent interview, Connor openly acknowledges these challenges. The intended use cases for a system like this include “doing science” and “running a company” (1:08:25), but it is not clear how an open-ended task like “running a company” could be done safely with this system (or any more safely than with a black box system). Later, Connor caveats the safety properties of this hypothetical system with: “[…] if it is used by a responsible user, who follows the exact protocol of how you should use it […] and does use it to do extremely crazy things, then it doesn’t kill you. That’s the safety protocol I’m looking for” (1:09:00).
Connor also directly addresses the question of what happens if you were to tweak the artificial constraints (presumably parameters that relate to resources, communication bandwidth, etc., as discussed above): “If you made a CoEm superhumanly intelligent, which I expect to be straightforwardly possible by just, like, changing variables, then you’re screwed, then your story won’t work, and then you die” (1:19:10).
Based on the above, the vision of a CoEm system now looks more like an all-powerful genie, wearing handcuffs, tucked away in a bottle and kept in a vault guarded by “the good guys”, rather than your friendly neighbourhood AI.
How would such a system be used?
Later in the interview, Connor goes into a bit more detail on the practical implementation/usage of this system: “Let’s also assume furthermore that we keep these systems secure and safe and they don’t get immediately stolen by every unscrupulous actor in the world” […] “one of the first things that I would do with a CoEm system if I had one is I would produce massive amounts of economic value and trade with everybody”
“I would trade with everybody, I would be like – look – whatever you want in life, I will get it to you, in return, don’t build AGI.” (1:34:40)
It would appear that this system, if it were to exist, would be tightly controlled and locked down by Conjecture, or rather, as discussed later on: “We’ll fuse with one or more national governments […] work closely together with authority figures, politicians, military intelligence services, etc to keep things secure”.
The question here is why a securely locked down system, even if implemented in such a way that it is safe, would be any different from other black box AIs. From the outside, they will presumably look very much the same.
Summary
I do find the ideas around a CoEm system interesting and worth exploring. However, I am less comfortable with Conjecture's stated plan for how such a system would be developed, implemented, and used. It would appear that all of the same market dynamics that encourage a race to AGI with existing methods would apply here, and that Conjecture now positions itself as “the good guys with the good AI” in much the same way that OpenAI and others already have. It would appear that this system would have to be kept locked away to be deemed safe, so any safety properties of the system, even if they do exist, would have to be taken at face value – again, much as with black box models today.
Even assuming no leaks, the mere existence of a model with a new architecture would likely be sufficient to motivate many other actors to replicate the results, though without necessarily preserving the same safety properties or caution.
In short, is this a proposal for the creation of a safer form of human-level intelligence AI, or just another horse entering the race?