Is there a consensus among AI safety researchers that there is no way to safely study an AI agent's behavior within a simulated environment?
It seems to me that building an adequate AGI sandbox would be a top priority (if not the top priority) for AI safety researchers, as it would effectively close the feedback loop and let researchers take multiple shots at AGI alignment without the threat of total annihilation.
What other methods are there that would in principle allow iteration?
If it is true that "a failed AGI attempt could result in unrecoverable loss of human potential within the bounds of everything that it can affect", then our options are to A) not fail, or B) limit the bounds of everything that it can affect. In this sense, any strategy that hopes to allow for iteration is abstractly equivalent to a box/simulation/sandbox, whatever you may call it.
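To make option B concrete, here is a toy sketch (purely my own illustration, not a real safety mechanism; the class names and `observe`/`apply` interface are invented) of what "limiting the bounds of everything it can affect" means in the abstract: the agent's only causal channel to anything is the simulated environment we hand it.

```python
import random


class SimulatedEnvironment:
    """A closed world: the agent can only read and write this state."""

    def __init__(self):
        self.state = 0

    def observe(self):
        return self.state

    def apply(self, action):
        # All of the agent's influence is confined to this one integer.
        self.state += action


class SandboxedAgent:
    """An agent whose entire interface is the environment passed to it."""

    def __init__(self, env):
        self.env = env

    def act(self):
        # Stand-in for an arbitrary policy; in reality this is the part
        # we do not trust and cannot fully inspect.
        return random.choice([-1, 0, 1])


def run_episode(steps=10):
    env = SimulatedEnvironment()
    agent = SandboxedAgent(env)
    for _ in range(steps):
        env.apply(agent.act())
    return env.observe()


if __name__ == "__main__":
    print("Final sandbox state:", run_episode())
```

The whole question, as I understand it, is whether any real setup can be as closed as this toy one, given that the agent runs on physical hardware and is watched by human observers.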