Hi, all! The Machine Intelligence Research Institute (MIRI) is answering questions here tomorrow, October 12 at 10am PDT. You can post questions below in the interim.
MIRI is a Berkeley-based research nonprofit that does basic research on key technical questions related to smarter-than-human artificial intelligence systems. Our research is largely aimed at developing a deeper and more formal understanding of such systems and their safety requirements, so that the research community is better-positioned to design systems that can be aligned with our interests. See here for more background.
Through the end of October, we're running our 2016 fundraiser — our most ambitious funding drive to date. Part of the goal of this AMA is to address questions about our future plans and funding gap, but we're also hoping to get very general questions about AI risk, very specialized questions about our technical work, and everything in between. Some of the biggest news at MIRI since Nate's AMA here last year:
- We developed a new framework for thinking about deductively limited reasoning, logical induction.
- Half of our research team started work on a new machine learning research agenda, distinct from our agent foundations agenda.
- We received a review and a $500k grant from the Open Philanthropy Project.
Likely participants in the AMA include:
- Nate Soares, Executive Director and primary author of the AF research agenda
- Malo Bourgon, Chief Operating Officer
- Rob Bensinger, Research Communications Manager
- Jessica Taylor, Research Fellow and primary author of the ML research agenda
- Tsvi Benson-Tilsen, Research Associate
Nate, Jessica, and Tsvi are also three of the co-authors of the "Logical Induction" paper.
EDIT (10:04am PDT): We're here! Answers on the way!
EDIT (10:55pm PDT): Thanks for all the great questions! That's all for now, though we'll post a few more answers tomorrow to things we didn't get to. If you'd like to support our AI safety work, our fundraiser will be continuing through the end of October.
What makes for an ideal MIRI researcher? How would that differ from being an ideal person who works for DeepMind, or who does research as an academic? Do MIRI employees have special knowledge of the world that most AI researchers (e.g. Hinton, Schmidhuber) don't have? What about the other way around? Is it possible for a MIRI researcher to produce relevant work even if they don't fully understand all approaches to AI?
How does MIRI aim to cover all possible AI systems (those based on symbolic AI, connectionist AI, deep learning, and other AI systems/paradigms)?
The ideal MIRI researcher is someone who’s able to think about thorny philosophical problems and break off parts of them to formalize mathematically. In the case of logical uncertainty, researchers started by thinking about the initially vague problem of reasoning well about uncertain mathematical statements, turned some of these thoughts into formal desiderata and algorithms (producing intermediate possibility and impossibility results), and eventually found a way to satisfy many of these desiderata at once. We’d like to do a lot more of this kind of work in the future.
Probably the main difference between MIRI research and typical AI research is that we focus on problems of the form “if we had capability X, how would we achieve outcome Y?” rather than “how can we build a practical system achieving outcome Y?”. We focus less on computational tractability and more on the philosophical question of how we would build a system to achieve Y in principle, given e.g. unlimited computing resources or access to extremely powerful machine learning systems. I don’t think we have much special knowledge that others don’t have (or vice versa), given that most relevant AI research is public; ... (read more)
Two years ago, I asked why MIRI thought they had a "medium" probability of success and got a lot of good discussion. But now MIRI strategy has changed dramatically. Any updates now on how MIRI defines success, what MIRI thinks their probability of success is, and why MIRI thinks that?
I don’t think of our strategy as having changed much in the last year. For example, in the last AMA I said that the plan was to work on some big open problems (I named 5 here: asymptotically good reasoning under logical uncertainty, identifying the best available decision with respect to a predictive world-model and utility function, performing induction from inside an environment, identifying the referents of goals in realistic world-models, and reasoning about the behavior of smarter reasoners), and that I’d be thrilled if we could make serious progress on any of these problems within 5 years. Scott Garrabrant then promptly developed logical induction, which represents serious progress on two (maybe three) of the big open problems. I consider this to be a good sign of progress, and that set of research priorities remains largely unchanged.
Jessica Taylor is now leading a new research program, and we're splitting our research time between this agenda and our 2014 agenda. I see this as a natural consequence of us bringing on new researchers with their own perspectives on various alignment problems, rather than as a shift in organizational strategy. Eliezer, Benya, and I drafted the... (read more)
What would be your next few hires, if resources allow?
What kind of things, if true, would convince you that MIRI was not worth donating to? What would make you give up on MIRI?
Would you rather prove the friendliness of 100 duck-sized horse AIs or one horse-sized duck AI?
One horse-sized duck AI. For one thing, the duck is the ultimate (route) optimization process: you can ride it on land, sea, or air. For another, capabilities scale very nonlinearly with size; the neigh of even 1000 duck-sized horse AIs does not compare to the quack of a single horse-sized duck AI. Most importantly, if you can safely do something with 100 opposite-sized AIs, you can safely do the same thing with one opposite-sized AI.
In all seriousness though, we don't generally think in terms of "proving the friendliness" of an AI system. When doing research, we might prove that certain proposals have flaws (for example, see (1)) as a way of eliminating bad ideas in the pursuit of good ideas. And given a realistic system, one could likely prove certain high-level statistical features (such as “this component of the system has an error rate that vanishes under thus-and-such assumptions”), though it’s not yet clear how useful those proofs would be. Overall, though, the main challenges in friendly AI seem to be ones of design rather than verification. In other words, the problem is to figure out what properties an aligned system should possess, rather than to figure out how ... (read more)
How should a layman with only college-level mathematical knowledge evaluate the work that MIRI does?
1) What are the main points of disagreement MIRI has with Open Phil's technical advisors about the importance of Agent Foundations research for reducing risks from AI?
2) Is Sam Harris co-authoring a book with Eliezer on AI Safety? If yes, please provide further details.
3) How many hours do full time MIRI staff work in a usual working week?
4) What’s the biggest mistake MIRI made in the past year?
You say that MIRI is attempting to do research that is, on the margin, less likely to be prioritised by the existing AI community. Why, then, are you moving towards work in Machine Learning?
In 2013, MIRI announced it was shifting to do less outreach and more research. How has that shift worked out, and what's the current balance between these two priorities?
A lot of the discourse around AI safety uses terms like "human-friendly" or "human interests". Does MIRI's conception of friendly AI take the interests of non-human sentient beings into consideration as well? Especially troubling to me is Yudkowsky's view on animal consciousness, but I'm not sure how representative his views are of MIRI in general.
(I realize that MIRI's research focuses mainly on alignment theory, not target selection, but I am still concerned about this issue.)
Quoting Nate's supplement from Open Phil's review of "Proof-producing reflection for HOL" (PPRHOL):
How far along are you in narrowing these gaps, now that "Logical Induction" is a thing people can talk about? Are there variants of it that narrow these gaps, or are there planned follow-ups to PPRHOL that might improve our models? What kinds of experiments seem valuable for this subgoal?
I endorse Tsvi's comment above. I'll add that it's hard to say how close we are to closing basic gaps in our understanding of things like "good reasoning", because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and that we're taking a variety of approaches to the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we're running now is an attempt to implement in HOL a cellular automaton containing a reflective reasoner with access to the source code of the world, where the reasoner uses HOL to reason about the world and itself. The idea is to see whether we can get the whole stack to work simultaneously, and to smoke out all the implementation difficulties that arise in practice when you try to use a language like HOL for reasoning about HOL.)
Scott Garrabrant's logical induction framework feels to me like a large step forward. It provides a model of "good reasoning" about logical facts using bounded computational resources, and that model is already producing preliminary insights into decision theory. In particular, we can now write down models of agents that use logical inductors to model the world, and in some cases these agents learn to have sane beliefs about their own actions, other agents' actions, and how those actions affect the world. This, despite the usual obstacles to self-modeling.
Further, the self-trust result from the paper can be interpreted to say that a logical inductor believes something like "If my future self is confident in the proposition A, then A is probably true". This seems like one of the insights that the PPRHOL work was aiming at, namely, writing down a computable reasoning system that asserts a formal reflection principle about itself. Such a reflection principle must be weaker than full logical soundness; a system that proved "If my future self proves A, then A is true" would be inconsistent. But as it turns out, the reflection principle is feasible if you replace "proves" with "assigns hi... (read more)
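To make the informal reading above a bit more concrete, here is a rough schematic of the kind of reflection principle involved. This is a paraphrase, not the paper's exact theorem statement (which is phrased in terms of expectations and continuous indicator functions); $\mathbb{P}_n$ stands for the inductor's belief state on day $n$.

```latex
% Schematic paraphrase of the self-trust property of logical inductors.
% NOT the exact theorem from the "Logical Induction" paper (which is stated
% using expectations and continuous indicator functions); just the informal shape:
%
%   "Conditional on my future self assigning probability p to A,
%    my current probability for A is roughly p."
\[
  \mathbb{P}_n\!\left( A \;\middle|\; \text{``}\mathbb{P}_{n+k}(A) = p\text{''} \right) \;\approx\; p
\]
% Contrast with the proof-based analogue "if my future self proves A, then A
% is true", which (by Löb's theorem) a consistent proof system cannot assert
% about itself.
```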
Would you like to state any crisp predictions for how your Logical Uncertainty paper will be received, and/or the impact it will have?
I’ll start by stating that, while I have some intuitions about how the paper will be received, I don’t have much experience making crisp forecasts, and so I might be miscalibrated. That said:
Everyone knows who to look out for in the creation of AI; who should we be paying attention to for solving the control problem? I know of Eliezer, Stuart Russell, and the team mentioned above, but is there anyone else you would recommend as worth following?
If you find and prove the right strategy for creating Friendly AI, how will you implement it? Will you send it to all possible AI creators, try to build your own AI, or ask governments to pass it into law?
Thanks for doing this AMA! Which of the points in your strategy have you seen a need to update on, based on the unexpected progress of having published the "Logical Induction" paper (which I'm currently perusing)?
A question from Topher Halquist, on Facebook:
We considered it, but decided against it because supervision doesn’t seem like a key bottleneck on our research progress. Our priority is just to find people who have the right kinds of math/CS intuitions to formalize the mostly-informal problems we’re working on, and I haven’t found that this correlates with seniority. That said, I'm happy to hire senior mathematicians if we find ones who want to work... (read more)
It seems like people in academia tend to avoid mentioning MIRI. Has this changed in magnitude during the past few years, and do you expect it to change any more? Do you think there is a significant number of public intellectuals who believe in MIRI's cause in private while avoiding mention of it in public?
You often mention that MIRI is trying to not be a university department, so you can spend researcher time more strategically and not have the incentive structures of a university. Could you describe the main differences in what your researchers spend their time doing?
Also, I think I've heard the above used as an explanation of why MIRI's work often doesn't fit into standard journal articles at a regular rate. If you do think this, in what way does the research not fit? Are there no journals for it, or are you perhaps more readily throwing less-useful-but-interesting ideas away (or something else)?
I believe that the best and biggest system of morality so far is the legal system. It is an enormous database where the fairest of men have built on the wisdom of their predecessors to strike a balance between fairness and avoiding chaos, and where bad or obsolete judgements are weeded out. It is a system of prioritisation of laws that could be encoded one day. I believe that it would be a great tool for addressing corrigibility and value learning. I'm a lawyer, and I'm afraid that MIRI may not understand the full potential of the legal system.
Could you tell me w... (read more)
In short: there’s a big difference between building a system that follows the letter of the law (but not the spirit), and a system that follows the intent behind a large body of law. I agree that the legal system is a large corpus of data containing information about human values and how humans currently want their civilization organized. In order to use that corpus, we need to be able to design systems that reliably act as intended, and I’m not sure how the legal corpus helps with that technical problem (aside from providing lots of training data, which I agree is useful).
In colloquial terms, MIRI is more focused on questions like “if we had a big corpus of information about human values, how could we design a system to learn from that corpus how to act as intended”, and less focused on the lack of such a corpus.
The reason that we have to work on corrigibility ourselves is that we need advanced learning systems to be corrigible before they’ve finished learning how to behave correctly from a large training corpus. In other words, there are lots of different training corpuses and goal systems where, if the system is fully trained and working correctly, we get corrigibility for free; the difficult part is getting the system to behave corrigibly before it’s smart enough to be doing corrigibility for the “right reasons”.
Do you intend to submit "Logical Induction" to a relevant journal for peer review and publication? Do you still hold with ~Eliezer2008 that people who currently object that MIRI doesn't participate in the orthodox scientific process would still object for other reasons, even if you tried to address the lack of peer review?
Also, why no /r/IAmA or /r/science AMA? The audience on this site seems limited from the start. Are you specifically trying to target people who are already EAs?
The authors of the "Concrete Problems in AI safety" paper distinguish between misuse risks and accident risks. Do you think in these terms, and how does your roadmap address misuse risk?
If you got credible evidence that AGI would be created by Google in the next 5 years, what would you do?
What does the internal drafting and review process look like at MIRI? Do people separate from the authors of a paper check all the proofs, math, citations, etc.?
What do you think of OpenAI?
In particular, it seems like OpenAI has managed to attract both substantial technical talent and a number of safety-conscious researchers.
1) It seems that, to at least some degree, you are competing for resources -- particularly talent but also "control of the AI safety narrative". Do you feel competitive with them, or collaborative, or a bit of both? Do you expect both organizations to be relevant for 5+ years or do you expect one to die off? What, if anything, would convince you that it would make sense to mer... (read more)
I sometimes see influential senior staff at MIRI make statements on social media that pertain to controversial moral questions. These statements are not accompanied by disclaimers that they are speaking on behalf of themselves and not their employer. Is it safe to assume that these statements represent the de facto position of the organization?
This seems relevant to your organizational mission since MIRI's goal is essentially to make AI moral, but a donor's notion of what's moral might not correspond with MIRI's position. Forcefully worded statements on... (read more)
I haven't seen much about coherent extrapolated volition published or discussed recently.
Can you give us the official word on the status of the theory?
Do you share Open Phil's view that there is a > 10% chance of transformative AI (defined as in Open Phil's post) in the next 20 years? What signposts would alert you that transformative AI is near?
Relatedly, suppose that transformative AI will happen within about 20 years (not necessarily a self improving AGI). Can you explain how MIRI's research will be relevant in such a near-term scenario (e.g. if it happens by scaling up deep learning methods)?
Question 2: Suppose tomorrow MIRI creates a friendly AGI that can learn a value system, make it consistent with minimal alteration, and extrapolate it in an agreeable way. Whose values would it be taught?
I've heard the idea of averaging all humans' values together and working from there. Given that ISIS is human and that many other humans believe that the existence of extreme physical and emotional suffering is good, I find that idea pretty repellent. Are there alternatives that have been considered?
One thing always puzzles me about provable AI. If we are able to prove that an AI will do X and only X after arbitrarily many generations of self-improvement, it is still not clear how to choose the right X.
For example, we could be sure that a paperclip maximizer will still make paperclips after a billion generations.
So my question is: what are we proving about provable AI?
What would you do if you hadn't found a solution to the friendliness problem by the time it became clear that strong AI was within one year? What is the second-best option after trying to develop a theory of AI friendliness?