Inadequate Equilibria, Chapter 8(?): Hero Licensing

 

I expect most readers to know me either as MIRI’s co-founder and the originator of a number of the early research problems in AI alignment, or as the author of Harry Potter and the Methods of Rationality, a popular work of Harry Potter fanfiction. I’ve described how I apply concepts in Inadequate Equilibria to various decisions in my personal life, and some readers may be wondering how I see these tying in to my AI work and my fiction-writing. And I do think these serve as useful case studies in inadequacy, exploitability, and modesty.

As a supplement to Inadequate Equilibria, then, the following is a dialogue that never took place—largely written in 2014, and revised and posted online in 2017.

 


 

i. Outperforming and the outside view

(The year is 2010. ELIEZER-2010 is sitting in a nonexistent park in Redwood City, California, working on his laptop. A PERSON walks up to him.)

 

PERSON:  Pardon me, but are you Eliezer Yudkowsky?

ELIEZER-2010:  I have that dubious honor.

PERSON:  My name is Pat; Pat Modesto. We haven’t met, but I know you from your writing online. What are you doing with your life these days?

ELIEZER-2010:  I’m trying to write a nonfiction book on rationality. The blog posts I wrote on Overcoming Bias—I mean Less Wrong—aren’t very compact or edited, and while they had some impact, it seems like a book on rationality could reach a wider audience and have a greater impact.

PAT:  Sounds like an interesting project! Do you mind if I peek in on your screen and—

ELIEZER:  (shielding the screen)  —Yes, I mind.

PAT:  Sorry. Um... I did catch a glimpse and that didn’t look like a nonfiction book on rationality to me.

ELIEZER:  Yes, well, work on that book was going very slowly, so I decided to try to write something else in my off hours, just to see if my general writing speed was slowing down to molasses or if it was this particular book that was the problem.

PAT:  It looked, in fact, like Harry Potter fanfiction. Like, I’m pretty sure I saw the words “Harry” and “Hermione” in configurations not originally written by J. K. Rowling.

ELIEZER:  Yes, and I currently seem to be writing it very quickly. And it doesn’t seem to use up mental energy the way my regular writing does, either.

 

(A MYSTERIOUS MASKED STRANGER, watching this exchange, sighs wistfully.)

 

ELIEZER:  Now I’ve just got to figure out why my main book-writing project is going so much slower and taking vastly more energy... There are so many books I could write, if I could just write everything as fast as I’m writing this...

PAT:  Excuse me if this is a silly question. I don’t mean to say that Harry Potter fanfiction is bad—in fact I’ve read quite a bit of it myself—but as I understand it, according to your basic philosophy the world is currently on fire and needs to be put out. Now given that this is true, why are you writing Harry Potter fanfiction, rather than doing something else?

ELIEZER:  I am doing something else. I’m writing a nonfiction rationality book. This is just in my off hours.

PAT:  Okay, but I’m asking why you are doing this particular thing in your off hours.

ELIEZER:  Because my life is limited by mental energy far more than by time. I can currently produce this work very cheaply, so I’m producing more of it.

PAT:  What I’m trying to ask is why, even given that you can write Harry Potter fanfiction very cheaply, you are writing Harry Potter fanfiction. Unless it really is true that the only reason is that you need to observe yourself writing quickly in order to understand the way of quick writing, in which case I’d ask what probability you assign to learning that successfully. I’m skeptical that this is really the best way of using your off hours.

ELIEZER:  I’m skeptical that you have correctly understood the concept of “off hours.” There’s a reason they exist, and the reason isn’t just that humans are lazy. I admit that Anna Salamon and Luke Muehlhauser don’t require off hours, but I don’t think they are, technically speaking, “humans.”

 

(The Mysterious Masked Stranger speaks for the first time.)

 

STRANGER:  Excuse me.

ELIEZER:  Who are you?

STRANGER:  No one of consequence.

PAT:  And why are you wearing a mask?

STRANGER:  Well, I’m definitely not a version of Eliezer from 2014 who’s secretly visiting the past, if that’s what you’re thinking.

PAT:  It’s fair to say that’s not what I’m thinking.

STRANGER:  Pat and Eliezer-2010, I think the two of you are having some trouble communicating. The two of you actually disagree much more than you think.

PAT & ELIEZER:  Go on.

STRANGER:  If you ask Eliezer of February 2010 why he’s writing Harry Potter and the Methods of Rationality, he will, indeed, respond in terms of how he expects writing Methods to positively impact his attempt to write The Art of Rationality, his attempt at a nonfiction how-to book. This is because we have—I mean, Eliezer has—a heuristic of planning on the mainline, which means that his primary justification for anything will be phrased in terms of how it positively contributes to a “normal” future timeline, not low-probability side-scenarios.

ELIEZER:  Sure.

PAT:  Wait, isn’t your whole life—

ELIEZER:  No.

STRANGER:  Eliezer-2010 also has a heuristic that might be described as “never try to do anything unless you have a chance of advancing the Pareto frontier of the category.” In other words, if he’s expecting that some other work will be strictly better than his along all dimensions, it won’t occur to Eliezer-2010 that this is something he should spend time on. Eliezer-2010 thinks he has the potential to do things that advance Pareto frontiers, so why would he consider a project that wasn’t trying? So, off-hours or not, Eliezer wouldn’t be working on this story if he thought it would be strictly dominated along every dimension by any other work of fanfiction, or indeed, any other book.

PAT:  Um—

ELIEZER:  I wouldn’t put it in exactly those terms.

STRANGER:  Yes, because when you say things like that out loud, people start saying the word “arrogance” a lot, and you don’t fully understand the reasons. So you’ll cleverly dance around the words and try to avoid that branch of possible conversation.

PAT:  Is that true?

ELIEZER:  It sounds to me like the Masked Stranger is trying to use the Barnum effect—like, most people would acknowledge that as a secret description of themselves if you asked them.

PAT:  ...... I really, really don’t think so.

ELIEZER:  I’d be surprised if it were less than 10% of the population, seriously.

STRANGER:  Eliezer, you’ll have a somewhat better understanding of human status emotions in 4 years. Though you’ll still only go there when you have a point to make that can’t be made any other way, which in turn will be unfortunately often as modest epistemology norms propagate through your community. But anyway, Pat, the fact that Eliezer-2010 has spent any significant amount of time on Harry Potter and the Methods of Rationality indeed lets you infer that Eliezer-2010 thinks Methods has a chance of being outstanding along some key dimension that interests him—of advancing the frontiers of what has ever been done—although he might hesitate to tell you that before he’s actually done it.

ELIEZER:  Okay, yes, that’s true. I’m unhappy with the treatment of supposedly “intelligent” and/or “rational” characters in fiction and I want to see it done right just once, even if I have to write the story myself. I have an explicit thesis about what’s being done wrong and how to do it better, and if this were not the case then the prospect of writing Methods would not interest me as much.

STRANGER:  (aside)  There’s so much civilizational inadequacy in our worldview that we hardly even notice when we invoke it. Not that this is an alarming sign, since, as it happens, we do live in an inadequate civilization.

ELIEZER:  (continuing to Pat)  However, the reason I hold back from saying in advance what Methods might accomplish isn’t just modesty. I’m genuinely unsure that I can make Methods be what I think it can be. I don’t want to promise more than I can deliver. And since one should first plan along the mainline, if investigating the conditions under which I can write quickly weren’t a sufficiently important reason, I wouldn’t be doing this.

STRANGER:  (aside)  I have some doubts about that alleged justification in retrospect, though it wasn’t stupid.

PAT:  Can you say more about how you think your Harry Potter story will have outstandingly “intelligent” characters?

ELIEZER:  I’d rather not? As a matter of literature, I should show, not tell, my thesis. Obviously it’s not that I think that my characters are going to learn fifty-seven languages because they’re super-smart. I think most attempts to create “intelligent characters” focus on surface qualities, like how many languages someone has learned, or they focus on stereotypical surface features the author has seen in other “genius” characters, like a feeling of alienation. If it’s a movie, the character talks with a British accent. It doesn’t seem like most such authors are aware of Vinge’s reasoning for why it should be hard to write a character that is smarter than the author. Like, if you know exactly where an excellent chessplayer would move on a chessboard, you must be at least that good at playing chess yourself, because you could always just make that move. For exactly the same reason, it’s hard to write a character that’s more rational than the author.

I don’t think the concept of “intelligence” or “rationality” that’s being used in typical literature has anything to do with discerning good choices or making good predictions. I don’t think there is a standard literary concept for characters who excel at cognitive optimization, distinct from characters who just win because they have a magic sword in their brains. And I don’t think most authors of “genius” characters respect their supposed geniuses enough to really put themselves in their shoes—to really feel what their inner lives would be like, and think beyond the first cliche that comes to mind. The author still sets themselves above the “genius,” gives the genius some kind of obvious stupidity that lets the author maintain emotional distance...

STRANGER:  (aside)  Most writers have a hard time conceptualizing a character who's genuinely smarter than the author; most futurists have a hard time conceptualizing genuinely smarter-than-human AI; and indeed, people often neglect the hypothesis that particularly smart human beings will have already taken into account all the factors that they consider obvious. But with respect to sufficiently competent individuals making decisions that they can make on their own cognizance—as opposed to any larger bureaucracy or committee, or the collective behavior of a field—it is often appropriate to ask if they might be smarter than you think, or have better justifications than are obvious to you.

PAT:  Okay, but supposing you can write a book with intelligent characters, how does that help save the world, exactly?

ELIEZER:  Why are you focusing on the word “intelligence” instead of “rationality”? But to answer your question, nonfiction writing conveys facts; fiction writing conveys experiences. I’m worried that my previous two years of nonfiction blogging haven’t produced nearly enough transfer of real cognitive skills. The hope is that writing about the inner experience of someone trying to be rational will convey things that I can’t easily convey with nonfiction blog posts.

STRANGER:  (laughs)

ELIEZER:  What is it, Masked Stranger?

STRANGER:  Just... you’re so very modest.

ELIEZER:  You’re saying this to me?

STRANGER:  It’s sort of obvious from where I live now. So very careful not to say what you really hope Harry Potter and the Methods of Rationality will do, because you know people like Pat won’t believe it and can’t be persuaded to believe it.

PAT:  This guy is weird.

ELIEZER:  (shrugging)  A lot of people are.

PAT:  Let’s ignore him. So you’re presently investing a lot of hours—

ELIEZER:  But surprisingly little mental energy.

STRANGER:  Where I come from, we would say that you’re investing surprisingly few spoons.

PAT:  —but still a lot of hours, into crafting a Harry Potter story with, you hope, exceptionally rational characters. Which will cause some of your readers to absorb the experience of being rational. Which you think eventually ends up important to saving the world.

ELIEZER:  Mm, more or less.

PAT:  What do you think the outside view would say about—

ELIEZER:  Actually, I think I’m about out of time for today.  (Starts to close his laptop.)

STRANGER:  Wait. Please stick around. Can you take my word that it’s important?

ELIEZER:  ...all right. I suppose I don’t have very much experience with listening to Masked Strangers, so I’ll try that and see what happens.

PAT:  What did I say wrong?

STRANGER:  You said that the conversation would never go anywhere helpful.

ELIEZER:  I wouldn’t go that far. It’s true that in my experience, though, people who use the phrase “outside view” usually don’t offer advice that I think is true, and the conversations take up a lot of mental energy—spoons, you called them? But since I’m taking the Masked Stranger’s word on things and trying to continue, fine. What do you think the outside view has to say about the Methods of Rationality project?

PAT:  Well, I was just going to ask you to consider what the average story with a rational character in it accomplishes in the way of skill transfer to readers.

ELIEZER:  I’m not trying to write an average story. The whole point is that I think the average story with a “rational” character is screwed up.

PAT:  So you think that your characters will be truly rational. But maybe those authors also think their characters are rational—

ELIEZER:  (in a whisper to the Masked Stranger)  Can I exit this conversation?

STRANGER:  No. Seriously, it’s important.

ELIEZER:  Fine. Pat, your presumption is wrong. These hypothetical authors making a huge effort to craft rational characters don’t actually exist. They don’t realize that it should take an effort to craft rational characters; they’re just regurgitating cliches about Straw Vulcans with very little self-perceived mental effort.

STRANGER:  Or as I would phrase it: This is not one of the places where our civilization puts in enough effort that we should expect adequacy.

PAT:  Look, I don’t dispute that you can probably write characters more rational than those of the average author; I just think it’s important to remember, on each occasion, that being wrong feels just like being right.

STRANGER:  Eliezer, please tell him what you actually think of that remark.

ELIEZER:  You do not remember on each occasion that “being wrong feels just like being right.” You remember it on highly selective occasions where you are motivated to be skeptical of someone else. This feels just like remembering it on every relevant occasion, since, after all, every time you felt like you ought to think of it, you did. You just used a fully general counterargument, and the problem with arguments like that is that they provide no Bayesian discrimination between occasions where we are wrong and occasions where we are right. Like “but I have faith,” “being wrong feels just like being right” is as easy to say on occasions when someone is right as on occasions when they are wrong.

STRANGER:  There is a stage of cognitive practice where people should meditate on how the map is not the territory, especially if it’s never before occurred to them that what feels like the universe of their immersion is actually their brain’s reconstructed map of the true universe. It’s just that Eliezer went through that phase while reading S. I. Hayakawa’s Language in Thought and Action at age eleven or so. Once that lesson is fully absorbed internally, invoking the map-territory distinction as a push against ideas you don’t like is (fully general) motivated skepticism.

PAT:  Leaving that aside, there’s this research showing that there’s a very useful technique called “reference class forecasting”—

ELIEZER:  I am aware of this.

PAT:  And I’m wondering what reference class forecasting would say about your attempt to do good in the world via writing Harry Potter fanfiction.

ELIEZER:  (to the Masked Stranger)  Please can I run away?

STRANGER:  No.

ELIEZER:  (sighing)  Okay, to take the question seriously as more than generic skepticism: If I think of the books which I regard as having well-done rational characters, their track record isn’t bad. A. E. van Vogt’s The World of Null-A was an inspiration to me as a kid. Null-A didn’t just teach me the phrase “the map is not the territory”; it was where I got the idea that people employing rationality techniques ought to be awesome and if they weren’t awesome that meant they were doing something wrong. There are a heck of a lot of scientists and engineers out there who were inspired by reading one of Robert A. Heinlein’s hymns in praise of science and engineering—yes, I know Heinlein had problems, but the fact remains.

STRANGER:  I wonder what smart kids who grew up reading Harry Potter and the Methods of Rationality as twelve-year-olds will be like as adults...

PAT:  But surely van Vogt’s Null-A books are an exceptional case of books with rationalist characters. My first question is, what reason do you have to believe you can do that? And my second question is, even given that you write a rational character as inspiring as a character in a Heinlein novel, how much impact do you think one character like that has on an average reader, and how many people do you think will read your Harry Potter fanfiction in the best case?

ELIEZER:  To be honest, it feels to me like you’re asking the wrong questions. Like, it would never occur to me to ask any of the questions you’re asking now, in the course of setting out to write Methods.

STRANGER:  (aside)  That’s true, by the way. None of these questions ever crossed my mind in the original timeline. I’m only asking them now because I’m writing the character of Pat Modesto. A voice like Pat Modesto is not a productive voice to have inside your head, in my opinion, so I don’t spontaneously wonder what he would say.

ELIEZER:  To produce the best novel I can, it makes sense for me to ask what other authors were doing wrong with their rational characters, and what A. E. van Vogt was doing right. I don’t see how it makes sense for me to be nervous about whether I can do better than A. E. van Vogt, who had no better source to work with than Alfred Korzybski, decades before Daniel Kahneman was born. I mean, to be honest about what I’m really thinking: So far as I’m concerned, I’m already walking outside whatever so-called reference class you’re inevitably going to put me in—

PAT:  What?! What the heck does it mean to “walk outside” a reference class?

ELIEZER:  —which doesn’t guarantee that I’ll succeed, because being outside of a reference class isn’t the same as being better than it. It means that I don’t draw conclusions from the reference class to myself. It means that I try, and see what happens.

PAT:  You think you’re just automatically better than every other author who’s ever tried to write rational characters?

ELIEZER:  No! Look, thinking things like that is just not how the inside of my head is organized. There’s just the book I have in my head and the question of whether I can translate that image into reality. My mental world is about the book, not about me.

PAT:  But if the book you have in your head implies that you can do things at a very high percentile level, relative to the average fiction author, then it seems reasonable for me to ask why you already think you occupy that percentile.

STRANGER:  Let me try and push things a bit further. Eliezer-2010, suppose I told you that as of the start of 2014, Methods succeeded to the following level. First, it has roughly half a million words, but you’re not finished writing it—

ELIEZER:  Damn. That’s disappointing. I must have slowed down a lot, and definitely haven’t mastered the secret of whatever speed-writing I’m doing right now. I wonder what went wrong? Actually, why am I hypothetically continuing to write this book instead of giving up?

STRANGER:  Because it’s the most reviewed work of Harry Potter fanfiction out of more than 500,000 stories on fanfiction.net, has organized fandoms in many universities and colleges, has received at least 15,000,000 page views on what is no longer the main referenced site, has been turned by fans into an audiobook via an organized project into which you yourself put zero effort, has been translated by fans into many languages, is famous among the Caltech/MIT crowd, has its own daily-trafficked subreddit with 6,000 subscribers, is often cited as the most famous or the most popular work of Harry Potter fanfiction, is considered by a noticeable fraction of its readers to be literally the best book they have ever read, and on at least one occasion inspired an International Mathematical Olympiad gold medalist to join the alliance and come to multiple math workshops at MIRI.

ELIEZER:  I like this scenario. It is weird, and I like weird. I would derive endless pleasure from inflicting this state of affairs on reality and forcing people to come to terms with it.

STRANGER:  Anyway, what probability would you assign to things going at least that well?

ELIEZER:  Hm... let me think. Obviously this exact scenario is improbable, because conjunctive. But if we partition outcomes according to whether they rank at least this high or better in my utility function, and ask how much probability mass I put into outcomes like that, then I think it’s around 10%. That is, a success like this would come in at around the 90th percentile of my hopes.

PAT:  (incoherent noises)

ELIEZER:  Oh. Oops. I forgot you were there.

PAT:  90th percentile?! You mean you seriously think there’s a 1 in 10 chance that might happen?

ELIEZER:  Ah, um...

STRANGER:  Yes, he does. He wouldn’t have considered it in exactly those words if I hadn’t put it that way—not just because it’s ridiculously specific, but because Eliezer Yudkowsky doesn’t think in terms like that in advance of encountering the actual fact. He would consider it a “specific fantasy” that was threatening to drain away his emotional energy. But if it did happen, he would afterward say that he had achieved an outcome such that around 10% of his probability mass “would have been” in outcomes like that one or better, though he would worry about being hindsight-biased.

PAT:  I think a reasonable probability for an outcome like that would be more like 0.1%, and even that is being extremely generous!

ELIEZER:  “Outside viewers” sure seem to tell me that a lot whenever I try to do anything interesting. I’m actually kind of surprised to hear you say that, though. I mean, my basic hypothesis for how the “outside view” thing operates is that it’s an expression of incredulity that can be leveled against any target by cherry-picking a reference class that predicts failure. One then builds an inescapable epistemic trap around that reference class by talking about the Dunning-Kruger effect and the dangers of inside-viewing. But trying to write Harry Potter fanfiction, even unusually good Harry Potter fanfiction, should sound to most people like it’s not high-status. I would expect people to react mainly to the part about the IMO gold medalist, even though the base rate for being an IMO gold medalist is higher than the base rate for authoring the most-reviewed Harry Potter fanfiction.

PAT:  Have you ever even tried to write Harry Potter fanfiction before? Do you know any of the standard awards that help publicize the best Harry Potter fan works or any of the standard sites that recommend them? Do you have any idea what the vast majority of the audience for Harry Potter fanfiction wants? I mean, just the fact that you’re publishing on FanFiction.Net is going to turn off a lot of people; the better stories tend to be hosted at ArchiveOfOurOwn.Org or on other, more specialized sites.

ELIEZER:  Oh. I see. You do know about the pre-existing online Harry Potter fanfiction community, and you’re involved in it. You actually have a pre-existing status hierarchy built up in your mind around Harry Potter fanfiction. So when the Masked Stranger talks about Methods becoming the most popular Harry Potter fanfiction ever, you really do hear that as an overreaching status-claim, and you do that thing that makes an arbitrary proposition sound very improbable using the “outside view.”

PAT:  I don’t think the outside view, or reference class forecasting, can make arbitrary events sound very improbable. I think it makes events that won’t actually happen sound very improbable. As for my prior acquaintance with the community—how is that supposed to devalue my opinions? I have domain expertise. I have some actual idea of how many thousands of authors, including some very good authors, are trying to write Harry Potter fanfiction, only one of whom can author the most-reviewed story. And I’ll ask again, did you bother to acquire any idea of how this community actually works? Can you name a single annual award that’s given out in the Harry Potter fanfiction community?

ELIEZER:  Um... not off the top of my head.

PAT:  Have you asked any of the existing top Harry Potter fanfiction authors to review your proposed plot, or your proposed story ideas? Like Nonjon, author of A Black Comedy? Or Sarah1281 or JBern or any of the other authors who have created multiple works widely acknowledged as excellent?

ELIEZER:  I must honestly confess, although I’ve read those authors and liked their stories, that thought never even crossed my mind as a possible action.

PAT:  So you haven’t consulted anyone who knows more about Harry Potter fandom than you do.

ELIEZER:  Nope.

PAT:  You have not written any prior Harry Potter fanfiction—not even a short story.

ELIEZER:  Correct.

PAT:  You have made no previous effort to engage with the existing community of people who read or write Harry Potter fanfiction, or learn about existing gatekeepers on which the success of your story will depend.

ELIEZER:  I’ve read some of the top previous Harry Potter fan works, since I enjoyed reading them. That, of course, is why the story idea popped into my head in the first place.

PAT:  What would you think of somebody who’d read a few popular physics books and wanted to be the world’s greatest physicist?

STRANGER:  (aside)  It appears to me that since the “outside view” as usually invoked is really about status hierarchy, signs of disrespecting the existing hierarchy will tend to provoke stronger reactions, and disrespectful-seeming claims that you can outperform some benchmark will be treated as much larger factors predicting failure than respectful-seeming claims that you can outperform an equivalent benchmark. It seems that physics crackpots feel relevantly analogous here because crackpots aren’t just epistemically misguided—that would be tragicomic, but it wouldn’t evoke the same feelings of contempt or disgust. What distinguishes physics crackpots is that they’re epistemically misguided in ways that disrespect high-status people on an important hierarchy—physicists. This feels like a relevant reference class for understanding other apparent examples of disrespectfully claiming to be high-status, because the evoked feeling is similar even if the phenomena differ in other ways.

ELIEZER:  If you want to be a great physicist, you have to find the true law of physics, which is already out there in the world and not known to you. This isn’t something you can realistically achieve without working alongside other physicists, because you need an extraordinarily specific key to fit into this extraordinarily specific lock. In contrast, there are many possible books that would succeed over all past Harry Potter fanfiction, and you don’t have to build a particle accelerator to figure out which one to write.

STRANGER:  I notice that when you try to estimate the difficulty of becoming the greatest physicist ever, Eliezer, you try to figure out the difficulty of the corresponding cognitive problem. It doesn’t seem to occur to you to focus on the fame.

PAT:  Eliezer, you seem to be deliberately missing the point of what’s wrong with reading a few physics books and then trying to become the world’s greatest physicist. Don’t you see that this error has the same structure as your Harry Potter pipe dream, even if the mistake’s magnitude is greater? That a critic would say the same sort of things to them as I am saying to you? Yes, becoming the world’s greatest physicist is much more difficult. But you’re trying to do this lesser impossible task in your off-hours because you think it will be easy.

ELIEZER:  In the success scenario the Masked Stranger described, I would invest more effort into later chapters because it would have proven to be worth it.

STRANGER:  Hey, Pat? Did you know that Eliezer hasn’t actually read the original Harry Potter books four through six, just watched the movies? And even after the book starts to take off, he still won’t get around to reading them.

PAT:  (incoherent noises)

ELIEZER:  Um... look, I read books one through three when they came out, and later I tried reading book four. The problem was, I’d already read so much Harry Potter fanfiction by then that I was used to thinking of the Potterverse as a place for grown-up stories, and this produced a state change in my brain, so when I tried to read Harry Potter and the Goblet of Fire it didn’t feel right. But I’ve read enough fanfiction based in the Potterverse that I know the universe very well. I can tell you the name of Fleur Delacour’s little sister. In fact, I’ve read an entire novel about Gabrielle Delacour. I just haven’t read all the original books.

STRANGER:  And when that’s not good enough, Eliezer consults the Harry Potter Wikia to learn relevant facts from canon. So you see he has all the knowledge he thinks he needs.

PAT:  (more incoherent noises)

ELIEZER:  ...why did you tell Pat that, Masked Stranger?

STRANGER:  Because Pat will think it’s a tremendously relevant fact for predicting your failure. This illustrates a critical life lesson about the difference between making obeisances toward a field by reading works to demonstrate social respect, and trying to gather key knowledge from a field so you can advance it. The latter is necessary for success; the former is primarily important insofar as public relations with gatekeepers is important. I think that people who aren’t status-blind have a harder time telling the difference.

PAT:  It’s true that I feel a certain sense of indignation—of, indeed, J. K. Rowling and the best existing Harry Potter fanfiction writers being actively disrespected—when you tell me that Eliezer hasn’t read all of the canon books and that he thinks he’ll make up for it by consulting a wiki.

ELIEZER:  Well, if I can try to repair some of the public relations damage: If I thought I could write children’s books as popular as J. K. Rowling’s originals, I would be doing that instead. J. K. Rowling is now a billionaire, plus she taught my little sister to enjoy reading. People who trivialize that as “writing children’s books” obviously have never tried to write anything themselves, let alone children’s books. Writing good children’s literature is hard—which is why Methods is going to be aimed at older readers. Contrary to the model you seem to be forming of me, I have a detailed model of my own limitations as well as my current capabilities, and I know that I am not currently a good enough author to write children’s books.

PAT:  I can imagine a state of affairs where I would estimate someone to have an excellent chance of writing the best Harry Potter fanfiction ever made, even after reading only the first three canon books—say, if Neil Gaiman tried it. (Though Neil Gaiman, I’m damned sure, just would read the original canon books.) Do you think you’re as good as Neil Gaiman?

ELIEZER:  I don’t expect to ever have enough time to invest in writing to become as good as Neil Gaiman.

PAT:  I’ve read your Three Worlds Collide, which I think is your best story, and I’m aware that it was mentioned favorably by a Hugo-award-winning author, Peter Watts. But I don’t think Three Worlds Collide is on the literary level of, say, the fanfiction Always and Always Part 1: Backwards With Purpose. So what feats of writing have you already performed that make you think your project has a 10% chance of becoming the most-reviewed Harry Potter fanfiction in existence?

ELIEZER:  What you’re currently doing is what I call “demanding to see my hero license.” Roughly, I’ve declared my intention to try to do something that’s in excess of what you think matches my current social standing, and you want me to show that I already have enough status to do it.

PAT:  Ad hominem; you haven’t answered my question. I don’t see how, on the knowledge you presently have and on the evidence already available, you can possibly justify giving yourself a 10% probability here. But let me make sure, first, that we’re using the same concepts. Is that “10%” supposed to be an actual well-calibrated probability?

ELIEZER:  Yes, it is. If I interrogate my mind about betting odds, I think I’d take your money at 20:1—like, if you offered me $20 against $1 that the fanfiction wouldn’t succeed—and I’d start feeling nervous about betting the other way at $4 against $1, where you’ll pay out $4 if the fanfiction succeeds in exchange for $1 if it doesn’t. Splitting the difference at somewhere near the geometric mean, we could call that 9:1 odds.
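
(A rough sketch of the arithmetic in that answer, assuming both brackets are read as odds against success. The geometric mean is the midpoint on a logarithmic scale. The function and variable names are illustrative and do not come from the dialogue.)

```python
import math


def odds_to_probability(odds_against: float) -> float:
    """Convert odds of N:1 against an event into the probability of the event."""
    return 1.0 / (odds_against + 1.0)


# The two brackets from the dialogue, both expressed as odds against success.
wide_bracket = 20.0    # comfortable taking the bet at 20:1
narrow_bracket = 4.0   # starts to feel nervous at 4:1

# Geometric mean: the midpoint of the two brackets on a log scale.
midpoint_odds = math.sqrt(wide_bracket * narrow_bracket)
success_probability = odds_to_probability(midpoint_odds)

print(f"midpoint odds against: {midpoint_odds:.2f}:1")            # 8.94:1
print(f"implied success probability: {success_probability:.3f}")  # 0.101
```

The midpoint lands slightly under 9:1, which rounds to the 9:1 odds, or roughly 10% probability, quoted above.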

PAT:  And do you think you’re well-calibrated? Like, things you assign 9:1 odds should happen 9 out of 10 times?

ELIEZER:  Yes, I think I could make 10 statements of this difficulty that I assign 90% probability, and be wrong on average about once. I haven’t tested my calibration as extensively as some people in the rationalist community, but the last time I took a CFAR calibration-testing sheet with 10 items on it and tried to put 90% credibility intervals on them, I got exactly one true value outside my interval. Achieving okay calibration, with a bit of study and a bit of practice, is not anywhere near as surprising as outside-view types make it out to be.

STRANGER:  (aside)  Eliezer-2010 doesn’t use PredictionBook as often as Gwern Branwen, doesn’t play calibration party games as often as Anna Salamon and Carl Shulman, and didn’t join Philip Tetlock’s study on superprediction. But I did make bets whenever I had the opportunity, and still do; and I try to set numeric odds whenever I feel uncertain and know I’ll find out the true value shortly.

I recently saw a cryptic set of statements on my refrigerator’s whiteboard about a “boiler” and various strange numbers and diagrams, which greatly confused me for five seconds before I hypothesized that they were notes about Brienne’s ongoing progress through the game Myst. Since I felt uncertain, but could find out the truth soon, I spent thirty seconds trying to tweak my exact probability estimate of these being notes for Brienne’s game. I started with a 90% “first pass” probability that they were Myst notes, which felt obviously overconfident, so I adjusted that down to 80% or 4:1. Then I thought about how there might be unforeseen other compact explanations for the cryptic words on the whiteboard and adjusted down to 3:1. I then asked Brienne, and learned that it was in fact about her Myst game. I then did a thirty-second “update meditation” on whether perhaps it wasn’t all that probable that there would be some other compact explanation for the cryptic writings; so maybe once the writings seemed explained away, I should have been less worried about unforeseen compact alternatives.

But I didn’t meditate on it too long, because it was just one sample out of my life, and the point of experiences like that is that you have a lot of them, and update a little each time, and eventually the experience accumulates. Meditating on it as much as I’m currently doing by writing about it here would not be good practice in general. (Those of you who have a basic acquaintance with neural networks and the delta rule should recognize what I’m trying to get my brain to do here.) I feel guilty about not betting more systematically, but given my limited supply of spoons, this kind of informal and opportunistic but regular practice is about all that I’m likely to actually do, as opposed to feel guilty about not doing.

As I do my editing pass on this document, I more recently assigned 5:1 odds against two characters on House of Cards having sex, who did in fact have sex; and that provides a bigger poke of adjustment against overconfidence. (According to the delta rule, this was a bigger error.)
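
(For readers who have not met the delta rule, here is a minimal sketch of the comparison between those two updates. It uses a simplified scalar form in which the adjustment is a learning rate times the gap between outcome and prediction. The learning rate of 0.1 and the function name are assumptions made for illustration, and nothing here claims to model what a brain actually does.)

```python
def delta_rule_adjustment(predicted: float, outcome: float,
                          learning_rate: float = 0.1) -> float:
    """Scalar delta-rule step: proportional to the prediction error."""
    return learning_rate * (outcome - predicted)


# Myst notes: final estimate was 3:1 in favor, i.e. probability 3/4, and it was true.
myst_step = delta_rule_adjustment(predicted=3 / 4, outcome=1.0)

# House of Cards: 5:1 against, i.e. probability 1/6, and it happened anyway.
house_of_cards_step = delta_rule_adjustment(predicted=1 / 6, outcome=1.0)

print(f"Myst adjustment:           {myst_step:.4f}")
print(f"House of Cards adjustment: {house_of_cards_step:.4f}")
```

Under these assumptions the second error (about 0.83) is more than three times the first (0.25), so it produces a proportionally larger correction.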

PAT:  But there are studies showing that even after being warned about overconfidence, reading a study about overconfidence, and being allowed to practice a bit, overconfidence is reduced but not eliminated—right?

ELIEZER:  On average across all subjects, overconfidence is reduced but not eliminated. That doesn’t mean that in every individual subject, overconfidence is reduced but not eliminated.

PAT:  What makes you think you can do better than average?

STRANGER:  ...

ELIEZER:  What makes me think I could do better than average is that I practiced much more than those subjects, and I don’t think the level of effort put in by the average subject, even a subject who’s warned about overconfidence and given one practice session, is the limit of human possibility. And what makes me think I actually succeeded is that I checked. It’s not like there’s this “reference class” full of overconfident people who hallucinate practicing their calibration and hallucinate discovering that their credibility intervals have started being well-calibrated.

STRANGER:  I offer some relevant information that I learned from Sarah Constantin’s “Do Rational People Exist?”: Stanovich and West (1997) found that 88% of study participants were systematically overconfident, which means that they couldn’t demonstrate overconfidence for the remaining 12%. And this isn't too surprising; Stanovich and West (1998) note a number of other tests where around 10% of undergraduates fail to exhibit this or that bias.

ELIEZER:  Right. So the question is whether I can, with some practice, make myself as non-overconfident as the top 10% of college undergrads. This… does not strike me as a particularly harrowing challenge. It does require effort. I have to consciously work to expand my credibility intervals past my first thought, and I expect that college students who outperform have to do the same. The potential to do better buys little of itself; you have to actually put in the effort. But when I think I’ve expanded my intervals enough, I stop.

 

ii. Success factors and belief sharing

PAT:  So you actually think that you’re well-calibrated in assigning 9:1 odds for Methods failing versus succeeding, to the extreme levels assigned by the Masked Stranger. Are you going to argue that I ought to widen my confidence intervals for how much success Harry Potter and the Methods of Rationality might enjoy, in order to avoid being overconfident myself?

ELIEZER:  No. That feels equivalent to arguing that you shouldn’t assign a 0.1% probability to Methods succeeding because 1,000:1 odds are too extreme. I was careful not to put it that way, because that isn’t a valid argument form. That’s the kind of thinking which leads to papers like Ord, Hillerbrand, and Sandberg’s “Probing the Improbable,” which I think are wrong. In general, if there are 500,000 fan works, only one of which can have the most reviews, then you can’t pick out one of them at random and say that 500,000:1 is too extreme.

PAT:  I’m glad you agree with this obvious point. And I'm not stupid; I recognize that your stories are better than average. 90% of Harry Potter fanfiction is crap by Sturgeon’s Law, and 90% of the remaining 10% is going to be uninspired. That leaves maybe 5,000 fan works that you do need to seriously compete with. And I’ll even say that if you’re trying reasonably hard, you can end up in the top 10% of that pool. That leaves a 1-in-500 chance of your being the best Harry Potter author on fanfiction.net. We then need to factor in the other Harry Potter fanfiction sites, which have fewer works but much higher average quality. Let’s say it works out to a 1-in-1,000 chance of yours being the best story ever, which I think is actually very generous of me, given that in a lot of ways you seem ridiculously unprepared for the task—um, are you all right, Masked Stranger?

STRANGER:  Excuse me, please. I’m just distracted by the thought of a world where I could go on fanfiction.net and find 1,000 other stories as good as Harry Potter and the Methods of Rationality. I’m thinking of that world and trying not to cry. It’s not that I can’t imagine a world in which your modest-sounding Fermi estimate works correctly—it’s just that the world you’re describing looks so very different from this one.

ELIEZER:  Pat, I can see where you’re coming from, and I’m honestly not sure what I can say to you about it, in advance of being able to show you the book.

PAT:  What about what I tried to say to you? Does it influence you at all? The method I used was rough, but I thought it was a very reasonable approach to getting a Fermi estimate, and if you disagree with the conclusion, I would like to know what further factors make your own Fermi estimate work out to 10%.

STRANGER:  You underestimate the gap between how you two think. It wouldn’t occur to Eliezer to even consider any one of the factors you named, while he was making his probability estimate of 10%.

ELIEZER:  I have to admit that that’s true.

PAT:  Then what do you think are the most important factors in whether you’ll succeed?

ELIEZER:  Hm. Good question. I’d say... whether I can maintain my writing enthusiasm, whether I can write fast enough, whether I can produce a story that’s really as good as I seem to be envisioning, whether I’ll learn as I go and do better than I currently envision. Plus a large amount of uncertainty in how people will actually react to the work I have in my head if I can actually write it.

PAT:  Okay, so that’s five key factors. Let’s estimate probabilities for each one. Suppose we grant that there’s an 80% chance of your maintaining enthusiasm, a 50% chance that you’ll write fast enough—though you’ve had trouble with that before; it took you fully a year to produce Three Worlds Collide, if I recall correctly. A 25% probability that you can successfully write down this incredible story that seems to be in your mind—I think this part almost always fails for authors, and is almost certainly the part that will fail for you, but we’ll give it a one-quarter probability anyway, to be generous and steelman the whole argument. Then a 50% probability that you’ll learn fast enough to not be torpedoed by the deficits you already know you have. Now even without saying anything about audience reactions (really, you’re going to try to market cognitive science and formal epistemology to Harry Potter fans?), and even though I’m being very generous here, multiplying these probabilities together already gets us to the 5% level, which is less than the 10% you estimated—

STRANGER:  Wrong.

PAT:  … Wrong? What do you mean?

STRANGER:  Let’s consider the factors that might be involved in your above reasoning not being wrong. Let us first estimate the probability that any given English-language sentence will turn out to be true. Then, we have to consider the probability that a given argument supporting some conclusion will turn out to be free of fatal biases, the probability that someone who calls an argument “wrong” will be mistaken—

PAT:  Eliezer, if you disagree with my conclusions, then what’s wrong with my probabilities?

ELIEZER:  Well, for a start: Whether I can maintain my writing speed is not conditionally independent of whether I maintain my enthusiasm. The audience reaction is not conditionally independent of whether I maintain my writing speed. Whether I’m learning things is not conditionally independent of whether I maintain my enthusiasm. Your attempt to multiply all those numbers together was gibberish as probability theory.

PAT:  Okay, let’s ask about the probability that you maintain writing speed, given that you maintain enthusiasm—

ELIEZER:  Do you think that your numbers would have actually been that different, if that had been the question you’d initially asked? I’m pretty sure that if you’d thought to phrase the question as “the probability given that...” and hadn’t first done it the other way, you would have elicited exactly the same probabilities from yourself, driven by the same balance of mental forces—picking something low that sounds reasonable, or something like that. And the problem of conditional dependence is far from the only reason I think “estimate these probabilities, which I shall multiply together” is just a rhetorical trick.

PAT:  A rhetorical trick?

ELIEZER:  By picking the right set of factors to “elicit,” someone can easily make people’s “answers” come out as low as desired. As an example, see van Boven and Epley’s “The Unpacking Effect in Evaluative Judgments.” The problem here is that people... how can I compactly phrase this... people tend to assign median-tending probabilities to any category you ask them about, so you can very strongly manipulate their probability distributions by picking the categories for which you “elicit” probabilities. Like, if you ask car mechanics about the possible causes of a car not starting—experienced car mechanics, who see the real frequencies on a daily basis!—and you ask them to assign a probability to “electrical system failures” versus asking separately for “dead battery,” “alternator problems,” and “spark plugs,” the unpacked categories get collectively assigned much greater total probability than the packed category.

PAT:  But perhaps, when I’m unpacking things that can potentially go wrong, I’m just compensating for the planning fallacy and how people usually aren’t pessimistic enough—

ELIEZER:  Above all, the problem with your reasoning is that the stated outcome does not need to be a perfect conjunction of those factors. Not everything on your list has to go right simultaneously for the whole process to work. You have omitted other disjunctive pathways to the same end. In your universe, nobody ever tries harder or repairs something after it goes wrong! I have never yet seen an informal conjunctive breakdown of an allegedly low probability in which the final conclusion actually required every one of the premises. That’s why I’m always careful to avoid the “I shall helpfully break down this proposition into a big conjunction and ask you to assign each term a probability” trick.

Its only real use, at least in my experience, is that it’s a way to get people to feel like they’ve “assigned” probabilities while you manipulate the setup to make the conclusion have whatever probability you like—it doesn’t have any role to play in honest conversation. Out of all the times I’ve seen it used, to support conclusions I endorse as well as ones I reject, I’ve never once seen it actually work as a way to better discover truth. I think it’s bad epistemology that sticks around because it sounds sort of reasonable if you don’t look too closely.

PAT:  I was working with the factors you picked out as critical. Which specific parts of my estimate do you disagree with?

STRANGER:  (aside)  The multiple-stage fallacy is an amazing trick, by the way. You can ask people to think of key factors themselves and still manipulate them really easily into giving answers that imply a low final answer, because so long as people go on listing things and assigning them probabilities, the product is bound to keep getting lower. Once we realize that by continually multiplying out probabilities the product keeps getting lower, we have to apply some compensating factor internally so as to go on discriminating truth from falsehood.

You have effectively decided on the answer to most real-world questions as “no, a priori” by the time you get up to four factors, let alone ten. It may be wise to list out many possible failure scenarios and decide in advance how to handle them—that’s Murphyjitsu—but if you start assigning “the probability that X will go wrong and not be handled, conditional on everything previous on the list having not gone wrong or having been successfully handled,” then you’d better be willing to assign each step a conditional probability of going right that is near 1 for the kinds of projects that succeed sometimes—projects like Methods. Otherwise you’re ruling out their success a priori, and the “elicitation” process is a sham.

Frankly, I don’t think the underlying methodology is worth repairing. I don’t think it’s worth bothering to try to make a compensating adjustment toward higher probabilities. We just shouldn’t try to do “conjunctive breakdowns” of a success probability where we make up lots and lots of failure factors that all get informal probability assignments. I don’t think you can get good estimates that way even if you try to compensate for the predictable bias.
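
(A small sketch of why the product keeps shrinking. It multiplies the stages as if they were independent, which is exactly the move criticized above, so it illustrates the arithmetic rather than endorsing the method. The 0.8 per-stage figure is a hypothetical "reasonable-sounding" number, and the helper names are made up for this example.)

```python
import math

# Pat's four elicited factors from earlier in the conversation.
pat_factors = [0.80, 0.50, 0.25, 0.50]
pat_product = math.prod(pat_factors)
print(f"Pat's product: {pat_product:.3f}")  # 0.050


def product_of_equal_stages(per_stage: float, stages: int) -> float:
    """Overall probability if every stage gets the same probability."""
    return per_stage ** stages


def per_stage_needed(target: float, stages: int) -> float:
    """Per-stage probability required for the product to equal `target`."""
    return target ** (1.0 / stages)


for stages in (4, 10):
    shrunk = product_of_equal_stages(0.8, stages)
    needed = per_stage_needed(0.10, stages)
    print(f"{stages} stages at 0.8 each -> {shrunk:.3f}; "
          f"to keep 10% overall, each stage needs {needed:.3f}")
```

With ten stages, even a generous-sounding 0.8 per stage leaves only about 11% overall, and keeping a 10% total requires roughly 0.79 at every stage.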

ELIEZER:  I did list my own key factors, and I do feel doubt about whether they’ll work out. If I were really confident in them, I’d be assigning a higher probability than 10%. But besides having conditional dependencies, my factors also have disjunctive as well as conjunctive character; they don’t all need to go right and stay right simultaneously. I could get far enough into Methods to acquire an audience, suddenly lose my writing speed, and Methods could still end up ultimately having a large impact.

PAT:  So how do you manipulate those factors to arrive at an estimate of 10% probability of extreme success?

ELIEZER:  I don’t. That’s not how I got my estimate. I found two brackets, 20:1 and 4:1, that I couldn’t nudge further without feeling nervous about being overconfident in one direction or the other. In other words, the same way I generated my set of ten credibility intervals for CFAR’s calibration test. Then I picked something in the logarithmic middle.

PAT:  So you didn’t even try to list out all the factors and then multiply them together?

ELIEZER:  No.

PAT:  Then where the heck does your 10% figure ultimately come from? Saying that you got two other cryptic numbers, 20:1 and 4:1, and picked something in the geometric middle, doesn’t really answer the fundamental question.

STRANGER:  I believe the technical term for the methodology is “pulling numbers out of your ass.” It’s important to practice calibrating your ass numbers on cases where you’ll learn the correct answer shortly afterward. It’s also important that you learn the limits of ass numbers, and don’t make unrealistic demands on them by assigning multiple ass numbers to complicated conditional events.

ELIEZER:  I’d say I reached the estimate… by thinking about the object-level problem? By using my domain knowledge? By having already thought a lot about the problem so as to load many relevant aspects into my mind, then consulting my mind’s native-format probability judgment—with some prior practice at betting having already taught me a little about how to translate those native representations of uncertainty into 9:1 betting odds. I’m not sure what additional information you want here. If there’s a way to produce genuinely, demonstrably superior judgments using some kind of break-it-down procedure, I haven’t read about it in the literature and I haven’t practiced using it yet. If you show me that you can produce 9-out-of-10 correct 90% credible intervals, and your intervals are narrower than my intervals, and you got them using a break-it-down procedure, I’m happy to hear about it.

PAT:  So basically your 10% probability comes from inaccessible intuition.

ELIEZER:  In this case? Yeah, pretty much. There’s just too little I can say to you about why Methods might work, in advance of being able to show you what I have in mind.

PAT:  If the reasoning inside your head is valid, why can’t it be explained to me?

ELIEZER:  Because I have private information, frankly. I know the book I’m trying to create.

PAT:  Eliezer, I think one of the key insights you’re ignoring here is that it should be a clue to you that you think you have incommunicable reasons for believing your Methods of Rationality project can succeed. Isn’t being unable to convince other people of their prospects of success just the sort of experience that crackpots have when they set out to invent bad physics theories? Isn’t this incommunicable intuition just the sort of justification that they would try to give?

ELIEZER:  But the method you’re using—the method you’re calling “reference class forecasting”—is too demanding to actually detect whether someone will end up writing the world’s most reviewed Harry Potter fanfiction, whether that’s me or someone else. The fact that a modest critic can’t be persuaded isn’t Bayesian discrimination between things that will succeed and things that will fail; it isn’t evidence.

PAT:  On the contrary, I would think it very reasonable if Nonjon told me that he intended to write the most-reviewed Harry Potter fanfiction. Nonjon’s A Black Comedy is widely acknowledged as one of the best stories in the genre, Nonjon is well-placed in influential reviewing and recommending communities—Nonjon might not be certain to write the most reviewed story ever, but he has legitimate cause to think that he is one of the top contenders for writing it.

STRANGER:  It's interesting how your estimates of success probabilities can be well summarized by a single quantity that correlates very well with how respectable a person is within a subcommunity.

PAT:  Additionally, even if my demands were unsatisfiable, that wouldn’t necessarily imply a hole in my reasoning. Nobody who buys a lottery ticket can possibly satisfy me that they have good reason to believe they’ll win, even the person who does win. But that doesn’t mean I’m wrong in assigning a low success probability to people who buy lottery tickets.

Nonjon may legitimately have a 1-in-10 lottery ticket. Neil Gaiman might have 2-in-3. Yours, as I’ve said, is probably more like 1-in-1,000, and it’s only that high owing to your having already demonstrated some good writing abilities. I’m not even penalizing you for the fact that your plan of offering explicitly rational characters to the Harry Potter fandom sounds very unlike existing top stories. I might be unduly influenced by the fact that I like your previous writing. But your claim to have incommunicable advance knowledge that your lottery ticket will do better than this by a factor of 100 seems very suspicious to me. Valid evidence should be communicable between people.

STRANGER:  “I believe myself to be writing a book on economic theory which will largely revolutionize—not I suppose, at once but in the course of the next ten years—the way the world thinks about its economic problems. I can’t expect you, or anyone else, to believe this at the present stage. But for myself I don’t merely hope what I say,—in my own mind, I’m quite sure.” Lottery winner John Maynard Keynes to George Bernard Shaw, while writing The General Theory of Employment, Interest and Money.

ELIEZER:  Come to think of it, if I do succeed with Methods, Pat, you yourself could end up in an incommunicable epistemic state relative to someone who only heard about me later through my story. Someone like that might suspect that I'm not a purely random lottery ticket winner, but they won't have as much evidence to that effect as you. It's a pretty interesting and fundamental epistemological issue.

PAT:  I disagree. If you have valid introspective evidence, then talk to me about your state of mind. On my view, you shouldn’t end up in a situation where you update differently on what your evidence “feels like to you” than what your evidence “sounds like to other people”; both you and other people should just do the second update.

STRANGER:  No, in this scenario, in the presence of other suspected biases, two human beings really can end up in incommunicable epistemic states. You would know that “Eliezer wins” had genuinely been singled out in advance as a distinguished outcome, but the second person would have to assess this supposedly distinguished outcome with the benefit of hindsight, and they may legitimately never trust their hindsight enough to end up in the same mental state as you.

You're right, Pat, that completely unbiased agents who lack truly foundational disagreements on priors should never end up in this situation. But humans can end up in it very easily, it seems to me. Advance predictions have special authority in science for a reason: hindsight bias makes it hard to ever reach the same confidence in a prediction that you only hear about after the fact.

PAT:  Are you really suggesting that the prevalence of cognitive bias means you should be more confident that your own reasoning is correct? My epistemology seems to be much more straightforward than yours on these matters. Applying the “valid evidence should be communicable” rule to this case: A hypothetical person who saw Eliezer Yudkowsky write the Less Wrong Sequences, heard him mention that he assigned a non-tiny probability to succeeding in his Methods ambitions, and then saw him succeed at Methods should just realize what an external observer would say to them about that. And what they’d say is: you just happened to be the lucky or unlucky relatives of a lottery ticket buyer who claimed in advance to have psychic powers, and then happened to win.

ELIEZER:  This sounds a lot like a difficulty I once sketched out for the “method of imaginary updates.” Human beings aren’t logically omniscient, so we can’t be sure we’ve reasoned correctly about prior odds. In advance of seeing Methods succeed, I can see why you’d say that, on your worldview, if it did happen then it would just be a 1000:1 lottery ticket winning. But if that actually happened, then instead of saying, “Oh my gosh, a 1000:1 event just occurred,” you ought to consider instead that the method you used to assign prior probabilities was flawed. This is not true about a lottery ticket, because we’re extremely sure about how to assign prior probabilities in that case—and by the same token, in real life neither of us will actually see our friends winning the lottery.

PAT:  I agree that if it actually happens, I would reconsider your previous arguments rather than insisting that I was correct about prior odds. I’m happy to concede this point because I am very, very confident that it won’t actually happen. The argument against your success in Harry Potter fanfiction seems to me about as strong as any argument the outside-view perspective might make.

STRANGER:  Oh, we aren’t disputing that.

PAT:  You aren’t?

STRANGER:  That’s the whole point, from my perspective. If modest epistemology sounds persuasive to you, then it’s trivial to invent a crushing argument against any project that involves doing something important that hasn’t been done in the past. Any project that’s trying to exceed any variety of civilizational inadequacy is going to be ruled out.

PAT:  Look. You cannot just waltz into a field and become its leading figure on your first try. Modest epistemology is just right about that. You are not supposed to be able to succeed when the odds against you are like those I have described. Maybe out of a million contenders, someone will succeed by luck when the modest would have predicted their failure, but if we’re batting 999,999 out of 1,000,000 I say we’re doing pretty well. Unless, of course, Eliezer would claim that the project of writing this new Harry Potter fanfiction is so important that a 0.0001% chance of success is still worth it—

ELIEZER:  I never say that. Ever. If I ever say that you can just shoot me.

PAT:  Then why are you not responding to the very clear, very standard, very obvious reasons I have laid out to think that you cannot do this? I mean, seriously, what is going through your head right now?

ELIEZER:  A helpless feeling of being unable to communicate.

STRANGER:  Grim amusement.

PAT:  Then I’m sorry, Mr. Eliezer Yudkowsky, but it seems to me that you are being irrational. You aren’t even trying to hide it very hard.

ELIEZER:  (sighing)  I can imagine why it would look that way to you. I know how to communicate some of the thought patterns and styles that I think have served me well, that I think generate good predictions and policies. The other patterns leave me with this helpless feeling of knowing but being unable to speak. This conversation has entered a dependency on the part that I know but don’t know how to say.

PAT:  Why should I believe that?

ELIEZER:  If you think the part I did figure out how to say was impressive enough. That was hidden purpose #7 of the Less Wrong Sequences—to provide an earnest-token of all the techniques I couldn’t show. All I can tell you is that everything you’re so busy worrying about is not the correct thing for me to be thinking about. That your entire approach to the problem is wrong. It is not just that your arguments are wrong. It is that they are about the wrong subject matter.

PAT:  Then what’s the right subject matter?

ELIEZER:  That’s what I’m having trouble saying. I can say that you ought to discard all thoughts from your mind about competing with others. The others who’ve come before you are like probes, flashes of sound, pingbacks that give you an incomplete sonar of your problem’s difficulty. Sometimes you can swim past the parts of the problem that tangled up other people and enter a new part of the ocean. Which doesn’t actually mean you’ll succeed; all it means is that you’ll have very little information about which parts are difficult. There often isn’t actually any need to think at all about the intrinsic qualities of your competition—like how smart or motivated or well-paid they are—because their work is laid out in front of you and you can just look at the quality of the work.

PAT:  Like somebody who predicts hyperinflation, saying all the while that they’re free to disregard conventional economists because of how those idiot economists think you can triple the money supply without getting inflation?

ELIEZER:  I don’t really know what goes through someone else’s mind when that happens to them. But I don’t think that telling them to be more modest is a fix. Telling somebody to shut up and respect academics is not a generally valid line of argumentation because it doesn’t distinguish mainstream economics (which has relatively high scholarly standards) from mainstream nutrition science (which has relatively low scholarly standards). I’m not sure there is any robust way out except by understanding economics for yourself, and to the extent that’s true, I ought to advise our hypothetical ill-informed contrarian to read a lot of economics blogs and try to follow the arguments, or better yet read an economics textbook. I don’t think that people sitting around and anxiously questioning themselves and wondering whether they’re too audacious is a route out of that particular hole—let alone the hole on the other side of the fence.

PAT:  So your meta-level epistemology is to remain as ultimately inaccessible to me as your object-level estimates.

ELIEZER:  I can understand why you’re skeptical.

PAT:  I somehow doubt that you could pass an Ideological Turing Test on my point of view.

STRANGER:  (smiling)  Oh, I think I’d do pretty well at your ITT.

ELIEZER:  Pat, I understand where your estimates are coming from, and I’m sure that your advice is truly meant to be helpful to me. But I also see that advice as an expression of a kind of anxiety which is not at all like the things I need to actually think about in order to produce good fiction. It’s a wasted motion, a thought which predictably will not have helped in retrospect if I succeed. How good I am relative to other people is just not something I should spend lots of time obsessing about in order to make Methods be what I want it to be. So my thoughts just don’t go there.

PAT:  This notion, “that thought will predictably not have helped in retrospect if I succeed,” seems very strange to me. It helps precisely because we can avoid wasting our effort on projects which are unlikely to succeed.

STRANGER:  Sounds very reasonable. All I can say in response is: try doing it my way for a day, and see what happens. No thoughts that predictably won’t have been helpful in retrospect, in the case that you succeed at whatever you’re currently trying to do. You might learn something from the experience.

ELIEZER:  The thing is, Pat... even answering your objections and defending myself from your variety of criticism trains what look to me like unhealthy habits of thought. You’re relentlessly focused on me and my psychology, and if I engage with your arguments and try to defend myself, I have to focus on myself instead of my book. Which gives me that much less attention to spend on sketching out what Professor Quirrell will do in his first Defense lesson. Worse, I have to defend my decisions, which can make them harder to change later.

STRANGER:  Consider how much more difficult it will be for Eliezer to swerve and drop his other project, The Art of Rationality, if it fails after he has a number of (real or internal) conversations like this—conversations where he has to defend all the reasons why it's okay for him to think that he might write a nonfiction bestseller about rationality. This is why it’s important to be able to casually invoke civilizational inadequacy. It’s important that people be allowed to try ambitious things without feeling like they need to make a great production out of defending their hero license.

ELIEZER:  Right. And... the mental motions involved in worrying what a critic might think and trying to come up with defenses or concessions are different from the mental motions involved in being curious about some question, trying to learn the answer, and coming up with tests; and it’s different from how I think when I’m working on a problem in the world. The thing I should be thinking about is just the work itself.

PAT:  If you were just trying to write okay Harry Potter fanfiction for fun, I might agree with you. But you say you can produce the best fanfiction. That’s a whole different ball game—

ELIEZER:  No! The perspective I’m trying to show you, the way it works in the inside of my head, is that trying to write good fanfiction, and the best fanfiction, are not different ball games. There’s an object level, and you try to optimize it. You have an estimate of how well you can optimize it. That’s all there ever is.

 

iii. Social heuristics and problem importance, tractability, and neglectedness

PAT:  A funny thought has just occurred to me. That thing where you’re trying to work out the theory of Friendly AI—

ELIEZER:  Let me guess. You don’t think I can do that either.

PAT:  Well, I don’t think you can save the world, of course!  (laughs)  This isn’t a science fiction book. But I do see how you can reasonably hope to make an important contribution to the theory of Friendly AI that ends up being useful to whatever group ends up developing general AI. What’s interesting to note here is that the scenario the Masked Stranger described, the class of successes you assigned 10% aggregate probability, is actually harder to achieve than that.

STRANGER:  (smiling)  It really, really, really isn’t.

I'll mention as an aside that talk of “Friendly” AI has been going out of style where I’m from. We’ve started talking instead in terms of “aligning smarter-than-human AI with operators’ goals,” mostly because “AI alignment” smacks less of anthropomorphism than “friendliness.”

ELIEZER:  Alignment? Okay, I can work with that. But Pat, you’ve said something I didn’t expect you to say and gone outside my current vision of your Ideological Turing Test. Please continue.

PAT:  Okay. Contrary to what you think, my words are not fully general counterarguments that I launch against just anything I intuitively dislike. They are based on specific, visible, third-party-assessable factors that make assertions believable or unbelievable. If we leave aside inaccessible intuitions and just look at third-party-visible factors, then it is very clear that there’s a huge community of writers who are explicitly trying to create Harry Potter fanfiction. This community is far larger and has far more activity—by every objective, third-party metric—than the community working on issues related to alignment or friendliness or whatever. Being the best writer in a much larger community is much more improbable than your making a significant contribution to AI alignment when almost nobody else is working on that problem.

ELIEZER:  The relative size of existing communities that you’ve just described is not a fact that I regard as important for assessing the relative difficulty of “making a key contribution to AI alignment” versus “getting Methods to the level described by the Masked Stranger.” The number of competing fanfiction authors would be informative to me if I hadn’t already checked out the Harry Potter fan works with the best reputations. If I can see how strong the competition is with my own eyes, then that screens off information about the size of the community from my perspective.

PAT:  But surely the size of the community should give you some pause regarding whether you should trust your felt intuition that you could write something better than the product of so many other authors.

STRANGER:  See, that meta-reasoning right there? That’s the part I think is going to completely compromise how people think about the world if they try to reason that way.

ELIEZER:  Would you ask a juggler, in the middle of juggling, to suddenly start worrying about whether she’s in a reference class of people who merely think that they’re good at catching balls? It’s all just... wasted motion.

STRANGER:  Social anxiety and overactive scrupulosity.

ELIEZER:  Not what brains look like when they’re thinking productively.

PAT:  You’ve been claiming that the outside view is a fully general counterargument against any claim that someone with relatively low status will do anything important. I’m explaining to you why the method of trusting externally visible metrics and things that third parties can be convinced of says that you might make important contributions to AI alignment where nobody else is trying, but that you won’t write the most reviewed Harry Potter fanfiction where thousands of other authors are competing with you.

 

(A WANDERING BYSTANDER suddenly steps up to the group, interjecting.)

 

BYSTANDER:  Okay, no. I just can't hold my tongue anymore.

PAT:  Huh? Who are you?

BYSTANDER:  I am the true voice of modesty and the outside view!

I’ve been overhearing your conversation, and I’ve got to say—there’s no way it’s easier to make an important contribution to AI alignment than it is to write popular fanfiction.

ELIEZER:  … That’s true enough, but who…?

BYSTANDER:  The name’s Maude Stevens.

PAT:  Well, it's nice to make your acquaintance, Maude. I am always eager to hear about my mistakes, even from people with suspiciously relevant background information who randomly walk up to me in parks. What is my error on this occasion?

MAUDE:  All three of you have been taking for granted that if people don’t talk about “alignment” or “friendliness,” then their work isn’t relevant. But those are just words. When we take into account machine ethicists working on real-world trolley dilemmas, economists working on technological unemployment, computer scientists working on Asimovian agents, and so on, the field of competitors all trying to make progress on these issues becomes much, much larger.

PAT:  What? Is that true, Eliezer?

ELIEZER:  Not to my knowledge—unless Maude is here from the NSA to tell me about some very interesting behind-closed-doors research. The examples Maude listed aren't addressing the technical issues I've been calling “friendliness.” Progress on those problems doesn’t help you with specifying preferences that you can reasonably expect to produce good outcomes even when the system is smarter than you and searching a much wider space of strategies than you can consider or check yourself. Or designing systems that are stable under self-modification, so that good properties of a seed AI are preserved as the agent gets smarter.

MAUDE:  And your claim is that no one else in the world is smart enough to notice any of this?

ELIEZER:  No, that's not what I'm saying. Concerns like “how do we specify correct goals for par-human AI?” and “what happens when AI gets smart enough to automate AI research itself?” have been around for a long time, sort of just hanging out and not visibly shifting research priorities. So it's not that the community of people who have ever thought about superintelligence is small; and it's not that there are no ongoing lines of work on robustness, transparency, or security in narrow AI systems that will incidentally make it easier to align smarter-than-human AI. But the community of people who go into work every day and make decisions about what technical problems to tackle based on any extended thinking related to superintelligent AI is very small.

MAUDE:  What I’m saying is that you’re jumping ahead and trying to solve the far end of the problem before the field is ready to focus efforts there. The current work may not all bear directly on superintelligence, but we should expect all the significant progress on AI alignment to be produced by the intellectual heirs of the people presently working on topics like drone warfare and unemployment.

PAT:  (cautiously)  I mean, if what Eliezer says is true—and I do think that Eliezer is honest, if often, by my standards, slightly crazy—then the state of the field in 2010 is just like it looks naively. There aren’t many people working on topics related to smarter-than-human AI, and Eliezer’s group and the Oxford Future of Humanity Institute are the only ones with a reasonable claim to be working on AI alignment. If Eliezer says that the problems of crafting a smarter-than-human AI to not kill everyone are not of a type with current machine ethics work, then I can buy that as plausible, though I’d want to hear others’ views on the issue before reaching a firm conclusion.

MAUDE:  But Eliezer’s field of competition is far wider than just the people writing ethics papers. Anyone working in machine learning, or indeed in any branch of computer science, might end up contributing to AI alignment.

ELIEZER:  Um, that would certainly be great news to hear. The win state here is just “the problem gets solved”—

PAT:  Wait a second. I think you’re leaving the realm of what’s third-party objectively verifiable, Maude. That’s like saying that Eliezer has to compete with Stephen King because Stephen King could in principle decide to start writing Harry Potter fanfiction. If all these other people in AI are not working on the particular problems Eliezer is working on, whereas the broad community of Harry Potter fanfiction writers is competing directly with Eliezer on fiction-writing, then any reasonable third party should agree that the outside view counterargument applies very strongly to the second case, and much more weakly (if at all) to the first.

MAUDE:  So now fanfiction is supposed to be harder than saving the world? Seriously? Just no.

ELIEZER:  Pat, while I disagree with Maude’s arguments, she does have the advantage of rationalizing a true conclusion rather than a false conclusion. AI alignment is harder.

PAT:  I’m not expecting you to solve the whole thing. But making a significant contribution to a sufficiently specialized corner of academia that very few other people are explicitly working on should be easier than becoming the single most successful figure in a field that lots of other people are working in.

MAUDE:  This is ridiculous. Fanfiction writers are simply not the same kind of competition as machine learning experts and professors at leading universities, any of whom could end up making far more impressive contributions to the cutting edge in AGI research.

ELIEZER:  Um, advancing AGI research might be impressive, but unless it's AGI alignment it's—

PAT:  Have you ever tried to write fiction yourself? Try it. You’ll find it’s a heck of a lot harder than you seem to imagine. Being good at math does not qualify you to waltz in and—

 

(The Masked Stranger raises his hand and snaps his fingers. All time stops. Then the Masked Stranger looks over at Eliezer-2010 expectantly.)

 

ELIEZER:  Um... Masked Stranger... do you have any idea what’s going on here?

STRANGER:  Yes.

ELIEZER:  Thank you for that concise and informative reply. Would you please explain what’s going on here?

STRANGER:  Pat is thoroughly acquainted with the status hierarchy of the established community of Harry Potter fanfiction authors, which has its own rituals, prizes, politics, and so on. But Pat, for the sake of literary hypothesis, lacks an instinctive sense that it’s audacious to try to contribute work to AI alignment. If we interrogated Pat, we’d probably find that Pat believes that alignment is cool but not astronomically important, or that there are many other existential risks of equal stature. If Pat believed that long-term civilizational outcomes depended mostly on solving the alignment problem, as you do, then he would probably assign the problem more instinctive prestige—holding constant everything Pat knows about the object-level problem and how many people are working on it, but raising the problem’s felt status.

Maude, meanwhile, is the reverse: not acquainted with the political minutiae and status dynamics of Harry Potter fans, but very sensitive to the importance of the alignment problem. So to Maude, it’s intuitively obvious that making technical progress on AI alignment requires a much more impressive hero license than writing the world’s leading Harry Potter fanfiction. Pat doesn’t see it that way.

ELIEZER:  But ideas in AI alignment have to be formalized; and the formalism needs to satisfy many different requirements simultaneously, without much room for error. It’s a very abstract, very highly constrained task because it has to put an informal problem into the right formal structure. When writing fiction, yes, I have to juggle things like plot and character and tension and humor, but that’s all still a much less constrained cognitive problem—

STRANGER:  That kind of consideration isn’t likely to enter Pat or Maude’s minds.

ELIEZER:  Does it matter that I intend to put far more effort into my research than into fiction-writing? If Methods doesn’t work the first time, I’ll just give up.

STRANGER:  Sorry. Whether or not you’re allowed to do high-status things can’t depend on how much effort you say you intend to put in. Because “anyone could say that.” And then you couldn’t slap down pretenders—which is terrible.

ELIEZER:  …… Is there some kind of organizing principle that makes all of this make sense?

STRANGER:  I think the key concepts you need are civilizational inadequacy and status hierarchy maintenance.

ELIEZER:  Enlighten me.

STRANGER:  You know how Pat ended up calculating that there ought to be 1,000 works of Harry Potter fanfiction as good as Methods? And you know how I got all weepy visualizing that world? Imagine Maude as making a similar mistake. There’s a world in which some scruffy outsider like you wouldn’t be able to estimate a significant chance of making a major contribution to AI alignment, let alone help found the field, because people had been trying to do serious technical work on it since the 1960s, and were putting substantial thought, ingenuity, and care into making sure they were working on the right problems and using solid methodologies. Functional decision theory was developed in 1971, two years after Robert Nozick’s publication of “Newcomb’s Problem and Two Principles of Choice.” Everyone expects humane values to have high Kolmogorov complexity. Everyone understands why, if you program an expected utility maximizer with utility function 𝗨 and what you really meant is 𝘝, the 𝗨-maximizer has a convergent instrumental incentive to deceive you into believing that it is a 𝘝-maximizer. Nobody assumes you can “just pull the plug” on something much smarter than you are. And the world's other large-scale activities and institutions all scale up similarly in competence.

We could call this the Adequate World, and contrast it to the way things actually are. The Adequate World has a property that we could call inexploitability; or inexploitability-by-Eliezer. We can compare it to how you can’t predict a 5% change in Microsoft’s stock price over the next six months—take that property of S&P 500 stocks, and scale it up to a whole planet whose experts you can’t surpass, where you can’t find any knowable mistake. They still make mistakes in the Adequate World, because they’re not perfect. But they’re smarter and nicer at the group level than Eliezer Yudkowsky, so you can’t know which things are epistemic or moral mistakes, just like you can’t know whether Microsoft’s equity price is mistaken on the up-side or low-side on average.

ELIEZER:  Okay... I can see how Maude’s conclusion would make sense in the Adequate World. But how does Maude reconcile the arguments that reach that conclusion with the vastly different world we actually live in? It’s not like Maude can say, “Look, it’s obviously already being handled!” because it obviously isn’t.

STRANGER:  Suppose that you have an instinct to regulate status claims, to make sure nobody gets more status than they deserve.

ELIEZER:  Okay...

STRANGER:  This gives rise to the behavior you’ve been calling “hero licensing.” Your current model is that people have read too many novels in which the protagonist is born under the sign of a supernova and carries a legendary sword, and they don’t realize real life is not like that. Or they associate the deeds of Einstein with the prestige that Einstein has now, not realizing that prior to 1905, Einstein had no visible aura of destiny.

ELIEZER:  Right.

STRANGER:  Wrong. Your model of heroic status is that it ought to be a reward for heroic service to the tribe. You think that while of course we should discourage people from claiming this heroic status without having yet served the tribe, no one should find it intuitively objectionable to merely try to serve the tribe, as long as they’re careful to disclaim that they haven’t yet served it and don’t claim that they already deserve the relevant status boost.

ELIEZER:  ... this is wrong?

STRANGER:  It’s fine for “status-blind” people like you, but it isn’t how the standard-issue status emotions work. Simply put, there’s a level of status you need in order to reach up for a given higher level of status; and this is a relatively basic feeling for most people, not something that’s trained into them.

ELIEZER:  But before 1905, Einstein was a patent examiner. He didn’t even get a PhD until 1905. I mean, Einstein wasn’t a typical patent examiner and he no doubt knew that himself, but someone on the outside looking at just his CV—

STRANGER:  We aren’t talking about an epistemic prediction here. This is just a fact about how human status instincts work. Having a certain probability of writing the most popular Harry Potter fanfiction in the future comes with a certain amount of status in Pat’s eyes. Having a certain probability of making important progress on the AI alignment problem in the future comes with a certain amount of status in Maude’s eyes. Since your current status in the relevant hierarchy seems much lower than that, you aren’t allowed to endorse the relevant probability assignments or act as though you think they’re correct. You are not allowed to just try it and see what happens, since that already implies that you think the probability is non-tiny. The very act of affiliating yourself with the possibility is status-overreaching, requiring a slapdown. Otherwise any old person will be allowed to claim too much status—which is terrible.

ELIEZER:  Okay. But how do we get from there to delusions of civilizational adequacy?

STRANGER:  Backward chaining of rationalizations, perhaps mixed with some amount of just-world and status-quo bias. An economist would say “What?” if you presented an argument saying you ought to be able to double your money every year by buying and selling Microsoft stock in some simple pattern. The economist would then, quite reasonably, initiate a mental search to try to come up with some way that your algorithm doesn’t do what you thought it did, a hidden risk it contained, a way to preserve the idea of an inexploitable market in equities.

Pat tries to preserve the idea of an inexploitable-by-Eliezer market in fanfiction (since on a gut level it feels to him like you’re too low-status to be able to exploit the market), and comes up with the idea that there are a thousand other people who are writing equally good Harry Potter fanfiction. The result is that Pat hypothesizes a world that is adequate in the relevant respect. Writers’ efforts are cheaply converted into stories so popular that it’s just about humanly impossible to foreseeably write a more popular story; and the world’s adequacy in other regards ensures that any outsiders who do have a shot at outperforming the market, like Neil Gaiman, will already be rich in money, esteem, etc.

And the phenomenon generalizes. If someone believes that you don’t have enough status to make better predictions than the European Central Bank, they’ll have to believe that the European Central Bank is reasonably good at its job. Traditional economics doesn’t say that the European Central Bank has to be good at its job—an economist would tell you to look at incentives, and that the decisionmakers don’t get paid huge bonuses if Europe’s economy does better. For the status order to be preserved, however, it can’t be possible for Eliezer to outsmart the European Central Bank. For the world’s status order to be unchallengeable, it has to be right and wise; for it to be right and wise, it has to be inexploitable. A gut-level appreciation of civilizational inadequacy is a powerful tool for dispelling mirages like hero licensing and modest epistemology, because when modest epistemology backward-chains its rationalizations for why you can’t achieve big things, it ends up asserting adequacy.

ELIEZER:  Civilization could be inexploitable in these areas without being adequate, though; and it sounds like you're saying that Pat and Maude mainly care about inexploitability.

STRANGER:  You could have a world where poor incentives result in alignment research visibly being neglected, but where there’s no realistic way for well-informed and motivated individuals to strategically avoid those incentives without being outcompeted in some other indispensable resource. You could also have a world that’s inexploitable to you but exploitable to many other people. However, asserting adequacy reaffirms the relevant status hierarchy in a much stronger and more airtight way. The notion of an Adequate World more closely matches the intuitive sense that the world's most respectable and authoritative people are just untouchable—too well-organized, well-informed, and well-intentioned for just anybody to spot Moloch’s handiwork, whether or not they can do anything about it. And affirming adequacy in a way that sounds vaguely plausible generally requires less detailed knowledge of microeconomics, of the individuals trying to exploit the market, and of the specific problems they’re trying to solve than is the case for appeals to inexploitable inadequacy.

Civilizational inadequacy is the basic reason why the world as a whole isn’t inexploitable in the fashion of short-term equity price changes. The modest view, roughly, is that the world is inexploitable as far as you can predict, because you can never knowably know better than the experts.

ELIEZER:  I... sort of get it? I still don’t understand Maude’s actual thought process here.

STRANGER:  Let’s watch, then.

  

(The Masked Stranger raises his hands and snaps his fingers again, restarting time.)

  

PAT:  —take over literature because mere fiction writers are stupid.

MAUDE:  My good fellow, please take a moment to consider what you’re proposing. If the AI alignment problem were really as important as Eliezer claims, would he really be one of the only people working on it?

PAT:  Well, it sure looks like he is.

MAUDE:  Then the problem can’t be as important as he claims. The alternative is that a lone crank has identified an important issue that he and very few others are working on; and that means everyone else in his field is an idiot. Who does Eliezer think he is, to defy the academic consensus to the effect that AI alignment isn’t an interesting idea worth working on?

PAT:  I mean, there are all sorts of barriers I could imagine a typical academic running into if they wanted to work on AI alignment. Maybe it’s just hard to get academic grants for this kind of work.

MAUDE:  If it’s hard to get grants, then that’s because the grant-makers correctly recognize that this isn’t a priority problem.

PAT:  So now the state of academic funding is said to be so wise that people can’t find neglected research opportunities?

STRANGER:  What person with grant-making power gets paid less in the worlds where alignment is important and yet neglected? If no one loses their bonuses or incurs any other perceptible cost, then you’re done. There’s no mystery here.

MAUDE:  All of the evidence is perfectly consistent with the hypothesis that there are no academic grants on offer because the grantmakers have made a thoughtful and informed decision that this is a pseudo-problem.

ELIEZER:  I appreciate Pat’s defense, but I think I can better speak to this. Issues like intelligence explosion and the idea that there’s an important problem to be solved in AI goal systems, as I mentioned earlier, aren’t original to me. They're reasonably widely known, and people at all levels of seniority are often happy to talk about them face-to-face, though there’s disagreement about the magnitude of the risk and about what kinds of efforts are likeliest to be useful for addressing it. You can find them discussed in the most commonly used undergrad textbook in AI, Artificial Intelligence: A Modern Approach. You can’t claim that there’s a consensus among researchers that this is not an important problem.

MAUDE:  Then the grantmakers probably carefully looked into the problem and determined that the best way to promote humanity’s long-term welfare is to advance the field of AI in other ways, and only work on alignment once we reach some particular capabilities threshold. At that point, in all likelihood, funders plan to coordinate to launch a major field-wide research effort on alignment.

ELIEZER:  How, exactly, could they reach a conclusion like that without studying the problem in any visible way? If the entire grantmaking community was able to arrive at a consensus to that effect, then where are the papers and analyses they used to reach their conclusion? What are the arguments? You sound like you’re talking about a silent conspiracy of competent grantmakers at a hundred different organizations, who have in some way collectively developed or gained access to a literature of strategic and technical research that Nick Bostrom and I have never heard about, establishing that the present-day research problems that look relevant and tractable aren’t so promising, and that capabilities will develop in a specific known direction at a particular rate that lends itself to late coordinated intervention.

Are you saying that despite all the researchers in the field casually discussing self-improving AI and Asimov Laws over coffee, there’s some hidden clever reason why studying this problem isn’t a good idea, which the grantmakers all arrived at in unison without leaving a paper trail about their decision-making process? I just... There are so many well-known and perfectly normal dysfunctions of grantmaking machinery and the academic incentive structure that allow alignment to be a critical problem without there necessarily being a huge academic rush to work on it. Instead you’re postulating a massive global conspiracy of hidden competence grounded in secret analyses and arguments. Why would you possibly go there?

MAUDE:  Because otherwise—

  

(The Stranger snaps his fingers again.)

  

STRANGER:  Okay, Eliezer-2010, go ahead and answer. Why is Maude going there?

ELIEZER:  Because... to prevent relatively unimpressive or unauthoritative-looking people from affiliating with important problems, from Maude’s perspective there can’t be knowably low-hanging research fruit. If there were knowably important problems that the grantmaking machinery and academic reward system had left untouched, then somebody like me could knowably be working on them. If there were a problem with the grantmakers, or a problem with academic incentives, at least of the kind that someone like me could identify, then it might be possible for someone unimportant like me to know that an important problem was not being worked on. The alleged state of academia and indeed the whole world has to backward chain to avoid there being low-hanging research fruit.

First Maude tried to argue that the problem is already well-covered by researchers in the field, as it would be in the Adequate World you described. When that position became difficult to defend, she switched to arguing that authoritative analysts have looked into the problem and collectively determined it’s a pseudo-problem. When that became difficult to defend, she switched to arguing that authoritative analysts have looked into the problem and collectively devised a better strategy involving delaying alignment research temporarily.

STRANGER:  Very different hypotheses that share this property: they allow there to be something like an efficient market in high-value research, where individuals and groups that have high status in the standard academic system can't end up visibly dropping the ball.

Perhaps Maude's next proposal will be that top researchers have determined that the problem is easy. Perhaps there's a hidden consensus that AGI is centuries away. In my experience, people like Maude can be boundlessly inventive. There's always something.

ELIEZER:  But why go to such lengths? No real economist would tell us to expect an efficient market here.

STRANGER:  Sure, says Maude, the system isn’t perfect. But, she continues, neither are we perfect. All the grantmakers and tenure-granters are in an equivalent position to us, and doing their own part to actively try to compensate for any biases in the system they think they can see.

ELIEZER:  But that’s visibly contradicted both by observation and by the economic theory of incentives.

STRANGER:  Yes. But at the same time, it has to be assumed true. Because while experts can be wrong, we can also be wrong, right? Maybe we’re the ones with bad systemic incentives and only short-term rewards.

ELIEZER:  But being inside a system with badly designed incentives is not the same as being unable to discern the truth of... oh, never mind.

This has all been very educational, Masked Stranger. Thanks.

STRANGER:  Thanks for what, Eliezer? Showing you a problem isn’t much of a service if there’s nothing you can do to fix it. You’re no better off than you were in the original timeline.

ELIEZER:  It still feels better to have some idea of what’s going on.

STRANGER:  That, too, is a trap, as we’re both aware. If you need an elaborate theory to justify seeing the obvious, it will only become more elaborate and distracting as time goes on and you try harder and harder to reassure yourself. It’s much better to just take things at face value, without needing a huge argument to do so. If you must ignore someone’s advice, it’s better not to make up big elaborate reasons why you’re licensed to ignore it; that makes it easier to change your mind and take the advice later, if you happen to feel like it.

ELIEZER:  True. Then why are you even saying these things to me?

STRANGER:  I’m not. You never were the one to whom I was speaking, this whole time. That is the last lesson, that I didn’t ever say these things to myself.

 

(The Stranger turns upon his own heel three times, and was never there.)

 

This document is ©2017 by Eliezer Yudkowsky and free under the Creative Commons Attribution-No Derivative Works 3.0 License for copying and distribution, so long as the work is attributed and the text is unaltered.

Eliezer Yudkowsky's work is supported by the Machine Intelligence Research Institute.

Praise, condemnation, and feedback are always welcome. The web address of this page is http://yudkowsky.net/rational/herolicensing/.
