EA - Cost-effectiveness of professional field-building programs for AI safety research by Center for AI Safety
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cost-effectiveness of professional field-building programs for AI safety research, published by Center for AI Safety on July 10, 2023 on The Effective Altruism Forum.

Summary

This post explores the cost-effectiveness of AI safety field-building programs aimed at ML professionals, specifically the Trojan Detection Challenge (a prize), the NeurIPS ML Safety Social, and the NeurIPS ML Safety Workshop.

We estimate the benefit of these programs in "Quality-Adjusted Research Years" (QARYs), using cost-effectiveness models built for the Center for AI Safety (introduction post here, full code here).

We intend for these models to support - not determine - strategic decisions. We do not believe, for instance, that programs which a model rates as less cost-effective are necessarily not worthwhile as part of a portfolio of programs.

The models' tentative results, summarized below, suggest that field-building programs for professionals compare favorably to "baseline" programs - directly funding a talented research scientist or PhD student working on trojans research for 1 year or 5 years, respectively.
Further, the cost-effectiveness of these programs can be significantly improved with straightforward modifications - such as focusing a hypothetical prize on a more "relevant" research avenue, or running a hypothetical workshop with a much smaller budget.

| Program | Cost (USD) | Benefit (counterfactual expected QARYs) | Cost-effectiveness (QARYs per $1M) |
| --- | --- | --- | --- |
| Trojan Detection Challenge | 65,000 | 26 | 390 |
| NeurIPS ML Safety Social | 5,200 | 150 | 29,000 |
| NeurIPS ML Safety Workshop | 110,000 | 360 | 3,300 |
| Hypothetical: Power Aversion Prize | 50,000 | 490 | 9,900 |
| Hypothetical: Cheaper Workshop | 35,000 | 250 | 7,000 |
| Baseline: Scientist Trojans | 500,000 | 84 | 170 |
| Baseline: PhD Trojans | 250,000 | 8.7 | 35 |

For readers who are after high-level takeaways, including which factors are driving these results, skip ahead to the cost-effectiveness in context section. For those keen on understanding the model and results in more detail, read on as we:

- Give important disclaimers. (Read more.)
- Direct you to background information about this project. (Read more.)
- Walk through the model. (Read more.)
- Contrast these programs with one another, and with funding researchers directly. (Read more.)
- Consider the scalability and robustness properties of the model. (Read more.)

Disclaimer

This analysis is a starting point for discussion, not a final verdict. The most critical reasons for this are that:

- These models are reductionist. Even if we have avoided other pitfalls associated with cost-effectiveness analyses, the models might ignore factors that turn out to be crucial in practice, including (but not limited to) interactions between programs, threshold effects, and diffuse effects.
- The models' assumptions are first-pass guesses, not truths set in stone. Most assumptions are imputed second-hand following a short moment of thought, before being adjusted ad hoc for internal consistency and for differences of belief between Center for AI Safety (CAIS) staff and external practitioners.
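The cost-effectiveness column in the table above is simply benefit divided by cost, scaled to $1M. A minimal back-of-the-envelope sketch in Python (using the table's figures; the published numbers come from the full model and are rounded, so this recomputation differs slightly in places, e.g. 400 vs. 390 for the Trojan Detection Challenge):

```python
# Back-of-the-envelope recomputation of the table's cost-effectiveness column.
# Costs and QARY benefits are taken from the table; the formula
# (QARYs / cost, scaled to $1M) is the definition used in the table header.

programs = {
    "Trojan Detection Challenge": (65_000, 26),
    "NeurIPS ML Safety Social": (5_200, 150),
    "NeurIPS ML Safety Workshop": (110_000, 360),
    "Hypothetical: Power Aversion Prize": (50_000, 490),
    "Hypothetical: Cheaper Workshop": (35_000, 250),
    "Baseline: Scientist Trojans": (500_000, 84),
    "Baseline: PhD Trojans": (250_000, 8.7),
}

def qarys_per_million(cost_usd: float, qarys: float) -> float:
    """Cost-effectiveness in QARYs per $1M spent."""
    return qarys / cost_usd * 1_000_000

for name, (cost, qarys) in programs.items():
    print(f"{name}: {qarys_per_million(cost, qarys):,.0f} QARYs per $1M")
```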
In some cases, parameters have been redefined since initial practitioner input.

Instead, the analyses in this post represent an initial effort to explicitly lay out assumptions, in order to take a more systematic approach towards AI safety field-building.

Background

1. For an introduction to our approach to modeling - including motivations for using models, the benefits and limitations of our key metric, guidance for adopting or adapting the models for your own work, comparisons between programs for students and professionals, and more - refer to the introduction post.
2. The models' default parameters are based on practitioner surveys and the expertise of CAIS staff. Detailed information on the values and definitions of these parameters, and comments on parameters with delicate definitions or contestable views, can be found i...