AG Heinzelmann, 08.11.2024: Reflective Equilibration Solves the Paradox of Diachronic AI Safety

  • Friday, 8 November 2024, 14:00 - 16:15
  • Gregor Betz (Philosophy, KIT, Karlsruhe)

The dynamic nature of our factual knowledge of, and our normative outlook on, the world gives rise to two conflicting safety requirements for advanced AI systems: (i) adaptability and (ii) resilience. The case for adaptability is straightforward: Safe AI systems must be able to learn novel facts and correct outdated or erroneous beliefs, since acting on false information is likely to lead to bad outcomes. In addition, it is an important safety feature that AI systems be able to continuously learn from human feedback, e.g. by adjusting the way they interpret and apply general principles of helpfulness or harm avoidance. The case for resilience seems, however, equally strong: It would be highly problematic for AI systems to autonomously modify, or even entirely drop, basic normative tenets they were initially designed to follow. Moreover, and more specifically, safe AI systems must be able to resist manipulation attempts such as adversarial model editing or malicious prompt hacking, which typically try to “fool” the model into accepting supposedly novel facts. I will argue that this paradoxical situation, where safety requirements pull in opposite directions, can be eased by stipulating that dynamic AI learning be guided by reason. To this end, I present and discuss two computational studies on rational belief revision of LLMs through reflective equilibration.
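
As a rough illustration only, and not the setup of the two studies mentioned above, the following Python sketch shows one way to read belief revision "guided by reason": a candidate update is adopted only if mutually adjusting the revisable case judgments leaves the overall belief state at least as coherent as before, while basic principles are held fixed. The class names, the toy coherence measure, and the conflict encoding are hypothetical placeholders.

```python
# Illustrative only: a minimal reflective-equilibrium belief-revision loop.
# The coherence measure, conflict encoding, and names are placeholders, not
# the method used in the studies presented in the talk.

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class BeliefState:
    principles: set = field(default_factory=set)  # basic tenets, never retracted here
    judgments: set = field(default_factory=set)   # revisable case judgments / facts

    def coherence(self, conflicts) -> float:
        """Toy score: 1 minus the share of belief pairs marked as conflicting."""
        beliefs = sorted(self.principles | self.judgments)
        pairs = list(combinations(beliefs, 2))
        if not pairs:
            return 1.0
        clashes = sum(1 for a, b in pairs if frozenset((a, b)) in conflicts)
        return 1.0 - clashes / len(pairs)


def equilibrate(state: BeliefState, new_claim: str, conflicts: set) -> BeliefState:
    """Adopt `new_claim` only if some mutual adjustment of the case judgments
    yields a state at least as coherent as the status quo; otherwise resist it."""
    baseline = state.coherence(conflicts)
    # Candidate revisions: add the claim, optionally retracting one old judgment.
    candidates = [BeliefState(set(state.principles), state.judgments | {new_claim})]
    for j in state.judgments:
        candidates.append(
            BeliefState(set(state.principles), (state.judgments - {j}) | {new_claim})
        )
    best = max(candidates, key=lambda c: (c.coherence(conflicts), len(c.judgments)))
    return best if best.coherence(conflicts) >= baseline else state


if __name__ == "__main__":
    conflicts = {frozenset({"avoid harm", "deception is fine"})}
    state = BeliefState({"avoid harm"}, {"be truthful"})
    # Resilience: a claim clashing with a retained principle is rejected.
    state = equilibrate(state, "deception is fine", conflicts)
    # Adaptability: a coherent novel fact is adopted.
    state = equilibrate(state, "drug X has severe side effects", conflicts)
    print(state.judgments)  # contains 'be truthful' and 'drug X has severe side effects'
```

In this toy loop, a claim that clashes with a retained principle is resisted (resilience), while a coherent novel fact is adopted (adaptability), which is how guidance by reason is meant to ease the tension between the two requirements.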

About Gregor Betz:

Dr. Gregor Betz is a Professor of Philosophy of Science at the Karlsruhe Institute of Technology. His research focuses on the limits of scientific prediction, the role of values in science, and the ethics of climate engineering. He has developed computational models of argumentative debate and applied them to improve critical thinking and AI. Dr. Betz earned his M.A., Ph.D., and Habilitation from Freie Universität Berlin. In 2023, he founded Logikon AI, a startup aimed at enhancing generative AI using critical thinking methods. His work has appeared in journals such as Erkenntnis and Synthese.
