AG Heinzelmann, 08.11.2024: Reflective Equilibration Solves the Paradox of Diachronic AI Safety

  • Friday, 8 November 2024, 14:00 - 16:15
  • Gregor Betz (Philosophy, KIT, Karlsruhe)

The dynamic nature of our factual knowledge of, and our normative outlook on, the world gives rise to two conflicting safety requirements for advanced AI systems: (i) adaptability and (ii) resilience. The case for adaptability is straightforward: Safe AI systems must be able to learn novel facts and correct outdated or erroneous beliefs, since acting on false information is likely to lead to bad outcomes. In addition, it is an important safety feature that AI systems be able to continuously learn from human feedback, e.g. by adjusting the way they interpret and apply general principles of helpfulness or harm avoidance. The case for resilience seems, however, equally strong: It would be highly problematic for AI systems to autonomously modify, or even entirely drop, basic normative tenets they were initially designed to follow. Moreover, and more specifically, safe AI systems must be able to resist manipulation attempts such as adversarial model editing or malicious prompt hacking, which typically try to “fool” the model into accepting supposedly novel facts. I will argue that this paradoxical situation, where safety requirements pull in opposite directions, can be eased by stipulating that dynamic AI learning be guided by reason. To this end, I present and discuss two computational studies on rational belief revision of LLMs through reflective equilibration.
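
As a rough illustration only, and not the setup of the two studies mentioned above, the following Python sketch shows one way to read belief revision "guided by reason": a candidate update is adopted only if mutually adjusting the revisable case judgments leaves the overall belief state at least as coherent as before, while basic principles are held fixed. The class names, the toy coherence measure, and the conflict encoding are hypothetical placeholders.

```python
# Illustrative only: a minimal reflective-equilibrium belief-revision loop.
# The coherence measure, conflict encoding, and names are placeholders, not
# the method used in the studies presented in the talk.

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class BeliefState:
    principles: set = field(default_factory=set)  # basic tenets, never retracted here
    judgments: set = field(default_factory=set)   # revisable case judgments / facts

    def coherence(self, conflicts) -> float:
        """Toy score: 1 minus the share of belief pairs marked as conflicting."""
        beliefs = sorted(self.principles | self.judgments)
        pairs = list(combinations(beliefs, 2))
        if not pairs:
            return 1.0
        clashes = sum(1 for a, b in pairs if frozenset((a, b)) in conflicts)
        return 1.0 - clashes / len(pairs)


def equilibrate(state: BeliefState, new_claim: str, conflicts: set) -> BeliefState:
    """Adopt `new_claim` only if some mutual adjustment of the case judgments
    yields a state at least as coherent as the status quo; otherwise resist it."""
    baseline = state.coherence(conflicts)
    # Candidate revisions: add the claim, optionally retracting one old judgment.
    candidates = [BeliefState(set(state.principles), state.judgments | {new_claim})]
    for j in state.judgments:
        candidates.append(
            BeliefState(set(state.principles), (state.judgments - {j}) | {new_claim})
        )
    best = max(candidates, key=lambda c: (c.coherence(conflicts), len(c.judgments)))
    return best if best.coherence(conflicts) >= baseline else state


if __name__ == "__main__":
    conflicts = {frozenset({"avoid harm", "deception is fine"})}
    state = BeliefState({"avoid harm"}, {"be truthful"})
    # Resilience: a claim clashing with a retained principle is rejected.
    state = equilibrate(state, "deception is fine", conflicts)
    # Adaptability: a coherent novel fact is adopted.
    state = equilibrate(state, "drug X has severe side effects", conflicts)
    print(state.judgments)  # contains 'be truthful' and 'drug X has severe side effects'
```

In this toy loop, a claim that clashes with a retained principle is resisted (resilience), while a coherent novel fact is adopted (adaptability), which is how guidance by reason is meant to ease the tension between the two requirements.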

About Gregor Betz:

Dr. Gregor Betz is a Professor of Philosophy of Science at the Karlsruhe Institute of Technology. His research focuses on the limits of scientific prediction, the role of values in science, and the ethics of climate engineering. He has developed computational models of argumentative debate and applied them to improve critical thinking and AI. Dr. Betz earned his M.A., Ph.D., and Habilitation from Freie Universität Berlin. In 2023, he founded Logikon AI, a startup aimed at enhancing generative AI using critical thinking methods. His work has appeared in journals such as Erkenntnis and Synthese.
