STAI 2023

The Safe and Trustworthy AI Workshop (ICLP 2023)

The 2023 workshop on Safe and Trustworthy AI (STAI 23) was held on 9 July at the International Conference on Logic Programming (ICLP) in London. Accepted papers were presented either as talks (13) or as posters (4), and three winners were nominated. There were three invited speakers, divided between one talk (on the adversarial susceptibility of large neural networks) and one fireside chat (on the relationship between the AI ethics community and the catastrophic AI risk community). We made financial support available to those who would otherwise have been prevented from attending by a lack of funding.


Accepted Papers

The accepted papers are listed below, ordered alphabetically by title. Only some of the papers appear in full on this website (i.e. only some are linked). In total, 23 submissions were accepted as either a talk or a poster presentation.

Harvey Mannering. Analysing Gender Bias in Text-to-Image Models using Object Detection

Alexander W. Goodall and Francesco Belardinelli. Approximate Model-Based Shielding for Safe Reinforcement Learning

Harry Coppock. Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

Hasra Dodampegama and Mohan Sridharan. Back to the Future: Toward a Hybrid Architecture for Ad Hoc Teamwork

Paolo Bova, Alessandro Di Stefano and The Anh Han. Both eyes open: Vigilant Incentives help Auditors improve AI Safety

Fabrizio Russo and Francesca Toni. Causal Discovery and Knowledge Injection for Contestable Neural Networks

Matt MacDermott, Tom Everitt and Francesco Belardinelli. Decision Theory Using Mechanised Causal Graphs

Teun van der Weij, Simon Lermen and Leon Lang. Evaluating Language Model’s Shutdown Avoidance in Textual Scenarios

Ismaïl Sahbane, Francis Rhys Ward and C Henrik Åslund. Experiments with Detecting and Mitigating AI Deception

Cristina Carata. From Citizens to Decision-Makers: Changing the Public Governance Paradigm with the Help of Artificial Intelligence

Adam Kaufman. Grokking Grokking: Investigating Model Performance on Modular Arithmetic Tasks

Francis Rhys Ward. Honesty Is the Best Policy: Defining and Mitigating AI Deception

Anthony DiGiovanni and Jesse Clifton. Improved coordination with fail-safes and belief-conditioned programs

Dylan Cope. Learning to Plan with Tree Search via Deep RL

Usman Anwar, Chris Lu, David Krueger and Jakob Foerster. Noisy ZSC: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

Chad DeChant. On the risks and benefits of episodic memory in AI agents

Aidan Kierans, Hananel Hazan and Shiri Dori-Hacohen. Quantifying Misalignment Between Agents

Elfia Bezou-Vrakatseli, Benedikt Brückner and Luke Thorburn. Reasons Why Influence May Be Unethical

Richard Willis and Michael Luck. Resolving social dilemmas through reward transfer commitments

Luke Bailey, Gustaf Ahdritz and Anat Kleiman. Soft Prompts Are Unlike Token Embeddings

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim and Adrià Garriga-Alonso. Towards Automated Circuit Discovery for Mechanistic Interpretability

Avinash Kori. Unsupervised Conditional Slot Attention for Object Centric Learning

Mattia Villani. Unwrapping all ReLU Networks

There were three types of submissions:

(1) Regular original papers (8 pages), presenting more mature work that included some (perhaps preliminary) results, that had not been previously published or accepted for publication and was not under review at another conference or journal.

(2) Short original papers (4 pages), intended for less well-developed work, where results might still have been forthcoming, subject to the same conditions on prior publication and concurrent review.

(3) Published papers or papers under review (15 pages), reporting on interesting and relevant work that had been published (or accepted for publication) in the previous 18 months or was under review at another venue.

Programme Committee

The programme chairs were C Henrik Åslund, Francesco Belardinelli, Elizabeth Black, and Francis Rhys Ward (see Organisation). The people below performed the indispensable work of reviewing the submissions:

Adedjouma Morayo, Adin Safokem, Aidan Kierans, Alastair Donaldson, Alessandro Abate, Alex Goodall, Alex Jackson, Alex Spies, Almuthanna Alageel, Amani Abou Rida, Anastasios Lepipas, Andrea Omicini, Andrea Orlandini, Aniello Murano, Areej Alzaidi, Ashwathy T Revi, Caitlin Bentley, Caspar Oesterheld, Catalin Dima, Charlie Rogers-Smith, Chengsong Tan, Dekai Zhang, Dimitrios Letsios, Dipesh Singla, Eleanor Watson, Elena Botoeva, Elfia Bezou Vrakatseli, Elizabeth Black, Emiliano Lorini, Emilio Serrano, Florence Eghwrudje, Francesco Chiariello, Ganesh Pai, Hana Kopecka, Hariram Veeramani, Ibrahim Habli, Ishmeet Kaur, Jack McKinlay, Javier Carnerero-Cano, John Favaro, Juliette Mattioli, Kenneth Co, Kevin Wei, Krystal Maughan, Laurent Perrussel, Leon Lang, Leon van der Torre, Luca Viganò, Luis Croquevielle, Lun Ai, Malinda Vania, Mandar Pitale, Maria Stoica, Marta Bienkiewicz, Marten Kaas, Mary Paterson, Matt MacDermott, Matteo Magnini, Mehrdad Saadatmand, Munyque Mittelmann, Nathan Gavenski, Nicky Pochinkov, Philipp Rader, Philippos Papaphilippou, Pierre Parrend, Rachel Horne, Richard Willis, Rob Alexander, Sandareka Wickramanayake, Sanjay Modgil, Sebastian Benthall, Shanza Ali Zafar, Shikha Bordia, Shubhi Asthana, Sian Carey, Simos Gerasimou, Surabhi Sinha, Temitope Ayano, Teun van der Weij, Tilman Räuker, Xiaotong Ji, Xinyi Ye, Yawen Duan, Yi Chang

The workshop aimed to give early career researchers (ECRs) working in relevant fields the opportunity to gain experience of participating in a programme committee (PC), and training and support were provided for this. Both experienced reviewers and ECRs joined our PC; all papers received at least one review from an experienced PC member.