Dana Arad

I’m a CS PhD candidate at the Technion, where I am part of the CS NLP lab and advised by Yonatan Belinkov. My research aims to improve our understanding of the internal mechanisms of language and vision-language models, focusing on information flow and factuality. I am a fellow of the Ariane de Rothschild Women Doctoral Program. Previously, I interned at Amazon and eBay.

I am passionate about advancing women in STEM and research. I volunteer with QueenB, where I founded the Academy Month, a program aimed at encouraging undergraduate students and early-career graduates to pursue research. I’m also part of She-S, the CS faculty women’s organization. Additionally, I lead NLP-IL’s vision-language club.

If you find any of the above interesting, feel free to reach out!

News

Sep 03, 2025	I’ll be giving talks on our recent papers on Sparse Autoencoders for Content Control at the Bau Lab at Northeastern (Sept 5), Singh Lab at Brown (Sept 8), and MIT CSAIL (Sept 9). Please join if you’re around!
Aug 20, 2025	Our paper SAEs Are Good for Steering - If You Select the Right Features has been accepted to EMNLP 2025 (Main Conference)!
Aug 04, 2025	I’ll be spending the rest of the summer visiting David Bau’s lab at Northeastern University’s Khoury College of Computer Sciences. If you’re in the area and want to chat about interpretability, feel free to reach out!

Selected Publications

CRISP: Persistent Concept Unlearning via Sparse Autoencoders

Tomer Ashuach, Dana Arad, Aaron Mueller, and 2 more authors

arXiv, 2025

EMNLP

SAEs Are Good for Steering - If You Select the Right Features

Dana Arad, Aaron Mueller, and Yonatan Belinkov

arXiv, 2025

@article{DBLP:journals/corr/abs-2505-20063,
  author = {Arad, Dana and Mueller, Aaron and Belinkov, Yonatan},
  title = {SAEs Are Good for Steering - If You Select the Right Features},
  journal = {arXiv},
  volume = {abs/2505.20063},
  year = {2025},
  url = {https://doi.org/10.48550/arXiv.2505.20063},
  eprinttype = {arXiv},
  eprint = {2505.20063},
  timestamp = {Fri, 27 Jun 2025 21:43:42 +0200},
  biburl = {https://dblp.org/rec/journals/corr/abs-2505-20063.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Yaniv Nikankin, Dana Arad, Yossi Gandelsman, and Yonatan Belinkov

arXiv, 2025

@article{DBLP:journals/corr/abs-2506-09047,
  author = {Nikankin, Yaniv and Arad, Dana and Gandelsman, Yossi and Belinkov, Yonatan},
  title = {Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms
                    in VLMs},
  journal = {arXiv},
  volume = {abs/2506.09047},
  year = {2025},
  url = {https://doi.org/10.48550/arXiv.2506.09047},
  eprinttype = {arXiv},
  eprint = {2506.09047},
  timestamp = {Tue, 08 Jul 2025 20:40:20 +0200},
  biburl = {https://dblp.org/rec/journals/corr/abs-2506-09047.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}

ICML

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, and 20 more authors

ICML, 2025

@article{DBLP:journals/corr/abs-2504-13151,
  author = {Mueller, Aaron and Geiger, Atticus and Wiegreffe, Sarah and Arad, Dana and Arcuschin, Iv{\'{a}}n and Belfki, Adam and Chan, Yik Siu and Fiotto{-}Kaufman, Jaden and Haklay, Tal and Hanna, Michael and Huang, Jing and Gupta, Rohan and Nikankin, Yaniv and Orgad, Hadas and Prakash, Nikhil and Reusch, Anja and Sankaranarayanan, Aruna and Shao, Shun and Stolfo, Alessandro and Tutek, Martin and Zur, Amir and Bau, David and Belinkov, Yonatan},
  title = {{MIB:} {A} Mechanistic Interpretability Benchmark},
  journal = {ICML},
  volume = {abs/2504.13151},
  year = {2025},
  url = {https://doi.org/10.48550/arXiv.2504.13151},
  eprinttype = {arXiv},
  eprint = {2504.13151},
  timestamp = {Thu, 22 May 2025 21:00:35 +0200},
  biburl = {https://dblp.org/rec/journals/corr/abs-2504-13151.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}

ACL

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Michael Toker, Hadas Orgad, Mor Ventura, Dana Arad, and Yonatan Belinkov

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024

@inproceedings{DBLP:conf/acl/TokerOVAB24,
  author = {Toker, Michael and Orgad, Hadas and Ventura, Mor and Arad, Dana and Belinkov, Yonatan},
  editor = {Ku, Lun{-}Wei and Martins, Andre and Srikumar, Vivek},
  title = {Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational
                    Linguistics (Volume 1: Long Papers), {ACL} 2024, Bangkok, Thailand,
                    August 11-16, 2024},
  pages = {9713--9728},
  publisher = {Association for Computational Linguistics},
  year = {2024},
  url = {https://doi.org/10.18653/v1/2024.acl-long.524},
  timestamp = {Tue, 24 Sep 2024 10:55:49 +0200},
  biburl = {https://dblp.org/rec/conf/acl/TokerOVAB24.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}

NAACL

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

Dana Arad, Hadas Orgad, and Yonatan Belinkov

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024, Mexico City, Mexico, June 16-21, 2024

@inproceedings{DBLP:conf/naacl/AradOB24,
  author = {Arad, Dana and Orgad, Hadas and Belinkov, Yonatan},
  editor = {Duh, Kevin and G{\'{o}}mez{-}Adorno, Helena and Bethard, Steven},
  title = {ReFACT: Updating Text-to-Image Models by Editing the Text Encoder},
  booktitle = {Proceedings of the 2024 Conference of the North American Chapter of
                    the Association for Computational Linguistics: Human Language Technologies
                    (Volume 1: Long Papers), {NAACL} 2024, Mexico City, Mexico, June 16-21,
                    2024},
  pages = {2537--2558},
  publisher = {Association for Computational Linguistics},
  year = {2024},
  url = {https://doi.org/10.18653/v1/2024.naacl-long.140},
  timestamp = {Thu, 29 Aug 2024 17:13:57 +0200},
  biburl = {https://dblp.org/rec/conf/naacl/AradOB24.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org},
}