publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. SAEs Are Good for Steering - If You Select the Right Features
    Dana Arad, Aaron Mueller, and Yonatan Belinkov
    arXiv, 2025
  2. Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
    Yaniv Nikankin, Dana Arad, Yossi Gandelsman, and 1 more author
    arXiv, 2025
  3. ICML
    MIB: A Mechanistic Interpretability Benchmark
    Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, and 20 more authors
    ICML, 2025

2024

  1. ACL
    Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
    Michael Toker, Hadas Orgad, Mor Ventura, and 2 more authors
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024
  2. NAACL
    ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
    Dana Arad, Hadas Orgad, and Yonatan Belinkov
    In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024, Mexico City, Mexico, June 16-21, 2024
  3. EDBT
    Predicting Fact Contributions from Query Logs with Machine Learning
    Dana Arad, Daniel Deutch, and Nave Frost
    In Proceedings 27th International Conference on Extending Database Technology, EDBT 2024, Paestum, Italy, March 25 - March 28, 2024

2022

  1. CIKM
    LearnShapley: Learning to Predict Rankings of Facts Contribution Based on Query Logs
    Dana Arad, Daniel Deutch, and Nave Frost
    In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, October 17-21, 2022