Two Research Threads

We investigate intelligence itself.

To enable trustworthy AI deployment, we must first understand the agents we build. Our research advances on two parallel fronts: one for AI safety, one for personalization.

6+ Published papers
2 Research threads
Ongoing work (2025)
// Research Thread I

Understanding Intelligence for AI Safety

Trustworthy deployment begins with understanding the agents themselves.

AI agents are increasingly capable — but capability without understanding is risk. Our first research thread investigates the internal workings of AI systems: how representations form, how features can be decoded and controlled, how failure modes can be identified automatically, and how human concepts map onto machine cognition. This line of work underpins safe and trustworthy AI deployment.
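
To make "decoding and controlling features" concrete, here is a minimal sketch of one tool this thread leans on: a sparse autoencoder that decomposes a model's hidden activations into an overcomplete dictionary of sparse, individually inspectable features. The dimensions, L1 coefficient, and toy batch below are illustrative assumptions, not a description of our pipelines.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Decompose hidden activations into an overcomplete sparse dictionary."""

    def __init__(self, d_model: int = 768, n_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        # Non-negative feature codes; the L1 penalty below pushes most to zero.
        codes = F.relu(self.encoder(activations))
        reconstruction = self.decoder(codes)
        return codes, reconstruction

def sae_loss(acts, codes, recon, l1_coeff: float = 1e-3):
    # Reconstruction fidelity plus an L1 sparsity penalty on the codes.
    return F.mse_loss(recon, acts) + l1_coeff * codes.abs().mean()

# Toy batch standing in for residual-stream activations from some model.
acts = torch.randn(32, 768)
sae = SparseAutoencoder()
codes, recon = sae(acts)
sae_loss(acts, codes, recon).backward()
```

Which human concepts individual dictionary features track, and whether they transfer across model families, are exactly the open questions listed next.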

Open Research Questions
  • What is the difference between VLM (Vision-Language Model) representations and the brain's representations?
  • How does a post-trained model's feature space differ from that of its pretrained base?
  • How can we automatically identify and interpret failure modes in large models without human annotation?
  • How do sparse dictionary features in LLMs relate to human-interpretable concepts?
Methods & Disciplines
Mechanistic Interpretability · Sparse Autoencoders · Representation Learning · AI Safety · Prompt Engineering · Optimal Control · Gradient Analysis · Neuroscience
Active Investigations
  • Mapping VLM representation spaces to fMRI-measured brain activity patterns (see the sketch after this list)
  • Feature space geometry shifts induced by RLHF and instruction tuning
  • Automated failure mode discovery in physics-reasoning models
  • Sparse feature universality across model families and scales
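
The brain-mapping investigation above is commonly approached with linear encoding models. The sketch below, under assumed array shapes and random placeholder data, fits one ridge regression per voxel from VLM embeddings to fMRI responses and scores voxels by held-out correlation.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Placeholder arrays: one row per stimulus shown to both the VLM and the
# participant (these shapes and random values are assumptions, not our data).
vlm_features = np.random.randn(1000, 512)   # VLM embedding per stimulus
brain_voxels = np.random.randn(1000, 2000)  # fMRI response per stimulus

X_train, X_test, y_train, y_test = train_test_split(
    vlm_features, brain_voxels, test_size=0.2, random_state=0
)

# One ridge regression per voxel, fit jointly, with a cross-validated
# regularization strength: the standard linear encoding-model recipe.
encoder = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_train, y_train)

# Score each voxel by correlating predicted and measured held-out responses;
# high-correlation voxels are where VLM and brain representations align.
pred = encoder.predict(X_test)
voxel_corr = np.array([
    np.corrcoef(pred[:, v], y_test[:, v])[0, 1] for v in range(pred.shape[1])
])
print(f"median held-out voxel correlation: {np.median(voxel_corr):.3f}")
```
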
Publications
// Research Thread II

LLM Agent Personalization

Building agents that truly know the person they serve.

Large language models are general. People are not. Our second thread develops the theory and engineering foundations for making LLM-based agents genuinely personal — building persistent user models, context-aware adaptation mechanisms, and trust-calibrated behavior that respects individual differences across a lifetime of use.
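
As a deliberately simplified illustration of what a persistent user model with context-aware adaptation can look like, the sketch below keeps a small structured profile and injects only the query-relevant slice into the prompt. The profile fields and keyword-overlap relevance heuristic are assumptions chosen for brevity, not our actual design.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    """A deliberately small, persistent user model (illustrative fields only)."""
    name: str
    preferences: dict = field(default_factory=dict)  # e.g. {"tone": "concise"}
    facts: list = field(default_factory=list)        # durable facts the user has shared

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

def build_prompt(profile: UserProfile, query: str, max_facts: int = 3) -> str:
    """Inject only the query-relevant slice of the user model into the prompt."""
    # Context-aware adaptation via a crude keyword-overlap heuristic (an
    # assumption for brevity; retrieval or learned relevance would replace it).
    query_terms = set(query.lower().split())
    relevant = [f for f in profile.facts if query_terms & set(f.lower().split())]
    prefs = ", ".join(f"{k}: {v}" for k, v in profile.preferences.items())
    parts = [
        f"User preferences: {prefs or 'none recorded'}",
        "Relevant user context:",
        "\n".join(f"- {fact}" for fact in relevant[:max_facts]) or "- none",
        "",
        f"User query: {query}",
    ]
    return "\n".join(parts)

# Toy usage
profile = UserProfile(name="Ada", preferences={"tone": "concise"})
profile.remember("Ada is training for a marathon in May")
print(build_prompt(profile, "Plan my marathon training week"))
```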

Research Questions
  • What is the minimal sufficient representation of a user that makes an agent feel personal?
  • How can user context be injected into LLMs without sacrificing generalization ability?
  • What implicit behavioral signals can drive preference learning without explicit feedback? (see the sketch after this list)
  • How do personalization preferences change over time — and how should agents adapt?
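
One way to make the implicit-feedback question concrete is to treat behavioral signals, such as keeping one response versus regenerating it, as noisy pairwise preferences and fit a Bradley-Terry style model over response features. The synthetic signals and hand-picked features below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for implicit signals: for each interaction we observe a
# pair of responses and whether the user kept the first or regenerated it.
# Each row is the feature difference between the two responses (assumed
# features, e.g. length, tone match, tool use).
true_w = np.array([1.5, -0.8, 0.3])           # hidden "true" preference weights
feature_diffs = rng.normal(size=(500, 3))
keep_prob = 1.0 / (1.0 + np.exp(-feature_diffs @ true_w))
kept_first = (rng.random(500) < keep_prob).astype(float)

# Bradley-Terry style model: P(kept > regenerated) = sigmoid(w . delta_features).
# Fit w by gradient ascent on the log-likelihood of the observed behavior.
w = np.zeros(3)
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-feature_diffs @ w))
    w += lr * feature_diffs.T @ (kept_first - p) / len(kept_first)

print("recovered preference weights:", np.round(w, 2))  # should approach true_w
```
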
Open Problems — Join Us
  • Lifelong user modeling at low computational and memory cost
  • Privacy-preserving personalization without centralized user data
  • Cross-agent persona transfer without information leakage
  • Evaluating personalization quality in the absence of ground truth
  • Balancing user preference alignment with ethical guardrails
Methods & Disciplines
Machine Learning · NLP / LLMs · RLHF · User Modeling · Differential Privacy · RAG · Longitudinal Studies · Causal Inference
Upcoming Publications
  • Persistent Persona: A Framework for Long-Horizon User Context in LLM-Based Agents · Proteus Research Team, 2025 · In preparation, target: ACL 2026
  • Beyond Prompting: Behavioral Feedback as a Signal for Implicit Preference Learning · Proteus Research Team, 2025 · Planned, target: EMNLP 2025
Open Collaboration

We don't have all the answers — and we know it.

These problems require minds from cognitive science, ML, HCI, neuroscience, and beyond. We are actively seeking research collaborators, visiting scholars, and industry partners who want to work on the hard questions of human-AI symbiosis and AI safety.