Katalina Hernandez

AI Safety & Privacy Specialist | Researching Autonomy by Design. Views are my own; research on AI autonomy is independent of my employer.
2 karma · Joined · Working (0-5 years) · Spain
substack.com/@katalinahernandez

Bio

🔹 Background: Started as a data privacy specialist, focusing on Privacy-Enhancing Technologies (PETs) in technological innovation.

🔹 AI Governance Shift:

  • 20% because my employer asked me to.
  • 80% because I refuse to let compliance reduce AI to just another product to audit.

🔹 What I Do: Bridging AI governance, privacy engineering, and alignment-adjacent control mechanisms to ensure AI enhances (not replaces) human decision-making.

🔹 Current Research – Autonomy by Design (AbD):

  • Integrating interpretability, inference contestability, and user-controlled safeguards into real-world GenAI deployment.
  • Moving beyond transparency for regulation’s sake: building AI systems where users have real, actionable control over AI-driven inferences.

🔹 Long-Term Vision:

  • Making autonomy preservation a first-class citizen in AI governance.
  • Bridging AI safety, UX research, and technical alignment work to ensure AI remains contestable, understandable, and user-controllable at scale.

How others can help me

I’m actively exploring how mechanistic interpretability can be leveraged to give users real control over the inferences GenAI models make about them.

🔹 If you’re working on mechanistic interpretability, interpretability-adjacent research, or UX for AI transparency, I’d love to discuss how these efforts can extend beyond AI oversight into direct user control over AI inferences.

🔹 I’m particularly interested in how feature-level interpretability (like Anthropic’s feature steering) could apply to real-time inference contestability, so that users don’t just see what AI assumes, but can intervene and correct how AI reasons about them in deployed systems (see the sketch after this list).

🔹 If you’re in AI governance, alignment, privacy, or HCI, let’s connect. I believe autonomy-centered AI needs multidisciplinary collaboration to be taken seriously.
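
For concreteness, here is the kind of thing I mean by "intervene and correct": a toy sketch in plain PyTorch on a stand-in layer, not Anthropic's actual tooling; the contested feature direction and the steering coefficient are invented for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64

# Stand-in for one transformer block's output; a real deployment would hook a real model layer.
layer = nn.Linear(d_model, d_model)

# Hypothetical unit vector for the feature the user wants to contest,
# e.g. a "risk-averse user" direction found by a sparse autoencoder.
contested_direction = torch.randn(d_model)
contested_direction /= contested_direction.norm()

# The user's choice: 0.0 suppresses the feature entirely, 1.0 leaves it untouched.
steering_coefficient = 0.0


def steer(module, inputs, output):
    """Rescale the activation's component along the contested feature direction."""
    strength = (output @ contested_direction).unsqueeze(-1)  # current feature strength
    return output - (1.0 - steering_coefficient) * strength * contested_direction


handle = layer.register_forward_hook(steer)
activations = layer(torch.randn(3, d_model))
print((activations @ contested_direction).abs().max())  # ~0: the contested feature is suppressed
handle.remove()
```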

If you have insights, critiques, or research that overlaps with this, I’d love to hear from you!

How I can help others

I work at the intersection of AI governance, privacy, and alignment-adjacent research, focusing on Autonomy by Design.

🔹 AI Governance & Privacy: If you’re navigating the regulatory landscape, I can help bridge legal, technical, and alignment perspectives to make governance frameworks actionable in real-world AI deployment.

🔹 Mechanistic Interpretability & UX Research: If you’re working on interpretability, AI safety, or user experience in AI control, I can help connect research on transparency and control to real-world autonomy-preserving interfaces.

🔹 Multidisciplinary Collaboration: AI alignment, privacy, and HCI need to work together. If you’re looking for insights on how to make autonomy-preserving AI credible, actionable, and scalable, I’d love to contribute.

I’m here to exchange ideas, challenge assumptions, and help build AI systems where users have real choice over how AI impacts them.

Comments

How should AI alignment and autonomy preservation intersect in practice?

We know that AI alignment research has made significant progress in embedding internal constraints that prevent models from manipulating, deceiving, or coercing users (to the extent that those constraints succeed). However, internal alignment mechanisms alone don’t necessarily give users meaningful control over an AI’s influence on their decision-making. That is a mechanistic problem in its own right, but…

This raises a question: Should future AI systems be designed to not only align with human values but also expose their influence in ways that allow users to actively contest and reshape AI-driven inferences?

For example:

  • If an AI model generates an inference about a user (e.g., “this person prefers risk-averse financial decisions”), should users be able to see, override, or refine that inference?
  • If an AI assistant subtly nudges users toward certain decisions, should it disclose those nudges in a way that preserves user autonomy?
  • Could mechanisms like adaptive user interfaces (allowing users to adjust how AI explains itself) or AI-generated critiques of its own outputs serve as tools for reinforcing autonomy rather than eroding it?

I’m exploring a concept I call Autonomy by Design, a control-layer approach that builds on alignment research but adds external, user-facing mechanisms to make AI’s reasoning and influence more contestable.
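
To make "see, override, or refine" concrete, here is a minimal sketch of the kind of record such a control layer could expose. All of the names (Inference, InferenceLedger, effective_value) and the example data are invented for illustration; this is a thought sketch, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inference:
    """One inference a deployed model has drawn about a user."""
    subject: str                        # e.g. "financial risk preference"
    value: str                          # e.g. "risk-averse"
    evidence: List[str]                 # human-readable rationale shown to the user
    user_override: Optional[str] = None

    @property
    def effective_value(self) -> str:
        """The value downstream components must use: the user's word wins."""
        return self.user_override if self.user_override is not None else self.value


class InferenceLedger:
    """Control layer sitting between the model and the application."""

    def __init__(self) -> None:
        self._records: List[Inference] = []

    def record(self, inference: Inference) -> None:
        self._records.append(inference)

    def show(self) -> List[Inference]:
        # "See": every inference is inspectable; nothing is silently applied.
        return list(self._records)

    def contest(self, index: int, new_value: str) -> None:
        # "Override / refine": the user replaces the model's inference.
        self._records[index].user_override = new_value


# Example: the model infers risk aversion; the user contests it.
ledger = InferenceLedger()
ledger.record(Inference(
    subject="financial risk preference",
    value="risk-averse",
    evidence=["user asked several questions about capital preservation"],
))
ledger.contest(0, "risk-neutral")
print(ledger.show()[0].effective_value)  # -> "risk-neutral"
```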

Would love to hear from interpretability experts and UX designers: Where do you see the biggest challenges in implementing user-facing autonomy safeguards? Are there existing methodologies that could be adapted for this purpose?

Thank you in advance. 

Feel free to shatter this if you must XD.