San Francisco, 19th June 2025

We’re building a world where AI agents handle our emails, book our flights, make purchasing decisions, and generally manage the mundane for us automatically. But even in this mundanity there are personal decisions: choosing the tone of an email reply, booking a seat at the front or back of the plane, buying the brand of toothpaste I prefer.

Current AI implementations don’t handle these personal decisions well. This is by design. Models have their own personalities, and they conflict with mine. Interact with GPT, Gemini, or Claude, and you’ll notice each has a distinct voice. GPT is short and direct. Gemini leans heavily on bullet points and lists. Claude is verbose and loves to pull out the canvas whenever possible. These aren’t bugs in the traditional sense. They’re features the companies spent millions training into their models.

How do we know this? Ask any LLM to write emails the way you write emails, and it will still sound like itself. Memory doesn’t make a difference. LLMs have a distinct smell, and it doesn’t go away. This MIT study backs the idea: there’s a significant distance between essays written with LLMs and essays written with “brain only”. LLMs, despite good prompting, cannot capture the essence of the user’s personality. The English teachers who evaluated the AI essays for the study also pointed to the lack of “human” personality in the LLM-generated work:
Some essays across all topics stood out because of a close to perfect use of language and structure while simultaneously failing to give personal insights or clear statements. These, often lengthy, essays included standard ideas, reoccurring typical formulations and statements, which made the use of AI in the writing process rather obvious. We, as English teachers, perceived these essays as ‘soulless’, in a way, as many sentences were empty with regard to content and essays lacked personal nuances. While the essays sounded academic and often developed a topic more in-depth than others, we valued individuality and creativity over objective “perfection”. This is reflected in lower content and uniqueness scores, while language, structure and accuracy are rated higher. However, some of these obviously AI generated essays did offer unique approaches, e.g. examples or quotes, which then led to higher uniqueness scores, even if structure and language lacked uniqueness.
When an AI agent acts on my behalf, it needs to make the same choices I would make. Not similar choices; the same ones. It needs to write messages the way I do, prioritize the way I do, write code in the style I prefer. You can define rules to guide an LLM toward this, but that’s not a good solution: I doubt most people can distill their entire personality into a clean set of rules for the LLM to follow. That’s not good UX.

This goes deeper than memory or context. It’s not about the model knowing my preferences; it’s about the model’s fundamental decision-making process mirroring mine. When faced with ten ways to phrase a sentence, does it choose the same one I would? When deciding whether to be direct or diplomatic, does it make my choice?

The current approach treats AI personality as a feature to be proud of. Companies differentiate their models based on these distinct voices. But personality in AI is actually a bug masquerading as a feature. It creates a gap between what I want and what I get, between my decision-making and the AI’s.

If the goal is an AI-native world, then we need models that are impressionable in the deepest sense: not just following instructions, but adopting the user’s entire approach to communication and decision-making. An AI that knows what I like is a side effect of an AI that thinks like me. A digital twin that can replicate my behavior in binary should be the north star. Until we solve this, we’ll be stuck in the current paradigm of AI as a tool that requires constant supervision.