Author: Alessandro Zulberti

The Self-Referential Loop
Before we get to metaphors, it helps to ask a practical question: how do we avoid self-referential loops in UX work, whether we are talking with users or prompting AI? The danger is the same in both cases: answers that circle back on themselves, giving the illusion of progress while nothing new is learned.

A few inputs can help break the loop:
- Vary your questions. In usability tests, do not always ask “Was that easy?” Try “What would you do next?” or “What slowed you down?” In AI prompts, ask “Why might this design succeed, and why might it fail?” to invite both sides, not only confirmation.
- Encourage contrast. With participants, compare two flows instead of rating one. With LLMs, ask for “three different explanations and one possible outlier.” Contrast pulls the answer outward.
- Follow up carefully. If a user says “I like it,” ask “What part?” or “Was anything missing?” If the model repeats a phrase, prompt: “Where are you circling back to yourself?” or “What new angle have we not covered?”
- Rotate perspectives. In research, ask how a first-time user and a returning user might differ. In AI, shift frames: “How would a stakeholder see this?” versus “How would a competitor frame it?”
- Anchor in evidence. For humans, triangulate with numbers and stories. For AI, push outward with “Give me a concrete example from practice or literature,” not just a generic statement.
With these inputs, loops can be broken before they harden.

Expansion versus Collapse

The golden ratio is often used as a symbol of beauty and growth. Its spiral expands forever, always outward, always balanced. But what happens when the movement goes the other way? Instead of expansion, what if the spiral folds back on itself, repeating the same thing? This is the self-referential loop.

The golden ratio spiral shows infinity as something generous. Each turn grows larger, and each step reveals something new but still connected. The self-referential loop shows infinity as something closed. Each turn brings us back to what was already said. Instead of widening our view, it makes it smaller. The lesson is simple: not all infinities are the same. Some open up, others close in.

Umberto Eco helps explain this. In The Open Work (1962), he described books and artworks that stay unfinished on purpose, so that readers and viewers can add their own meaning. The golden ratio spiral is like that: open, growing, never complete. The self-referential loop is the opposite: closed, repeating, not allowing anything from outside to enter.

The Semiotic Trap

Mathematicians such as Cantor showed that infinity can take different forms. Semiotics, the study of signs, shows another difference: signs can point outward to the world, or they can point inward to themselves.

Eco described this difference using the dictionary and the encyclopaedia. A dictionary can fall into a loop. For example:
• “Truth” → “Fact”
• “Fact” → “Truth”

The circle closes, with no way out. That is a self-referential loop. An encyclopaedia works differently. Instead of circling, it connects ideas outward: “truth” might link to law, science, philosophy, or religion. This keeps meaning alive.
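The dictionary loop can be made concrete with a small sketch. It assumes, as a deliberate simplification, that each word is defined by a single other word. The `definitions` mapping and the `find_loop` helper are hypothetical illustrations, not Eco's own formalism. Following definitions either returns to a word already visited or leads to something outside the list:

```python
def find_loop(definitions, start):
    """Follow single-word definitions from `start`.

    Returns the list of words that form a loop, or None if the chain
    reaches a word that has no entry (it points outside the dictionary).
    """
    path = []
    word = start
    while word in definitions:
        if word in path:
            # Back at a word we already visited: the circle closes.
            return path[path.index(word):]
        path.append(word)
        word = definitions[word]
    return None


# Hypothetical entries mirroring the example above.
closed = {"truth": "fact", "fact": "truth"}
# Hypothetical entry that points outward to a concept with no entry here.
open_ended = {"truth": "law"}

print(find_loop(closed, "truth"))      # ['truth', 'fact']
print(find_loop(open_ended, "truth"))  # None
```

The second case returns nothing because the chain escapes the word list, which is roughly what the encyclopaedia model asks for.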

Large language models risk falling into the dictionary model at its worst, circling around the same definitions or references. In The Limits of Interpretation (1990), Eco warned against this kind of empty overinterpretation, where signs only chase each other instead of reaching reality.

Contexts of the Self-Referential Loop

The loop is not only a problem for AI. We can see it in many parts of life:
- Mathematics: A student says, “I know 10 – 5 = 5, because 5 + 5 = 10.” Then, when asked why 5 + 5 = 10, they answer, “Because 10 – 5 = 5.” The reasoning circles back on itself. Nothing is really explained.
- Media: A rumour starts on Twitter, gets quoted in a blog, then reported in the news. The story seems stronger, but all sources point back to the first tweet.
- UX Research: A company asks customers only about speed at checkout. Customers answer about speed. The company concludes speed is the only thing that matters.
- Everyday Life: Someone says, “Trust me, because I always say I can be trusted.” The claim supports itself, nothing more.
Each example shows the same trap: the loop looks like movement, but it never brings in anything new.

Implications for Research

For researchers, this difference matters. The golden ratio spiral is a good metaphor for discovery, where each turn adds more. The self-referential loop warns us of closure, where repetition hides as insight.

Eco’s Kant and the Platypus (1997) offers a useful reminder. When the platypus was first discovered, it did not fit existing categories. Scientists had to adjust. If they had only circled within their old categories, they would have missed the truth. In research, the anomaly, the unexpected, is what breaks the loop.

Recent AI studies echo this point. Shumailov et al. (2024) showed that language models trained on their own outputs experience model collapse – a degenerative loop where the system loses touch with reality. Kommers et al. (2025) proposed computational hermeneutics as a framework for evaluating AI, arguing that meaning must emerge in context and dialogue. Both works highlight that loops without outside anchors erode meaning.

Without triangulation—using more than one method or viewpoint—the loop can trick us into thinking we have depth. What matters is not only the tools we use, but the ability to step outside the loop when it closes in.

Reflection

From my side, I see the self-referential loop as both a warning and a mirror. It warns us how easy it is to confuse movement with progress, or repetition with growth. And it mirrors our own habits: we too can circle inside familiar categories instead of reaching outward. Eco’s semiotics gives us language for this choice: the golden ratio spiral as an open work, where infinity means growth, and the loop as the dictionary model, where infinity means stasis. For research, the task is clear. We must notice when the spiral is opening, and when it is only turning back on itself.

References
Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., Gal, Y. (2024). AI models collapse when trained on recursively generated data. Nature.
Kommers, C. et al. (2025). Evaluating Generative AI as a Cultural Technology. SSRN Preprint.
Eco, U. (1962). The Open Work. Harvard University Press.
Eco, U. (1976). A Theory of Semiotics. Indiana University Press.
Eco, U. (1990). The Limits of Interpretation. Indiana University Press.
Eco, U. (1997). Kant and the Platypus. Harcourt.
Hofstadter, D. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books.
Pattee, H. H. (2006). The Physics and Metaphysics of Biosemiotics: BioSystems. Elsevier.
Corballis, M. C. (2011). The Recursive Mind: The Origins of Human Language, Thought, and Civilization. Princeton University Press.
September 13, 2025
AI, Authorship & Discomfort
AI-generated content has entered public life quickly, raising questions about creativity, authenticity, and ethics. What is striking is that AI-generated writing often meets with more suspicion than AI-generated images or music. To see why, we need to look at history, culture, and recent empirical studies. Western traditions of authorship and originality carry heavy weight, and these traditions shape how we judge written, visual, and musical media differently when machines create them.

Authorship and Originality in Western Culture

In the West, writing has long been tied to the figure of the author. This was not always the case. In earlier periods – ancient, medieval – many works (folktales, poetry, scriptures) were transmitted without a clear individual author. Only through the rise of printing, copyright law (16th–18th centuries), and Enlightenment thought did the notion of singular authorship become central. Modern readers expect writing to express an individual mind, with originality and personal insight.

Writing vs. Images: Different Traditions

Visual media have a long history of mechanical reproduction (e.g. photography in the 19th century), tool use, remixing, and appropriation. These traditions made us more tolerant of technological mediation in images. By contrast, in writing, plagiarism is heavily condemned, and originality of phrasing and voice is central. That difference helps explain why AI writing triggers more discomfort.

Empirical Evidence: Imagery vs. Perception
- A recent study by Velásquez-Salamanca (2025) found that human-made images are perceived as both more realistic and more credible than AI-generated images.
- Another study (“Deciphering authenticity in the age of AI” by Farooq et al., 2025) showed that when AI-generated images are more realistic in appearance, people are more likely to accept them as authentic—but still with less confidence. Emotional salience did not always contribute significantly to the judgement of authenticity.
These findings suggest that unease with AI-generated images exists, but that people are more forgiving when the image is high quality and believable.

The Rise of AI Writing

When large language models appeared (e.g. ChatGPT), many reacted with alarm. An AI can now produce essays, poems, or articles that sound human. This raises fears: what does it mean for writing if the voice behind it might be machine, not human?

People often describe a strange hollowness when they discover text they liked is AI-written. The promise of another mind behind words collapses. In branding or emotionally charged messages, consumer studies find that AI-written emotional content is trusted less and seen as less authentic. For example, a study by Kirk & Givi (2024) found that consumers respond less favourably to heartfelt messages once they believe an AI wrote them.

Music as Comparative Case

The recent case of The Velvet Sundown, a band that accrued over one million Spotify streams before being revealed to be entirely AI-generated (music, backstory, visuals), offers a concrete example. Industry insiders called for warning labels and transparency, arguing that listeners should know whether music is made with human involvement.

This case highlights how music, though mediated by technology, still carries strong expectations of authorial voice, emotional authenticity, and human identity.

Cultural Differences Beyond the West

We must also consider how other traditions treat authorship and originality differently:
- In East Asia, imitation and mastering earlier forms are valued; creative variation within tradition is admired.
- In South Asia, improvisation and lineage in music and poetry make authorship shared and ongoing.
- African oral traditions often see storytelling as communal; the identity of the teller might matter less than the function of the story.
- Indigenous cultures of the Americas and Oceania frequently tie voice, song, and story to collective memory, land, or ritual rather than individual ownership.
These traditions suggest that discomfort with AI writing may be especially acute because of Western cultural assumptions. In other cultures where authorship is more fluid, AI’s role might be interpreted differently.

Conclusion: Authorship, Authenticity, and the Future of Creativity

Western tradition has long treated writing as the domain of individual creative thought: the idea that one voice produces text, carries originality, and can be praised or held responsible. Visual art and music have histories of technological mediation, collaboration, and tradition, making them somewhat more ready to absorb AI’s role—though not without questions and ethical challenges.

The Velvet Sundown case shows that in music, as in writing, authenticity and disclosure matter. People expect more than technical quality: they expect voice, identity, integrity. Writing provokes the strongest unease because it is most tightly bound to the assumption that a thinking, feeling author is present. Images are tolerated with machine assistance; music is contested; writing is the art form where the absence of human voice most deeply unsettles.

References
- Velásquez-Salamanca, D. (2025). Interpretation of AI-Generated vs. Human-Made Images. PMC.
- Farooq, A., et al. (2025). Deciphering authenticity in the age of AI: how AI-generated images are judged when realistic. Springer.
- Kirk, & Givi, J. (2024). Are messages from robots trustworthy? Study on consumers’ reactions to emotionally charged AI messages.
- The Guardian. (2025, July 14). An AI-generated band got 1m plays on Spotify. Now music insiders say listeners should be warned.
- People Magazine. (2025). Rock Band with More Than 1 Million Monthly Spotify Listeners Reveals Itself as AI Project.
September 12, 2025
AI, Language Gaps, and Equity
Performance Disparities: High-Resource vs Low-Resource Languages

Modern AI language systems have achieved impressive results in major languages, but performance drops steeply for low-resource languages. Benchmarks like FLORES-101/200 and XTREME have made these gaps measurable. For example, Meta AI’s FLORES evaluation set covers 100+ languages (many previously ignored) to assess translation quality on the “long tail” of low-resource languages. Using such benchmarks, Meta’s No Language Left Behind (NLLB-200) model (supporting 200 languages) delivered a 44% average BLEU improvement over prior state-of-the-art systems. In practice, NLLB-200 outperformed previous models by about 7 BLEU points on a subset of 87 languages, signalling substantial gains, especially for under-served languages. Google has also expanded Translate to 24 new languages using zero-resource methods (training only on monolingual text), achieving BLEU scores in the 10–40 range. However, Google concedes that these new translations “still lag far behind” the quality of higher-resource languages in its system. Despite progress, major quality gaps remain between high-resource and low-resource languages.
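Because the paragraph above quotes both a percentage gain and a gain in BLEU points, it may help to see how the two kinds of figure relate. The sketch below uses made-up scores purely for illustration. They are not taken from the NLLB or FLORES results, and the function names are hypothetical.

```python
def absolute_gain(baseline, new):
    """Gain in BLEU points."""
    return new - baseline


def relative_gain_percent(baseline, new):
    """Gain as a percentage of the baseline score."""
    return (new - baseline) / baseline * 100


# Hypothetical scores: the same 5-point gain on two different baselines.
high_resource = (20.0, 25.0)  # (baseline BLEU, new BLEU), made up
low_resource = (10.0, 15.0)   # (baseline BLEU, new BLEU), made up

for baseline, new in (high_resource, low_resource):
    print(absolute_gain(baseline, new), relative_gain_percent(baseline, new))
# 5.0 25.0
# 5.0 50.0
```

The same absolute gain reads as a larger percentage when the baseline is low, which is one reason percentage improvements for low-resource languages can look dramatic while output quality still trails.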

Multiple evaluations confirm the performance gap across languages. A 2025 study found that large language models show an average accuracy drop of more than 15% on common-sense reasoning tasks when prompted in low-resource languages like Hindi or Swahili compared to English. Similarly, the AfroBench benchmark (2025) – testing 64 African languages – revealed “significant performance disparities” between English and most African languages examined. In one case, researchers fine-tuning a smaller LLM (Mistral 7B) for translation saw that Meta’s NLLB still achieved the highest BLEU, chrF++ and similarity scores for Zulu and Xhosa, outperforming both the fine-tuned model and Google Translate. These evaluations underscore a consistent pattern: AI systems perform dramatically better on high-resource languages, whereas translations and language understanding for low-resource languages are often error-prone and inferior. Even big tech’s multilingual models have shown “lacklustre performance on low-resourced languages” when compared to community-trained, local models. In short, inequities in training data translate directly into quality disparities, leaving many languages behind despite the “universal” ambitions of multilingual AI.

Cultural and Political Implications of Language Exclusion

The uneven performance and support for languages in AI systems lead to profound cultural and political consequences. Researchers refer to a growing “digital language divide” – a gap between languages with ample digital content/AI support and those virtually absent online. Only a small fraction of the world’s ~7,000 languages (under 5%) have a significant presence on the internet. This divide isn’t just about technology – it translates into epistemic injustice and knowledge marginalisation. When a language lacks digital support, its speakers’ knowledge and narratives become digitally invisible or underrepresented. AI language models today cover only a tiny subset of languages and “favour North American language and cultural perspectives,” introducing an Anglo-centric bias that undermines other worldviews. In effect, English and a few major languages dominate AI systems’ training data and outputs, so content generated by these models is “filtered through a Western lens,” often neglecting diverse cultural contexts. This bias toward Anglo-American norms means that minority languages and the perspectives encoded in them are sidelined – a subtle form of cultural homogenisation and epistemic injustice in the digital sphere.

The exclusion of languages from digital infrastructure has real-world implications for equity and human rights. Communities whose languages lack AI support are caught in a “repeating cycle of exclusion,” unable to fully participate in or benefit from digital services and AI advancements. Speakers of these languages face information barriers and often endure poorer service from technology – a form of systemic bias or discrimination in access to knowledge. For many Global South countries, there are even geopolitical stakes: reliance on AI tools that only work in English (or Chinese, etc.) can lead to distorted global narratives and a loss of local “epistemic sovereignty” over information. In other words, if your language isn’t represented, your history and concerns risk being filtered out or misinterpreted by dominant digital platforms.

Several stark examples illustrate these risks. In Indonesia, local officials using a health IT system discovered grave translation errors when the software attempted to handle minority languages (Javanese, Sundanese); the mistakes led to dangerous misunderstandings in medication dosage instructions. This shows how lack of localisation can literally endanger lives. Another case involved the Romani language: Google Translate’s inclusion of Romani, without proper safeguards, reportedly enabled police in Hungary and Romania to surveil and target Romani communities by misusing automated translations. Here, adding a marginalised language to AI systems without community consent or context actually facilitated harm against a minority group. Such incidents demonstrate that technological inclusion done wrong can backfire, reinforcing power imbalances. Indeed, the “technological bias” sends a message that some languages (and by extension, their people) matter less in the digital world, especially when the same privileged languages always perform best and receive the most attention.

Toward Linguistic Equity: Ethics and Policy Responses

Recognising these issues, scholars and policymakers argue that linguistic equity must be a priority in AI development. It’s not enough to “add more languages” into large models; power dynamics and biases in data and design must be addressed to truly include marginalised languages. AI ethicists note that current NLP methods often assume a one-size-fits-all, English-trained approach, which can misalign with local realities and even impose neocolonial dynamics. To avoid perpetuating injustice, community-centred and value-sensitive approaches are urged. For instance, participatory AI projects that involve native speakers in data collection, translation, and evaluation have proven feasible and effective. Grassroots initiatives (e.g. Masakhane or Lelapa in Africa) often produce higher-quality, culturally informed language tools by leveraging local knowledge.

At a policy level, various recommendations are emerging to close the “AI language gap.” In a 2025 multilingual AI policy primer, Cohere researchers call for: greater R&D investment in under-represented languages, support for creating open datasets, and international collaboration to share knowledge and best practices. Experts also emphasise that improving multilingual capabilities improves AI safety for all users, since weaknesses in one language can be exploited to spread harm across platforms. Ultimately, achieving linguistic equity in AI means treating language not just as data, but as a critical dimension of cultural diversity and justice. As one analysis put it, we must guard against “westernised cultural homogenisation” in AI outputs and ensure equal opportunities for all language communities in their self-representation and knowledge creation. In sum, bridging the multilingual performance gap is not only a technical endeavour – it is an ethical imperative to empower minority languages, protect cultural heritage, and promote a more inclusive digital future for speakers of all languages.

Sources: Recent academic studies and industry reports were used to ensure up-to-date information and examples (2022–2025). Key references include peer-reviewed benchmarks (e.g. FLORES-101), Meta AI’s NLLB project results, Google’s zero-resource translation initiative, as well as analyses from AI ethics and linguistics experts on the impacts of linguistic exclusion. These illustrate both the technical performance gaps and the broader socio-cultural stakes of linguistic diversity in AI.

References
- Arxiv (2023). Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice.
- Bold Insight (2023). Context, accents, and privacy: Overcoming hurdles of AI translation tools in global UX research.
- Ethnologue (2022). How many languages are there?
- Highberg Insights (2025). Erik de Ruiter, Is Generative AI the Digital Tower of Babel? Pride Comes Before a Fall.
- Johns Hopkins University (2025). Multilingual Artificial Intelligence Often Reinforces Bias.
- Le Monde (2024). Google bets on African languages including Dyula, Wolof, Baoulé, and Tamazight.
- Mozilla (2025). Towards Truly Multilingual AI: Breaking English Dominance.
- RSIS International (2021). Accuracy of Google Translate in Translation of English–Kiswahili and Kiswahili–English Newspaper Headlines.
- Reddit (2024). Google’s AI translations are a disaster for my language (Manx).
- Wikipedia (2025). Google Translate.
- Wired (2016). Meet Noto, Google’s Free Font for 800 Languages.
- Google Blog (2016). Preserving Endangered Languages with Noto Fonts.
September 12, 2025
Ethnographic Methods in UX

Digital confirmation is not the point of commitment — it’s the starting point of negotiation.

At departure counters, passengers often arrive already having checked price and options online. They have scanned the available durations, noted wrapping and insurance, and assessed costs. The digital interface suggests a linear sequence: review, select, confirm. But at the counter the sequence bends. Clarification comes first. Duration is recalculated. Delivery timing is reconsidered. The value of protective services is weighed again. And then, just before the bag is placed on the machine, hesitation.

That hesitation clusters before the physical act.

Ethnographic research in user experience (UX), as defined by the Nielsen Norman Group, is about observing behaviour in its natural context rather than relying solely on what users say they do or what flows imply. In airport service environments, context matters: time pressure is visible; security procedures are fixed; staffed service counters replace lockers; and multiple services — storage, wrapping, shipping, insurance — converge at one physical point. The staffed desk is not a digital endpoint. It is a risk-resolution point.

The website communicates cost.
The counter resolves uncertainty.
The machine enacts commitment.

Booking sites for left luggage and carry-on services present multiple options — storage, wrapping, insurance — often at similar navigational priority. That structure can imply a digital completion channel. What behaviour shows is different. Passengers use digital touchpoints primarily to validate price and orient themselves. They approach the counter to clarify details, adjust duration, and negotiate edge cases. Only when the bag is placed on the scale, fed through X-ray, or loaded into the wrapping apparatus does the decision crystallise into irreversible action.

Studies of the factors influencing the use of airport self-service technologies, such as check-in kiosks, find that the need for human interaction and reassurance remains a significant influence on whether and how passengers engage with automated options. Some segments of travellers prefer staff assistance even when automated channels are available, which helps explain why digital adoption does not always map to digital completion.

In service design, this makes a difference. A blueprint might assume that digital confirmation equals commitment. Observation shows the opposite: commitment happens at the physical threshold. Before that, price, duration, and risk are recomputed in dialogue with staff. After the bag crosses the counter or machine, negotiation stops and the service begins.

Designing space or procedures around the assumption that commitment happens online risks misalignment. Information may be staged too late. Staff may be positioned as transaction processors rather than clarification agents. Queues may be treated as simple friction, rather than visible negotiation under constraints. Digital abandonment may be misinterpreted as failure, when in fact it may be expected pre-counter validation under tension.

Ethnographic observation reframes the system by locating the true decision point in a service ecology and aligning design choices to it. In departure environments, that decisive moment is not a click. It is the instant the passenger — under time pressure, risk awareness, and social negotiation — lets go of the bag.

August 22, 2025
Heuristics as Reflective Practice

What’s left out when we rely on heuristics?

We were reviewing an onboarding flow for a government-facing service, a design that, on paper, respected several usability heuristics: clear feedback, minimalist design, consistency. But one line stood out. A user reached a confirmation screen, and the system displayed a success message in bright green with a tick. “You’ve completed this stage,” it said.
Only, they hadn’t.

The heuristic flagged it as a violation of the match between the system and real-world expectations. But what it didn’t show us was why the message had been written that way, or why no one had changed it. It had passed three rounds of internal review.

In follow-up interviews, a participant summed it up:

“It says I’m done, but I know I’m not. So I don’t trust it.”

The issue wasn’t just mislabelling. It was a small moment of institutional self-protection, a pattern of overpromising to reduce call centre load. The green tick wasn’t just a mistake. It was a compromise.

Heuristics are sharp, but shallow. They bring clarity at the cost of context. When you evaluate against them, you often find problems quickly, but that speed can lull you into stopping too soon.

In the case I mentioned, the heuristic diagnosis might have ended the inquiry: “Success message needs better labelling.” But lived behaviour showed us more. That surface error had roots in deeper dynamics: organisational habits, legacy fears, even the KPIs used to evaluate service calls.

This wasn’t triangulation; we didn’t combine methods. We traced meaning through layers of context, beginning from a violated heuristic and unfolding outward.

I’ve learned to treat heuristics as invitations to reflect, not conclusions. But that took time. Early in my practice, I used them like scorecards. What shifted was realising that a flagged issue isn’t the end of an investigation, it’s the beginning of one.

What changed was seeing the heuristic not as an authority, but as a prompt: something that signalled where to slow down.

Sometimes, it’s that slowness that makes the method meaningful.

I used to treat them as fixed criteria. Now I see them more as evolving patterns of judgement, context-sensitive, culturally dependent, and shaped by the teams that apply them.

Take Nielsen’s “Consistency and Standards.” It makes sense, but what counts as consistent varies with platform, history, and expectation. A swipe gesture may be intuitive in one app, opaque in another.

Heuristics don’t resolve those differences. But they surface where reflection is needed.

They don’t answer the question. They show you where to ask one.

One thing that helped was adapting the list itself. In one project, we created a localised version of the ten heuristics to evaluate internal tools. We combined Nielsen’s principles with our org’s own interface patterns. We even named the tensions, like where error prevention conflicted with user control. That friction didn’t weaken the evaluation. It made it real.

Heuristics became more useful when we stopped pretending they were universal.

Always. Especially in time-sensitive projects or procurement-led settings. There’s a comfort in having something objective to point to — “This violates heuristic #5” — even if, as I explored in The Method is the Medium, that objectivity is often constructed.

But that’s where we risk mistaking the method for the meaning. Heuristics aren’t answers, they’re artefacts of judgement.

A good heuristic shows you where something’s broken. A reflective team asks why it broke that way.

To be clear, I still use them. They’re fast, communicable, and surprisingly durable. But I no longer treat them as the end of the conversation.
They’re a first filter, not a full account.

As my personal reflection, I keep returning to the moment the user said, “I don’t trust it.”

The interface followed most of the rules, and yet trust broke. That tension stays with me. It reminds me that evaluation isn’t just about identifying problems, but understanding what they mean.

Heuristics can spotlight friction. But they rarely explain its cause.

So I use them, but never alone. What matters is what we do after they show us something. How we resist the temptation to tidy the issue away.
Because sometimes, the problem isn’t the design. It’s the compromise behind it.

August 22, 2025
Silent Data: What We Don’t Capture

What if our tools are filtering out the most human parts?

The tension emerged in a familiar form: a scroll heatmap with a flat line, no visible clicks, and a near-total bounce rate. To the team, the conclusion seemed obvious. “No one’s engaging with this page,” someone said. “We should move the CTA up.” But that verdict felt strangely hollow. I remembered the session, or more precisely the user who had lingered, breathed, scrolled down slowly, then paused for a long time. No interaction, no click. But not nothing either.

What was happening in that pause? The tools offered no answer. Just absence, rendered as failure.

A case where the screen recording failed to explain behaviour

This particular session was a composite, drawn from multiple rounds of moderated testing for a content-heavy landing page. The metrics were bleak. No clicks below the fold. High exit rate. But one participant’s recording showed a curiously slow interaction. They read every line. They scrolled with care, almost hesitantly, as if searching for something unnameable. Then they left. No questions asked. No indication of confusion.

In a follow-up call, they described the experience as “a bit too much at once… but I didn’t want to rush it.” There was a kind of respect in their slowness. They weren’t bouncing, they were processing.

Yet to the analytics layer, it looked like disinterest.

This gap, between presence and interaction, between what is sensed and what is tracked, became the lens through which we revisited other sessions.

What was missing: pauses, gesture, emotional tone

This is where language fails us: we say “user behaviour,” but record only motion. We track taps and scrolls, not silences or furrowed brows. A participant’s pause, sometimes a full ten seconds, often reveals more than any quote. Their hand hovering over a button. A slight shift in posture. A sigh.

None of it captured. None of it categorised.

Quantitative tools can mask this with granularity. Qualitative tools can distort it with narrative. But it’s in the unscripted space between them, between what’s said and what’s stored, that some of the most meaningful signals live.

We began to notice these moments more deliberately:
• When someone hesitated to criticise.
• When eyes flicked sideways toward an unclicked element.
• When a scroll stopped, not due to confusion, but consideration.

Noticing required us to slow down too.

This observation sits in close kinship with the approach described in Ethnographic Methods in UX, where presence and pacing guide what becomes legible.

Mixed methods, speculative prompts, and analogue note-taking

To make room for this slower noticing, we tried something modest: writing by hand during sessions. Not transcriptions or timestamped events, just impressions. Mood. Pacing. Fragmented phrases like “leaning back, smile faint.” It changed how we listened. The act of writing slowed our response time and made us more porous.

Later, in synthesis, these notes became a soft frame, not the final word, but a cue to revisit recordings with new eyes. They led us to ask different questions:
• Not “What went wrong here?” but “What might have passed unspoken?”
• Not “How many dropped off?” but “When did engagement shift?”

We also ran a short trial with speculative methods, lightly inspired by cultural probes. Participants were invited to mark, title, or sketch moments on the interface where they felt something: hesitation, tension, clarity, delight. Some drew lines around white space. One labelled a content block “a pause I needed.” Another crossed out a CTA with the word “too soon.”

The results were imprecise, and not designed to be codified. What mattered was the permission: to surface inner tempo, to let emotional tone enter the frame. These activities revealed structure not by precision, but by association. They made visible what the tools had made mute.

This reframing of method is explored further in The Method is the Medium, where tools are treated as ethical filters, not neutral instruments.

Personal reflections:

What changed most was how I read a session. I used to begin with tasks and outputs: what got done, what didn’t. But now I attend first to tempo. Did something slow down? Speed up? Did the participant’s voice falter? Did the air change?

These shifts often point to something just outside articulation, not yet named, but present. I don’t claim to capture it fully. But I no longer assume absence is absence. Some things don’t show up in the tools because they aren’t meant to. They are felt, not logged.

That doesn’t mean we abandon structure. It means we expand our sensitivity. A good tool helps us measure. A good researcher learns to feel what the tool leaves out.

Note: This echoes some of the tensions raised in What Research Forgets, where silence isn’t failure but unacknowledged form.

August 22, 2025
Writing UX Research for Humans

Why are UX research reports so often unreadable?

UX research reports are meant to clarify. Yet many of the ones we write, or read, feel unreadable. In one study, a participant cried during a session. But by the time the quote appeared in the final slide deck, it had been reframed as: “Emotional user response highlights latent frustration with service inefficiencies.”

Technically, the meaning was intact. But what was lost wasn’t just the emotion — it was the cadence, the context, the small moment that made the tear matter.

Many UX researchers write as if they’re still proving the value of their discipline. The result is often a defensive posture: hedged language, passive constructions, over-polished charts. We end up with something that looks professional, but says very little.

1. A phrasing that emptied the finding

One project still stays with me. We were testing a prototype for a scheduling tool built for shift workers. Participants were clear — almost blunt — in how they spoke. “It’s confusing,” one said. “I’d rather just call in.” Another asked, “What happens if I swap two shifts and forget to confirm?”

In the report, those became: “Users request more intuitive flows,” and “There is a need for clarity around confirmation logic.” At the time, I thought I was being careful — clear, neutral, professional.

The stakeholders nodded. Then we moved on.

Weeks later, preparing a different deliverable, I returned to the transcripts. I reread what one participant had said — this time with the recording open. “If I mess up, I won’t know until my manager calls me. I just… don’t trust it.” That version made it into a new draft, almost unchanged. It wasn’t just more vivid — it gave the reader a reason to pause.

Not because it was better written. Because it made the user present.

2. Writing as interpretation, not transcription

The move from raw research to report is often treated as a delivery task. But in practice, it’s interpretive. We decide what tone to strike, what order to reveal things, where to hold back. We choose whether to tell a story or flatten it into a theme.

Writing, in this sense, isn’t post-processing — it’s synthesis. A nested clause can hide agency. A passive phrase can sound objective while displacing responsibility. The choice to list findings versus narrate them isn’t neutral either.

I’ve seen two reports from the same study produce completely different reactions. One was shelved, the other sparked roadmap changes. The difference wasn’t the data — it was the stance. One report left room for uncertainty, the other rushed to make sense.

This isn’t about style. It’s about what kind of knowledge we’re producing, and for whom.

3. The ethics of phrasing, and its hidden actors

Interpretation carries responsibility. Passive voice can seem gentle but often hides cause: “Users were confused by the interface” avoids the question of why the interface was confusing.

The same is true of phrases like “Users need more education.” It subtly locates the problem in the user, not the system. These aren’t just linguistic choices — they’re ethical ones.

In one report review, I noticed how a single stakeholder’s comment — “Let’s avoid blaming the design” — quietly shaped the whole tone. We changed “Users struggled with unclear icons” to “Some users had differing expectations of icon meanings.” The revision softened the feedback. But it also shifted responsibility away from the product team.

Bias enters this way. Through emphasis. Through sequencing. Through whose voice we make audible, and whose we paraphrase.

To write is to decide what matters. And in research, those decisions shape what gets funded, fixed, or ignored.

Personal reflections:

I’ve come to see writing not as a way to end a research project, but as a way to stay inside it longer. Writing, when I let it interrupt me, helped me listen again. It showed me what I’d misunderstood or rephrased too quickly.

This piece began as a question about how to make UX research reports more readable. But I’ve realised I was also asking something else: What kind of attention does a quote deserve? And what kind of writing might allow us, just briefly, to hear it properly?

August 22, 2025
UX Research and Sensemaking

What do we do when the data doesn’t agree?

In one study, we heard two conflicting things about the same feature. Some users described it as “super intuitive, just works”, while others couldn’t even find it. Both accounts came from usability sessions using the same prototype. The scroll heatmaps weren’t conclusive either: most users reached the section, but some bypassed it altogether, either scrolling too fast or getting distracted by another element.

At first, we treated it like a clarity problem. Maybe one group misunderstood the task. Maybe the findings were noise. But then came the temptation: we could just resolve the tension. Choose a story. Pick the more strategic finding. Tell the team what they needed to hear.

That pressure, subtle, unspoken, was real. But clarity, in this case, would have meant flattening something valuable: the contradiction itself. Because it wasn’t a glitch in the data. It was the pattern.

Step 1: The contradictory findings

What made this harder was that both groups of users were “right”. The design hadn’t failed outright, but it hadn’t adapted either. For some, its invisibility was a mark of success, seamlessly embedded in their workflow. For others, the same invisibility meant confusion. When we filtered by device, we saw that most struggling users were on smaller screens. When we looked again at the session videos, a pattern emerged: the ones who understood the feature tended to arrive via a different entry point, such as a help article or a campaign page.

The contradiction was real, not apparent. And that distinction mattered. We weren’t uncovering noise. We were surfacing a structural inconsistency, a product behaviour that held up differently under varied conditions.

One user put it plainly: “Oh, I thought that was just a label — not something I could click.” That comment shifted everything. The element hadn’t changed. But the reading of it had.

If we’d treated the dataset as binary, “some users found it, some didn’t”, we’d have missed what the contradiction was pointing to: a split in how the feature was contextualised. The disagreement wasn’t the problem. It was the clue.
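The filtering step above can be sketched as a small breakdown by context rather than a single found/not-found count. This is an illustration under stated assumptions: the records below are invented, not the study's data, and `found_rate_by` is a hypothetical helper name.

```python
from collections import Counter

# Made-up session records: (screen size, entry point, found the feature?)
sessions = [
    ("small", "search", False),
    ("small", "search", False),
    ("small", "help article", True),
    ("large", "campaign page", True),
    ("large", "search", True),
    ("large", "help article", True),
]


def found_rate_by(records, key_index):
    """Share of sessions that found the feature, grouped by one context column."""
    totals, found = Counter(), Counter()
    for record in records:
        key = record[key_index]
        totals[key] += 1
        found[key] += int(record[2])
    return {key: found[key] / totals[key] for key in totals}


by_screen = found_rate_by(sessions, 0)  # group by screen size
by_entry = found_rate_by(sessions, 1)   # group by entry point
print(by_screen)
print(by_entry)
```

Read as one binary total, the same records would only say “some found it, some didn’t”. Grouped by context, they start to show where the split sits.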

Step 2: Sensemaking as iterative framing

This was where sensemaking became a method in itself. Not a step after analysis, but an approach threaded through it.

We used Karl Weick’s view of sensemaking as “retrospective, social, and identity-driven” to ask: what frame are we applying, and how are we adjusting it in response to the evidence? Donald Schön’s idea of “reflection-in-action” also helped: design and interpretation are not separate; the way we read the field already reshapes it.

Rather than declare an insight, we mapped multiple storylines:
- One for users arriving through guided flows.
- One for those landing from search.
- One for support-driven re-entries.

These were not segments or personas, but frames: interpretive pathways through which the same UI was being read. Each one highlighted different tensions. The “seamless” experience, for instance, only held up when onboarding had done the work upstream. In other contexts, the same feature became invisible in the wrong sense: under-described, under-signalled.

We returned to the prototype and re-ran two sessions. This time, we narrated aloud what we thought the experience would communicate, and asked users to do the same. Their interpretations broke our assumptions open. A coloured label we considered purely functional turned out to be read as a badge, making the element seem less interactive than intended. That changed how we framed the scroll data. Suddenly, the speed of descent made more sense.

Sensemaking here wasn’t resolution. It was a method of holding disagreement long enough to let new patterns surface.

Step 3: The ethics of interpretation

But we had to decide. The roadmap team was asking: should this feature be flagged for redesign?

Here the ethical dimension surfaced. Interpretation wasn’t neutral. If we led with the seamless users, we risked exclusion. If we led with the struggling ones, we risked undermining a design that worked well in specific flows. Either choice would be read back as the finding.

We brought in a facilitation technique: instead of framing one interpretation as dominant, we presented the three narrative lines side by side, asking stakeholders to annotate the friction points they found most risky. It changed the conversation. Suddenly, they weren’t asking “which story is true?”, but “which gap matters most to address?”

That shift in posture, from validation to framing, became the real outcome. It didn’t flatten complexity. It located it.

Personal reflections:

I learned that patience isn’t just a soft skill here; it’s methodological. What felt like indecision was actually a refusal to oversimplify.

Sensemaking, in this context, wasn’t a path to consensus. It was a way of preserving ambiguity until its contours became meaningful. In retrospect, what I thought was a stuck moment, where nothing agreed, was actually the turning point. That’s where we began to see the outlines of a system in flux, not a feature in failure.

August 22, 2025
Triangulation in UX Research

Why does the call for rigour so often collapse into a checklist?

Triangulation promises rigour. But too often, it delivers repetition. Researchers reach for it not to surface contradiction, but to secure consensus: three methods, three data types, one tidy insight. It becomes less a practice of inquiry than a form of pre-emption: anticipate critique by outnumbering it.

This article explores what triangulation in UX research might become if we resist that instinct. Not as a seal of truth, but as a process of interpretive tension. What happens when we treat friction between methods as a sign of insight, not failure?

The scenario that follows is fictional, but drawn from real patterns I’ve seen repeatedly in practice. The aim is not to simulate realism, but to examine how meaning takes shape when signals disagree.

A familiar trap

Picture a budgeting tool, just launched. In early usage, one pattern catches the team’s attention: users consistently ignore the pencil icon beside each category name, the one meant for renaming. Clip after clip shows it: people scroll, pause, skip.

The speculation begins.

“She doesn’t even see the rename icon.”
“He’s moving too fast to personalise.”
“They’re not trying to make it their own.”

A story forms: users aren’t engaging with customisation, they just want to get through setup.

It’s the kind of insight that travels easily. In one version, it supports a minimalist roadmap. In another, it’s flagged as a design flaw. Either way, the story seems self-evident. The behaviour is there. It repeats. It looks conclusive.

But behaviour alone doesn’t explain pacing, priority, or mental load. A pattern is not a preference. Not yet.

So the team brings in other data sources, not for confirmation, but to widen the aperture.

Three signals, no closure

Scroll maps show a hesitation near the top of the page: users slow down at the category labels, then move smoothly past. Click data reveals high interaction with the “+” icon (adding categories) but almost none on the pencil icon. These signals repeat the pattern, but don’t yet explain it.

Then, in a small interview set, a few participants give a different view:

> “I knew I’d want to rename things later, but it didn’t feel urgent. I just wanted to start tracking.”
> “I got what the categories meant — even if the names weren’t perfect. I made the change in my head.”

These aren’t disengaged users. They’re strategic. They’re managing attention, not opting out.

Suddenly, the scroll pause looks like momentary mapping, not confusion. The absence of a click becomes a form of deferral, not disinterest. The intention to personalise is there, but postponed, not performed.

This is where triangulation becomes something more than method layering. The different signals don’t converge. They collide. And the contradiction becomes instructive.

Contradiction as method

When triangulation is treated as proof, disagreement between methods looks like error. One source must be “off.” But when it’s treated as dialogue, disagreement becomes a doorway.

Here, the screen recordings offered one lens: what people skipped.
The scroll data offered another: where attention slowed.
The interviews reframed both: users were deciding what to delay.

Together, they changed the frame of inquiry. Not “Do users personalise their budget?” but “How do users pace their understanding of a system that feels incomplete?”

In practice, this shifted the structure of our readouts. Instead of presenting “findings,” we began mapping tensions: quote vs heatmap, interview vs clickstream. Anomalies were preserved, not smoothed out. We stopped looking for alignment, and started looking for friction.

As in A Week of Searching, contradiction became a signal of interpretive depth.
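For readers who keep their synthesis in a script or notebook, the tension mapping described above can be sketched in a few lines. This is a hedged illustration, not the format we used: `Signal`, `tensions`, and the example readings are hypothetical stand-ins for the budgeting scenario.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Signal:
    source: str   # e.g. "clickstream", "scroll map", "interview"
    reading: str  # the interpretation this source suggests when read alone


def tensions(signals):
    """Pair every two sources whose readings differ, keeping both sides."""
    return [(a, b) for a, b in combinations(signals, 2) if a.reading != b.reading]


# Illustrative readings for the rename icon; wording is assumed, not recorded.
rename_icon = [
    Signal("clickstream", "not interested in renaming"),
    Signal("scroll map", "confused at the category labels"),
    Signal("interview", "deferring renaming until later"),
]

for a, b in tensions(rename_icon):
    print(f"{a.source} vs {b.source}: '{a.reading}' / '{b.reading}'")
```

The point of the structure is that nothing gets merged. Each disagreement stays on the page as a pair, so the readout can ask what the pair means instead of which side wins.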

Personal reflections:

I’ve come to understand triangulation less as an exercise in confirmation and more as a practice of patience.

The urge to wrap things up, to move from signal to statement, is strong. Especially under time pressure, especially in stakeholder meetings. But in many of the most formative projects I’ve worked on, clarity emerged not through alignment but through tension. A quote that didn’t match the graph. A metric that refused to cohere with the story.

These moments weren’t dead ends. They were invitations to look again, not at the user, but at how we were reading them.

So now, when I write a research plan, I ask different questions:
Where might methods pull away from each other?
Where might contradiction help me see the edge of what I think I know?

Triangulation doesn’t give certainty. It gives contour.

August 22, 2025
Stakeholder Listening as UX Method

When research doesn’t land, is it wrong, or unreadable?

This is not a real project. It’s a constructed example, a fictionalised scenario drawn from patterns I’ve seen repeat across teams, industries, and projects. The quotes are imagined, but plausible. The tension is real.

We had good evidence. Interviews were clear. Quant confirmed it. Playback sessions felt aligned. But the moment we shared our findings with the wider team, the temperature changed. One stakeholder frowned. Another asked, “Where’s the commercial angle?”

It wasn’t confusion; it was misalignment. The findings didn’t land. Not because they were flawed, but because they couldn’t be read.

At the time, we assumed this was a presentation problem. In hindsight, it was a listening problem, not with our users, but with the organisation itself. We had treated the business as a backdrop. Static. Not something in motion, with its own anxieties and codes. We had done research, but failed to translate it.

This article explores how stakeholder listening became a method, not for appeasement, but for sensemaking. And why triangulation doesn’t just mean using multiple tools; it means listening across systems that don’t always speak the same language.

Step 1: The wrong insight, or the wrong context?

In this fictional case, the brief was clear: users needed faster access to a diagnostic tool. The product team had shaped the roadmap around that. But once in the field, our interviews told a different story. These users, clinicians working with vulnerable patients, didn’t want speed. They wanted validation. To cross-check, consult peers, follow their instincts.

We brought that insight back confidently. Quotes, behaviours, even a mapped-out moment where one doctor paused to rewatch a training module mid-task. But our synthesis landed with a thud.

It took a tense follow-up session to uncover what we’d missed: a new pricing model was in play. “Speed” wasn’t about UX; it was about cost-per-user. Fewer steps meant fewer calls to support. The goal wasn’t task flow; it was operational savings.

We hadn’t failed to understand the user. We’d failed to understand the organisation.

Step 2: Listening upstream, a second round of research

After that moment, the team did something unplanned. They paused external fieldwork and re-entered the business. Not to get approval. To investigate.

They ran a second round of interviews, this time with internal teams. Product, legal, customer support, finance. Not stakeholder check-ins, but structured conversations. What were people worried about? Where had previous efforts gone wrong? What did they really think research could change?

This second wave revealed patterns:
- Product managers wanted user-centred change, but not if it meant shifting delivery dates.
- Legal teams feared ambiguity in insight statements; they needed defensibility.
- Support leads were burnt out, and saw new UX flows as potential ticket generators.

What emerged weren’t feature requests. They were thresholds. Unspoken constraints. The team realised they’d been treating internal interviews as logistics. In truth, they were a second dataset.

This was triangulation, but not across methods. Across perspectives. Internal and external evidence, layered not to validate, but to surface contradiction.

Step 3: Translation without appeasement

One phrase from a delivery lead stayed with the team: “We just need confidence.” At first, it sounded like obstruction. But in follow-up, they understood it differently. Confidence, here, meant traceability. The ability to stand by a decision if challenged internally or legally.

This realisation reshaped their approach. In their final insight document, they added a new section: crosswalks. Each key finding was accompanied by links to roadmap priorities, risk concerns, and compliance constraints. Not to dilute the user voice, but to protect it in unfamiliar territory.
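A crosswalk of this kind can be as simple as one record per finding. The sketch below is one possible shape, offered as an assumption, not the team's actual template: the `Crosswalk` class, its field names, and the example entry are hypothetical, with the wording borrowed from the fictional scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Crosswalk:
    finding: str
    evidence: list[str]  # quotes or session references behind the finding
    roadmap_priorities: list[str] = field(default_factory=list)
    risk_concerns: list[str] = field(default_factory=list)
    compliance_constraints: list[str] = field(default_factory=list)

    def untraced(self) -> list[str]:
        """Names of the link categories still empty for this finding."""
        links = {
            "roadmap_priorities": self.roadmap_priorities,
            "risk_concerns": self.risk_concerns,
            "compliance_constraints": self.compliance_constraints,
        }
        return [name for name, items in links.items() if not items]


# Example entry; the content is illustrative, drawn from the scenario above.
entry = Crosswalk(
    finding="Clinicians want validation, not speed",
    evidence=["Doctor paused to rewatch a training module mid-task"],
    roadmap_priorities=["Faster access to the diagnostic tool"],
    risk_concerns=["Support call volume"],
)
print(entry.untraced())
```

The `untraced` check is the useful part: it shows where a finding cannot yet be stood behind if someone challenges it, which is what “confidence” meant in that room.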

This didn’t mean conceding to the business. It meant helping insights survive the journey.

Stakeholder listening, in this sense, wasn’t about pleasing people. It was about translation: making insight legible in a space where multiple value systems operate. And resisting the urge to resolve contradiction too early.

Personal reflections:

Writing this fictional scenario helped me test my own assumptions. I’ve sometimes approached stakeholder interviews as background: necessary for alignment, but outside the frame of the research itself.

What I now see more clearly is that internal listening is interpretive work. It’s not only about extracting priorities, but about sensing where language falters, where a stakeholder can’t quite name what’s at stake, and the researcher’s task becomes one of translation.

The cost, in this case, is letting go of the idea that research speaks for itself. It doesn’t. It needs help crossing thresholds, not just from insight to decision, but from meaning to meaning.

August 22, 2025