Entity Disambiguation: How Google Figures Out Which “Apple” You Mean

Search for “John Williams” and Google has a problem. There’s the composer who scored Star Wars. There’s a professional wrestler. There’s a venture capitalist. Same name, three different people, plus an unknown number of less famous John Williamses.

Search for “Python” and the problem repeats. The programming language, the snake, the Monty Python comedy troupe. Search for “Mercury” and you could mean the planet, the element, the car brand, or the musician. Search for “Apple” and you could mean the company or the fruit.

This is the entity disambiguation problem, and solving it is core to how modern search works. Google has to figure out which specific entity you mean, and which specific entity each page is about, before it can match the two. This isn’t new, and it isn’t speculative. Google has held patents on it for years, and a major change to the Knowledge Graph in 2025 shows they’re investing in it harder than ever.

Named Entities Are Most of Search

A named entity is a thing with a proper name. A person, a place, a company, a product, an organization. Not “coffee shop” but “Starbucks.” Not “president” but “George Washington.”

These aren’t an edge case. According to Microsoft research cited in Google’s own patent work, 20 to 30% of queries submitted to search engines are themselves named entities, and around 71% of queries contain a named entity. Most of what people search for involves a specific person, place, or thing, not a generic concept.

That makes disambiguation essential. If most queries involve named entities, and many of those names are shared by multiple entities, then a search engine that can’t tell the entities apart is matching strings, not meaning. Google moved past string matching a long time ago.

What the Patent Actually Describes

The foundational patent is US9135238B2, “Disambiguation of named entities,” filed by Google in June 2006 and granted in September 2015. The inventors, Razvan Bunescu and Marius Pasca, also published an academic paper on the same approach in 2006, so we have a clear picture of the thinking.

The patent describes disambiguating named entities using a knowledge base of articles about those entities. At the time, the knowledge base was Wikipedia, referred to in the patent as an “exemplary knowledge base.” Today the equivalent is Google’s Knowledge Graph, supplemented by Wikidata and Wikipedia.

The system builds a disambiguation scoring model from several features of that knowledge base:

  • Article titles that identify specific entities
  • Redirect pages that map aliases to a canonical entity (on Wikipedia, Samuel Clemens redirects to Mark Twain)
  • Disambiguation pages that list the different senses of an ambiguous name
  • Hyperlinks between articles, which establish context and relationships
  • Categories assigned to each entity

When a query contains an entity name, the system uses the scoring model to identify which article, and therefore which specific entity, the name most likely refers to, based on the other context present. It can then group or organize results by the correct sense of the name.

The key insight is that an entity’s identity is established by its context: how it’s linked, what it’s linked to, what categories it belongs to, and what other entities appear alongside it.
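To make the moving parts concrete, here is a minimal Python sketch of how titles, redirects, disambiguation pages, links, and categories could combine to resolve a name. This is not the patent's scoring model. The tiny knowledge base, the article titles, and the overlap-counting score are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    categories: set = field(default_factory=set)
    links: set = field(default_factory=set)


# Toy knowledge base. Every title, category, and link here is illustrative.
KB = {
    a.title: a
    for a in [
        Article("Python (programming language)",
                {"programming languages"}, {"interpreter", "software"}),
        Article("Python (snake)", {"snakes"}, {"reptile", "constriction"}),
        Article("Monty Python", {"comedy troupes"}, {"sketch comedy"}),
        Article("George Washington",
                {"presidents of the united states"}, {"mount vernon"}),
    ]
}

# Redirects map an alias to one canonical title.
REDIRECTS = {"Geo. Washington": "George Washington"}

# Disambiguation pages list every sense of an ambiguous name.
DISAMBIGUATION = {
    "Python": ["Python (programming language)", "Python (snake)", "Monty Python"],
}


def candidates(name):
    """Return the articles a surface name could refer to."""
    if name in DISAMBIGUATION:
        return [KB[title] for title in DISAMBIGUATION[name]]
    title = REDIRECTS.get(name, name)
    return [KB[title]] if title in KB else []


def resolve(name, context_terms):
    """Pick the candidate whose categories and links overlap the context most."""
    context = {term.lower() for term in context_terms}
    options = candidates(name)
    if not options:
        return None
    best = max(options, key=lambda a: len(context & (a.categories | a.links)))
    return best.title


print(resolve("Python", {"interpreter", "software"}))
print(resolve("Python", {"reptile"}))
print(resolve("Geo. Washington", set()))
```

A real system would learn weights for these features instead of counting overlaps, but the shape is the same: look up the candidates first, then let context choose between them.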

Two Directions of the Same Problem

A later Google patent on entity metrics draws a useful distinction between two related problems.

Differentiation is the many-to-one case. Multiple names refer to a single entity. “George Washington,” “Geo. Washington,” “the first U.S. president,” and “General Washington” all point to the same person. Google needs to recognize that these different strings resolve to one entity.

Disambiguation is the one-to-many case. A single name refers to multiple possible entities. “Georgia” could be the U.S. state or the country. “New York” could be the city or the state. “John Williams” could be any of several people. Google needs to recognize which specific entity is meant in a given context.

Both problems are solved the same way: through context and through connection to a knowledge base where each entity has a unique identity. In the Knowledge Graph, that unique identity is an entity ID. In Wikidata, it’s a Q-number (the entity for Portugal is Q45, machine learning is Q2539). These identifiers let Google reference a specific entity unambiguously, even when the human-readable name is shared by many things.

How Google Tells Entities Apart

A 2017 Google patent on knowledge-based entity detection describes tagging entities in web pages with unique identifiers that unambiguously identify them, and attaching a confidence score to each disambiguation. The signals that drive that confidence come down to context, and the strongest contextual signal is the other entities present.

A page that mentions Apple alongside iPhone, Tim Cook, and Cupertino is clearly about the company. A page that mentions apple alongside orchard, harvest, and pie is clearly about the fruit. Neither page has to state which one it means. The co-occurring entities resolve the ambiguity.

This is the practical core of disambiguation. Google identifies the candidate entities a name could refer to, then uses the surrounding context to score which candidate is most likely, then assigns the page to that entity. The more clearly your context points to one specific entity, the higher the confidence and the less room for Google to guess wrong.
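Here is a rough sketch of that candidate, score, assign loop in Python, using the Apple example above. The context terms, the add-one smoothing, and the 0.6 confidence threshold are assumptions chosen for illustration. Google's actual signals and thresholds aren't public.

```python
import re

# Hypothetical context terms for each candidate entity behind the name "Apple".
CANDIDATE_CONTEXT = {
    "Apple (company)": {"iphone", "tim cook", "cupertino"},
    "Apple (fruit)": {"orchard", "harvest", "pie"},
}


def disambiguate(page_text, candidate_context, min_confidence=0.6):
    """Return (entity, confidence), or (None, confidence) if the page is too ambiguous."""
    text = page_text.lower()
    scores = {}
    for entity, terms in candidate_context.items():
        # Count whole-word matches only, so "pie" does not match "piece".
        hits = sum(
            bool(re.search(r"\b" + re.escape(term) + r"\b", text))
            for term in terms
        )
        # Add-one smoothing keeps every candidate's share above zero.
        scores[entity] = 1 + hits
    best = max(scores, key=scores.get)
    confidence = scores[best] / sum(scores.values())
    if confidence < min_confidence:
        return None, confidence
    return best, confidence


company_page = "Tim Cook unveiled the new iPhone at Apple's Cupertino campus."
fruit_page = "We picked apples at the orchard after harvest and baked a pie."
bare_page = "I like Apple."

for page in (company_page, fruit_page, bare_page):
    print(disambiguate(page, CANDIDATE_CONTEXT))
```

The bare page is the interesting case. With no co-occurring entities, both candidates score the same and the function declines to pick one. That's the situation you want to keep your own pages out of.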

Why This Matters More Now

Entity disambiguation has long been part of how Google works, but the company has recently made it a visible priority.

According to Knowledge Graph tracking from Kalicube, published by its founder Jason Barnard in Search Engine Land, Google ran what’s been called a “clarity cleanup” in June 2025. Over two updates in a single week, the Knowledge Graph contracted by 6.26%, removing more than 3 billion entities. It was the largest contraction in a decade. Ambiguous “Thing” entities dropped by around 15%, and temporary event entities (many added during the pandemic) were hit hardest, with close to 77% removed.

The interpretation, which Google hasn’t officially confirmed, is that this was a deliberate move to trade volume for clarity: a leaner, higher-confidence set of entities to underpin AI features like AI Overviews and AI Mode. The logic tracks. If AI-generated answers are going to cite and rely on entities, those entities need to be unambiguous and well-defined. Ambiguous entities are a liability when a system is generating direct answers rather than just ranking links.

The practical takeaway: if Google is prioritizing high-confidence, clearly disambiguated entities, then content tied to vague or poorly defined entities is at a disadvantage. Clear entity identity is no longer just a ranking nicety. It’s increasingly a requirement for visibility in both traditional and AI search. This connects to the AirOps study covered a few weeks ago, where entity recognition correlated with citation likelihood in AI search.

What You Can Actually Do

Disambiguation is something you can influence. The goal is to make Google’s job easy by removing ambiguity about which entity your content is about. A few concrete things help.

Surround your entity with related entities. This is the strongest signal and the easiest to apply. If your page is about your software company, make sure your brand name appears alongside the entities that establish that context: your product names, your category (project management, email marketing, whatever it is), your integration partners, your competitors, your founders. Don’t leave Google to guess from a bare brand name.

Use the sameAs property in your schema. Link your entity to its authoritative profiles: Wikipedia, Wikidata, LinkedIn, Crunchbase, official social accounts. This gives Google a verification anchor that removes ambiguity directly. Instead of inferring which entity you are, Google gets an explicit statement that the entity on your site is the same as a specific, known entity elsewhere. This is the entity-linking layer that templated plugin schema doesn’t handle, and it’s worth doing by hand for your most important pages.
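As a sketch of what that looks like, the short Python script below builds Organization markup with a sameAs list and prints it as a JSON-LD script tag. The company name, the URLs, and the Wikidata Q-number are placeholders, not real profiles. Swap in your own.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build schema.org Organization markup that links to known profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


# All values below are hypothetical placeholders.
schema = organization_jsonld(
    name="Example Software Co.",
    url="https://www.example.com",
    same_as=[
        "https://www.wikidata.org/wiki/Q0000000",  # placeholder Q-number
        "https://www.linkedin.com/company/example-software-co",
        "https://www.crunchbase.com/organization/example-software-co",
    ],
)

print('<script type="application/ld+json">')
print(json.dumps(schema, indent=2))
print("</script>")
```

Paste the printed block into the head of the page that represents the entity, usually your homepage or about page, and only list profiles that really are yours.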

Match your schema type to your entity. A page with Organization schema declaring a company is read differently than a page with Recipe schema mentioning an ingredient. The schema type sets expectations that help Google interpret the content correctly.

Get a Wikidata entry if your entity qualifies. A Wikidata item with a Q-number gives your entity a machine-readable identity in a database Google and the major AI systems all use. Once it exists, your sameAs schema can point directly to it. Note that Wikidata has notability requirements and entries that don’t meet them get deleted, so this applies to entities with genuine independent coverage, not every small business.

Be consistent with naming and build external corroboration. Use the same entity name consistently across your site and your off-site presence. Google builds confidence in an entity’s identity when multiple trusted sources corroborate the same facts. Brand mentions, industry directory listings, and coverage in credible publications all reinforce entity recognition, even without links.

The Takeaway

The same-name problem isn’t new and it isn’t going away. Google has been working on entity disambiguation since at least 2006, and the 2025 Knowledge Graph cleanup shows the company prioritizing clear, high-confidence entities for the AI era rather than backing off.

Your job is to make the disambiguation easy. Establish clear context around your entities. Connect them explicitly to the knowledge bases Google already trusts. Be consistent. The less Google has to guess about which entity your content is about, the more reliably it can match your content to the right searches, in traditional results and in AI answers alike.

When Google encounters your “Apple,” there should be no question which one you mean.
