“Information gain” has become one of the more useful concepts in SEO content strategy. The idea is intuitive: pages that say something the other ranking pages don’t say have a structural advantage. Google can measure novelty, so content that adds something new to the conversation should perform better than content that restates what’s already out there.
Most SEOs who talk about information gain have the right mental model. Where it gets interesting is that the mechanics behind it are more specific than most people realize, and there’s a second patent that adds an entirely different dimension that almost nobody in the SEO space is discussing.
The Patent Most SEOs Are Actually Describing
When SEOs talk about information gain as a content strategy, what they’re describing, whether they know it or not, is the logic behind US8140449B1, “Detecting Novel Document Content.” Filed in 2006, granted in 2012. Inventors are M. Bharath Kumar and Krishna Bharat.
This patent operates at the corpus level. It scores documents based on how much novel content they contain relative to all other documents on the same topic. This is the “does this page say something the other 500 pages on this topic don’t?” mechanism.
It was originally designed for Google News, where multiple outlets cover the same story and the system needs to surface articles that break new ground rather than repeat what’s already been reported. But the mechanics apply to any set of documents on a shared topic.
Most SEOs have the concept right. But the patent’s mechanics are more granular than the general advice suggests, and that granularity is where the real strategic value is.
How the 2006 Patent Actually Works
The system breaks documents into two components: information nuggets and interactions.
Information nuggets are specific, trackable units of information. Named entities (people, places, products, concepts), numbers, and word sequences from the document title. These are concrete. “Server performance can vary” is not a nugget. “A 2GB RAM allocation on a Paper server handles approximately 15 concurrent players before TPS drops below 18” contains multiple nuggets that likely don’t exist in other pages on the same topic.
Interactions are pairs of nuggets appearing in close proximity. The system doesn’t just track whether an entity appears in a document. It tracks whether entities appear in relationship to each other. A page that mentions “Prince Charles” and “Camilla Parker-Bowles” separately scores differently than one that connects them in the same sentence. Relationships between entities matter, not just their presence.
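A toy sketch of how proximity-based interactions might be extracted. The `interactions` helper, the `window` size, and the token positions are all illustrative assumptions; the patent does not publish an exact proximity threshold or formula.

```python
def interactions(entity_positions, window=10):
    """Pairs of entities whose token positions fall within `window`
    of each other. Entities mentioned far apart produce no pair."""
    pairs = set()
    items = sorted(entity_positions.items(), key=lambda kv: kv[1])
    for i, (a, pa) in enumerate(items):
        for b, pb in items[i + 1:]:
            if pb - pa <= window:
                pairs.add((a, b))
    return pairs

# Entities connected in the same sentence yield an interaction pair:
close = {"prince_charles": 0, "camilla_parker_bowles": 3}
print(interactions(close))  # {('prince_charles', 'camilla_parker_bowles')}

# The same two entities 200 tokens apart yield none:
far = {"prince_charles": 0, "camilla_parker_bowles": 200}
print(interactions(far))    # set()
```

The point the sketch makes is the one the patent makes: presence alone is not the signal; co-occurrence in proximity is.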
Each nugget gets scored using TF-IDF. A nugget that appears in only one or two documents in the corpus scores higher than one that appears in most of them. Rarity drives the score.
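The rarity scoring can be sketched with a plain IDF-style calculation. This is a toy model, not Google’s implementation; the `nugget_score` helper and the mini corpus are invented for illustration.

```python
import math

def nugget_score(nugget, corpus_docs):
    """IDF-style rarity score: a nugget appearing in only one or two
    documents in the corpus scores higher than one in most of them."""
    containing = sum(1 for doc in corpus_docs if nugget in doc.split())
    if containing == 0:
        return 0.0
    return math.log(len(corpus_docs) / containing)

corpus = [
    "paper server ram tps players",    # doc 1
    "minecraft server hosting guide",  # doc 2
    "paper server setup and plugins",  # doc 3
]
print(nugget_score("server", corpus))  # in all 3 docs -> 0.0
print(nugget_score("tps", corpus))     # in 1 doc -> log(3), higher
```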
Here’s the part that matters for content structure: the patent includes a depth-weighting mechanism. Matches closer to the top of the document are weighted more heavily than those buried deep in the text. Where you place novel information changes its contribution to the score. Front-loading unique content isn’t just good writing advice. It’s structurally rewarded under this patent’s logic.
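Under that logic, the same nugget contributes less the deeper it sits in the document. A minimal sketch, assuming a simple linear decay; the patent describes depth weighting, but this particular curve is an assumption, not the patent’s formula.

```python
def depth_weight(position, doc_length):
    """Illustrative linear decay: a match at the top of the document
    keeps its full score; one at the very end contributes half."""
    return 1.0 - 0.5 * (position / doc_length)

score = 2.0  # hypothetical TF-IDF score for one nugget
print(score * depth_weight(0, 1000))    # front-loaded: 2.0
print(score * depth_weight(900, 1000))  # buried: 1.1
```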
One important caveat. The patent has no truth-verification layer. Novelty is measured structurally and statistically. Whether the novel content is accurate, relevant, or fabricated is not evaluated by this system. That’s presumably handled by other signals in Google’s ranking system.
The Patent Most SEOs Don’t Know About
The patent that actually gets cited most often in SEO articles about information gain is US11720613B2, “Contextual Estimation of Link Information Gain.” Filed in 2018, granted in 2023, with continuation patents granted through 2024.
It describes something fundamentally different from what most SEOs mean when they talk about information gain.
This patent is about session-level personalization. It tracks what an individual user has already viewed during a search session, then re-ranks the remaining results to surface documents with the most new information relative to what that specific user already clicked on. It’s about sequencing results for one user, not scoring whether a page has unique content compared to the rest of the web.
The mechanics: a user searches, clicks a result, reads it, comes back to the SERP. The system now knows what that user consumed. It recalculates scores for the remaining results and promotes the ones that would provide the most new information given what the user already saw. Every subsequent click triggers another recalculation.
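That recalculation loop can be sketched as a set difference over nuggets: score each remaining result by how much it adds beyond what the user already consumed. The `rerank_by_gain` helper and the data shapes are hypothetical, not the patent’s actual scoring.

```python
def rerank_by_gain(remaining_docs, consumed_nuggets):
    """Re-rank remaining results by how many new nuggets each would
    add, given what the user has already read this session."""
    def gain(doc):
        return len(doc["nuggets"] - consumed_nuggets)
    return sorted(remaining_docs, key=gain, reverse=True)

results = [
    {"url": "a.com", "nuggets": {"cause_a", "cause_b"}},
    {"url": "b.com", "nuggets": {"cause_a", "cause_c", "cause_d"}},
]
# The user clicked a page covering causes A and B:
consumed = {"cause_a", "cause_b"}
reranked = rerank_by_gain(results, consumed)
print([d["url"] for d in reranked])  # b.com first: 2 new nuggets vs 0
```

Each subsequent click would grow `consumed` and trigger another pass.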
Two scenarios from the patent worth noting. First, pogo-sticking. A user bouncing quickly back to results could cause Google to re-rank and surface content with higher information gain scores for that user. Second, follow-up searches. When a user makes a slightly modified query, Google could factor in what they already consumed to avoid serving redundant results.
The patent was also written partly in the context of automated assistants and chatbots that read content aloud. In that context, the system skips redundant information entirely. If Document 1 covered causes A and B of an error, the assistant only reads cause C from Document 2. That context may be relevant to how AI Overviews select and present information from web pages.
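The assistant scenario reduces to filtering out points already read aloud. A minimal sketch with invented document contents, mirroring the causes-A-B-C example above:

```python
def new_points(doc_points, already_read):
    """Return only the points the assistant hasn't read aloud yet."""
    return [p for p in doc_points if p not in already_read]

doc1 = ["cause A", "cause B"]
doc2 = ["cause A", "cause B", "cause C"]
already_read = set(doc1)              # Document 1 was read in full
print(new_points(doc2, already_read)) # only 'cause C' gets read
```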
This patent isn’t directly about content strategy the way the 2006 patent is. But it suggests that Google is thinking about information redundancy at the individual session level, not just the corpus level. That’s a layer most SEOs haven’t factored in.
What the Mechanics Actually Mean for Content
The general advice of “add unique information” is correct. But the 2006 patent’s specifics sharpen it considerably.
Audit the SERP before writing. Read the top 5 to 10 results for your target keyword. Note the entities they all mention, the points they all cover, the examples they all use. That’s your baseline. Table stakes. You need enough of it for topical relevance, but it won’t differentiate you.
Find what’s missing. What do you know from experience, client work, or industry knowledge that none of those pages cover? What subtopics do they skip? What specific data, tools, or examples could you include that they don’t? That’s where your information gain lives.
Lead with the novel content. Don’t open with five paragraphs restating what every other page says and then bury your unique insight in paragraph twelve. The 2006 patent explicitly weights content earlier in the document more heavily. Put your differentiated information up front.
Use specific entities. Names, numbers, tools, methodologies, data points. These are trackable nuggets under the patent’s framework. Vague statements are not. “SEO tools can help” contains zero scorable nuggets. A specific tool name with a specific use case does.
Connect novel entities to the core topic. A random unique fact with no relationship to the topic’s central entities doesn’t score well. The patent measures interactions, meaning proximity between nuggets. Novel entities need meaningful connections to the entities already relevant to the query.
Draw from sources outside the SERP. If research consists only of reading the pages that already rank, the output will be another version of those pages. The SERP is where the redundant information lives. Novelty comes from experience, practitioner knowledge, original data, industry forums, conference talks, product documentation. Somewhere other than the first page of Google.
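The audit steps above can be sketched as a set operation: entities covered by most of the top results form your baseline, and whatever your draft contains beyond that baseline is candidate information gain. The `serp_baseline` helper, the threshold, and the entity sets are illustrative assumptions.

```python
from collections import Counter

def serp_baseline(serp_entity_sets, threshold=0.8):
    """Entities covered by most top-ranking pages: table stakes."""
    counts = Counter(e for s in serp_entity_sets for e in s)
    n = len(serp_entity_sets)
    return {e for e, c in counts.items() if c / n >= threshold}

serp = [
    {"keyword research", "backlinks", "meta tags"},
    {"keyword research", "backlinks", "site speed"},
    {"keyword research", "backlinks", "meta tags"},
]
draft = {"keyword research", "backlinks", "log file analysis"}
baseline = serp_baseline(serp)
print(sorted(draft - baseline))  # what the draft adds beyond table stakes
```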
How I Have Used Information Gain to Solve Indexing Issues
In a note I shared last year, I discussed how I helped a site facing the dreaded “Crawled – currently not indexed” dilemma. You can read that note here.
In that project we reworked the content structure, and part of that work focused on information gain. As the note shows, we added a section called “Real World Examples” to those pages, something other sites were not doing.
In the FAQ section, we looked for information gain opportunities by scouring PAA questions, branching off of them, and brainstorming related ideas. We also interviewed salespeople to pull from their real-world knowledge and experience.
The result: of the 106 pages that had fallen out of the index, all 106 returned to the index after the changes we made.
As you can read in the note, this wasn’t the only change we made to those pages, but it was a significant one.
The One Question to Ask Before Publishing
If someone already read the other pages ranking for your target keyword, would your page teach them something new?
If the answer is no, you’re not adding information. You’re adding to the pile.