Google’s Open Knowledge Format: What It Is, and Who It’s Actually For

Mike Friedman, June 23, 2026

Google Cloud launched the Open Knowledge Format on June 12, 2026, and the takes overheated within days. One camp is calling it the most important AI infrastructure of the year. Another is already repackaging it as an SEO or AI-visibility tactic. Both are wrong, in opposite directions.

The reality is narrower.

OKF is a format, not a platform and not a ranking mechanism, for representing your organization’s internal knowledge as markdown files that AI agents can actually read and use.

If you run a real internal knowledge base, it’s worth understanding.

If someone is selling it to you as your next AI-SEO play, it isn’t one. They are selling you snake oil.

What OKF Actually Is

OKF is an open specification, version 0.1, published by Google Cloud’s Data Cloud team under the Apache 2.0 license. Strip away the announcement language and the format is simple.

A bundle is a directory of markdown files with YAML frontmatter.

(YAML, which stands for “YAML Ain’t Markup Language,” is a human-readable data serialization language. It is used mainly to write configuration files, automate workflows, and exchange data between systems.)

Each file represents one concept, and the file’s path is its identity. The only required field is type. Optional fields like title, description, tags, and timestamp exist for the things you want to query or filter on. The body is plain markdown.
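To make that file shape concrete, here is a minimal Python sketch that parses one concept file and checks the single required field. The sample file, its values, and the helper name parse_concept are my own illustrations, not taken from the OKF spec. The parser only understands flat key: value frontmatter and inline [a, b] lists, so a real bundle would call for a proper YAML library.

```python
# Hypothetical concept file: field names follow the article, values are invented.
SAMPLE = """\
---
type: metric
title: Weekly Active Users
tags: [growth, product]
---
Distinct users with at least one event in a 7-day window.
"""


def parse_concept(text):
    """Split a concept file into (frontmatter dict, markdown body).

    Simplified on purpose: flat `key: value` pairs and inline lists only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter opening '---'")
    try:
        end = lines.index("---", 1)  # first closing delimiter after line 0
    except ValueError:
        raise ValueError("missing frontmatter closing '---'") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [growth, product]
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value

    # `type` is the only field the article describes as required.
    if not meta.get("type"):
        raise ValueError("frontmatter must define 'type'")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


meta, body = parse_concept(SAMPLE)
print(meta["type"], meta["tags"])
```

A file whose frontmatter lacks type raises a ValueError here, which is about as much validation as the format itself seems to ask for.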

Concepts link to each other with normal markdown links, which turns the directory into a graph of relationships rather than a flat list.

That’s the whole idea. No SDK, no runtime, no proprietary account. A bundle is just files: readable in any editor, hostable in any git repo, shippable as a tarball.

It formalizes what people have been calling the LLM-wiki pattern, the one Andrej Karpathy described in an April 2026 gist: agents maintaining a markdown library because LLMs handle the cross-referencing bookkeeping that humans start and then abandon.

The same shape has shown up as Obsidian vaults wired to coding agents, as AGENTS.md and CLAUDE.md convention files, and as “metadata as code” repos inside data teams. OKF’s contribution is agreeing on the small set of conventions that let those bespoke wikis cooperate.

The problem it solves is internal context. The schema of a table, your business’s definition of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API. The knowledge that too often lives scattered across catalogs, wikis, code comments, and the heads of a few senior engineers.

Google updated its Knowledge Catalog to ingest OKF and serve it to agents, but the format itself doesn’t require Knowledge Catalog or any other Google product.

This is knowledge for agents to do work with. It is not web content, and it is not an SEO mechanism.

Cutting Through the Misinformation

Before anything else, here’s what OKF is not, because the wrong ideas are spreading faster than the right ones.

It is not a ranking or AI-citation mechanism. There is no web-search or AI-search system that consumes OKF bundles to decide what to cite. Turning your website into an OKF bundle to “get cited by AI” is a speculative repurposing, and even the people building tools to do exactly that admit you are not buying a ranking.

If you see OKF framed as the new way to show up in AI answers, that’s the hype talking. The format is also far too new for anyone to credibly make that claim.

“Vendor-neutral” comes with an asterisk. The format genuinely is open and Apache-licensed, and Google deserves credit for publishing it that way. But the demonstrated path runs through Google’s own stack: the reference producer uses Gemini, the example data source is BigQuery, and the obvious ingestion point is Google Cloud’s Knowledge Catalog. The format is neutral. The gravity is still Google-shaped.

It standardizes the container, not the meaning. OKF fixes the folder layout, the file format, the frontmatter, and a couple of reserved filenames. It does not standardize what concepts mean.

The type field is required, but type values are not registered anywhere, so one producer writes “BigQuery Table” and another writes “table” for the same thing.

Links assert that two concepts are related but not how. This is structural interoperability, not semantic interoperability, and the spec says so on purpose.
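Because type values are unregistered, anyone consuming bundles from more than one producer will probably end up normalizing them. The sketch below shows one way to do that in Python. The alias table and the function name canonical_type are hypothetical, something the consumer would maintain on its own side, and nothing here is defined by OKF.

```python
# Hypothetical alias table owned by the consumer. OKF itself registers no type values.
TYPE_ALIASES = {
    "bigquery table": "table",
    "bq table": "table",
    "table": "table",
    "kpi": "metric",
    "metric": "metric",
}


def canonical_type(raw):
    """Map a producer's free-form type string onto the consumer's own vocabulary."""
    # Lowercase and collapse whitespace so "BigQuery  Table" and "bigquery table" match.
    key = " ".join(raw.strip().lower().split())
    # Unknown types pass through in normalized form instead of being dropped.
    return TYPE_ALIASES.get(key, key)


for raw in ["BigQuery Table", "table", "Runbook"]:
    print(raw, "->", canonical_type(raw))
```

This patches over the gap on the consumer side only. Two producers still have no shared meaning for a type, which is exactly the structural-versus-semantic distinction.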

It is version 0.1. Google is explicit that this is “a starting point, not a finished standard.” Treating it as settled infrastructure is premature.

How This Compares to llms.txt

If this is starting to sound like llms.txt, that instinct is worth addressing directly. They share real DNA: both formalize the idea of giving language models clean markdown instead of noisy HTML. That is where the similarities end.

The decisive difference is the consumer relationship, and it’s what separates a useful spec from a wishful one.

With llms.txt, you are making an outward bet on consumers you don’t control. You publish the file and hope external LLM providers honor the convention. For the visibility use case it was sold on, they declined to show up. No major model provider consumes llms.txt for citation in production.

In one analysis, 97% of domains that had the file received zero requests for it in a month. Google’s Gary Illyes said in 2025 that Google doesn’t support it and isn’t planning to, John Mueller compared it to the discredited keywords meta tag, and Google’s own May 2026 guide on generative AI features listed machine-readable files like llms.txt under “mythbusting.”

Its one genuine niche is narrow: developer documentation consumed by AI coding assistants like Cursor and Claude Code, where it saves tokens.

OKF pays off with consumers you control. You build the bundle for your own agents, your own support bot, your own team. The value doesn’t depend on Google or OpenAI adopting a convention. It depends on you wiring your agent to read it. You are both the producer and the consumer, or you choose the consumer.

The contrast makes the point on its own. Google waved off llms.txt for AI search in May, in a section literally labeled mythbusting, then shipped OKF for agent knowledge weeks later. Same company, drawing the same line: between a speculative visibility play and a concrete utility.

One fairness caveat: llms.txt is not worthless, and OKF is not guaranteed to win. OKF’s grander pitch, the “lingua franca” vision where bundles flow freely between organizations and tools, also depends on adoption it does not yet have. If nobody outside Google builds OKF consumers, that vision stalls too.

And we can all remember how quickly Google has abandoned or sidelined initiatives in the past (AMP, Core Web Vitals, etc.).

Between the two, the difference is this: OKF offers value you can capture yourself, while the value of llms.txt was always contingent on others, and they passed.

It’s Concept-Per-File, Not Page-Per-File

Here’s the part most coverage I’m seeing so far gets wrong, and the part that matters most if you actually build one.

OKF is not about turning each of your web pages into a markdown file, as llms.txt proposed. Instead, you decompose your knowledge into discrete concepts, and each concept gets its own file. A concept is a table, a metric, a runbook, an API, an idea, not a page.

Take a billing system.

The wrong mental model is a single billing.md that mirrors your billing page.

The right model is several concept files: the billing data model, the refund runbook, the dunning metric definition, the failed-payment retry logic, each in its own file, each linked to the others.

An agent then sees not just the concepts but how they connect.
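Here is a rough Python sketch of that billing example: four concept files, cross-linked with ordinary markdown links, and a small function that recovers the graph an agent would see. The file names, bodies, and the link_graph helper are all invented for illustration, and frontmatter is left out to keep it short.

```python
import posixpath
import re

# Hypothetical billing bundle: path -> markdown body (frontmatter omitted for brevity).
BUNDLE = {
    "billing/data-model.md": "Invoices and payments. See [refunds](refund-runbook.md).",
    "billing/refund-runbook.md": "Steps to refund. Uses the [data model](data-model.md).",
    "billing/dunning-metric.md": "Dunning rate, driven by [retries](retry-logic.md).",
    "billing/retry-logic.md": (
        "Retry schedule. Reads the [data model](data-model.md) "
        "and feeds the [dunning metric](dunning-metric.md)."
    ),
}

# Matches [text](href) and captures href without any #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)(?:#[^)]*)?\)")


def link_graph(bundle):
    """Return {concept path: sorted list of concept paths it links to}."""
    graph = {}
    for path, body in bundle.items():
        base = posixpath.dirname(path)
        targets = set()
        for href in LINK.findall(body):
            if "://" in href:
                continue  # external URL, not a concept in this bundle
            # Resolve the relative link against the linking file's directory.
            target = posixpath.normpath(posixpath.join(base, href))
            if target in bundle:
                targets.add(target)
        graph[path] = sorted(targets)
    return graph


graph = link_graph(BUNDLE)
for path, targets in graph.items():
    print(path, "->", targets)
```

Note what the graph does not contain: any label on the edges. The links say the retry logic is related to the dunning metric, not how.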

That reframes the work entirely. Building an OKF bundle is a knowledge-modeling exercise, not a content export. You’re deciding what your discrete units of knowledge are and how they relate, which is harder and more valuable than dumping pages into markdown.

For anyone into knowledge work, it reminds me a lot of atomic notes, a note-taking method in which each note is a brief, distinct unit of information that contains exactly one idea or concept.

Who It’s Actually For

The honest answer is that the use cases are fairly narrow right now, and that narrowness is the point, not a gap.

The core fit is internal agent context. A data team exposing its table schemas, metric definitions, and join paths to its own agents, so an agent asked “how do we compute weekly active users from our event stream” can actually answer. That’s the use case Google built it for, and the reference tooling reflects it.

For most of this audience, two adjacent fits matter more.

The first is a SaaS company with a real knowledge base. If you maintain documentation, runbooks, and product knowledge that internal agents need, OKF is a clean way to structure it for them.

The second is a customer service bot. A support bot is exactly an agent that needs to consume a curated knowledge base, which is the consumer side of OKF. The interesting distinction here is from the typical RAG setup.

Most support bots today embed the knowledge base and retrieve chunks. OKF offers curated, version-controlled concept files the agent reads directly, with the relationships preserved as links. For well-bounded product knowledge, that can beat chunk-and-embed.
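To show what "reads directly, with the relationships preserved" could look like, here is a small Python sketch. Instead of retrieving embedded chunks, it starts at one concept file and follows its links a fixed number of hops to assemble context for a prompt. The knowledge base contents and the gather_context helper are hypothetical. How the starting concept gets chosen (search, routing, an index file) is left out and is a real design problem of its own.

```python
import posixpath
import re
from collections import deque

# Captures the href of a markdown link, stopping before any #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)")

# Hypothetical support knowledge base: path -> markdown body.
KB = {
    "plans/pro.md": "Pro plan limits. Refunds follow the [refund policy](../policies/refunds.md).",
    "policies/refunds.md": "Refunds within 30 days. Escalate via the [runbook](../runbooks/escalation.md).",
    "runbooks/escalation.md": "How to escalate a ticket.",
    "plans/free.md": "Free plan limits.",
}


def gather_context(kb, start, max_hops=1):
    """Breadth-first walk from `start`, following links up to `max_hops` away."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        path, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        base = posixpath.dirname(path)
        for href in LINK.findall(kb[path]):
            target = posixpath.normpath(posixpath.join(base, href))
            if target in kb and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, hops + 1))
    return order


order = gather_context(KB, "plans/pro.md")
context = "\n\n".join(f"## {path}\n{KB[path]}" for path in order)
print(context)
```

With the default of one hop, a question about the Pro plan pulls in the refund policy but not the escalation runbook. Raising max_hops trades a larger prompt for more surrounding context.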

The caveat: if it’s purely your bundle feeding your bot, you control both ends, so OKF’s interoperability benefit is muted and what you actually gain is structure, versioning, and portability. You still need a consumer that reads it. The bot doesn’t speak OKF for free.

There’s also a consultant angle worth noting: a knowledge bundle as a portable, version-controlled project deliverable, handed off like code.

Beyond these, the use cases thin out fast. That’s what a v0.1 internal-knowledge format should look like, not a sign you’re missing something.

If You Genuinely Have a Knowledge Base, Here’s What a Bundle Involves

If one of those use cases is yours, here’s the practical shape.

You decompose your knowledge into concepts, one file each. Each file declares type in its frontmatter, the only required field, though in practice you will usually want more: the optional title, description, resource, tags, and timestamp. You write the body in plain markdown and cross-link related concepts with normal markdown links.

Two filenames are reserved: index.md for navigation as an agent walks the hierarchy, and log.md for change history.
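A consumer walking a bundle will likely want to treat those two names differently from ordinary concept files. The Python sketch below builds a throwaway bundle in a temporary directory and sorts its files into the two groups. The directory layout and the classify_bundle helper are hypothetical. Treating index.md and log.md as reserved at every level of the hierarchy, not just the root, is my assumption, so check the spec before relying on it.

```python
import tempfile
from pathlib import Path

RESERVED = {"index.md", "log.md"}  # the two reserved names the article mentions


def classify_bundle(root):
    """Return (concept files, reserved files) as sorted bundle-relative paths."""
    root = Path(root)
    concepts, reserved = [], []
    rel_paths = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
    for rel in rel_paths:
        name = rel.rsplit("/", 1)[-1]
        # Assumption: reserved names apply in every directory, not only the root.
        (reserved if name in RESERVED else concepts).append(rel)
    return concepts, reserved


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout, written out just so there is something to walk.
    for rel in ["index.md", "log.md", "billing/index.md", "billing/refund-runbook.md"]:
        file = root / rel
        file.parent.mkdir(parents=True, exist_ok=True)
        file.write_text("---\ntype: example\n---\n", encoding="utf-8")
    concepts, reserved = classify_bundle(root)

print("concepts:", concepts)
print("reserved:", reserved)
```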

Google shipped tooling to make this concrete: a reference enrichment agent that walks a BigQuery dataset and drafts a concept document per table, then runs a second pass that crawls authoritative documentation to enrich each concept with citations and join paths; a static HTML visualizer that renders any bundle as an interactive graph in a single self-contained file; and three sample bundles you can browse. These are proofs of concept by design, not the only way to produce or consume OKF.

And the consumer question decides everything. If you’re on Google Cloud, Knowledge Catalog can ingest a bundle today. If you’re not, you wire your own agent or bot to read it, because the format does not consume itself. A bundle sitting in a repo does nothing until something is pointed at it.

The Takeaway

For most SEOs and content people, OKF is something to understand so you can advise clients well and not get sold hype, not something to act on this week. It is not a visibility lever, and nothing about your rankings or AI citations changes because you did or didn’t build a bundle.

But the underlying pattern, clean and typed and cross-linked markdown concepts built for agents to consume, is a reasonable direction, and Google putting weight behind it is worth noting even at v0.1.

If you maintain a genuine internal knowledge base or run a support bot, it’s worth a real look now.

If someone is selling it to you as an AI-SEO tactic, it isn’t one.

The whole game is telling those two situations apart. That’s the line Google itself drew when it dismissed llms.txt for search and shipped OKF for agents within weeks of each other. It’s the line worth holding onto while everyone else picks a camp.
