What are llms.txt files and why you should not use them on your website

I shared a little about this on LinkedIn recently, but I keep seeing people on forums and in Facebook groups recommending these files without understanding what they really are.

I want to save you from the big mistake of implementing these files on your website.

Traditional web pages are designed for human readers, incorporating complex HTML structures, navigation menus, advertisements, and interactive elements. These components can hinder LLMs during content extraction and comprehension.

To address this, Jeremy Howard, co-founder of Answer.AI, proposed the llms.txt standard in September 2024.

What are llms.txt files?

An llms.txt file is a standardized Markdown document proposed to enhance the way Large Language Models (LLMs) interact with website content. Positioned at the root directory of a website (/llms.txt), it offers a concise, structured summary of the site’s key information, enabling LLMs to process and understand the content more effectively.

Structure of an llms.txt File

An llms.txt file typically includes the following sections, formatted in Markdown:

  1. Title (H1 Header): The name of the project or website.
  2. Summary (Blockquote): A brief description of the site’s purpose and key information.
  3. Detailed Information: Additional paragraphs or lists providing more in-depth insights about the project or site.
  4. File Lists (Under H2 Headers): Sections containing lists of URLs to important documents or resources, each accompanied by a brief description.

An example structure might look like:

# Project Name

> Brief description of the project.

Additional details about the project.

## Documentation

- [API Reference](https://example.com/api.md): Detailed API documentation.
- [User Guide](https://example.com/user-guide.md): Comprehensive user manual.

## Tutorials

- [Getting Started](https://example.com/tutorials/getting-started.md): Introductory tutorial for new users.

## Optional

- [Changelog](https://example.com/changelog.md): List of recent updates and changes.


The “Optional” section is intended for supplementary information that can be omitted if a shorter context is required.

Distinction from Other Web Standards

While files like robots.txt and sitemap.xml serve specific purposes for search engine crawlers, llms.txt is uniquely tailored for LLMs:

  • robots.txt: Tells search engine bots which pages they may crawl or must avoid, primarily focusing on crawl control.
  • sitemap.xml: Lists all indexable pages on a site to assist search engines in discovering content.
  • llms.txt: Provides a curated, structured summary of essential content, specifically designed for efficient LLM consumption.

This specialization is meant to let AI models access and process the most relevant information without sifting through extraneous data.
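
To make the distinction concrete, here is roughly what each file looks like. The URLs are placeholders, and the llms.txt lines follow the structure described above:

robots.txt:

User-agent: *
Disallow: /admin/

sitemap.xml:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pricing/</loc></url>
</urlset>

llms.txt:

# Example Site

> A one-line summary of what the site covers.

## Docs
- [Pricing Guide](https://example.com/pricing.md): What each plan includes.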

How It Works:

  1. Create an llms.txt file
    • This file is a structured summary of your website, listing important content and linking to the actual Markdown files that contain your full documentation, guides, or other structured content.
  2. Host Your Content in Markdown
    • Store your actual content in separate Markdown (.md) files and make them publicly accessible on your website.
    • The llms.txt file references these Markdown documents, providing a structured way for LLMs to understand and fetch them.
  3. Upload to Root Directory
    • Place llms.txt in your site’s root directory (https://yourwebsite.com/llms.txt) so LLMs can easily discover and access it.
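
No major LLM provider has committed to consuming these files, so any code that reads them is speculative. As a rough sketch, here is how a Python tool might fetch an llms.txt file and pull out the linked Markdown documents. The URL is a placeholder, and the regex-based parsing is my own simplification, not part of the proposal:

import re
import requests

def fetch_llms_txt(site: str) -> str:
    # The proposal places the file at the site root: /llms.txt
    resp = requests.get(f"{site.rstrip('/')}/llms.txt", timeout=10)
    resp.raise_for_status()
    return resp.text

def extract_links(markdown: str) -> list[tuple[str, str, str]]:
    # Pull out "[title](url): description" entries from the file lists.
    pattern = re.compile(r"\[([^\]]+)\]\((\S+?)\)(?::\s*(.+))?")
    return [(m.group(1), m.group(2), m.group(3) or "")
            for m in pattern.finditer(markdown)]

for title, url, description in extract_links(fetch_llms_txt("https://example.com")):
    print(f"{title} -> {url}")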

Do llms.txt files control AI spider behavior?

No. llms.txt is not a standard for controlling AI spider behavior the way robots.txt controls web crawlers. It describes content; it does not restrict access.
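
If controlling AI crawlers is what you actually want, robots.txt is still the place to do it. Several AI companies document crawler user agents that they say honor robots.txt, such as OpenAI's GPTBot and Google's Google-Extended token, though you should check each vendor's current documentation for the exact names:

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /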

Why you should avoid using llms.txt files

These files are designed to make the content of your website easier for LLMs to digest.

Isn’t that a good idea?

At first glance, it may seem that way, but there is a big problem.

You have to recreate all your content in Markdown files (.md) and host those. That isn’t too much trouble; there are plenty of tools you can feed content to and get Markdown output. Even ChatGPT will do that for you.
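
As an illustration, here is a rough Python conversion using the html2text package, one option among many. The URL and output filename are placeholders:

import html2text
import requests

# Fetch a page and convert its HTML to Markdown.
html = requests.get("https://example.com/some-page/", timeout=10).text
markdown = html2text.html2text(html)

with open("some-page.md", "w", encoding="utf-8") as f:
    f.write(markdown)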

The problem, assuming LLMs adopt this standard, is that when your content is cited in tools like ChatGPT or Perplexity, the citation is going to link to the Markdown file.

The Markdown files contain no reference to the original URL.
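
For comparison, an HTML page can at least declare its canonical location in its head; a raw .md file has no standard equivalent:

<!-- An HTML page can point back to the original version: -->
<link rel="canonical" href="https://example.com/original-article/">

A plain Markdown file has no standard slot for this information.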

Imagine the user experience for someone landing on a page that is just raw Markdown text.

There is no benefit in this for website owners.

It only benefits the LLMs by making it easier for them to read your content and absorb it into their models.

Tools I Use:

🔎  Semrush – Competitor and keyword analysis

✔  Monday.com – For task management and organizing all of my client work

🗄  Frase – Content optimization and article briefs

📊  Keyword.com – Easy, accurate rank tracking

📆  Akiflow – Manage your calendar and daily tasks

👑  Conductor Website Monitoring – Site crawler, monitoring, and audit tool

📈 SEOPress – It’s like Yoast, if Yoast wasn’t such a mess.
