llms.txt & llms-full.txt – how to protect your website from AI content scraping
Category: SEO (Search Engine Optimization)
Published on 23.09.2025
llms.txt & llms-full.txt – protecting and controlling your content for AI crawlers
Status: September 2025
AI is changing how web content is used. With llms.txt and llms-full.txt, two emerging conventions aim to control how Large Language Models (LLMs) like ChatGPT, Perplexity or Google Gemini access your content. While llms.txt acts like a bouncer deciding who may access your content, llms-full.txt provides a structured overview of which content is available to LLMs.
What are llms.txt and llms-full.txt?
- llms.txt: similar to robots.txt, but for AI crawlers. You define which areas may be used by AI systems and which should be blocked.
- llms-full.txt: a structured content overview for LLMs – comparable to a sitemap plus metadata such as title, license and last-updated date. The goal is fair, clean reuse of your content.
Both files must be placed in the root directory of your domain, e.g.: https://www.your-domain.com/llms.txt and https://www.your-domain.com/llms-full.txt.
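As a quick sanity check that both files actually resolve at the domain root, a small script can request them. This is a minimal sketch using only the Python standard library; `llms_urls` and `check_llms_files` are hypothetical helper names, not part of any official tooling:

```python
import urllib.request

def llms_urls(domain):
    """Build the two root-level file URLs for a domain."""
    base = domain.rstrip("/")
    return [f"{base}/llms.txt", f"{base}/llms-full.txt"]

def check_llms_files(domain, timeout=10):
    """Fetch each file and report whether it is reachable (HTTP 200)."""
    results = {}
    for url in llms_urls(domain):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status == 200
        except OSError:
            results[url] = False
    return results
```

Run `check_llms_files("https://www.your-domain.com")` after deploying; both entries should report True.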
llms.txt – your gatekeeper for AI crawlers
With llms.txt, you control which AI providers may crawl your content. Some crawlers already respect these rules. For you as a website owner, that means: more control, less wild-west scraping.
Minimal protection – allow only blog and glossary
# /llms.txt
User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /intern/
Disallow: /uploads/
Granular control by provider
# Block Perplexity completely
User-agent: Perplexity
Disallow: /
# OpenAI may only read the blog section
User-agent: OpenAI
Allow: /blog/
Disallow: /
# Default rule for everyone else
User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /
Block everything
User-agent: *
Disallow: /
Note: llms.txt is voluntary. If a crawler ignores it, you need server-side measures such as IP blocks or firewalls.
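Because the examples above reuse robots.txt syntax, you can sanity-check your rules before deploying. There is no official llms.txt parser, but assuming your file follows robots.txt semantics exactly, Python's standard `urllib.robotparser` can evaluate it:

```python
from urllib.robotparser import RobotFileParser

# The granular example from above, in robots.txt syntax
RULES = """\
User-agent: Perplexity
Disallow: /

User-agent: OpenAI
Allow: /blog/
Disallow: /

User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# OpenAI may only read the blog section
print(parser.can_fetch("OpenAI", "/blog/wordpress-vs-webflow/"))  # True
print(parser.can_fetch("OpenAI", "/glossar/ai-overview/"))        # False
# Perplexity is blocked everywhere
print(parser.can_fetch("Perplexity", "/blog/wordpress-vs-webflow/"))  # False
```

Note that the blank lines between blocks matter: robots.txt-style parsers treat them as record separators.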
llms-full.txt – structured data for fair attribution
llms-full.txt provides LLMs with a complete, clearly structured overview of the content you allow. This increases the chance that AI systems cite and link correctly instead of only summarizing.
Example structure
# /llms-full.txt
Version: 1.0
Domain: https://www.your-domain.com
Generated: 2025-09-23T20:00:00Z
Entry:
URL: https://www.your-domain.com/blog/wordpress-vs-webflow/
Title: WordPress vs. Webflow – which CMS is better?
Author: Denise Jung
Summary: A practical comparison focused on performance, maintenance and ownership.
Category: Blog / CMS
Last-Modified: 2025-09-10
License: Copyright © 2025 your-domain.com, All rights reserved
Entry:
URL: https://www.your-domain.com/glossar/ai-overview/
Title: AI Overview (Google) – benefits and problems
Summary: Explanation of AI Overviews, impact on publishers, risks and workarounds.
Category: Glossary / AI
Last-Modified: 2025-06-02
License: CC BY-NC-ND 4.0
Field explanation
- URL: full address of the content.
- Title: clear title, ideally not shortened.
- Summary: optional short text for LLMs.
- Last-Modified: date of the last change.
- License: legal statement, e.g. “All rights reserved” or Creative Commons.
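Since llms-full.txt has no ratified specification, the exact syntax may vary between tools. Assuming the simple key/value layout shown above, a parser sketch could look like this (`parse_llms_full` is a hypothetical helper, not an official API):

```python
def parse_llms_full(text):
    """Parse a key/value llms-full.txt (assumed format) into a header dict and entry dicts."""
    header, entries, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line == "Entry:":
            current = {}                  # start a new entry block
            entries.append(current)
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # ignore malformed lines
        (current if current is not None else header)[key.strip()] = value.strip()
    return header, entries

SAMPLE = """\
Version: 1.0
Domain: https://www.your-domain.com
Entry:
URL: https://www.your-domain.com/blog/wordpress-vs-webflow/
Title: WordPress vs. Webflow – which CMS is better?
Last-Modified: 2025-09-10
"""

header, entries = parse_llms_full(SAMPLE)
```

Splitting on the first colon only is deliberate, so URL values containing "https://" stay intact.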
Combining llms.txt and llms-full.txt
The best approach is using both files: llms.txt controls access, and llms-full.txt provides optimized, structured metadata for allowed crawlers.
Example combination
# llms.txt
User-agent: Perplexity
Disallow: /
User-agent: OpenAI
Allow: /blog/
Disallow: /
User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /
# llms-full.txt
Version: 1.0
Domain: https://www.your-domain.com
Entry:
URL: https://www.your-domain.com/blog/wordpress-vs-webflow/
Title: WordPress vs. Webflow – which CMS is better?
Author: Denise Jung
Last-Modified: 2025-09-10
License: Copyright © 2025 your-domain.com
Server hardening against unwanted bots
If a crawler ignores llms.txt, you can block it server-side. Examples for Apache and NGINX:
Apache (.htaccess)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (?i)perplexity
RewriteRule ^ - [F]
</IfModule>
NGINX
if ($http_user_agent ~* "perplexity") {
return 403;
}
Important: user agents can be spoofed. This is only a first layer.
Best practices
- Place both files in the root (/llms.txt and /llms-full.txt).
- Update metadata in llms-full.txt regularly (e.g. Last-Modified).
- Use clear licenses for important content.
- Monitor server logs to detect suspicious activity early.
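To put the last point into practice, a small script can tally requests from known AI crawlers in a combined-format access log. This is a sketch; the user-agent substrings below are examples of real crawler tokens, but verify the current list against each provider's documentation:

```python
import re
from collections import Counter

# Example AI crawler tokens to watch for (check provider docs for the current names)
AI_AGENTS = ("gptbot", "perplexitybot", "claudebot", "google-extended")

# In combined log format, the user agent is the last quoted field on the line
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_ai_hits(log_lines):
    """Count requests per AI crawler token found in access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for agent in AI_AGENTS:
            if agent in ua:
                hits[agent] += 1
    return hits
```

Feed it your access log (e.g. `count_ai_hits(open("/var/log/nginx/access.log"))`) to see which AI crawlers actually hit your site, and whether they respect your rules.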
FAQ
Do these files affect Google rankings?
No. Currently (September 2025), llms.txt and llms-full.txt do not directly impact SEO rankings. They are only for controlling AI crawlers.
Do I need both files?
Yes, ideally: llms.txt controls access; llms-full.txt provides clean metadata for attribution.
Isn’t robots.txt enough?
No. robots.txt was built for classic search crawlers, and although some AI crawlers also read it, llms.txt is designed specifically for AI systems and their use of your content.
Conclusion
With llms.txt and llms-full.txt, you now have tools to regain control over how AI systems use your content. llms.txt decides who may access, and llms-full.txt delivers machine-readable metadata – increasing the chance of fair attribution and correct citations.
Need help implementing this?
I set up llms.txt and llms-full.txt professionally – including monitoring and protection against AI crawlers.
Request a consultation