llms.txt & llms-full.txt – how to protect your website from AI content scraping
Category: SEO (Search Engine Optimization)
Published on 23.09.2025
llms.txt & llms-full.txt – protecting and controlling your content for AI crawlers
Status: September 2025
AI is changing how web content is used. With llms.txt and llms-full.txt, two emerging conventions aim to control how Large Language Models (LLMs) like ChatGPT, Perplexity or Google Gemini access your content. While llms.txt acts like a bouncer deciding who may access your content, llms-full.txt provides a structured overview of which content is available to LLMs.
What are llms.txt and llms-full.txt?
- llms.txt: similar to robots.txt, but for AI crawlers. You define which areas may be used by AI systems and which should be blocked.
- llms-full.txt: a structured content overview for LLMs – comparable to a sitemap plus metadata such as title, license and last-updated date. The goal is fair, clean reuse of your content.
Both files must be placed in the root directory of your domain, e.g.: https://www.your-domain.com/llms.txt and https://www.your-domain.com/llms-full.txt.
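As a quick sanity check that both files actually resolve at the domain root, a small script can request them. This is a minimal sketch using only the Python standard library; `llms_urls` and `check_llms_files` are hypothetical helper names, not part of any official tooling:

```python
import urllib.request

def llms_urls(domain):
    """Build the two root-level file URLs for a domain."""
    base = domain.rstrip("/")
    return [f"{base}/llms.txt", f"{base}/llms-full.txt"]

def check_llms_files(domain, timeout=10):
    """Fetch each file and report whether it is reachable (HTTP 200)."""
    results = {}
    for url in llms_urls(domain):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = resp.status == 200
        except OSError:
            results[url] = False
    return results
```

Run `check_llms_files("https://www.your-domain.com")` after deploying; both entries should report True.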
llms.txt – your gatekeeper for AI crawlers
With llms.txt, you control which AI providers may crawl your content. Some crawlers already respect these rules. For you as a website owner, that means: more control, less wild-west scraping.
Minimal protection – allow only blog and glossary
# /llms.txt
User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /intern/
Disallow: /uploads/
Granular control by provider
# Block Perplexity completely
User-agent: Perplexity
Disallow: /
# OpenAI may only read the blog section
User-agent: OpenAI
Allow: /blog/
Disallow: /
# Default rule for everyone else
User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /
Block everything
User-agent: *
Disallow: /
Note: llms.txt is voluntary. If a crawler ignores it, you need server-side measures such as IP blocks or firewalls.
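Because the examples above reuse robots.txt syntax, you can sanity-check your rules before deploying. There is no official llms.txt parser, but assuming your file follows robots.txt semantics exactly, Python's standard `urllib.robotparser` can evaluate it:

```python
from urllib.robotparser import RobotFileParser

# The granular example from above, in robots.txt syntax
RULES = """\
User-agent: Perplexity
Disallow: /

User-agent: OpenAI
Allow: /blog/
Disallow: /

User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# OpenAI may only read the blog section
print(parser.can_fetch("OpenAI", "/blog/wordpress-vs-webflow/"))  # True
print(parser.can_fetch("OpenAI", "/glossar/ai-overview/"))        # False
# Perplexity is blocked everywhere
print(parser.can_fetch("Perplexity", "/blog/wordpress-vs-webflow/"))  # False
```

Note that the blank lines between blocks matter: robots.txt-style parsers treat them as record separators.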
llms-full.txt – structured data for fair attribution
llms-full.txt provides LLMs with a complete, clearly structured overview of the content you allow. This increases the chance that AI systems cite and link correctly instead of only summarizing.
Example structure
# /llms-full.txt
Version: 1.0
Domain: https://www.your-domain.com
Generated: 2025-09-23T20:00:00Z
Entry:
URL: https://www.your-domain.com/blog/wordpress-vs-webflow/
Title: WordPress vs. Webflow – which CMS is better?
Author: Denise Jung
Summary: A practical comparison focused on performance, maintenance and ownership.
Category: Blog / CMS
Last-Modified: 2025-09-10
License: Copyright © 2025 your-domain.com, All rights reserved
Entry:
URL: https://www.your-domain.com/glossar/ai-overview/
Title: AI Overview (Google) – benefits and problems
Summary: Explanation of AI Overviews, impact on publishers, risks and workarounds.
Category: Glossary / AI
Last-Modified: 2025-06-02
License: CC BY-NC-ND 4.0
Field explanation
- URL: full address of the content.
- Title: clear title, ideally not shortened.
- Summary: optional short text for LLMs.
- Last-Modified: date of the last change.
- License: legal statement, e.g. “All rights reserved” or Creative Commons.
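Since llms-full.txt has no ratified specification, the exact syntax may vary between tools. Assuming the simple key/value layout shown above, a parser sketch could look like this (`parse_llms_full` is a hypothetical helper, not an official API):

```python
def parse_llms_full(text):
    """Parse a key/value llms-full.txt (assumed format) into a header dict and entry dicts."""
    header, entries, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        if line == "Entry:":
            current = {}                  # start a new entry block
            entries.append(current)
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # ignore malformed lines
        (current if current is not None else header)[key.strip()] = value.strip()
    return header, entries

SAMPLE = """\
Version: 1.0
Domain: https://www.your-domain.com
Entry:
URL: https://www.your-domain.com/blog/wordpress-vs-webflow/
Title: WordPress vs. Webflow – which CMS is better?
Last-Modified: 2025-09-10
"""

header, entries = parse_llms_full(SAMPLE)
```

Splitting on the first colon only is deliberate, so URL values containing "https://" stay intact.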
Combining llms.txt and llms-full.txt
The best approach is using both files: llms.txt controls access, and llms-full.txt provides optimized, structured metadata for allowed crawlers.
Example combination
# llms.txt
User-agent: Perplexity
Disallow: /
User-agent: OpenAI
Allow: /blog/
Disallow: /
User-agent: *
Allow: /blog/
Allow: /glossar/
Disallow: /
# llms-full.txt
Version: 1.0
Domain: https://www.your-domain.com
Entry:
URL: https://www.your-domain.com/blog/wordpress-vs-webflow/
Title: WordPress vs. Webflow – which CMS is better?
Author: Denise Jung
Last-Modified: 2025-09-10
License: Copyright © 2025 your-domain.com
Server hardening against unwanted bots
If a crawler ignores llms.txt, you can block it server-side. Examples for Apache and NGINX:
Apache (.htaccess)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (?i)perplexity
RewriteRule ^ - [F]
</IfModule>
NGINX
if ($http_user_agent ~* "perplexity") {
return 403;
}
Important: user agents can be spoofed. This is only a first layer.
Best practices
- Place both files in the root (/llms.txt and /llms-full.txt).
- Update metadata in llms-full.txt regularly (e.g. Last-Modified).
- Use clear licenses for important content.
- Monitor server logs to detect suspicious activity early.
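To put the last point into practice, a small script can tally requests from known AI crawlers in a combined-format access log. This is a sketch; the user-agent substrings below are examples of real crawler tokens, but verify the current list against each provider's documentation:

```python
import re
from collections import Counter

# Example AI crawler tokens to watch for (check provider docs for the current names)
AI_AGENTS = ("gptbot", "perplexitybot", "claudebot", "google-extended")

# In combined log format, the user agent is the last quoted field on the line
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_ai_hits(log_lines):
    """Count requests per AI crawler token found in access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for agent in AI_AGENTS:
            if agent in ua:
                hits[agent] += 1
    return hits
```

Feed it your access log (e.g. `count_ai_hits(open("/var/log/nginx/access.log"))`) to see which AI crawlers actually hit your site, and whether they respect your rules.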
FAQ
Do these files affect Google rankings?
No. Currently (September 2025), llms.txt and llms-full.txt do not directly impact SEO rankings. They are only for controlling AI crawlers.
Do I need both files?
Yes, ideally: llms.txt controls access; llms-full.txt provides clean metadata for attribution.
Isn’t robots.txt enough?
No. robots.txt was built for classic search crawlers, and although some AI crawlers also read it, llms.txt is designed specifically for AI systems and their use of your content.
Conclusion
With llms.txt and llms-full.txt, you now have tools to regain control over how AI systems use your content. llms.txt decides who may access, and llms-full.txt delivers machine-readable metadata – increasing the chance of fair attribution and correct citations.
Need help implementing this?
I set up llms.txt and llms-full.txt professionally – including monitoring and protection against AI crawlers.
Request a consultation