robots.txt — How to control the crawling of your website


The robots.txt is a text file in the root directory of a website. It tells search engine crawlers which areas of the site may be crawled and which should be ignored. The aim is to control crawling activity and avoid unnecessary traffic or duplicate content.

Why is robots.txt important?

Search engines like Google regularly crawl websites to capture their content and save it in the index. The robots.txt helps to set priorities:

  • Protect performance: Large websites can steer crawlers deliberately to reduce server load.
  • Exclude unnecessary pages: For example, login pages, internal search results, or test environments.
  • Avoid duplicate content: Parameter URLs or print versions can be excluded (see the example after this list).
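
For example, if you want to keep filter-parameter URLs and print versions out of the crawl, rules like the following would do it (the paths are placeholders and need to be adapted to your own site; the * wildcard is supported by the major search engines):

User-agent: *
Disallow: /*?filter=
Disallow: /print/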

Important: The robots.txt does not prevent pages from appearing in the index; it only stops the crawler. If a URL is known from other sources, for example external links, it can still be listed, just without page content.

Structure of a robots.txt

A robots.txt consists of so-called user agents and directives. A simple example:

User-agent: *
Disallow: /intern/ 
Allow: /intern/übersicht.html

Explanation:

  • User-agent: * affects all crawlers.
  • Disallow prohibits crawling of the /intern/ directory.
  • Allow makes a specific subpage within it accessible again.
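
If you want to check how a parser evaluates these rules, Python's standard library includes a small robots.txt parser. The sketch below assumes the example rules above and the placeholder domain example.com; note that this parser applies rules in the order they appear (which is why Allow is listed first here), whereas Google matches the most specific rule regardless of order.

from urllib.robotparser import RobotFileParser

# Parse the example rules directly instead of fetching them from a server.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /intern/übersicht.html",
    "Disallow: /intern/",
])

# The directory is blocked, the explicitly allowed subpage is not.
print(rp.can_fetch("*", "https://example.com/intern/"))                # False
print(rp.can_fetch("*", "https://example.com/intern/übersicht.html"))  # True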

Best Practices

  • Always place the robots.txt in the root directory (example.com/robots.txt)
  • Provide a sitemap so crawlers know which pages to index:
    Sitemap: https://example.com/sitemap.xml
  • Don't “hide” sensitive content via robots.txt — it is publicly available.
  • Don't use it for SEO-critical pages: If you want to specifically remove a page from the index, use the noindex meta tag or the X-Robots-Tag HTTP header instead (see the snippets after this list).
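
Both variants are shown here; which one you use depends on whether you can edit the page's HTML or the server's response headers.

In the HTML head of the page:

<meta name="robots" content="noindex">

Or as an HTTP response header:

X-Robots-Tag: noindex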

Create robots.txt — step by step

Even without developer knowledge, you can create a simple robots.txt yourself. Here's how to go about it:

1. Create a text file

Open a simple text editor (e.g. Notepad, VS Code) and create a new file. Save it under the name robots.txt — exactly as it is, without an additional extension.

2. Target crawlers

Determine which crawlers you want to target with the file. User-agent: * means: The rules apply to all search engines.
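
If you only want to address a single crawler, name it instead of using *. For example, a block that applies only to Google's main crawler could look like this (the /test/ path is just a placeholder):

User-agent: Googlebot
Disallow: /test/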

3. Exclude paths

Specify which areas of your site should not be crawled:

Disallow: /intern/

4. Release individual pages (optional)

If you still want to unblock certain pages within an excluded directory, use Allow:

Allow: /intern/übersicht.html

5. Upload the file

Upload the finished robots.txt to your domain's root directory (root level), e.g.

https://example.com/robots.txt

Only there will search engines recognize it.
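
You can quickly confirm that the file is reachable at the root, either by opening the URL in your browser or with a short script like this sketch (example.com is a placeholder for your own domain):

import urllib.request

# Fetch the uploaded file and confirm the server answers with HTTP 200.
with urllib.request.urlopen("https://example.com/robots.txt") as response:
    print(response.status)                  # should be 200
    print(response.read().decode("utf-8"))  # the file's contents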

6. Link to a sitemap (recommended)

At the very bottom of the file, you can also enter your sitemap; this helps search engines discover your pages (a complete example follows below):

Sitemap: https://example.com/sitemap.xml
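
Put together, a minimal robots.txt following the steps above could look like this (the paths and the domain are placeholders for your own site):

User-agent: *
Disallow: /intern/
Allow: /intern/übersicht.html

Sitemap: https://example.com/sitemap.xml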

Note for Webflow users:
In the project settings, under SEO → Custom robots.txt, Webflow offers a dedicated field for this file. You can paste your content directly there; there is no need to upload it via FTP.

Conclusion

The robots.txt is a simple but important tool for SEO and crawling control. Used correctly, it improves the efficiency of search engines while protecting sensitive or irrelevant areas of a website.
