robots.txt is a plain text file placed at the root of a website that tells crawlers which parts of the site they may access. It follows the Robots Exclusion Protocol, using directives such as User-agent and Disallow to guide how search engine bots and other automated crawlers behave when they visit your site. In practice, it is one of the first files many crawlers check before exploring URLs.
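For illustration, a minimal robots.txt might look like the sketch below; the /tmp/ and /internal-search/ paths and the example.com sitemap URL are placeholders, not recommendations for any particular site.

    # Applies to every crawler that honors the protocol
    User-agent: *
    # Keep bots out of low-value or duplicate-content areas
    Disallow: /tmp/
    Disallow: /internal-search/

    # Optional: point crawlers at the sitemap (absolute URL)
    Sitemap: https://example.com/sitemap.xml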
The important nuance is that robots.txt controls crawling, not indexing, and it is not a security mechanism. If you block a URL in robots.txt, a search engine may not be able to crawl the page, but the URL can still appear in search results if it is discovered through links or other signals, just without full content details. To keep responsibilities clear, use robots.txt to prevent unnecessary crawling of low-value areas, and use noindex when you need a page kept out of results. Note that a noindex directive only works if the crawler can actually fetch the page, so do not also block that page in robots.txt. Always test changes carefully, because an overly broad rule can accidentally block important sections of your site.
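When the goal is keeping a page out of results rather than limiting crawling, the standard mechanism is a robots meta tag in the page's HTML head, or the equivalent X-Robots-Tag HTTP response header for non-HTML resources such as PDFs. A minimal sketch of both forms:

    <!-- In the page's <head>: ask search engines not to index this page -->
    <meta name="robots" content="noindex">

    # Or, sent as an HTTP response header
    X-Robots-Tag: noindex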