What Is the Llms.txt File and What Does It Do?
With the rapid development of artificial intelligence technologies, large language models (LLMs) are transforming how users access information. AI-powered systems such as ChatGPT, Google Gemini, Claude, and Perplexity aim to process complex web content and give users faster, more accurate, and more contextual answers. In this process, however, complex HTML structures, unnecessary code, and robots.txt blocking on websites can get in the way of these systems. This is where the llms.txt file, which has recently become a hot topic, comes into play.
In this guide, we will cover what llms.txt is, how it works, how it differs from robots.txt, how to create and integrate it correctly, and why it should be monitored regularly. Let's take a look at llms.txt.
What Is the Llms.txt File?
Llms.txt is a special text file that helps artificial intelligence systems and large language models understand websites more effectively. Placed in the root directory of your website, it helps systems such as ChatGPT, Google Gemini, Claude, and Perplexity process your site's content more accurately and efficiently.
Emergence of Llms.txt
The llms.txt format is thought to have emerged because traditional web standards fall short for artificial intelligence systems. Proposed by Jeremy Howard in September 2024, it aims to help these systems use web content more efficiently. The underlying problem is that the complex structure and large size of HTML pages make it difficult for AI systems to make sense of the content. The recent attention around llms.txt may also be linked to efforts by Howard's company, Answer.AI, to increase brand awareness. How the format will evolve remains to be seen, and there is no information yet that Google supports it.
What Does Llms.txt Do?
The Llms.txt file helps large language models to better understand and process your website. This file summarizes the important content of your website to guide AI models and enable them to provide more accurate and effective responses to user queries.
As you may already know, web content is delivered as HTML and often contains complex structures, navigation menus, advertisements, JavaScript, and so on. This makes it difficult for large language models to access and make sense of the content. The llms.txt file aims to remove this complexity and give AI models simple, clear, and processable data.
What are the Differences Between Llms.txt and Robots.txt?
Llms.txt and robots.txt are files that optimize websites for different purposes. Both are located in the root directory of the website and have a machine-readable structure, but today their intended uses and target audiences differ. In the future, llms.txt might even be integrated into the robots.txt file. Let's examine the main differences between the two files, grouped by topic.
1. Purpose of the Files
- Llms.txt:
- Allows large language models to better understand the content of your website.
- Presents the most important content of your site to artificial intelligence systems in a simple and clear format.
- Goal: Optimization for AI systems (GEO, Generative Engine Optimization) and presentation of information.
- Robots.txt:
- Controls how search engine bots crawl your site.
- Allows or prevents the crawling of certain pages or directories.
- Goal: To make pages easier to discover by providing a crawlable structure within the scope of search engine optimization.
2. Target Audience of the Files
- Llms.txt:
- It targets large language model-based AI systems such as ChatGPT, Google Gemini, Claude, and Bing AI.
- Robots.txt:
- It targets search engine bots such as Google, Bing, and Yandex.
3. Format of the Files
- Llms.txt:
- It is prepared in Markdown format and can be easily read by both humans and machines.
- The Markdown format allows AIs to process content faster.
- Robots.txt:
- It is written as a plain text file and defines bot-specific crawling rules.
- Tells bots which pages can be crawled or blocked.
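To make the format difference concrete, here is a minimal Python sketch that holds one small sample of each file and reads them the way a machine might. The site name, the example.com URLs, and the "AnyBot" user agent are hypothetical and chosen only for illustration; the robots.txt rules are parsed with Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical llms.txt for an imaginary site: Markdown with a title,
# a short summary, and a list of links.
LLMS_TXT = """\
# Example Store

> Example Store sells handmade ceramics and ships across Europe.

## Docs

- [Shipping policy](https://example.com/shipping.md): delivery times and costs
- [Returns](https://example.com/returns.md): how to return an order
"""

# Hypothetical robots.txt for the same site: plain-text crawl rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /
"""


def markdown_headings(text):
    """Return (level, title) pairs for the Markdown headings in text."""
    headings = []
    for line in text.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            headings.append((level, line[level:].strip()))
    return headings


# Parse the crawl rules from the sample robots.txt.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(markdown_headings(LLMS_TXT))
print(parser.can_fetch("AnyBot", "https://example.com/admin/settings"))
print(parser.can_fetch("AnyBot", "https://example.com/shipping.md"))
```

The llms.txt sample describes content (what the site is and where the key documents live), while the robots.txt sample only answers whether a given bot may crawl a given URL.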
4. How These Files Relate to SEO
- Llms.txt:
- It supports generative engine optimization (GEO): it helps LLM-based systems discover your content, which can increase your visibility in AI-generated answers.
- Robots.txt:
- It is part of SEO: it guides search engines by controlling how they crawl your site.
Things to Consider When Creating an Llms.txt File
How you create the llms.txt file determines whether artificial intelligence systems can process it correctly and effectively. In particular, writing the file in Markdown format and keeping the content clear, organized, and comprehensible allows large language models to understand it easily. Let's take a look at the basic points to consider when creating an llms.txt file.
- Create a simple and clear file using the Markdown format.
- Include only important content and avoid unnecessary details.
- Avoid complex structures such as HTML or JavaScript.
- Include up-to-date, accurate, and descriptive information.
- Present optional and secondary content in a separate section.
- Take care not to provide information that conflicts with robots.txt.
- Refresh your llms.txt file as your website is updated.
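Some of the points above can be checked automatically. The following Python sketch is one possible lint step, not an official validator: the function name lint_llms_txt, the warning texts, and the MAX_CHARS size threshold are assumptions made for this example.

```python
import re

# Matches HTML-style tags such as <div class="x"> or </script>, but not
# Markdown autolinks like <https://example.com>.
HTML_TAG = re.compile(r"<\s*/?\s*[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?/?>")

# Arbitrary size threshold chosen for this sketch, not part of any specification.
MAX_CHARS = 20_000


def lint_llms_txt(text):
    """Return a list of warnings for an llms.txt draft (empty if none)."""
    warnings = []
    non_empty = [line for line in text.splitlines() if line.strip()]
    if not non_empty or not non_empty[0].startswith("# "):
        warnings.append("first non-empty line should be an H1 title, e.g. '# Site name'")
    if HTML_TAG.search(text):
        warnings.append("contains HTML tags; use plain Markdown instead")
    if len(text) > MAX_CHARS:
        warnings.append(f"longer than {MAX_CHARS} characters; keep only key content")
    return warnings


# Hypothetical drafts used only to exercise the checks.
good = "# Example Store\n\n> Handmade ceramics.\n\n## Docs\n\n- [Shipping](https://example.com/shipping.md)\n"
bad = "<script>track()</script>\nWelcome to our site\n"

print(lint_llms_txt(good))
for warning in lint_llms_txt(bad):
    print("warning:", warning)
```

A check like this could run whenever the site is updated, so that the file is reviewed at the same time as the content it describes.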
Contents of Llms.txt File
The content of the llms.txt file aims to present your website's most important information, pages, and documents in a clear and organized way to large language models. Preparing content correctly and effectively makes it easier for artificial intelligence to understand your site and helps it produce more accurate answers to user queries. The content of the llms.txt file should include the following sections.
- H1 Title: The project or site name. This is the one part that must be included.
- Blockquote: A short summary of the project with its key information.
- Detailed Information: Paragraphs or lists with more information about the project.
- Link Lists: URLs to relevant documents or resources. Each link can optionally be accompanied by a short description.
In addition, you can include further information beyond these sections where the format allows it.
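The sections listed above can be assembled programmatically. The sketch below is a minimal illustration in Python: the Link and LlmsTxt classes, their field names, and the sample site data are all hypothetical, and the "Optional" heading is used here for secondary content on the assumption that it matches the convention in the llms.txt proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description


@dataclass
class LlmsTxt:
    title: str    # H1 title (project or site name)
    summary: str  # short summary, rendered as a blockquote
    details: list[str] = field(default_factory=list)               # free paragraphs
    sections: dict[str, list[Link]] = field(default_factory=dict)  # link lists

    def render(self) -> str:
        """Render the document as Markdown text."""
        parts = [f"# {self.title}", f"> {self.summary}"]
        parts.extend(self.details)
        for heading, links in self.sections.items():
            items = []
            for link in links:
                item = f"- [{link.title}]({link.url})"
                if link.note:
                    item += f": {link.note}"
                items.append(item)
            parts.append(f"## {heading}\n\n" + "\n".join(items))
        return "\n\n".join(parts) + "\n"


# Hypothetical site data, for illustration only.
doc = LlmsTxt(
    title="Example Store",
    summary="Handmade ceramics, shipped across Europe.",
    details=["Prices are listed in euros."],
    sections={
        "Docs": [Link("Shipping policy", "https://example.com/shipping.md",
                      "delivery times and costs")],
        "Optional": [Link("Company history", "https://example.com/about.md")],
    },
)
print(doc.render())
```

Generating the file from structured data like this keeps the title, summary, and link lists in a consistent order each time the site changes.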
How to Integrate the Llms.txt File?
The process of integrating the Llms.txt file into your website involves preparing the file in the correct format, uploading it to the correct directory and testing its accessibility. You can follow the steps below to successfully integrate the Llms.txt file.
- Prepare the llms.txt file in Markdown format.
- Upload the file to the root directory of your website.
- Add a reference to llms.txt in your robots.txt file.
- Check the accessibility of the file in the browser.
- Make sure AI bots have the permissions they need to reach the file.
- Regularly update llms.txt and test it with validation tools.
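The accessibility check in the steps above can also be scripted instead of done by hand in the browser. The following Python sketch uses only the standard library; the function names, the 10-second timeout, and the simple "starts with an H1" check are assumptions made for this example rather than a formal validation procedure.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def llms_txt_url(site_url):
    """Build the root-level llms.txt URL for a site."""
    return urljoin(site_url, "/llms.txt")


def _http_fetch(url):
    """Fetch a URL over HTTP and return (status_code, body_text)."""
    with urlopen(url, timeout=10) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


def check_llms_txt(site_url, fetch=None):
    """Return (ok, message) for the site's llms.txt.

    `fetch` maps a URL to (status_code, body_text). It defaults to a real
    HTTP request and can be replaced with a stub for offline testing.
    """
    fetch = fetch or _http_fetch
    url = llms_txt_url(site_url)
    try:
        status, body = fetch(url)
    except (URLError, OSError) as exc:
        # urlopen raises HTTPError (a URLError) for responses such as 404.
        return False, f"{url}: request failed ({exc})"
    if status != 200:
        return False, f"{url}: unexpected HTTP status {status}"
    if not body.lstrip().startswith("# "):
        return False, f"{url}: reachable, but does not start with an H1 title"
    return True, f"{url}: reachable and starts with an H1 title"
```

Calling check_llms_txt("https://example.com") performs a real request against that site's root directory, so a scheduled job could run it after each deployment and report the returned message.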
How to Create Llms.txt in WordPress?
If you own a WordPress site, you can add the llms.txt file manually by placing it in the public_html folder. If you cannot do this, you can also create the llms.txt file with the help of a WordPress plugin.
Why Is It Important to Monitor Incoming Requests After Adding Llms.txt?
After integrating the llms.txt file into your website, monitoring requests from AI-powered systems is critical to assessing the impact of the file and understanding whether it is working correctly. This process goes beyond just checking that the file exists. It also allows you to understand how the information provided through the file is being used, which AI bots are accessing it, and how traffic to your site is being affected.
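One way to do this monitoring is to count requests for /llms.txt in the web server's access log. The Python sketch below assumes the common "combined" log format used by Apache and nginx; the user-agent markers (GPTBot, ClaudeBot, PerplexityBot) are illustrative and should be checked against each vendor's current documentation, and the sample log lines are invented.

```python
import re
from collections import Counter

# Illustrative user-agent substrings; verify them against vendor documentation.
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Request line, status, size, referrer, and user agent in "combined" log format.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(log_lines):
    """Count requests for /llms.txt, grouped by AI bot marker or 'other'."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        # Skip unparseable lines and requests for other paths.
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        agent = match["agent"]
        bot = next((m for m in AI_BOT_MARKERS if m in agent), "other")
        hits[bot] += 1
    return hits


# Invented sample log lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Mar/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Mar/2025:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [10/Mar/2025:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Mar/2025:10:07:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for bot, count in sorted(count_llms_txt_hits(sample_log).items()):
    print(bot, count)
```

Running the same count over real logs at regular intervals shows which AI bots request the file and whether that changes after the file is updated.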
Resources
https://medium.com/towards-data-science/llms-txt-414d5121bcb3