fix: escape HTML to fix llms.txt files #266
Code Review
This pull request converts HTML links to Markdown in the documentation and introduces HTML escaping for generated llms.txt files. A critical issue was identified where merge conflict markers were accidentally committed to dotprompt.mdx, which must be resolved. Additionally, the implementation of escapeHtml in src/generate-llms-direct.ts is problematic as it escapes characters within code snippets, potentially hindering LLM readability; it is recommended to handle HTML tags during content processing instead.
function escapeHtml(text: string): string {
  return text.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
The escapeHtml function performs a global replacement of < and >. This will escape these characters even when they are part of valid code snippets (e.g., if (x < y) becomes if (x &lt; y)), which is generally undesirable for llms.txt files intended for LLM consumption. LLMs typically process Markdown and raw code better than HTML entities.
If the goal is to ensure the output doesn't contain raw HTML tags, it is better to convert those tags to Markdown in the source files (as seen in your changes to dotprompt.mdx) or use a more targeted approach to strip HTML tags during the content processing phase in src/utils/content-processor.ts.
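One way to picture the "more targeted approach" is a helper that escapes angle brackets only in prose and leaves code untouched. The sketch below is an illustration under stated assumptions, not code from this PR or from src/utils/content-processor.ts: the name escapeHtmlOutsideCode is hypothetical, and it assumes code appears only in backtick fences or single-backtick inline spans.

```typescript
// Same behavior as the escapeHtml function under review.
function escapeHtml(text: string): string {
  return text.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Built at runtime so this example contains no literal fence marker.
const FENCE = '`'.repeat(3);

// Hypothetical helper (an assumption, not part of the PR): escape < and >
// in prose only, skipping fenced code blocks and inline code spans.
function escapeHtmlOutsideCode(markdown: string): string {
  let inFence = false;
  return markdown
    .split('\n')
    .map((line) => {
      // A line starting with a backtick fence opens or closes a code block;
      // the fence line itself is passed through unchanged.
      if (line.trimStart().startsWith(FENCE)) {
        inFence = !inFence;
        return line;
      }
      if (inFence) return line;
      // Splitting on a capturing group keeps the inline code spans in the
      // result at odd indices; only the even-indexed prose parts are escaped.
      return line
        .split(/(`[^`]*`)/)
        .map((part, i) => (i % 2 === 1 ? part : escapeHtml(part)))
        .join('');
    })
    .join('\n');
}
```

This deliberately ignores tilde fences, indented code blocks, and multi-backtick inline spans, so it would need hardening before it could replace the stopgap.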
This is a stopgap for now.
i14h left a comment
I'm not sure we should make these changes only for Antigravity. Other code agents seem to be working fine. Let's hold off for now.