We should stop polluting website roots with these files (including llms.txt).
All these files should be registered with IANA and put under the .well-known namespace.
https://en.wikipedia.org/wiki/Well-known_URI
I understand the theoretical argument.
We follow the precedent of robots.txt, ads.txt, and llms.txt.
The reason is friction. Platforms like Shopify and Wix make .well-known folders difficult or impossible for merchants to configure. Root files work everywhere.
Adoption matters more than namespace hygiene.
How about following the precedent of all of these users of /.well-known/
https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-kn...
robots.txt was created three decades ago, when we didn’t know any better.
Moving llms.txt to /.well-known/ is literally issue #2 for llms.txt
https://github.com/AnswerDotAI/llms-txt/issues/2
Please stop polluting the web.
I prioritize simplicity and adoption for non-technical users over strict IETF compliance right now. My goal is to make this work for a shop owner on Shopify and Wix, not just for sysadmins.
That said, I am open to supporting .well-known as a secondary location in v1.1 if the community wants it.
How is using a standard path 'just for sysadmins' again? You are introducing something new today.
Wait, am I dumb, or did the authors hallucinate? @INVENTORY says that 42 are in stock, but the text says "Only 3 left". Am I misunderstanding this or does stock mean something else?
Good eye. This demonstrates the protocol’s core feature.
The raw data shows 42. We used @SEMANTIC_LOGIC to force a limit of 3. The AI obeys the developer's rules, not just the CSV.
We failed to mention this context in the example, which caused confusion. We are updating the sample value to 42.
Ah, so dark patterns then. Baked right into your standard.
Not dark patterns. Operational logic.
Physical stock rarely equals sellable stock. Items sit in abandoned carts or are held back as safety buffers. If you have 42 items and 39 are reserved, telling the user "42 available" is the lie; it causes overselling.
The protocol allows the developer to define the sellable reality.
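To make the arithmetic concrete, a minimal Python sketch (the field names are illustrative, not from the spec):

    # Sellable stock is what is physically on hand minus whatever is already
    # spoken for (carts, safety buffer). All names and numbers are illustrative.
    on_hand = 42      # units in the warehouse (the raw @INVENTORY figure)
    reserved = 39     # sitting in carts or held back as a buffer

    sellable = max(on_hand - reserved, 0)
    print(f"Only {sellable} left")   # -> "Only 3 left", the number an agent should quote

That reserved-versus-on-hand difference is exactly the kind of rule @SEMANTIC_LOGIC is meant to let the developer state explicitly instead of leaving the agent to guess.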
Crucially, we anticipated abuse. See Section 9: Cross-Verification.
If an agent detects systematic manipulation (fake urgency that contradicts checkout data), the merchant suffers a Trust Score penalty. The protocol is designed to penalize dark patterns, not enable them.
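Purely for illustration, the kind of check Section 9 describes could be as simple as the toy sketch below; the threshold and scoring are invented, not taken from the spec:

    # Toy cross-verification: if a listing keeps claiming scarcity that the
    # checkout data contradicts, the merchant's trust score drops.
    # The scoring step and field names are illustrative assumptions.
    def update_trust_score(score: float, claimed_left: int, units_sold_same_day: int) -> float:
        contradiction = units_sold_same_day > claimed_left   # "only 3 left" but 20 shipped
        return max(score - 0.1, 0.0) if contradiction else score

    score = 1.0
    score = update_trust_score(score, claimed_left=3, units_sold_same_day=20)
    print(score)   # 0.9 -- repeated contradictions keep pushing it down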
Can shops not just embed Schema/JSON-LD in the page if they want their information to be machine readable?
That is the current standard. But it is hard for agents to read efficiently. To access JSON-LD, an agent must download the entire HTML page. This creates a haystack problem where you download 2MB of noise just to find 5KB of data.
Even then, you pay a syntax tax. JSON is verbose; brackets and quotes waste valuable context-window tokens. Furthermore, the standard carries no behavior: JSON-LD lists facts but gives no instructions on how to sell (nothing like @SEMANTIC_LOGIC). CommerceTXT is a fast lane. It does not replace JSON-LD; it optimizes it.
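As a rough sketch of the two retrieval paths in Python, assuming a hypothetical shop at example.com and a root file named commerce.txt (the filename and URLs are placeholders, not confirmed by the spec):

    # Contrast the two retrieval paths an agent could take.
    import urllib.request

    def fetch(url: str) -> bytes:
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def product_via_jsonld(page_url: str) -> bytes:
        # Pull the whole HTML page (often megabytes); the agent still has to
        # locate the <script type="application/ld+json"> block inside it.
        return fetch(page_url)

    def product_via_commercetxt(origin: str) -> bytes:
        # One small plain-text file at the site root; nothing else comes along.
        return fetch(origin + "/commerce.txt")

    # e.g. product_via_jsonld("https://example.com/products/widget")
    #      product_via_commercetxt("https://example.com")

The first path pays for the whole page just to reach the embedded data; the second pays only for the data itself.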
How is "schema.org compatibility" related to Legal Compliance?
Schema.org is the dictionary for facts.
We map strictly to Schema.org for all transactional data (Price, Inventory, Policies). This ensures legal interoperability.
But Schema.org describes what a product is, not how to sell it.
So we extend it. We added directives like @SEMANTIC_LOGIC for agent behavior. We combine standard definitions for safety with new extensions for capability.
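As a sketch of that split: only @INVENTORY and @SEMANTIC_LOGIC are taken from this thread, and the other directive names are hypothetical stand-ins.

    # Transactional directives map onto existing Schema.org Offer terms;
    # behavioral directives are extensions with no Schema.org counterpart.
    # Directive names other than @INVENTORY and @SEMANTIC_LOGIC are hypothetical.
    SCHEMA_ORG_MAPPING = {
        "@PRICE":     "schema:Offer.price",                     # hypothetical name
        "@INVENTORY": "schema:Offer.inventoryLevel",
        "@POLICIES":  "schema:Offer.hasMerchantReturnPolicy",   # hypothetical name
    }

    EXTENSIONS = {
        "@SEMANTIC_LOGIC": None,   # agent behavior; no Schema.org equivalent
    }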
I'm not sure I understand the point of this as opposed to something like a json file, and also, assuming there is any type of structured format, why one would use an LLM for this task instead of a normal parser.
You assume JSON is a standalone file. It rarely is.
In reality, the data is buried in 1MB+ of HTML; you download a haystack to find a needle.
Even as a standalone file, JSON is verbose: every bracket and quote costs tokens.
We fetch a standalone text file. It cuts the syntax tax. It is pure signal.
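To put a rough number on that syntax tax, here is a quick local comparison; the key-value lines below are a generic stand-in, not the actual CommerceTXT syntax:

    # Same facts, two encodings. The JSON-LD blob carries the bracket/quote
    # overhead; the key-value text carries almost none. Values are made up.
    import json

    facts = {"@type": "Offer", "price": "29.99", "priceCurrency": "USD",
             "inventoryLevel": 42}

    as_jsonld = json.dumps({"@context": "https://schema.org", **facts})
    as_text = "PRICE: 29.99 USD\nINVENTORY: 42"

    print(len(as_jsonld), "chars of JSON-LD vs", len(as_text), "chars of plain text")

The facts are identical; only the encoding overhead differs.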