mirror of https://we.phorge.it/source/phorge.git synced 2024-11-25 00:02:41 +01:00

Disallow webcrawlers from following Paste line number anchor links

Summary:
Paste provides a line anchor link for every single line of a paste.
If webcrawlers follow these links, they index the very same Paste again.
Thus disallow these links in robots.txt to reduce unneeded traffic and indexing time.

Closes T15662

Test Plan:
Go to `/robots.txt` in the web browser.
Cross fingers that more webcrawlers abide by RFC 9309.
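
For a check that does not depend on any particular crawler, the rule's intended effect can be modelled offline. The following is a minimal sketch, not part of this change: `robots_pattern_matches()` is a hypothetical helper that only approximates RFC 9309 matching (it handles `*`, a trailing `$`, and percent-encoding of `$`, nothing else), and the `/P123$5` anchor form is an assumption inferred from the rule.

```php
<?php

/**
 * Simplified model of RFC 9309 path matching, for illustration only.
 * Hypothetical helper; not part of Phorge or of any crawler.
 */
function robots_pattern_matches(string $pattern, string $path): bool {
  // Paths are compared in percent-encoded form; this sketch only
  // normalizes the "$" character.
  $path = str_replace('$', '%24', $path);

  // A trailing unencoded "$" anchors the pattern at the end of the path.
  $anchored = substr($pattern, -1) === '$';
  if ($anchored) {
    $pattern = substr($pattern, 0, -1);
  }

  // "*" matches any run of characters; everything else is literal.
  $parts = array_map(
    function ($part) { return preg_quote($part, '#'); },
    explode('*', $pattern));
  $regex = '#^'.implode('.*', $parts).($anchored ? '$' : '').'#';

  return preg_match($regex, $path) === 1;
}

// Assumed line anchor URL: blocked by the new rule.
var_dump(robots_pattern_matches('/P*%24*', '/P123$5')); // bool(true)
// The Paste page itself stays crawlable.
var_dump(robots_pattern_matches('/P*%24*', '/P123'));   // bool(false)
// A trailing unencoded "$" is an end anchor and would block the page too.
var_dump(robots_pattern_matches('/P*$', '/P123'));      // bool(true)
```

The last call shows why the `$` is percent-encoded: left as a trailing literal, it would be read as an end-of-match marker, and the rule would also cover the Paste page itself.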

Reviewers: O1 Blessed Committers, valerio.bozzolan

Reviewed By: O1 Blessed Committers, valerio.bozzolan

Subscribers: tobiaswiese, valerio.bozzolan, Matthew, Cigaryno

Maniphest Tasks: T15662

Differential Revision: https://we.phorge.it/D25461
Andre Klapper 2023-11-10 12:56:43 +01:00
parent f42dd5819e
commit 76ed0c7ff7


@@ -19,6 +19,13 @@ final class PhabricatorRobotsPlatformController
$out[] = 'Disallow: /diffusion/';
$out[] = 'Disallow: /source/';
// See T15662. Prevent indexing line anchor links in Pastes. Per RFC 9309
// section 2.2.3, percent-encode "$" to avoid interpretation as end of
// match pattern. However, crawlers may not abide by it but follow the
// original standard at https://www.robotstxt.org/orig.html, which does not
// mention how to interpret characters like "$", and thus entirely ignore
// this rule.
$out[] = 'Disallow: /P*%24*';
// Add a small crawl delay (number of seconds between requests) for spiders
// which respect it. The intent here is to prevent spiders from affecting
// performance for users. The possible cost is slower indexing, but that