mirror of https://we.phorge.it/source/phorge.git
synced 2024-11-21 22:32:41 +01:00
Disallow webcrawlers to follow Paste line number anchor links
Summary:
Paste provides line anchor links in every single line of a paste. If webcrawlers follow these links, they index the very same Paste again. Thus disallow in robots.txt to reduce unneeded traffic and indexing time.

Closes T15662

Test Plan: Go to `/robots.txt` in the web browser. Cross fingers that more webcrawlers abide by RFC 9309.

Reviewers: O1 Blessed Committers, valerio.bozzolan

Reviewed By: O1 Blessed Committers, valerio.bozzolan

Subscribers: tobiaswiese, valerio.bozzolan, Matthew, Cigaryno

Maniphest Tasks: T15662

Differential Revision: https://we.phorge.it/D25461
parent f42dd5819e
commit 76ed0c7ff7
1 changed file with 7 additions and 0 deletions
@@ -19,6 +19,13 @@ final class PhabricatorRobotsPlatformController
 $out[] = 'Disallow: /diffusion/';
 $out[] = 'Disallow: /source/';

+// See T15662. Prevent indexing line anchor links in Pastes. Per RFC 9309
+// section 2.2.3, percentage-encode "$" to avoid interpretation as end of
+// match pattern. However, crawlers may not abide by it but follow the
+// original standard at https://www.robotstxt.org/orig.html with no mention
+// how to interpret characters like "$" and thus entirely ignore this rule.
+$out[] = 'Disallow: /P*%24*';
+
 // Add a small crawl delay (number of seconds between requests) for spiders
 // which respect it. The intent here is to prevent spiders from affecting
 // performance for users. The possible cost is slower indexing, but that
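To illustrate why the rule encodes "$" as "%24", here is a minimal sketch of how a crawler that follows RFC 9309 section 2.2.3 might evaluate `Disallow: /P*%24*`. The function `robots_pattern_matches()` is hypothetical and not part of Phorge, and the path `/P123$5` is an assumed example of a Paste line anchor URL. As a simplification, the sketch decodes every percent-escape before comparing, which is broader than what the RFC prescribes.

```php
<?php

/**
 * Hypothetical helper, not part of Phorge: tests whether a robots.txt path
 * pattern matches a URL path. "*" matches any run of characters and an
 * unencoded trailing "$" anchors the end of the match.
 */
function robots_pattern_matches(string $pattern, string $path): bool {
  // An unencoded "$" at the very end of the pattern anchors the match.
  $anchored = substr($pattern, -1) === '$';
  if ($anchored) {
    $pattern = substr($pattern, 0, -1);
  }

  // Split on the "*" wildcard first, then decode each literal piece, so that
  // an encoded "%2A" or "%24" stays a literal character.
  $pieces = array();
  foreach (explode('*', $pattern) as $piece) {
    $pieces[] = preg_quote(rawurldecode($piece), '#');
  }

  // Rules match from the start of the path; each "*" becomes ".*".
  $regex = '#^'.implode('.*', $pieces).($anchored ? '$' : '').'#sD';

  return preg_match($regex, rawurldecode($path)) === 1;
}

// Encoded "%24": only paths containing a literal "$" are disallowed.
var_dump(robots_pattern_matches('/P*%24*', '/P123$5'));  // bool(true)
var_dump(robots_pattern_matches('/P*%24*', '/P123'));    // bool(false)

// Unencoded "$": read as an end anchor, so the Paste itself matches too.
var_dump(robots_pattern_matches('/P*$', '/P123'));       // bool(true)
```

Under these assumptions the encoded rule leaves the Paste page itself crawlable and only blocks paths containing a literal "$". Crawlers that ignore wildcards and percent-encoding will not apply the rule at all, as the code comment in the diff notes.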