We made some minor adjustments to our installation of MediaWiki to prevent robots such as Googlebot from indexing irrelevant pages like article edit pages and history pages.
Essentially, we prefixed every non-article URL with “/w/” and then used mod_rewrite to strip the /w/ again so those pages keep working normally. The robots.txt file then tells any well-behaved robot to stay out of anything under /w/.
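For example, with this scheme an article stays at a clean URL while its edit link moves under /w/ (Main_Page is just a placeholder title):

/gts/Main_Page — the article itself, which crawlers may index
/gts/w/index.php?title=Main_Page&action=edit — the edit link, which robots.txt tells crawlers to skip
/gts/index.php?title=Main_Page&action=edit — what Apache actually serves once mod_rewrite strips the /w/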
Here are the relevant lines from our MediaWiki configuration file (LocalSettings.php):
$wgScriptPath     = '/gts/';
$wgScript         = $wgScriptPath . 'w/index.php';     # edit, history, special pages, etc.
$wgRedirectScript = $wgScriptPath . 'w/redirect.php';  # kept under /w/ like the other scripts
$wgArticlePath    = $wgScriptPath . '$1';               # plain articles: /gts/Article_title
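Since $wgScriptPath already ends in a slash, those settings resolve to the following paths (spelled out here just to make the layout explicit):

$wgScript         →  /gts/w/index.php     (all script-driven views: edit, history, diffs, special pages)
$wgRedirectScript →  /gts/w/redirect.php
$wgArticlePath    →  /gts/$1              (e.g. /gts/Main_Page for a hypothetical article)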
Our .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /gts

# Remove the /w/ from the edit links; in a per-directory context the
# pattern is matched against the path without a leading slash.
RewriteRule ^w/(.*)$ $1 [L,QSA]

# Serve existing files and directories as-is.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [L]

# Everything else is treated as an article title.
RewriteRule ^(.*?)/?$ index.php?title=$1 [L,QSA]
</IfModule>
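Walking a couple of requests through those rules makes the flow easier to follow (Main_Page is again just an example title):

/gts/w/index.php?title=Main_Page&action=edit
The first rule strips the w/ and Apache re-runs the rules on /gts/index.php?title=Main_Page&action=edit; index.php exists on disk, so the file-exists check lets it through and MediaWiki renders the edit page as usual.

/gts/Main_Page
No w/ prefix and no matching file on disk, so the final rule rewrites it to index.php?title=Main_Page.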
And our robots.txt, at the web root:
User-agent: *
Disallow: /gts/w/
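A quick way to sanity-check the whole setup from the command line (example.com stands in for our real hostname):

curl -I 'http://example.com/gts/Main_Page'
curl -I 'http://example.com/gts/w/index.php?title=Main_Page&action=edit'
curl 'http://example.com/robots.txt'

The first two requests should both come back as normal pages, and the robots.txt output should contain the Disallow: /gts/w/ line that keeps well-behaved crawlers away from the second URL.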