Robots in the Wiki
Posted on August 14th, 2006

We made some minor adjustments to our installation of MediaWiki to prevent robots such as Googlebot from indexing irrelevant pages like article edit pages and history pages.

Essentially, we prepended “/w/” to the URLs of all non-article pages and then used mod_rewrite to strip the /w/ back out, so those pages still work normally. The robots.txt file then tells any well-behaved robot to stay away from anything under /w/.

Here is the relevant snippet from our MediaWiki configuration file (LocalSettings.php):
$wgScriptPath = '/gts/';
# Non-article pages (edit, history, special pages, etc.) run through /gts/w/index.php
$wgScript = $wgScriptPath . 'w/index.php';
$wgRedirectScript = $wgScriptPath . 'redirect.php';
# Article pages keep their clean /gts/Article_Name URLs
$wgArticlePath = $wgScriptPath . '$1';
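With those settings, an article such as Main_Page (an illustrative title) gets a clean URL, while its edit and history links pick up the /w/ prefix:

/gts/Main_Page
/gts/w/index.php?title=Main_Page&action=edit
/gts/w/index.php?title=Main_Page&action=history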

Our .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /gts

# Remove the /w/ from the edit links and other script URLs
# (in per-directory context the pattern is matched without a leading slash)
RewriteRule ^w/(.*)$ $1 [L,QSA]

# Leave real files and directories alone
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [L]

# Everything else is treated as an article title
RewriteRule ^(.*?)/?$ index.php?title=$1 [L,QSA]
</IfModule>
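As a quick sanity check (a rough sketch in Python, assuming the wiki answers at gustavus.edu and that a Main_Page article exists), both URL forms should come back with an HTTP 200 once the rewrite is in place:

import urllib.request

# Hypothetical host and article title; adjust for your own installation.
base = "https://gustavus.edu"
paths = [
    "/gts/Main_Page",
    "/gts/w/index.php?title=Main_Page&action=edit",
]

# mod_rewrite maps the /w/ URL back onto the real index.php,
# so both requests should succeed.
for path in paths:
    with urllib.request.urlopen(base + path) as response:
        print(path, response.status)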

And our robots.txt:
User-agent: *
Disallow: /gts/w/
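
And a quick way to confirm the robots.txt does what we want (a small sketch using Python's urllib.robotparser; the full URLs are just illustrative):

from urllib.robotparser import RobotFileParser

# The same two rules as in robots.txt above
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /gts/w/",
])

# Article URLs stay crawlable...
print(rp.can_fetch("*", "https://gustavus.edu/gts/Main_Page"))  # True
# ...while edit and history URLs under /gts/w/ are off-limits to polite robots.
print(rp.can_fetch("*", "https://gustavus.edu/gts/w/index.php?title=Main_Page&action=edit"))  # False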
