Monday, March 1, 2010

Automating Accessibility testing

I’ve been working on a new website recently, and one of the major requirements is that it complies with W3C WAI-AA accessibility. I figured I could just hand the problem over to the designers, but taking Ronald Reagan’s signature line of “Trust, but verify”, I figured I’d need to check any output regardless. So my question was: what would be the easiest way to check a reasonably large website in an automated way, so that I would be notified if anything was found?

To my surprise, most CMSs don’t offer this facility out of the box, and the online offerings need you to enter a URL on another site each day. I wanted something simple and free that could be integrated into my existing continuous integration setup with Cruise Control. I’d already been successful using Selenium and NUnit, so I figured I could reuse the same technology stack. But what fun would that be? So I figured I’d move to using WatiN.

The simple solution

I ended up with a solution using a combination of three main technologies.

NUnit – to hold the unit testing code and provide all the test infrastructure.

WatiN – to interface with the Browser and get access to the HTML to test.

Tidy – a very interesting little utility that has all the embedded accessibility tests that I really did not want to write myself.

First off, you need to create a unit test project that runs in a continuous integration environment. For instructions on that, see my previous post.

One off page test

Below is a simple unit test that will test an HTML page for accessibility errors and warnings. I walk through the more interesting lines after the listing.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NUnit.Framework;
using Tidy;
using WatiN.Core;

namespace HTMLTestExamples
{

    [TestFixture]
    public class TidyTest
    {
        private static readonly log4net.ILog _log = log4net.LogManager.GetLogger(System.Reflection.MethodInfo.GetCurrentMethod().DeclaringType);
        Tidy.Document _tdoc = new Tidy.Document();
        int _status = 0;
        string _ConfigFileName = @"Files\foo.tidy";

        [TestFixtureSetUp]
        public void TestFixtureSetup()
        {
            _tdoc.OnMessage += new ITidyDocumentEvents_OnMessageEventHandler(doc_OnMessage);
            log4net.Config.XmlConfigurator.Configure();
            _status = _tdoc.LoadConfig(_ConfigFileName);
            Assert.IsTrue(_status == 0, "Ensure no errors found in configuration");
            _log.Info("Starting up for testing");
        }
        /// <summary>
        /// Tests the file.
        /// </summary>
        [Test]
        public void TestHTMLPage()
        {
            String htmlResults = String.Empty;
            using (var browser = new IE("http://<whatever URL you want>"))
            {
                _status = _tdoc.ParseString(browser.Html);
                _status = _tdoc.RunDiagnostics();
                Assert.IsTrue(_status == 0, "Oh no! Errors were found");
            }
        }

        /// <summary>
        /// Process messages from the Tidy parse process.
        /// </summary>
        /// <param name="level"></param>
        /// <param name="line"></param>
        /// <param name="col"></param>
        /// <param name="message"></param>
        void doc_OnMessage(Tidy.TidyReportLevel level, int line, int col, string message)
        {
            _log.InfoFormat("{3}:  {0}  Line: {1}  Col: {2}", message, line, col, level);
        }
    }
}

The first point of interest is the foo.tidy file. Tidy can take its configuration settings programmatically or via a config file. I’ve chosen the config file route as it is easier to modify on the fly.

accessibility-check:  2
show-warnings:      no
show-errors:          6

The file I used has only three settings. accessibility-check is set to 2, which means that Tidy will warn at the WAI-AA level. There is a whole range of different values you can use, and these are documented on the Tidy website.

The next point of interest is how we hook up to the OnMessage event raised by Tidy when it reads the HTML pages. Here we pass the event over to our delegate method called “doc_OnMessage”. At the moment we simply want to print out the results on screen, but we can expand on this later.

The next few lines of code do all the real work.
     using (var browser = new IE("<URL>"))
This tells WatiN to open a browser instance and load the URL into memory.
     _tdoc.ParseString(browser.Html);
This takes the raw HTML that has been read from the page and checks it for any errors. It runs basic HTML checks rather than the specific accessibility tests we want to look at, but it’s important to load the data before we can look at the AA checks.
     _tdoc.RunDiagnostics();
This is the line where we actually get to run the tests we are interested in.

Running this against my iGoogle page came up with the following:

***** HTMLTestExamples.TidyTest.TestHTMLPage
2010-03-01 22:13:26,101 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyInfo:  Document content looks like HTML Proprietary  Line: 0  Col: 0
2010-03-01 22:13:26,114 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyError:  [3.2.1.1]: <doctype> missing.  Line: 1  Col: 1
2010-03-01 22:13:26,115 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyError:  [13.2.1.1]: Metadata missing.  Line: 2  Col: 1
2010-03-01 22:13:26,117 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyError:  [1.1.10.1]: <script> missing <noscript> section.  Line: 7  Col: 2
2010-03-01 22:13:26,119 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyError:  [11.2.1.10]: replace deprecated html <u>.  Line: 9  Col: 745
2010-03-01 22:13:26,120 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyError:  [11.2.1.10]: replace deprecated html <u>.  Line: 13  Col: 395
2010-03-01 22:13:26,121 [TestRunnerThread] DEBUG HTMLTestExamples.TidyTest TidyError:  [1.1.10.1]: <script> missing <noscript> section.  Line: 17  Col: 1

Hmm… not too good for our friends in Google, but they can’t be good at everything.

One improvement

The first thing I did when moving on from the first example was to move away from the hard-coded URL. By simply using the TestCase attribute we are able to add a whole bunch of URLs.

[Test]
[TestCase("http://www.google.com")]
[TestCase("http://www.abc.com")]
[TestCase("http://www.irishtimes.com")]
public void TestHTMLPage(string url)
{
    Tidy.Document tdoc = new Tidy.Document();
    tdoc.OnMessage += new ITidyDocumentEvents_OnMessageEventHandler(doc_OnMessage);
    log4net.Config.XmlConfigurator.Configure();
    _status = tdoc.LoadConfig(_ConfigFileName);

    String htmlResults = String.Empty;
    using (var browser = new IE(url))
    {
        _status = tdoc.ParseString(browser.Html);
        _status = tdoc.RunDiagnostics();
        Assert.IsTrue(_status == 0, "There were errors found");
    }
    // Detach the handler once the page has been tested (it was attached above).
    tdoc.OnMessage -= new ITidyDocumentEvents_OnMessageEventHandler(doc_OnMessage);
}

One thing I did find is that I had to move the Tidy Document from being defined at the class level to being created inside the method.

Testing from root to leaf

The next thing I did was to set up a simple root to leaf test. I simply used the built-in functionality available in WatiN to build myself a generic list of the URLs on the page, and then stored this in a class that I can read back when needed.

// Simple private field to hold all the pages.
List<Pages> _allPages = new List<Pages>();

/// <summary>
/// HTML Pages class
/// </summary>
private class Pages
{
    public string URL { get; set; }
    public string Title { get; set; }
    public ArrayList errors { get; set; }
    public ArrayList warnings { get; set; }
    public bool Tested { get; set; }       
}

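// Use this code within your test.
// currentPage is the Pages entry for the page just tested, and _rootUrl is
// assumed to be a field holding the root URL of the site under test.
List<string> pageLinks = ExtractLinks(browser.Links);
_allPages.Add(currentPage);
foreach (string link in pageLinks)
{
     if (!string.IsNullOrEmpty(link) && !PageOutsideSite(_rootUrl, link) && !PageAlreadyTested(link))
         TestHTMLPage(link);
}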

// This function will take all the links and build a list
private List<string> ExtractLinks(LinkCollection linkCollection)
{
     List<string> links = new List<string>();
     foreach (Link link in linkCollection)
         links.Add(link.Url);
     return links;
}

// Check to see if the page is outside of the main root URL
private bool PageOutsideSite(string urlRoot, string urlLink)
{
     if (!urlLink.Contains(urlRoot))
         return true;
     if (urlLink.Contains("?"))  // ignore param urls
         return true;
     if (urlLink.Split('/').Length > 5)  // ignore links more than two levels deep
         return true;
     return false;
}

// Check if the page has already been tested.
private bool PageAlreadyTested(string urlLink)
{
     foreach (Pages page in _allPages)
     {
         if (page.URL == urlLink && page.Tested)
             return true;
     }
     return false;
}
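
The substring checks in PageOutsideSite are quick to write, but they also accept links such as mailto: addresses or lookalike hosts that merely contain the root URL. As a rough alternative sketch, not the code I ran at the time, the same three rules can be expressed with System.Uri. The class name SiteFilter, the method IsOutsideSite and the maxDepth parameter are all made up for this illustration.

```csharp
using System;

public static class SiteFilter
{
    // Returns true when the link should be skipped by the crawl.
    // Hypothetical helper: mirrors the three rules in PageOutsideSite.
    public static bool IsOutsideSite(Uri root, string link, int maxDepth)
    {
        Uri uri;
        // Anything that is not a well-formed absolute URL is skipped.
        if (!Uri.TryCreate(link, UriKind.Absolute, out uri))
            return true;

        // Only follow web links (this drops mailto:, javascript: and so on).
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return true;

        // Compare host names rather than looking for the root as a substring.
        if (!string.Equals(uri.Host, root.Host, StringComparison.OrdinalIgnoreCase))
            return true;

        // Ignore parameterised URLs.
        if (!string.IsNullOrEmpty(uri.Query))
            return true;

        // Count the non-empty path segments to measure how deep the link is.
        string[] segments = uri.AbsolutePath.Split(
            new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return segments.Length > maxDepth;
    }
}
```

Calling SiteFilter.IsOutsideSite(new Uri("http://www.example.com/"), link, 2) gives roughly the same "two levels deep" limit as the Split('/') check above, except that a trailing slash no longer counts as an extra level.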

A few words of warning

This method of testing is very slow and processor intensive. WatiN is really not the best tool for dealing with a large number of pages. I’d probably recommend downloading the files beforehand on a nightly basis and then running the Tidy tests against the saved copies.
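
To give an idea of what that nightly download could look like, here is a minimal sketch that uses WebClient from the .NET Framework to save a page to disk without opening a browser. I have not wired this into the tests above; the class name PageSnapshot and both method names are invented for the example.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageSnapshot
{
    // Hypothetical helper: turns a URL into a file name that is safe on disk
    // by replacing every character that is not a letter or digit with '_'.
    public static string FileNameFor(string url)
    {
        var name = new StringBuilder(url.Length);
        foreach (char c in url)
            name.Append(char.IsLetterOrDigit(c) ? c : '_');
        return name + ".html";
    }

    // Downloads the raw HTML for the URL into the folder and returns the path.
    public static string Save(string url, string folder)
    {
        Directory.CreateDirectory(folder);
        string path = Path.Combine(folder, FileNameFor(url));
        using (var client = new WebClient())
        {
            client.DownloadFile(url, path);
        }
        return path;
    }
}
```

The saved file can then be read back with File.ReadAllText and passed to the same ParseString and RunDiagnostics calls used in the tests above. Bear in mind that this is the HTML exactly as the server sent it, not the page as the browser sees it after any scripts have run, so the results may differ from the WatiN version.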