Testing HTML
We are living in the age of Ajax and beyond, and here I am talking about testing HTML! But it is true: HTML needs testing. The issue with HTML is its flexibility. The HTML specification does not insist on well-formedness or correctness, and it allows many different ways of doing the same thing. This paved the way for its rapid adoption when the web was invented, but later created trouble: parsers became complex, rendering became inconsistent between user agents and, of course, testing became harder.
A few days back we faced a production problem caused by HTML. The programmer had written the value attribute of an <option> element thus:
<option value=DYNAMICALLY_GENERATED>Some content</option>
This seemingly harmless code failed when the DYNAMICALLY_GENERATED content contained a space. The programmer had not wrapped the attribute value in single or double quotes, so when the form was submitted, the value received was only the part of the string before the first space. We, the developers, had not tested with a DYNAMICALLY_GENERATED string containing spaces.
Another instance we faced was similar. The programmer had left a tag unclosed inside a <form>. Firefox rendered the page correctly and the form was functional. When it was deployed for customer review, the customers were appalled to see no form elements at all, just an empty screen. The customers were using IE6.
Issues like these are caused by a lack of discipline on the developer’s part. Sometimes they creep in under pressure or through genuine mistakes.
When we encounter problems like these, we immediately recognize that they can be solved easily. XML parsers complain at once when we make such mistakes while writing XML by hand, but a web browser is not built for this purpose; it silently tries to recover. For HTML, we have specialized validators that perform such routine checks and find errors of this kind.
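One way to avoid this class of bug is to generate the markup through a small helper that always quotes and escapes attribute values. The following is a minimal sketch; the class and method names (OptionMarkup, escapeHtml, option) are hypothetical and not taken from the application described above.

```java
class OptionMarkup {
    // Escapes the characters that are significant in HTML text and in
    // double-quoted attribute values.
    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds an <option> element whose value attribute is always double-quoted,
    // so a space in the value can no longer cut the attribute short.
    static String option(String value, String label) {
        return "<option value=\"" + escapeHtml(value) + "\">"
                + escapeHtml(label) + "</option>";
    }

    public static void main(String[] args) {
        // prints <option value="New York">Some content</option>
        System.out.println(option("New York", "Some content"));
    }
}
```

With the value quoted, the browser submits the whole string "New York" rather than just "New".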
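To illustrate how strictly an XML parser treats such mistakes, here is a minimal sketch that feeds a markup fragment to the XML parser bundled with the JDK. It only applies to markup written in XHTML style (every tag closed, every attribute quoted); it is not a substitute for an HTML validator. The class name WellFormedCheck and the method firstError are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns null when the fragment is well-formed XML,
    // otherwise a description of the first problem the parser reports.
    static String firstError(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        // An unclosed <input> and an unquoted attribute value are both rejected.
        System.out.println(firstError("<form><input type=\"text\"></form>"));
        System.out.println(firstError("<option value=New York>Some content</option>"));
        // A properly closed and quoted fragment passes, so this prints null.
        System.out.println(firstError("<form><input type=\"text\"/></form>"));
    }
}
```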
HTML Validators
Some of the popular HTML Validators:
- W3C’s validator
- WDG’s validator
- Validator by Henri Sivonen
- HTMLTidy
- Firefox Addon (not available for Linux)
Note that some of the services, such as the WDG validator and Henri Sivonen’s validator, also come with source code, so you can run them on your local network too.
Challenges in testing HTML
Dynamic content
The first challenge is the process of HTML generation itself. The steps in generating the final HTML are:
code > compile > deploy > runtime inputs > final HTML generation
As you can see, in most Web development methodologies the final HTML is generated only in the last step. This makes validation a costly affair in terms of time, because every check requires going through the whole cycle.
Automation (with examples of Java-specific tools)
As with many things in the programming world, we can automate the testing of HTML validity. This can be done during integration/functional testing, because the actual HTML is generated in the runtime environment based on various dynamic user inputs and parameters.
Approach 1: Using a validation service
The first approach is to use your functional test tool to get the HTML source and then programmatically send it to one of the validation services (Validator.nu exposes its functionality as a web service). For example, when using Selenium in a JUnit test, you would get the HTML source thus:
// getHtmlSource() returns the page markup; getBodyText() would return only the visible text
String htmlSource = selenium.getHtmlSource();
// Send htmlSource to the validation service to verify its validity
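Sending the source to the service could look roughly like the sketch below, which uses the HTTP client that ships with Java 11 and later. The endpoint URL, the out=json parameter and the Content-Type header reflect my understanding of the web service offered by Henri Sivonen's Validator.nu and should be treated as assumptions to verify against the service's own documentation. The class name ValidatorClient and its methods are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ValidatorClient {
    // Assumed endpoint; point this at your own instance if you host one locally.
    static final URI ENDPOINT = URI.create("https://validator.nu/?out=json");

    // Builds a POST request that carries the page source as the request body.
    static HttpRequest buildRequest(String htmlSource) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "text/html; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(htmlSource, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the request and returns the raw report as text.
    // Interpreting the report is left to the caller.
    static String validate(String htmlSource) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(htmlSource), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

In a test, you would call ValidatorClient.validate(htmlSource) and fail the test if the returned report contains any error messages. Keeping the request construction in its own method makes it possible to check the request without touching the network.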
Approach 2: Using JTidy inside your test suites
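JTidy is a Java port of HTMLTidy, the immensely popular tool originally written by Dave Raggett and now maintained by volunteers.
import org.w3c.tidy.Tidy;

Tidy tidy = new Tidy();
// Parses the HTML read from inputStream, reports the problems it finds,
// and writes the cleaned-up markup to System.out
tidy.parse(inputStream, System.out);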
Approach 3: Use a tool that supports HTML verification
If you look at the architecture of MaxQ, JTidy is integrated into the tool’s default testing workflow.
Based on the approaches discussed, choose the approach best suited to you and your environment.