Hanalei, Hawaii 9/2/2010

A Hack's Guide To Unit Testing Generated HTML

Wednesday, January 30, 2008

UPDATE: Refactored and tweaked the validation method below by popular request!

I know this might make some people groan - but that's OK - I'm used to it. As Eilon Lipton always tells me:

You're a PM dude - you're not supposed to code

Which is true! And to keep the PM as Haack tradition alive, here is my latest attempt to completely devastate my reputation as a coder. Have at me, I love it.

Let's Put an X in Front Of HTML Too
The world's gone X-crazy (Xbox, OS X, ASPX pages, ActiveX...) and it seems that nothing's cool anymore unless there's an X associated with it. Maybe I can regain some of the rep I'm about to lose by changing my name to "RobX".

XRob?

Testing For Compliance
One of the main things that I get hammered for (with respect to the MVC Toolkit) is the lack of XHTML compliance. I tried to pay as much attention as I could to it but... well, some things just slip through. I read somewhere that my slip-ups (particularly the form tag's method attribute not being in quotes) were

...yet again evidence that Microsoft could care less [sic] about the HTML spec

On the contrary, it was me being lame.

I know what you're thinking - "can't you test for that somehow?" Up until now that meant copy/pasting a whole mess of HTML up to w3.org to run through the validator. But as of today I decided to let Mr. Hack take over and wrote some code to ping w3.org automatically.

The Code
For my compliance unit tests (you wouldn't want to do this for every test, since each check is a network round trip to w3.org) I'm creating a Select box, like so:

    [TestMethod]
    public void Select_BindToIntegerArray() {
        int[] numbers = { 1, 2, 3 };
        string select = SelectBuilder.Select("test", numbers, "", "", 0, false, null, null);

        //validate it against the W3C validator
        Assert.IsTrue(XHTMLValidator.IsValidXhtml(select));
    }

UPDATE: Duncan Smart (?) refactored this function yet again - thanks!
And I'm calling on my new hacked-up wunderclass - the XHTMLValidator. Here's the code - have at me and make it hurt:

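    using System.Collections.Specialized;
    using System.Net;
    using System.Text;

    public static class XHTMLValidator {
        public static bool IsValidXhtml(string htmlFragment) {
            // Post the fragment and ask the validator to wrap it in an XHTML 1.0 page
            NameValueCollection values = new NameValueCollection();
            values["fragment"] = htmlFragment;
            values["prefill"] = "1";
            values["prefill_doctype"] = "xhtml10";

            // WebClient form-encodes the values; dispose it when we're done
            using (WebClient webClient = new WebClient()) {
                string postResult = Encoding.UTF8.GetString(
                    webClient.UploadValues("http://validator.w3.org/check", values));

                //lame check - but it works
                return postResult.Contains("Congratulations");
            }
        }
    }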

Yes, I know. But you know what - it's an automatic way to make sure that my tags are compliant :).
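The weak spot is that string match: as Pete Hurst points out in the comments, a fragment that itself contains "Congratulations" would sail through. One less fragile option, assuming the W3C validator still sends its X-W3C-Validator-Status response header (Valid, Invalid or Abort, per its API notes), is to read the verdict from that header instead of scraping the page. This is a sketch, not the code I'm actually running: it also disposes the WebClient and takes the endpoint as a parameter so a locally installed validator could be used. ParseValidatorStatus is a hypothetical helper split out so the pass/fail logic can be tested without hitting the network.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

public static class XHTMLValidator {
    // Endpoint from the post; pass a local validator install's URL instead if you have one.
    public const string DefaultValidatorUrl = "http://validator.w3.org/check";

    public static bool IsValidXhtml(string htmlFragment, string validatorUrl = DefaultValidatorUrl) {
        NameValueCollection values = new NameValueCollection();
        values["fragment"] = htmlFragment;
        values["prefill"] = "1";
        values["prefill_doctype"] = "xhtml10";

        // Dispose the WebClient (and its connection) deterministically.
        using (WebClient webClient = new WebClient()) {
            webClient.UploadValues(validatorUrl, values);

            // Assumption: the validator reports its verdict in this response header.
            string status = webClient.ResponseHeaders == null
                ? null
                : webClient.ResponseHeaders["X-W3C-Validator-Status"];
            return ParseValidatorStatus(status);
        }
    }

    // Hypothetical helper: only "Valid" counts as a pass;
    // "Invalid", "Abort" or a missing header are all treated as failures.
    public static bool ParseValidatorStatus(string status) {
        return status != null
            && string.Equals(status.Trim(), "Valid", StringComparison.OrdinalIgnoreCase);
    }
}
```

Because the verdict no longer depends on the wording of the results page, the content of the fragment can't produce a false pass.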

If You're Still Here...
I'm also going to use the HTML Agility Pack that's up on CodePlex to make sure all the other bits that are supposed to be present in the generated HTML are indeed there. This is a really cool project for checking your HTML out - from their site:

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).
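As a rough illustration of checking "the other bits", here's a sketch that swaps the Agility Pack for LINQ to XML (System.Xml.Linq, which ships with .NET 3.5). That swap only works because the builder is supposed to emit XHTML: XElement.Parse throws an XmlException on markup that isn't well-formed (an unquoted attribute, for instance), and once it parses you can assert on element names and attributes. Sloppy real-world HTML would still need a tolerant parser like the Agility Pack. SelectMarkupChecks and GetOptionValues are hypothetical names, not part of the toolkit.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SelectMarkupChecks {
    // Parses an XHTML <select> fragment and returns its option values in document order.
    // XElement.Parse throws System.Xml.XmlException if the markup is not well-formed.
    public static string[] GetOptionValues(string selectHtml, string expectedName) {
        XElement select = XElement.Parse(selectHtml);

        if (select.Name.LocalName != "select")
            throw new ArgumentException("Root element is not <select>.");
        if ((string)select.Attribute("name") != expectedName)
            throw new ArgumentException("Missing or unexpected name attribute.");

        // Match on LocalName so an xmlns declaration on the fragment doesn't hide the options.
        return select.Elements()
                     .Where(e => e.Name.LocalName == "option")
                     .Select(e => (string)e.Attribute("value"))
                     .ToArray();
    }
}
```

In a test like Select_BindToIntegerArray you would then assert that the returned values are "1", "2" and "3" for the integer array.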

Hope someone finds this helpful...

Duncan Smart - Wednesday, January 30, 2008 - Rob, you've specified that the data is "x-www-form-urlencoded" - but your data hasn't been URL-encoded at all.
Lance Fisher - Wednesday, January 30, 2008 - I was thinking about trying to do something similar to validate my RSS feeds with feedvalidator.org, but I shied away from making calls to a web service in my unit tests. Don't you think that it could become a problem? It's too bad there aren't any html/rss/etc validator libraries (like .dlls) you could just add into your unit tests. Or are there?
David Fauber - Wednesday, January 30, 2008 - Great article. I've just written some controls where my primary dissatisfaction has been that the HTML-generating methods are extremely difficult to write meaningful tests for. Not sure I'm going to go the exact same route, but it gives me some ideas and it's nice to see I'm not the only one dealing with this.
Ryan Smith - Wednesday, January 30, 2008 - Rob, With respect to X (in this case XHTML) being a fad, I think the real issue is that unless your pages are coded in an XHTML transitional manner, IE will render them in quirks mode. Thus you end up with a web site that looks OK in one browser but completely messed up in the others. I think that's the reason why you get hammered for not validating as XHTML. Thanks for the post though. I can see this coming in handy in the near future.
Rob Conery - Wednesday, January 30, 2008 - @Ryan: not poking any fun at XHTML :) - I absolutely understand the need for valid HTML and only wish I could have paid more attention.
Joe Chung - Wednesday, January 30, 2008 - Your code looked fine except that you don't need to instantiate ASCIIEncoding (System.Text.Encoding.ASCII) and you should close and dispose your streams and dispose your stream reader either explicitly or via using statements. Also, like Duncan said, you should URL encode your data with HttpUtility.UrlEncode.

It's a shame that we can't use the XhtmlConformance Web.config setting, but it only applies to server control output (and not Literal- or Label-encoded HTML either).
Shawn Oster - Wednesday, January 30, 2008 - As much as one might cringe at making web requests in a unit test, I definitely think this is the lesser of two evils. Sure, web requests make the tests a little brittle, but at least you're testing the XHTML and that pretty much rocks. Nothing hacky about your code either; I think it's pretty damn clean and makes the best of a tough situation (that being the lack of a validator class library).

I'm with Joe, those streams need to be disposed or be in a using statement; it's never the best idea to trust the garbage collector to clean up your resources for you (memory yes, streams and handles, no). You should update the code on this page with the revised goodness once you've made the small tweaks so it can live on in glory for the cut-n-pasters.
adminjew - Wednesday, January 30, 2008 - SubSonic's default replacement letter for reserved types is... guess what? It's X.
Rob Conery - Wednesday, January 30, 2008 - @adminjew: "That was Eric" @Shawn and others - thanks for the tips... I knew better :). Refactored and updated.
Ryan Lanciaux - Wednesday, January 30, 2008 - Rob, Thank you! This will definitely come in handy -- and save some time!
josh - Thursday, January 31, 2008 - How about Rob "ConX" Conery? .. hopefully not xcon -jx
Mike Minutillo - Thursday, January 31, 2008 - Hey RobX,

This is cool.

@Lance Fisher - Would you use this as a "Unit Test" or a higher-level one? I would want this one pre-commit but not on every unit-test run I think.

- Mike
Ben - Thursday, January 31, 2008 - If you're going to be using this throughout your test suite then it'd be worth installing the validator on a local server. http://validator.w3.org/docs/install.html (There's a Windows install guide as well: http://validator.w3.org/docs/install_win.html)
Josh Stodola - Thursday, January 31, 2008 - I think this is a slight indication as to the cause of legendary problems with the relationship between standards-compliance and Microsoft. Managers don't take the standards seriously enough, and here is raw proof of it. Rob, I know you understand the need for valid XHTML. So, why is your stuff not validating? I don't think it has anything to do with you being lame. Things slip through because your focus is elsewhere (as determined by your brain's order of importance).

I am not sure that you should have a higher focus on standards (perhaps managers really do have bigger fish to fry), but at least have some XHTML-savvy sap validate your code before anybody else has the chance to give you shit about it.

Anyways, the real reason I commented was to point out that there is already an API for this:
http://blog.madskristensen.dk/post/Using-the-W3C-HTML-Validator-API.aspx

Best regards...
Duncan Smart - Thursday, January 31, 2008 - Sorry, but couldn't resist (did anyone say "fizz-buzz"?). WebClient is good for simple HTTP stuff like this:

    public static bool IsValidXhtml(string htmlFragment) {
        NameValueCollection values = new NameValueCollection();
        values["fragment"] = htmlFragment;
        values["prefill"] = "1";
        values["prefill_doctype"] = "xhtml10";
        WebClient webClient = new WebClient();
        string postResult = Encoding.UTF8.GetString(webClient.UploadValues("http://validator.w3.org/check", values));
        //lame check - but it works
        bool isValid = postResult.Contains("Congratulations");
        return isValid;
    }
Mike - Thursday, January 31, 2008 - Rob,

You are doing an excellent job with this blog, I am so glad you have joined the MS Team.

Keep up the great work!
Lance Fisher - Thursday, January 31, 2008 - @Josh,

The fact that PMs are writing unit tests to validate XHTML indicates to me that they do care about standards. This is really cool.

@Mike,

Ideally, I would like to be able to validate the output in my unit tests, but I would not want to hit the w3c server on every unit test. So with this solution, which I really like in a lot of ways, I would rather just use it on a pre-commit like you suggest.

However, there might be another way. I found this great article:
http://www.thejoyofcode.com/Validator_Module.aspx

They wrote up an HttpModule that validates the output of any aspx page and appends the results to the rendered page. The way they check the validation is by loading the DTD from the W3C and caching it. They use an XmlReader to read the document which will then throw errors if the page doesn't validate. So aside from using the whole HttpModule, I think it should be possible to validate the XHTML with a local DTD. Now whether this covers everything the W3C's validator covers, I don't know, but it would cover quite a bit and you wouldn't have to do a call over the network.
Rob Conery - Thursday, January 31, 2008 - @Josh - thanks for the link, I was looking all over for this :).

>>I think this is a slight indication as to the cause of legendary problems with the relationship between standards-compliance and Microsoft<<

Hardly. We issued a CTP and I created that toolkit in 10 days because we needed it. I went through everything to try and be sure that it was all compliant and I missed 4 attributes out of 323 :p.

>>Things slip through because your focus is elsewhere (as determined by your brain's order of importance).<<

I'll remind you that ScottGu is the General Manager of our unit. He owns just about everything developer-related. Scott created MVC... Seriously though - PMs in Microsoft have a massive degree of freedom and we're expected to stay close to the code - not sit in a chair and push Gantt charts. I wouldn't work there if that was the case.

In truth I should have had a testing suite prepared, but at the time I had never had to unit test HTML - I mean, how do you do that properly? I had 10 days, so I relied on some old-fashioned testing. Not that it's an excuse, but honestly this is just a Preview, and it would never go out the door (even as Alpha) without full testing (which is why I'm doing this now).
Josh Stodola - Thursday, January 31, 2008 - I didn't realize the time frame at all. Regardless, I think I should have recognized that this is indeed a step in the right direction. A couple of years ago, I doubt anybody would have even considered validating the output. With that said, I apologize for criticizing an obviously respectable move.

I don't believe standards-compliance is a religious priority for Microsoft yet, but it is definitely reassuring to see there is some transcendence.

Keep up the great work, guys!
Josh Stodola - Thursday, January 31, 2008 - By the way, Rob (feel free to delete this comment upon reading), I am not sure if you noticed this bug in your blog. When it animates the comment all AJAXY style (it fades in), the fonts end up being all wretched looking (not smooth).

I came across this about a month ago, so if you care to fix it, you can:
http://mattberseth.com/blog/2007/12/ie7_cleartype_dximagetransform.html
Rob Conery - Thursday, January 31, 2008 - @Josh - I'm looking to move off this theme when I get a moment. Probably wait for WP 3 to come round maybe? But thanks for letting me know. No apologies necessary :) - but I do want people to know that MS is now thinking about a lot of this stuff due to some input from the new hires. Phil is pushing the TDD thing harder than ever - it's good stuff! And it's true - ScottGu codes a lot. And thank goodness for it...
Josh Stodola - Thursday, January 31, 2008 - Wow, ScottGu codes? As if I didn't have enough respect for the guy already! Is he from the future?

Thanks for the quick reply, it really is good to see you guys are thinking about this stuff. I'm pretty excited about it!
Pete Hurst - Saturday, February 02, 2008 - So, what happens if the test content you're validating contains "Congratulations" buried in some *in*valid markup? Your user registration page will fail the unit test :)
Troy DeMonbreun - Monday, February 04, 2008 - Rob,

"PM as Haack tradition" - was that a Freudian Slip? ;-)
Rob Conery - Monday, February 11, 2008 - @Pete: the idea here is that this is a unit test, and hopefully the HTML you pass off to the testing bits won't trip you up. But yah - that's a bit of a snag :). I tried to find another way, but you get back a massive blob of text and nothing really indicates a pass/fail that I could see. I know it's not optimal, but I tell ya, it caught some pretty crazy errors!
Igor - Sunday, March 09, 2008 - Rob,

Which part of your linq query is "predictive"?