Having a great discussion over on the ASP.NET MVC forums about Html-Encoding and what should be done about it. All-in-all it's a pretty good post and is rapidly making me see what working for Microsoft is all about :). Specifically the discussion centers around HTML-Encoding and when it should be dealt with. It's a volatile subject, for sure, but one which needs to be considered and discussed.
Cross-site Scripting Attacks
Cross-site scripting is on the rise and is a major security issue that web developers tend to ignore until it's too late:
Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications which allow code injection by malicious web users into the web pages viewed by other users. Examples of such code include HTML code and client-side scripts. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Vulnerabilities of this kind have been exploited to craft powerful phishing attacks and browser exploits
It's amazing how smart the evil geeks are out there. Check out some of the exploits that are possible:
On November 8th, 2006 Rajesh Sethumadhavan discovered a Persistant vulnerability in the social network site Orkut which would make it possible for Orkut members to inject HTML and JavaScript into their profile. Rodrigo Lacerda used this vulnerability to create a cookie stealing script known as the Orkut Cookie Exploit which was injected into the Orkut profiles of the attacking member(s). By merely viewing these profiles unsuspecting targets had the communities they owned transferred to a fake account of the attacker. On December 12th, Orkut fixed the vulnerability.
Two XSS vulnerabilities in Google.com website were identified and published by Yair Amit in December 2005. The vulnerabilities allowed an attacker to impersonate legitimate members of Google's services or to mount a phishing attack. This publication presented an obscure way to bypass common XSS countermeasures by using UTF-7 encoded payloads.
in August 2006, through a fake news summary which claimed President Bush appointed a 9 year old boy to be the chairperson of the Information Security Department. This claim was backed up with links to cbsnews.com and www.bbc.co.uk, both of which were vulnerable to separate XSS holes which allowed the attackers to inject an article of their choosing.
To see XSS in action try this sample (this is a test site btw, so feel free to hack it up):
- Navigate to http://testasp.acunetix.com/Search.asp
- Enter this into the search box:
<br><br>Please login with the form below before proceeding:<form action="destination.asp"><table><tr><td>Login:</td><td><input type=text length=20 name=login></td></tr><tr><td>Password:</td><td><input type=text length=20 name=password></td></tr></table><input type=submit value=LOGIN></form>
You might be thinking "yah but that's me spoofing the page - how would that actually translate into an attack". Well here, click this link and see. This is a very typical phishing attack.
If you're freaked out right now, that's a good thing :). If you're wondering if all of your search forms are vulnerable, the answer is most likely.
XSS Considerations in ASP.NET MVC
HTML-Encoding (transforming text into special HTML characters to represent reserved characters, thus avoiding XSS attacks) has become increasingly important given that we don't have Server Controls to render output using ASP.NET MVC CTP #1. Traditionally, using ASP.NET, this was mitigated some by controls automatically encoding viewable text (attackers could still exploit attribute values like "href", "src" and "value" however).
Damien Guard wrote up a nice post on this issue:
If you don't encode data when using any of the following methods to output to HTML your application could be compromised by unexpected HTML turning up in the page and modifying everything from formatting though to capturing and interfering with form data via remote scripts (XSS). Such vulnerabilities are incredibly dangerous...
Just imagine post.Author contains "><script src="http://abadsite.com"></script> after an unscrupulous user entered that into a field your application uses and it got into the database. The following typical ASP.NET techniques would leave you open.
Indeed it's a very big deal, and one we need to take very seriously with ASP.NET MVC.
Discussion
The thing that sparked this discussion was the use of UpdateFrom() in the MVCTookit. By default we don't encode the values coming in from Request.Form, which means that when you post a value to a controller, the controller could, inadvertently, return that text to the browser (like with the search example above) and you'd be in trouble.
When the thread started, I initially replied that I would turn encoding on by default to avoid these scenarios, and offer an override for turning encoding off. As I quickly realized, people are just not in favor of this approach:
- Encoding inbound is a bad idea - it litters your DB with encoded text that isn't searchable and consumable by other applications
- Encoding inbound offers a "false" sense of protection - meaning that developers won't encode outbound, where it should be
- Encoding outbound can result in "double-encoding", which is a mess
- People always use defaults, and therefore will always encode inbound
I agree with these points, but at the same time I recognize that XSS is a larger issue with MVC and more safeguards need to be put in place. Many developers (I'd say most) forget to turn on encoding at any level - and I've been guilty of it many, many times. To that end, enabling a level of protection by default seems to be a reasonable approach.
In addition, a developer may quickly find that for certain fields he needs to turn off encoding because the data going in is invalid. This is easily doable for individual fields that you need to search on, or fields that need to retain the raw HTML.
In summary - my opinion has been that "opt-out" is a reasonable choice in light of the threat - where you explicitly state that "I don't want these things encoded, thank you".
I do recognize that it's an extra step for most developers - but so is making sure that everything is encoded on the way back out. To me the extra steps should be to make the app less secure, not more.
This has always been at the core of any security discussion - how much work should added security impose? Damien adds a good point (posted in the forums):
Definitely do not encode it going into the database - to do so would pollute your data with HTML encoding. If you thought having presentation and model logic separate was important then imagine what stuffing HTML into the database would mean. If you ever wanted to show it via another application such as a WinForm, Console or XML variant you'll have to do cross-conversion. It would also mean that you can no longer HTML encode the output or you would double encode it and actually start displaying < in text boxes
It's important to recognize that in no way am I advocating storing HTML in your database as a rule - that would be irresponsible of me :).
Given Damien's scenario, I have to pose the question: it is better to have you, as the developer make the decision that "I need to use this data somewhere else so I will turn encoding off"? Or should it be off by default just in case this scenario happens?
It's more work to encode everything, is it worth it? You tell me :)!
What About <%=?
One idea brought up has to do with overriding the default <%= sugar that you can use on any ASP page (which translates to Response.Write) and making it encode every time, with an option for some other syntax like <%! to be used for unencoded strings.
There are some hurdles with this approach that Phil discusses on the thread:
Right now, the ASP.NET page parser converts <%= "Foo" %> to a call to Response.Write("Foo"); I don't think we want to change Response.Write() to default to HTML encoding because that would cause the entire page template to be an unencoded mess since literals within an aspx page are clumped together into Response.Write calls.
But he goes on with some possible thoughts:
1. Use the approach that Steve (last name?) does here. He effectively implements his own CSharpCodeCompiler to intercept the code generation phase I mentioned. Very neat approach.
2. Another theoretical approach (as in I think it would work, but I haven't tried it) is to hook in your own PageParserFilter. Ideally, this approach might allow for introducing an alternative syntax for unencoded HTML such as the <%!= Html.TextBox(...) %>
The next question is whether the ASP.NET team should make this the default behavior for ASP.NET MVC?
And this is where you come in. I'd love to hear from you about this issue. Specifically:
- Did you know the magnitude of possibilities regarding an XSS attack?
- How much of this should be handled by the platform versus your education?
