... In Which We Discuss HTML-Encoding

Wednesday, December 19, 2007

I'm having a great discussion over on the ASP.NET MVC forums about HTML-encoding and what should be done about it. All in all it's a pretty good thread and is rapidly making me see what working for Microsoft is all about :). Specifically, the discussion centers on when HTML-encoding should happen: on the way in or on the way out. It's a volatile subject, for sure, but one which needs to be considered and discussed.

Cross-site Scripting Attacks
Cross-site scripting is on the rise and is a major security issue that web developers tend to ignore until it's too late:

Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications which allow code injection by malicious web users into the web pages viewed by other users. Examples of such code include HTML code and client-side scripts. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy. Vulnerabilities of this kind have been exploited to craft powerful phishing attacks and browser exploits

It's amazing how smart the evil geeks are out there. Check out some of the exploits that are possible:

On November 8th, 2006 Rajesh Sethumadhavan discovered a persistent vulnerability in the social network site Orkut which would make it possible for Orkut members to inject HTML and JavaScript into their profile. Rodrigo Lacerda used this vulnerability to create a cookie-stealing script known as the Orkut Cookie Exploit, which was injected into the Orkut profiles of the attacking member(s). By merely viewing these profiles, unsuspecting targets had the communities they owned transferred to a fake account of the attacker. On December 12th, Orkut fixed the vulnerability.

Two XSS vulnerabilities in Google.com website were identified and published by Yair Amit in December 2005. The vulnerabilities allowed an attacker to impersonate legitimate members of Google's services or to mount a phishing attack. This publication presented an obscure way to bypass common XSS countermeasures by using UTF-7 encoded payloads.

In August 2006, a fake news summary claimed that President Bush had appointed a 9-year-old boy to be the chairperson of the Information Security Department. The claim was backed up with links to cbsnews.com and www.bbc.co.uk, both of which were vulnerable to separate XSS holes that allowed the attackers to inject an article of their choosing.

To see XSS in action try this sample (this is a test site btw, so feel free to hack it up):

  • Navigate to http://testasp.acunetix.com/Search.asp
  • Enter this into the search box:

    <br><br>Please login with the form below before proceeding:<form action="destination.asp"><table><tr><td>Login:</td><td><input type=text length=20 name=login></td></tr><tr><td>Password:</td><td><input type=text length=20 name=password></td></tr></table><input type=submit value=LOGIN></form>

You might be thinking "yah but that's me spoofing the page - how would that actually translate into an attack". Well here, click this link and see. This is a very typical phishing attack.

If you're freaked out right now, that's a good thing :). If you're wondering whether all of your search forms are vulnerable, the answer is: most likely.


XSS Considerations in ASP.NET MVC
HTML-Encoding (replacing reserved characters such as < and > with HTML entities like &lt; and &gt;, so the browser displays them as text instead of interpreting them as markup, thus avoiding XSS attacks) has become increasingly important given that we don't have Server Controls to render output in ASP.NET MVC CTP #1. Traditionally, with ASP.NET, the risk was mitigated somewhat by controls automatically encoding viewable text (attackers could still exploit attribute values like "href", "src" and "value", however).
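To make the idea concrete, here is a minimal sketch of a search page that echoes the search term back to the browser, once unencoded and once encoded. It is written in Python rather than ASP.NET only to keep it short and self-contained; the two render functions and the page markup are made up for illustration, while html.escape is the standard-library encoder.

```python
import html


def render_results_unsafe(term: str) -> str:
    # Hypothetical search page that drops the term straight into the markup.
    return "<p>You searched for: " + term + "</p>"


def render_results_encoded(term: str) -> str:
    # Same page, but reserved characters are turned into entities first.
    return "<p>You searched for: " + html.escape(term) + "</p>"


# A shortened version of the fake login form from the example above.
payload = '<form action="destination.asp"><input name=login></form>'

print(render_results_unsafe(payload))   # the browser would render a real form
print(render_results_encoded(payload))  # the browser would show the text as typed
```

In the encoded version the browser receives &lt;form ...&gt; and displays the attacker's text literally instead of building a form from it.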

Damien Guard wrote up a nice post on this issue:

If you don't encode data when using any of the following methods to output to HTML your application could be compromised by unexpected HTML turning up in the page and modifying everything from formatting through to capturing and interfering with form data via remote scripts (XSS). Such vulnerabilities are incredibly dangerous...

Just imagine post.Author contains "><script src="http://abadsite.com"></script> after an unscrupulous user entered that into a field your application uses and it got into the database. The following typical ASP.NET techniques would leave you open.

Indeed it's a very big deal, and one we need to take very seriously with ASP.NET MVC.

 

Discussion
The thing that sparked this discussion was the use of UpdateFrom() in the MVCToolkit. By default we don't encode the values coming in from Request.Form, which means that when you post a value to a controller, the controller could inadvertently return that text to the browser (as in the search example above) and you'd be in trouble.

When the thread started, I initially replied that I would turn encoding on by default to avoid these scenarios, and offer an override for turning encoding off. As I quickly realized, people are just not in favor of this approach:

  1. Encoding inbound is a bad idea - it litters your DB with encoded text that isn't searchable and consumable by other applications
  2. Encoding inbound offers a "false" sense of protection - meaning that developers won't encode outbound, where it should be
  3. Once data has been encoded inbound, encoding it again outbound results in "double-encoding", which is a mess
  4. People always use defaults, and therefore will always encode inbound
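The double-encoding problem in point 3 is easy to reproduce. The sketch below uses Python's html.escape as a stand-in for whatever encoder the framework would apply; the sample string and variable names are arbitrary.

```python
import html

raw = "Tom & Jerry <3"

stored = html.escape(raw)       # encoded inbound: this is what lands in the database
rendered = html.escape(stored)  # encoded again outbound by a careful view

print(stored)                   # Tom &amp; Jerry &lt;3
print(rendered)                 # Tom &amp;amp; Jerry &amp;lt;3

# html.unescape approximates what the browser displays for the rendered markup.
shown_to_user = html.unescape(rendered)
print(shown_to_user)            # Tom &amp; Jerry &lt;3
```

The user ends up looking at literal entity text, which is the "mess" the forum thread refers to. The stored value also no longer matches a search for "Tom & Jerry", which is point 1.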

I agree with these points, but at the same time I recognize that XSS is a larger issue with MVC and more safeguards need to be put in place. Many developers (I'd say most) forget to turn on encoding at any level - and I've been guilty of it many, many times. To that end, enabling a level of protection by default seems to be a reasonable approach.

In addition, a developer may quickly find that certain fields need encoding turned off because the encoded data is unusable there. This is easily doable for individual fields that you need to search on, or fields that need to retain the raw HTML.

In summary - my opinion has been that "opt-out" is a reasonable choice in light of the threat - where you explicitly state that "I don't want these things encoded, thank you".

I do recognize that it's an extra step for most developers - but so is making sure that everything is encoded on the way back out. To me the extra steps should be to make the app less secure, not more.

This has always been at the core of any security discussion - how much work should added security impose? Damien adds a good point (posted in the forums):

Definitely do not encode it going into the database - to do so would pollute your data with HTML encoding. If you thought having presentation and model logic separate was important then imagine what stuffing HTML into the database would mean. If you ever wanted to show it via another application such as a WinForm, Console or XML variant you'll have to do cross-conversion. It would also mean that you can no longer HTML encode the output or you would double encode it and actually start displaying &lt; in text boxes

It's important to recognize that in no way am I advocating storing HTML in your database as a rule - that would be irresponsible of me :).

Given Damien's scenario, I have to pose the question: is it better to have you, as the developer, make the decision that "I need to use this data somewhere else so I will turn encoding off"? Or should it be off by default just in case this scenario happens?

It's more work to encode everything; is it worth it? You tell me :)!

 

What About <%=?
One idea brought up has to do with overriding the default <%= sugar that you can use on any ASP page (which translates to Response.Write) and making it encode every time, with an option for some other syntax like <%! to be used for unencoded strings.
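The shape of that idea, encode by default and require an explicit marker to opt out, can be sketched outside of ASP.NET. In the Python below, emit plays the role of an encoding <%= and the Raw wrapper plays the role of the alternate unencoded syntax; both names are hypothetical and not part of any framework.

```python
import html


class Raw(str):
    """Hypothetical marker for markup the developer has explicitly vouched for."""


def emit(value) -> str:
    # Stand-in for an encoding "<%=": everything is encoded unless wrapped in Raw.
    if isinstance(value, Raw):
        return str(value)
    return html.escape(str(value))


def render(*parts) -> str:
    # Concatenate template literals and data, encoding each part by default.
    return "".join(emit(part) for part in parts)


page = render(Raw("<b>"), "<script>alert(1)</script>", Raw("</b>"))
print(page)  # <b>&lt;script&gt;alert(1)&lt;/script&gt;</b>
```

The appeal of this arrangement is that forgetting the marker produces visibly escaped text rather than a vulnerability, and every use of the opt-out is easy to find in a code review.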

There are some hurdles with this approach that Phil discusses on the thread:

Right now, the ASP.NET page parser converts <%= "Foo" %> to a call to Response.Write("Foo"); I don't think we want to change Response.Write() to default to HTML encoding because that would cause the entire page template to be an encoded mess since literals within an aspx page are clumped together into Response.Write calls.

But he goes on with some possible thoughts:

1. Use the approach that Steve (last name?) does here. He effectively implements his own CSharpCodeCompiler to intercept the code generation phase I mentioned. Very neat approach.

2. Another theoretical approach (as in I think it would work, but I haven't tried it) is to hook in your own PageParserFilter. Ideally, this approach might allow for introducing an alternative syntax for unencoded HTML such as the <%!=  Html.TextBox(...) %>

The next question is whether the ASP.NET team should make this the default behavior for ASP.NET MVC?

And this is where you come in. I'd love to hear from you about this issue. Specifically:

    1. Did you know the magnitude of possibilities regarding an XSS attack?
    2. How much of this should be handled by the platform versus your education?



Eric Kemp - Wednesday, December 19, 2007 - Nice post Rob... For those of you looking for a comprehensive list of different types of XSS attacks, check out this site: http://ha.ckers.org/xss.html
Rick Strahl - Wednesday, December 19, 2007 - I'm in favor of some other expression syntax for encoded context - if you go the other way around it'll be confusing and is likely to break code in various ways.

We've discussed this before, but I really think that there should be some sort of control model specific to MVC (i.e. not directly based on Page - maybe a stripped-down Page engine that explicitly supports only limited functionality for both controls and Page) so this stuff can be abstracted in a reusable and 'official' way, not with more markup tags that feel more like a band-aid.

Without this, the issue will always end up being troublesome because there's a lack of reusability and you constantly have to 'think' about XSS, which is extremely hard to enforce.
Steve Calvert - Wednesday, December 19, 2007 - I'm completely with Rick on this one. I think an MVC based control model is the way to go. After all, that's one of the things that appealed to me about ASP.NET in the first place...
Damien Guard - Wednesday, December 19, 2007 - Great post :) Worth bearing in mind that WebForms provides almost no help on this issue and that MVC is aimed at a more advanced audience. I think changing behaviour is acceptable given that audience and the various other changes (no code behind, writing controllers etc). I've been wrestling for some time with whether to highlight just how evil script injection can be on my blog, for fear of people taking the scripts and actually using them instead. [)amien
cathal - Wednesday, December 19, 2007 - Personally I believe it's much more sensible to encode the content on the way in. That way you don't have to worry if you forget to encode it on the way back out. As for searching, as the htmlencoding only affects 252 characters (http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references), most of which are rarely (if ever) searched on, it rarely impacts searches (unless you're using mathematical/Greek symbols in your content).

Orkut's written in asp.net, but didn't take advantage of an asp.net feature that could mitigate that type of attack. Microsoft added a custom extension to cookies known as HttpOnly with IE 6 SP1. This was designed to mitigate authentication cookie theft, which is a common target of XSS. We've been using it in DotNetNuke for over 2 years (http://www.dotnetnuke.com/Community/Blogs/tabid/825/EntryID/256/Default.aspx ), and I was glad to see that Microsoft automatically adds the attribute to the forms auth cookie in asp.net 2.0 (as DotNetNuke uses multiple cookies we override the default behaviour and apply it to all site cookies - something Orkut should have done), as it fixed this issue for IE users at least. After a few years of refusing to implement it (probably as they suspected that anything from Microsoft must be inherently evil :) ), Firefox added support for it just in time to break one of my demos at devconnections.
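As a rough illustration of the HttpOnly flag cathal describes, the sketch below builds a Set-Cookie header value with Python's standard http.cookies module. The cookie name and value are hypothetical; the point is only the attribute that tells the browser to hide the cookie from script.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["auth"] = "opaque-session-token"  # hypothetical cookie name and value
cookie["auth"]["httponly"] = True        # ask the browser to hide it from document.cookie
cookie["auth"]["path"] = "/"

header_value = cookie["auth"].OutputString()
print("Set-Cookie: " + header_value)
```

HttpOnly limits what an injected script can steal; it does not stop the injection itself, so it complements encoding rather than replacing it.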
Philip Rieck - Wednesday, December 19, 2007 - There are many reasons why I'm in the "adamantly NO encoding on the way in" camp. But here's the #1 reason: If it's encoded inbound by default, it means that people are being actively encouraged NOT to encode output. Yes - If we encode inbound data we are *training* people to not encode outbound. I certainly don't want to be the guy training web developers to skip outbound encoding. Sure, the people who chimed in on this post will understand where to encode, what overloads to use, etc... But I bet we won't be the only ones using MVC. Let's encourage good habits from the start, rather than institutionalizing bad ones. In an application of appreciable size, encoding UpdateFrom will probably not catch 100% of the input. And even if you encode 99.9% of the inbound data (or especially if you do) - that's an XSS vulnerability. Now, since we trained them to forget about encoding outbound data, who do we point the finger at?
Lance Fisher - Wednesday, December 19, 2007 - I don't really like the idea of encoding the content on the way in. In building an RSS feed like Brad Abrams I wanted to encode my data. In this case it should be XML encoded and not HTML encoded since not all HTML codes are recognized by XML. On a webpage, this same data would need to be HTML encoded, and in JSON it would need to be JS encoded. There are many different places you can reuse the same data in just a web app.

I downloaded Microsoft's Anti-XSS Library and used the XmlEncode and XmlAttributeEncode methods in the view. To make it a little more fun, I created extension methods on System.String to wrap each of the "encode" methods. Now I can do things like <%= item.Title.XmlEncode() %> I like it.
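Lance's point that the same stored value needs a different encoding for each output format can be shown with a small sketch. It uses Python standard-library encoders as stand-ins for the Anti-XSS Library methods he mentions; the sample title is arbitrary.

```python
import html
import json
from xml.sax.saxutils import escape as xml_escape, quoteattr

# One raw value, stored unencoded, rendered into four different contexts.
title = 'Tom & "Jerry" <3'

as_html = html.escape(title)      # HTML text or attribute
as_xml_text = xml_escape(title)   # XML element content (e.g. an RSS <title>)
as_xml_attr = quoteattr(title)    # XML attribute value, quotes included
as_json = json.dumps(title)       # JSON string literal

for label, value in [("html", as_html), ("xml text", as_xml_text),
                     ("xml attr", as_xml_attr), ("json", as_json)]:
    print(label + ": " + value)
```

Each encoder escapes a different set of characters, which is why a value that was HTML-encoded on the way in cannot simply be reused in the feed or the JSON response.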
Rob Conery - Wednesday, December 19, 2007 -

>>>If it's encoded inbound by default, it means that people are being actively encouraged NOT to encode output. Yes - If we encode inbound data we are *training* people to not encode outbound.<<<

If I use this logic, then that means WebForms made you forget how to write standards-based HTML :). Really? "Actively Encouraged"? Do you think 80% of the developers out there even know what XSS is? I don't think I'm advocating anything like that, friend.

>>>In an application of appreciable size, encoding UpdateFrom will probably not catch 100% of the input. And even if you encode 99.9% of the inbound data (or especially if you do) - that's an XSS vulnerability<<<

As opposed to doing nothing, which is 0%, and happens all too frequently...
Thoughts on awareness of security vulnerabilities & full disclosure » DamienG - Wednesday, December 19, 2007 - [...] some great people are now on the case including Rob Conery and Phil Haack who I believe in to push this from inside and Steve Sanderson who came up with an [...]
Joe Brinkman - Wednesday, December 19, 2007 - @Philip - The issue is not a matter of whether you encode inbound or outbound: the real issue is whether you validate input before outputting it. One of the primary principles in the security field is defense in depth: applying multiple levels of security. By encoding data on the input side you provide some level of security in case a developer forgets to validate the input before rendering it. Whether or not the MVC framework encodes data inbound, it does not absolve you of the responsibility to validate the input. Personally, I would expect Microsoft to take a general approach and then make it easy for me to turn it off and provide my own security checks. This helps keep casual developers safer (there is no such thing as perfect security), while also allowing more experienced developers to implement their own security methodologies. This is in keeping with all of the other security measures which already exist within ASP.Net.
Philip Rieck - Thursday, December 20, 2007 - @Rob - Sorry if I came off argumentative - I really didn't mean to do anything other than add my opinion, right or wrong.

>>>> If I use this logic, then that means WebForms made you forget how to write standards-based HTML :).

Perhaps :) But that's a different point, I think....

>>>> Really? "Actively Encouraged"? Do you think 80% of the developers out there even know what XSS is? I don't think I'm advocating anything like that friend.

Again, I wasn't trying to point fingers or call names in any way. My apologies if it seemed that way. What I'm saying boils down to this: In my opinion, if you encode input by default on the way in, all the beginning devs (or devs who don't care) will see is "Blogs say I should encode untrusted data. But if I encode my output, it gets double encoded, so I don't encode my output. Looks like it's all taken care of for me!". In my opinion, this is a bad precedent.

>>> As opposed to doing nothing, which is 0%, and happens all too frequently.

This point is well taken - Most of my dev positions, I had the luxury of educating the users of my libraries by reviewing the code. You don't have that, and just saying "Should have thought about it yourself" won't help anyone.

@Joe - I don't think anyone who thinks about security in depth is going to have a problem with this one way or the other, really. It's the people who don't do any security (let alone defense in depth) that will be the most affected, one way or another.
Rob Conery - Thursday, December 20, 2007 - @Philip - your opinion is always welcome, and no offense was taken :). I need to add more smileys - this is a great discussion and needs to happen on many fronts so fire away! :) Totally understand the inbound/outbound habit-forming thing; I think the message is one of education here. If we spin that point around to what is all too common today it would be something like "Blogs say I should encode untrusted data ... [cricket chirps]" :):) How can we do the safest thing without "scaring the kids" :D
Mladen Mihajlovic - Thursday, December 20, 2007 - I quite like Phil's option 2 : .
Elmar - Thursday, December 20, 2007 - I like Rick's idea of taking care of this at the page level. XSS attacks are a big issue for every developer out there. So it's a problem that should be solved by the framework, requiring as little additional intervention as possible. Using a custom page that is perhaps derived from "Page" would allow for a very flexible way to keep all custom code unchanged even if new types of attacks occur. All could be handled internally at Page level.
Mike - Thursday, December 20, 2007 - 1. Yeah, but I tend to ignore it though which is very bad (I'm lazy) 2. Encode it by default, but let me choose not to on a field by field basis.
Mischa Kroon - Thursday, December 20, 2007 - Not allowing html input is more efficient for most solutions.

Saving data encoded means more overhead; processing takes a bit more time.
Encoding on the way out makes sense for a few options.

Not saving html for search also makes some sense in terms of performance.
Think html editor fields.
Renaud Martinon of Syntopy Software - Thursday, December 20, 2007 - Interesting debate.

Frankly, I don't like the idea of inbound encoding either, for all the good practical reasons mentioned, and also for more abstract, conceptual reasons. For me, preventing attacks and encoding text are two different concerns and should not be merged into one single thing.

What I would expect from the ASP.NET framework by default is some kind of exception thrown when an attempt to inject active content is detected - with an override in case I want to implement my own detection code or skip this detection altogether.
Dave Neeley - Thursday, December 20, 2007 - As a beginning, self-taught developer, I just assumed that using the default forms authentication and what-not that's built into asp.net 2.0 would solve all of my security problems, and help me pass information security audit tests. I guess I'm about to find out if it will... I, for one, think that a config file full of turn-this-on, turn-this-off doesn't make it any easier to get a foot in the door education-wise. I've been reading "Designing the Obvious", and about to dig into "Don't Make Me Think" (both recommendations I found in comments on this blog, I believe). If I'm reading things right, the basic premise is that the developer should think more often and earlier in the process, so that end users can just get their work done. The problem is that devs want to "just get their work done" the same as end-users--which is why we use things like the .NET framework and SubSonic. While I'm not opposed to doing some reading and learning how to do something right, there are tons of devs out there who just want to get the job done and "Go level up your Rogue in Outland" (SubSonic starter site web.config). To make a long story short, I think it's a crap-shoot. The real problem is designing responsible developers.
Rob Conery - Thursday, December 20, 2007 -

>>>Saving data encoded means more overhead processing takes a bit more time. Encoding on the way out makes sense for a few options.<<<

@Mischa: It does take some processing power, but inbound is once, outbound is every time. In terms of perf you're better off with inbound.

>>>For me, preventing attacks and encoding text are two different concerns and should not be merged into one single thing. What I would expect from the ASP.NET framework by default is some kind of exception thrown when an attempt to inject active content is detected - with an override in case I want to implement my own detection code or skip this detection altogether.<<<

@Renaud: unfortunately encoding is the only way to avoid XSS since the exploit involves the browser's rendering of the text - you might want to rethink this (or maybe elaborate some more). ASP.NET 2.0 does validate input already so you don't need to worry about this. By default pageValidation is turned on, and it throws an error if any HTML is received. MVC is a different beast and we have the chance now to adjust our position on it.

@Dave: well put :)
alberto - Thursday, December 20, 2007 - I can't believe you weren't already filtering output somehow. For all this in-out discussion: There are two rules in security you probably have already heard which summarize all concerns: "All input is evil" and "Accept only data that is known to be safe". But what's input in the first place? If we are trying to protect ourselves from SQL Injection, we should do it before entering the data in the database. For that, we have Parameters in .NET, so we don't have to worry about it. When talking about XSS, the input is the data we are going to show. Therefore, you want to apply the second rule and ensure that the data we are showing is safe. We can do it by encoding it (we could simply reject data if it's not known to be safe, but that's not something we want here), and we want to do it when and where we are going to show it, because safety implies different encodings for different output formats. Oh, and you definitely want to do it automatically, because, you know, people forget, and bad things happen.
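alberto's split, parameters at the database boundary and encoding at the display boundary, might look like the following sketch. It uses Python's built-in sqlite3 and html modules in place of ADO.NET parameters and an HTML encoder; the table, column names and sample values are invented for the example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")

author = "Robert'); DROP TABLE comments;--"
body = "<script>alert('xss')</script>"

# Parameters keep the values out of the SQL text, so they are stored verbatim.
conn.execute("INSERT INTO comments (author, body) VALUES (?, ?)", (author, body))

stored_author, stored_body = conn.execute(
    "SELECT author, body FROM comments"
).fetchone()

# Encoding happens only here, at the point where HTML is actually produced.
row_html = "<li>{}: {}</li>".format(
    html.escape(stored_author), html.escape(stored_body)
)
print(row_html)
```

The database keeps the raw text, so another consumer (a feed, a desktop client) can apply whatever encoding its own output format requires.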
Renaud Martinon of Syntopy Software - Thursday, December 20, 2007 - Rob, to clarify, I am aware of the current implementation of pageValidation in classic ASP.NET. What I meant to say is that I expected the MVC framework to handle it in a similar way - or even in a better one, by providing more customization possibilities.
Richard Willis - Thursday, December 20, 2007 - I would rather encode by default on the way out. Encoding by default on the way in only helps if the application only displays data which it has saved. If any other application, especially a non-MVC one saves it, especially if more than one application can modify it, then it's not going to be encoded.

In these cases, which are going to be common in any enterprise application, as Philip said, people who have been trained by inbound-by-default encoding not to encode on the way out are not going to catch any attacks in the data.
Igor Loginov - Friday, December 21, 2007 - IMHO, a kind of "safe" filtering on the way out is more efficient than encoding. For Rob's example with search box it should be something like:

s = s.Replace("<", " < ").Replace(">", " > ").Replace("  ", " ").Trim();

Also, I normally allow only one type of quote (single / double / grave accent) inside the user input on the way in (to avoid problems with SQL).

If I believed in techno magic, I could imagine a new Page method
bool IsOutputSafe(string output)
with a bunch of events like "Mixed quotes", "Output contains HTML tags", "JavaScript entry discovered", "Cross domain src attribute found", etc. This is a kind of SAX parser which evaluates the output against existing Page context and warns that HTML flow can be broken or harmfully extended. Thus, the developer can make a decision what to do at each point.
Igor Loginov - Friday, December 21, 2007 - Oops! Sorry, my replace example was corrected. Its idea is to replace LT / GT with a combination "space LT / GT space" (which is harmless), and then remove double spaces and trim the string.
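Going by Igor's description alone, the filter could be sketched as below. This is a reading of the comment in Python, not his original code; the function name is made up, and the loop collapses every run of spaces where a single replace call would only halve them.

```python
def space_out_angle_brackets(text: str) -> str:
    # Surround every < and > with spaces so the browser no longer sees a tag opener.
    text = text.replace("<", " < ").replace(">", " > ")
    # Collapse the doubled spaces the previous step can introduce.
    while "  " in text:
        text = text.replace("  ", " ")
    return text.strip()


print(space_out_angle_brackets("<script>alert(1)</script>"))
# < script > alert(1) < /script >
```

Note that this only addresses tags injected into element content. A value written into an attribute can break out with a quote character and no angle brackets at all, which is one reason the thread keeps returning to encoding.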
Scott S. - Friday, December 21, 2007 - "I would rather encode by default on the way out"

I agree with Richard on this one. Encode by default on the way out. Give me a way to turn it off/override if needed. There are many scenarios where data comes into the database from other sources and/or the data is rendered in a format other than HTML. The only time you can be 99% sure the data is rendering to HTML is during the rendering of the view.

Also, let's say the ASP.NET team encodes the input by default. Now if I want to be consistent when I save data in the same database tables from my WinForms app, I've got to reference System.Web to use the HttpServerUtility class in my WinForms app. Things are starting to smell at that point.

Ok, you could turn off the default implementation, and explicitly encode every item you render during output, but it requires a lot more work for something that could be handled by the framework.

This leads me to another suggestion. How about allowing it to be configured either way (on input or output)? Then I could easily choose my preferred method and I couldn't care less about which way is the default.
Thomas Gravgaard - Saturday, December 22, 2007 - I'm with Scott S and others on the notion of default encoding on the way out, and making it overridable. The default should be safe, and this seems to me the most obvious way of doing it. If you want the HTML rendered because you know what you are doing, you will also be aware enough to know how to turn it off. Furthermore it will be easy for you to track down the places where you have overridden and thus where you need special attention to security - or to find the window that was broken through...
Erik Wynne Stepp - Thursday, December 27, 2007 - 1.) If everyone can't agree what the implicit behavior should be for either, then it should be explicit every time. This will force the developer to make a choice: should this be encoded or not?

Even if you have a default implicit behavior, you should make it easy to override.

2.) I prefer NOT encoding input by default.

As has already been discussed, developers will typically use the defaults and store encoded data which doesn't make sense outside of web applications.

My company's web application accepts data imported from other sources. I can't assume that any of it is safe. If one of these sources uses ASP.NET MVC with default input encoding, I am going to double-encode it.

3.) I prefer that we DO encode output by default.

Data can come from many sources; bad/evil data doesn't only come from the website.

Again, my company's web application accepts data imported from other sources. I can't assume that any of it is safe so I need to make sure that it is encoded before it is displayed.

I like the idea of adding a default encoding inline syntax similar to, but not identical to, the current inline syntax, but I'm not sure if it would be used with so many developers familiar with the current inline syntax. Will they change their current habits to be more secure? Maybe.

I like the idea of MVC controls better. As stated by Rick Strahl and others above, this allows many benefits above and beyond resolving the encoding issue.

Reading many of the code samples for MVC, I'm concerned that the inline code has lost the elegance that ASP.NET controls offered to better encapsulate behaviors. I also think that using ASPX controls seems wrong, too, since they have different page lifecycles than an MVC control should.

Maybe both approaches would be best of all: MVC controls and an alternate inline syntax?
Elmar - Thursday, December 27, 2007 - Maybe you should have a closer look at RoR 2.0 (if you haven't done so already ;-)). They sure have the same issues and came up with something already: http://weblog.rubyonrails.org/2007/12/7/rails-2-0-it-s-done

...

Action Pack: Security

Making it even easier to create secure applications out of the box is always a pleasure and with Rails 2.0 we're doing it from a number of fronts. Most importantly, we now ship with a built-in mechanism for dealing with CSRF attacks. By including a special token in all forms and Ajax requests, you can guard from having requests made from outside of your application. All this is turned on by default in new Rails 2.0 applications and you can very easily turn it on in your existing applications using ActionController::Base.protect_from_forgery (see ActionController::RequestForgeryProtection for more).

We've also made it easier to deal with XSS attacks while still allowing users to embed HTML in your pages. The old TextHelper#sanitize method has gone from a black list (very hard to keep secure) approach to a white list approach. If you're already using sanitize, you'll automatically be granted better protection. You can tweak the tags that are allowed by default with sanitize as well. See TextHelper#sanitize for details.

...
Liessiota - Wednesday, April 15, 2009 - emm.. nice :)