
Innovation in Information Security

Coverage of important Information Security and Information Technology news and events from the research team at Sûnnet Beskerming.


The Joy of Variable-Width Encoding

One of the problems that web and application developers face is how to handle variable-width encoding, where a character shown on screen can take more than one byte to store (standard ASCII uses exactly one byte per character). The most common trouble arises when sites built around ASCII encounter UTF-8, other Unicode encodings, or Asian character sets that contain characters requiring more than one byte. If a developer has not accounted for these character sets, the application or site may display the input incorrectly or fail to show the characters at all. Back-end databases that are not prepared to receive Unicode input may also cause problems when handling this information (such as a MySQL table using the latin1 character set receiving UTF-8 input).
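To make the byte counts concrete, here is a minimal Python sketch. The helper names `encoded_widths` and `ascii_safe` are hypothetical and exist only for this illustration; the sample characters were chosen arbitrarily.

```python
def encoded_widths(text, encoding="utf-8"):
    """Return (character, byte count) pairs for text in the given encoding."""
    return [(ch, len(ch.encode(encoding))) for ch in text]


def ascii_safe(text):
    """Report whether text would fit in a strict ASCII-only field."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


sample = "aû日"
# One, two and three bytes respectively when encoded as UTF-8.
print(encoded_widths(sample))
# Only the first string survives a strict ASCII encode.
print(ascii_safe("plain"), ascii_safe(sample))
```

An application that assumes one byte per character will miscount lengths, truncate in the middle of a character, or reject the input outright once characters like the last two appear.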

What is a multi-byte character? The û in Sûnnet Beskerming is one example: in UTF-8 it is stored as two bytes, so website code must be aware of the encoding in use in order to display it properly. It isn't present in the base ASCII set, but it is in extended ASCII sets (such as ISO-8859-1, where it occupies a single byte), and it is present in many other sets, such as UTF-8.
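As a rough illustration of why the encoding matters, the Python sketch below encodes the company name in both Latin-1 and UTF-8 and then decodes the bytes with the wrong codec. The variable names are illustrative; only standard library codecs are used.

```python
name = "Sûnnet Beskerming"

latin1_bytes = name.encode("latin-1")   # û becomes the single byte 0xFB
utf8_bytes = name.encode("utf-8")       # û becomes the two bytes 0xC3 0xBB

# The UTF-8 form is one byte longer because of the single û.
print(len(latin1_bytes), len(utf8_bytes))

# Reading UTF-8 bytes as Latin-1 turns the two bytes into two wrong characters.
misread_as_latin1 = utf8_bytes.decode("latin-1")
print(misread_as_latin1)

# Reading Latin-1 bytes as UTF-8 fails on 0xFB; errors="replace" substitutes
# U+FFFD, which some software then shows as a question mark.
misread_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(misread_as_utf8)
```

Both failure modes come from the same mismatch: the bytes are intact, but the reader and the writer disagree about how many bytes make up one character.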

With this known issue, one would assume that defensive software knows how to handle data presented in a multi-byte character format. Unfortunately, this isn't the case. It has been discovered that many HTTP content scanners cannot properly scan traffic that is encoded with half-width or full-width Unicode characters (which suggests that they are only set up to process a fairly basic ASCII character set). Such traffic passes through without the scanner detecting malicious content, even though the web application or server behind it is more likely to understand that content.
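The gap can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how any particular product works: `SIGNATURE`, `to_fullwidth`, `naive_scan` and `normalizing_scan` are hypothetical names, and folding with NFKC normalization is just one plausible way a scanner could map full-width forms back to ASCII before matching.

```python
import unicodedata

SIGNATURE = "<script"  # hypothetical signature, for illustration only


def to_fullwidth(text):
    """Map printable ASCII (0x21-0x7E) to full-width forms (U+FF01-U+FF5E)."""
    return "".join(
        chr(ord(ch) + 0xFEE0) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )


def naive_scan(payload):
    """Flag the payload only if the signature appears as literal ASCII text."""
    return SIGNATURE in payload.lower()


def normalizing_scan(payload):
    """Fold compatibility forms to their ASCII equivalents, then match."""
    folded = unicodedata.normalize("NFKC", payload)
    return SIGNATURE in folded.lower()


disguised = to_fullwidth("<script>")
print(naive_scan("<script>"), naive_scan(disguised))
print(normalizing_scan("<script>"), normalizing_scan(disguised))
```

The naive check misses the disguised payload while the normalizing check catches it. The danger described above arises when the scanner behaves like the first function and the server behind it behaves like the second.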

To make matters worse, web attackers have known about this method of attack for a very long time (it builds on how web applications handle odd input), and with the increasing use of HTTP content scanning many sites and users will find that they are a lot less protected than they think. The simplicity of launching an attack using one of these methods makes this oversight by security companies much worse than it initially appears.

This is a case of unintentional snake oil: if security vendors aren't aware of an attack vector (even a well-known one), then they can't be sure that they aren't selling snake oil.

17 May 2007

