IE6 (Internet Explorer 6) eats form data
While working on some internal Web forms for Cadence, I came across a bona fide browser bug that I had never seen before. People were filling out a form, but sometimes the submission wouldn't work -- data was lost, and we couldn't figure out why. Eventually I ruled out bugs in my PHP code and, as a last resort, put a sniffer on the wire to watch the traffic. What I found amazed me: Internet Explorer 6 will fail to encode form data properly (wiping out the first field) under very particular circumstances. What's interesting is that, searching across the Internet, I found lots of people hitting the bug but not many solutions. That's odd, right? I mean, IE 6 is a really old browser. All of its problems should have workarounds by now, just as its box model issues do. But this one doesn't. So I'll go over what I found, and how I fixed it.
The problem
Create a simple form in HTML -- don't specify a DOCTYPE (or limit it to HTML 4), don't specify a character set, etc. Just the minimum needed to create the form in semantically correct markup. However, you absolutely must add enctype="multipart/form-data" as an attribute in the form tag. That's usually used for forms that take uploads, such as the form for uploading photos on Facebook. We don't need any upload fields, though. We just need two text fields. Throw some code in there somewhere to log the data that comes in. Next, fire up Internet Explorer 6 and view your creation. In the first field, enter the word "Hello". In the second field, enter the word "they’re". What's important about the second word is that it has a curly apostrophe. Copy & paste my text if you need to. Then, submit the data.
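The steps above could be sketched as a page like the following. This is only a minimal reconstruction, not the original Cadence form: the field names "first" and "second" and the action target "log.php" are placeholders of mine, and the logging script itself is left to you.

```html
<!-- Minimal reproduction sketch: deliberately no DOCTYPE and no charset declaration. -->
<!-- "log.php", "first" and "second" are placeholder names, not from the original form. -->
<html>
<head>
<title>IE6 multipart test</title>
</head>
<body>
<!-- enctype="multipart/form-data" is the essential ingredient; it requires method="post". -->
<form action="log.php" method="post" enctype="multipart/form-data">
  <p><label for="first">First field</label> <input type="text" name="first" id="first"></p>
  <p><label for="second">Second field</label> <input type="text" name="second" id="second"></p>
  <p><input type="submit" value="Submit"></p>
</form>
</body>
</html>
```

Whatever sits behind the action URL only needs to record the fields it receives, so you can compare them with what you typed.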
Poof! The word "Hello" never made it! No matter what you put in that first field, it's lost.
The discovery
So the data isn't coming in. Where is it? Well, I fired up SmartSniff. It's a packet sniffer, which can watch the raw data that Internet Explorer is sending to my server. I wanted to see how the data was arriving -- was it funny somehow? In an unexpected format? It turns out, it's just broken. Each field in a "multipart/form-data" form is sent wrapped in a boundary, which is typically a bunch of dashes and a random string of letters/numbers. This is done so that each field is cleanly separated and the server can pick out each field and handle it correctly. But as you can see from my screenshot below, the first field doesn't have the boundary that the other fields do:

The solution
The bug appears to rear its ugly head when UTF-8 data (or actually any data beyond ASCII) is sent as part of a multipart form (the kind of form normally used for uploads). This is what makes the bug so insidious -- it will only appear on a form that uses enctype="multipart/form-data", and only if the page isn't specified to be UTF-8, and only when someone pastes in some non-ASCII text. So one person might see the problem while another person with the same browser on the same system might not.
The fix? Add this attribute to your form tag:
accept-charset="UTF-8"
Now when non-ASCII data comes in, IE 6 will expect the multibyte characters and handle them properly. Yay!
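In context, the form tag might end up looking something like this. It's a sketch only: "log.php" and the field names are placeholders I made up for illustration, and the rest of your form stays as it is.

```html
<!-- Sketch of the fix: accept-charset="UTF-8" added alongside the multipart enctype. -->
<!-- "log.php", "first" and "second" are placeholder names. -->
<form action="log.php" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
  <p><input type="text" name="first"></p>
  <p><input type="text" name="second"></p>
  <p><input type="submit" value="Submit"></p>
</form>
```

Keep in mind that the submitted text now arrives encoded as UTF-8, so whatever handles the form on the server should treat it that way.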
