Unacceptable Browser HTTP Accept Headers (Yes, You Safari and Internet Explorer)

Update: WebKit team responds to this post. Admits error, downplays importance.

When a web browser makes a request, it uses headers to tell the server what it is looking for. One of these headers is the Accept header. The Accept header tells the server what file formats, or more correctly MIME-types, the browser is looking for. Let's take a look at Firefox's Accept header:

GET /page/routing-in-recess-screencast HTTP/1.1
Host: RecessFramework.org
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Let's translate Firefox's request to English:

Dear RecessFramework.org,
I want the resource "/page/routing-in-recess-screencast" and I want it in an HTML or XHTML format. If you cannot serve me this way, I'll take "/page/routing-in-recess-screencast" in XML instead. If you can't even give it to me in XML, well, I'll take anything you've got!
Love,
Firefox

The Accept header gives the browser a chance to tell the server which format it wants for a resource. By giving a list of options, this content negotiation happens in a single request. One of the key design goals of the HTTP spec is to minimize back-and-forth communication. The browser could ask for each of these formats one at a time, but that would be wasteful.

How does the browser specify the preference "Give me HTML/XHTML before XML before */*"? Preference is indicated by the "relative quality parameter" (q) and its value (qvalue), seen in application/xml;q=0.9,*/*;q=0.8. Here's how the HTTP spec defines it:

Each media-range MAY be followed by one or more accept-params, beginning with the "q" parameter for indicating a relative quality factor. The first "q" parameter (if any) separates the media-range parameter(s) from the accept-params. Quality factors allow the user or user agent to indicate the relative degree of preference for that media-range, using the qvalue scale from 0 to 1 (section 3.9). The default value is q=1.

For as brilliant as the spec is, it is a terrible read. What's going on is simple:

  1. Every item's default preference value is 1.
    1: html, xhtml, xml, */*
  2. If an item specifies q=X, its preference value is X.
    0.9: xml
    0.8: */*
    1: html, xhtml
  3. Order by preference value in descending order.
    1: html, xhtml
    0.9: xml
    0.8: */*

The only other major detail is that when two items have the same preference value, the more specific one wins. For example, if both application/xml and */* had a preference of 0.9, application/xml would still come first. Firefox chooses to make it explicit that */* is less preferred by giving it a preference of 0.8. Firefox's Accept header is sensible and well thought out. Opera's is too. Other browsers: not so much.
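The three steps and the tie-breaking rule above can be sketched in a few lines of Python. This is only an illustration, not the Recess implementation: the names parse_accept and specificity are made up for this example, and malformed q values are not handled.

```python
def specificity(media_type):
    """Rank a media range: */* is least specific, type/subtype is most."""
    main, _, sub = media_type.partition("/")
    if main == "*":
        return 0
    if sub == "*":
        return 1
    return 2


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        media_type, _, params = part.strip().partition(";")
        q = 1.0  # step 1: every item defaults to a preference of 1
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # step 2: an explicit q=X overrides the default
        entries.append((media_type.strip(), q, position))
    # Step 3: highest q first; ties go to the more specific range,
    # then to whichever appeared first in the header.
    entries.sort(key=lambda e: (-e[1], -specificity(e[0]), e[2]))
    return [(media_type, q) for media_type, q, _ in entries]


FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_type, q in parse_accept(FIREFOX):
    print(media_type, q)
```

Run against Firefox's header, this lists text/html and application/xhtml+xml first, then application/xml, then */*.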

What in The Header Were You Thinking WebKit?

Don't relax yet, IE: you're up next, and you're even more egregious. So, what's wrong with WebKit, the lauded engine behind Safari and Google's Chrome? Let's take a look:

GET /page/restful-php-framework HTTP/1.1
Host: RecessFramework.org
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,
        text/plain;q=0.8,image/png,*/*;q=0.5

Note: Accept split across two lines for width. At a quick glance it doesn't look too different from Firefox's. Let's try it again in English just to be sure.

Dear RecessFramework.org,
I want the resource "/page/restful-php-framework" and I want it in XML, XHTML, or PNG format. If you cannot serve me this way, I'll take "/page/restful-php-framework" in HTML or plain text instead. If you can't do that for me I'll take whatever!
Thanks,
WebKit

Really WebKit? The browsing engine most responsible for killing XHTML prefers XHTML over HTML! It would also prefer PNG over HTML. That's a little embarrassing, but what is worse: Safari and Chrome accept XML over HTML (and, ambiguously, over XHTML, too). WebKit's Accept header forces web developers to work against the HTTP spec.
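To see why the letter reads that way, it helps to group WebKit's header into preference tiers. Here is a small Python sketch; the function name tiers is invented for this illustration and the parsing is deliberately minimal.

```python
from collections import defaultdict

WEBKIT_ACCEPT = (
    "application/xml,application/xhtml+xml,text/html;q=0.9,"
    "text/plain;q=0.8,image/png,*/*;q=0.5"
)


def tiers(header):
    """Group the media types in an Accept header by q value, highest tier first."""
    groups = defaultdict(list)
    for part in header.split(","):
        media_type, _, params = part.strip().partition(";")
        q = 1.0  # no q parameter means the default of 1
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        groups[q].append(media_type.strip())
    return sorted(groups.items(), reverse=True)


for q, media_types in tiers(WEBKIT_ACCEPT):
    print(q, media_types)
```

The top tier (q=1) holds application/xml, application/xhtml+xml, and image/png, while text/html sits one tier down at 0.9.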

Suppose you are Twitter and want to be a good RESTful internet citizen following the HTTP spec. You've got a resource called a tweet that can be represented as XML or JSON or HTML. You wouldn't want Safari users to get an XML copy of a tweet by browsing around, so you have to actively ignore WebKit's Accept header preferring XML above all else. (Aside: It turns out Twitter's REST API ignores many REST/HTTP best practices like the Accept header, anyway, but that's another story for another post.)

Update from Maciej Stachowiak of Apple's WebKit team:

Most WebKit-based browsers (and Safari in particular) would probably do a better job rendering HTML than XHTML or generic XML, if only because the code paths are much better tested. So the Accept header is somewhat in error. On the other hand, this isn't a hugely important bug, and we design our Accept header mainly to give the best compatibility on Web sites, since content negotiation is not really used much in the wild. Our current header was copied from an old version of Firefox.

Internet Explorer Accepts Polluting the Internet

We've covered the good and the bad. Now let's talk about Internet Explorer. The IE team has made great strides toward being a nicer player on the web. Unfortunately, its Accept header is downright ugly:

GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/jpeg, application/x-ms-application, image/gif,
        application/xaml+xml, image/pjpeg, application/x-ms-xbap,
        application/x-shockwave-flash, application/msword, */*

This is the Accept header for IE8 on a Windows 7 machine. One peculiarity is the "application/msword" MIME-type. Office isn't installed but the Word Document Viewer is. This made me wonder, what does IE's Accept header look like on a machine with Office installed? Brace yourselves:

GET /book/html/index.html HTTP/1.1
Host: RecessFramework.org
Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application,
        application/vnd.ms-xpsdocument, application/xaml+xml,
        application/x-ms-xbap, application/x-shockwave-flash,
        application/x-silverlight-2-b2, application/x-silverlight,
        application/vnd.ms-excel, application/vnd.ms-powerpoint,
        application/msword, */*

Ok, now let's translate to English:

Dear RecessFramework.org,
I want the resource "/book/html/index.html". Now, bear with me, I'm Internet Explorer and Office is installed so I can accept this resource in a lot of formats, in this order of preference: GIF, JPG, Progressive JPG, Click Once App, Microsoft XPS Document, XAML, XAML Browser App, Flash, Silverlight 2, Silverlight 1, Excel Document, Powerpoint Document, or a Word Document. If you can't give me "/book/html/index.html" in any of those formats then give me anything you've got!
Thanks,
Internet Explorer

There are two things wrong with this picture. The lesser evil: IE has a hook for other applications to insert new MIME-types into its Accept header. This means if a resource could be represented on the server as a Word Document or as an HTML document, Word as an application can inject behavior into IE so that it always has higher precedence than HTML. All an application has to do is modify the registry (HKLM/Software/Microsoft/Windows/CurrentVersion/Internet Settings/Accepted Documents). (Hear that, Cisco? You could increase internet consumption if you stuck a couple of 255-character WebEx MIME-types in IE's Accept header.)

The greater evil is that IE sends this ~200-300 byte Accept header for every single browsing request. 250 bytes isn't much, but at internet scale, on every request from the most popular browser, it adds up. Internet Explorer's Accept header emissions pollute the information superhighway. Let's do some back-of-napkin calculation. Google gets 294 million searches a day now. If IE has roughly 55% market share, that's 162 million IE requests on Google a day, for 38GB worth of garbage internet traffic. On Google searches alone, IE pollutes the internet with over a terabyte of traffic every month in its Accept header. Anyone want to estimate what this number looks like across the rest of the internet?
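For anyone who wants to check the napkin, here is the arithmetic as a short Python sketch. It assumes 250 bytes per header (the midpoint of the 200-300 byte range), a 30-day month, and binary units (1GB = 2^30 bytes), which is what makes the daily figure come out at roughly 38GB.

```python
SEARCHES_PER_DAY = 294_000_000  # Google searches per day, as quoted above
IE_SHARE = 0.55                 # rough IE market share, as quoted above
HEADER_BYTES = 250              # assumed midpoint of the ~200-300 byte header

ie_requests_per_day = SEARCHES_PER_DAY * IE_SHARE
bytes_per_day = ie_requests_per_day * HEADER_BYTES
gb_per_day = bytes_per_day / 2**30    # binary gigabytes
tb_per_month = gb_per_day * 30 / 1024  # assuming a 30-day month

print(f"IE requests per day: {ie_requests_per_day / 1e6:.0f} million")
print(f"Accept header traffic per day: {gb_per_day:.1f} GB")
print(f"Accept header traffic per month: {tb_per_month:.2f} TB")
```

Swap in your own traffic numbers and header size to estimate the cost for a different site.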

Update 1: IE team Program Manager Eric Lawrence: "I strongly recommend that developers not list MIME types here." Yet Silverlight and Office do. Whoops.

Update 2: IE doesn't send the extended header on *every* request, it sends */* for refreshes and some subsequent visits. [IEBlog]

It is not just wasted bandwidth that is the problem, it is wasted server processing, too. If a server or framework wants to follow through on the HTTP protocol the server must be sure it can't respond with any of the requested formats before it can respond with HTML. Bottom line: IE's Accept header is extremely ugly.
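To make the server-side work concrete, here is a rough Python sketch of what honoring the header involves. The function names negotiate and matches are invented for this example, and this is a simplification rather than how Recess or any particular server does it. Each representation the server can produce has to be checked against every range the browser sent.

```python
def matches(media_range, media_type):
    """True if a range such as text/* or */* covers the concrete media type."""
    r_main, _, r_sub = media_range.partition("/")
    t_main, _, t_sub = media_type.partition("/")
    return r_main in ("*", t_main) and r_sub in ("*", t_sub)


def negotiate(accept_header, available):
    """Pick the type from `available` that the client prefers most, or None."""
    ranges = []
    for part in accept_header.split(","):
        media_range, _, params = part.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((media_range.strip(), q))

    best_type, best_q = None, 0.0
    for media_type in available:
        matching = [(r.count("*"), q) for r, q in ranges if matches(r, media_type)]
        if not matching:
            continue
        _, q = min(matching)  # fewest wildcards = most specific matching range
        if q > best_q:        # ties keep the server's own ordering
            best_type, best_q = media_type, q
    return best_type


IE8_ACCEPT = (
    "image/jpeg, application/x-ms-application, image/gif, "
    "application/xaml+xml, image/pjpeg, application/x-ms-xbap, "
    "application/x-shockwave-flash, application/msword, */*"
)
print(negotiate(IE8_ACCEPT, ["text/html"]))
```

With IE8's header and a server that only has HTML on offer, the only range that matches text/html is the trailing */*, so every other entry is checked and discarded first.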

If WebKit is Foolish and IE is Prodigal how valuable is the Accept header?

This was the question I asked myself about half-way through writing the Accept parsing and content-negotiation code going into the next release of Recess, a RESTful PHP framework.

Content-negotiation with the Accept header is an interesting idea in principle that is hard to use properly in practice because browsers misuse it. As stated, Twitter's REST API doesn't use the Accept header for content-negotiation; it uses the URL extensions '.json' and '.xml'. Rails disables the Accept header by default. Frameworks can enhance performance by ignoring the Accept header and relying on '.xml'-like extensions. As such, the next release of the Recess Framework, too, will disable Accept header based content-negotiation by default.
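Extension-based negotiation is cheap because it is a single lookup on the request path. A minimal Python sketch follows; the extension table, the function name format_from_path, and the example paths are all hypothetical, not Twitter's, Rails', or Recess's actual code.

```python
import posixpath

# Hypothetical mapping from URL extension to the MIME-type to respond with.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
}


def format_from_path(path, default="text/html"):
    """Split a request path into (resource path, MIME-type) using its extension."""
    base, ext = posixpath.splitext(path)
    media_type = EXTENSION_TYPES.get(ext.lower())
    if media_type is None:
        # Unknown or missing extension: leave the path alone, use the default.
        return path, default
    return base, media_type


print(format_from_path("/tweets/42.json"))
print(format_from_path("/page/routing-in-recess-screencast"))
```

The trade-off is that the format becomes part of the URL, so one resource ends up with several addresses.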

So, when would you want to parse Accept headers for content negotiation? When your consumers are respectful of HTTP and REST (RESpecTful!). This could mean RIAs written in JavaScript, Flash, or Silverlight. It could also mean other servers consuming your RESTful API.

Bottom line: If you're building APIs for other developers to consume, consider using Accept-based content-negotiation. If you're building consumer-facing web apps, ignore the Accept header until WebKit and IE get their acts together.

Comments

IChrisI

I wonder how quickly this would change if all servers, everywhere, started obeying the request header. IE wants image/gif or image/jpeg first, so pack it up and send it off to them.

Oliver

How does webkit kill xhtml? IE is the *only* browser that does not support XHTML -- it's IE that forces people to send xhtml as text/html and thus forces webkit, gecko, and opera to parse it as invalid html.

Ignoring that xhtml is still alive of course: xhtml2 was ignored by everyone because xhtml is just meant to be the xml serialisation of html, and html5 defines that in addition to what all the elements and APIs are.

Kris Jordan

@Chris I have long pondered the use case of preferring an image over HTML. Would be interesting to see someone actually follow through on this!

@Oliver Sorry for the confusion, WebKit is "killing" XHTML with HTML5 where "killing" is a good thing / moving the web forward.

Someone Ugly

Get your facts straight before you start rambling on.

In the accept field, the order in which they appear DOES NOT denote the preference. The q= field is the one that denotes preference.

There is a reason behind the Webkit and IE bloat. They're used not just for browsing but also by a lot of internal components of the OS.

Tom

Hmmm, image/* preference in hard-to-code-for browsers? I'm just going to start rendering all my pages server-side with Gecko and sending them as images :)

Elijah Grey

A nice solution that would get people off of IE would be to respect the accept header and return a GIF screenshot of the website.

Jeff

The Accept header *could* be great, but I will always prefer the url extension denoting format over Accept. It's a simpler and more accessible solution.

@Kris: Haven't used Recess yet, but it looks solid. Keep up the good work!

Scott

Not knowing the part of the spec that describes the Accept header, I got a little confused with deciphering it on my own. Maybe if you describe the format *before* you give an example, it might be clearer, like, "The Accept header is a comma-separated list of mime types, *each* optionally followed by a quality indicator." This establishes that the comma separates the important parts, not the semi-colon.

My point being that I thought the q= part would apply to all preceding mime types until the previous q= indicator. Please forgive me.

- Scott

Scott

Yes, it must be Friday. I dont read so good. Cheers.

Jack Watson

Easy solution, use Firefox!

RT
www.anonymize.tk

Jim Garrison

Great article. One thing you didn't mention is that Firefox actually changes its Accept header based on the context. For instance, if it is doing a request because the url is referenced in an <img> tag, it will send an Accept header that clues the server that images are preferred over all else:

Accept: image/png,image/*;q=0.8,*/*;q=0.5

Unfortunately, the Firefox developers got this behavior wrong when they implemented the new <audio> and <video> elements in Firefox 3.5 (see https://bugzilla.mozilla.org/show_bug.cgi?id=489071). The Accept header for an audio link claims to prefer text/html over any known audio format. :-/

Matt Dawson

I might just be dense, but in that Webkit header, the html accept has a higher priority than png, right? I see html as having a .9 quality and the png as having a .5...

Or am I missing something?

edogawaconan

application/xml was originally mentioned as one of the recommended media types in the previous xhtml media types document (http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020801/#application-xml). It's removed in the latest version though.

Kris Jordan NMC team member

@Scott - No worries, it is not the most straightforward to explain.

@Jim - I did not know about the mishap with audio/video. Interesting. Also to note are the Accept headers for various XHR requests: http://www.grauw.nl/blog/entry/470

@Matt - The preferential q value only applies to the type it is specified on. So WebKit's:

"application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, */*;q=0.5"

Is just the same as:

"*/*;q=0.5, application/xml, application/xhtml+xml, image/png, text/plain;q=0.8, text/html;q=0.9"

Victor

Very interesting how text/html is not even mentioned in the IE accept header. Considering that */* actually does mean everything, and application/xhtml+xml presumably is more specific than text/html, IE is implicitly saying it supports XHTML better than HTML.

Mike Seth

Congratulations, you failed to understand the problem.

First, the Accept: header does not specify which kinds of content the browser is "looking for" but which ones it is willing to accept. For example, if you are requesting text/plain as a file from a webserver and the file is not there, you will receive a 404 response and a 404 error page content which is unlikely to be text/plain.

Second, ordering of the Accept: content type arguments is not defined, and unless otherwise specified, default weight of 0.1 is assumed. It says so in the very specification bit you pasted. Did you read it?

Rob Reisser

@Kris Jordan: Interesting read!

@Mike Seth: Congratulations for your bad trolling! In your first part you're ranting about semantics. Maybe you don't like the way the browsers were paraphrased but it seems to make sense to everyone but you. And yes, the example you give is covered in the article. It's the part where the browser says "I'll take anything you've got" ;D

As for the spec. I'll give you the direct link to the related section: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.1

And let me quote for you: "The default value is q=1." In this case the spec writers have used the decimal point as a full stop which denotes the end of a sentence. I see how '1.' might be mixed up with '.1' so I'll quote the relevant part again without unnecessary syntax - except for quotation marks - "q=1"

There you go.

Woody Gilk

Wow, this article (and the comments) reek of FUD and blatant trolling.

If you added the default value of "q=1.0" for WebKit, you would end up with this:

application/xml;q=1.0,
application/xhtml+xml;q=1.0,
text/html;q=0.9,
text/plain;q=0.8,
image/png;q=1.0,
*/*;q=0.5

Which literally translates into "I would prefer any of the following content types application/xml, application/xhtml+xml, and image/png. I would also accept the following content types in this order: text/html, text/plain. If none of those types are available, I would accept another type."

The order of the accept types has _absolutely nothing_ to do with the preference, only the "q" value denotes preference.

I fail to see how the WebKit Accept header is in any way unacceptable, it is just not content-sensitive. The IE accept header is terrible just because it does not use any "q" values at all.

Koen Deforche

Sorry to disagree.

Content-negotiation using Accept headers is a life-saver for sites that use browser-side graphics rendering techniques. Inline SVG is only possible using the application/xhtml+xml mime-types. Inline SVG is for example what Google maps will use, or other sane frameworks such as http://www.webtoolkit.eu/wt, if they detect support for it, because it works well and is more feature rich than the alternatives (HTML 5 canvas; rotated text anyone ?); and has indeed a different purpose all together.

Most browsers (including IE) correctly advertise their support for application/xhtml+xml, and this is therefore simply a valuable feature.

Toby Inkster

Content Negotiation is not dead despite some browsers sending slightly iffy Accept headers.

One factor that you fail to take account for is server-side preferences. The IE Accept header may advertise that it accepts text/html and application/xhtml+xml equally, but the response type may be calculated based on not just the q values given by the browser, but also q values assigned by the page author.

Example: FooBrowser advertises that it accepts text/html;q=1.0 and application/xhtml+xml;q=0.8. Example.com has two resources on offer, an XHTML page which has been carefully authored, and judged to have quality 1.0; and an HTML page auto-generated from the XHTML which has had various cool things like inline MathML stripped out, so is judged to have quality 0.5. The calculated qualities are thus HTML=1.0*0.5=0.5 and XHTML=0.8*1.0=0.8, so the server sends the XHTML.

Apache supports server-side preferences like this, as does my own PHP ConNeg class used on my website. I content negotiate between text/html, application/xhtml+xml, text/plain and various other formats too, without any negative consequences in IE or Safari (IE gets HTML, Safari gets XHTML) and without any browser sniffing or special handling for them either. It's simply a matter of getting your server-side preferences correct.

Omega

Good post! Thanks for the notes and definitely a good hard look at things.

There's definitely more overhead when processing HTTP Accept headers.

Scary stuff.

Laj

It should be noted that Opera doesn't change its accept header for inline image requests: it still prefers an XHTML page and less so an image. That screwed up my webcomic idea of having the same location for the page and the image on the page. I planned on just letting it return an image/png in the case of an image request and an application/xhtml+xml in the case of a page request. It worked on Firefox, Chrome et cetera, but not Opera because of that. I thought it was pretty nifty to just be able to hotlink the url in the top on boards and let the server figure it out. I sent them a bug report though.

Also, since when does a layout engine determine the accept header?

Caleb L

So there are certainly good arguments around functionality, but I would add another argument for changing the defaults in browsers: so that a server can tell when it is about to render content in a format that will be meaningless.

When you start to get a large enough base of content in any large content management system that is generated by not-so-competent people, you will inevitably get users who embed incorrect links into image tags. For instance an image request that is on, say, search.google.com that has a target of "null". If you are Google, and you interpret an HTTP request to http://search/google.com/null as a query for search term "null", then you are going to spend needless cycles running a search for "null" which, if you knew that the request was in an image tag, you could just ignore/return a default 1x1 pixel. That is a hypothetical use case, and I doubt Google works that way, but there are probably plenty of search engines that do, and it would be nice if they could reduce resources consumed by what are really just bugs.

The Accept header seems like it would be a great way to say "These are the only content types that will render meaningfully in the browser". Only, as this article notes, most browsers don't use it for this. Obviously for the main HTML, it might not apply so much, but when making requests for CSS, JS, images, audio, video, svg, etc... why not use Accept to tell the server exactly the formats that will actually be used? Will IE/WebKit ever actually RENDER HTML if it is returned as the response for an image request? If not, then why bother including it as an accepted type?

Ah well... browsers don't work the way I want. News flash. :)

Michael Tsang

I don't think WebKit is getting it wrong. It happily accepts XML. My site is doing content negotiation but not in the standard way. It first checks if there is application/xhtml+xml: if there is, it sends XHTML; if there is text/html, it sends HTML; if there is */*, it sends HTML; finally, it throws an HTTP 415. The purpose of WebKit's HTTP Accept header is to have the server send XHTML instead of HTML.

Mathieu
I'm glad to tell you that webkit and IE now both have a correct Accept header. We can turn Accept handling back on.
