Time.to_s and the 19 million row scan
Lately we've been noticing slowness on Lingr's database. Some queries were taking a ridiculously long time to complete; even some COMMITs were taking half a minute or more. The slowdown seemed to be across the board, rather than isolated to a particular table or a particular query.
After much investigation, MySQL upgrades, etc., Kenn finally noticed something weird in the slow query logs.
We had many queries in there that referenced time ranges, like this:
SELECT * from events WHERE created_at >= '2008-01-21 00:00:00' ...
but occasionally, one would show up that looked just a little bit different, like this:
SELECT * from events WHERE created_at >= '2008-01-21 00:00:00 PST' ...
With the help of MySQL's EXPLAIN statement, Kenn quickly discovered that while the first type of query would use indexes, and scan only, say, 100,000 rows, the second type would scan all 20,000,000+ rows. Further examination of the output of EXPLAIN revealed that the second query issued 4 warnings. SHOW WARNINGS revealed the following:
Incorrect datetime value: '2008-01-21 00:00:00 PST' for column 'created_at' at row 1
YIKES! We misformatted a timestamp in a query, and as a result, MySQL was scanning the entire 20M+ row table every time.
I tracked this query down in our code to something like this:
UserMessage.find_all_by_room_id @room.id, :conditions => "created_at >= '#{@time.to_s}'..."
As it turns out, time.to_s renders the timezone by default. Rails has added a nice format specifier, so you can say time.to_s(:db), and we were doing this everywhere else, but had forgotten the (:db) in this one place.
MySQL was barfing on the timezone specifier, and silently falling back to scanning the entire table. To add insult to injury, this query occurred in the rendering of an RSS feed, so it was used not infrequently.
Lesson learned- be very careful in how you format queries involving datetime fields. Forget to use the :db format specifier, and you could suffer terrible consequences.
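Here is a minimal sketch of the safe formatting, written in plain Ruby so it runs without Rails. The names db_timestamp and created_since_condition are hypothetical helpers for illustration only; the strftime pattern produces the same 'YYYY-MM-DD HH:MM:SS' layout as the well-behaved query above. In a Rails app you would simply call time.to_s(:db), or skip the string interpolation and pass the time as a bind value, e.g. :conditions => ["created_at >= ?", @time].

```ruby
# Plain-Ruby sketch. db_timestamp and created_since_condition are
# hypothetical helpers, not part of Rails or of Lingr's code.

# Layout without any timezone suffix: "2008-01-21 00:00:00".
DB_FORMAT = "%Y-%m-%d %H:%M:%S"

# Format a Time the way the indexed query expects it.
def db_timestamp(time)
  time.strftime(DB_FORMAT)
end

# Build the SQL fragment with a timezone-free timestamp.
def created_since_condition(time)
  "created_at >= '#{db_timestamp(time)}'"
end

t = Time.utc(2008, 1, 21, 0, 0, 0)
puts created_since_condition(t)
```

Using a bind value has the extra benefit that the database adapter takes care of quoting and formatting, so there is no format specifier to forget.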
For the extra curious, I did manage to find a MySQL ticket that seemed somewhat related.
HTTP Performance and Versioned URLs
Tenni Theurer at Yahoo has published a blog post detailing performance characteristics of typical HTTP page loads. While the article itself doesn't present any groundbreaking information (I've got high hopes for part 2, however), it does validate one important point:
Our experience shows that reducing the number of HTTP requests has the biggest impact on reducing response time
That's what the Versioned URLs plugin is all about.
Many people have the mistaken impression that browser caching reduces HTTP requests- it doesn't, at least not usually. What it does do is reduce wire traffic. Versioned URLs are about not just shortening the HTTP transaction, but entirely eliminating it.
I'm definitely looking forward to Part 2 of Tenni's post to see what they recommend. Things like file aggregation (where multiple js or css files are aggregated into a single file) can help, but, ultimately, I can't see any incremental improvement beyond eliminating the entire HTTP request.
Going to Internet Identity Workshop
I will be attending the Internet Identity Workshop next week in Mountain View. If any readers are attending, please let me know and let's meet up!
I've been thinking a lot about OpenID lately- I've even prototyped a Rails site that uses OpenID for authentication (it was quite simple using JanRain's ruby-openid gem). I'm excited to attend this workshop and learn more about what I think is an important emerging technology!
Lingr Interview on Softies on Rails
Rails Core Team- time to cash in!
In a previous post at the Lingr blog, I offered drinks on me to the Rails Core Team, Thomas Fuchs, and Sam Stephenson, at any professional conference where we might meet up.
I'll be attending the Web 2.0 Conference next week, so, if you qualify for the drinks (or even if you don't- I might be feeling generous), let me know and let's meet up.
We've created a Lingr room for the conference, so drop in during the conference and let's have some backchannel talk about what's really going on.
Versioned URLs Plugin for Rails
Executive Summary
- Install versioned_urls plugin via
script/plugin install http://svn.lingr.com/plugins/versioned_urls
- Add appropriate rewriting and cache-header-pushing configuration to your web servers, e.g., for lightty:
url.rewrite-once = ( "^/(.*\.(css|js|gif|png|jpg))\.v[0-9.]+$" => "/$1" )
expire.url = ( "/stylesheets/" => "access 10 years" ,
"/javascripts/" => "access 10 years",
"/images/" => "access 10 years" )
- Enjoy compliments from your users about how responsive your site is
Gritty Details
One of the great "for free" features in Rails is asset timestamping. This feature, built into most of the methods in ActionView::Helpers::AssetTagHelper, automatically appends the timestamp of the referenced asset to generated URLs. So, when you write something like
<%= javascript_include_tag 'application' %>
it actually generates something like this
<script src="/javascripts/application.js?1161807361" type="text/javascript"></script>
where the 1161807361 parameter is the file modification time for /javascripts/application.js.
The theory is that, as long as application.js remains unchanged, the timestamp and URL remain the same, and the browser caches application.js. When application.js changes, the timestamp and URL change, and the browser refetches application.js. But, in fact, this is just the theory- not the reality.
In reality, the browser only halfway caches application.js. When the browser encounters the next
<script src="/javascripts/application.js?1161807361" type="text/javascript"></script>
it will actually send an HTTP GET request for /javascripts/application.js?1161807361, but it will include an If-Modified-Since header to notify the server that it should only return the requested data if it has changed since it was last requested. If application.js hasn't changed, the server sends a 304 Not Modified response, with no response body.
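The server's side of that exchange can be sketched roughly as follows. This is an illustration in plain Ruby, not how any particular web server implements it; conditional_get_status is a hypothetical name, and real servers consider other validators as well. The point is that the request always happens, even when the answer is 304.

```ruby
require 'time' # adds Time.httpdate and Time#httpdate

# Hypothetical sketch of the decision a server makes for a GET that may
# carry an If-Modified-Since header. Returns the HTTP status code only.
def conditional_get_status(if_modified_since, file_mtime)
  # No header: the browser has nothing cached, so send the full body.
  return 200 if if_modified_since.nil?

  since = Time.httpdate(if_modified_since)
  # HTTP dates have one-second resolution, so compare whole seconds.
  file_mtime.to_i <= since.to_i ? 304 : 200
end

mtime = Time.at(1161807361).utc
puts conditional_get_status(nil, mtime)            # first visit
puts conditional_get_status(mtime.httpdate, mtime) # repeat visit, file unchanged
```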
This is all well and good in that it saves the download time for application.js, but we still pay the TCP setup and teardown time, even when the asset hasn't changed. For many web applications, the number of javascripts, stylesheets, images, etc., included via asset tags is quite large. Thus, the cumulative penalty we pay in TCP setup/teardown to request unchanged assets can grow to a significant (read- noticeable) amount.
What we would really like to do is somehow tell the browser not to even bother asking if the asset is modified- that is, effectively, tell it that the asset will never change. But how can we be sure that an asset will never change? Simple- we just ensure that, when and if it does change, we modify its URL.
Now, we already know that the asset timestamping feature does just this (changing the URL whenever the referenced asset changes), but it happens to do it in a way that isn't completely compatible with some browsers' caching systems. The main issue is that some browsers will not cache resources referenced by URLs that contain parameters (e.g.- ?1161807361). What we need is a way to move the asset version token into the path part of the URL. In other words, we need to produce code like this:
<script src="/javascripts/application.js.v1161807361" type="text/javascript"></script>
Now things are getting really tricky. With Rails' default asset timestamping feature, there's no need to make configuration changes to your web server, because it understands that a request for /javascripts/application.js?1161807361 is actually asking for the asset /javascripts/application.js. It can find /javascripts/application.js just fine, so, no problem there.
But, with versioned URLs, the server will receive requests for things like /javascripts/application.js.v1161807361, which it has no idea how to satisfy.
How can we solve this? We can use URL rewriting.
URL rewriting is a feature available at least in Apache and Lighttpd, and probably in just about any widely-used web server. If you use lightty, you'll need to add something like this to your lighttpd.conf:
url.rewrite-once = ( "^/(.*\.(css|js|gif|png|jpg))\.v[0-9.]+$" => "/$1" )
This tells lightty to interpret a request for /javascripts/application.js.v1161807361 as a request for /javascripts/application.js, so, everyone is happy again.
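To make the round trip concrete, here is a small plain-Ruby sketch of both halves: appending the version to the path, and stripping it again with the same pattern used in the rewrite rule. The method names versioned_path and unversioned_path are hypothetical and are not taken from the plugin's source; only the regular expression comes from the lighttpd rule above.

```ruby
# Same pattern as the url.rewrite-once rule, anchored for Ruby.
VERSIONED_ASSET = %r{\A/(.*\.(css|js|gif|png|jpg))\.v[0-9.]+\z}

# Hypothetical helper: move the version token into the path.
def versioned_path(path, version)
  "#{path}.v#{version}"
end

# Hypothetical helper: what the web server's rewrite does to a request.
# Paths that don't match the pattern are returned unchanged.
def unversioned_path(request_path)
  request_path.sub(VERSIONED_ASSET, '/\1')
end

url = versioned_path("/javascripts/application.js", 1161807361)
puts url
puts unversioned_path(url)
```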
Now, the final step- telling the browser that the asset will never change. What we need to do is push back Expires and Cache-Control headers whenever we serve an asset that has a versioned URL. With lightty, you can do this by adding something like the following to your lighttpd.conf:
expire.url = ( "/stylesheets/" => "access 10 years" ,
"/javascripts/" => "access 10 years",
"/images/" => "access 10 years" )
Ten years is effectively "forever" in web terms, but you can use any ridiculously long time period you feel like.
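For a sense of what "access 10 years" amounts to, here is a rough plain-Ruby sketch of the two headers involved. The web server computes these for you; far_future_headers is a hypothetical name, and the ten-year figure ignores leap days, so treat the numbers as approximate.

```ruby
require 'time' # adds Time#httpdate

# Ten years in seconds, ignoring leap days (an approximation).
TEN_YEARS = 10 * 365 * 24 * 60 * 60

# Hypothetical helper: headers that tell the browser not to ask again
# for roughly ten years from the time of access.
def far_future_headers(now = Time.now)
  {
    "Expires"       => (now + TEN_YEARS).httpdate,
    "Cache-Control" => "max-age=#{TEN_YEARS}"
  }
end

far_future_headers.each { |name, value| puts "#{name}: #{value}" }
```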
Now, to the real point of this post :-). Today, we are releasing a versioned_urls plugin for Rails. Install the plugin via:
script/plugin install http://svn.lingr.com/plugins/versioned_urls
and, voila, you've got versioned asset URLs for free, using the file modification time as the file version.
Note that, by default, URL versioning is only active when RAILS_ENV != 'development', because, if you use WEBrick, you don't have URL rewriting. If you use a rewrite-capable web server in development, just set VersionedUrlsPlugin::enable appropriately.
The other configuration parameter is VersionedUrlsPlugin::version_for_asset. As I mentioned, by default, the version for an asset is its file modification time. If you want to do something more sophisticated, you can set VersionedUrlsPlugin::version_for_asset to a Method, Proc, or anything else that respond_to?(:call) (see versioned_urls.rb for details). For example, at Lingr, we use the subversion revision number of an asset as its URL version, via something like this:
YAML.load(`svn info #{file}`)['Last Changed Rev']
Of course, we use local caching to ensure that we only do svn info the very first time an asset is requested.
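The caching idea can be sketched like this. It is a guess at the shape, not the plugin's actual code: cached_version_lookup is a hypothetical helper, and it assumes the callable receives the asset's file path as its single argument (check versioned_urls.rb for the real signature).

```ruby
# Hypothetical memoizing wrapper: runs the (possibly slow) lookup once
# per file and answers from an in-memory Hash after that.
# Assumption: the callable is invoked with the asset's file path.
def cached_version_lookup(&lookup)
  cache = {}
  lambda do |file|
    cache.fetch(file) { cache[file] = lookup.call(file) }
  end
end

# Default-style version: the file modification time.
mtime_version = cached_version_lookup { |file| File.mtime(file).to_i }

# Subversion-style version, as described above (requires 'yaml' and svn):
#   svn_version = cached_version_lookup do |file|
#     YAML.load(`svn info #{file}`)['Last Changed Rev']
#   end
```

Since the result responds to call, something like it could be assigned to VersionedUrlsPlugin::version_for_asset, provided the argument assumption holds.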
Finally, I really should refer you to Cal Henderson's excellent article at Vitamin. Cal is one of the top technical guys at Flickr, so, when he talks about optimizing HTTP semantics, he's speaking from the top of a big pile of data :-). Flickr has been using versioned urls for quite some time now. We're just the newcomers at this party.
Update - 01 Nov 2006
I should mention one caveat to using versioned URLs with images. If you are serving any images from CSS files, it requires careful planning.
URLs that you list in CSS files are hand-coded- they aren't generated by the ActionView::Helpers::AssetTagHelper methods. Thus, they don't get URL versioning. You typically end up with things like this in your CSS file:
.myStyle
{
margin: 0;
border: 0;
background: transparent url(/images/backgrounds/gradient-blue.gif) no-repeat -2px -20px;
}
It's important that you not push back the Expires and Cache-Control headers for images served from CSS, since the URLs for these images don't carry the versioning information.
A great way to approach this is to segregate your CSS-served images from your HTML-served images, then modify your expire.url statements to only push Expires and Cache-Control headers for the HTML-served images, like so:
expire.url = ( "/stylesheets/" => "access 10 years" ,
"/javascripts/" => "access 10 years",
"/images/html/" => "access 10 years" )
Thus avoiding pushing the Expires and Cache-Control headers for images located in /images/css/.
Rails, shall I compare thee to a summer's day...
Let me get this right out front– I love Rails
My days were mired in an endless succession of javac compile cycles, huge bloated IDEs, xml config file madness, build hell, etc. Then I found Rails, and I’ve never looked back.
I love everything about Rails- the overriding philosophy of convention over configuration, the extraordinary thought and care that has gone into the API, and the seemingly endless joy of discovering new aspects of the API or even the Ruby language itself. Beyond all the technical arguments that might make Ruby and Rails the right choice for a given web application, the most important point of all is that Rails makes happy programmers. DHH has said it eloquently in his presentations, and in the Optimize for Happiness section of Getting Real-
A happy programmer is a productive programmer
This is absolutely, 100% true. Programmers are, by nature, self-motivated and driven by what inspires them. If you put them on a new framework with as much elegance and depth as Rails, their productivity will go through the roof.
I have yet to meet anyone who tried Rails and didn’t get the bug almost immediately. On the contrary- most of the sad stories you hear about Rails involve technologists who find Rails and fall in love with it, but then can’t convince management to take a risk on it. Well, I’ve got the perfect solution for that- become management.
So there you have it- I’m a complete Rails nut. So don’t expect any blog entries about the latest C++ or Java developments- I’m sticking to what I love.
