Wednesday, July 8, 2009

Top 10 sites running on Google App Engine (July 2009)

Two months ago I compiled a list of the ten most popular sites running on Google App Engine (see App Engine top 10, May 2009).

I figured it was time to create a new list with updated traffic rankings for July 2009. I've used the same methodology as for the May 2009 list, that is, using Alexa rankings as an (albeit imperfect) proxy for site popularity.

Here is the list of the most popular sites running on Google App Engine as of July 8, 2009:
The following sites were on the previous top 10 list (compiled May 5, 2009) but did not make it onto the updated list:
  • robtex.com held the number one spot on the last list but seems to have moved to Amazon EC2 (!)
  • chromeexperiments.com and desktop-reporting.com are still running on Google App Engine, but their Alexa rankings were not high enough for a top 10 placement on the new list
Do you believe that I've missed a Google App Engine hosted site that belongs on the top ten list? Please leave a comment! Don't forget to specify the Alexa ranking of the site that you believe should be added to the list. Please also let me know of any false positives in the list above. In short, please let me know if the list can be improved in any way, shape or form.

Tuesday, May 12, 2009

Google App Engine runtime switching and the mysterious multiple of 9001

Following my previous posts (see #1, #2, #3) on JVM switching on the Google App Engine platform, I've done some additional testing to find out how requests are spread between GAE/J runtimes (JVMs).

These are my results from the latest round of testing:

Over a period of 443 hours, 1,132,310 requests were successfully served by a total of 13 runtimes (as identified by Runtime.getRuntime().hashCode()). Approximately one request per second was sent for the duration of the test.

The requests were divided between the 13 runtimes as follows:
  • Runtime #: First request served - last request served (active during X hours), Y requests served
  • A: 2009-04-24 12:35:11 - 2009-05-12 20:30:31 (440 h), 205042 requests
  • B: 2009-04-24 12:54:58 - 2009-05-12 07:01:00 (426 h), 36286 requests
  • C: 2009-04-24 17:41:18 - 2009-05-12 23:29:23 (438 h), 126793 requests
  • D: 2009-04-24 21:16:22 - 2009-05-12 21:00:07 (432 h), 93573 requests
  • E: 2009-04-25 02:25:25 - 2009-04-25 05:53:43 (3 h), 9001 requests
  • F: 2009-04-27 17:19:38 - 2009-05-06 23:04:39 (222 h), 180020 requests
  • G: 2009-04-27 23:44:52 - 2009-05-11 19:37:07 (332 h), 204317 requests
  • H: 2009-04-28 00:24:36 - 2009-04-28 03:53:25 (3 h), 9001 requests
  • I: 2009-04-28 14:36:22 - 2009-05-11 13:40:38 (311 h), 99011 requests
  • J: 2009-04-30 05:19:11 - 2009-05-11 02:49:43 (262 h), 135180 requests
  • K: 2009-04-30 19:30:01 - 2009-05-12 23:12:16 (292 h), 25084 requests
  • L: 2009-05-01 16:25:13 - 2009-05-01 19:54:14 (3 h), 9001 requests
  • M: 2009-05-11 08:11:53 - 2009-05-11 08:11:53 (0 h), 1 request
  • Total: 13 runtimes, 1,132,310 requests
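A tally like the one above can be built from a simple request log. The sketch below is an illustration under stated assumptions and is not the code behind the numbers above. It assumes that the test client logged a timestamp and a runtime identifier for every successfully served request, with the identifier echoed back by the application. One possible identifier is the hex form of Runtime.getRuntime().hashCode(), which the hypothetical helper currentRuntimeId() returns. All class and method names are made up for this example.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (hypothetical names). Assumption: every successful request
// was logged with its timestamp and the runtime id reported by the server.
class RuntimeTally {

    /** One logged request: when it was served and by which runtime. */
    static final class Hit {
        final LocalDateTime time;
        final String runtimeId;

        Hit(LocalDateTime time, String runtimeId) {
            this.time = time;
            this.runtimeId = runtimeId;
        }
    }

    /** First/last request and request count for one runtime. */
    static final class Stats {
        LocalDateTime first;
        LocalDateTime last;
        long requests;

        /** Time between first and last request, rounded to whole hours. */
        long activeHours() {
            return Math.round(Duration.between(first, last).getSeconds() / 3600.0);
        }
    }

    /** What a servlet could echo back: the identifier used in this post. */
    static String currentRuntimeId() {
        return Integer.toHexString(Runtime.getRuntime().hashCode());
    }

    /** Groups hits by runtime id, in order of each runtime's first appearance in the log. */
    static Map<String, Stats> tally(List<Hit> hits) {
        Map<String, Stats> byRuntime = new LinkedHashMap<>();
        for (Hit h : hits) {
            Stats s = byRuntime.computeIfAbsent(h.runtimeId, id -> new Stats());
            if (s.first == null || h.time.isBefore(s.first)) s.first = h.time;
            if (s.last == null || h.time.isAfter(s.last)) s.last = h.time;
            s.requests++;
        }
        return byRuntime;
    }
}
```

Active time is rounded to the nearest whole hour. With that rounding, runtime A's first and last timestamps (2009-04-24 12:35:11 and 2009-05-12 20:30:31) give the 440 hours listed above.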
As we can see, there seems to be no upper limit on how long a runtime is kept alive: runtime A served requests for 440 of the 443 test hours. It seems that runtimes are kept alive as long as new requests keep coming in continuously.

At the time of writing this summary, only two of the thirteen runtimes had served requests during the last two hours: runtimes C and K. Let's classify these runtimes as active.

This leaves us with eleven inactive runtimes. Looking at the request counts of these inactive runtimes, we find that five of the eleven have request counts that are multiples of 9001 (number of requests served % 9001 == 0). These are runtime E with 9001 requests, F with 180020, H with 9001, I with 99011 and L with 9001.
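The modulo check is easy to reproduce. This minimal sketch hard-codes the request counts of the eleven inactive runtimes from the list above, leaving out C and K because they are active. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reproduces the "% 9001 == 0" check on the eleven inactive runtimes (C and K excluded).
class NineThousandOneCheck {
    static final int BATCH = 9001;

    /** Request counts of the inactive runtimes, copied from the list above. */
    static Map<String, Integer> inactiveCounts() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("A", 205042); m.put("B", 36286);  m.put("D", 93573);
        m.put("E", 9001);   m.put("F", 180020); m.put("G", 204317);
        m.put("H", 9001);   m.put("I", 99011);  m.put("J", 135180);
        m.put("L", 9001);   m.put("M", 1);
        return m;
    }

    /** Names of the runtimes whose request count is an exact multiple of BATCH. */
    static List<String> multiplesOfBatch(Map<String, Integer> counts) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() % BATCH == 0) names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = inactiveCounts();
        for (String name : multiplesOfBatch(counts)) {
            System.out.println(name + ": " + counts.get(name) + " = " + counts.get(name) / BATCH + " x " + BATCH);
        }
    }
}
```

Running main lists runtimes E, F, H, I and L, whose counts are 1, 20, 1, 11 and 1 times 9001.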

It seems as if each runtime is handed requests in increments of 9001. My hypothesis is that one test request is passed to a newly introduced runtime, and that if this test request is successfully served within a predefined timeout period, the runtime is allowed to serve 9000 more requests (1 test request + 9000 requests = 9001). The testing procedure is then repeated after those 9000 requests have been served. These are only speculations, so please comment if you can confirm or refute them.
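To make the hypothesis concrete, here is a toy model of it. It is purely speculative and not a documented App Engine mechanism. A runtime receives one probe request, and each successfully served probe earns it a batch of 9000 further requests. The model adds one more assumption: a runtime is retired only when a probe fails, and the failed probe is not counted as served. Under these assumptions every retired runtime ends up with a multiple of 9001 requests. That matches E, F, H, I and L, but the model says nothing about the other six inactive runtimes (A, B, D, G, J and M). The class name is hypothetical.

```java
// Toy model of the speculated dispatch scheme; NOT a documented App Engine mechanism.
// Assumptions: one probe request per cycle, 9000 requests granted per successful
// probe, and a runtime is retired only when a probe fails (the failed probe is not counted).
class ProbeThenBatchModel {
    static final int BATCH_AFTER_PROBE = 9000;

    /**
     * probeOutcomes[i] is whether the i-th probe would be served within the
     * (unknown) timeout. Cycles stop at the first failed probe or at the end
     * of the array. Returns the number of successfully served requests.
     */
    static long servedUntilRetired(boolean[] probeOutcomes) {
        long served = 0;
        for (boolean probeOk : probeOutcomes) {
            if (!probeOk) {
                break; // failed probe: runtime retired
            }
            served += 1 + BATCH_AFTER_PROBE; // the probe plus its granted batch
        }
        return served;
    }
}
```

For example, twenty successful probes followed by a failed one give 20 x 9001 = 180020 requests, which is runtime F's count.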

Tuesday, May 5, 2009

Top 10 most popular sites running on Google App Engine

Checking whether a specific website is running on the Google App Engine is pretty easy (see "How to identify if a website is hosted using Google App Engine").

So what are some popular sites running on the Google App Engine?

I've compiled a list of what I believe are the ten most popular sites currently running on the Google App Engine. In this context the site's Alexa ranking is taken as a proxy for the site's popularity. Sure, Alexa is not a perfect indicator of popularity, but it is the best publicly available traffic data source there is.

Here is the list of the most popular sites running on Google App Engine as of May 6, 2009:
Do you believe that I've missed a Google App Engine hosted site that belongs on the top ten list? Please leave a comment! Don't forget to specify the Alexa ranking of the site that you believe should be added to the list. Please also let me know of any false positives in the list above. In short, please let me know if the list can be improved in any way, shape or form.

Jaiku.com should probably be given a special mention. While Jaiku.com appears to be running on the App Engine platform, I'd guess that it is running in a server environment that is separate from the standard public GAE environment. The CNAME setup of www.jaiku.com would support that hypothesis: it points to jaiku.l.google.com instead of one of the standard GAE CNAMEs (ghs.google.com, ghs.l.google.com or appspot.l.google.com). Does anyone know if that is the case? Please leave a comment.

I'll update this top list from time to time as I'm certain that a significant number of high-traffic sites will be migrated to GAE over the coming months thanks to the recently added support for Java (or more specifically Java bytecode).

Monday, May 4, 2009

How to identify if a website is hosted using Google App Engine

If a website is hosted using Google App Engine the following three conditions should hold true:
  • The website's CNAME points to ghs.google.com, ghs.l.google.com or appspot.l.google.com
  • Accessing the website's /form path returns a Google-styled 404 page
  • The website's "Server"-header is "Google Frontend"
The three conditions can easily be tested. On a standard Mac OS X system these commands can be used to test if www.example.com is hosted using Google App Engine:

CNAME google.com:
  dig www.example.com cname | egrep -i 'cname.*google.com'
Google 404 page for /form:
  curl -s -D - http://www.example.com/form | egrep 'G.+o.+o.+g.+l.+e'
"Google Frontend" server string:
  curl -s -D - http://www.example.com/ | egrep '^Server:'


Please note that the first two conditions (CNAME + /form) also hold for Blogspot hosted domains, Google Apps hosted domains and possibly other Google hosted services. However, the server string can be used to distinguish between GAE hosted domains (server string "Google Frontend") and Blogspot/Google Apps domains (server string "GFE/2.0").

So if all three conditions hold, that strongly indicates that the website is hosted using Google App Engine.
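The decision logic can also be written as a small function. This sketch assumes that the three observations have already been collected, for instance with the dig and curl commands above: the CNAME target, the status and body of /form, and the Server header. GaeCheck and Verdict are hypothetical names. The CNAME check here is stricter than the egrep above: it only accepts the three CNAME targets listed in the first condition.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Sketch of the three-condition test as a pure function (hypothetical names).
// Assumption: CNAME, /form response and Server header were collected beforehand.
class GaeCheck {
    enum Verdict { LIKELY_GAE, OTHER_GOOGLE_SERVICE, NOT_GAE }

    // The CNAME targets from the first condition.
    private static final Set<String> GAE_CNAMES =
            Set.of("ghs.google.com", "ghs.l.google.com", "appspot.l.google.com");

    // Same pattern as the egrep above: the Google logo letters with markup between them.
    private static final Pattern GOOGLE_404 = Pattern.compile("G.+o.+o.+g.+l.+e");

    static Verdict classify(String cname, int formStatus, String formBody, String serverHeader) {
        String host = cname == null ? "" : cname.toLowerCase(Locale.ROOT);
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1); // dig prints fully qualified names
        }
        boolean cnameOk = GAE_CNAMES.contains(host);
        boolean google404 = formStatus == 404 && formBody != null && GOOGLE_404.matcher(formBody).find();
        if (!cnameOk || !google404) {
            return Verdict.NOT_GAE;
        }
        if ("Google Frontend".equals(serverHeader)) {
            return Verdict.LIKELY_GAE;
        }
        // Blogspot and Google Apps domains pass the first two checks but report GFE/2.0.
        return "GFE/2.0".equals(serverHeader) ? Verdict.OTHER_GOOGLE_SERVICE : Verdict.NOT_GAE;
    }
}
```

Consider a site with CNAME ghs.google.com, a Google-styled 404 page for /form and the server string "GFE/2.0". The function classifies it as OTHER_GOOGLE_SERVICE, which matches the Blogspot/Google Apps case described above.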

Please let me know if you have any counter-examples - either sites that fulfill the three conditions but are NOT hosted using Google App Engine (false positives), or sites that are hosted using Google App Engine but do not fulfill the three conditions (false negatives).

Sunday, May 3, 2009

The per page-view cost of hosting a reasonably efficient GAE/J application

The new free CPU and bandwidth quota levels that take effect on May 25, 2009 are:
  • Bandwidth: 1 GB of data transferred in per day, and 1 GB out per day.
  • CPU time: 6.5 hours of CPU time per day (measured in 1.2 GHz Intel x86 CPU time)
According to calculations done by the Google App Engine team the new free quota levels will serve around 5M page-views per month for a "reasonably efficient application".

What are the per page-view assumptions behind these calculations?

Let's start with the bandwidth:

It is safe to assume that much more outgoing bandwidth is consumed than incoming bandwidth (the average HTTP request is smaller than the average HTTP response), so outgoing bandwidth is the binding constraint. 1 GB per day should serve at least 5M page-views per month. Assuming 30 days per month:
  • (30 days * 1 073 741 824 bytes/day) / 5 000 000 page-views = 6442 bytes of outgoing bandwidth per page-view
And the CPU time:

6.5 CPU-hours per day should be enough to serve at least 5M page-views per month. Again assuming 30 days per month:
  • (30 days * 6.5 CPU-hours/day * 3600 CPU-seconds/CPU-hour) / 5 000 000 page-views = 0.1404 CPU-seconds per page-view
When doing back-of-the-envelope calculations for estimating the resource usage for a "normal" GAE/J application, use the following numbers:
  • Outgoing bandwidth: 6442 bytes per page-view
  • CPU time: 0.1404 CPU-seconds per page-view
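The two numbers can be reproduced with a few lines of Java. Like the calculation above, this sketch assumes 30-day months, 1 GB = 1,073,741,824 bytes and 5,000,000 page-views per month. PageViewBudget is an illustrative name.

```java
// Back-of-the-envelope sketch: per-page-view budget implied by "about 5M page-views
// per month" on the new free quotas. Assumes 30-day months and 1 GB = 2^30 bytes.
class PageViewBudget {
    static final double DAYS_PER_MONTH = 30;
    static final double BYTES_PER_GB = 1L << 30; // 1 073 741 824
    static final double PAGE_VIEWS_PER_MONTH = 5_000_000;

    /** Outgoing bytes available per page-view for a given daily outgoing quota. */
    static double bytesPerPageView(double gbOutPerDay) {
        return DAYS_PER_MONTH * gbOutPerDay * BYTES_PER_GB / PAGE_VIEWS_PER_MONTH;
    }

    /** CPU-seconds available per page-view for a given daily CPU quota. */
    static double cpuSecondsPerPageView(double cpuHoursPerDay) {
        return DAYS_PER_MONTH * cpuHoursPerDay * 3600 / PAGE_VIEWS_PER_MONTH;
    }

    public static void main(String[] args) {
        System.out.printf("Outgoing bandwidth: %.0f bytes per page-view%n", bytesPerPageView(1.0));
        System.out.printf("CPU time: %.4f CPU-seconds per page-view%n", cpuSecondsPerPageView(6.5));
    }
}
```

main prints 6442 bytes and 0.1404 CPU-seconds per page-view.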
Using the same assumptions - what's the per page-view cost of hosting an application using Google App Engine?

The current per unit price of billable Google App Engine resources:
  • Outgoing bandwidth: 0.12 USD per gigabyte
  • Incoming bandwidth: 0.10 USD per gigabyte
  • CPU time: 0.10 USD per CPU-hour (measured in 1.2 GHz Intel x86 CPU time)
  • Stored data: 0.15 USD per GB and month
  • E-mail recipients: 0.0001 USD per recipient
The per page-view cost can be estimated as:
  • Bandwidth cost for 6442 bytes: 6442 bytes * 0.12 USD per GB / 1073741824 bytes per GB = 0.0000007199 USD per page-view
  • CPU cost for 0.1404 CPU-seconds: 0.1404 CPU-seconds * 0.10 USD per CPU-hour / 3600 CPU-seconds per CPU-hour = 0.0000039000 USD per page-view
Which gives a total of 0.0000046199 USD per page-view.
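Here is the same arithmetic as a sketch that prices the per-page-view budget at the listed rates. Like the calculation above, it counts only outgoing bandwidth and CPU time and assumes that all usage is billed (no free quota). PageViewCost is a hypothetical name.

```java
// Sketch: price one page-view at 0.12 USD/GB outgoing and 0.10 USD/CPU-hour.
// Incoming bandwidth and storage are ignored, as in the calculation above.
class PageViewCost {
    static final double BYTES_PER_GB = 1L << 30; // 1 073 741 824

    static double costPerPageView(double bytesOut, double cpuSeconds) {
        double bandwidthUsd = bytesOut * 0.12 / BYTES_PER_GB;
        double cpuUsd = cpuSeconds * 0.10 / 3600;
        return bandwidthUsd + cpuUsd;
    }

    public static void main(String[] args) {
        double perView = costPerPageView(6442, 0.1404);
        System.out.printf("%.10f USD per page-view%n", perView);
        System.out.printf("%.2f USD per million page-views%n", perView * 1_000_000);
    }
}
```

main prints 0.0000046199 USD per page-view and 4.62 USD per million page-views.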

So, in summary, the per page-view cost of hosting a reasonably efficient GAE/J application is roughly 0.0000046199 USD, which is equivalent to 4.62 USD per million page-views.

Google App Engine: Google's monthly 322.50 USD gift to you

As you've probably noticed from my previous posts, I'm really thrilled by the Google App Engine/J platform. Thanks to some clever architectural decisions, GAE/J gives a properly written Java application near-infinite scalability. So from a technical perspective I believe that GAE/J is a slam dunk. But what about the economics?

The quotas of a free GAE account have been really, really generous. The free quotas will be adjusted downwards on May 25, 2009, but they will remain very generous.

These are the changes that will be made on May 25, 2009 (the quotas are on a per application basis):
  • Outgoing bandwidth: The current 10 GB per day will be decreased to 1 GB per day.
  • Incoming bandwidth: The current 10 GB per day will be decreased to 1 GB per day.
  • CPU time: The current 46 CPU-hours per day will be decreased to 6.5 CPU-hours per day.
  • Stored data: The current quota of 1 GB space is left unchanged.
  • E-mail recipients: The current quota of 2000 e-mail recipients per day is left unchanged.

What are the values of these free quotas before and after these quota changes?

The current per unit price of billable Google App Engine resources:
  • Outgoing bandwidth: 0.12 USD per gigabyte
  • Incoming bandwidth: 0.10 USD per gigabyte
  • CPU time: 0.10 USD per CPU-hour (measured in 1.2 GHz Intel x86 CPU time)
  • Stored data: 0.15 USD per GB and month
  • E-mail recipients: 0.0001 USD per recipient

The current free quotas (before May 25, 2009) translate into the following theoretical dollar values:
  • Outgoing bandwidth: 10 GB/day * 30 days per month * 0.12 USD/GB = 36.00 USD
  • Incoming bandwidth: 10 GB/day * 30 days per month * 0.10 USD/GB = 30.00 USD
  • CPU time: 46 CPU-hours/day * 30 days per month * 0.10 USD/CPU-hour = 138.00 USD
  • Stored data: 1 GB * 0.15 USD per GB and month = 0.15 USD
  • E-mail recipients: 2000 recipients/day * 30 days per month * 0.0001 USD/recipient = 6.00 USD
Total value of free quota: 210.15 USD per month and application. Each GAE-account has a maximum of 10 applications, so the theoretical value of one GAE-account with 10 fully utilized applications is 2101.50 USD per month. This is before the free quota change on May 25, 2009.

The new free quotas (from May 25, 2009 and onwards) translate into the following theoretical dollar values:
  • Outgoing bandwidth: 1 GB/day * 30 days per month * 0.12 USD/GB = 3.60 USD
  • Incoming bandwidth: 1 GB/day * 30 days per month * 0.10 USD/GB = 3.00 USD
  • CPU time: 6.5 CPU-hours/day * 30 days per month * 0.10 USD/CPU-hour = 19.50 USD
  • Stored data: 1 GB * 0.15 USD per GB and month = 0.15 USD
  • E-mail recipients: 2000 recipients/day * 30 days per month * 0.0001 USD/recipient = 6.00 USD
Total value of free quota: 32.25 USD per month and application. Each GAE-account has a maximum of 10 applications, so the theoretical value of one GAE-account with 10 fully utilized applications is 322.50 USD per month. This is after the free quota change on May 25, 2009.
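Both totals can be checked with a short sketch. It assumes 30-day months, fully used quotas and the per-unit prices listed above. FreeQuotaValue is an illustrative name.

```java
// Sketch: theoretical monthly USD value of one application's free quotas,
// priced at the billable rates listed above. Assumes 30-day months.
class FreeQuotaValue {
    static final int DAYS = 30;
    static final int APPS_PER_ACCOUNT = 10;

    static double monthlyValue(double gbOutPerDay, double gbInPerDay, double cpuHoursPerDay,
                               double storedGb, int emailsPerDay) {
        return gbOutPerDay * DAYS * 0.12      // outgoing bandwidth, USD/GB
             + gbInPerDay * DAYS * 0.10       // incoming bandwidth, USD/GB
             + cpuHoursPerDay * DAYS * 0.10   // CPU time, USD/CPU-hour
             + storedGb * 0.15                // stored data, USD per GB and month
             + emailsPerDay * DAYS * 0.0001;  // e-mail recipients, USD/recipient
    }

    public static void main(String[] args) {
        double before = monthlyValue(10, 10, 46, 1, 2000);
        double after = monthlyValue(1, 1, 6.5, 1, 2000);
        System.out.printf("Before May 25: %.2f USD/app, %.2f USD/account%n", before, before * APPS_PER_ACCOUNT);
        System.out.printf("After May 25:  %.2f USD/app, %.2f USD/account%n", after, after * APPS_PER_ACCOUNT);
    }
}
```

main prints 210.15 USD per application (2101.50 USD per account) before the change and 32.25 USD per application (322.50 USD per account) after it.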

In summary: The current enormously generous quota allocation worth 2101.50 USD per month is reduced to the still extremely generous new free quota worth 322.50 USD per month on May 25, 2009. Thanks Google!

Sure, the value calculations are a bit theoretical, but my point is that the basic price tag (free!) of a GAE/J account is very generous, both before and after May 25, 2009!

Monday, April 27, 2009

Getting started with Grails 1.1.1-SNAPSHOT on Google App Engine/J

Official Grails support for Google App Engine/J is coming with Grails 1.1.1, which is set to be released this week according to a recent Twitter post from Grails project lead Graeme Rocher.

I'm really looking forward to this release - the combination of Grails (high productivity) and GAE/J (high scalability) is a very nice match.

If you're eager to get going before the official Grails 1.1.1 release - follow this small guide to build Grails 1.1.1-SNAPSHOT from source and deploy your first Grails GAE/J application.

# Step 1: Fetch and build Grails 1.1.1-SNAPSHOT
cd
mkdir grails-1.1.1-build
cd grails-1.1.1-build
git clone git://github.com/grails/grails.git
cd grails/grails
export ANT_OPTS="-Xmx512m"
ant jar
export GRAILS_HOME="$(pwd)"
export PATH="${GRAILS_HOME}/bin:${PATH}"
grails
# Version string should show "Grails 1.1.1-SNAPSHOT"

# Step 2: Create Grails-GAE/J application
grails create-app gaetest
cd gaetest
grails uninstall-plugin hibernate
emacs grails-app/conf/BuildConfig.groovy
# Add: google.appengine.sdk="/path/to/appengine-java-sdk-1.2.0"
grails install-plugin app-engine
# Work around a bug in the app-engine plugin: the appengine-web.xml it generates does not enable sessions.
emacs ~/.grails/1.1.1-SNAPSHOT/projects/gaetest/plugins/app-engine-0.5/scripts/_Events.groovy
# Add <sessions-enabled>true</sessions-enabled> after <version>${appVersion}</version>
grails install-templates
emacs src/templates/war/web.xml
# Remove the <welcome-file-list> definition
grails app-engine run
# Now browse to http://localhost:8080/

# Step 3: Deploy Grails-GAE/J application
grails set-version 1
emacs grails-app/conf/Config.groovy
# Add: google.appengine.application="app-id"
grails app-engine deploy
# The app should now be deployed at http://app-id.appspot.com/