Improving Browser Performance 10x
We recently improved the performance of the Universe.com homepage by more than ten times. Let’s explore the techniques we used to achieve this result.
But first, let’s find out why website performance is important (there are links to the case studies at the end of the blog post):
- User experience: poor performance leads to unresponsiveness, which may be frustrating for users from a UI and UX perspective.
- Conversion and revenue: very often slow websites can lead to lost customers and have a negative impact on conversion rates and revenue.
- SEO: Starting July 1, 2019, Google will enable mobile-first indexing by default for all new websites. Websites will be ranked lower if they are slow on mobile devices, and don’t have mobile-friendly content.
In this blog post, we will briefly cover these main areas, which helped us to improve the performance on our pages:
- Performance measurement: lab and field instruments.
- Rendering: client-side and server-side rendering, pre-rendering, and hybrid rendering approaches.
- Network: CDN, caching, GraphQL caching, encoding, HTTP/2 and Server Push.
- JavaScript and assets: bundle size budget, async and defer scripts, image optimizations (WebP, lazy loading, progressive), and resource hints (preload, prefetch, preconnect).
For some context, our homepage is built with React (TypeScript), Phoenix (Elixir), Puppeteer (headless Chrome), and a GraphQL API (Ruby on Rails). This is how it looks on mobile:
Performance measurement

Without data, you’re just another person with an opinion. ― W. Edwards Deming
Lab instruments allow collecting data within a controlled environment with predefined device and network settings. With these instruments, it is much simpler to debug performance issues and run well-reproducible tests.
Lighthouse is an excellent tool for auditing webpages in Chrome on a local computer. It also provides some useful tips on how to improve performance, accessibility, SEO, etc. Here are some Lighthouse performance audit reports with a Simulated Fast 3G and 4x CPU Slowdown:
There is, however, a disadvantage of using just lab instruments: they don’t necessarily capture real-world bottlenecks which may depend on the end-users’ devices, network, location, and many other factors. That is why it is also important to use field instruments.
Field instruments allow measuring real user page loads. There are multiple services which can help to get real performance data from actual devices:
- WebPageTest — allows performing tests from different browsers on real devices from various locations.
- Test My Site — uses Chrome User Experience Report (CrUX) which is based on Chrome usage statistics; it is publicly available and updated monthly.
- PageSpeed Insights — combines both lab (Lighthouse) and field (CrUX) data.
Rendering

There are multiple approaches to rendering content, and each has its pros and cons:
- Pre-rendering is similar to server-side rendering but happens in advance at build time instead of at runtime. Pros: serving built static files is usually simpler than running a server, it is SEO-friendly, and the initial page load is fast. Cons: it requires pre-rendering all possible pages in advance on any code change, full page reloads, less rich website interactions, and limited access to browser features.
We decided to rebuild some parts of the app with React for the following reasons:
- Our developers are already familiar with building React applications (e.g. embedded widgets).
- We already have a few React component libraries which can be shared across multiple projects.
- The new pages have some interactive UI elements.
- There is a huge React ecosystem with lots of tools.
Pre-rendering and server-side rendering
Once we decided on using React, we started experimenting with other potential rendering options to allow browsers to render the content faster.
- Gatsby.js allows pre-rendering pages with React and GraphQL. Gatsby.js is a great tool which supports many performance optimizations out of the box. However, using pre-rendering doesn’t work for us since we have a potentially unlimited number of pages with user-generated content.
- Next.js is a popular Node.js framework which allows server-side rendering with React. However, Next.js is very opinionated and requires using its router, CSS solution, and so on. And our existing component libraries were built for browsers and are not compatible with Node.js.
That is why we decided to experiment with some hybrid approaches, which try to take the best from each rendering option.
Puppeteer is a Node.js library which allows working with headless Chrome. We wanted to give Puppeteer a try for pre-rendering at runtime. That enables an interesting hybrid approach: server-side rendering with Puppeteer and client-side rendering with hydration. Here are some useful tips from Google on how to use a headless browser for server-side rendering.
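A minimal sketch of this approach (assuming the puppeteer package is installed; the URL handling and in-memory cache are illustrative, not our production setup):

```typescript
import puppeteer from "puppeteer";

// Illustrative in-memory cache: re-rendering every request is too expensive.
const cache = new Map<string, string>();

async function ssr(url: string): Promise<string> {
  const hit = cache.get(url);
  if (hit) return hit;

  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // networkidle0: wait until the app has finished fetching data and rendering.
    await page.goto(url, { waitUntil: "networkidle0" });
    const html = await page.content(); // serialized DOM, ready for hydration
    cache.set(url, html);
    return html;
  } finally {
    await browser.close();
  }
}
```

In practice, launching a browser per request is exactly the throughput problem described below, which is why process reuse and caching matter so much here.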
Using this approach has some pros:
- Allows building a simple browser React application once, and using it both on the server-side and in browsers. Making the browser app faster automatically makes SSR faster, win-win.
- Rendering pages with Puppeteer on a server is usually faster than on end-users’ mobile devices (better connection, better hardware).
- We don’t need to know about all possible pages in advance in order to pre-render them.
However, we faced a few challenges with this approach:
- Throughput is the main issue. Having each request executed in a separate headless browser process uses up a lot of resources. It is possible to use a single headless browser process and run multiple requests in separate tabs. However, using multiple tabs decreases the performance of the whole process.
- Stability. It is challenging to scale up or scale down many headless browsers, keep the processes “warm” and balance the workload. We tried different hosting approaches: from being self-hosted in a Kubernetes cluster to serverless with AWS Lambda and Google Cloud Functions. We noticed that the latter had some performance issues with Puppeteer:
As we’ve become more familiar with Puppeteer, we’ve iterated on our initial approach (read below). We also have some interesting ongoing experiments with rendering PDFs through a headless browser. It is also possible to use Puppeteer for automated end-to-end testing, even without writing any code. It now supports Firefox in addition to Chrome.
Hybrid rendering approach
Using Puppeteer at runtime is quite challenging. That’s why we decided to use it at build time, with a tool which could serve actual user-generated content at runtime from the server side. Something which is more stable and has better throughput than Puppeteer.
We decided to try the Elixir programming language. Elixir looks similar to Ruby but runs on top of BEAM (the Erlang VM), which was created for building fault-tolerant and stable systems.
Elixir uses the actor concurrency model. Each “actor” (Elixir process) has a tiny memory footprint of around 1–2 KB, which allows running many thousands of isolated processes concurrently. Phoenix is an Elixir web framework which enables high throughput and handles each HTTP request in a separate Elixir process.
We combined these approaches, taking the best from each world to satisfy our needs:
- Puppeteer pre-renders React pages the way we want at build time and saves them as HTML files (the app shell from the PRPL pattern).
- Our Phoenix application serves these pre-rendered pages and dynamically injects the actual content into the HTML. That makes the content SEO-friendly, allows processing a huge number of various pages on demand, and makes scaling easier.
- Clients receive and start showing the HTML immediately, then hydrate the React DOM state to continue as a regular SPA.
Content delivery network (CDN)
Using a CDN enables content caching and speeds up content delivery across the world. We use Fastly.com, which serves over 10% of all internet requests and is used by companies such as GitHub, Stripe, Airbnb, Twitter, and many others.
Fastly allows us to write custom caching and routing logic using a configuration language called VCL. Here is how a basic request flow works, where each step can be customized depending on the route, request headers, and so on:
Another option to improve performance is to use WebAssembly (WASM) at the edge with Fastly. Think of it as serverless, but at the edge and with programming languages such as C, Rust, Go, TypeScript, etc. Cloudflare has a similar project to support WASM on Workers.
Caching

It is important to cache as many requests as possible to improve performance. Caching on the CDN level allows delivering responses faster to new users. Sending a Cache-Control header speeds up response times for repeated requests in the browser.
Most build tools, such as Webpack, allow adding a hash to the filename. These files can be safely cached, since changing the file contents will produce a new filename.
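For example, a server could pick a caching policy based on whether a filename is fingerprinted. This is a hypothetical helper, not code from our stack:

```typescript
// Fingerprinted assets (e.g. app.3f2a1b9c4d.js) can be cached "forever":
// a content change produces a new filename, so stale copies are never served.
const HASHED = /\.[0-9a-f]{8,}\.(js|css|woff2|png|webp)$/;

function cacheControlFor(path: string): string {
  return HASHED.test(path)
    ? "public, max-age=31536000, immutable" // cache for a year, never revalidate
    : "no-cache"; // non-fingerprinted files (like index.html) must revalidate
}
```

The `immutable` directive tells supporting browsers not to revalidate the asset even on reload.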
GraphQL caching

One of the most common ways of sending GraphQL requests is to use the POST HTTP method. One approach we use is to cache some GraphQL requests at the Fastly level:
- Our React app annotates the GraphQL queries which can be cached.
- Before sending an HTTP request, we append a URL argument built by hashing the request body, which includes the GraphQL query and variables (we use a custom fetch with Apollo Client).
- Varnish (and Fastly) by default uses the full URL as part of the cache key.
- That allows us to keep sending POST requests with GraphQL query in the request body and cache at the edge without hitting our servers.
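A sketch of that idea (the helper name and hashing details are illustrative; in practice this lives in a custom fetch passed to Apollo Client):

```typescript
import { createHash } from "crypto";

// Append a hash of the GraphQL body to the URL. Varnish/Fastly use the full
// URL as part of the cache key, so identical queries hit the same cache entry
// even though the query itself still travels in a POST body.
function withBodyHash(url: string, body: string): string {
  const hash = createHash("sha256").update(body).digest("hex");
  const sep = url.includes("?") ? "&" : "?";
  return `${url}${sep}hash=${hash}`;
}

// A custom fetch for Apollo Client could then look like:
// const cachedFetch = (uri, options) =>
//   fetch(withBodyHash(uri, options.body as string), options);
```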
Here are some other potential GraphQL cache strategies to consider:
- Caching on the server side: whole GraphQL requests, on the resolver level, or declaratively by annotating the schema.
- Using persisted GraphQL queries and sending GET /graphql/:queryId to be able to rely on HTTP caching.
- Integrate with CDNs by using automated tools (e.g. Apollo Server 2.0) or use GraphQL-specific CDNs (e.g. FastQL).
Encoding

All major browsers support gzip via the Content-Encoding header to compress data. That allows sending fewer bytes to browsers, which usually means faster content delivery. In supported browsers, it is also possible to use the more effective brotli compression algorithm.
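As a rough illustration (Node.js ships both algorithms in its zlib module), here is the same repetitive payload compressed with gzip and brotli:

```typescript
import { gzipSync, brotliCompressSync } from "zlib";

// A deliberately repetitive 2 KB payload; real HTML compresses similarly well.
const payload = Buffer.from("<html>".padEnd(2048, " lots of repetitive markup "));

const gzipped = gzipSync(payload);
const brotlied = brotliCompressSync(payload);

// Both outputs are far smaller than the input; brotli usually wins on text.
console.log({ raw: payload.length, gzip: gzipped.length, brotli: brotlied.length });
```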
HTTP/2

HTTP/2 is a new version of the HTTP network protocol (shown as h2 in DevTools). Switching to HTTP/2 may improve performance thanks to these differences compared to HTTP/1.x:
- HTTP/2 is binary, not textual, which makes it more efficient to parse and more compact.
- HTTP/2 is multiplexed, which means that it can send multiple requests in parallel over a single TCP connection. That allows us not to worry about per-host browser connection limits and domain sharding.
- It uses header compression to reduce request / response size overhead.
- It allows servers to push responses proactively. This feature is particularly interesting.
HTTP/2 Server Push
There are a lot of programming languages and libraries which don’t fully support all HTTP/2 features, because they would introduce breaking changes for existing tools and the ecosystem (e.g. Rack). But even in this case, it is still possible to use HTTP/2, at least partially. For example:
- Set up a proxy server such as h2o or nginx with HTTP/2 in front of a regular HTTP/1.x server. E.g. Puma and Ruby on Rails can send Early Hints, which can enable HTTP/2 Server Push with some limitations.
Bundle size budget
Here are a couple of ways to choose a budget:
- Use the current bundle size as a baseline, or try to reduce it by, for example, 10%.
- Try to have the fastest website among your competitors and set the budget accordingly.
Kill your dependencies
That’s the title of a popular blog post written by the author of Sidekiq.
No code runs faster than no code. No code has fewer bugs than no code. No code uses less memory than no code. No code is easier to understand than no code.
A quick way to see how many packages are installed is to run ls node_modules | wc -l.
Code splitting

Here are a few examples of code which can be split into separate chunks and loaded on demand:
- Components on a page which are not visible immediately, e.g. modals or a footer below the fold.
- Polyfills and ponyfills to support the latest browser features in all major browsers.
- Preventing code duplication between chunks by using Webpack’s SplitChunksPlugin.
- Locale files, loaded on demand to avoid shipping all our supported languages at once.
We built a function instead of React.lazy to support named exports rather than default exports.
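A minimal sketch of such a helper (the name toDefault is ours and illustrative): React.lazy only understands modules with a default export, so the dynamic import is re-shaped to make a named export look like one.

```typescript
// Turns a dynamic import of a named export into the { default } shape
// that React.lazy expects.
function toDefault<K extends string, T>(
  factory: () => Promise<Record<K, T>>,
  name: K
): () => Promise<{ default: T }> {
  return () => factory().then((module) => ({ default: module[name] }));
}

// With React it would be used like:
// const Modal = React.lazy(toDefault(() => import("./Modal"), "Modal"));
```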
Async and defer scripts
All major browsers support the async and defer attributes on script tags:
- Using a script with async executes it as soon as it is downloaded, without blocking HTML parsing and with no guaranteed execution order.
- Using scripts with defer executes them in order, after the HTML document has been parsed.
Here is a visualized difference between regular, async, and defer scripts in the head:
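To illustrate (file names are placeholders), here are the three ways to include a script:

```html
<!-- Illustrative only: three ways to include a script in the head -->
<script src="/js/analytics.js"></script>    <!-- blocks HTML parsing -->
<script async src="/js/widget.js"></script> <!-- runs as soon as it is downloaded -->
<script defer src="/js/app.js"></script>    <!-- runs after parsing, in order -->
```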
Images

One way of reducing image size is to use the more lightweight WebP image format in supported browsers. For browsers which don’t support WebP, it is possible to use one of the following strategies:
- Serving a JPEG or PNG fallback with the picture HTML tag.
- Detecting WebP support through the Accept request header and serving the right format from the server or CDN.
Loading images lazily, only when they are in or near the viewport, is one of the most significant performance improvements for initial page loads on pages with lots of images. You can either use the IntersectionObserver feature in supported browsers or use alternative tools to achieve the same result.
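A sketch of viewport-based lazy loading (the data-src convention and helper name are illustrative; a fallback library or eager loading would be needed for browsers without IntersectionObserver):

```typescript
// Lazy-load images marked with a data-src attribute once they approach
// the viewport. Call lazyLoadImages() after the initial render.
function lazyLoadImages(rootMargin = "200px"): void {
  const observer = new IntersectionObserver(
    (entries, obs) => {
      for (const entry of entries) {
        if (!entry.isIntersecting) continue;
        const img = entry.target as HTMLImageElement;
        img.src = img.dataset.src!; // swap in the real image URL
        obs.unobserve(img); // each image only needs to load once
      }
    },
    { rootMargin } // start loading slightly before the image becomes visible
  );

  document
    .querySelectorAll<HTMLImageElement>("img[data-src]")
    .forEach((img) => observer.observe(img));
}
```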
Some other image optimizations may include:
- Reducing the quality of images to reduce the size.
- Resizing and loading the smallest possible images.
- Using the srcset image attribute to automatically load higher-quality images on high-resolution retina displays.
- Using progressive images to show a blurry image immediately.
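Several of these optimizations can be combined in markup like this (paths are placeholders):

```html
<!-- Illustrative markup: WebP with a JPEG fallback, plus retina variants -->
<picture>
  <source type="image/webp" srcset="/img/event.webp 1x, /img/event@2x.webp 2x" />
  <img src="/img/event.jpg"
       srcset="/img/event.jpg 1x, /img/event@2x.jpg 2x"
       alt="Event cover" />
</picture>
```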
You can consider using some generic CDNs or specialized image CDNs which usually implement most of these image optimizations.
Resource hints

Resource hints allow us to optimize the delivery of resources, reduce round trips, and fetch resources to deliver content faster while a user is browsing a page.
Preload downloads a resource in the background before it is actually used on the current page (high priority).
Prefetch works similarly to preload to fetch resources and cache them, but for future user navigations (low priority).
Preconnect allows setting up early connections before an HTTP request is actually sent to the server.
There are also other resource hints, such as dns-prefetch. Some of these resource hints can be specified in response headers. Just be careful when using resource hints: it is quite easy to start making too many unnecessary requests and downloading too much data, especially for users on a cellular connection.
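For example (URLs are placeholders):

```html
<!-- Illustrative resource hints -->
<link rel="preload" href="/fonts/main.woff2" as="font" type="font/woff2" crossorigin />
<link rel="prefetch" href="/js/next-page.chunk.js" />
<link rel="preconnect" href="https://api.example.com" />
<link rel="dns-prefetch" href="https://cdn.example.com" />
```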
Performance in a growing application is a never-ending process which usually requires constant changes across the whole stack.
This video reminds me of you wanting to decrease the app bundle size. — My colleague
Here is a list of other potential performance improvements, not mentioned previously, which we use or plan to try:
- Using Service Workers for caching, offline support, and offloading the main thread.
- Inlining critical CSS, or using functional CSS to decrease the size over the long term.
- Using font formats such as WOFF2 instead of WOFF (up to 50%+ compression).
- Keeping the browserslist up to date.
- Using webpack-bundle-analyzer to analyze build chunks visually.
- Preferring smaller packages (e.g. date-fns) and plugins which help reduce bundle size.
- Running Lighthouse in CI.
- Progressive hydration and streaming with React.
There is an endless number of exciting ideas to try. I hope this information and some of these case studies will inspire you to think about performance in your application:
Amazon has calculated that a page load slowdown of just 1 second could cost it $1.6 billion in sales each year.
Walmart saw up to a 2% increase in conversions for every 1 second of improvement in load time. Every 100ms improvement also resulted in up to a 1% increase in revenue.
Google has calculated that by slowing its search results by just 0.4 seconds, it could lose 8 million searches per day.
Rebuilding Pinterest pages for performance resulted in a 40% decrease in wait time, a 15% increase in SEO traffic, and a 15% increase in conversion rate to signup.
BBC has seen that they lose an additional 10% of users for every additional second it takes for their site to load.
Tests of the new faster FT.com showed users were up to 30% more engaged — meaning more visits and more content being consumed.
Instagram increased impressions and user profile scroll interactions by 33% for the median by decreasing the response size of the JSON needed for displaying comments.