Improving Browser Performance 10x

We recently improved the performance of the Universe.com homepage by more than ten times. Let’s explore the techniques we used to achieve this result.

But first, let’s find out why website performance is important (there are links to the case studies at the end of the blog post):

  • User experience: poor performance leads to unresponsiveness, which may be frustrating for users from a UI and UX perspective.
  • Conversion and revenue: very often slow websites can lead to lost customers and have a negative impact on conversion rates and revenue.
  • SEO: Starting July 1, 2019, Google will enable mobile-first indexing by default for all new websites. Websites will be ranked lower if they are slow on mobile devices, and don’t have mobile-friendly content.

In this blog post, we will briefly cover these main areas, which helped us to improve the performance on our pages:

  • Performance measurement: lab and field instruments.
  • Rendering: client-side and server-side rendering, pre-rendering, and hybrid rendering approaches.
  • Network: CDN, caching, GraphQL caching, encoding, HTTP/2 and Server Push.
  • JavaScript in the browser: bundle size budget, code-splitting, async and defer scripts, image optimizations (WebP, lazy loading, progressive), and resource hints (preload, prefetch, preconnect).

For some context, our homepage is built with React (TypeScript), Phoenix (Elixir), Puppeteer (headless Chrome), and a GraphQL API (Ruby on Rails). This is what it looks like on mobile:

Universe homepage and explore

Performance measurement

Without data, you’re just another person with an opinion. ― W. Edwards Deming

Lab instruments

Lab instruments collect data in a controlled environment with predefined device and network settings. That makes it much simpler to debug performance issues and to run reproducible tests.

Lighthouse is an excellent tool for auditing webpages in Chrome on a local computer. It also provides some useful tips on how to improve performance, accessibility, SEO, etc. Here are some Lighthouse performance audit reports with a Simulated Fast 3G and 4x CPU Slowdown:

Before and after: 10x improvement for the First Contentful Paint (FCP)

There is, however, a disadvantage of using just lab instruments: they don’t necessarily capture real-world bottlenecks which may depend on the end-users’ devices, network, location, and many other factors. That is why it is also important to use field instruments.

Field instruments

Field instruments make it possible to simulate and measure real user page loads. Multiple services can help to get real performance data from actual devices:

  • WebPageTest — allows performing tests from different browsers on real devices from various locations.
  • Test My Site — uses Chrome User Experience Report (CrUX) which is based on Chrome usage statistics; it is publicly available and updated monthly.
  • PageSpeed Insights — combines both lab (Lighthouse) and field (CrUX) data.
WebPageTest report

Rendering

There are multiple approaches for rendering content, and each has its pros and cons:

  • Server-side rendering (SSR) generates the final HTML documents for browsers on the server. Pros: search engines can crawl the website without executing JavaScript (SEO), fast initial page load, code lives only on the server-side. Cons: limited interactivity, full page reloads, limited access to browser features.
  • Client-side rendering is a process of rendering content in the browser by using JavaScript. Pros: rich website interactions, fast rendering on route changes after the initial load, access to modern browser features (e.g. offline support with Service Workers). Cons: not SEO-friendly, slow initial page load, usually requires implementing a Single Page Application (SPA) and an API on the server-side.
  • Pre-rendering is similar to server-side rendering but happens in advance at build time instead of at runtime. Pros: serving built static files is usually simpler than running a server, SEO-friendly, fast initial page load. Cons: requires pre-rendering all possible pages again on any code change, full page reloads, limited interactivity, limited access to browser features.

Client-side rendering

Previously, our homepage was implemented with the Ember.js framework as a SPA with client-side rendering. One issue was the large bundle size of our Ember.js application. Users saw just a blank screen while the browser downloaded, parsed, compiled, and executed the JavaScript files:

White screen of death

We decided to rebuild some parts of the app with React, for the following reasons:

  • Our developers are already familiar with building React applications (e.g. embedded widgets).
  • We already have a few React component libraries which can be shared across multiple projects.
  • The new pages have some interactive UI elements.
  • There is a huge React ecosystem with lots of tools.
  • With JavaScript in the browser, it is possible to build a Progressive Web App with lots of nice features.

Pre-rendering and server-side rendering

The issue with client-side rendered applications built, for example, with React Router DOM is still the same as with Ember.js. JavaScript is expensive, and it takes a while to see the First Contentful Paint in the browser.

Once we decided on using React, we started experimenting with other potential rendering options to allow browsers to render the content faster.

Conventional rendering options with React
  • Gatsby.js allows pre-rendering pages with React and GraphQL. Gatsby.js is a great tool which supports many performance optimizations out of the box. However, using pre-rendering doesn’t work for us since we have a potentially unlimited number of pages with user-generated content.
  • Next.js is a popular Node.js framework which allows server-side rendering with React. However, Next.js is very opinionated: it requires using its router, CSS solution, and so on. In addition, our existing component libraries were built for browsers and are not compatible with Node.js.

That is why we decided to experiment with some hybrid approaches, which try taking the best from each rendering option.

Runtime pre-rendering

Puppeteer is a Node.js library that allows working with headless Chrome. We wanted to give Puppeteer a try for pre-rendering at runtime. That enables an interesting hybrid approach: server-side rendering with Puppeteer and client-side rendering with hydration. Here are some useful tips by Google on how to use a headless browser for server-side rendering.

Puppeteer for runtime pre-rendering a React application

Using this approach has some pros:

  • Allows SSR, which is good for SEO. Crawlers don’t need to execute JavaScript to be able to see the content.
  • Allows building a simple browser React application once, and using it both on the server-side and in browsers. Making the browser app faster automatically makes SSR faster, win-win.
  • Rendering pages with Puppeteer on a server is usually faster than on end-users’ mobile devices (better connection, better hardware).
  • Hydration allows building rich SPAs with access to the JavaScript browser features.
  • We don’t need to know about all possible pages in advance in order to pre-render them.

However, we faced a few challenges with this approach:

  • Throughput is the main issue. Having each request executed in a separate headless browser process uses up a lot of resources. It is possible to use a single headless browser process and run multiple requests in separate tabs. However, using multiple tabs decreases the performance of the whole process.
The architecture of server-side rendering with Puppeteer
  • Stability. It is challenging to scale up or scale down many headless browsers, keep the processes “warm” and balance the workload. We tried different hosting approaches: from being self-hosted in a Kubernetes cluster to serverless with AWS Lambda and Google Cloud Functions. We noticed that the latter had some performance issues with Puppeteer:
Puppeteer response time on AWS Lambdas and GCP Functions
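
One way to soften the throughput problem described above is to cap how many tabs a single headless browser renders at once and to queue the remaining requests. Here is a minimal sketch of such a limiter. The names `createLimiter` and `maxConcurrent` are illustrative assumptions, and the actual Puppeteer call is left to the caller as a plain async task:

```typescript
// A unit of work, e.g. "open a tab, render the page, return the HTML".
type Task<T> = () => Promise<T>;

// Caps the number of tasks running at the same time.
// Extra tasks wait in a FIFO queue until a running task hands over its slot.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const next = waiting.shift();
    if (next) {
      // Hand the slot directly to the next waiter; `active` stays unchanged.
      next();
    } else {
      active -= 1;
    }
  };

  return async function run<T>(task: Task<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // All slots are busy: wait until one is handed over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active += 1;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

A caller would create the limiter once per browser process, for example `const run = createLimiter(4)`, and wrap every render in `run(() => renderInNewTab(url))`, where `renderInNewTab` is a hypothetical function that drives Puppeteer. This only bounds the load; it does not remove the per-tab slowdown.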

As we’ve become more familiar with Puppeteer, we’ve iterated on our initial approach (described below). We also have some interesting ongoing experiments with rendering PDFs through a headless browser. It is also possible to use Puppeteer for automated end-to-end testing, even without writing any code. It now supports Firefox in addition to Chrome.

Hybrid rendering approach

Using Puppeteer at runtime is quite challenging. That’s why we decided to use it at build time instead, together with a tool that could return the actual user-generated content from the server-side at runtime: something more stable and with better throughput than Puppeteer.

We decided to try the Elixir programming language. Elixir looks like Ruby but runs on top of BEAM (Erlang VM), which was created to allow building fault-tolerant and stable systems.

Elixir uses the Actor concurrency model. Each “Actor” (Elixir process) has a tiny memory footprint of about 1–2 KB. That allows for running many thousands of isolated processes concurrently. Phoenix is an Elixir web framework which enables high throughput and allows handling each HTTP request in a separate Elixir process.

We combined these approaches by taking the best from each world, which satisfies our needs:

Puppeteer for pre-rendering and Phoenix for server-side rendering
  • Puppeteer pre-renders React pages the way we want during buildtime and saves them in HTML files (app shell from the PRPL pattern).

We can keep building a simple browser React application and have a fast initial page load without waiting for JavaScript on end-users’ devices.

  • Our Phoenix application serves these pre-rendered pages and dynamically injects the actual content to the HTML.

That makes the content SEO friendly, allows processing a huge number of various pages on demand and scaling more easily.

  • Clients receive and start showing the HTML immediately, then hydrate the React DOM state to continue as a regular SPA.

That way, we can build highly interactive applications and have access to the JavaScript browser features.

The architecture of pre-rendering with Puppeteer, server-side rendering with Phoenix, and hydration on the client-side with React
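
To make the injection step more concrete, here is a rough sketch of the idea in TypeScript (our server does this in Elixir with Phoenix, so this is an illustration rather than our actual code). The placeholder comment, the `window.__INITIAL_STATE__` variable, and the function names are all assumptions made for the example:

```typescript
// Hypothetical placeholder that the build-time pre-render leaves in the app shell.
const STATE_PLACEHOLDER = "<!--APP_STATE-->";

// Serializes data for embedding inside a <script> tag. Escaping "<" prevents
// user-generated content from closing the tag early with "</script>".
function serializeState(state: unknown): string {
  return JSON.stringify(state).replace(/</g, "\\u003c");
}

// Returns the pre-rendered shell with the page data injected for hydration.
function injectState(shellHtml: string, state: unknown): string {
  const script = `<script>window.__INITIAL_STATE__=${serializeState(state)}</script>`;
  // A replacer function avoids special handling of "$" in the replacement text.
  return shellHtml.replace(STATE_PLACEHOLDER, () => script);
}
```

On the client, the React app would read the injected data before hydrating, so the first render matches the HTML that the server sent.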

Network

Content delivery network (CDN)

Using a CDN enables content caching and speeds up content delivery across the world. We use Fastly.com, which serves over 10% of all internet requests and is used by companies such as GitHub, Stripe, Airbnb, Twitter, and many others.

Fastly allows us to write custom caching and routing logic by using the configuration language called VCL. Here is how a basic request flow works where each step can be customized depending on the route, request headers, and so on:

VCL request flow

Another option to improve performance is to use WebAssembly (WASM) at the edge with Fastly. Think of it like using serverless but at the edge with such programming languages as C, Rust, Go, TypeScript, etc. Cloudflare has a similar project to support WASM on Workers.

Caching

It is important to cache as many requests as possible to improve performance. Caching at the CDN level allows delivering responses faster for new users. Caching in the browser by sending a Cache-Control header speeds up response time for repeated requests.

Most of the build tools such as Webpack allow adding a hash to the filename. These files can be safely cached since changing the files will create a new output filename.

Cached and encoded files through HTTP/2
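
As an illustration of the hashed filename rule, a server or edge function could pick the Cache-Control value from the path. This is a sketch under an assumed naming scheme (`name.<hex hash>.ext`); the function name and the list of extensions are hypothetical:

```typescript
// Matches names such as "main.3f9a1c2b.js": a hex content hash of 8 or more
// characters between the base name and the extension (an assumed scheme).
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|woff2?|png|jpe?g|webp|svg)$/i;

function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // The content can never change under this name, so cache it for a year.
    return "public, max-age=31536000, immutable";
  }
  // Unhashed files (e.g. HTML) must be revalidated with the server before reuse.
  return "no-cache";
}
```

For example, `cacheControlFor("/assets/main.3f9a1c2b.js")` returns the long-lived policy, while `cacheControlFor("/index.html")` returns `no-cache`.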

GraphQL caching

One of the most common ways of sending GraphQL requests is to use the POST HTTP method, which CDNs and browsers do not cache by default. One approach we use is to cache some GraphQL requests at the Fastly level:

  • Our React app annotates the GraphQL queries which can be cached.
  • Before sending an HTTP request, we append a URL argument by building a hash from a request body, which includes the GraphQL query and variables (we use custom fetch with Apollo Client).
  • Varnish (and Fastly) by default uses the full URL as part of the cache key.
  • That allows us to keep sending POST requests with GraphQL query in the request body and cache at the edge without hitting our servers.
Sending POST GraphQL requests with a SHA256 URL argument
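
The URL-building step can be sketched roughly like this. The `hash` parameter name and the `cacheableUrl` helper are assumptions for the example, and it uses Node's `crypto` module for brevity; in the browser, the Web Crypto API would play the same role:

```typescript
import { createHash } from "crypto";

interface GraphQLRequest {
  query: string;
  variables?: Record<string, unknown>;
}

// Builds the request URL with a SHA-256 digest of the body as a URL argument,
// so that identical queries with identical variables share one cache key.
function cacheableUrl(endpoint: string, request: GraphQLRequest): { url: string; body: string } {
  const body = JSON.stringify(request);
  const hash = createHash("sha256").update(body).digest("hex");
  const separator = endpoint.includes("?") ? "&" : "?";
  return { url: `${endpoint}${separator}hash=${hash}`, body };
}
```

Note that `JSON.stringify` is sensitive to key order, so two requests with the same variables in a different order would produce different hashes. That only lowers the cache hit rate; it never returns a wrong response.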

Here are some other potential GraphQL cache strategies to consider:

  • Cache on the server-side: the whole GraphQL requests, on the resolver level or declaratively by annotating the schema.
  • Using persisted GraphQL queries and sending GET /graphql/:queryId to be able to rely on HTTP caching.
  • Integrate with CDNs by using automated tools (e.g. Apollo Server 2.0) or use GraphQL-specific CDNs (e.g. FastQL).

Encoding

All major browsers support gzip compression, which the server signals with the Content-Encoding header. That allows sending fewer bytes to browsers, which usually means faster content delivery. It is also possible to use the more efficient Brotli compression algorithm in supported browsers.
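
Browsers advertise the algorithms they support in the Accept-Encoding request header, and the server picks one of them. A simplified sketch of that choice follows; the `pickEncoding` name is hypothetical, and quality values are ignored except for an explicit `q=0`:

```typescript
type Encoding = "br" | "gzip" | "identity";

// Picks the best supported encoding from an Accept-Encoding request header.
// Simplified: q-values other than an explicit q=0 ("not acceptable") are ignored.
function pickEncoding(acceptEncoding: string): Encoding {
  const accepted = new Set(
    acceptEncoding
      .split(",")
      .map((part) => part.trim().toLowerCase())
      .filter((part) => part !== "" && !/;\s*q=0(\.0*)?$/.test(part))
      .map((part) => part.replace(/;.*$/, "").trim())
  );
  if (accepted.has("br")) return "br";
  if (accepted.has("gzip")) return "gzip";
  return "identity";
}
```

In practice a CDN or a proxy usually handles this negotiation, so application code rarely needs to do it by hand.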

HTTP/2 protocol

HTTP/2 is a new version of the HTTP network protocol (shown as h2 in the browser DevTools). Switching to HTTP/2 may improve performance, thanks to these differences compared to HTTP/1.x:

  • HTTP/2 is binary, not textual. It is more efficient to parse, more compact.
  • HTTP/2 is multiplexed, which means that it can send multiple requests in parallel over a single TCP connection. It allows us not to worry about the browser’s limit on connections per host or about domain sharding.
  • It uses header compression to reduce request / response size overhead.
  • Allows servers to push responses proactively. This feature is particularly interesting.

HTTP/2 Server Push

There are a lot of programming languages and libraries which don’t fully support all HTTP/2 features because they introduce breaking changes for existing tools and the ecosystem (e.g. rack). But even in this case, it is still possible to use HTTP/2, at least partially. For example:

  • Set up a proxy server such as h2o or nginx with HTTP/2 in front of a regular HTTP/1.x server. E.g. Puma and Ruby on Rails can send Early Hints, which can enable HTTP/2 Server Push with some limitations.
  • Use a CDN which supports HTTP/2 to serve static assets. For instance, we use this approach to push fonts and some JavaScript files to the clients.
HTTP/2 Push fonts

Pushing critical JavaScript and CSS can also be very useful. Just don’t over-push and be aware of some gotchas.


JavaScript in the browser

Bundle size budget

The #1 JavaScript performance rule is not to use JavaScript. ― me

If you already have an existing JavaScript application, setting a budget can improve visibility of the bundle size and keep everybody on the same page. Exceeding the budget forces developers to think twice about the changes and to minimize the size increase. These are some examples of how to set a budget:

  • Use numbers based on your needs or some recommended values. For instance, < 170KB minified and compressed JavaScript.
  • Use the current bundle size as a baseline or try to reduce it by, for example, 10%.
  • Try to have the fastest website among your competitors and set the budget accordingly.

You could use the bundlesize package or Webpack performance hints and limits to keep track of the budget:

Webpack performance hints and limits
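
A sketch of what such a Webpack configuration could look like. The limits here are placeholder values rather than our real budget, and Webpack checks the size of the emitted files before gzip or Brotli compression, so a budget expressed in compressed bytes needs to be translated accordingly:

```typescript
// Fragment of a webpack.config.ts; the byte limits are example values only.
const config = {
  mode: "production" as const,
  performance: {
    // "error" fails the build when a limit is exceeded; "warning" only logs.
    hints: "error" as const,
    // Maximum size in bytes of any single emitted asset.
    maxAssetSize: 250000,
    // Maximum combined size in bytes of the assets loaded for one entry point.
    maxEntrypointSize: 250000,
  },
};

export default config;
```

Starting with `hints: "warning"` and switching to `"error"` once the bundle fits the budget is one way to introduce the limit without blocking the team.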

Kill your dependencies

That’s the title of the popular blog post written by the author of Sidekiq.

No code runs faster than no code. No code has fewer bugs than no code. No code uses less memory than no code. No code is easier to understand than no code.

Unfortunately, the reality with JavaScript dependencies is that your project most probably uses many hundreds of dependencies. Just try ls node_modules | wc -l.

In some cases adding a dependency is necessary. In this case, the dependency bundle size should be one of the criteria when choosing between multiple packages. I highly recommend using BundlePhobia:

BundlePhobia finds the cost of adding an npm package to your bundle

Code-splitting

Using code-splitting is perhaps the best way to significantly improve JavaScript performance. It allows splitting the code and shipping only the part which a user needs at the moment. Here are some examples of code-splitting:

  • Routes are loaded separately in separate JavaScript chunks.
  • Components on a page which are not visible immediately. E.g. modals, footer which is below the fold.
  • Polyfills and ponyfills to support the latest browser features in all major browsers.
  • Prevent code duplication by using Webpack’s SplitChunksPlugin.
  • Locale files loaded on demand to avoid shipping all our supported languages at once.

You can use code-splitting with Webpack dynamic imports and React.lazy with Suspense.

Code-splitting with dynamic import and React.lazy with Suspense

We built a function instead of React.lazy to support named exports rather than default exports.
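
One possible way to get a similar effect (not necessarily how our function is implemented) is a small adapter that wraps a named export into the `{ default: ... }` shape that React.lazy expects. The `namedExport` helper and the `./TicketModal` module in the usage comment are hypothetical:

```typescript
// The shape React.lazy expects from its loader: a module with a default export.
type DefaultModule<T> = { default: T };

// Adapts a dynamic import with a named export so it can be passed to React.lazy.
function namedExport<M, K extends keyof M>(
  load: () => Promise<M>,
  name: K
): () => Promise<DefaultModule<M[K]>> {
  return () => load().then((mod) => ({ default: mod[name] }));
}

// Usage, assuming a module ./TicketModal that exports a TicketModal component:
// const TicketModal = React.lazy(namedExport(() => import("./TicketModal"), "TicketModal"));
```

Because the export name is typed as `keyof M`, a typo in the name is caught by the TypeScript compiler instead of failing at runtime.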

Async and defer scripts

All major browsers support async and defer attributes on script tags:

Different ways of loading JavaScript
  • Inline scripts are useful for loading small critical JavaScript code.
  • Using a script with async is useful for fetching JavaScript without blocking HTML parsing when neither the page content nor any other script depends on it (e.g. analytics scripts).
  • Using scripts with defer is probably the best way from a performance point of view for fetching and executing non-critical JavaScript without blocking HTML parsing. Additionally, deferred scripts are guaranteed to execute in the order in which they appear in the document, which is useful if one script depends on another.

Here is a visualized difference between the scripts in a head tag:

Different ways of script fetching and execution

Image optimizations

Although 100 KB of JavaScript has a very different performance cost compared to 100 KB of images, it is in general important to keep the images as light as possible.

One way of reducing the image size is to use a more lightweight WebP image format in supported browsers. For browsers which don’t support WebP, it is possible to use one of the following strategies:

  • Fallback to regular JPEG or PNG formats (some CDNs do it automatically based on a browser’s Accept request header).
  • Loading and using WebP polyfill after detecting browser support.
  • Using Service Workers to listen to fetch requests and change the actual URLs to use WebP if it is supported.
WebP images
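
The first strategy, falling back based on the Accept request header, can be sketched as follows. It assumes that a `.webp` variant is stored next to every JPEG or PNG, which is a hypothetical layout, and `pickImageUrl` is an illustrative name:

```typescript
// Chooses the image URL to serve based on the request's Accept header.
// Assumes a ".webp" variant exists next to every JPEG or PNG file.
function pickImageUrl(originalUrl: string, acceptHeader: string): string {
  const supportsWebP = acceptHeader
    .split(",")
    .some((type) => type.trim().toLowerCase().startsWith("image/webp"));
  if (!supportsWebP) return originalUrl;
  // Only JPEG and PNG URLs are rewritten; other formats are left untouched.
  return originalUrl.replace(/\.(jpe?g|png)$/i, ".webp");
}
```

When the same URL can return different formats, the response should also carry a `Vary: Accept` header so that caches keep the variants apart.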

Loading the images lazily only when they are in or near the viewport is one of the most significant performance improvements for initial page loads with lots of images. You can either use the IntersectionObserver feature in supported browsers or use some alternative tools to achieve the same result, for example, react-lazyload.

Lazy loading images during the scroll

Some other image optimizations may include:

  • Reducing the quality of images to reduce the size.
  • Resizing and loading the smallest possible images.
  • Using the srcset image attribute for automatically loading high-quality images for high-resolution retina displays.
  • Using progressive images to show a blurry image immediately.
Loading regular vs progressive images
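
For the srcset optimization, the attribute value can be generated from a list of widths. In this sketch the `width` query parameter stands in for whatever resizing API your image CDN offers, so treat it as an assumption:

```typescript
// Builds a srcset value from a base URL and a list of widths in pixels.
// The "width" query parameter is an assumed image CDN resizing option.
function buildSrcSet(baseUrl: string, widths: number[]): string {
  return widths
    .slice() // copy, so the caller's array is not reordered
    .sort((a, b) => a - b)
    .map((w) => `${baseUrl}?width=${w} ${w}w`)
    .join(", ");
}
```

The resulting string goes into the `srcset` attribute of an `img` tag, together with a `sizes` attribute that tells the browser how wide the image is displayed.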

You can consider using some generic CDNs or specialized image CDNs which usually implement most of these image optimizations.

Resource hints

Resource hints allow us to optimize the delivery of resources, reduce round trips, and fetch resources to deliver content faster while a user is browsing a page.

Resource hints with link tags
  • Preload downloads resources in the background for the current page load before they are actually used on the current page (high priority).
  • Prefetch works similarly to preload: it fetches the resources and caches them, but for the user’s future navigations (low priority).
  • Preconnect sets up early connections before an HTTP request is actually sent to the server.
Preconnect in advance to avoid DNS, TCP, TLS roundtrip latencies

There are also some other resource hints such as prerender or dns-prefetch. Some of these resource hints can be specified in response headers. Just be careful when using resource hints. It is quite simple to start making too many unnecessary requests and downloading too much data, especially if users use a cellular connection.
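
As an example of the response header form, preload hints can be sent in an HTTP Link header. A small sketch that formats such a header value (the `PreloadHint` type and `linkHeader` function are hypothetical names):

```typescript
interface PreloadHint {
  href: string;
  as: "script" | "style" | "font" | "image";
  // Fonts are fetched in CORS mode, so their preload needs "crossorigin".
  crossorigin?: boolean;
}

// Formats preload hints as the value of an HTTP Link response header.
function linkHeader(hints: PreloadHint[]): string {
  return hints
    .map((hint) => {
      const parts = [`<${hint.href}>`, "rel=preload", `as=${hint.as}`];
      if (hint.crossorigin) parts.push("crossorigin");
      return parts.join("; ");
    })
    .join(", ");
}
```

Keeping the list short and limited to resources that every visitor of the page needs helps to avoid the over-fetching problem mentioned above.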


Conclusion

Performance in a growing application is a never-ending process which usually requires constant changes across the whole stack.

This video reminds me of you wanting to decrease the app bundle size. — My colleague
Strip everything out of this plane you don’t need now! – Pearl Harbor movie

Here is a list of other potential performance improvements we use or are planning to try which were not mentioned previously:

There is an endless number of exciting ideas to try. I hope this information and some of these case studies will inspire you to think about performance in your application:

  • Amazon has calculated that a page load slowdown of just 1 second could cost it $1.6 billion in sales each year.
  • Walmart saw up to a 2% increase in conversions for every 1 second of improvement in load time. Every 100ms improvement also resulted in up to a 1% increase in revenue.
  • Google has calculated that by slowing its search results by just 0.4 of a second, they could lose 8 million searches per day.
  • Rebuilding Pinterest pages for performance resulted in a 40% decrease in wait time, a 15% increase in SEO traffic, and a 15% increase in conversion rate to signup.
  • BBC has seen that they lose an additional 10% of users for every additional second it takes for their site to load.
  • Tests of the new faster FT.com showed users were up to 30% more engaged, meaning more visits and more content being consumed.
  • Instagram increased impressions and user profile scroll interactions by 33% for the median by decreasing the response size of the JSON needed for displaying comments.