December 7, 2015

Day 7 - Beyond 'Just Enough Visibility' on the Browser

Written by: Miguel Fonseca (@miguelcnf)
Edited by: Tom Purl (@tompurl)

Trends are great

They change our lives on a daily basis and we can deny them, embrace them or just sit tight and wait for them to move on.

Nevertheless trends happen, and trends in technology are no different. They can shift the paradigm of how we work, how we see our work and how others see our work.

At this point you're probably thinking: where are you going with this? Bear with me for a couple more paragraphs. For now just look at what the web's (as in World Wide Web's) first site looked like in 1992, 23 years ago.

For those of you who didn't inspect the source, that is a completely static html page. In fact you can see ALL of the 73 lines of source code here.

At the risk of simplifying things quite a bit, the application workflow consisted of a browser requesting an html resource and a server returning the matching file from disk.

Moving forward a couple of years, quite a few actually, we started seeing complex web applications, with tons of templating and rendering behind the scenes. But when a browser requested a web page it would receive a bunch of html mostly ready to be presented. Every time the browser wanted a new page it had to request it from the server again.

Fast forward to the present day: with the rise of Single Page Apps, we now have complete web applications being sent to the browser to be bootstrapped and rendered entirely on the client side. This reduced the number of requests to the server to a single initial request for the base html resource and subsequent requests for the javascript and css content. It's funny how, in part, that first html request resembles the html request of the web's first site, in that it's just some static file that gets served from disk.

Well not really from disk these days but you know what I mean.

Two things that had to evolve and adapt to all of these trends over the years were monitoring and performance. We live in an age of super fast internet connections and extra powerful devices, which makes reliability and speed key factors in the success of any web or mobile application.

If we just think about it for two seconds, not instrumenting or measuring our application in the browser would be the same as not measuring our server side rendering farm or not monitoring our system cpu or memory.

So brace yourselves, browser instrumentation is exactly what we're going to talk about today.

Metrics, metrics everywhere

Standard

For years, performance and page load times of any application on the browser were ruled by and reduced to the results of the onload event time.

It's easy to understand why. You see, onload is a standard event implemented by all browsers with the same details and mostly the same meaning across them all. It made a lot of sense when pages were mostly static and didn't change much dynamically after being rendered by the browser.

The thing is, that's not what happens nowadays with the amount of javascript and asynchronous requests that most web pages require to properly work on first page load. The onload event has become almost meaningless when it comes to actually measuring user perceived performance. But don't take my word for it, see this 2013 article from Mr. Steve Souders kicking off a whole new way of thinking about performance and how we should look at it from the user's perspective.

This becomes even more noticeable when you have web pages whose render times differ significantly above and below the fold, since onload might well be showing optimistic or pessimistic results in comparison to the actual user experience.

In the past few years some great ideas and methods have come up to try and solve this very same problem. One of the most interesting is Speed Index from Mr. Patrick Meenan. In a nutshell it measures the time it takes for a web page to actually display elements on the visible part of the screen. It makes sense right?

Custom

As we've seen before, not all pages are the same. No matter how good the standard technique or metric is, it will eventually not be appropriate as a measure for a particular web application or workflow.

Here's where custom user metrics come into the picture.

Let's say we have a video player web application, running in the browser, and we want to measure how long it takes from the moment a user opens the application until the moment the video starts streaming. With what we've talked about so far we could:

  • measure the onload event - which would be effectively broken since it wouldn't take into consideration any dynamic behaviour of our page

  • measure the speed index - although this would be a much more realistic measurement it still wouldn't take into consideration, for instance, the time the video player library would spend buffering the video

In this particular case what we could do is instrument our video player library to emit an event when playback actually started. We could then measure against the start of the web page and know effectively how long it took for our users to start seeing videos, or even a particular video.
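Conceptually, and with completely made-up names for the player object, its "playing" event and the reportTiming helper, that instrumentation could look something like this naive sketch (we'll see a far better way to capture the elapsed time later on):

// Somewhere at the very top of the page, before anything else loads
var pageStartTimestamp = Date.now();

// Hypothetical: "player" is our video library's instance and it emits a
// "playing" event once the first frame is actually rendered
player.on("playing", function () {
  var timeToFirstFrame = Date.now() - pageStartTimestamp;
  reportTiming("timeToFirstVideoFrame", timeToFirstFrame); // made-up reporting helper
});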

One very popular representation of this kind of instrumentation is Twitter's Time to First Tweet metric. This allowed the engineers behind the timeline page to know how long it took for a user to actually see the timeline's first tweet after clicking on a link. According to the article, by drilling down on their findings they were able to reduce the metric value to one-fifth of what it was in the first place.

Instrumenting applications with real user measurements is bound to produce results that are not nearly as consistent as synthetic testing. However, once we start getting a meaningful amount of measurements we are able to exclude a common percentage of outliers. We can then use the remaining results to create a trend which more often than not is quite realistic when it comes to the user experience.
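For instance, once those raw measurements land in our metrics system, a quick way to reason about them is to look at percentiles instead of averages. A tiny sketch, with made-up sample values:

// Return the p-th percentile (nearest-rank) of a batch of measurements in ms
function percentile(values, p) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

var samples = [980, 1020, 1100, 1150, 1200, 1240, 1300, 1350, 1400, 15000];
console.log(percentile(samples, 90)); // 1400 - the 15000ms outlier doesn't drag it up the way a mean would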

There are quite a few ways to achieve this kind of instrumentation, each with different approaches, techniques and results. They can be custom made, open source libraries or even vendor locked if we happen to be into those kinds of things. But you know, standards are great, mostly because there are so many of them!

Luckily for us, the good and smart people in the W3C web performance working group, along with a bunch of other awesome people, came up with the User Timing specification. This allows web developers to instrument any web application by giving them access to high precision timestamps and the ability to measure between moments in time. The most important thing about this specification is that it's supposed to be browser and platform agnostic, which, as we'll see in a bit, is not entirely true as I write this.

Know the Theory

Let's start by learning a bit about the theory behind the specification.

Marks and Measures

There are two concepts that we need to understand in order to be able to use this in the real world: marks and measures.

  • Mark - a mark is a point in time, represented by a timestamp associated with a mark name

  • Measure - a measure is the duration between two points in time, represented by a duration associated with a measure name

Both the mark and measure methods are available from the window.performance attribute, and the entries they create implement the PerformanceMark and PerformanceMeasure interfaces respectively. They have the following signatures:


/**
 * Create a mark
 * @param {DOMString} markName - name of the mark to be created
 */
void mark(DOMString markName);

/**
 * Create a measure
 * @param {DOMString} measureName - name of the measure to be created
 * @param {DOMString} startMark - reference to a previously created mark as the initial point in time
 * @param {DOMString} endMark - reference to a previously created mark as the final point in time
 */
void measure(DOMString measureName, optional DOMString startMark, optional DOMString endMark);

In a nutshell that's really just it. With this we can create marks to save points in time and use measures to calculate time differences between marks or marks and standard points in time in the lifecycle of a page.
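For instance, a minimal sketch wrapping a hypothetical chunk of work (the doExpensiveWork function is made up) with a pair of marks and a measure could look like this:

// Mark the start, do the work, mark the end
window.performance.mark("work-start");
doExpensiveWork(); // hypothetical function we want to time
window.performance.mark("work-end");

// Create a measure named "work" spanning the two marks
window.performance.measure("work", "work-start", "work-end");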

We should, however, understand a couple more things:

  • Mark timestamps are measured in milliseconds since the start of navigation following the High Resolution Time specification

  • The current time is considered to be the number of milliseconds from the start of navigation until the current moment in time

  • All browsers need to implement a Monotonic Clock, which really only means that the time counter needs to be continuously incremented and measured as the time elapsed from a particular moment in time. This is required to avoid clock skews when the system clock is adjusted

After we've gone on an instrument-all-the-things spree and created loads of marks and measures, we might want to clean up after ourselves. We can do exactly that with another couple of methods, clearMarks and clearMeasures, also available from the window.performance attribute.
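Both methods accept an optional name, so we can remove a specific entry or wipe the slate clean in one go:

// Remove a single mark or measure by name...
window.performance.clearMarks("myMark");
window.performance.clearMeasures("myMeasure");

// ...or call them without arguments to remove all marks and measures
window.performance.clearMarks();
window.performance.clearMeasures();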

For future reference, don't forget to read the whole specification; it's really not that long and has a much more detailed description of all the methods and interfaces.

Advantages

All of this is great but, at this point, you might be thinking that we could do just that with plain old javascript by abusing the crap out of Date.now(). That's true, but as the specification mentions:

Web developers need the ability to assess and understand the performance characteristics of their applications. While JavaScript [ECMA262] provides a mechanism to measure application latency (retrieving the current timestamp from the Date.now() method), the precision of this timestamp varies between user agents.

Well, that is the single worst thing evar if we're trying to get a trend or baseline of timings from real users.

Not only that, but a lot of naive real user measurement solutions use custom initial time variables as a base for the elapsed time calculations, which is complete bollocks when compared with the realistic navigationStart timestamp from the Navigation Timing specification. I should mention that's exactly what the User Timing specification uses to calculate measurements relative to the start of navigation.

Well, I hope you're just as in on this whole User Timing thing as I am. Shall we get our hands dirty?

Try it Out

Instrument all the things

First things first, open up a console on your browser's development tools.

I'll be using Chrome in the next steps, but you can follow along on Firefox or IE if you wish. If you're on Safari, though, you're gonna have a bad time, and you might want to just open up a Chrome window to keep up. More about this later on.

Now that we have our console ready let's try it out:

  • Create a couple of marks a few seconds apart:

> window.performance.mark("foo");
< undefined
> window.performance.mark("bar");
< undefined
  • Check that the PerformanceMark objects (our points in time) were created as expected:

> window.performance.getEntriesByType("mark")
< [PerformanceMark
        duration: 0
        entryType: "mark"
        name: "foo"
        startTime: 6566.500000000001
   PerformanceMark
        duration: 0
        entryType: "mark"
        name: "bar"
        startTime: 13722.595000000001]

At this point we have a couple of different marks, each representing a different point in time measured from the navigation start. The time value in milliseconds is stored in the startTime attribute.

See how the duration attribute of both objects is 0? That's exactly what we would expect from a point in time representation.

Let's crack on with it.

  • Create a measure that will calculate the time difference between the two marks:

> window.performance.measure("zoo", "foo", "bar")
< undefined
  • Check the PerformanceMeasure object that we've just created:

> window.performance.getEntriesByName("zoo")
< [PerformanceMeasure
        duration: 7156.095
        entryType: "measure"
        name: "zoo"
        startTime: 6566.500000000001]

Looking at that output we see that we now have a measure called "zoo" that represents the time difference between the two marks that we've created before, "foo" and "bar". It has a duration value of 7156.095, which actually means that I've waited around 7 seconds between creating both marks. Pretty neat right?

This looks great if we want to measure, for instance, the time of the navigation events on our Single Page Application, or the time of that serialize operation we think is getting expensive.

But what if we want to measure the time it takes for something to happen in our homepage, right since the user clicked on a link to our application? Remember Time to First Tweet?

All right, let's simulate this:

  • Refresh our current page and create a new measure with no start or end mark defined:

> window.performance.measure("asd")
< undefined
  • Check the PerformanceMeasure object we've just created:

> window.performance.getEntriesByName("asd")
< [PerformanceMeasure
        duration: 15584.455000000002
        entryType: "measure"
        name: "asd"
        startTime: 0]

Awesome! So that means it took me around 15 seconds to refresh the page, clear my console history and call the measure method to create a measure from the navigation start until the current time.

That's one of the most powerful things about measures: they can calculate values between marks, but they have sane defaults if you omit either or both of the start/end marks.

The specification describes these particular behaviours in detail:

  • If neither the startMark nor the endMark argument is specified, measure() will store the duration as a DOMHighResTimeStamp from navigationStart to the current time.

  • If the startMark argument is specified, but the endMark argument is not specified, measure() will store the duration as a DOMHighResTimeStamp from the most recent occurrence of the start mark to the current time.

  • If both the startMark and endMark arguments are specified, measure() will store the duration as a DOMHighResTimeStamp from the most recent occurrence of the start mark to the most recent occurrence of the end mark.
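In code, and reusing the "foo" and "bar" marks from before, those three behaviours look like this:

// 1. No marks given: duration from navigationStart to now
window.performance.measure("sinceNavigation");

// 2. Only a start mark: duration from the "foo" mark to now
window.performance.measure("sinceFoo", "foo");

// 3. Both marks: duration from the "foo" mark to the "bar" mark
window.performance.measure("fooToBar", "foo", "bar");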

In the examples above we've used some of the methods available in the Performance Timeline specification to retrieve the objects created by the mark and measure methods. There are three that we need to be familiar with:

  • getEntries - returns all the PerformanceEntry objects

  • getEntriesByType - returns all the PerformanceEntry objects that have the same entryType as the one passed as an argument

  • getEntriesByName - returns all the PerformanceEntry objects that have the same name as the one passed as an argument

This means that we can choose to fetch all entries of a certain type, a particular entry by name, or even all performance entries that ever existed in the current session.
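A quick illustration of the three of them, using the entries created above:

// Every performance entry recorded so far, sorted by startTime
var allEntries = window.performance.getEntries();

// Just our marks
var marks = window.performance.getEntriesByType("mark");

// The "zoo" measure we created earlier
var zoo = window.performance.getEntriesByName("zoo")[0];
console.log(allEntries.length, marks.length, zoo.duration);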

In action

This wouldn't be a great intro to a topic without a proper demo, would it? So let's say we want to instrument our application to know how much time it takes for one of our users to actually click on something on the page, because, you know, why not?

As we've seen before, by using the user timing specification, creating our own Time to First Click metric could be as simple as creating a mark when the window click event is fired. Here's how we could achieve that:


var firstClickFired = false,
    firstClickMark;

var handleFirstClick = function() {
  // handle first click
  if (!firstClickFired) {
    firstClickFired = true;
    window.performance.mark("firstClick");

    firstClickMark = window.performance.getEntriesByName("firstClick");
    alert("Time to First Click: " + firstClickMark[0].startTime + " milliseconds");
  }
};

window.addEventListener('click', handleFirstClick);

Running this snippet on any web page will show an alert with the actual time it took from the start of navigation until the first click on that page. That's great! At this point, for instance, we could adapt it a bit and have it send the value in milliseconds as a timer into our own metrics storage system so we could visualise the mean or p90 Time to First Click of all our users.
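As a rough sketch, and assuming a completely made-up /metrics endpoint on our side, that adaptation could replace the alert with something like this:

// Fetch the mark back and ship its value to our (made-up) metrics endpoint
var entry = window.performance.getEntriesByName("firstClick")[0];
var payload = JSON.stringify({ metric: "timeToFirstClick", value: entry.startTime });

if (navigator.sendBeacon) {
  // sendBeacon is asynchronous and still gets delivered if the page unloads
  navigator.sendBeacon("/metrics", payload);
} else {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/metrics", true);
  xhr.send(payload);
}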

This gets particularly interesting if we exclude natural outliers and turn this metric into a relevant measure for our particular case, such as page performance, user experience, content value, etc.

If, like me, you thought that was interesting, go on and play around with a small user timing demo that tracks the Time to First Click described above and also allows you to create marks and measures on demand.

The code for the demo is really simple and is available on github.

Can I use it?

As I mentioned above, unfortunately at the time I'm writing this the User Timing specification isn't available on Safari. In fact it isn't available on Safari, iOS Safari or Opera Mini. All of the other common browsers have completely adopted it in their latest versions.

But fear not, as with all things open-source the community stepped up and created polyfills that we can include within our applications, which make the specification available even on browsers that do not currently support it by default.

See a couple of these implementations here and here.

Please take this with a grain of salt and do your own tests before using those. Speaking for myself I haven't actually tested all scenarios with both libraries. At the end of the day it should be ok but use at your own risk!
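Whether we go with a polyfill or not, it's a good idea to feature-detect before calling the API, something along these lines:

// Only call the User Timing methods if the browser actually implements them
if (window.performance && typeof window.performance.mark === "function") {
  window.performance.mark("appReady");
} else {
  // fall back to Date.now(), load a polyfill, or simply skip the instrumentation
}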

Control the Budget

Once we have our application instrumented, the next logical question is: how do we know if we're improving or degrading performance over time?

To answer that question we could go down a couple of different, yet related, routes:

  1. Ability to measure results and define static thresholds

  2. Ability to measure and compare results over time

Set up a ceiling

Point number 1 above refers to implementing our own "you're now in the red zone, do something about it now" system.

Let's see how that can go:

  1. Choose a particular custom metric (or set of metrics) that define how our application is performing

  2. Run performance tests periodically, let's say on every release, to obtain results for that metric

  3. Compare those with our "red zone" thresholds

  4. Alarm if any of the results hit red

  5. Act upon and fix it
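As a minimal sketch of steps 3 and 4, assuming a Node based build step and completely made-up numbers and metric names:

// Compare a measured value against a static "red zone" threshold and fail loudly
var BUDGET_MS = 2000; // our made-up budget for this metric

function checkBudget(metricName, measuredMs) {
  if (measuredMs > BUDGET_MS) {
    console.error(metricName + " is over budget: " + measuredMs + "ms > " + BUDGET_MS + "ms");
    process.exit(1); // fail the build / fire the alarm
  }
  console.log(metricName + " is within budget: " + measuredMs + "ms");
}

checkBudget("timeToFirstClick", 2350); // would fail the build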

Go with the trend

Point number 2, however, refers to implementing our own "you're now in the red zone, but it might be ok, let's see how the next few days work out first" system.

Let's see what that means:

  1. Choose a particular custom metric (or set of metrics) that define how our application is performing

  2. Run performance tests on every code change to obtain results for that metric and store it

  3. Compare those with our "red zone" thresholds, and with the last X number of results

  4. Alarm if any of the results hit red for the last Y out of X runs

  5. Alarm if the results have gotten worse for the last Z runs

  6. Act upon and fix it
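And a rough sketch of step 5, again with made-up data, checking whether the metric got worse on each of the last Z runs:

// Return true if the metric got strictly worse on each of the last "lastZ" runs
function isRegressing(history, lastZ) {
  if (history.length <= lastZ) {
    return false; // not enough data to judge yet
  }
  var recent = history.slice(-(lastZ + 1)); // the last Z runs plus the one before them
  for (var i = 1; i < recent.length; i++) {
    if (recent[i] <= recent[i - 1]) {
      return false; // at least one run didn't get worse
    }
  }
  return true;
}

console.log(isRegressing([1800, 1850, 1900, 1960], 3)); // true - three regressions in a row
console.log(isRegressing([1800, 1750, 1900, 1960], 3)); // false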

Performance Budget

Both scenarios are valid, and both try to apply the concept of a performance budget.

It means that we're now in a position to have an effective measure of a performance metric from within our application. We can use this measure to understand if we're within budget or if we've spent it all and need to do something about it.

User Timing marks and measures can be seen as an enabler for this kind of system. They ease the process of getting these kinds of metrics from an application running on the browser, which tends to be what the user sees and how they perceive our application's performance.

Xmas Present

If you don't know about this yet, consider this a Christmas present, from me to you.

Thanks to Mr. Patrick Meenan we can see User Timing mark values in the Web Page Test results tables, and pretty purple lines representing our marks on the waterfall view.

[Screenshot: custom User Timing marks shown as purple lines on a Web Page Test waterfall]

The values of the custom User Timing marks are even available in the API results which, if you've been paying attention, can be used to fuel our performance budget tests.

This is just one of the advantages of using an open and standard specification.

Happy Holidays

Hopefully this was all new to some of you and got you motivated to instrument things on the browser. Or at least it was an interesting read for everyone, especially if you're the hardcore measure all the things type.

I'd like to thank Tom for fixing all my spelling mistakes and making sure I don't make a fool out of myself.

Feel free to reach out to me @miguelcnf and have a great holiday season.
