Blog Post

Revisiting the Anatomy of HTTP: Part II

Published

September 15, 2016

Mehdi Daoudi

In Part I, we discussed the various components of an HTTP request. In particular, we walked the reader through the details of TLS (Transport Layer Security), its impact on performance, and how to mitigate it. Besides security, mobile performance has taken center stage owing to the increasing use of mobile devices (as highlighted by a June 2016 report from Cisco; see below for the two key takeaways) and the advent of IoT devices.

Mobile data traffic will grow at a CAGR of 53 percent between 2015 and 2020, reaching 30.6 exabytes per month by 2020.
Fixed IP traffic will grow at a CAGR of 19 percent between 2015 and 2020, while mobile traffic grows at a CAGR of 53 percent. Global mobile data traffic was 5 percent of total IP traffic in 2015, and will be 16 percent of total IP traffic by 2020.

Likewise, Gartner forecasted that by 2017, mobile apps will be downloaded more than 268 billion times, generating revenue of more than $77 billion.

The need for high mobile performance is also driven by the dwindling attention span of humans. Research from Microsoft highlighted that the following factors adversely impact our selective attention:

Media consumption
Social media usage
Technology adoption rate
Multi-screening behavior

Further, it was suggested that, from a business perspective, providing increasingly immersive, multi-touchpoint experiences should combat drop-off (note that an integral component of this is providing high mobile performance). In a similar vein, in the context of e-commerce apps, Jampp also highlighted waning attention spans, reporting an 88% decrease in the average time spent on e-commerce apps from Q1 2015 to Q1 2016. As per research by dscout, an average user engages in 76 separate phone sessions a day, and heavy users (the top 10%) average around 132 sessions a day.

Fun Fact:

As per Statistic Brain, the average human attention span in 2015 was 8.25 seconds, whereas the average attention span of a goldfish was 9 seconds.

Poor mobile performance directly impacts business metrics such as, but not limited to, click-through rate (CTR), app installs, in-app purchases, session length (which is one of the measures of engagement), and mobile e-commerce. The impact of poor mobile performance is further compounded by the following:

Rising user acquisition (UA) costs for mobile, as reported by Fiksu. As per Invesp, it costs five times as much to attract a new customer as to keep an existing one.[1]
As per Lee Resources, 91% of unhappy customers will not willingly do business with you again.

Common issues which plague an end-user’s mobile experience include, but are not limited to, slow page load times, app crashes, high API and end-to-end latency, sensitivity to high app load, and network errors. Optimizing mobile performance calls for a fundamentally different approach owing to the additional set of constraints mentioned below:

Lower processing power
Smaller memory
Battery power constraints
Lower bandwidth

The plot (the data was obtained via Catchpoint’s portal) shows the document complete time on mobile for some of the news sources.

The table below summarizes the key statistics for Document Complete.

Min (ms)    Max (ms)    Median (ms)
524.5       15674.5     4334

From the plot and the table above we note that even the median Document Complete time is over 4 seconds! This is well above the (arguably conservative) threshold of 1 second being advocated in the industry. Other factors which impact performance in general are round trip time (RTT), bandwidth, loss rate, the number of objects on a page, object sizes, page load dependencies, and computation.

An example illustrating that PLT is determined by the critical path (source: click here)
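
To make the critical-path idea concrete, here is a minimal sketch, assuming a page load can be modeled as a dependency graph of activities with fixed durations. The activity names and durations below are hypothetical and are not taken from the figure; the point is only that PLT equals the length of the longest dependency chain.

```python
from functools import lru_cache

# Hypothetical page-load activities: name -> (duration in ms, prerequisites).
ACTIVITIES = {
    "fetch_html": (200, []),
    "parse_html": (50, ["fetch_html"]),
    "fetch_css": (150, ["parse_html"]),
    "fetch_js": (300, ["parse_html"]),
    "eval_js": (120, ["fetch_js", "fetch_css"]),
    "fetch_image": (250, ["parse_html"]),
    "render": (80, ["eval_js", "fetch_css"]),
}


def critical_path(activities):
    """Return (total_ms, path) for the longest dependency chain."""

    @lru_cache(maxsize=None)
    def finish(name):
        duration, deps = activities[name]
        # An activity can start only after its slowest prerequisite finishes.
        best = max((finish(dep) for dep in deps), default=(0, ()))
        return (best[0] + duration, best[1] + (name,))

    return max(finish(name) for name in activities)


total_ms, path = critical_path(ACTIVITIES)
print(total_ms, " -> ".join(path))
```

In this toy graph the image fetch is not on the critical path, so making it faster would not reduce the modeled PLT, whereas shortening the script fetch or evaluation would.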

Performance optimizations can be carried out at various levels. Optimizations can be done in the application itself. From the plot below we note that the size of a web page has increased by 3x over the last 5 years and that a single page makes over 100 requests. Over 80% of page load time (PLT) is spent loading the additional resources needed to render the page—including style sheets, script files, and images—and performing client-side processing. This has direct implications on end-user experience.

Source: click here

Common recommended optimizations include:

Reducing the number of requests:

* Resource consolidation
* Avoiding redundant downloads of script files
* Spriting: Multiple images are fetched as one large CSS background image, and CSS background positioning is then used to display the individual component images as needed on the page.
* Use of the browser cache and localStorage: HTML5 localStorage, where available, can usually store and read about 5 MB of key/value data per domain, which makes it well suited for client-side caching.
* Embedding resources in HTML for first-time use: Resources with low re-use should be embedded (or inlined) in the page’s HTML, via script and style tags, rather than stored externally and linked to. Note that images and other binary resources can also be inlined by using data URIs that contain base64-encoded text versions of the resources.
* Eliminating redirects

Reducing payload:

* Text and image compression: A material reduction in size, as much as 70%, can be achieved for text-based responses, including HTML, XML, JSON, JavaScript, and CSS.
* Code minification: This is the elimination of inessential characters such as spaces, newline characters, and comments from scripts and style sheets. Further, names that are not publicly exposed, such as variable names, can be shortened to just one or two characters. File size reductions of up to 20% can be achieved via minification.
* Image resizing: Dynamic resizing of images in the application helps to speed up page rendering and reduce memory and bandwidth consumption.

Optimizing client-side processing:

* Deferring rendering of below-the-fold content
* Deferring loading and execution of scripts
* Use of Ajax for progressive enhancement: This can be used to load a bare-bones version of a page quickly; subsequently, detailed content can be filled in while the user is already viewing the page.
* Adapting to the network connection
* Using HTML5 web workers for multithreading
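
Two of the items above, text compression and inlining via data URIs, can be illustrated with a short sketch. It assumes gzip as the compression scheme and uses made-up markup and placeholder image bytes; the helper names `gzip_savings` and `to_data_uri` are hypothetical, and the savings on a real page will differ.

```python
import base64
import gzip


def gzip_savings(text: str) -> float:
    """Fraction of bytes saved by gzip-compressing a text response."""
    raw = text.encode("utf-8")
    return 1 - len(gzip.compress(raw)) / len(raw)


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode a binary resource as a base64 data URI for inlining in HTML."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical, highly repetitive markup; real pages usually compress less well.
sample_html = (
    "<ul>"
    + "".join(f"<li class='item'>Item {i}</li>" for i in range(200))
    + "</ul>"
)
print(f"gzip saved {gzip_savings(sample_html):.0%} of the bytes")

# Placeholder bytes standing in for a small image file.
uri = to_data_uri(b"\x89PNG placeholder bytes", "image/png")
print(uri[:40])
```

Base64 encoding inflates a resource by roughly a third, so inlining tends to pay off only for small resources where saving a request outweighs the extra bytes.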

Alternatively, optimizations can be done at the transport layer protocol level, at the application layer protocol level, and so on. The most common TCP-related delays affecting HTTP are:

The TCP connection setup handshake
TCP slow-start congestion control
Nagle’s algorithm for data aggregation
TCP’s delayed acknowledgment algorithm for piggybacked acknowledgments
TIME_WAIT delays and port exhaustion
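
As one illustration of how such delays are addressed in practice, the sketch below turns off Nagle's algorithm on a socket using the standard `TCP_NODELAY` option. It only creates a local socket and makes no connection; whether disabling Nagle actually helps depends on the application's write pattern, so treat it as an example of the mechanism rather than a recommendation.

```python
import socket

# Create a TCP socket; no connection is made in this sketch.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY disables Nagle's algorithm, so small writes are sent
# immediately instead of being held back to be coalesced.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm that it was applied.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print("Nagle disabled:", nodelay_enabled)

sock.close()
```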

Example TCP-level optimizations include:

TCP Fast Open – it decreases application network latency by one full round-trip time, decreasing the delay experienced by short TCP transfers.
Proportional Rate Reduction for TCP – it improves fast recovery even when losses are heavy, is quick in recovery for short flows and is accurate even when acknowledgments are stretched, lost or reordered.
Tail Loss Probe – it enables quick recovery of lost segments at the end of transactions, as opposed to waiting for lengthy retransmission timeouts (see this and this for performance of Tail Loss Probe).
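
To see why saving one round trip matters for short transfers, consider the back-of-the-envelope model below. It is a sketch under stated assumptions, not a faithful TCP simulation: no packet loss, an initial congestion window of 10 segments, a 1460-byte MSS, a window that doubles every round trip during slow start, and, for TCP Fast Open, a client that already holds a valid cookie. The function name and parameters are hypothetical.

```python
import math


def round_trips(transfer_bytes, init_cwnd=10, mss=1460, fast_open=False):
    """Estimate the RTTs needed to complete a short transfer (idealized)."""
    segments = math.ceil(transfer_bytes / mss)
    # Without Fast Open, the three-way handshake costs one RTT before any data.
    rtts = 0 if fast_open else 1
    cwnd, sent = init_cwnd, 0
    while sent < segments:
        sent += cwnd  # one window of segments is delivered per round trip
        cwnd *= 2     # slow start doubles the window each round trip
        rtts += 1
    return rtts


for size in (14_600, 100_000):
    print(size, round_trips(size), round_trips(size, fast_open=True))
```

Under these assumptions, a response that fits in the initial window takes two round trips without Fast Open and one with it, so the handshake accounts for half of the total latency.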

There’s a large amount of literature on tuning TCP. Most commonly, the approaches are centered around tuning the congestion window of TCP. Several fair-share algorithms have been proposed to manage competition between connections for buffers.

In the note below we discuss SPDY, the experimental protocol from Google which was launched to address the performance limitations of HTTP/1.1.

——————————————————————————————————————————–

SPDY

The SPDY protocol was developed at Google to reduce web page load latency and improve web security. SPDY added a session layer atop SSL (see below) to allow for multiple concurrent, interleaved streams over a single TCP connection. The usual HTTP GET and POST message formats remained the same; however, SPDY specified a new framing format for encoding and transmitting the data over the wire. TLS encryption was common to all SPDY implementations, and transmission headers were gzip- or DEFLATE-compressed by design.

Source: click here

Performance benefits of SPDY stem from the following key features:

* Multiplexing multiple HTTP transfers over a single connection

* Header compression: It eliminates redundant data for HTTP headers

* HTTP request prioritization: SPDY allows the client to specify a priority level for each object, which is then used by the server in scheduling the transfer of the object.

* Server push: The server can send resources to the client before they are requested. One downside is that the pushed content may already be present in the client’s cache or may never be consumed by the user.
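
The header compression benefit listed above can be illustrated with a small sketch. It assumes a single DEFLATE context shared across the requests on a connection, which is the general idea behind SPDY's header compression; the header values are made up, and the code does not reproduce SPDY's actual wire format or its predefined dictionary.

```python
import zlib


def header_block(path):
    """Hypothetical request headers; only the path changes between requests."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "User-Agent: Mozilla/5.0 (Linux; Android 6.0) ExampleBrowser/1.0\r\n"
        "Accept: text/html,application/xhtml+xml\r\n"
        "Accept-Encoding: gzip, deflate\r\n"
        "Cookie: session=abc123; theme=dark\r\n\r\n"
    ).encode("ascii")


compressor = zlib.compressobj()      # one context for the whole connection
decompressor = zlib.decompressobj()  # the receiver keeps matching state
sizes = []       # (raw bytes, bytes on the wire) per request
roundtrips = []  # whether the receiver recovered the original headers

for path in ["/index.html", "/style.css", "/app.js"]:
    raw = header_block(path)
    # Z_SYNC_FLUSH emits this block now but keeps the compression history,
    # so later requests can refer back to headers that were already sent.
    wire = compressor.compress(raw) + compressor.flush(zlib.Z_SYNC_FLUSH)
    roundtrips.append(decompressor.decompress(wire) == raw)
    sizes.append((len(raw), len(wire)))

for raw_len, wire_len in sizes:
    print(f"{raw_len} bytes of headers -> {wire_len} bytes on the wire")
```

The second and third requests shrink far more than the first because almost all of their bytes repeat headers the compressor has already seen.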

Wang et al. pointed out that the benefits of SPDY can be easily overwhelmed by dependencies and computation, reducing the improvements with SPDY to 7% for the lower-bandwidth and higher-RTT scenarios they studied. The authors report the following:

“SPDY helps for small object sizes and under low loss rates by: batching several small objects in a TCP segment; reducing congestion-induced retransmissions; and reducing the time when the TCP pipe is idle. Conversely, SPDY significantly hurts performance under high packet loss for large objects. This is because a set of TCP connections tends to perform better under high packet loss; it is necessary to tune TCP behavior to boost performance.”

When accounting for browser computation and dependencies, the authors find:

* Computation and dependencies increase PLTs of both HTTP and SPDY, reducing the network load.

* SPDY reduces the amount of time a connection is idle, lowering the possibility of slow start.

* Dependencies help HTTP by making traffic less bursty, resulting in fewer retransmissions.

* Having fewer outstanding objects diminishes SPDY’s gains, because SPDY helps more when there are a large number of outstanding objects.

Thus, dependencies and computation reduce and can easily nullify the benefits of SPDY, implying that speeding up computation or breaking dependencies might be necessary to improve the PLT using SPDY.

There have been numerous other studies evaluating the performance of SPDY and comparing it with HTTP. In February 2016, Google announced that Chrome would transition from SPDY to HTTP/2.

Use of SPDY proxy[2] services was explored to exploit the benefits of SPDY in cases where the web servers did not support the SPDY protocol. Bundling multiple requests and encrypting all web traffic for a user inside a single connection created an opaque tunnel, thereby hiding the true source of the content and breaking network management (e.g., performance data collection, video optimization), content distribution, and network services such as malware detection.

Another experimental transport layer network protocol, QUIC (Quick UDP Internet Connections), was announced by Google in 2013. QUIC runs a stream multiplexing protocol over a new flavor of Transport Layer Security (TLS) on top of UDP instead of TCP. The key highlights of QUIC are:

* Fast (often 0-RTT) connectivity similar to TLS Snapstart combined with TCP Fast Open

* Packet pacing to reduce packet loss

* Packet error correction to reduce retransmission latency

* UDP transport to avoid TCP head-of-line blocking

* A connection identifier to reduce reconnections for mobile clients

* A pluggable congestion control mechanism
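
The packet error correction item can be illustrated with a simple XOR parity scheme: one extra parity packet per group lets the receiver rebuild a single lost packet without waiting for a retransmission. This is a simplified sketch that assumes equal-length packets and at most one loss per group; it is not QUIC's wire format, and the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_packet(packets):
    """XOR of all packets in a group (assumed to have equal length)."""
    return reduce(xor_bytes, packets)


def recover_missing(received, parity):
    """Rebuild the single missing packet from the survivors and the parity."""
    return reduce(xor_bytes, received, parity)


group = [b"pkt-0001", b"pkt-0002", b"pkt-0003"]
parity = parity_packet(group)

# Suppose the second packet is lost in transit.
survivors = [group[0], group[2]]
recovered = recover_missing(survivors, parity)
print(recovered)
```

The trade-off is extra bandwidth: every group carries one additional packet whether or not a loss occurs.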

The reader is referred to QUIC’s design document for further details.

——————————————————————————————————————————–

It should be noted that SPDY and HTTP performance depend on many factors external to the protocols themselves, including network parameters, TCP settings, and myriad other web page characteristics. Further, the variance in PLT stems not only from random events like network loss but also from browser computation (i.e., JavaScript evaluation and HTML parsing).

SPDY led to the genesis of HTTP/2, and the key features of SPDY are embraced by HTTP/2. First and foremost, it should be noted that HTTP/2 extends, rather than replaces, HTTP/1.x. The application semantics are carried over to HTTP/2, and no changes were made to the offered functionality or to core concepts such as HTTP methods, status codes, URIs, and header fields. Unlike HTTP/1.x, HTTP/2 uses fewer TCP connections, which in turn reduces competition with other flows and results in longer-lived connections. This improves the utilization of the available network capacity. Further, processing of messages in HTTP/2 is more efficient owing to the use of binary message framing, which forms the bedrock of all the performance enhancements of HTTP/2. At a high level, the framing layer is responsible for how HTTP messages are encapsulated and transferred between the client and server. The figure below highlights where the new binary framing layer sits in the overall stack.

Source: click here
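
To give a feel for what binary framing means, the sketch below packs and unpacks the fixed 9-byte frame header defined by the HTTP/2 specification: a 24-bit payload length, an 8-bit type, 8-bit flags, and a reserved bit followed by a 31-bit stream identifier. The helper names are hypothetical, and the sketch ignores everything beyond the header layout (settings, padding, maximum frame size negotiation).

```python
import struct

DATA = 0x0        # frame type for DATA frames
END_STREAM = 0x1  # flag marking the last frame of a stream


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Serialize one frame: a 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    # The 24-bit length is split into one high byte and a 16-bit low part.
    header = struct.pack(
        ">BHBBI", length >> 16, length & 0xFFFF,
        frame_type, flags, stream_id & 0x7FFFFFFF,  # top bit is reserved
    )
    return header + payload


def unpack_frame(data: bytes):
    """Parse a frame and return (type, flags, stream_id, payload)."""
    hi, lo, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    length = (hi << 16) | lo
    return frame_type, flags, stream_id & 0x7FFFFFFF, data[9:9 + length]


frame = pack_frame(DATA, END_STREAM, 1, b"hello")
print(len(frame), unpack_frame(frame))
```

Because every field has a fixed size and position, a receiver can find frame boundaries without scanning for delimiters, which is part of why binary framing is cheaper to parse than textual HTTP/1.x messages.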

Akin to SPDY, HTTP/2 enables multiplexing of requests and responses. This is achieved by letting the client and server break down an HTTP message into independent frames, interleave them, and then reassemble them on the other end. Delving into the design details of HTTP/2 is beyond the scope of this blog. Additional resources can be referred to for an extended discussion of HTTP/2.
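
The break-down, interleave, and reassemble steps can be sketched in a few lines. This is a toy model that assumes simple round-robin interleaving and represents a frame as a `(stream_id, chunk)` tuple; real implementations also apply prioritization and flow control, and the messages and frame size here are made up.

```python
from itertools import zip_longest


def split_into_frames(stream_id, message, frame_size):
    """Chop one message into (stream_id, chunk) frames."""
    return [
        (stream_id, message[i:i + frame_size])
        for i in range(0, len(message), frame_size)
    ]


def interleave(*framed_messages):
    """Round-robin frames from several streams onto one connection."""
    wire = []
    for batch in zip_longest(*framed_messages):
        wire.extend(frame for frame in batch if frame is not None)
    return wire


def reassemble(wire):
    """Group frames by stream ID and concatenate them in arrival order."""
    streams = {}
    for stream_id, chunk in wire:
        streams[stream_id] = streams.get(stream_id, b"") + chunk
    return streams


# Hypothetical responses carried on streams 1 and 3 of the same connection.
html = b"<html><body>Hello, mobile web</body></html>"
css = b"body { color: red }"
wire = interleave(split_into_frames(1, html, 8), split_into_frames(3, css, 8))
print([stream_id for stream_id, _ in wire])
print(reassemble(wire))
```

Because each frame carries its stream ID, a large response on one stream no longer has to finish before a small response on another stream can make progress over the same connection.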

To improve the performance of HTTP/2, several techniques have been proposed such as, but not limited to:

[1] http://www.invespcro.com/blog/customer-acquisition-retention/

[2] A proxy is an intermediary between the client and server that processes requests from the client to server, and/or responses from the server to client.

By: Arun Kejariwal and Mehdi Daoudi
