This June probably set the record for large-scale failures on the Internet. First, Facebook experienced an outage that lasted hours, from May 31st to June 1st. On June 14, several users complained on Twitter that Google services were down; luckily the outage lasted only about 10-15 minutes and received no press coverage. On the same day, Amazon Web Services again experienced an outage at its East Coast datacenter, impacting many services including Heroku, a cloud application provider. Lastly, Twitter went down for a little over two hours on June 21st.
These failures impacted not only the users of the sites in question, but also end users of websites that relied on widgets, ads, or infrastructure delivered from these companies.
Failures Are Bound to Happen
It was not the first time these companies experienced problems, and it will not be the last. No matter how big a company or its infrastructure is, failures are bound to happen. It could be a complete failure impacting everyone, or a partial failure impacting a lucky few.
Third Party Performance Impact
During our tenure at DoubleClick we learned the hard way that failure will happen and that its impact can be HUGE. We therefore built failure management into our plans, processes, and infrastructure. We were one of the first ad-serving companies to introduce SLAs, back in the 90s, and we monitored our systems carefully to ensure we met those SLAs. If we did not, we had to give money back to our customers.
Performance Is not the Only Risk
SLAs on speed and availability are key because they ensure the provider has skin in the game and give the website a form of insurance, but they might not be sufficient to mitigate all risks. A third-party vendor can impact a publisher in many ways besides slowing down the page. Its code executes in the browser, where it can conflict with other code on the page or break the user experience in certain browsers. It can deliver inappropriate content or, worse, malware. And its system could store data about the user without the user or the website knowing, whether inadvertently or, worse, on purpose.
These are all risks that websites and their providers must overcome, and the only way that can be achieved is if the providers clearly follow certain rules. Based on our previous and current experience in this area, we wrote up a set of rules that we believe third-party providers should aim to follow in order to mitigate risks and be successful.
The Golden Rules
1. Provide and enforce meaningful SLAs. As a service provider to other companies you must have SLAs in place. Make sure they are properly monitored and enforced, so that everyone in your organization is aware of the consequences of “failure”.
3. Rely on a global infrastructure that covers your clients' needs (self-built or CDN). Your clients place your tags on their pages and have visitors from various geographical locations, so ensure that your service is reachable from, and close to, those locations. If your servers are on the US West Coast and your clients have mainly European users, you might want to rethink the location of your infrastructure and rely on CDNs for static libraries and on a distributed DNS system.
5. Ensure the security of the content you deliver. Ensure that your system or service does not deliver malware, phishing, or viruses. In some cases you should also ensure that the content is appropriate for the site, for example by not delivering alcohol-related ads on kids' websites.
9. Don’t be a resource hog. Ensure your content does not hog bandwidth, CPU, memory, or other resources. Remember that you are not the only tag on the page, and not the primary content the end user is visiting the page for. So be minimalist and smart about how you use the end user's resources.
10. Have a test plan for new releases. Test your tags and new releases on actual web pages of your clients and make sure they do not break those pages. Test on different browsers and devices to ensure everything works as planned. If it is a major change, notify clients of the change and involve them in the testing process. Don’t just test a new release on your clients' pages without the clients knowing about the change: you might break something for their end users without the site being aware of it.
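To make rule 1 a bit more concrete, here is a minimal sketch of what "properly monitored" could look like in code. It assumes you already collect synthetic probe results against your tag or endpoint. The names `Probe`, `Sla`, and `check_sla`, and the two targets (an availability floor and a 95th percentile response-time ceiling), are illustrative assumptions on our part, not terms from any particular contract or monitoring product.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Probe:
    """One synthetic check against the third-party endpoint (hypothetical shape)."""
    ok: bool                      # did the request succeed?
    response_ms: Optional[float]  # response time in ms, None when the request failed


@dataclass
class Sla:
    """Hypothetical SLA targets; a real contract defines its own."""
    min_availability: float  # e.g. 0.999 means 99.9% of probes must succeed
    max_p95_ms: float        # ceiling for the 95th percentile response time


def availability(probes: List[Probe]) -> float:
    """Fraction of probes that succeeded."""
    if not probes:
        raise ValueError("no probes to evaluate")
    return sum(1 for p in probes if p.ok) / len(probes)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    if not values:
        raise ValueError("no values to evaluate")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_sla(probes: List[Probe], sla: Sla) -> Dict[str, object]:
    """Compare measured availability and p95 response time with the SLA targets."""
    avail = availability(probes)
    times = [p.response_ms for p in probes if p.ok and p.response_ms is not None]
    # With no successful probe there is no speed to report, so treat it as a miss.
    p95 = percentile(times, 95) if times else float("inf")
    return {
        "availability": avail,
        "p95_ms": p95,
        "availability_met": avail >= sla.min_availability,
        "speed_met": p95 <= sla.max_p95_ms,
    }


if __name__ == "__main__":
    # 99 fast successes and one failure in the measurement window.
    window = [Probe(True, 100.0) for _ in range(99)] + [Probe(False, None)]
    print(check_sla(window, Sla(min_availability=0.999, max_p95_ms=200.0)))
```

In this sample window the speed target is met but the availability target is not, which is exactly the kind of result that should be visible to everyone in the organization. Nearest-rank is only one of several percentile definitions; whichever one you pick, it should match the wording of the SLA itself.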
Whether you are a vendor or a website, we hope you will find these rules helpful in ensuring a great user experience. If you have any suggestions or feedback, please feel free to share them with us.
From Velocity 2012 – Mehdi – Catchpoint