Is web scraping illegal? - Notes
Is web scraping illegal?
Depending on who you ask, web scraping can be loved or hated. Startups love it because it’s a cheap and powerful way to gather data without the need for partnerships. Big companies use web scrapers for their own gain but also don’t want others to use bots against them. The general opinion on the matter does not seem to matter anymore because in the past 12 months it has become very clear that the federal court system is cracking down more than ever.
Lets take a look back. Web scraping started in a legal grey area where the use of bots to scrape a website was simply a nuisance. Not much could be done about the practice until in 2000 eBay filed a preliminary injunction against Bidder’s Edge. In the injunction eBay claimed that the use of bots on the site, against the will of the company violated Trespass to Chattels law. The court granted the injunction because users had to opt in and agree to the terms of service on the site and that a large number of bots could be disruptive to eBay’s computer systems. The lawsuit was settled out of court so it all never came to a head but the legal precedent was set.
In 2001 however, a travel agency sued a competitor who had “scraped” its prices from its Web site to help the rival set its own prices. The judge ruled that the fact that this scraping was not welcomed by the site’s owner was not sufficient to make it “unauthorized access” for the purpose of federal hacking laws. Two years later the legal standing for eBay v Bidder’s Edge was implicitly overruled in the “Intel v. Hamidi”, a case interpreting California’s common law trespass to chattels. It was the wild west once again.
Over the next several years the courts ruled time and time again that simply putting “do not scrape us” in your website terms of service was not enough to warrant a legally binding agreement. For you to enforce that term, a user must explicitly agree or consent to the terms. This left the field wide open for scrapers to do as they wish.
Fast forward a few years and you start seeing a shift in opinion. In 2009 Facebook won one of the first copyright suits against a web scraper. This laid the groundwork for numerous lawsuits that tie any web scraping with a direct copyright violation and very clear monetary damages. The most recent case being AP v Meltwater where the courts stripped what is referred to as fair use on the internet. Previously, for academic, personal, or information aggregation people could rely on fair use and use web scrapers. The court now gutted the fair use clause that companies had used to defend web scraping. The court determined that even small percentages, sometimes as little as 4.5% of the content, are significant enough to not fall under fair use. The only carveout the court made was based on the simple fact that this data was available for purchase. Had it not been, it is unclear how they would have ruled.
Then a few months back the gauntlet was dropped. Andrew Auernheimer was convicted of hacking based on the act of web scraping. Although the data was unprotected and publically available via AT&T’s website, the fact that he wrote web scrapers to harvest that data in mass amounted to “brute force attack”. He did not have to consent to terms of service to deploy his bots and conduct the web scraping. The data was not available for purchase. It wasn’t behind a login. He did not even financially gain from the aggregation of the data. Most importantly, it was buggy programing by AT&T that exposed this information in the first place. Yet Andrew was at fault. This isn’t just a civil suit anymore. This charge is a felony violation that is on par with hacking or denial of service attacks and carries up to a 15 year sentence for each charge.
I am in the business of blocking bots. I think there is a legitimate need for some companies to be able to prevent unwanted web scrapers from accessing their site. That said, I don’t know if I can agree with the recent direction of court rulings. The eff and many industry experts agree. Are the existing laws too antiquated to deal with the problem? Should new legislation be introduced to provide more clarity? Most sites don’t have any protections in place, don’t the companies have some burden to prevent web scraping? As the courts try to further decide the legality of scraping, companies are still having their data stolen. Instead of looking to the law to eventually solve this technology problem, it’s time to start solving it with anti-bot and anti-scraping technology today.