GitHub Restores Popular Python Repo Hit by Bogus DMCA Takedown
Yesterday, following a DMCA complaint from HackerRank, GitHub took down a repository that hosts the official SymPy project documentation website.
First released fifteen years ago, SymPy is an open source library for symbolic computation and helps Python developers implement various computer algebra capabilities in their programs.
It turns out the DMCA notice filed by HackerRank’s representatives was sent out in error and generated much backlash from the open source community.
The DMCA notice has since been rescinded and GitHub has restored the repository.
Dubious DMCA claim knocks docs site offline
On Wednesday, April 20th, SymPy’s documentation site stopped working and instead served 404 (Not Found) error messages to visitors, as seen by BleepingComputer:
Documentation sites of software projects provide installation instructions, tutorials, how-to guides, and explainer articles to users who are new to the project.
As such, docs sites like SymPy’s are a vital ‘official’ resource for both novice and seasoned programmers who may want to learn more about specific features of a software library from time to time.
The repository, seen by BleepingComputer, began showing a DMCA takedown page yesterday, along with GitHub’s reason for taking it down:
It turns out the DMCA (copyright infringement) complaint was filed by HackerRank’s outsourced contractor, WorthIT Solutions, which regularly handles such takedown requests for HackerRank.
HackerRank is a competitive programming, remote interview, and hiring platform aimed towards developers and tech companies. In its words, HackerRank matches “developers with great companies.”
HackerRank’s online assessment exercises and interview solutions have been adopted by major players including Vanguard, VMware, Snap, Red Hat, and many more for hiring top talent.
The erroneous copyright violation complaint filed by HackerRank’s partner knocked SymPy’s docs website offline. This did not sit well with the open source community, which called out the “unethical” behavior of the company.
A Y Combinator Hacker News reader said:
“Not the first time they’ve taken down GitHub repos, some with a legal basis… others with no involvement at all,” citing the example of the React-Leaderboard repository that was removed by GitHub last year following HackerRank’s complaint. Apparently, though, the repo contained no code or content infringing any party’s copyright.
This isn’t the first time that the job seeker platform has targeted repositories either. A common complaint among community members is that HackerRank frequently asks GitHub to take down entire repositories rather than limiting the removal request to just the infringing content in a repository.
Following the mishap, HackerRank CEO and founder Vivek Ravisankar stepped in to swiftly address the situation:
“In the interest of moving swiftly, here are the actions we are going to take: we have withdrawn the DMCA notice for SymPy; sent a note to senior leadership in GitHub to act on this quickly,” announced Ravisankar in the thread.
Additionally, to prevent such incidents from recurring, the company has suspended its DMCA takedown process until it reviews its internal guidelines on what constitutes a “real violation.”
As a good-faith gesture, HackerRank further donated a sum of $25,000 to the SymPy project, following a suggestion from Travis Oliphant, who is the founder of SymPy’s sponsor, NumFOCUS, and the original author of the NumPy library.
“As a company we take a lot of pride in helping developers and it sucks to see this. I’m extremely sorry for what happened here,” concluded Ravisankar.
Within a few hours of Ravisankar’s involvement, SymPy’s docs repository was restored by GitHub and the documentation site is back up today.
Not all DMCA claims may be bogus
It is hard to determine on what grounds these copyright violation complaints are being raised by HackerRank’s partner.
HackerRank does maintain sets of sample code and several banks of practice questions and exercises aimed to test and hone developer skills.
But this could have turned into a chicken-and-egg situation by now, where a developer builds their application from scratch but may have used parts of introductory code from a sample project provided by HackerRank.
Alternatively, could it be that HackerRank used some introductory material from the documentation websites of popular open source projects, and that its outsourcing provider later believed the content originated from HackerRank? We don’t know.
As some readers have suspected though, GitHub’s hands might be tied in legal situations like these.
If GitHub does not take down the content reported via DMCA requests, the platform itself risks losing ‘safe harbor’ status and becoming liable in future copyright litigation.
A hosting provider (in this case, GitHub) that receives valid DMCA notices may be compelled to act on these requests by quickly taking down the allegedly infringing content to remain legally compliant.
“The DMCA’s ‘safe harbor’ regime offers immunity to claims of copyright infringement if (among other requirements) online service providers promptly remove or block access to infringing materials after copyright holders give appropriate notice,” explains a copyright guide published by Fenwick & West LLP, a Silicon Valley-based law firm.
It is true that some of the repositories taken down were practically empty apart from skeleton code that the developer could have borrowed from HackerRank’s material. Even in such cases, though, entire repositories were taken down for copyright violation.
Equally interesting is the fact that not all DMCA notices filed by HackerRank and other copyright owners are bogus.
In fact, several copyright notices [1, 2, 3, 4, 5] clearly show links to repos showing signs of plagiarism, with materials copied and pasted verbatim from platforms like HackerRank and LeetCode. Some of these repositories, still retained by Google Cache, even mention the terms “HackerRank” and “Leet Code.”
As such, while HackerRank could have gotten DMCA takedowns wrong a few times, it does not seem to be immune to genuine cases of plagiarism.
Until copyright laws are simplified for a digital world, and corporate legal processes are refined, we are bound to see cases where honest users are penalized by erroneous copyright violation notices, and those actively plagiarizing content get away.