Google Drive Flags Nearly Empty Files for ‘Copyright Infringement’

Users were left startled as Google Drive’s automated detection systems flagged a nearly empty file for copyright infringement.

The file, according to one Drive user, contained nothing but the digit “1.”

Is digit ‘1’ copyrighted?

This week, Dr. Emily Dolson, an assistant professor at Michigan State University, reported seeing some odd behavior when using Google Drive.

One of the files in Dolson’s Google Drive, ‘output04.txt’, was nearly empty, with nothing other than the digit ‘1’ inside it.

But according to Google, this file violated the company’s “Copyright Infringement policy” and was hence flagged.

And what’s worse, the warning sent to the professor ended with “A review cannot be requested for this restriction.”

Dolson’s file ‘output04.txt’ was stored at the path ‘CSE 830 Spring 2022/Testcases/Homework3/Q3/output’ in Drive, which led the professor to wonder whether the file path contributed to the false alarm.

Present on Dolson’s “non-educational Google account,” the file was among a batch of TXTs containing output generated as part of a homework assignment.

One too many digits

A pseudonymous user also shared screenshots of their Google Drive account, where files containing just the digit “1,” with or without newline characters, were flagged.

“The 1 byte files contain just ‘1’, the 2-byte file is ‘1\n’, and the 3-byte (not flagged yet) file has ‘1\r\n’,” wrote the user.
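The three variants the user describes differ only in their line ending. As a minimal sketch (the file names here are hypothetical and not taken from the screenshots), the byte counts can be checked like this:

```python
# Contents described by the user, written as raw bytes.
# The file names are made up for illustration.
variants = {
    "one.txt": b"1",            # digit only
    "one_lf.txt": b"1\n",       # digit plus Unix newline (LF)
    "one_crlf.txt": b"1\r\n",   # digit plus Windows newline (CR LF)
}


def sizes(files):
    """Return a mapping of file name to size in bytes."""
    return {name: len(data) for name, data in files.items()}


for name, size in sizes(variants).items():
    print(f"{name}: {size} byte(s)")
```

The sizes come out as 1, 2, and 3 bytes respectively, matching the user's description.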

Files with ‘1’ also flagged by Google Drive for copyright violation (Imgur)

And, it turns out the behavior isn’t limited to just files containing the digit “1.”

Dr. Chris Jefferson, Ph.D., an AI and mathematics researcher at the University of St Andrews, was also able to reproduce the issue when uploading multiple computer-generated files to Drive.

Jefferson generated over 2,000 files, each containing just a number between -1000 and 1000.
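Jefferson's script is not shown, but an experiment of this kind might look roughly like the sketch below. The function name, the file naming scheme, and the choice to write no trailing newline are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def write_number_files(directory, low=-1000, high=1000):
    """Write one text file per integer in [low, high], containing only that number.

    File names and the lack of a trailing newline are assumptions,
    not details from the original experiment.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(low, high + 1):
        path = directory / f"{n}.txt"
        path.write_text(str(n))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = write_number_files(tmp)
        print(len(created), "files written")
```

With the inclusive range -1000 to 1000 this produces 2,001 files, consistent with "over 2,000." The resulting folder would then be uploaded to Drive by hand or with a sync client.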

The files containing the numbers 173, 174, 186, 266, 285, 302, 336, 451, 500, and 833 were flagged by Google Drive for copyright infringement shortly afterward.

Some allege that should the file contain just the digit “0,” Google would permanently disable your account, although the outcome more likely applies to users that Google deems to be repeat infringers.

“I deleted the experiment, just in case I got my account deleted for too many naughty numbers,” writes Jefferson.

Mikko Ohtamaa, founder of DeFi company Capitalgram, alleged that Google’s automated flagging of suspected copyright infringement could conflict with parts of the GDPR.

Note, however, that GDPR Article 22, “automated individual decision-making, including profiling,” refers more specifically to automated decisions made about individuals by profiling their online behavior, such as before granting a loan or when making hiring decisions, as explained by the UK’s ICO.

“I’d have more sympathy if it weren’t ‘A review cannot be requested for this restriction,’” writes Hacker News user OneLeggedCat. “It’s designed to be as brutal and draconian as possible. They chose this. It is guilty until proven innocent, with no recourse.”

It isn’t known yet what causes this behavior, and BleepingComputer has been unable to reproduce the issue at the time of writing.

In 2018, Google published a detailed document explaining how the company fights piracy. But when specifically talking about Google Drive, the report states a “full-time abuse engineering team” was set up by Google for tackling illegal streams served on Google Drive. As such, not much information is available on how Google’s algorithms process non-video content stored on Drive.

BleepingComputer reached out to Google well in advance of publishing with specific questions, such as whether Google relied on checksums to keep track of copyrighted content and whether this behavior arose from a possible hash collision between copyrighted files and benign ones sharing the same hash.

We have not heard back from Google at this time.
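To illustrate the checksum question, here is a minimal sketch of how content-hash matching could flag tiny files. This is purely hypothetical: Google has not confirmed how Drive detects infringing content, and the `blocklist` set and `is_flagged` function are invented for this example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical blocklist: assume the digest of the 1-byte file "1"
# somehow ended up on it. This is an assumption, not a known fact.
blocklist = {digest(b"1")}


def is_flagged(data: bytes) -> bool:
    """Flag a file if its content hash appears on the blocklist."""
    return digest(data) in blocklist


print(is_flagged(b"1"))    # True: byte-identical content, whoever owns the file
print(is_flagged(b"1\n"))  # False: one extra byte yields a different digest
```

Under such a scheme, the file name, path, and owner play no role. Any file whose bytes match a listed digest is flagged, and each variant (such as the one with a trailing newline) would need its own entry.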

Update 11:43 AM ET—Google seems aware of the issue and is working on a resolution. The company additionally shared links for requesting review of a violation and urged users to visit the Community Forum for additional assistance.
