AI AND THE PLAGUE OF PLAGIARISM THAT CAN BE FOUGHT


ISSUE 21.36.F • 2024-09-02


 


 


 

PUBLIC DEFENDER

Is this article plagiarism? Now you can find out.

By Brian Livingston

An epidemic of plagiarism — outright duplication of other people’s works — is raging through chatbots and other artificial-intelligence technologies.

One study shows that almost 60% of the outputs from some chatbots contain plagiarism. The good news? The latest detection software can be 100% accurate in separating AI-plagiarized text from original, human work.

The problem is real. In February 2024, Copyleaks, an AI detection company, released an analysis of 1,045 outputs from OpenAI’s GPT-3.5 chatbot. According to the firm, 59.7% of the bot’s writings contained some form of plagiarized content.

This study can be criticized because Copyleaks itself sells software that aims to detect AI-generated text and plagiarism. Also, OpenAI released GPT-4 in 2023, which by all accounts is an improved version of the bot.

But finding copied or “lifted” writing in so much of a chatbot’s output represents a huge challenge for those of us who expect information we find on the Web to be factual and original.

Let’s be clear: there is such a thing as ‘fair use’

It’s important at this point that we make a hard-and-fast distinction between plagiarism — the copying or almost-identical paraphrasing of other people’s writing by an AI or a human writer — and fair use. The latter involves a quotation of someone else’s work while giving full credit to the original author. There is no attempt to claim that a quote from an author is an original composition by the second writer.

Unlike patents and trademarks, which require a formal application to (and approval by) a governmental agency, copyrights are granted automatically. The author of a work gains the protection of copyright the moment a piece of writing exists in tangible form — whether or not it includes a copyright notice. (See a Copyright Laws explanation.)

In my own articles, I quote other people’s words and reprint images they use to promote themselves. But I always identify the origin and direct my readers to the original source. This enhances the value of the original work rather than merely copying it and claiming it as my own.

A few services rise to the top in detecting AI-written text and plagiarism

What can we do to protect ourselves against copycat writing — and outright falsehoods — when we’re reading material that’s supposed to be true and accurate?

People who copy others’ works wholesale — perhaps to fill a website with stolen or bogus material — currently use a variety of tricks to avoid detection. One site, Surfer Blog, offers a so-called text humanizer that supposedly adjusts AI output so it evades identification by anti-AI software.

Fortunately for those of us who want honest and original information, the makers of plagiarism detectors seem to have finally gained the upper hand.

An exhaustive test suite by William H. Walters, executive director of the library of Manhattan University, finds that 3 out of 16 AI text detectors have a perfect or very-nearly perfect record of distinguishing artificially generated writing from the prose of actual humans.

Walters’s tests included academic papers written by both ChatGPT-3.5 and the newer ChatGPT-4. This avoids the criticism that the two versions create different content. The chatbot’s writings were mixed together with papers on the same subjects that had been turned in by college undergraduates. The human-authored papers were written in 2014 or 2015, before AI programs became widely available. This ensured that AI could not have been involved in the preparation of the students’ assignments.

The test included 126 documents of approximately 2,000 words each on topics in natural science, humanities, and social science. A human instructor would be hard-pressed to guess which papers had been written by students and which had been output by an emotionless AI.

But the three AI-detection services that scored the best in the suite of tests were almost flawless in sorting the wheat from the chaff.

Survey results
Figure 1. A survey by Dr. Donald L. McCabe of more than 70,000 high school students in the United States found that 58% admitted to plagiarism, according to ICAI. Because the results are self-reported, the actual percentage may be higher. Photo by Nestor Rizhniak

The three services that Walters’s study determined were 98% to 100% accurate at detecting AI-generated writing are:

  • Copyleaks. Free with registration: 45,000 words per day. Free with no registration: 6,250 words per day. With subscription of $108 to $168 per year: word allowances vary.
  • Turnitin. Negotiates licenses with educational institutions: Unlimited words per day. Kent State University reports that it pays Turnitin $3 per student per year, according to 97unique.
  • Originality. $60 one-time payment for 600,000 words. $15 per month for 100,000 words per month. $137 per month for 1,500,000 words per month.

In Walters’s study, Copyleaks and Turnitin were rated as 100% accurate. Originality was rated 98% correct, but that simply means it labeled 2 out of 126 tested documents as “uncertain.” The software demonstrates a degree of accuracy that’s comparable to Copyleaks and Turnitin.
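The 98% figure is easy to check with a line of arithmetic. The sketch below assumes that an "uncertain" verdict counts as not correct, which is one reading of the paragraph above rather than the study's stated formula. The function name is illustrative.

```python
def accuracy_pct(total_docs: int, not_correct: int) -> float:
    """Percent of documents judged correctly.

    Assumption: an "uncertain" verdict is counted as not correct.
    """
    return 100 * (total_docs - not_correct) / total_docs


# Originality: 2 of the 126 test documents were labeled "uncertain"
print(f"{accuracy_pct(126, 2):.1f}%")  # prints 98.4%
```

Rounded to a whole number, that is the 98% reported for Originality.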

By contrast, the other 13 AI text detectors scored poorly in Walters’s tests. Those software services were able to correctly identify text as AI-generated or original human content in only 63% to 88% of the cases. That represents a lot of false positives and false negatives if you’re judging many writing samples. (See Table 4 of the study for the complete results.)
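To get a feel for what those percentages mean in practice, here is a rough sketch that converts an accuracy rate into a count of misjudged documents. It assumes a batch of 126 documents, the size of Walters's test set. The function name is illustrative, and the calculation does not say how the errors split between false positives and false negatives.

```python
def expected_errors(num_docs: int, accuracy_pct: int) -> int:
    """Approximate number of misjudged documents at a given accuracy percentage."""
    return round(num_docs * (100 - accuracy_pct) / 100)


# The low and high ends of the range reported for the other 13 detectors
for pct in (63, 88):
    print(f"{pct}% accurate: about {expected_errors(126, pct)} of 126 misjudged")
```

At the low end of the range, more than a third of the batch would be labeled wrongly.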

The services use different pricing structures. Each plan caps the number of words per day or month that the service will process for you. Some websites allow a limited amount of free use or a free-trial period. Unlike the other services, Turnitin doesn’t market to individuals at all. Instead, the company negotiates the prices of its annual contracts with universities and other large institutions.
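One way to compare plans that bundle different word counts is to reduce each to a price per 1,000 words. The sketch below does that for the three Originality prices listed earlier. It assumes you would use a plan's full word allowance, which may not match your actual usage, and the plan labels and function name are illustrative.

```python
# Originality plans as listed in this article: (label, price in USD, words covered)
PLANS = [
    ("one-time payment", 60, 600_000),
    ("$15 monthly plan", 15, 100_000),
    ("$137 monthly plan", 137, 1_500_000),
]


def cost_per_1000_words(price_usd: float, words: int) -> float:
    """Price in USD for each 1,000 words a plan covers."""
    return price_usd * 1000 / words


for label, price, words in PLANS:
    print(f"{label}: ${cost_per_1000_words(price, words):.3f} per 1,000 words")
```

On those assumptions, the plans work out to roughly 10 cents, 15 cents, and 9 cents per 1,000 words, respectively.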

The services competing for your business are many and varied

Aside from Walters’s exhaustive tests, there are several other reviews of AI-detection software on the Web. Unfortunately, it’s hard for the average reader to determine which of the many reviewers to believe.

One of the most comprehensive comparisons of plagiarism-checker software is by Trust Radius. Its website compiles hundreds of ratings by actual users of software into lengthy articles showing ratings up to 5 stars for the best products.

Trust Radius reviews business software
Figure 2. Trust Radius includes thousands of business-software ratings by reviewers who are authenticated for both their computing experience and their independence from vendors. Source: Trust Radius project-management article

Unfortunately, while it’s certainly helpful, Trust Radius’s listing of 29 plagiarism-detection programs fails to sort the products from the highest-rated to the lowest. Allow me to report to you here the software apps that garnered the best ratings:

  • 5.0 out of 5.0 stars: Noplag
  • 4.5 out of 5.0 stars: Copyleaks, Copyscape, Dupli Checker, Plagiarism Detector, PlagScan, Paper Rater, Grammarly

In the 4.5-out-of-5.0 category, I’ve listed Grammarly last. That’s because 100% of the reviewers of the other high-rated programs said they were “happy with the feature set.” Only 96% of the reviewers said that about Grammarly, although the difference is admittedly minor.
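The ordering described above can be reproduced with a two-key sort: star rating first, then the share of reviewers who were happy with the feature set. This is only a sketch of that ranking, not anything Trust Radius publishes. The figures are the ones quoted in this article, and the 100% value for Noplag is an assumption based on the statement about the other high-rated programs.

```python
# (product, stars out of 5.0, percent of reviewers "happy with the feature set")
# Entered in alphabetical order so the sort has some work to do.
RATINGS = [
    ("Copyleaks", 4.5, 100),
    ("Copyscape", 4.5, 100),
    ("Dupli Checker", 4.5, 100),
    ("Grammarly", 4.5, 96),
    ("Noplag", 5.0, 100),  # 100 is assumed; see the note above
    ("Paper Rater", 4.5, 100),
    ("PlagScan", 4.5, 100),
    ("Plagiarism Detector", 4.5, 100),
]


def rank_products(ratings):
    """Sort by stars, then by feature-set satisfaction, both highest first."""
    return sorted(ratings, key=lambda r: (-r[1], -r[2]))


for product, stars, happy in rank_products(RATINGS):
    print(f"{stars:.1f} stars, {happy}% happy with features: {product}")
```

Because Python's sort is stable, products that tie on both keys keep their input order, so only Noplag and Grammarly move.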

Notice that the Trust Radius reviewers gave high marks to Copyleaks, which came out at the top of the Walters study. But the very best Trust Radius rating went to a product called Noplag, which was not included in Walters’s tests. If you’re serious about finding the best plagiarism-detection software for your needs, you’ll need to do some research of your own into these apps’ features and costs. See the Trust Radius listing for full details.

Speaking of costs, many of my readers can only consider new software that has a free version. If that’s the case for you, check out a listing of free plagiarism checkers posted this month in a Quora thread. The five suggested apps are:

  • PlagScan
  • Plagiarism Checker (by Small SEO Tools)
  • Quetext
  • Plagiarism Checker (by Search Engine Reports)
  • Unicheck

The original poster points out: “Many services offer limited features in their free versions or require payment for comprehensive access. … For the most accurate and robust solutions, consider investing in a paid service if your needs are significant.”

Confusingly, the Quora listing features some apps that were not included in Walters’s test suite or in Trust Radius’s reviews. Until this category of software sorts itself out into a few acknowledged leaders, you’ll have to choose your personal favorite from a dizzying array of different products and services.

At least the detectors don’t think that I’m a chatbot

To see for myself how well detectors can distinguish between AI-generated material and human creativity, I pasted a plain-text version of this column into the free submission form at Copyleaks.

The service reports that my material shows no signs of plagiarism (a “matched text” score of 0%) and an AI content score of zero. Whew! I was starting to get worried. After all my years of being saturated in computer technology, I sometimes feel a bit robotic myself.

Stay safe out there!