The "autoplagiarizer"

by Peter on Wednesday, May 11, 2011

Instead of plagiarizing manually for this post, I decided to write a program that does it for me. Check it out: The Autoplagiarizer.

Here’s how it works, essentially:

  • A script scrapes the raw content from five different Wikipedia articles that I chose as sources (in this case: ‘Plagiarism’, ‘Copyright infringement’, ‘Cryptomnesia’, ‘Plagiarism detection’, ‘Academic dishonesty’).
  • It then cleans up that content (removing links, most wiki markup, etc.), splits it into sentences, and throws all of those sentences in a bucket.
  • A randomizer then goes through and creates five paragraphs of varying lengths, picking sentences at random from the bucket.
  • A template formats the essay in HTML and the script passes that HTML to a web server for public access.
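
The steps above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the actual script: the function names, the regexes, the paragraph length range of 3 to 7 sentences, and sampling with replacement are all assumptions made for illustration. It also assumes the raw wikitext has already been fetched, so the scraping and publishing steps are left out.

```python
import html
import random
import re


def clean_markup(text):
    """Strip a few common wiki constructs (not a full wikitext parser)."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)                        # non-nested {{templates}}
    text = re.sub(r"<ref[^>]*/>", "", text)                           # self-closing <ref ... />
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)  # <ref>...</ref>
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)     # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                                 # ''italic'' / '''bold'''
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive split on ., ! or ? followed by whitespace (abbreviations will break it)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def build_essay(bucket, n_paragraphs=5, min_len=3, max_len=7, rng=None):
    """Build paragraphs of varying length from randomly picked sentences.

    Assumption: sentences are drawn with replacement, so repeats are possible.
    """
    rng = rng or random.Random()
    return [
        rng.choices(bucket, k=rng.randint(min_len, max_len))
        for _ in range(n_paragraphs)
    ]


def render_html(paragraphs, title):
    """Format the essay as a bare-bones HTML page."""
    body = "\n".join(
        "<p>" + html.escape(" ".join(sentences)) + "</p>"
        for sentences in paragraphs
    )
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n<h1>" + html.escape(title) + "</h1>\n" + body + "\n</body></html>"
    )
```

With the five articles' wikitext in a list, the bucket would be the concatenation of `split_sentences(clean_markup(raw))` for each one, and `render_html(build_essay(bucket), "Plagiarism")` would give the page to publish.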

I still have some issues to work out and things to add. For example, you might notice some weird encoding issues that cause certain symbols to display as gibberish. I’d also like to let the user choose other Wikipedia articles as sources for the generated essay (and perhaps add a bit more processing to make the paragraphs a tiny bit more coherent). For now, though, I’m pretty happy with it: I think it simulates a poorly plagiarized college essay pretty well. :-)
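
The cause of the encoding gibberish is not pinned down here, but one common source of this symptom is UTF-8 text being decoded with a single-byte encoding somewhere along the way. The snippet below only illustrates that failure mode, on the assumption that something similar is happening; it is not a diagnosis of the actual script.

```python
# Assumption: the page is UTF-8 but gets decoded as Windows-1252 at some step.
original = "I’m pretty happy with it"

# Curly apostrophe (U+2019) is three bytes in UTF-8; each byte becomes its own character.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # I’m pretty happy with it

# Reversing the wrong step recovers the text, as long as no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

If that is what is going on, decoding the scraped bytes explicitly as UTF-8 and declaring the same charset in the HTML template would likely avoid it.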
