The "autoplagiarizer"

by Peter on Wednesday, May 11, 2011

Instead of plagiarizing manually for this post, I decided to write a program that does it for me. Check it out: The Autoplagiarizer.

Here’s how it works, essentially:

  • A script scrapes the raw content from five different Wikipedia articles that I chose as sources (in this case: ‘Plagiarism’, ‘Copyright infringement’, ‘Cryptomnesia’, ‘Plagiarism detection’, ‘Academic dishonesty’).
  • It then cleans up that content (removing links, most wiki markup, etc.), splits it into sentences, and throws all of those sentences in a bucket.
  • A randomizer then goes through and creates five paragraphs of varying lengths, picking sentences at random from the bucket.
  • A template formats the essay in HTML and the script passes that HTML to a web server for public access.
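For the curious, the steps above can be sketched in a few lines of Python. This is only an illustration, not the actual Autoplagiarizer code: the function names, the regular expressions, and the paragraph lengths (three to seven sentences) are all assumptions of mine, and the scraping and web server steps are left out, so the sketch starts from raw wiki text that is already in hand.

```python
import html
import random
import re


def clean_wiki_text(raw):
    """Strip a few common wiki markup patterns (a rough pass, far from exhaustive)."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", raw)
    # Drop simple, non-nested {{templates}}
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop <ref>...</ref> footnotes, then any leftover tags
    text = re.sub(r"<ref[^>/]*>.*?</ref>|<[^>]+>", "", text, flags=re.S)
    # Drop the runs of apostrophes used for bold and italics
    text = re.sub(r"'{2,}", "", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def build_paragraphs(bucket, n_paragraphs=5, min_len=3, max_len=7, rng=None):
    """Fill each paragraph with sentences drawn at random from the bucket."""
    rng = rng or random.Random()
    paragraphs = []
    for _ in range(n_paragraphs):
        length = rng.randint(min_len, max_len)
        # Drawn with replacement, so a sentence may show up more than once
        paragraphs.append(rng.choices(bucket, k=length))
    return paragraphs


def render_html(title, paragraphs):
    """Wrap the paragraphs in a bare-bones HTML page."""
    body = "\n".join(
        "<p>" + html.escape(" ".join(sentences)) + "</p>" for sentences in paragraphs
    )
    return (
        "<html><head><meta charset=\"utf-8\"><title>"
        + html.escape(title)
        + "</title></head>\n<body>\n<h1>"
        + html.escape(title)
        + "</h1>\n"
        + body
        + "\n</body></html>"
    )


if __name__ == "__main__":
    sample = (
        "'''Plagiarism''' is [[Copyright infringement|copying]] work."
        "<ref>Src</ref> It is [[bad]]. {{cite}} Really?"
    )
    bucket = split_sentences(clean_wiki_text(sample))
    print(render_html("On Plagiarism", build_paragraphs(bucket)))
```

The sentence splitter is deliberately naive (it will trip over abbreviations like “e.g.”), which arguably only helps the poorly plagiarized effect.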

I still have some issues to work out and things to add. For example, you might notice some weird encoding issues that cause symbols to display as gibberish. I’d also like to let the user choose other Wikipedia articles as sources for the generated essay (and perhaps add a bit more processing to make paragraphs a tiny bit more coherent). For now, though, I’m pretty happy with it. I think it simulates a poorly plagiarized college essay pretty well. :-)

Plagiarism and the web

by Peter on Monday, April 18, 2011

As evidenced by our heated (albeit productive) class discussion on Wednesday, plagiarism is an extremely complex issue, and it’s one with few black-and-white answers. Its definition and consequences have long been a point of contention in both academic and public contexts, and the waters have been muddied even further by the advent of the web. In this post, I’d like to explore some of the issues with content ownership and intellectual property on the web and see if I can’t relate them back to our academic context.

One of the most intriguing movements of the late 20th century is copyleft, which was inspired in part by new technologies burgeoning in the 1970s and 80s. As its name hints, copyleft positions itself as an alternative to copyright legislation, which some would argue has only become convoluted and unwieldy in the last century. Whereas copyright is intended to claim intellectual property and prevent others from profiting from identical/similar work, copyleft disclaims this ownership (to an extent—more on this later) and actively encourages others to share, reuse, redistribute, rework, and in some cases even profit from the original author’s work.

While it may not be explicitly copyleft, Creative Commons is an organization with its own contemporary approach to similar goals. Most of the licenses that Creative Commons maintains encourage varying levels of sharing, remixing, and reusing,1 but they have additional terms which are intended to simultaneously protect the originality of the author’s work and prevent abuse (or commercial use).

This brings me to Wikipedia, most of whose content is licensed under a Creative Commons Attribution-ShareAlike (CC BY-SA) license. The terms of this license allow users to paraphrase portions of Wikipedia articles or even copy and paste them whole, but under two important conditions: the user must attribute the text to the proper Wikipedia article, and they must release the “new” work under a similar license.2

It’s the “attribution” requirement that is particularly interesting to me. The original CC BY-SA gives the licensor responsibility to define their own terms for attribution: “You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).”3 However, to me at least, Wikipedia’s attribution terms and requirements are somewhat unclear. Wikipedia’s Reusing Wikipedia content article implies that a simple link somewhere on the licensee’s page is satisfactory. As one Creative Commons blog entry discusses, surveyed Wikipedia contributors loosely agreed with this statement. Both of the articles I linked to also mention that a list of contributing authors should be cited if possible, but the very nature of Wikipedia makes this impractical for most content.

However, is a simple link to a Wikipedia article truly sufficient? I tend to believe that it’s not, and I know for a fact that it isn’t by the standards of higher-education institutions. As we saw in the wiki assignment, a paraphrased intro paragraph with a link sitting underneath it is simply not acceptable in academia. But how do we reconcile these two standards? Should there be a compromise between the two contexts, or not? I’m of the opinion that all uses of Wikipedia’s content should be cited properly according to an academic standard like MLA. I realize, though, that the technological knowledge required to do this is far from common, as the wiki assignment proved again: most students understand how to use Word’s built-in footnote system, but very few understand how to use the <ref> tag in a wiki, and even fewer understand how to write the HTML required for footnotes in other, non-wiki contexts.

Class blogs like this one bring up an even more interesting set of questions. When referencing another blog article, is an inline link sufficient, as I’ve done here? When does a footnote become necessary, and how do those standards differ from class blogs to personal or professional blogs? (Even as I write this entry, I hesitate to think that I’ve cited everything perfectly. In fact, I find that proper MLA footnotes are overkill for what I’m doing here.) How far out of my way should I go to find a so-called “stable” URL for all of the pages I’ve referenced, one which will remain identical and accessible for decades to come? In many cases, this is actually impossible—Wikipedia’s revision archival system comes close, but there’s still no guarantee that their URLs will stay intact for as long as may be necessary. In a broader sense, how far do we go as a society to “protect” content? Perhaps more importantly, how do we decide what content is worth protecting?

Ultimately, I’m left with more questions than answers, as usual.

Rhetorical analysis proposal

by Peter on Tuesday, March 1, 2011

For my rhetorical analysis, I’ve chosen to build a product website for Publishr, a hypothetical blogging service targeted at designers. For the past week or so, I’ve been in an accelerated version of the “product development” phase, where I’m actually planning and designing the Publishr service as if it were going to be an actual product. This phase involves a number of things, like defining Publishr’s target market, brainstorming names and branding, and writing copy. Perhaps most importantly, I’ve thought a lot about how Publishr could distinguish itself from the wealth of similar services already on offer, like WordPress, Blogger, TypePad and Tumblr. (As a side note, Tumblr itself actually served as the primary inspiration for my fake product, since it also positions itself as a better alternative to Blogger and the others, and as a blogging service for creative people.) Starting from this early point in the process is a great way to consider every opportunity I have to build the verbal and visual rhetoric that will (hopefully) drive visitors to sign up for the service.

I’ve started designing and building the site, and my next step will be to analyze my own rhetoric directly on the site itself. This will be facilitated by some sort of mechanism that allows users to switch between the normal product page and my analysis, perhaps taking the form of a “rhetoric switch” that illuminates the “writing on the wall”, so to speak, pointing out specific and general examples of the site’s rhetoric.

Why is all of this important? As consumers, I think it’s imperative that we educate ourselves about the various ways in which sellers persuade us to buy their products or sign up for their services. I hope this project will help shed some light on those methods. I also wish to make an implicit distinction between a site like this one and others that might use “dirtier” tricks and shady techniques to get your business. Of course, every seller wants to persuade you, but that doesn’t mean that all methods of persuasion are necessarily evil. Some marketing techniques are perfectly valid (in my opinion, at least), and an informed customer might be encouraged to give their money to a company that doesn’t engage in shady tactics. I hope that my site will help people to make that distinction.

Questions

In what ways do product websites attempt to entice visitors to sign up? What kinds of tactics do they employ?

What’s the difference between “shady” rhetoric and legitimate rhetoric? How easy is it for readers to distinguish between the two?

What kinds of metrics can a website like this use to gauge the success of its rhetoric?

How do companies market online services differently than others—tangible goods, for instance? (I might have to downplay this one, since it will be difficult to explore in the context I’ve chosen.)

Five (or maybe eight) websites

by Peter on Saturday, February 19, 2011

You Might Find Yourself
Bobulate
i can read
Sexpigeon
ill iterate
Eunoia
I Love Charts
All That Is Interesting

These blogs are particularly interesting to me because of the nature of their presentation. Seemingly endless fragments of content—images, quick thoughts, quotes, videos, audio clips—pour into reverse chronological order to create one vast stream of consciousness. This sea of content paints a rather vivid picture of who the author is, and it does so more effectively than the majority of blogs that I come across. In this way, each of these sites is very much a personal experience.

A few other common factors contribute to the relatively unique style of these websites. Most of them don’t allow comments, and while some users may not like this, I think it adds more to these sites’ experience than it takes away from it. They all distinguish themselves with minimalistic design, further emphasizing the actual content of each one. There are also few or no ads, and I think that ads would only detract from reading in this context.

One more note for web nerds and bloggers alike: All of these sites are powered by Tumblr, a service which lends itself well to these stream-of-consciousness style blogs. It’s free, so check it out if you’re looking to maintain yet another online profile.

This gives "hyperthreading" a whole new meaning

by Peter on Monday, February 14, 2011

Speaking of expecting hypertext even in traditional media:

Maria Fischer's "Traumgedanken" ... uses threads pierced through the pages and affixed to other pages to make physical hyperlinks between ideas.

[via BoingBoing]

Designing Media

by Peter on Sunday, February 13, 2011

As I make my way through the interviews in Designing Media, I find myself leaning further and further towards Chris Anderson’s opinion of the term “media”:

I think media is an expired word. I don’t know what it means. It’s a word that maybe once had meaning but that meaning has been fuzzied to the point that it means everything and as a result nothing today. I think in the twentieth century media meant something pretty crisp until Marshall McLuhan came and screwed it all up. Today I have no idea what media means.

I notice that as I read, the word “media” practically disappears—I ignore it in the same way that I ignore words like “a” and “the”. Like it has for Chris, the word to me has become stretched so thin that it no longer means anything.

Even so, one interview in particular stood out to me for the purposes of this blog assignment. In his interview, Roger McNamee discusses media in the context of musicians and promotion, particularly those of his band Moonalice. The implicit definition of media in McNamee’s interview seems to be any means of connection: connection between a band and their fans, connection among communities centered around certain passions and interests, and so on. This type of connection requires participation from all sides, and participation is highly valuable in today’s culture. He contrasts this with the old media models of forty or fifty years ago, where the aim was to get a very one-sided message out to large segments of consumers. As McNamee points out, “a behavioral change has taken place, and people are returning to the notion that it’s more fun and entertaining to create media than it is just to consume it.” The barriers to entry to the world of media creation have all but dissolved.

McNamee implies that the type of communication is not important; whether it be posters, t-shirts, DVDs, or Twitter, the medium is very much secondary to the connection that people seek by communicating across it. What’s valuable to him is what’s actually happening between the band and their fan base.

I am a reader who...

by Peter on Sunday, February 6, 2011

I am a reader who…

  • gets distracted easily.
  • loves discussing what I’ve read in a group, sometimes more than I enjoy actually reading it.
  • sometimes finds it difficult to enter the frame of mind necessary to read a text for a class assignment.
  • Googles unfamiliar concepts or phrases.
  • rarely reads in a linear fashion, even with “linear” texts.
  • loves experimental/non-traditional storytelling methods or textual styles.
  • seeks connections to a story and its characters.
  • sympathizes with authors and characters.
  • expects a high level of linguistic fluency (spelling, grammar, syntax, etc.).
  • enjoys hearing about others’ reading and learning styles—they often help me better understand my own.

As I write these, I can’t help but think that I fit squarely into the “community-based literacies” camp, also described as “engaged learning” by Alexander and Fox. Even in my earliest education, my learning process has always been defined clearly by my environment—by my context as a student. As I alluded to in the above list, I absorb more from in-class discussions than I do from the actual reading process. I’m eager to hear what other people have to say about a text, and I’m also eager (perhaps too eager, at times) to share my thoughts as well and place them in the context of the class and ultimately, the overall community.

It’s also worth noting that my reading style has been heavily influenced by new technology, and more specifically by the web. Hypertext has caused me to expect almost all reading to be interactive and easily cross-referenced, and my mind tends to wander when it’s not (in the case of a printed article or a book, for example). This proliferation of and transformation by technology is one of the guiding principles of the “Era of Engaged Learning” according to Alexander and Fox.

As a sidenote: It might be a bit premature, but I’d be willing to bet that the majority of this class will also claim “allegiance” to the community-based literacies camp, simply because it is the educational style that we’ve been subjected to for most of our student careers.