Introduction

Citations to the papers discussed are in bold and included in the text below.

Objectives

Help give Wikimedians a window into the academic world studying Wikipedia by:
- Giving people more details on papers they have heard about.
- Introducing people to new papers or perspectives that they have not heard of.
Start building bridges between academics and their specimens (that's us!).
Give Wikimedians ideas and inspiration for how they can benefit from those who are benefiting from their example.

Scope Conditions

Mako Hill suggested this project after doing a similar project for the Debian Project. That review was comprehensive and covered all academic articles on Debian. Foolishly, we set out to do something similar for Wikipedia and Wikimedia projects. However, with hundreds of articles published in peer-reviewed journals, even covering the last year proved prohibitively difficult.

As a result, we are not presenting everything (or even coming close!). There are many more papers than we could possibly review, read, or present in limited time.

The result is a highly curated selection of only 10 papers (or groups of papers). Here is a description of how we've attempted to limit things:

Only work from the last year that has already been published (e.g., nothing from WikiSym 2009, whose proceedings are still forthcoming)
Only Wikimedia/Wikipedia related works (i.e., not work on wikis in general)
Only papers that are strongly related to Wikimedia (i.e., not papers where Wikipedia is just one of several data sources used to test a theory)
Only articles written in English
A selection of articles with a bias toward being broad and representative:
- e.g., if we have taken one paper from a particular workshop, we will try not to take others, even if they are wonderful
- e.g., we've selected papers that aim to represent the wide variety of fields currently constituting Wikimedia/Wikipedia scholarship
Papers that are likely to be relevant, interesting, or useful to Wikimedia Community members
No papers done by people or research groups attending Wikimania 2009
- As a result, authors can't criticize us for getting their paper wrong -- at least not right away
- Hopefully, the threat of us reviewing others' papers creates an incentive for scholars to show up at Wikimania
A random selection from the piles of papers that remain

There are wonderful papers we've left out of here for no reason other than that we didn't have room. We apologize to all the authors whose great work was omitted.

Wikipedia As a Data Source

Pesenhofer, Andreas, Sonja Edler, Helmut Berger, and Michael Dittenbach. 2008. “Towards a patent taxonomy integration and interaction framework.” Pp. 19-24 in Proceeding of the 1st ACM workshop on Patent information retrieval. Napa Valley, California, USA: ACM

Everyone here made it to the 1st ACM workshop on Patent information retrieval, right?

This is the only paper included here basically because it is representative of a larger class of scholarship. There are tons of papers of this type that extract data from Wikipedia, usually from links, categories, and keywords.

Patent categories are helpful because they constrain searches in patent databases, making it more likely that patent filers will find prior art. But these schemes are made by experts. Wikipedia offers one way of getting a more general category scheme. The authors:

Started with the English Wikipedia Science Portal.
Extracted keywords from each of the categories listed on the portal.
Looked for those keywords in patents.
Matched up patents with scientific categories.
The result was both the Wikipedia Science Ontology and a patent browser application that the authors built around it.
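The four steps above can be sketched as a toy keyword matcher. This is only an illustration of the general idea, not the authors' implementation: the top-k term-frequency keyword extraction, the stopword list, the sample texts, and the names `extract_keywords` and `match_patents` are all hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(category_texts, k=5):
    """Map each category to its k most frequent terms (an assumed, naive extractor)."""
    return {
        category: {word for word, _ in Counter(tokenize(text)).most_common(k)}
        for category, text in category_texts.items()
    }


def match_patents(keywords, patents, min_hits=1):
    """Assign each patent to every category whose keywords occur at least min_hits times."""
    matches = {}
    for patent_id, text in patents.items():
        counts = Counter(tokenize(text))
        matches[patent_id] = sorted(
            category
            for category, words in keywords.items()
            if sum(counts[w] for w in words) >= min_hits
        )
    return matches


if __name__ == "__main__":
    # Hypothetical stand-ins for text gathered from Science Portal categories.
    categories = {
        "Optics": "laser light lens laser beam optics",
        "Genetics": "gene DNA genome gene expression",
    }
    patents = {
        "P1": "A laser with an adjustable lens.",
        "P2": "Method for editing a gene in DNA.",
        "P3": "A folding chair.",
    }
    print(match_patents(extract_keywords(categories, k=3), patents))
```

Running the script prints `{'P1': ['Optics'], 'P2': ['Genetics'], 'P3': []}`. Real systems would weight terms (e.g., by how specific they are to a category) rather than counting raw hits.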

Descriptive Work

Quantitative Analysis

Ortega, Felipe. 2009. “Wikipedia: A Quantitative Analysis.” PhD dissertation, Universidad Rey Juan Carlos http://libresoft.es/Members/jfelipe/phd-thesis (Accessed June 22, 2009).

Analysis of the top 10 Wikipedias
Shows a progressive increase over time in the effort spent by the most active authors
Shows off WikiXRay, a tool that automatically downloads dumps and runs analyses

Questions include:

How does the community of authors in the top ten Wikipedias evolve over time? (A: Reaches a steady state in 2006 or 2007 in most.)
What is the distribution of content and pages in the top ten Wikipedias? (A: Steady state as well, but bimodal in terms of article size.)
How does the coordination among authors in the top ten Wikipedias evolve over time? (A: Steady as well. JP/NL/PO use talk pages relatively little; English and French use them like crazy.)
Which are the key parameters defining the social structure and stratification of Wikipedia authors? (A: The core is real. There is real inequality in how contributions are distributed.)
What is the average lifetime of Wikipedia volunteer authors in the project? (A: The half-life is about 200 days, and less than 30 days in PT and EN.)
Can we identify basic quantitative metrics to describe the reputation of Wikipedia authors and the quality of Wikipedia articles? (A: High-quality articles are the work of a large number of people over long periods of time, thousands of days, with lots of work from the core.)
Is it possible to infer, based on previous history data, any sustainability conditions affecting the top ten Wikipedias in due course?
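A "half-life" for editors is a question about how long people stay active. One standard way to estimate such a figure from revision histories is a Kaplan-Meier survival curve, which also handles editors who are still active when the dump was taken (censored observations). The sketch below shows that generic technique under stated assumptions; it is not necessarily the exact procedure used in the thesis or in WikiXRay, and the input format (days between an author's first and last edit, plus a flag for whether they have left) and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def kaplan_meier_median(observations):
    """Estimate the median lifetime ("half-life") with the Kaplan-Meier estimator.

    observations: list of (days_active, left_project) pairs. left_project=False means
    the author was still active at the end of the data, so the lifetime is censored.
    Returns the first time at which estimated survival drops to 1/2 or below,
    or None if it never does.
    """
    departures = Counter(days for days, left in observations if left)
    censored = Counter(days for days, left in observations if not left)
    at_risk = len(observations)
    survival = Fraction(1)  # exact arithmetic avoids floating-point ties at 1/2
    for t in sorted(set(departures) | set(censored)):
        d = departures[t]
        if d:
            survival *= 1 - Fraction(d, at_risk)
            if survival <= Fraction(1, 2):
                return t
        # Authors who left or were censored at time t drop out of the risk set.
        at_risk -= d + censored[t]
    return None


if __name__ == "__main__":
    # Hypothetical authors: (days between first and last edit, has left the project?)
    authors = [(5, True), (10, False), (15, True), (20, True), (300, False)]
    print(kaplan_meier_median(authors))
```

For this made-up sample the estimate is 20 days. Treating still-active authors as censored, rather than dropping them, matters: dropping them would bias the half-life downward.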

The thesis concludes:

On top of that, the lack of new core members seriously threaten the scalability of the top-ten language versions regarding the quality of their content. We have demonstrated in the analysis previously presented that the eldest, top-active contributors are responsible for the majority of revisions in FAs, as well. Since the number of core authors has reached a steady-state (due to the leverage in the total number of active authors per month), the group of authors providing the primary source of effort in the revision of quality articles has stalled.

There is not much work on mechanisms, but the thesis provides a wonderful starting place for people interested in both data and tools.

Topic and Sub-topic Coverage

Halavais, Alexander, and Derek Lackaff. 2008. “An Analysis of Topical Coverage of Wikipedia.” Journal of Computer-Mediated Communication 13:429-440.

The paper reports two major studies. The first compared coverage in printed books with coverage in Wikipedia articles:

3,000 articles downloaded (excluding articles of fewer than 30 words)
A subset of 500 of these articles was categorized by two coders
Comparisons were made between the books and WP articles.

Study 1 results:

They showed a mismatch in a number of categories
They showed that articles are edited at very different rates depending on their category (e.g., naval articles are not edited much)

Study 2 focused on coverage within subject areas by comparing three print encyclopedias with Wikipedia. The results were that Wikipedia had:

81% of physics articles
79% of linguistics articles
only 63% of poetry articles
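These coverage figures are simple proportions: the share of a print encyclopedia's entries that have a counterpart in Wikipedia. A minimal sketch, assuming that "having a counterpart" can be approximated by normalized title matching (much cruder than matching topics by hand); the function names and sample titles are hypothetical.

```python
def normalize(title):
    """Case-fold a title and collapse internal whitespace (an assumed matching rule)."""
    return " ".join(title.casefold().split())


def coverage(reference_titles, wikipedia_titles):
    """Fraction of reference-work entries that have a matching Wikipedia title."""
    if not reference_titles:
        raise ValueError("need at least one reference entry")
    wiki = {normalize(t) for t in wikipedia_titles}
    found = sum(1 for t in reference_titles if normalize(t) in wiki)
    return found / len(reference_titles)


if __name__ == "__main__":
    # Hypothetical entries from a print physics encyclopedia and Wikipedia titles.
    physics_entries = ["Entropy", "Boson", "Quark", "Phonon"]
    wikipedia = {"entropy", "Quark", "Boson ", "Photon"}
    print(f"{coverage(physics_entries, wikipedia):.0%}")  # prints 75%
```

In practice, redirects, disambiguation pages, and topics covered inside a broader article are exactly where title matching breaks down.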

Quality

Lots of different people are writing about quality in specific subareas. One great general treatment is:

Stvilia, Besiki, Michael B. Twidale, Linda C. Smith, and Les Gasser. 2008. “Information quality work organization in wikipedia.” Journal of the American Society for Information Science & Technology 59:983-1001.

A sample of 1,000 articles from dumps
Plus all FAs and 1,000 pages from templates, categories, and project pages
Analysis of 60 talk pages (30 from FAs)
A sample of 100 user pages
Hand-coded by the authors themselves

Findings:

The findings won't be surprising to any very active Wikipedian. But the way they are presented just might be.

There are also plenty of people who go into extreme depth and evaluate just one subarea; there are loads of papers like this. Two examples:

Clauson, Kevin A, Hyla H Polen, Maged N Kamel Boulos, and Joan H Dzenowagis. 2008. “Scope, completeness, and accuracy of drug information in Wikipedia.” The Annals of Pharmacotherapy 42:1814-1821.

A comparison of Wikipedia with the Medscape Drug Reference (MDR). 80% of people search for drug information online, and Wikipedia is often the first search result. Eighty questions in eight categories, about common and potentially dangerous drugs, were posed and evaluated against both databases, as well as against the version of Wikipedia from 90 days earlier. Results:

Questions answered: WP 40%, MDR 82.5%
WP answered more questions about indications (60% versus 50%) and tied on mechanisms at 80%
WP answered none of the dosage questions, versus 90% for MDR
WP answers were less complete (76% versus 95.5%)
WP was never wrong, but MDR was wrong four times
The older (90-days-earlier) versions of the WP pages were statistically significantly worse

Suppiah, A., and J. Cowley. 2008. “Perianal Fistula and Fistula-in-Ano.” Diseases of the Colon & Rectum 51:257-261.

Wikipedia got 3.5 stars, which was fair (not the worst) but slightly below par.

How Does Wikipedia Work?

Wikipedian Personalities

Amichai-Hamburger, Yair, Naama Lamdan, Rinat Madiel, and Tsahi Hayat. 2008. “Personality Characteristics of Wikipedia Members.” CyberPsychology & Behavior 11:679-681.

You have probably heard about this study. That said, the actual study is not as "bad" as people made it out to be. It's only three pages long.

The study interviewed 139 subjects, 69 of whom were Wikipedians.

The paper was designed to test hypotheses about where people locate their "Real Me" (measured with a psychological test).

The results you already know:

Wikipedians were less agreeable, open, and conscientious than non-Wikipedians.
Female Wikipedians were less extroverted than non-Wikipedians

But the differences aren't as strong as you think.

Decentralization in Decision-Making

Forte, Andrea, and Amy Bruckman. 2008. “Scaling Consensus: Increasing Decentralization in Wikipedia Governance.” P. 157 in Proceedings of the 41st Annual Hawaii International Conference on System Sciences.

In-depth interviews with 11 (?) interviewees. (Almost certainly people in the room.)

The result is a short (10-page) paper. Like a lot of qualitative work, it's hard to summarize, but it does a good job of walking through a series of policy issues (for example, those related to BLPs).

Its main argument is about the "decentralization" of policy. It's also framed in terms of theory about how commons are governed that comes from outside the online space (in particular, from Elinor Ostrom).

It backs this up with a discussion of WikiProjects in relation to decentralized policy creation, and some discussion of decentralized implementation and enforcement of policy as well. It's a good, concise, informed description of how policy is made that's worth a read.

Modeling Promotion Decisions

Burke, Moira, and Robert Kraut. 2008b. “Taking up the mop: Identifying future wikipedia administrators.” Pp. 3441-3446 in Proceedings of Conference on Human Factors in Computing Systems. Florence, Italy: Association for Computing Machinery

Burke, Moira, and Robert Kraut. 2008a. “Mopping up: modeling wikipedia promotion decisions.” Pp. 27-36 in Proceedings of the ACM 2008 conference on Computer Supported Cooperative Work. San Diego, CA, USA: ACM

The CSCW paper:

Reviewed all RfAs from January 2006 through October 2007

The model took into account:

strong edit history
varied experience
user interaction
helping with chores
observing consensus (measured by village pump activity)
edit summaries

The results:

helping with chores doesn't seem to help
edit summaries sure do
edit counts sure do matter (a 10% increase in the chance of approval for every 3,800 edits)

The conclusion is that there is a real disconnect between the criteria we claim to use for RfAs and the criteria that actually predict the outcome. When other factors are controlled for, the things we say matter often don't, at least insofar as the authors have correctly operationalized them.

There are also interesting applications:

"The model is fast and easily computable on the fly, and thus could be applied as a self-evaluation tool for editors considering becoming administrators, as a dashboard for voters to view a nominee’s relevant statistics, or as a tool to automatically search for likely future administrators."

That said, the system focuses heavily on easy-to-measure data, which has some important limitations.

Rule Creation

Many academics are surprised by the way that norms have been created over time.

Hoffman, David A., and Salil Mehra. n.d. “Wikitruth Through Wikiorder.” SSRN eLibrary. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1354424 (Accessed August 25, 2009).

Written from the perspective of legal scholars, WikiTruth provides a really solid description of how disputes are handled on Wikipedia, with attention paid to RFCs, Editor Assistance, Wikiquette, informal and formal Mediation, and Arbitration.

It has been theorized that public goods can only be produced by the state and through law and policy. This paper looks at the way that Wikipedia works.

Two key questions:

Why do people spend time editing Wikipedia articles, and why do they care enough about this particular fact to disagree?

Why does Wikipedia have a dispute resolution system that doesn’t resolve disputes?

The data consist of 268 resolved disputes that went through ArbCom, each coded for:

The type of behavior at issue (i.e., antisocial behavior, consensus, policies, impersonation, contempt, and article chaos)
Types of remedies (i.e., thanks, referral to mediation, bans from topics, articles, or WP itself, or measures involving admins)

They use a quantitative statistical model to find out what tends to result in strict punishments.

The results won't surprise anyone who has followed ArbCom, and they basically confirm that it is doing its stated job:

ArbCom avoids engaging in factual disputes
Remedies are highly dependent on the type of behavior
Strict remedies are only very rarely used and only in the case of the most antisocial editors

The article grapples with the fact that most bad editors are kept on.

The conclusions come down to "weeding in" versus "weeding out." Weeding out is easy, and it happens only to the most disruptive editors.

We think of these disruptive, troublesome users as being “weeded in” to the fray. The “weeding in” concept helps to explain the negative correlation between editing violations and being banned from Wikipedia. The site relies on the willingness of individuals to contribute and edit pages voluntarily. The arbitrators seem to want, inasmuch as it is possible, to retain those users who take the initiative to edit. The best result for the community is that violators be warned or subjected to lesser punishments and continue to contribute. By contrast, were the arbitrators to actually resolve disputes, they would strip editors of the motivation to continue to work on improving articles. And were the arbitrators to take it a step further, and actually ban bad editing, they would quickly eat away at the productive core of the project.

How Might It Work Better?

Some papers not only evaluate a process but also propose (and evaluate) a tool designed to highlight it and, in the process, to change how people view or edit Wikipedia. WikiSym tends to publish a bunch of these.

WikiDashboard, by Ed Chi's group, is a great example of this. I recommend folks check it out.

Another cool example of this is Marina Buzzi's work on Wikipedia for blind editors.

Buzzi, Marina, Barbara Leporini, and Caterina Senette. 2008. “Making Wikipedia editing easier for the blind.” Pp. 423-426 in. Lund, Sweden: ACM

Buzzi, Marina, and Barbara Leporini. 2009. “Editing Wikipedia content by screen reader: easier interaction with the Accessible Rich Internet Applications suite.” Disability and Rehabilitation Assistive Technology 4:264-275.

As it turns out, there are some serious problems with getting Wikipedia editing to work for blind users:

Applying formatting with the button bar basically doesn't work
Typing special characters basically doesn't work
Switching between editing and selecting modalities doesn't really work

The authors present a redesigned editing page using the ARIA standard that lets users edit the page with a screen reader much more effectively. There's no user test here, so the evaluation isn't rigorous, but the authors seem pretty confident that their system works well.

Simple English

Den Besten, Matthijs, and Jean-Michel Dalle. 2008. “Keep It Simple: A Companion for Simple Wikipedia?” Industry and Innovation 15:169-178.

An interesting little article with some stronger and weaker points, which I suppose is about normal.

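The authors use the Flesch Reading Ease formula, the sort of readability measure you would find in a word processor, to compute a measure of simplicity (higher scores mean easier text):

$$\text{Score} = 206.835 - 84.6 \cdot \frac{\text{syllables}}{\text{words}} - 1.015 \cdot \frac{\text{words}}{\text{sentences}}$$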

(Finally, someone figured out what simple is actually designed for! ;))
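The score itself is easy to compute once words, sentences, and syllables are counted; the hard part is counting syllables. A minimal sketch, assuming a crude vowel-group heuristic for syllables and naive sentence splitting on `.`, `!`, and `?`. These heuristics and the function names are illustrative assumptions, not the authors' tooling, and they will miscount some words.

```python
import re


def count_syllables(word):
    """Crude heuristic: count vowel groups, treating a final silent 'e' as silent."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "make" has two vowel groups but one syllable
    return max(count, 1)


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    return 206.835 - 84.6 * (syllables / words) - 1.015 * (words / sentences)


def score_text(text):
    """Count sentences, words, and syllables in text, then apply the Flesch formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


if __name__ == "__main__":
    print(round(score_text("The cat sat. The dog ran."), 2))
    print(round(score_text("Comprehensive encyclopedic documentation necessitates meticulous verification."), 2))
```

The first sentence pair scores about 119.19, while the second, jargon-heavy sentence comes out strongly negative. That gap is exactly what the score captures, and also its weakness: it sees only word and sentence length, not whether the vocabulary is familiar.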

They check how well this simplicity score corresponds to whether an article is tagged {{simple}} or not. They were also able to note when articles improved and whether the tags work (they do work, but not that well).

They show a steady decrease in average "simplicity" over time.

They also propose a "companion": essentially, a way of showing editors of Simple certain metadata (or easily computed metrics) while they edit, to help guide human decisions.

Even if you don't think Flesch is a good measure (and I'm skeptical), their model of a companion is a useful direction, and it corresponds to some of the work already being done through JavaScript additions.

This is also one of those papers where I think some good conversations with Wikimedians could really have improved the research. There are lots of papers like this, and the optimistic way to see them is as a good place to start collaborating.