User:Daniel Mietchen/Sandbox/Open Knowledge Conference 2010

From Citizendium
Revision as of 08:42, 31 March 2010 by Daniel Mietchen (updated)

Formatting conversion to TeX is done. Style as per http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0. Still requiring work are the "Open questions" and "Open perspectives" sections as well as references and, possibly, figures. Source pasted in below.


%%%%%%%%%%%%%%%%%%%%%%% file typeinst.tex %%%%%%%%%%%%%%%%%%%%%%%%%
%
% This is the LaTeX source for the instructions to authors using
% the LaTeX document class 'llncs.cls' for contributions to
% the Lecture Notes in Computer Sciences series.
% http://www.springer.com/lncs       Springer Heidelberg 2006/05/04
%
% It may be used as a template for your own input - copy it
% to a new file with a new name and use it as the basis
% for your article.
%
% NB: the document class 'llncs' has its own and detailed documentation, see
% ftp://ftp.springer.de/data/pubftp/pub/tex/latex/llncs/latex2e/llncsdoc.pdf
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\documentclass[runningheads,a4paper]{llncs}

\usepackage{amssymb}
\setcounter{tocdepth}{3}
\usepackage{graphicx}

\usepackage{url}
\urldef{\mailsa}\path|{alfred.hofmann, ursula.barth, ingrid.haas, frank.holzwarth,|
\urldef{\mailsb}\path|anna.kramer, leonie.kunz, christine.reiss, nicole.sator,|
\urldef{\mailsc}\path|erika.siebert-cole, peter.strasser, lncs}@springer.com|    
\newcommand{\keywords}[1]{\par\addvspace\baselineskip
\noindent\keywordname\enspace\ignorespaces#1}

\begin{document}

\hyphenation{wiki-space}

\mainmatter  % start of an individual contribution

% first the title is needed
\title{Collaborative Structuring of Knowledge by Experts and the Public}

% a short form should be given in case it is too long for the running head
\titlerunning{Collaborative Structuring of Knowledge by Experts and the Public}

% the name(s) of the author(s) follow(s) next
%
% NB: Chinese authors should write their first names(s) in front of
% their surnames. This ensures that the names appear correctly in
% the running heads and the author index.
%
\author{Tom Morris\inst{1}%
%\thanks{Please note that the LNCS Editorial assumes that all authors have used
%the western naming convention, with given names preceding surnames. This determines
%the structure of the names in the running heads and the author index.}%
\and Daniel Mietchen\inst{2}}
%
\authorrunning{Tom Morris and Daniel Mietchen}

% the affiliations are given next; don't give your e-mail address
% unless you accept that it will be published
\institute{\url{http://www.citizendium.org/User:Tom_Morris}
\and
\url{http://www.citizendium.org/User:Daniel_Mietchen}}

%\institute{Springer-Verlag, Computer Science Editorial,\\
%Tiergartenstr. 17, 69121 Heidelberg, Germany\\
%\mailsa\\
%\mailsb\\
%\mailsc\\
%\url{http://www.citizendium.org/User:Tom_Morris}}
%\url{http://www.citizendium.org/User:Tom_Morris\\http://www.citizendium.org/User:Daniel_Mietchen\\}}

%
% NB: a more complex sample for affiliations and the mapping to the
% corresponding authors can be found in the file "llncs.dem"
% (search for the string "\mainmatter" where a contribution starts).
% "llncs.dem" accompanies the document class "llncs.cls".
%

\toctitle{Collaborative Structuring of Knowledge by Experts and the Public}
\tocauthor{Tom Morris and Daniel Mietchen}
\maketitle


\begin{abstract}
There is much debate on how public participation and expertise can be brought together in collaborative knowledge environments. One of the experiments addressing the issue directly is Citizendium. Seeking to harvest the strengths (and avoid the major pitfalls) of both user-generated wiki projects and traditional expert-approved reference works, it is a wiki to which anybody can contribute under their real name, while those with specific expertise are given a special role in assessing the quality of content. Upon fulfillment of a set of criteria like factual and linguistic accuracy, lack of bias, and readability by non-specialists, entries are forked into two versions: a stable (and thus citable) approved ``cluster'' (an article with subpages providing supplementary information) and a draft version, the latter to allow for further development and updates. We provide an overview of how Citizendium is structured and what it offers to the open knowledge communities, particularly to those engaged in education and research. Special attention is paid to the structures and processes put in place to provide for transparent governance, to encourage collaboration, to resolve disputes in a civil manner while taking expert opinions into account, and to facilitate navigation of the site and contextualization of its contents.
%\keywords{We would like to encourage you to list your keywords within
%the abstract section}
\end{abstract}


\section{Introduction}

\begin{quote}
{\it Science is already a wiki if you look at it a certain way. It's just a highly inefficient one -- the incremental edits are made in papers instead of wikispace, and significant effort is expended to recapitulate existing knowledge in a paper in order to support the one to three new assertions made in any one paper.} \begin{flushright} John Wilbanks \cite{Wilbanks2009}
\end{flushright}\end{quote}

There are many ways to structure knowledge. One is via coordinated cellular activity in your brain. Others may involve spatial arrangements of sheets of paper or numeric arrangements of digital documents. Here, we focus on the latter. Even there, a multitude of approaches is possible, of which only a few have been tried on a larger scale. Among those are wikis, which allow users to aggregate and interlink diverse sets of knowledge online, following an Open Access approach, i.e.\ at no cost to the reader.

\subsection{Wikis as an example of public knowledge environments online}

As the introductory quote implies, it is probably fair to say that turning science (or any system of knowledge production, for that matter) into a wiki (or a set of interlinked collaborative platforms) would make research, teaching and outreach much more transparent, less prone to hype, and more efficient. Just imagine having a time slider with which you could watch the history of research on general relativity, plate tectonics, self-replication, or cell division unfold, from the earliest ideas of their earliest proponents (and opponents) up to the work of you, your colleagues, and those with whom you compete for grants. So why don't we do it?

Traditionally, given the scope of a particular journal, its audience could reasonably be expected to share knowledge about specialist terms (which may describe completely non-congruent concepts in different fields), methodologies, notations, mainstream opinions, trends, and major controversies, which reduced the need to explain the same things over and over again (cross-disciplinary environments place a higher demand on properly disambiguating the various meanings of a term). Nonetheless, redundancy is still quite visible in journal articles, especially in the abstract and in the introduction, methods, and discussion sections, often in a way characteristic of the authors. Services like eTBLAST and JANE can thus make qualified guesses about the authors of a particular piece of text, with good results if some of the authors have many papers in the respective database (mainly PubMed) and have not changed their individual research scope too often in between.

A manuscript well adapted to the scope of one particular journal is often not very intelligible to readers outside its intended audience, which hampers cross-fertilization with other research fields (we will return to this below). When paper is the sole medium of communication, not much can be done about this limitation. Indeed, we have become so used to it that some do not perceive it as a limitation at all. Similar considerations apply to manuscript formatting. However, the times when paper alone reigned over scholarly communication have certainly passed, and wiki-like platforms provide simple and efficient means of storing information, updating it, and embedding it into a wider context.

Cross-field fertilization is crucial, for example, for interdisciplinary research projects, digital libraries and multi-journal (or indeed cross-disciplinary) bibliographic search engines (e.g.\ Google Scholar), since these dramatically increase the likelihood of, say, a biologist stumbling upon a not primarily biological source relevant to her research (think of shape quantification or growth curves, for instance). What options do we have to systematically integrate such cross-disciplinary hidden treasures with the traditional intra-disciplinary background knowledge and with new insights resulting from research?

The now-classic example of a wiki environment is provided by the Wikipedias, a set of interlinked wikis in multiple languages where basically anyone can edit any page, regardless of subject-matter expertise or command of the respective language. As a consequence of this openness, the larger Wikipedias have a serious problem with vandalism: take an article of your choice and look at its history page for reverts. Most of them will be about neutralizing subtle or blatant forms of destructive edits that do nothing to improve the quality of the article but may reduce it considerably. Few of these malicious edits persist for long \cite{Priedhorsky:2007}, but finding and fixing them takes time that could better be spent on improving articles. Popular topics, for which large numbers of volunteers may be available to correct ``spammy'' entries, are less affected, but it is probably fair to assume that most researchers value their time too much to spend it on repeatedly correcting information that had already been entered correctly. Other problems with covering scientific topics at the Wikipedias include the nebulous notability criteria that an article has to fulfil to avoid being deleted, and the rejection of ``original research'', in the sense of material that has not been peer reviewed before publication. Despite these problems, one scientific journal~-- RNA Biology~-- already requires an introductory Wikipedia article for a subset of the papers it publishes \cite{RNABiol}.

Peer review is indeed a central aspect of scholarly communication, as it paves the way towards the reproducibility that forms one of the foundations of modern science. Yet we know of no compelling reason to believe that it works better before the content concerned has been made public than after (doing it beforehand was merely a practical decision in times when journal space was measured in paper pages). Emerging movements like Open Notebook Science, where claims are linked directly to the underlying data, which are made public as they arise, represent an experiment in this direction; their initial results look promising and call into question ``no original research'' as a valid principle for generating encyclopaedic content.

Although quite prominent at the moment, the Wikipedias are not the only wikis around, and amongst the more scholarly inclined alternatives there are even a number of wiki-based journals, though usually with a very narrow scope and/or a low number of articles. On the other hand, Scholarpedia (which has classical peer review and an ISSN and may thus be counted as a wiki journal, too \cite{Scholarpedia}), OpenWetWare \cite{OpenWetWare}, Citizendium \cite{Citizendium} and the Wikiversities \cite{Wikiversity} are cross-disciplinary and structured (and, for the moment, of a size) such that vandalism and notability are not really a problem. With minor exceptions, real names are required at the first three, and anybody can contribute to entries about anything, particularly in their fields of expertise. None of these comes even close to providing the vast amount of content in the English Wikipedia, but the difference is much less dramatic if one considers only the latter's scholarly useful content. Of these four wikis, only OpenWetWare is explicitly designed to harbour original research, while the others allow it to different degrees. Furthermore, a growing number of yet more specialized scholarly wikis exist (e.g.\ WikiGenes \cite{WikiGenes}, the Encyclopedia of Earth \cite{EoEarth}, the Encyclopedia of Cosmos \cite{EoCosmos}, the Dispersive PDE Wiki \cite{Dispersive-PDE}, or the Polymath Wiki \cite{Polymath}), which can teach us about the usefulness of wikis within specific academic fields.


\section{The Citizendium model of wiki-based collaboration}
Despite the above-mentioned tensions between public participation and expertise in the collaborative structuring of knowledge, it is not unreasonable to expect that these can be overcome by suitably designed public knowledge environments, much as Citizen Science projects involve the public in the generation of scientific data. One approach to such a design is represented by Citizendium. Citizendium was founded by Larry Sanger, the co-founder of Wikipedia, and the two projects share the common goal of providing free knowledge to the public; they are based on variants of the same software platform, and they use the same Creative Commons Attribution-ShareAlike license \cite{CC-BY-SA}\footnote{Since the copyright transfer agreement we had to sign for submitting this paper was not compatible with this license, we did not include figures here. They are, however, included in the talk, and we encourage readers to take a look at the site itself.}. Yet they differ in a number of important ways, such that Citizendium can be seen as a Wikipedia core (stripped down in terms of content, templates, categories and policies) with added elements that are characteristic of the other wiki environments introduced above: a review process leading to stable versions (as at Scholarpedia), an open education environment (as at Wikiversity) and an open research environment (as at OpenWetWare). Nonetheless, assuming that the reader is less familiar with these three latter environments, we follow previous commenters and frame the discussion of Citizendium in terms of properties differentiating it from Wikipedia, specifically the latter's English-language branch \cite{Wikipedia:En}.

\subsection{Real names}
The first of these is simply an insistence on real names. While unusual from a Wikipedia perspective, this is customary in professional environments, including traditional academic publishing and some of the above-mentioned wikis, e.g.\ Scholarpedia and the Encyclopedia of Earth. It certainly excludes a number of legitimate contributors who prefer to remain anonymous, but otherwise it gives participants accountability and allows them to bring their external reputation into the project.

\subsection{Expert guidance}
To compose and develop articles and to embed them in the multimedia context of a digital knowledge environment, expert guidance is important. Of course, many experts contribute to Wikipedia, and the Wikipedias in turn have long been actively seeking expert involvement, yet the possibility of seeing their edits overturned by anonymous users who may lack even the most basic education in the field keeps professionals from spending their precious time on such a project. Citizendium takes a different approach, based on verifying expertise and sometimes termed ``credentialism'', which rests on the common-sense belief that some people do know more than others: it is sometimes the case that the thirteen-year-old kid in Nebraska does know more than the physics professor, but most of the time, at least where matters of physics are concerned, this is not the case. The role of experts at Citizendium is not, as frequently stated in external comments, that of a supreme leader who is allowed to exercise his will on the populace. On the contrary, it is much more about guidance. We use the analogy of a village elder wandering around the busy marketplace \cite{Basar}, who can resolve disputes and whom people respect for their mature judgement, expertise and sage advice. Wikipedia rejects ``credentialism'' in much the same way that the Internet Engineering Task Force (IETF) does. David Clark summarised the IETF process as follows \cite{Clark:1992}: ``We reject kings, presidents and voting. We believe in rough consensus and running code.'' In an open source project or an IETF standardisation project, a great many disputes can be decided with reference to the compiler: if the code does not compile, think again. For rough consensus to emerge under such circumstances, one needs to bring together people who share a clear aim, such as getting two different servers to communicate with one another.
The rough consensus required for producing an encyclopaedia article is different: the article should attempt to put forward what is known, and people disagree on this to a higher degree than computers do on whether a proper connection has been established. It is difficult to reach ``rough consensus and running code'' when two parties are working with completely different epistemological standards. At this point, one needs the advice of the village elders, who vet existing content and provide feedback on how it can be expanded or otherwise improved. Upon fulfillment of a set of criteria like factual and linguistic accuracy, lack of bias, and readability by non-specialists, these vetted entries are forked into two versions: a stable (and thus citable) approved ``cluster'' (an article with subpages providing supplementary information) and a draft version, the latter to allow for further development and updates.
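
The approval fork just described can be sketched in Python as follows. This is a minimal, hypothetical model and not Citizendium's actual software: the names \texttt{Article}, \texttt{ApprovedVersion}, \texttt{approve} and \texttt{CRITERIA} are our own assumptions, the expert assessment is reduced to a set of boolean flags, and the subpages of an approved cluster are omitted for brevity.

```python
# Hypothetical sketch of the approval fork; class, method and criterion
# names are illustrative assumptions, not Citizendium's implementation.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Approval criteria named in the text, encoded as hypothetical boolean flags.
CRITERIA = ("factual_accuracy", "linguistic_accuracy",
            "lack_of_bias", "readable_by_non_specialists")


@dataclass(frozen=True)
class ApprovedVersion:
    """Immutable snapshot of an article: stable and therefore citable."""
    title: str
    text: str
    number: int


@dataclass
class Article:
    """An article whose draft stays editable after each approval."""
    title: str
    draft: str
    approved: List[ApprovedVersion] = field(default_factory=list)

    def edit_draft(self, new_text: str) -> None:
        # Further development and updates happen on the draft only.
        self.draft = new_text

    def approve(self, assessment: Dict[str, bool]) -> ApprovedVersion:
        # Approval requires every criterion to be judged as fulfilled.
        unmet = [c for c in CRITERIA if not assessment.get(c, False)]
        if unmet:
            raise ValueError(f"cannot approve {self.title!r}: unmet {unmet}")
        # Fork: freeze the current draft as a new, numbered approved version.
        snapshot = ApprovedVersion(self.title, self.draft, len(self.approved) + 1)
        self.approved.append(snapshot)
        return snapshot

    def citable_version(self) -> Optional[ApprovedVersion]:
        """Return the latest approved version, or None if never approved."""
        return self.approved[-1] if self.approved else None
```

Because each approval appends a new frozen snapshot, a version that has already been cited is never altered by later edits to the draft; re-approving an improved draft simply yields a newer approved version alongside the old one.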

The respect for experts because of their knowledge of facts is only part of the reasoning: experts point out and correct factual mistakes, but they also help guide the structuring of content, both within an article and by means of its subpages. The experts bring with them the experience and knowledge of years of in-depth involvement with their subject matter, and the project is designed to make best use of this precious resource while still allowing everyone to participate in the process. Of course, experts are likewise free to contribute content, be it within their specialty or in other areas, where others take over the guiding role. Citizendium can also host ``Signed Articles'', which are placed on a subpage alongside the main article. A Signed Article is an article on the topic written by a recognised expert in the field; unlike the main article, it may express opinions and biases.

\subsection{Contextualization}
Citizendium also attempts to structure knowledge differently. Each article on Citizendium can make comprehensive use of subpages, i.e.\ pages providing additional information that are subordinate to the article's main page. Some of these~-- e.g.\ Bibliography, External Links, Video, Code, Catalog, Timeline, Advanced Level, Tutorial or general-purpose Addendum subpages~-- are similar to, but more flexible than, the supplementary online materials now routinely published alongside scholarly articles. Two subpage types are different, their closest analogues in academic papers being keywords and running titles: all pages are encouraged to have a short Definition subpage (around 30 words or 150 characters) which defines or describes the subject of the page, and a comprehensive Related Articles subpage, which uses templates to pull in the definitions from the pages it links to. The Related Articles subpage of ``Biology'', for instance, lists the parent topic of biology (science); its subtopics, i.e.\ subdisciplines of biology like zoology, genetics and biochemistry, articles on the history of biology, and techniques used by biologists; and finally other related topics, including material on the life cycle, biochemical substances like DNA and proteins, the components of the cell, and other specialised terms. This Related Articles page gives a fairly comprehensive contextual introduction to what biology is all about, and it is structured by the authors of the article in a way that is consistent across the site. This goes beyond Wikipedia's categories, ``See also'' sections and ad-hoc infoboxes, and can be considered a next step towards linking encyclopaedic content with the Semantic Web.
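
The interplay of Definition and Related Articles subpages can be illustrated with a small Python sketch. It is a hypothetical data model, not Citizendium's code: the \texttt{Page} class and the functions \texttt{definition\_ok} and \texttt{render\_related} are our own assumptions, the template mechanism is imitated by looking up each linked page's definition at rendering time, and the roughly 150-character guideline from the text is treated as a hard cap purely for illustration.

```python
# Hypothetical sketch of Definition and Related Articles subpages; the data
# model and names are illustrative assumptions, not Citizendium's code.
from dataclasses import dataclass, field
from typing import Dict, List

# Rough guideline from the text (about 150 characters), used as a hard cap here.
MAX_DEFINITION_CHARS = 150


@dataclass
class Page:
    title: str
    definition: str = ""  # content of the Definition subpage
    # Related Articles subpage: heading -> titles of linked pages
    related: Dict[str, List[str]] = field(default_factory=dict)


def definition_ok(page: Page) -> bool:
    """True if the Definition subpage is non-empty and within the cap."""
    return 0 < len(page.definition) <= MAX_DEFINITION_CHARS


def render_related(wiki: Dict[str, Page], title: str) -> str:
    """Imitate template transclusion: list each linked page with its definition."""
    lines: List[str] = []
    for heading, titles in wiki[title].related.items():
        lines.append(heading)
        for linked_title in titles:
            linked = wiki.get(linked_title)
            text = linked.definition if linked and linked.definition else "(no definition yet)"
            lines.append(f"  {linked_title}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample pages and definitions, for illustration only.
    wiki = {
        "Biology": Page("Biology", "The science of life.",
                        {"Parent topics": ["Science"],
                         "Subtopics": ["Genetics", "Zoology"]}),
        "Science": Page("Science", "Systematic study of the natural world."),
        "Genetics": Page("Genetics", "The study of heredity and genes."),
    }
    print(render_related(wiki, "Biology"))
```

Since definitions are looked up each time a Related Articles page is rendered rather than copied into it, editing one Definition subpage would, in this sketch, update every Related Articles page that links to it, which is the consistency benefit of pulling definitions in via templates.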

Subpages are one way in which Citizendium is attempting to go beyond what is provided by either traditional paper-based encyclopaedias or Wikipedia: to engage with context and with related forms of knowledge, and to emancipate knowledge from the page format to which it was confined in the print era. Marx wrote that ``Philosophers have hitherto only interpreted the world in various ways; the point is to change it''. Traditional encyclopaedias attempt to reflect the world, but we are attempting to go further. The open science movement, which has formed around providing open access to journal articles, making scientific data more openly available in raw form, using and sharing open source software, and experimenting with some of the new techniques emerging from the community gathered under the `Web 2.0' banner, is exploring the edge of what scientists can now do to create new knowledge. Some of the online engagement by academics has brought actual research benefits, while some has merely been public relations for universities, such as podcasts produced to sound `relevant'. The Citizendium model, while somewhat more traditional than some of the open science platforms, is willing to try a variety of new things. Wikipedia has produced a fairly good first version of a collaboratively written encyclopaedia; the challenge is to see whether we can go further, producing a citizens' compendium of structured and comprehensive knowledge and updating it as new evidence or insights arise.

\subsection{Open governance}
Citizendium has an evolving, but hopefully soon clearly defined, governance process: a Charter that will allow for democratic governance and oversight is currently being drafted by an elected group of writers. The broad outline is as follows. A democratically elected Editorial Council will deal with content policy and resolve disputes regarding content, while a Management Committee will be responsible for everything not related to content. The Management Committee will appoint Constables, who uphold community policy regarding behaviour. Disputes with the Constables can be brought to an Ombudsman selected by the Editorial Council and the Management Committee. The Charter is still to be ratified by the community. We are pursuing this structure because, although bureaucracy and democracy come at a cost, we believe that the benefits of an open governance process outweigh it. Real-life communities suffer when their governments are controlled by shadowy cabals invoking byzantine legal codes, and the same problems would seem to apply to online communities. In debates about Wikipedia articles, the issue rarely seems to be the content itself; all too often, the debates are ritualized arguments framed in acronyms (AfD, NPOV, CSD, ArbCom, OR, etc.). Any knowledge-based community faces the challenge of reconciling a fair and democratic process with a meritocratic respect for expertise. There are no easy answers: if we go too far towards bureaucracy, we risk creating a system in which management is separated from the actual day-to-day writing of the site, while if we let the site `manage itself', we risk creating a rather conservative mob rule that does not afford due process to interested outsiders.
A more traditional management structure, combined with real names and civility, should enable those outside the online community (the many real-life experts who work in universities, in business and in public life) to participate. Hopefully, if we get the governance decisions right, we will also avoid getting in the way of the people who engage with Citizendium as hobbyists.

\subsection{Open education}
An important part of the governance process is collaboration with those outside Citizendium. We have a long-standing project called Eduzendium, which allows educators in higher education to have their students work on articles as part of a course. Most recently, we have had politics students from Illinois State University working on articles about pressure groups in American public life, as well as medical students from Edinburgh, biology students from the City University of New York and the University of Colorado at Boulder, finance students from Temple University, and others. These courses usually reserve a batch of articles for one semester and assign each article to one or more students. The course leader can either reserve the articles exclusively for the group or have the students work on them alongside editors on the site.

There are still a number of challenges and opportunities:
\begin{itemize}
\item In many fields, Citizendium does not yet meet its own standard of offering reliable expert content, although the desired atmosphere of cooperation works well.
\end{itemize}

\section{Open questions}
\begin{itemize}
\item How to motivate registered users to contribute
\item How to motivate more users to register
\item How to allow feedback by non-registered users
\item How to codify the policies (and especially the subpages system) into a MediaWiki extension
\item Financial perspectives
\end{itemize}

\section{Open perspectives}
\begin{itemize}
\item Contextualization
\item Potential for mutually beneficial partnerships with projects on similar wavelengths, e.g.\ AcaWiki \cite{AcaWiki} for references, OpenWetWare for primary research, and Open Access publishers as possible content providers
\end{itemize}

\subsubsection*{Acknowledgments.} The authors wish to thank Russell D. Jones, Howard C. Berkowitz, Steven Mansour and Peter Schmitt for critical comments on earlier versions of this draft as well as Claudia Koltzenburg, Fran\c{c}ois Dongier and Charles van den Heuvel for helpful discussions. 

\begin{thebibliography}{99}

\bibitem{AcaWiki} AcaWiki,\\ \url{http://acawiki.org/} \\All URLs referenced in this article were functional as of March 31, 2010.

\bibitem{Citizendium} Citizendium,\\ \url{http://www.citizendium.org/}

\bibitem{Clark:1992} Clark, D.: Plenary lecture, ``A Cloudy Crystal Ball~-- Visions of the Future'', Proc. 24th IETF: 539 (1992),\\ \url{http://www.ietf.org/proceedings/prior29/IETF24.pdf}

\bibitem{CC-BY-SA} Creative Commons Attribution-ShareAlike License 3.0,\\ \url{http://creativecommons.org/licenses/by-sa/3.0/}

\bibitem{DOAJ} Directory of Open Access Journals,\\ \url{http://www.doaj.org/}

\bibitem{Dispersive-PDE} Dispersive PDE Wiki,\\ \url{http://tosio.math.utoronto.ca/wiki/}

\bibitem{EoCosmos}  Encyclopedia of Cosmos,\\ \url{http://www.cosmosportal.org/}

\bibitem{EoEarth} Encyclopedia of Earth,\\ \url{http://www.eoearth.org}

\bibitem{OpenWetWare} OpenWetWare,\\ \url{http://www.openwetware.org/}

\bibitem{Polymath} Polymath Wiki,\\ \url{http://michaelnielsen.org/polymath1/}

\bibitem{Priedhorsky:2007} Priedhorsky, R., Chen, J., Lam, S.T.K., Panciera, K., Terveen, L., et~al.: Creating, Destroying, and Restoring Value in Wikipedia. In: GROUP '07: Proceedings of the 2007 International ACM Conference on Supporting Group Work, pp. 259--268. ACM, New York (2007),\\ \url{http://doi.acm.org/10.1145/1316624.1316663}

\bibitem{Basar} Raymond, E.S.: The Cathedral and the Bazaar,\\ \url{http://www.catb.org/~esr/writings/homesteading/}
 
\bibitem{RNABiol} RNA Biology, Guidelines for the RNA Families Track,\\ \url{http://www.landesbioscience.com/journals/rnabiology/guidelines/}

\bibitem{Scholarpedia} Scholarpedia,\\ \url{http://www.scholarpedia.org/}

\bibitem{WikiGenes} WikiGenes,\\ \url{http://www.wikigenes.org/} 

\bibitem{Wikipedia:En} English Wikipedia,\\ \url{http://en.wikipedia.org} 

\bibitem{Wikiversity} Wikiversity,\\ \url{http://www.wikiversity.org} 

\bibitem{Wilbanks2009} Wilbanks, J.: Publishing science on the web,\\ \url{http://scienceblogs.com/commonknowledge/2009/07/publishing_science_on_the_web.php}

\end{thebibliography}

\end{document}