User talk:Thomas Wright Sulcer/sandbox7: Difference between revisions

From Citizendium
imported>Thomas Wright Sulcer
(fixing)
imported>Daniel Mietchen
(basic cleanup)
 
(23 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Image|Panton-Principles-Drafters-Fractal-Trace.png|right|350px|The drafters of the Panton Principles in front of the [[Panton Arms]] pub. In the spirit of the Principles, this image is a [[derivative work]] from the [[:Image:Panton-Principles-Drafters-Original.png|original photo]], which is in the [[Public Domain]].}}
==={{pl|Panton Principles}}===
The '''Panton Principles''' are recommendations for scientists releasing [[science|scientific]] [[data (general)|data]], urging them to attach a simple identifying label to the released data that indicates their wishes concerning its future use. The basic idea is to promote a simple standard notification for scientists to use when releasing data. The notification says, in effect, that other scientists can use the data without worrying about [[copyright]] issues. The aim is to promote sharing of scientific data. The name ''Panton Principles'' (sometimes abbreviated as '''PP''') comes from the [[Panton Arms]] pub in [[Cambridge, UK|Cambridge]], [[United Kingdom|UK]], where the principles were officially drafted in September 2009.
Already started. Further possible additions below.
 
==Background==
Scientific data is different from other types of information such as encyclopedia articles in [[Citizendium]], or government [[census]] information, or pictures on [[Flickr]], or [[baseball]] [[statistics]], or patterns of [[Internet]] searches such as [[SERP]] results. Rather, scientific data can help advance human knowledge, possibly leading to valuable breakthroughs which may, in turn, lead to new technologies or capabilities. A page of numbers may reveal a cure for [[cancer]] or a way to build a vehicle which defies [[gravity]], or reveal how to decipher an ancient [[language]]. It's possible that a data set contains information which the initial researcher missed, but which is potentially valuable. There may be a temptation to withhold information until it is thoroughly investigated. It's possible, as well, that a scientist will selectively pick and choose only certain nuggets of information, while avoiding others which don't support the pre-ordained conclusions, to make a case for a specific hypothesis; revealing the entire set may allow other researchers to use their own data to prove them wrong. At the same time, scientists have great leeway in determining when, how, and whether to release their data.
 
Scientific data can be highly volatile, in a sense. It's possible that it could be re-used for unethical purposes such as making dangerous [[weapon|weapons]] or [[chemistry|chemicals]], or re-purposed to some new task unintended by the original data collector. Scientists, like all [[academia|academics]], are under pressure to publish their results and build their reputations by adding to the storehouse of human knowledge. While it's a fair statement that many scientists expect others to use their data without permission and without acknowledgement of their contribution, and that most scientists release their data to maintain credibility in the scientific community, some scientists have a different view. It is factors such as these which led to the declaration.<ref name=twsMAR23swaaq>{{cite news
|author= petemr's blog
|title= The Panton Principles: A breakthrough on data licensing for public science?
|publisher= Unilever Cambridge Centre for Molecular Informatics
|date= 2009-05-16
|url= http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1939
|accessdate= 2010-03-23
}}</ref>


==Principles==
Accordingly, for a variety of reasons, some scientists withhold their data. Further, scientists coming across data which is in the [[public sphere]] may face uncertainty about whether they are allowed to access it, use it, study it further, or base new studies on it. It is enough of an issue that many scientists issued a statement known as the ''Panton Principles''.
But the big incentive for the declaration is for scientists who come across data which has been used in previous studies. Is the data acceptable to use? Are there limitations on what it can be used for? Will using a specific set of data generate legal issues? Since many of these issues are unclear, scientists coming across data in the [[public sphere]] may face uncertainty about whether they are allowed to access it, use it, study it further, or base new studies on it. It is enough of an issue that a group of scientists has issued a statement known as the ''Panton Principles''.
 
Generally the principle of [[validation]] makes it necessary to reveal data as a matter of accepted practice. This allows other scientists to replicate the original results by conducting independent follow-up studies to see if the original conclusions are valid.
 
What the drafters of the Panton Principles are suggesting is that when scientists release data, they attach a statement or marker which describes their wishes regarding the future use of the data. The idea is to come up with an easily understood tag applicable to all data they choose to release, so that others who come across the data will be able to understand what the data creator's intentions were when releasing the data. The hope is, of course, that all data might be freely used for any purpose, but the tag enables this to be more readily understood.


There have been concerns that current license formats such as the Public Domain Dedication and License (PDDL) and the Creative Commons CC0 are complex. And the movement favoring the Panton Principles is, in some respects, a way to simplify matters. One scientist explained that the benefit of declaring data "open" is that it makes it possible for subsequent researchers to use it freely, without fear or anxiety or uncertainty:
What the drafters of the Panton Principles want is that when scientists release data, they attach a statement or marker which describes the wishes of the originating scientist regarding the future use of the data. The idea is to come up with an easily understood tag applicable to all data they choose to release, so that others who come across the data will be able to understand what the data creator's intentions were when releasing the data. The hope is, of course, that all data might be freely used for any purpose, but the tag enables this to be more readily understood.


There are concerns that current license formats such as the Public Domain Dedication and License (PDDL) and the Creative Commons CC0 are complex from a legal standpoint. And the movement favoring the Panton Principles is, in some respects, a way to simplify matters. One scientist explained that the benefit of declaring data "open" is that it makes it possible for subsequent researchers to use it freely, without [[fear]] or [[anxiety]] or uncertainty:
<blockquote>The biggest danger is NOT making the assertion that the data is Open. There may be second-order problems from CC0 or PPDL but they are nothing compared to the uncertainty of NOT making this simple assertion. Do not try to be clever and use SA, NC or other restricted licenses. Simply state the data are Open.<ref name=twsMAR23swaaq/></blockquote>
[[Image:Panton Arms Pub in Cambridge UK.jpg|thumb|350px|right|alt=A building.|The [[Panton Arms]] pub in [[Cambridge]], [[United Kingdom]] where the Panton Principles were drawn up.]]


Blogger Walter Jessen explained:
Walter Jessen explained:
<blockquote>Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open. By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.<ref name=twsMAR23a544>{{cite news
  |author= Walter Jessen
Line 37: Line 21:
}}</ref></blockquote>


As of March 2010, the Panton Principles are an Internet-based initiative calling for scientists to make an "explicit and robust statement" regarding their wishes for the data by using a "recognized waiver or license that is appropriate for data."<ref name=twsMAR23av33>{{cite news
As of March 2010, the Panton Principles are an Internet-based initiative calling for a scientist to make an "explicit and robust statement" regarding his or her wishes for the data by using a "recognized waiver or license that is appropriate for data."<ref name=twsMAR23av33>{{cite news
  |author= Bill Hooker
  |title= Panton Principles for Open Data in Science
Line 55: Line 39:
[[Image:Panton Arms Signers.jpg‎|thumb|350px|left|alt=People in front of a building.|The signers of the Panton Principles in September 2009 included (from left:) Jenny Meyer, Jordan Hatcher, Rufus Pollock, John Wilbanks, Cameron Neylon, Peter Murray-Rust, Carolina Rossini.]]


An Internet search of major newspapers and magazines, including the ''[[New York Times]]'' and ''[[BBC News]]'', using the search term "Panton Principles" did not find any results on March 23, 2010, although some Internet sites dealing with scientific issues have posted comments about the initiative.
An Internet search of about twenty major newspapers and magazines, including the ''[[New York Times]]'' and ''[[BBC News]]'', using the search term "Panton Principles" did not find any results on March 23, 2010, although some Internet sites dealing with scientific issues have posted comments about the initiative.
 
Here are the four principles:


# When publishing data make an explicit and robust statement of your wishes.
Line 71: Line 57:
}}</ref> But the idea of using a "waiver or license appropriate for data" was, in the view of this blogger, "debatable", particularly when it came to the possibility of mixing data sets, and he prefers the [[copyleft]] license approach. He did not like the non-commercial restrictive clause since, in his view, it does not make things easier, and he prefers [[public domain]] via the PDDL or CCZero licenses.<ref name=twsMAR234qqw/>


One source credits the launch of the Panton Principles to [[Jonathan Gray]].<ref name=twsMAR239iio>{{cite news
The statement grew out of discussion between many scientists, although one source credits the launch of the Panton Principles to [[Jonathan Gray]].<ref name=twsMAR239iio>{{cite news
  |author= Cameron Neylon
  |title= The Panton Principles: Finding agreement on the public domain for published scientific data
Line 84: Line 70:
==References==
{{reflist}}
----
==Possible further articles==
==={{pl|Data sharing}}===
[[Image:JKepler.png|thumb|240px|left|alt=Picture of a portrait of a man.|[[Johannes Kepler]] used data measurements from [[Tycho Brahe]] to develop three fundamental laws of planetary motion.]]
There's a well-known example from the history of [[astronomy]] in which one scientist took the data from another in a totally new direction. The [[Denmark|Danish]] scientist [[Tycho Brahe]] (1546-1601) worked tirelessly to make measurements of [[planet|planetary]] [[parallax]] that were accurate to the arcminute. By systematic and rigorous observation, night after night, Brahe amassed a comprehensive set of data detailing the positions of the planets and [[star (astronomy)|stars]]. But Brahe was unable to fit his data into a comprehensive [[scientific theory|theory]]. After Brahe's death, fellow scientist [[Johannes Kepler]], who succeeded him as ''[[Mathematicus Imperialis]]'' (imperial mathematician) at the court of Emperor [[Rudolph II]] in [[Prague]], [[Bohemia]], used Brahe's data to work out the three laws of planetary motion, including the fact that planets move in [[ellipse|elliptical]] [[orbit|orbits]], not [[circle|circular]] ones.
Scientists may have reasons &mdash; actual or perceived &mdash; to withhold or delay the release of data: For instance, they may need time to investigate data fully to remove artifacts or to prevent valuable information from being overlooked. The data may also contain the seed for a [[scientific paper|publication]], [[patent]] or [[business model]], or private information about patients. It is possible, as well, that a scientist may selectively pick and choose data which supports a given conclusion while ignoring outliers, perhaps to make a case for a specific hypothesis. In such an instance, revealing the entire data set may allow other researchers to use their own data to prove them wrong. Conversely, making data available as they arise lends additional credit to the researchers involved, and making it available in a reusable form (i.e. in some standard format and with proper annotations) may allow others to build upon their work even ahead of formal publication.
==={{pl|Scientific data}}===
Started.
==={{pl|Open data}}===
Sometimes data can be used for different purposes by different scientists. While data is often released on the Internet, it's sometimes unclear what guidelines apply as to how the data can be used or whether there are [[copyright]] restrictions. Accordingly, a group of scientists in [[Cambridge, U.K.|Cambridge]], [[United Kingdom|U.K.]] in a pub called the [[Panton Arms]] wrote in September 2009 a set of guidelines called the [[Panton Principles]]. The idea behind this effort is that a scientist, releasing data into the public, can attach a tag to the data indicating that the data is free to use and is not subject to copyright restrictions. Hopefully this will enable future scientists to use data freely without anxiety about any possible [[law|legal]] repercussions.

Latest revision as of 11:11, 25 March 2010

Stub Panton Principles

Already started. Further possible additions below.

Principles

But the big incentive for the declaration is for scientists who come across data which has been used in previous studies. Is the data acceptable to use? Are there limitations on what it can be used for? Will using a specific set of data generate legal issues? Since many of these issues are unclear, scientists coming across data in the public sphere may face uncertainty about whether they are allowed to access it, use it, study it further, or base new studies on it. It is enough of an issue that a group of scientists has issued a statement known as the Panton Principles.

What the drafters of the Panton Principles want is that when scientists release data, they attach a statement or marker which describes the wishes of the originating scientist regarding the future use of the data. The idea is to come up with an easily understood tag applicable to all data they choose to release, so that others who come across the data will be able to understand what the data creator's intentions were when releasing the data. The hope is, of course, that all data might be freely used for any purpose, but the tag enables this to be more readily understood.

There are concerns that current license formats such as the Public Domain Dedication and License (PDDL) and the Creative Commons CC0 are complex from a legal standpoint. And the movement favoring the Panton Principles is, in some respects, a way to simplify matters. One scientist explained that the benefit of declaring data "open" is that it makes it possible for subsequent researchers to use it freely, without fear or anxiety or uncertainty:

The biggest danger is NOT making the assertion that the data is Open. There may be second-order problems from CC0 or PPDL but they are nothing compared to the uncertainty of NOT making this simple assertion. Do not try to be clever and use SA, NC or other restricted licenses. Simply state the data are Open.[1]

The Panton Arms pub in Cambridge, United Kingdom where the Panton Principles were drawn up.

Walter Jessen explained:

Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open. By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.[2]

As of March 2010, the Panton Principles are an Internet-based initiative calling for a scientist to make an "explicit and robust statement" regarding his or her wishes for the data by using a "recognized waiver or license that is appropriate for data."[3] The call for a set of principles stems in part from a sense that "many widely recognized licenses are not intended for ... data". Licenses such as "Creative Commons" have been described as unsuitable for scientific data.[4]

The signers of the Panton Principles in September 2009 included (from left:) Jenny Meyer, Jordan Hatcher, Rufus Pollock, John Wilbanks, Cameron Neylon, Peter Murray-Rust, Carolina Rossini.

An Internet search of about twenty major newspapers and magazines, including the New York Times and BBC News, using the search term "Panton Principles" did not find any results on March 23, 2010, although some Internet sites dealing with scientific issues have posted comments about the initiative.

Here are the four principles:

  1. When publishing data make an explicit and robust statement of your wishes.
  2. Use a recognized waiver or license that is appropriate for data.
  3. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
  4. Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.
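
As a rough illustration of the first, second and fourth principles, the explicit statement could travel with the data as a small machine-readable record. The following is only a minimal sketch: the function name, the field names, and the sample dataset are hypothetical and are not part of the Principles. The identifiers "CC0-1.0" and "PDDL-1.0" are the SPDX short names for CCZero and the PDDL.

```python
import json

# The two public domain dedications recommended by the fourth principle,
# keyed by their SPDX short identifiers. This mapping is illustrative only.
RECOMMENDED_DEDICATIONS = {
    "CC0-1.0": "Creative Commons Zero (CCZero)",
    "PDDL-1.0": "Public Domain Dedication and License (PDDL)",
}


def make_data_statement(title, creator, dedication_id):
    """Build an explicit statement of the creator's wishes for a dataset.

    Hypothetical helper: it refuses anything other than the two
    recommended dedications, for example non-commercial licenses.
    """
    if dedication_id not in RECOMMENDED_DEDICATIONS:
        raise ValueError(
            f"{dedication_id!r} is not one of the recommended dedications: "
            f"{sorted(RECOMMENDED_DEDICATIONS)}"
        )
    return {
        "title": title,
        "creator": creator,
        "license": dedication_id,
        "license_name": RECOMMENDED_DEDICATIONS[dedication_id],
        "statement": "This data is Open and dedicated to the public domain.",
    }


# Example use with made-up dataset details.
statement = make_data_statement("Example measurements", "A. Researcher", "CC0-1.0")
print(json.dumps(statement, indent=2))
```

A record like this could be saved next to the data files so that anyone who comes across the data can see the creator's intentions without having to ask.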

Why are the principles necessary? A post-doctoral student in Sweden described in a blog post being perplexed on finding useful data that came without any explicit information about what could be done with it. Contacting the creators of the data for permission is cumbersome and slow, and there is the possibility that the initial author of the data is "missing in action". An explicit statement is much preferred.[5] But the idea of using a "waiver or license appropriate for data" was, in the view of this blogger, "debatable", particularly when it came to the possibility of mixing data sets, and he prefers the copyleft license approach. He did not like the non-commercial restrictive clause since, in his view, it does not make things easier, and he prefers public domain via the PDDL or CCZero licenses.[5]

The statement grew out of discussion between many scientists, although one source credits the launch of the Panton Principles to Jonathan Gray.[6] Cameron Neylon described how the principles came about:

The Principles came out of a discussion in the Panton Arms a pub near to the Chemistry Department of Cambridge University ... Where we found agreement was that for science, and for scientific data, and particularly science funded by public investment, that the public domain was the best approach and that we would all recommend it. ... placing data explicitly, irrevocably, and legally in the public domain satisfies both the Open Knowledge Definition and the Science Commons Principles for Open Data was something that we could all personally sign up to. The end result is something that I have no doubt is imperfect ... Above all, it is a start.[6]

References

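  1. petemr's blog. The Panton Principles: A breakthrough on data licensing for public science?, Unilever Cambridge Centre for Molecular Informatics, 2009-05-16. Retrieved on 2010-03-23.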
  2. Walter Jessen. The Panton Principles for Open Data in Science, Next Generation Science, February 19, 2010. Retrieved on 2010-03-23.
  3. Bill Hooker. Panton Principles for Open Data in Science, Science Commons Symposium, 2010-03-23. Retrieved on 2010-03-23.
  4. The Panton Principles for Open Data in Science. Connected Knowledge (February 19th, 2010). Retrieved on 2010-03-23. “Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged.”
  5. Egon Willighagen. Panton Principles, Egon Willighagen's Blog, February 19, 2010. Retrieved on 2010-03-23.
  6. Cameron Neylon. The Panton Principles: Finding agreement on the public domain for published scientific data, Science in the Open (blog), 22 February 2010. Retrieved on 2010-03-23. “The launch of the Panton Principles, many months after they were first suggested is really largely down to the work of Jonathan Gray. This was one of several projects that I haven’t been able to follow through properly on and I want to acknowledge the effort that Jonathan has put into making that happen.”

Possible further articles

Stub Data sharing

Johannes Kepler used data measurements from Tycho Brahe to develop three fundamental laws of planetary motion.

There's a well-known example from the history of astronomy in which one scientist took the data from another in a totally new direction. The Danish scientist Tycho Brahe (1546-1601) worked tirelessly to make measurements of planetary parallax that were accurate to the arcminute. By systematic and rigorous observation, night after night, Brahe amassed a comprehensive set of data detailing the positions of the planets and stars. But Brahe was unable to fit his data into a comprehensive theory. After Brahe's death, fellow scientist Johannes Kepler, who succeeded him as Mathematicus Imperialis (imperial mathematician) at the court of Emperor Rudolph II in Prague, Bohemia, used Brahe's data to work out the three laws of planetary motion, including the fact that planets move in elliptical orbits, not circular ones.

Scientists may have reasons — actual or perceived — to withhold or delay the release of data: For instance, they may need time to investigate data fully to remove artifacts or to prevent valuable information from being overlooked. The data may also contain the seed for a publication, patent or business model, or private information about patients. It is possible, as well, that a scientist may selectively pick and choose data which supports a given conclusion while ignoring outliers, perhaps to make a case for a specific hypothesis. In such an instance, revealing the entire data set may allow other researchers to use their own data to prove them wrong. Conversely, making data available as they arise lends additional credit to the researchers involved, and making it available in a reusable form (i.e. in some standard format and with proper annotations) may allow others to build upon their work even ahead of formal publication.

Stub Scientific data

Started.

Stub Open data

Sometimes data can be used for different purposes by different scientists. While data is often released on the Internet, it's sometimes unclear what guidelines apply as to how the data can be used or whether there are copyright restrictions. Accordingly, a group of scientists in Cambridge, U.K. in a pub called the Panton Arms wrote in September 2009 a set of guidelines called the Panton Principles. The idea behind this effort is that a scientist, releasing data into the public, can attach a tag to the data indicating that the data is free to use and is not subject to copyright restrictions. Hopefully this will enable future scientists to use data freely without anxiety about any possible legal repercussions.