Semantic Web: Difference between revisions

Revision as of 22:20, 8 August 2010


This article is currently being developed as part of an Eduzendium student project. The course homepage can be found at CZ:Special_Topics_2010.
To provide students with experience in collaboration, you are warmly invited to join in here, or to leave comments on the discussion page. The anticipated date of course completion is 13 August 2010. One month after that date at the latest, this notice shall be removed.


This article is a stub and thus not approved.

Overview

The Semantic Web (often referred to as Web 3.0[1]) is a concept, first named by Tim Berners-Lee, for a "web of knowledge" in which data on the World Wide Web, whether in structured data stores or loosely structured documents, would be annotated and classified so that machines can infer relationships from semantic information (that is, from what the content means) rather than simply from matching text strings.[2] There is also a W3C standards effort[3] related to this concept.

To associate meaning with content, the Semantic Web uses structures for identifying, categorizing, and linking data. While a web page about soccer might specify how pictures and text should be arranged, what colors and fonts to use, and other presentation data, a comparable Semantic Web document would convey that the data pertains to the sport of soccer and might hold a list of teams, scores of recent matches, and other data in categorized containers. This representation allows other consumers of the data (mainly programs) to parse and use it in meaningful ways. Modern web crawlers must catalogue, index, and apply a certain amount of artificial intelligence to derive the meaning of documents on the web; the Semantic Web instead allows data to be parsed directly for meaning, ultimately making information easier to share and discover.

One challenge facing the Semantic Web is that it must transmit not only data but also the associated metadata. Metadata is descriptive information about data, such as its type and its relationships to other data. To provide a flexible framework capable of carrying many different types of data along with their meaning and relationships, the Semantic Web integrates metadata into the format itself. Consumers can therefore process dynamic and unpredictable data formats and types by using the embedded metadata to parse and understand the data and its inter-relationships.[4]

What differentiates the Semantic Web from existing data structures is the use of URIs to uniquely identify things and the relationships between things. The problems that Semantic Web technologies try to solve are those involving multiple disparate sources of data: for instance, hooking together train timetables and class timetables so that a student can automatically plan a travel itinerary without having to match the data manually.
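
The timetable scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `http://example.org/` namespace, the predicate names (`arrivesAt`, `heldAt`, and so on), and the sample times are all hypothetical. It shows how two independently published data sets can be joined simply because both use the same URI for the same place.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def uri(name):
    """Build a full URI inside the hypothetical example namespace."""
    return EX + name


# Source 1: a train timetable, as (subject, predicate, object) triples.
train_timetable = [
    (uri("train/101"), uri("arrivesAt"), uri("place/campus")),
    (uri("train/101"), uri("arrivalTime"), "08:40"),
    (uri("train/205"), uri("arrivesAt"), uri("place/campus")),
    (uri("train/205"), uri("arrivalTime"), "09:50"),
]

# Source 2: a class timetable published by someone else.
class_timetable = [
    (uri("class/rdf-intro"), uri("heldAt"), uri("place/campus")),
    (uri("class/rdf-intro"), uri("startTime"), "09:00"),
]


def value(triples, subject, predicate):
    """Return the first object stored for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def trains_in_time(class_uri):
    """Find trains that reach the class location before the class starts."""
    merged = train_timetable + class_timetable  # merging is just concatenation
    place = value(merged, class_uri, uri("heldAt"))
    start = value(merged, class_uri, uri("startTime"))
    result = []
    for s, p, o in merged:
        if p == uri("arrivesAt") and o == place:
            arrival = value(merged, s, uri("arrivalTime"))
            # Zero-padded HH:MM strings compare correctly as text.
            if arrival is not None and arrival <= start:
                result.append((s, arrival))
    return result


print(trains_in_time(uri("class/rdf-intro")))
```

Because both sources name the campus with the same URI, no manual matching step is needed before the join.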

The Semantic Web is closely tied to microformats, which are an alternative way to embed meaning into HTML documents. Microformats use standard HTML tags along with generally agreed-upon conventions for attributes in order to delineate certain data within documents. For instance, microformats can be used to embed contact data or calendar data in web pages for easy integration with other programs. Users of popular calendaring or contact management software can then simply click on elements within web pages and import calendar events or contacts directly into their calendaring or address book software.[5]
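
As a rough illustration of how a program might consume contact data marked up this way, the following Python sketch pulls a few fields out of an hCard-style fragment using only the standard library. The sample markup and the `HCardParser` class are hypothetical, and the sketch handles only flat, single-valued fields; a real microformats parser covers far more cases.

```python
from html.parser import HTMLParser

# A small subset of hCard property names, used here as class values.
HCARD_FIELDS = {"fn", "tel", "email"}


class HCardParser(HTMLParser):
    """Collects the text of elements whose class names an hCard field."""

    def __init__(self):
        super().__init__()
        self.contact = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_FIELDS:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.contact[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


# Hypothetical page fragment: ordinary HTML tags, with meaning carried
# by the agreed-upon class names.
html = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="tel">555-0100</span>
</div>
"""

parser = HCardParser()
parser.feed(html)
print(parser.contact)
```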

The W3C have put forward a variety of standards built on top of the Resource Description Framework, a formal semantic model for representing things and the relationships between them.

Competing Visions

The "Semantic Web" concept has evolved under competing understandings and visions. Historically, Cyc and the Knowledge Interchange Format (KIF) sought to provide a technological backbone for a similarly grand vision, held by Artificial Intelligence researchers, of universal knowledge acquisition.[6] Apple's Knowledge Navigator represented a vision of networked hypertext with intelligent agent mediators. The Semantic Web was conceived with these goals in mind, but it aims to extend and embed existing technologies in the WWW stack while formally defining and standardizing the coordination of data exchange underlying the web.

These quite different sources of inspiration have contributed to confusion about the meaning of the Semantic Web. On the one hand, the Semantic Web aims to create a machine-readable web through the coordinated linking of data and knowledge, such that intelligent agents could be devised to provide precise answers to queries of arbitrary depth and nuance. On the other hand, it also seeks to improve human interaction and the traditional linking and searching of the web by incrementally imparting connections between individual pieces of data embedded in documents and realized as "micro-transactions" of web activity, conventionally stored in relational databases.

Under the latter perspective, the Semantic Web was developed to meet a specific deficiency in web-based communications. Although well defined in RFCs, HTML is architected to exchange information that is delimited and optimized for presentation. That is, HTML is designed to communicate the appearance of documents within web browsers. This is useful when creating a document that will render in the same form across multiple platforms (or web browsers) but is problematic for transmitting the meaning of data. A few HTML features (notably META tags and other document head elements[7]) convey meaning, but these are precious few.

Recent efforts have largely focused along a spectrum of both perspectives; that is, on the mechanical inference of relationships between particular data in tightly-coupled yet loosely-connected ontological domains.[8]

There is a final perspective that focuses less on the machine-readable component of the Semantic Web (linking data in terms of relationships) than on the universal metadata cataloging and tagging of existing documents and data. This perspective has received less attention, especially as advanced indexing and search tools, both Google on the web as a whole and tools for individual curated collections, have largely addressed these needs.

Semantic Web Technologies

The stack of technologies comprising the Semantic Web infrastructure is largely standard and mature. HTTP URIs identify concepts and objects, RDF (Resource Description Framework) describes a data model, OWL expresses ontological vocabularies, and SPARQL permits queries and other operations on the resultant graph data.

Triplestore

The triple is the data convention used by the Semantic Web and RDF to relate objects and meaning; a triplestore is a database that stores triples. The triple is a rather simple linguistic convention that makes it easy to classify data and make connections. A triple takes the form "Subject" - "Predicate" - "Object". For example:

Garden location Backyard
Firstrow location Garden
Firstrow plantedWith Beets
Firstrow plantedWith Carrots

Using this standard convention, it is easy to catalogue data and to trace relationships between them. For instance, using the above example, one can find out what is planted in the first row of the garden in the backyard by tracing the relationships (terms beginning with ? are variables to be filled in):

?Garden location Backyard -> finds the garden being looked for
?Firstrow location ?Garden -> finds the row in the garden just retrieved
?Firstrow plantedWith ?Veggie -> gets the vegetables planted in that row

This rather simple model makes it possible to define (and query) complex relationships without first having a defined data model. This convention gives the Semantic Web the adaptability to handle evolving, dynamic data without constraining that data. It also means that the model does not have to be redefined to deal with emerging data types.
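
The tracing described above can be sketched as a tiny pattern matcher in Python. This is a minimal illustration, not how any particular triplestore is implemented: triples are plain tuples, terms starting with `?` are treated as variables, and the helper names `match` and `query` are made up for this example.

```python
triples = [
    ("Garden", "location", "Backyard"),
    ("Firstrow", "location", "Garden"),
    ("Firstrow", "plantedWith", "Beets"),
    ("Firstrow", "plantedWith", "Carrots"),
]


def is_var(term):
    """Terms that start with '?' are variables."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, data):
    """Apply the patterns in order, carrying variable bindings forward."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for bindings in results:
            for triple in data:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


answers = query(
    [
        ("?garden", "location", "Backyard"),  # find the garden
        ("?row", "location", "?garden"),      # find rows in that garden
        ("?row", "plantedWith", "?veggie"),   # find what is planted there
    ],
    triples,
)
print(sorted(b["?veggie"] for b in answers))
```

Nothing in the matcher knows about gardens or rows; adding a new predicate only means adding more tuples, which is the adaptability the paragraph above describes.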

Triples can be combined into complex graphs of data. Such graphs can be serialized in several formats, including RDF/XML and N-Triples, a plain-text format that writes one complete triple per line and is convenient for transmitting data across the network. N-Triples repeats full URIs and subjects on every line, however, so more compact notations such as N3 (Notation3) are commonly used to reduce this duplication through prefixes and shared subjects.
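
The duplication can be made concrete with a short Python sketch that writes the same three triples both ways. The namespace `http://example.org/garden#` and the `ex:` prefix are assumptions for illustration, and the two serializer functions are simplified: they treat every term as a URI and do not handle literals, escaping, or blank nodes.

```python
from collections import defaultdict

BASE = "http://example.org/garden#"  # hypothetical namespace

triples = [
    ("Firstrow", "location", "Garden"),
    ("Firstrow", "plantedWith", "Beets"),
    ("Firstrow", "plantedWith", "Carrots"),
]


def to_ntriples(data):
    """One full triple per line, every term written as a complete URI."""
    return "\n".join(
        f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> ." for s, p, o in data
    )


def to_n3(data):
    """Declare the prefix once, then group predicates and objects by subject."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in data:
        grouped[s][p].append(f"ex:{o}")
    lines = [f"@prefix ex: <{BASE}> ."]
    for s, predicates in grouped.items():
        parts = [
            f"ex:{p} " + " , ".join(objs) for p, objs in predicates.items()
        ]
        lines.append(f"ex:{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)


print(to_ntriples(triples))
print()
print(to_n3(triples))
```

In the second form the namespace appears once and the subject `ex:Firstrow` is written once, with `;` separating predicates and `,` separating objects that share a predicate.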

RDFa

Although RDF is compact, it is not easily human-readable. RDFa is a response to the disparity in data presentation between XHTML and RDF: it allows RDF data to be embedded in XHTML content. Using attributes on standard XHTML tags such as <span>, Semantic Web data can be mixed into XHTML presentation. For example:

<span xmlns:example="http://example.tld/example/0.a" about="http://foo.tld/bar.rd#ts" property="example:bar" content="some_data">Some XHTML for presentation</span>
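
To show how a consumer might recover the embedded statement, here is a minimal Python sketch that reads the `about`, `property`, and `content` attributes from the example above and reports them as a triple. The `RDFaSketchParser` class is hypothetical and deliberately incomplete: a conforming RDFa processor also expands prefixes such as `example:` and handles nested elements and inherited subjects.

```python
from html.parser import HTMLParser


class RDFaSketchParser(HTMLParser):
    """Collects (about, property, content) from elements carrying all three."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a and "property" in a and "content" in a:
            # subject = about, predicate = property, object = content
            self.triples.append((a["about"], a["property"], a["content"]))


markup = (
    '<span xmlns:example="http://example.tld/example/0.a" '
    'about="http://foo.tld/bar.rd#ts" property="example:bar" '
    'content="some_data">Some XHTML for presentation</span>'
)

rdfa = RDFaSketchParser()
rdfa.feed(markup)
print(rdfa.triples)
```

The visible text ("Some XHTML for presentation") is ignored here; a browser shows it to the reader while a program reads the attributes.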

OWL

TODO

SPARQL

TODO


Programming with Semantic Web

Because RDF is an open format, libraries exist for almost every programming language to make it easy for programmers to produce and consume RDF data. Some examples include the RDF.rb[9] library for Ruby, JRDF[10] for Java, a PEAR[11] RDF package for PHP and many more.


Domain-specific semantic models

Medicine

Semantic models appear to be the major trend in expert support for medicine. As an example of how semantic methodologies are used, consider several isolated concepts, which could be considered "nouns":

One of the notations for relationships is the Unified Medical Language System® (UMLS®). Informally, some of the "verb" semantic relationships among the above could be:

  • beta-adrenergic antagonists TREAT hypertension and benign hand tremor
  • beta-adrenergic antagonists CAUSE bradycardia
  • beta-adrenergic antagonists TRIGGER asthma

"Hypertension" would be the object of a number of other TREAT relations, from drug classes such as thiazide diuretics, angiotensin-converting enzyme (ACE) inhibitors, calcium channel blockers, angiotensin-II receptor blockers, etc.
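
The informal relationships above map directly onto subject-predicate-object triples. The following Python sketch stores a few of them as plain tuples and asks which drug classes treat hypertension. It is an illustration only: it does not use the UMLS data or any UMLS interface, and the helper names `subjects` and `relations_of` are made up for this example.

```python
# Informal "noun - verb - noun" relations from the text, as plain tuples.
relations = [
    ("beta-adrenergic antagonists", "TREAT", "hypertension"),
    ("beta-adrenergic antagonists", "TREAT", "benign hand tremor"),
    ("beta-adrenergic antagonists", "CAUSE", "bradycardia"),
    ("beta-adrenergic antagonists", "TRIGGER", "asthma"),
    ("thiazide diuretics", "TREAT", "hypertension"),
    ("calcium channel blockers", "TREAT", "hypertension"),
]


def subjects(predicate, obj):
    """All subjects related to obj by predicate, in alphabetical order."""
    return sorted(s for s, p, o in relations if p == predicate and o == obj)


def relations_of(subject):
    """Everything asserted about one subject, grouped by predicate."""
    grouped = {}
    for s, p, o in relations:
        if s == subject:
            grouped.setdefault(p, []).append(o)
    return grouped


print(subjects("TREAT", "hypertension"))
print(relations_of("beta-adrenergic antagonists"))
```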

UMLS is now being extended with formal ontologies.[12]

Semantic Web in CMS

Content management systems (CMS) can benefit greatly from RDF features. RDF is an expressive means by which a CMS can both publish and consume data. Because RDF makes data more easily machine-readable, it is well suited to systems that integrate data, such as CMS.

Drupal

The Drupal content management system is making a big push to include RDF and Semantic Web support as part of the upcoming Drupal 7 release.[13] There is a Drupal group devoted to the Semantic Web as well as a code sprint devoted to the topic. Drupal 7 will automatically include RDFa elements in page presentation. This will mean that new Drupal 7 sites include RDFa data without any additional overhead, coding, or administration by site administrators, allowing site users to leverage RDFa seamlessly. With Drupal holding a significant and growing share of the CMS market, its support of the Semantic Web will mean a vast increase in the implementation of RDF.[14]

WordPress

WordPress has several third-party plugins that implement RDF.[15]

MediaWiki

MediaWiki has Semantic MediaWiki to integrate the Semantic Web in a wiki setting.

Other Notable Uses

The BBC made heavy use of semantic web technologies for their internet coverage of the 2010 World Cup games.[16]

Facebook recently announced support for the Open Graph protocol, which builds on RDFa to embed Semantic Web data in web pages.

Google has announced support for "Rich Snippets", which use RDFa to show summary data in search results (for things like customer reviews, map locations, etc.).[17]

DBpedia is a project designed to extract structured data from the popular Wikipedia site.

References

  1. Entrepreneurs See a Web Guided by Common Sense. New York Times (2006).
  2. The Semantic Web, Scientific American Magazine, 2001
  3. W3C Semantic Web Frequently Asked Questions. W3C (2010). Retrieved on 2010-07-11.
  4. Segaran, Toby; Colin Evans, Jamie Taylor (2009). Programming the Semantic Web. O'Reilly.
  5. Microformats hCal example. Microformats.org (2010).
  6. Tim Berners-Lee (1998). http://www.w3.org/DesignIssues/RDFnot.html.
  7. The global structure of an HTML document. W3C.
  8. (2003) "Which Semantic Web?". Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia, ACM.
  9. RDF library for the Ruby programming language.
  10. JRDF - An RDF Library in Java.
  11. RDF library for PHP from PEAR (PHP Extension and Application Repository).
  12. Burgun, Anita & Olivier Bodenreider, Mapping the UMLS Semantic Network into General Ontologies
  13. The RDFa initiative in Drupal 7, and how it will impact the Semantic Web.
  14. Drupal RDF Mapping API. Drupal.org (2009).
  15. Does Facebook Really Want a Semantic Web?. ReadWriteWeb (2010).
  16. BBC World Cup 2010 dynamic semantic publishing (2010).
  17. Google introduces rich snippets (2009).