Semantic Web for the Working Ontologist. Dean Allemang
Title: Semantic Web for the Working Ontologist

Author: Dean Allemang

Publisher: Ingram

Genre: Software

Series: ACM Books

ISBN: 9781450376167

communal knowledge out of a chaotic mess of information. This was nicely illustrated in the Pluto example.

      The next several chapters of the book introduce each of the modeling languages of the Semantic Web and illustrate how they approach the challenges of modeling in a Semantic Web context. For each modeling language—RDF, RDFS, and OWL—we will describe the technical details of how the language works, with specific examples “in the wild” of the standard in use.

       Fundamental concepts

      The following fundamental concepts were introduced in this chapter.

      • Modeling—Making sense of unorganized information.

      • Formality/informality—The degree to which the meaning of a modeling language is given independent of the particular speaker or audience.

      • Commonality and variability—When describing a set of things, some of them will have some things in common (commonality), and some will have important differences (variability). Managing commonality and variability is a fundamental aspect of modeling in general, and of Semantic Web models in particular.

      • Expressivity—The ability of a modeling language to describe certain aspects of the world. A more expressive modeling language can express a wider variety of statements about the model. Modeling languages of the Semantic Web—RDF, RDFS, and OWL—differ in their levels of expressivity.

      3 RDF—The Basis of the Semantic Web

      Resource Description Framework (RDF), Resource Description Framework Schema (RDFS), and Web Ontology Language (OWL) are the basic representation languages of the Semantic Web, with RDF serving as the foundation. RDF addresses one fundamental issue in the Semantic Web: managing distributed data. All other Semantic Web standards build on this foundation of distributed data. RDF relies heavily on the infrastructure of the Web, using many of its familiar and proven features, while extending them to provide a foundation for a distributed network of data. The resulting paradigm of linked data on the Web is explained in detail in Chapter 5.

      The Web that we are accustomed to is made up of hypertext documents that are linked to one another. Any connection between a document and the thing(s) in the world it describes is made only by the person who reads it. There could be a link from a document about Shakespeare to a document about Stratford-upon-Avon, but there is no notion of an entity that is Shakespeare, nor any way of linking it to the thing that is Stratford.

      In the Semantic Web we refer to the things in the world as resources; a resource can be anything that someone might want to talk about. Shakespeare, Stratford, “the value of X,” and “all the cows in Texas” are all examples of things someone might talk about and that can be resources in the Semantic Web. This is admittedly a pretty odd use of the word “resource,” but alternatives like “entity” or “thing,” which might be more accurate, have their own issues. In any case, resource is the word used in the Semantic Web standards. In fact, the name of the base technology in the Semantic Web (RDF) uses this word in an essential way: RDF stands for Resource Description Framework.

      In a web of information, anyone can contribute to our knowledge about a resource. It was this aspect of the current Web that allowed it to grow at such an unprecedented rate. To implement the Semantic Web, we need a model of data that allows information to be distributed over the Web.

      Data are most typically represented in tabular form, in which each row represents some item we are describing, and each column represents some property of those items. The cells in the table are the particular values for those properties. Table 3.1 shows a sample of some data about works completed around the time of Shakespeare.

      Let’s consider a few different strategies for how these data could be distributed over the Web. In all of these strategies, some part of the data will be represented on one computer, while other parts will be represented on another. Figure 3.1 shows one strategy for distributing information over many machines. Each networked machine is responsible for maintaining the information about one or more complete rows from the table. Any query about an entity can be answered by the machine that stores its corresponding row. One machine is responsible for information about Sonnet 78 and Edward II, whereas another is responsible for information about As You Like It.

      This distribution solution provides considerable flexibility, since the machines can share the load of representing information about several individuals. But because it is a distributed representation of data, it requires some coordination between the servers. In particular, each server must share information about the columns. Does the second column on one server correspond to the same information as the second column on another server? This is not an insurmountable problem, and, in fact, it is a fundamental problem of data distribution. There must be some agreed-on coordination between the servers. In this example, the servers must be able, in a global way, to indicate which property each column corresponds to.
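      The row-by-row strategy and its coordination requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server variables, the `COLUMNS` tuple, and the `find_row` helper are hypothetical names, and the rows use titles mentioned in the text with author and medium filled in from general knowledge rather than copied from Table 3.1.

```python
# Shared agreement on what each column position means (hypothetical).
# Without it, "the second column" on one server could mean something
# different from "the second column" on another.
COLUMNS = ("title", "author", "medium")

# Each hypothetical server stores one or more complete rows.
server_a = [
    ("Sonnet 78", "William Shakespeare", "Poem"),
    ("Edward II", "Christopher Marlowe", "Play"),
]
server_b = [
    ("As You Like It", "William Shakespeare", "Play"),
]


def find_row(servers, title):
    """Return the row for `title` as a dict, or None if no server stores it."""
    title_index = COLUMNS.index("title")
    for rows in servers:
        for row in rows:
            if row[title_index] == title:
                # The shared COLUMNS agreement gives each position its meaning.
                return dict(zip(COLUMNS, row))
    return None


edward = find_row([server_a, server_b], "Edward II")
```

      A query about one entity is answered entirely by the server holding its row, but only because every server interprets column positions through the same `COLUMNS` agreement.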

      Figure 3.2 shows another strategy, in which each server is responsible for one or more complete columns from the original table. In this example, one server is responsible for the publication dates and medium, and another server is responsible for titles. This solution is flexible in a different way from the solution of Figure 3.1. The solution in Figure 3.2 allows each machine to be responsible for one kind of information. If we are not interested in the dates of publication, we needn’t consider information from that server. If we want to specify something new about the entities (say, how many pages the manuscript is), we can add a new server with that information without disrupting the others.

      This solution is similar to the solution in Figure 3.1 in that it requires some coordination between the servers. In this case, the coordination has to do with the identities of the entities to be described. How do I know that row 3 on one server refers to the same entity as row 3 on another server? This solution requires a global identifier for the entities being described.
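      A minimal sketch of the column-by-column strategy shows why a global identifier is needed. The identifiers `work1` to `work3`, the server variables, and the `describe` helper are assumptions made for illustration, and the values are not taken from Table 3.1.

```python
# Each hypothetical server stores one complete column, keyed by a global
# identifier rather than by a local row number.
titles_server = {"work1": "As You Like It", "work2": "Sonnet 78", "work3": "Edward II"}
medium_server = {"work1": "Play", "work2": "Poem", "work3": "Play"}


def describe(entity_id, column_servers):
    """Collect every known property of one entity from the column servers."""
    return {
        column: values[entity_id]
        for column, values in column_servers.items()
        if entity_id in values
    }


columns = {"title": titles_server, "medium": medium_server}
before = describe("work2", columns)

# Adding a new kind of information means adding a server; the others are untouched.
columns["author"] = {"work2": "William Shakespeare"}
after = describe("work2", columns)
```

      Because every server keys its values by the same identifier, the description of `work2` can be assembled from any subset of servers, and a server that is of no interest can simply be left out.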

      Figure 3.1 Distributing data across the Web, row by row.

      The strategy outlined in Figure 3.3 is a combination of the previous two strategies, in which information is distributed neither row by row nor column by column but cell by cell. Each machine is responsible for some number of cells in the table. This system combines the flexibility of both of the previous strategies. Two servers can share the description of a single entity (in the figure, the year and title of Hamlet are stored separately), and they can share the use of a particular property (in Figure 3.3, the Medium values of rows 6 and 7 are represented on different servers).
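      The cell-by-cell strategy can be sketched as follows, assuming each cell travels with both coordinates it needs: a global identifier for its row and a globally agreed name for its column. The identifiers, server variables, and `merge_cells` helper are hypothetical, and the example splits title and medium across servers (rather than the year and title shown in Figure 3.3) to avoid inventing dates.

```python
from collections import defaultdict

# Each cell is stored as (entity identifier, property name, value).
# Hypothetical servers: the two cells describing work1 live on different
# machines, and so do the two "medium" cells.
server_1 = [
    ("work1", "title", "Hamlet"),
    ("work2", "medium", "Play"),
]
server_2 = [
    ("work1", "medium", "Play"),
    ("work2", "title", "Edward II"),
]


def merge_cells(*servers):
    """Rebuild a table (entity -> property -> value) from scattered cells."""
    table = defaultdict(dict)
    for cells in servers:
        for entity_id, prop, value in cells:
            table[entity_id][prop] = value
    return dict(table)


table = merge_cells(server_1, server_2)
```

      The merge works only because both coordination requirements from the earlier strategies are met at once: entity identifiers and property names mean the same thing on every server.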
