Title: Domain-Sensitive Temporal Tagging
Author: Jannik Strötgen
Publisher: Ingram
Genre: Programs
Series: Synthesis Lectures on Human Language Technologies
ISBN: 9781681731858
TEMPORAL INFORMATION CAN BE ORGANIZED HIERARCHICALLY
Temporal expressions can be of different granularities. For example, they can be of granularity day (e.g., “August 3, 1992”), month (e.g., “August 1992”), or year (e.g., “1992”). Because years consist of months and months consist of days, expressions of one granularity (e.g., day) can be mapped to coarser granularities (e.g., month or year) based on the hierarchy of temporal information. In Figure 2.3, this hierarchy information is shown using the concept of timelines. A timeline is associated with a specific granularity (e.g., t_day, t_month, t_quarter, t_year) so that expressions of the respective granularities can be placed on the timelines as points in time. Note, however, that a coarse expression represents a point on the timeline of its own granularity (e.g., “August 1992” on t_month) but spans a time interval on timelines of finer granularities (e.g., “August 1992” spans from “August 1, 1992” to “August 31, 1992” on t_day).
Figure 2.3: Temporal information can be organized hierarchically. The blue triangles show how points on coarser timelines (e.g., “1990s” on t_decade) span an interval on finer timelines (e.g., “1990s” spans from “1990” to “1999” on t_year).
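The mapping between granularities can be made concrete with a minimal sketch. It assumes ISO-style value strings and uses hypothetical helper names (`coarsen`, `month_span_in_days`, `decade_span_in_years`) that are introduced here only for illustration and are not taken from any temporal tagger.

```python
import calendar
from datetime import date
from typing import Dict, Tuple


def coarsen(day: date) -> Dict[str, str]:
    """Map a day-granularity point to the points that contain it on coarser timelines."""
    quarter = (day.month - 1) // 3 + 1
    return {
        "day": day.isoformat(),
        "month": f"{day.year:04d}-{day.month:02d}",
        "quarter": f"{day.year:04d}-Q{quarter}",
        "year": f"{day.year:04d}",
    }


def month_span_in_days(year: int, month: int) -> Tuple[date, date]:
    """Return the interval that a month-granularity point spans on the day timeline."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, 1), date(year, month, last_day)


def decade_span_in_years(decade_start: int) -> Tuple[int, int]:
    """Return the interval that a decade-granularity point spans on the year timeline."""
    return decade_start, decade_start + 9


print(coarsen(date(1992, 8, 3)))
print(month_span_in_days(1992, 8))
print(decade_span_in_years(1990))
```

Mapping from fine to coarse yields a single point, whereas mapping from coarse to fine yields an interval. This mirrors the asymmetry described above.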
2.2 TEMPORAL EXPRESSIONS IN DOCUMENTS
There are different types of temporal expressions according to what kind of temporal information an expression refers to, for example, a point in time or a duration. Note that we use the term point in time to refer to an expression if it can be anchored on a timeline of any granularity although, strictly speaking, expressions of coarse granularities span a time interval on finer granularities (cf. Figure 2.3).
In the context of temporal tagging, it is common practice to distinguish between the following four types of expressions, as specified in the temporal markup language TimeML (detailed in Section 3.1 together with further annotation standards):
• Date expressions: A date expression refers to a point in time of the granularity “day” (e.g., “July 10, 2015”) or any other coarser granularity, for example, “month” (e.g., “July 2015”) or “year” (e.g., “2015”).
• Time expressions: A time expression refers to a point in time of any granularity finer than “day” such as a part of a day (e.g., “Friday morning”) or a time of day (e.g., “3:30 pm”).
• Duration expressions: A duration expression provides information about the length of an interval. It can refer to intervals of different granularities (e.g., “three hours” or “five years”). In addition to the length of the interval, the point in time when the interval starts or ends may also be specified. However, the main semantics of a duration expression concerns the length of the interval.
• Set expressions: A set expression refers to the periodic aspect of an event, that is, it describes a set of times or dates (e.g., “every Monday”) or a frequency within a time interval (e.g., “twice a week”).
As mentioned above, date expressions and (coarse) time expressions can also be considered time intervals, since each of them consists of smaller temporal units. For example, a single “day” as a point in time consists of hours and could thus be regarded as a duration of the granularity “hour”. However, time and date expressions can be placed on timelines as single points, although the timelines are of different granularities depending on the expressions, as exemplified in Figure 2.3. In contrast, a duration expression cannot be placed on a timeline as a single point, even if the point in time when the interval starts or ends is specified in addition to the length of the interval. Thus, time and date expressions of different granularities are not treated as durations despite the fact that they often have a duration.
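The distinction between point expressions and durations can be illustrated with a small data model. This is only a sketch: the granularity ordering in `GRANULARITIES`, the function `point_type`, and the class `DurationExpression` are assumptions made for this example and are not part of TimeML.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of granularities from fine to coarse (illustrative only).
GRANULARITIES = [
    "minute", "hour", "part_of_day", "day", "month", "quarter", "year", "decade",
]


def point_type(granularity: str) -> str:
    """Classify a point expression: 'date' for day or coarser, 'time' for finer."""
    is_day_or_coarser = GRANULARITIES.index(granularity) >= GRANULARITIES.index("day")
    return "date" if is_day_or_coarser else "time"


@dataclass
class DurationExpression:
    """A duration carries a length; begin and end anchors are optional extras."""
    text: str
    amount: int
    unit: str
    begin: Optional[str] = None  # point in time when the interval starts, if known
    end: Optional[str] = None    # point in time when the interval ends, if known


print(point_type("day"))          # "July 10, 2015"
print(point_type("month"))        # "July 2015"
print(point_type("part_of_day"))  # "Friday morning"
print(DurationExpression(text="three hours", amount=3, unit="hour"))
```

A duration keeps its length as the primary information, so it stays a separate type even when `begin` or `end` is filled in.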
2.3 REALIZATIONS OF TEMPORAL EXPRESSIONS
Temporal expressions, in particular those of the types “date” and “time”, can be realized in natural language in several different ways. A temporal tagger should cover, and thus extract, the full variety of realizations. A major issue, however, is that the difficulty of normalizing date and time expressions varies significantly with the realization.
Many different terms have been used in the literature to describe various realizations and characteristics of point expressions, and a brief survey of alternative names and their descriptions is given below. In this book, we use the four types of realizations described by Strötgen [2015], whose names are motivated by observations discussed earlier in the literature. The goal of the four types is to cover those characteristics of point expressions that are particularly relevant for temporal tagging. In Table 2.1, the four categories are shown with sample expressions and an explanation of what information is required for their normalization.
• Explicit expressions: Explicit expressions are date and time expressions that carry all the information required for their normalization. Thus, no further knowledge or context information is required: the expressions are fully specified and context-independent. For example, the expressions of the granularity day “March 11, 2013” and of the granularity month “March 2013” can be directly normalized to 2013-03-11 and 2013-03, respectively.
• Implicit expressions: Implicit expressions can be normalized once their implicit temporal semantics is known. Thus, this category is designed specifically for named dates. Examples are holidays that can be directly mapped to a point in time. A simple implicit expression is “Christmas 2013” since Christmas refers to December 25. Thus, the expression can be normalized to 2013-12-25. A more complex example is “Columbus Day 2013” since Columbus Day is scheduled as the second Monday in October. Some calendar calculations have to be performed to normalize the expression to 2013-10-14.
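The calendar calculation for named dates can be sketched as follows. The lookup table `NAMED_DATES` and the helpers `nth_weekday` and `normalize_implicit` are hypothetical names chosen for this example. A real tagger would need a far larger, language-specific inventory of named dates.

```python
from datetime import date, timedelta


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday of a month (Monday=0 ... Sunday=6)."""
    first = date(year, month, 1)
    # Days from the first of the month to the first occurrence of the weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


# Illustrative inventory of named dates: each entry maps a year to a day.
NAMED_DATES = {
    "Christmas": lambda year: date(year, 12, 25),
    "Columbus Day": lambda year: nth_weekday(year, 10, 0, 2),  # second Monday in October
}


def normalize_implicit(name: str, year: int) -> str:
    """Normalize a named date in a given year to an ISO-style day value."""
    return NAMED_DATES[name](year).isoformat()


print(normalize_implicit("Christmas", 2013))
print(normalize_implicit("Columbus Day", 2013))
```

Fixed-date holidays need only a table lookup, while rule-based holidays need the weekday arithmetic in `nth_weekday`.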
Table 2.1: The four categories of realizations of temporal expressions, with examples and an overview of the information required for their normalization
• Relative expressions: In contrast to explicit and implicit expressions, relative expressions cannot be normalized without context information. More precisely, a reference time has to be detected to normalize expressions such as “today” and “the following year”. For some relative expressions, the reference time is the point in time when the expression was formulated (e.g., for “today”) while the reference time of other expressions is a point in time mentioned in the context of the expression (e.g., in the statement “in 2000 … in the following year”, 2001 is the normalized value of “the following year” since “2000” is the reference time). In both cases, the reference time is the only required information, because the relation to the reference time is carried by the expressions.
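A minimal sketch of this idea is given below. It assumes that the reference time has already been detected and is passed in as an ISO-style value string. The rule table `RELATIVE_RULES` and the function `normalize_relative` are hypothetical and cover only the two expressions discussed above.

```python
from datetime import date


def resolve_today(reference_value: str) -> str:
    """'today' denotes the reference day itself, so a day-granularity reference is needed."""
    return date.fromisoformat(reference_value).isoformat()


def resolve_following_year(reference_value: str) -> str:
    """'the following year' denotes the year after the year of the reference time."""
    reference_year = int(reference_value[:4])
    return f"{reference_year + 1:04d}"


# Each rule encodes the relation to the reference time carried by the expression.
RELATIVE_RULES = {
    "today": resolve_today,
    "the following year": resolve_following_year,
}


def normalize_relative(expression: str, reference_value: str) -> str:
    """Normalize a relative expression given an already detected reference time."""
    return RELATIVE_RULES[expression.lower()](reference_value)


# Reference time mentioned in the context: "in 2000 ... in the following year"
print(normalize_relative("the following year", "2000"))
# Reference time is when the expression was formulated (date assumed for illustration)
print(normalize_relative("today", "2015-07-10"))
```

The rules need only the reference time as input because the relation ("same day", "one year later") is fixed by the expression itself.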
• Underspecified expressions: For the normalization of underspecified expressions, the relation to the reference time is required in addition to the reference time itself. For instance, expressions