Computational media aesthetics as a methodological framework for automated content analysis of film sound

Pedro Silva
psilva@sfsu.edu
Broadcast & Electronic Communication Arts department
San Francisco State University

26 November 2006

1  Updated research question

I am focusing on understanding whether computational media aesthetics (CMA), as a field and a theory, can be used in my long-term research. This will be a longitudinal study of the evolution of the film sound track in Hollywood, from the dominance of the sound film starting in the early 1930s through today, looking specifically at two sets of variables: the sound track elements of dialogue, music, noise, and silence, and the concept of an auditory scene, independent from a visual scene. The methodology will be automated content analysis; its speed and efficiency are what make such a comprehensive study feasible. However, while research on the capabilities of audio-specific data mining has been conducted, particularly in the fields of music information retrieval (MIR) and speech processing, none of it has been applied to the investigation of film sound. I thus propose to research whether content can be analyzed automatically and effectively in the described setting. This implies that the technology used must be able to extract and cluster low-level audio features, or units of analysis, and organize them into semantic units, or classes, before any statistically meaningful conclusions may be drawn. There are, then, two components to this methodology: the technological and the technical. The technological component is related to the state of the art in computational data mining; the technical component derives from the field of media aesthetics, the study of semantic significance in production practice. Therefore, my research is based on the computational media aesthetics theoretical framework.
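The step from low-level audio features to classes can be illustrated with a minimal sketch. It is not the system proposed here: the two descriptors (RMS energy and zero-crossing rate), the function names, the threshold values, and the three coarse labels are all assumptions chosen for illustration, and separating dialogue from music would require richer features than these.

```python
import math

def frame_features(samples, frame_len):
    """Split a mono signal into non-overlapping frames and compute two
    low-level descriptors per frame: RMS energy and zero-crossing rate."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((rms, zcr))
    return features

def label_frames(features, silence_rms=0.01, noise_zcr=0.3):
    """Map each (rms, zcr) pair to a coarse class.
    Both thresholds are illustrative assumptions, not tuned values."""
    labels = []
    for rms, zcr in features:
        if rms < silence_rms:
            labels.append("silence")
        elif zcr > noise_zcr:
            labels.append("noise-like")
        else:
            labels.append("tonal")
    return labels

# Synthetic test signal: one silent frame, one slow sine wave,
# and one frame that changes sign at every sample.
signal = (
    [0.0] * 100
    + [0.5 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
    + [0.5 if n % 2 == 0 else -0.5 for n in range(100)]
)
labels = label_frames(frame_features(signal, frame_len=100))
```

On the synthetic signal, each of the three frames falls into a different class, which is the kind of mapping from descriptors to classes that the research question presupposes.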

2  Computational media aesthetics

2.1  Description

Computational media aesthetics contends that "we must understand compositional and aesthetic media principles to guide [automated] content analysis" [Nack et al., 2001, p. 10]. It is defined as the "algorithmic study of a variety of image and aural elements in media (based on film grammar). It is also the computational analysis of the principles that have emerged underlying their manipulation in the creative art of clarifying, intensifying, and interpreting an event for an audience". It attempts to address a problem raised by multimedia content management (MCM): the semantic gap [Adams, 2003, p. 18]. The semantic gap is "the gulf between the rich meaning and interpretation that users expect systems to associate with their queries for searching and browsing media and the shallow, low-level features (content descriptions) that the systems actually compute" [Dorai & Venkatesh, 2003, p. 15]. CMA's proponents identify two sources for analyzing and interpreting media:
  • First, structuralism is used in film studies as an analytical tool. It consists of segmenting content (film, in this case) and analyzing and interpreting the resulting sections, usually based upon a semiotic approach.
  • Second, film grammar, they consider, is a far richer grounding for the automated content analysis of media.
Film grammar constitutes an effective ontology of production knowledge, in that it fairly represents a "worldwide use [of] accepted rules and techniques to solve problems in transforming a story from a written script to a captivating visual and aural narration" (Arijon, 1976, cited in Nack et al., 2001, p. 10). These rules and others, covering production practices in other fields of the media, combine to form the general field of media aesthetics. This field's researchers, Zettl argues, are responsible for addressing the problem of the semantic gap. What should emerge from this description of CMA is its interdisciplinary nature, as the diverse contributing panel in Dorai et al. (2002) exemplifies. Although computer science and media aesthetics are its main areas, these further subdivide into speech processing, MIR, content-based image retrieval (CBIR), audiovisual segmentation, classification and indexing, film aesthetics, and sound aesthetics.

2.2  Critique

The 2001 formulation of CMA treats structuralism and film ontology as pertaining to two different levels of analysis, a distinction supported by previous research. The findings of Rowe et al. (1994) have been summarized as follows:
The three types of indexes that are generally required [in video-on-demand applications by end-users], of which two are of interest:
  • Structural (for example, segments, scenes, and shots), and
  • Content (for example, objects and actors in scenes).
Structuralism does not address the semantic gap problem. However, the use of a film ontology in computed analysis must necessarily build upon the results of segmenting and clustering low-level description units into high-level semantic units; the one builds on the other. I argue that this hierarchical structure is essential, because it accounts for the development of technology and its specific techniques. While the semantic gap remains unresolved, structuralist approaches can still be implemented in insightful research.
Past implementations of CMA have been reviewed in detail in the literature. There have been multiple attempts at extracting high-level meaning from multimedia content, starting with structuralist approaches during the mid-to-late nineties (for example, see Pfeiffer et al., 1996; Lienhart et al., 1995; Guimaraes et al., 1998) through current research in content-based semantic analysis (see Truong et al., 2002; Davis, 2003; Mulhem et al., 2003). Most of these efforts have been guided by film-specific terminology. One striking example is the authors' own project in Dorai and Venkatesh (2001), in which film grammar is used to extract the semantic constructs of tone, shot rhythm, and pace in film; this is done using a "primitive feature extraction", which is simply a structural projection of Zettl's lighting, color, time-motion and sound (p. 96), although they also extract simpler features such as shot length and type. Moreover, media aesthetics, and specifically Zettl's Sight Sound Motion (1999), is the framework most widely used for such a terminology. This is evidence that automated content analysis has taken the basic CMA assumptions into consideration. Therefore, one may say such efforts have been adequately implemented.
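The hierarchy argued for in this section, in which higher-level units are built from segmented low-level units, can be sketched in a few lines. The sketch is illustrative only: the frame labels, the frame duration, the minimum segment length, and both function names are assumptions, and merging a short run into its predecessor is a deliberately naive stand-in for the clustering step.

```python
from itertools import groupby

def frames_to_segments(labels, frame_sec):
    """Merge runs of identical frame labels into (label, start, end) segments.
    Times are derived from frame indices to avoid accumulating float error."""
    segments = []
    index = 0
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((label, index * frame_sec, (index + n) * frame_sec))
        index += n
    return segments

def absorb_short_segments(segments, min_sec):
    """Fold any segment shorter than min_sec into the preceding segment,
    then merge neighbours that end up with the same label.
    A short segment at the very start is kept, since nothing precedes it."""
    merged = []
    for label, start, end in segments:
        if merged and (end - start < min_sec or merged[-1][0] == label):
            prev_label, prev_start, _ = merged[-1]
            merged[-1] = (prev_label, prev_start, end)
        else:
            merged.append((label, start, end))
    return merged

# Hypothetical frame-level labels at 0.5 s per frame.
frame_labels = ["speech"] * 4 + ["music"] + ["speech"] * 3 + ["music"] * 4
segments = frames_to_segments(frame_labels, frame_sec=0.5)
smoothed = absorb_short_segments(segments, min_sec=1.0)
```

Here the half-second "music" run between two stretches of speech is treated as a labelling glitch and absorbed, leaving one speech segment and one music segment. A real system would need a principled criterion for that decision, which is where production knowledge enters.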

2.3  Relevance

To understand the relevance of CMA to my research, its premises should be analyzed in detail. These, I've shown, are as follows:
My research question is concerned with the possibility of using automated content analysis techniques to gather accurate longitudinal data on
  • the evolution of the sound track elements (speech, music, noise, and silence), and
  • the correlation between auditory and visual scene parameters such as length and relative position (e.g. coincidental or asynchronous)
over an extended period of time. Ideally, such a time frame should span the entire existence of the sound film in a single place (e.g. United States, France), industry (e.g. Hollywood, Bollywood), school of practice (e.g. French modernist, Russian contrapuntal sound), or genre (e.g. horror, comedy). These requirements pose constraints on the methodology. Most obviously, such a system must be able to accurately classify the sound track elements; this is the structuralist approach. Additionally, there is a semantic requirement: the system must be able to determine, through whatever low-level features are necessary, when an auditory scene begins and ends, and what some of its characteristics are. A less obvious requirement is that this process be fast enough to classify, in a timely manner, an adequate sample of a universe that will most likely span seven decades of film production. An assessment of the relevant literature shows that these three main requirements are probably achievable with current technology and techniques. Conceptually, my methodology for doing so is computational media aesthetics. This theory describes how content analysis is automated, starting with a simple structuralist approach that divides media elements (whatever they may be), clusters them into categories, and classifies those categories. It combines a multitude of structuralist approaches with production knowledge (grammar, terminology, or, simply put, aesthetic principles) to close in on the semantic gap, thereby creating a system capable of inferring higher-level meaning from simple descriptors. Computational efficiency is currently achievable, and this supports a longitudinal approach to the research.
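The second variable, the relative position of auditory and visual scenes, could be measured along the following lines once scene boundaries are available. This is a sketch under stated assumptions: boundary times are taken as already detected, the tolerance of half a second is arbitrary, and the function names and the three relation labels are hypothetical.

```python
def relate_boundaries(auditory, visual, tolerance=0.5):
    """For each auditory scene boundary (in seconds), find the nearest
    visual boundary and describe their relative position.
    Assumes `visual` is non-empty; `tolerance` is an illustrative value."""
    relations = []
    for a in auditory:
        nearest = min(visual, key=lambda v: abs(v - a))
        offset = a - nearest
        if abs(offset) <= tolerance:
            kind = "coincident"
        elif offset < 0:
            kind = "audio leads"   # the sound changes before the picture
        else:
            kind = "audio lags"    # the sound changes after the picture
        relations.append((a, nearest, kind))
    return relations

def coincidence_rate(relations):
    """Share of auditory boundaries that coincide with a visual boundary."""
    if not relations:
        return 0.0
    hits = sum(1 for _, _, kind in relations if kind == "coincident")
    return hits / len(relations)

# Hypothetical boundary times for a single film, in seconds.
auditory_cuts = [10.0, 58.0, 125.0]
visual_cuts = [10.2, 60.0, 120.0]
relations = relate_boundaries(auditory_cuts, visual_cuts)
rate = coincidence_rate(relations)
```

Computed per film and aggregated by decade, a figure such as the coincidence rate would be one candidate longitudinal measure of how closely auditory scenes track visual ones.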
That CMA is a methodological framework is defensible because the research it supports is strictly intended to test a set of tools and techniques.
Nevertheless, there are striking limitations to this methodology. The obvious one is that the questions I ask are necessarily bound by the limits of the technology. Until even higher-level semantics are computable from a medium such as audio or video, one cannot investigate more complex production practices. Because of that, I am chiefly concerned with appraising what my methodological limitations are. Hence the need for a preliminary study (the object of my proposal and of the research question updated here) to determine whether the theoretical considerations set out in this paper are indeed correct.

References

Adams, B. (2003). Where does computational media aesthetics fit? , (2), 18-27.
Davis, M. (2003). Editing out video editing. , (2), 54-64.
Dorai, C., Mauthe, A., Nack, F., Rutledge, L., Sikora, T., & Zettl, H. (2002). Media semantics: who needs it and why? , 580-583.
Dorai, C., & Venkatesh, S. (2001). Bridging the semantic gap in content management systems: Computational media aesthetics. , 94-99.
Dorai, C., & Venkatesh, S. (2003). Bridging the semantic gap with computational media aesthetics. , (2), 15-17.
Guimaraes, N., Correia, N., Oliveira, I., & Martins, J. (1998). Designing computer support for content analysis: A situated use of video parsing and analysis techniques. (7), 159-180.
Lienhart, R., Pfeiffer, S., & Effelsberg, W. (1995). (Technical Report TR-95-034). Mannheim: University of Mannheim, Department for Mathematics and Computer Science.
Mulhem, P., Kankanhalli, M., Yi, J., & Hassan, H. (2003). Pivot vector space approach for audio-video mixing. , (2), 28-40.
Nack, F., Dorai, C., & Venkatesh, S. (2001). Computational media aesthetics: Finding meaning beautiful. , (4), 10-12.
Pfeiffer, S., Fischer, S., & Effelsberg, W. (1996). (Technical Report TR-96-008). Mannheim: University of Mannheim, Department for Mathematics and Computer Science.
Rowe, L., Boreczky, J., & Eads, C. (1994). Indexes for user access to large video databases. , 150-161.
Truong, B., Venkatesh, S., & Dorai, C. (2002). Application of computational media aesthetics methodology to extracting color semantics in film. , 339-342.
Zettl, H. (1999). Sight sound motion (3rd ed.). Belmont: Wadsworth Publishing.



This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.