Computational media aesthetics as a methodological framework for automated content analysis of film sound
Pedro Silva
psilva@sfsu.edu
Broadcast & Electronic Communication Arts department
San Francisco State University
26 November 2006
1 Updated research question
I am investigating whether computational media aesthetics (CMA), as both a field and a theory, can be used in my long-term research. This will be a longitudinal study of the evolution of the Hollywood film sound track, from the early 1930s, when the sound film became dominant, through today. It will examine two sets of variables: the sound track elements of dialogue, music, noise, and silence; and the concept of an auditory scene, independent from a visual scene. The methodology will be automated content analysis, whose speed and efficiency are what make such a comprehensive study feasible. However, while audio-specific data mining has been researched, particularly in the fields of music information retrieval (MIR) and speech processing, it has not yet been applied to the investigation of film sound. I therefore propose to research whether content in this setting can be analyzed effectively and automatically. This implies that the technology used for this purpose must be able to extract and cluster low-level audio features, or units of analysis, and organize them into semantic units, or classes, before any statistically meaningful conclusions can be drawn. There are, then, two components to this methodology: the technological and the technical. The technological component relates to the state of the art in computational data mining; the technical component derives from the field of media aesthetics, the study of semantic significance in production practice. My research is therefore based on the theoretical framework of computational media aesthetics.
2 Computational media aesthetics
2.1 Description
Computational media aesthetics contends that "we must understand compositional and aesthetic media principles to guide [automated] content analysis" [,p. 10]. It is defined as the "algorithmic study of a variety of image and aural elements in media (based on film grammar). It is also the computational analysis of the principles that have emerged underlying their manipulation in the creative art of clarifying, intensifying, and interpreting an event for an audience" (). It attempts to address a problem raised by multimedia content management (MCM): the semantic gap [,p. 18]. The semantic gap is "the gulf between the rich meaning and interpretation that users expect systems to associate with their queries for searching and browsing media and the shallow, low-level features (content descriptions) that the systems actually compute" [,p. 15]. CMA's proponents identify two sources for analyzing and interpreting media:
- First, structuralism, which film studies uses as an analytical tool. It consists of segmenting content (in this case, film) and analyzing and interpreting the resulting sections, usually from a semiotic approach.
- Second, film grammar, which they consider a far richer grounding for the automated content analysis of media.
2.2 Critique
The CMA authors (2001) treat structuralism and film ontology as pertaining to two different levels of analysis, a view supported by previous research. As () summarizes, citing (), end-users of video-on-demand applications generally require three types of indexes, of which two are of interest here:
- Structural (for example, segments, scenes, and shots), and
- Content (for example, objects and actors in scenes).
Structuralism does not address the semantic gap problem. However, the use of a film ontology in computational analysis must necessarily build upon the results of segmenting low-level description units and clustering them into high-level semantic units; that is, one level builds on the other. I argue that this hierarchical structure is essential, because it accounts for the development of technology and its specific techniques. Even while the semantic gap remains unresolved, structuralist approaches can support insightful research.
Past implementations of CMA are reviewed in detail in (). There have been multiple attempts at extracting high-level meaning from multimedia content, from structuralist approaches in the mid-to-late 1990s (see, for example, Pfeiffer et al., 1996; Lienhart et al., 1995; Guimaraes et al., 1998) through current research in content-based semantic analysis (see Truong et al., 2002; Davis, 2003; Mulhem et al., 2003). Most of these efforts have used film-specific terminology to guide them. One striking example is the authors' own project (), in which film grammar is used to extract the semantic constructs of tone, shot rhythm, and pace in film. This is done using a "primitive feature extraction", which is simply a structural projection of Zettl's lighting, color, time-motion, and sound (p. 96), although they also extract simpler features such as shot length and type. Moreover, media aesthetics, and specifically Zettl's Sight Sound Motion (1999), is the framework most widely used for such terminology. This shows that automated content analysis has taken the basic CMA assumptions into consideration; one may therefore say that such efforts have been adequately implemented.
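For the sound track, the segmentation and clustering of low-level description units discussed in this section might look like the following minimal Python sketch. It is an illustration under stated assumptions, not a procedure taken from the CMA literature: the choice of short-time RMS energy and zero-crossing rate as low-level features, the frame and hop sizes, k-means with farthest-first initialization as the clustering method, and all function names are hypothetical. A synthetic signal (faint noise followed by a pure tone) stands in for a real sound track.

```python
import math
import random


def frame_signal(samples, frame_len, hop):
    """Split a mono signal into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms_energy(frame):
    """Root-mean-square amplitude, a basic loudness descriptor."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)


def extract_features(samples, frame_len=400, hop=200):
    """One (energy, ZCR) vector per frame; frame/hop sizes are assumptions."""
    return [(rms_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialization.

    Returns one cluster index per point. The clusters are unnamed:
    mapping them to semantic classes is a separate, informed step.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


# Synthetic input at 8 kHz (illustrative only): 0.5 s of faint noise, then 0.5 s of a 440 Hz tone.
rate = 8000
rng = random.Random(1)
quiet = [rng.uniform(-0.001, 0.001) for _ in range(rate // 2)]
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
labels = kmeans(extract_features(quiet + tone), k=2)
print(labels)
```

The clusters produced here carry no names; deciding that one corresponds to silence and another to, say, music is precisely the semantic step that CMA argues should be guided by media aesthetics. In practice the features would also be normalized before clustering, since their scales differ.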
2.3 Relevance
To understand the relevance of CMA to my research, its premises should be analyzed in detail. As I have shown, they are as follows:
- Automated content analysis entails an understanding of the basic compositional and aesthetic media principles
- Such understanding must be projected into algorithmic procedures
- The computational execution should be informed by the relevant grammar: for film, film grammar, as CMA's proponents propose
- Low-level structuralist descriptions are built upon to form higher levels of meaning
In addition, CMA serves strictly as a methodological framework for my research, which examines:
- the evolution of the sound track elements (speech, music, noise, and silence), and
- the correlation between auditory and visual scene parameters.
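The premise that low-level descriptions are built upon to form higher levels of meaning can be sketched for one of these variables, silence. The Python below is a hypothetical illustration, not a validated procedure: it assumes per-frame RMS energies are already available, labels each frame with an arbitrary energy threshold, merges runs of identical labels into timed segments, folds segments shorter than a minimum duration into the preceding one, and reports the share of running time per class. The threshold, durations, class names, and function names are all assumptions, and separating speech, music, and noise would require richer features than energy alone.

```python
from itertools import groupby


def label_frames(energies, silence_threshold=0.01):
    """Label each frame; the 0.01 RMS threshold is a hypothetical value."""
    return ["silence" if e < silence_threshold else "sound" for e in energies]


def to_segments(labels, hop_seconds):
    """Merge runs of identical frame labels into (label, start_s, end_s) segments."""
    segments, start = [], 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)
        segments.append((label, start * hop_seconds, (start + n) * hop_seconds))
        start += n
    return segments


def absorb_short(segments, min_seconds):
    """Fold segments shorter than min_seconds into the preceding segment,
    then join neighbors that end up with the same label."""
    merged = []
    for label, start, end in segments:
        if merged and (end - start < min_seconds or merged[-1][0] == label):
            prev_label, prev_start, _ = merged[-1]
            merged[-1] = (prev_label, prev_start, end)
        else:
            merged.append((label, start, end))
    return merged


def class_shares(segments):
    """Proportion of total running time covered by each class."""
    total = segments[-1][2] - segments[0][1]
    shares = {}
    for label, start, end in segments:
        shares[label] = shares.get(label, 0.0) + (end - start) / total
    return shares


# Toy per-frame RMS energies, one frame every 0.25 s (illustrative values only).
energies = [0.0, 0.0, 0.2, 0.3, 0.005, 0.25, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
frame_labels = label_frames(energies)
segments = absorb_short(to_segments(frame_labels, hop_seconds=0.25), min_seconds=0.5)
shares = class_shares(segments)
print(segments)
print(shares)
```

With these toy values, the brief dip at 1.0 s is absorbed into the surrounding sound, leaving three segments. Folding short segments into their predecessor is only one simple smoothing choice; a short segment at the very start is kept because it has no predecessor.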
References
- Adams, B. (2003). Where does computational media aesthetics fit? , (2), 18-27.
- Davis, M. (2003). Editing out video editing. , (2), 54-64.
- Dorai, C., Mauthe, A., Nack, F., Rutledge, L., Sikora, T., & Zettl, H. (2002). Media semantics: Who needs it and why? , 580-583.
- Dorai, C., & Venkatesh, S. (2001). Bridging the semantic gap in content management systems: Computational media aesthetics. , 94-99.
- Dorai, C., & Venkatesh, S. (2003). Bridging the semantic gap with computational media aesthetics. , (2), 15-17.
- Guimaraes, N., Correia, N., Oliveira, I., & Martins, J. (1998). Designing computer support for content analysis: A situated use of video parsing and analysis techniques. (7), 159-180.
- Lienhart, R., Pfeiffer, S., & Effelsberg, W. (1995). (Technical Report TR-95-034). Mannheim: University of Mannheim, Department for Mathematics and Computer Science.
- Mulhem, P., Kankanhalli, M., Yi, J., & Hassan, H. (2003). Pivot vector space approach for audio-video mixing. , (2), 28-40.
- Nack, F., Dorai, C., & Venkatesh, S. (2001). Computational media aesthetics: Finding meaning beautiful. , (4), 10-12.
- Pfeiffer, S., Fischer, S., & Effelsberg, W. (1996). (Technical Report TR-96-008). Mannheim: University of Mannheim, Department for Mathematics and Computer Science.
- Rowe, L., Boreczky, J., & Eads, C. (1994). Indexes for user access to large video databases. , 150-161.
- Truong, B., Venkatesh, S., & Dorai, C. (2002). Application of computational media aesthetics methodology to extracting color semantics in film. , 339-342.
- Zettl, H. (1999). Sight sound motion (3rd ed.). Belmont: Wadsworth Publishing.
