Computational media aesthetics and Zettl's media literacy model: Applications in film sound research
Pedro Silva
psilva@sfsu.edu
Sep 12, 2007
1 Research statement
I am interested in developing a theoretical framework that supports the application of computational media aesthetics to film sound research, based on Herbert Zettl's media literacy model [12]. There are striking parallels between film sound theory [10,7,1], computational media aesthetics [9,2,6], and encoding media aesthetics [13]. The proposed annotated bibliography would serve as a starting point for that investigation. Between them, the three fields span three different paradigms of knowledge: the humanities, the hard sciences, and the social sciences. The research would examine how film sound theory can be construed as the underlying basis for computational media aesthetics, which I in turn plan to employ in my master's thesis, essentially a social scientific study of how the encoding of the film sound track has evolved since the 1930s. Since the methodology for doing so is necessarily one of decoding, albeit at the computational level, the relation to Zettl's media literacy model should be evident. In other words, I would like to gather support for my argument that Zettl's four-tiered media encoding/decoding model, when applied to decoding analysis, actually combines this multidisciplinary approach involving the humanities, social science, and computer science. Aesthetic vectors (level 1) may be extracted from media computationally, and semantic significance (level 4) may be inferred using film theory methods. The process itself, however, is social scientific, and its results are therefore processed statistically. If so, my long-term longitudinal research should be successful.
2 The semantic gap
Dorai et al. [4] and Dorai and Venkatesh [6] argue that the work of computational media aesthetics (and hence of both computer scientists and media scholars) is to close the semantic gap: that is, to bring structural analysis closer to higher-level semantic inference from media. Computers have shown over the past decades that they are capable of accomplishing the former; communication, media, and, specifically in this case, film scholars have created an ontology that covers a broad range of "intellectual and cultural framework[s] for media criticism and theory" [12, pp. 84-85]. Stipulating that computers are better analytical tools than humans, the next step is to move computational techniques into the higher tiers of the media literacy model. This has been done successfully in some fields; in sound specifically, auditory scene analysis is able to detect audio scene changes, primarily by using context cues (Zettl's second level). Similarly, at level three, a number of projects have proven effective at determining audio program genres, for example. Through the use of cognitive models of perception, it is possible to extend this (as has been done; see [11,5,2,8]) to automatic audio analysis in film sound research.
3 Media theory, media practice
The media literacy model works both ways, however, and encoding is also an essential part of it. Again, there is a consistent argument throughout the computational media aesthetics literature that the main benefit of bridging the semantic gap is what can then be returned to encoding, or production. Davis [2,3] has demonstrated a number of applications of computational analysis techniques, at both capture and editing time, that facilitate subsequent encoding efforts, even by non-technical users. Such applications are not as immediately foreseeable in my research on film sound. However, there are a number of likely outcomes, should the investigation prove successful:
- Development of an open sound classification system, which could be used as the underlying engine in a database cataloguing and retrieval application. Such an application would be immediately useful to sound designers in managing their sound effects libraries (see footnote 1).
- The same underlying engine could also be applied to the semantic web ontology. This would enable the creation of a media search engine for film sound. A user could theoretically provide an example file with an audio recording of an unknown movie, and the engine might output detailed data on its origins, including time and place, and whether any other movies of the same kind might exist.
- In the future, such an engine might be coupled with a video-specific system such as those described by Davis, and thereby achieve higher efficiency in automating encoding and decoding processes by cross-referencing the two in an iterative feedback loop.
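The query-by-example idea in the list above can be illustrated with a minimal sketch. It is not taken from any of the cited systems, and everything in it is an assumption made for illustration: the two low-level descriptors (RMS energy and zero-crossing rate), the synthetic test signals, the catalogue labels, and the helper names `features`, `nearest`, `tone`, and `noise`. A real classification engine would use far richer features and real recordings.

```python
import math
import random


def features(samples):
    """Return (rms_energy, zero_crossing_rate) for a list of float samples."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (n - 1))


def nearest(query, catalogue):
    """Return the label whose stored feature vector is closest to the query."""
    return min(catalogue, key=lambda label: math.dist(query, catalogue[label]))


def tone(freq, rate=8000, n=8000, amp=0.5):
    """Hypothetical test signal: a sine tone of the given frequency in Hz."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def noise(n=8000, amp=0.5, seed=0):
    """Hypothetical test signal: uniform white noise with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-amp, amp) for _ in range(n)]


# A toy "sound effects library": label -> feature vector.
catalogue = {
    "hum": features(tone(100)),
    "whistle": features(tone(1000)),
    "hiss": features(noise()),
}

# Query by example: an unlabelled recording is matched to its nearest entry.
query = features(tone(1100, amp=0.45))
match = nearest(query, catalogue)
print(match)
```

In terms of Zettl's model, the feature extraction step corresponds only to the lowest level of decoding; attaching labels such as "whistle" is where the higher, semantic levels would have to come in.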
4 Research proposal
Specifically, the research at hand would comprise a comprehensive review of the literature in this interdisciplinary field; to that end, it would cover film sound theory, social scientific research, and computer science investigation. Beyond that, it should include whatever literature there is (such as the examples cited elsewhere in this paper) that covers at least two of these topics. The idea is to correlate ongoing research at the computer science level (more specifically, in knowledge discovery, data mining, machine learning, and/or artificial intelligence) with the roles of both analysts in media production (film theorists in the humanities and social-scientific researchers in media encoding) and synthesists (that is, actual media producers), by following the decoding and encoding arrows of Zettl's media literacy model. I intend to conclude by demonstrating how computational media aesthetics, as a field, offers one answer to the question of how to approach Zettl's media literacy model. In fact, Zettl himself has collaborated with computer scientists in [4], where he placed the task of closing the semantic gap in the laps of media researchers.
References
- [1] Michel Chion. Audio-Vision. Columbia University Press, New York, 1994.
- [2] M. Davis. Editing out video editing. IEEE MultiMedia, 10(2):54-64, 2003.
- [3] Marc Davis. Active capture: integrating human-computer interaction and computer vision/audition to automate media. In IEEE Conference on Multimedia and Expo Special Session on Moving from Features to Semantics using Computational Media Aesthetics, Baltimore, MD, 2003.
- [4] C. Dorai, A. Mauthe, F. Nack, L. Rutledge, T. Sikora, and H. Zettl. Media semantics: who needs it and why? Proceedings of the tenth ACM international conference on Multimedia, pages 580-583, 2002.
- [5] C. Dorai and S. Venkatesh. Bridging the semantic gap in content management systems: Computational media aesthetics. Proceedings of the First Conference on Computational Semiotics for Games and New Media-COSIGN, pages 94-99, 2001.
- [6] C. Dorai and S. Venkatesh. Bridging the semantic gap with computational media aesthetics. IEEE MultiMedia, 10(2):15-17, 2003.
- [7] C. Metz and G. Gurrieri. Aural objects. Yale French Studies, (60):24-32, 1980.
- [8] P. Mulhem, M.S. Kankanhalli, J. Yi, and H. Hassan. Pivot vector space approach for audio-video mixing. IEEE MultiMedia, 10(2):28-40, 2003.
- [9] F. Nack, C. Dorai, and S. Venkatesh. Computational media aesthetics: Finding meaning beautiful. IEEE MultiMedia, 8(4):10-12, 2001.
- [10] P. Schaeffer. Traité des objets musicaux. Candide, Paris, rev. edition, 1968.
- [11] B.T. Truong, S. Venkatesh, and C. Dorai. Application of computational media aesthetics methodology to extracting color semantics in film. Proceedings of the tenth ACM international conference on Multimedia, pages 339-342, 2002.
- [12] Herbert Zettl. Contextual media aesthetics as the basis for media literacy. Journal of Communication, 48(1), 1998.
- [13] Herbert Zettl. Sight Sound Motion: applied media aesthetics. Wadsworth Publishing, Belmont, 3rd edition, 1999.
Footnotes:
1 Such a system is already in place and patented by Muscle Fish. It is not open, however: being closed and proprietary, it cannot be freely used in academic research.
