A film sound perspective on Teaching computers to watch television: Content based image retrieval for content analysis
Pedro Silva1
psilva@sfsu.edu
16 October 2006
1 Introduction
Teaching computers to watch television: Content based image retrieval for content analysis [1] is relevant to researchers in the social sciences in general, and in communications and media in particular. It reviews the state of the art in automated content analysis systems; it examines a number of exemplary systems; it identifies emerging technologies and standards; and it considers the role of human coding and intelligence in their effectiveness. Nevertheless, the paper is not without its shortcomings. It is image-centric, disregarding even its own conclusions about the effectiveness of such an approach; it fails to consider content analysis of audio independently from video; it does not treat in depth emerging technologies that are arguably fundamental even from Evans's own perspective; and it places too much emphasis on human-assisted coding and analysis, again defeating its own purpose. The first part of this paper briefly describes the general structure of Evans's study. The second part argues three points: that an updated review of the state of the art in automated content analysis systems is in order; that, since Evans's conclusions regarding the effectiveness of multi-modal analysis systems were undoubtedly right, there was no reason to designate the object of his study Content Based Image Retrieval (CBIR); and that human-assisted, semi-automated content analysis defeats the initial objective of arguing for the use of automated media content analysis systems in television and film research.
2 Article summary
Considering that it is a review article, Evans's piece introduces its argument effectively. Starting with a brief account of the current state of the art, he considers asset management to be the driving force behind research in CBIR systems [1,246], and the reduction of time, money and error to be the main priority of CBIR developers. He then reasons that social scientists have traditionally been interested in content analysis of the media, but currently have few computerized tools at their disposal [1,247].
2.1 CBIR technology and history
Evans provides a history of CBIR systems, ranging from early still image processing to current multimedia applications. He defines pixel patterns and color histograms as essential in the automated segmentation of texture, shape and dominant objects [1,247-248]. He further identifies current applications of CBIR in the academic, military and medical research fields [1,248]. Worthy of note is the Movie Content Analysis Project (MoCA), which has attained rates of 90% in identifying human faces in still images. Still, his main focus is undoubtedly the moving image. Evans mentions shot detection as an obviously useful application of CBIR systems, along with the identification of camera motion type, among others [1,249]. He stresses that this is of particular use to social scientists, in that high-level video parsing may be employed to extract semantic meaning from video repositories. He then mentions the incorporation of text (captions) and speech into CBIR [1,249-250], and provides examples of such systems currently in use. Likewise, audio is considered an important element in scene segmentation, for example, with the MoCA project again being cited [1,250]. In much the same manner, Evans also proposes the inclusion of industry codes and ratings as a way of improving CBIR [1,250-251]. This, however, he does not consider to be a current focus of research, unlike the media previously mentioned. The MPEG-7 standard, at the time near completion, is also discussed as a promising new set of descriptor classes that could eventually be used in social science as well. However, he does not develop his analysis of the applicability of the standard to content analysis in film or television. Still, the article "puts it all together" by presenting multimedia CBIR systems, that is, those incorporating video, audio and text, as the most successful to date. The Name-It, Videologger and MoCA Workbench systems support his claim [1,251].
2.2 Applicability to social science
Evans considers current applications of CBIR in social science to be scarce, although the technology is available. He provides a number of examples of content analysis research employing CBIR, namely [2] and Zhang et al. Still, he considers the preprocessing of video to be the most likely application of CBIR to social science, currently and in the near future, as a way of enabling better human coding practice [1,252]. Given the high level of expertise and the high technology costs that current CBIR requires, he advises social researchers to at least seek computerized assistance in content analysis.
3 Critique
I mentioned before a number of strengths and shortcomings apparent in Evans's piece. I discuss these in this section.
3.1 The problem of image-centrism
Evans himself deduced it: the most successful systems necessarily incorporate other media in addition to video, most notably audio and text. It is, then, rather unfortunate that he chose to focus attention on the image. Assuming the goal of his review is to engage social scientists in using automated content analysis, Evans would do well not to alienate those also concerned with sound or text content analysis. If, on the other hand, he intended to stimulate computer science research directly applicable to communications, it could be considered irresponsible to disregard those in the social sciences with an interest in sound.
3.2 The paradox of human-coded automated content analysis
While it is obvious that Evans is considering a transitional phase in which CBIR (although, as we have seen, CBIR is not an adequately representative term) could be adopted incrementally by social science, this conclusion seems to render all his previous argumentative efforts inoperative. That is, after he has spent so much time on the foundations of automated content analysis technology and its current possibilities (arguably endless), his choice to step back and concede a transitional phase seems self-defeating, again assuming his goal is to alert social scientists to the benefits of (already available) CBIR systems.
3.3 MPEG-7's importance
Evans had no way of knowing the future of the MPEG-7 standard. There are, nowadays, a number of tools which employ low-level descriptors (LLDs) as units of content analysis [5,3]. These are effectively parsed, with high success rates, into coherent semantic units which can be categorized as in a traditional human-coded content analysis. His failure to stress the standard's importance is, then, understandable.
3.4 The strength of Evans's topic
Despite its shortcomings, Teaching computers to watch television is important in more than one respect. First, it was published at a crucial time: the MPEG-7 standard was ready in 2001, so the article's appearance just before may have alerted many social scientists to the ready benefits of automated content analysis. Second, it is a comprehensive review of the technical and social literature, describing the inner workings, techniques and tools of automated content analysis in more than superficial detail, which makes it a good starting point for someone new to the field. Third, it provides a solid, balanced overview of both the technology and its social science application. This is important in that most research tends to fall into the purely technical [2,8,4,7,5,3] or the purely social [6,9,10,11]. Evans's methodology of secondary documentary analysis was, I believe, perfectly adequate to the task at hand of reviewing computer science and social science perspectives on automated content analysis in film and television. The CBIR examples given are welcome, and solidly support the argument that emerging technologies can be extrapolated to enhance these systems' capabilities and further their use by social researchers. It is disappointing, however, that Evans backs away from his own conclusions and offers perhaps too conservative a view of the potential of automated content analysis to produce large amounts of data with a high level of meaning for social scientists.
References
- [1] William Evans. Teaching computers to watch television: Content based image retrieval for content analysis. Social Science Computer Review, 18(3):246-257, 2000.
- [2] Nuno Guimarães, Nuno Correia, Inês Oliveira, and João Martins. Designing computer support for content analysis: A situated use of video parsing and analysis techniques. Multimedia Tools and Applications, (7):159-180, 1998.
- [3] Hyoung-Gook Kim, Nicolas Moreau, and Thomas Sikora. Audio classification based on the MPEG-7 spectral basis representations. IEEE Transactions on Circuits and Systems for Video Technology, 14(5):716-725, May 2004.
- [4] Rainer Lienhart, Silvia Pfeiffer, and Wolfgang Effelsberg. The MoCA Workbench: Support for creativity in movie content analysis. Technical Report TR-95-034, University of Mannheim, Department for Mathematics and Computer Science, Mannheim, 1995.
- [5] José M. Martinez, Rob Koenen, and Fernando Pereira. MPEG-7: The generic multimedia content description standard, part 1. IEEE Multimedia, 9(2):78-87, April-June 2002.
- [6] Elizabeth Monk-Turner, Peter Ciba, Matthew Cunningham, P. Gregory McIntire, Mark Pollard, and Rebecca Turner. A content analysis of violence in American war movies. Analyses of Social Issues and Public Policy, 4(1):1-11, 2004.
- [7] Tin Lay Nwe and Haizhou Li. Broadcast news segmentation by audio type analysis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), volume 2, pages 1065-1068, March 2005.
- [8] Silvia Pfeiffer, Stephan Fischer, and Wolfgang Effelsberg. Automatic audio content analysis. Technical Report TR-96-008, University of Mannheim, Department for Mathematics and Computer Science, Mannheim, 1996.
- [9] Srividya Ramasubramanian. A content analysis of the portrayal of India in films produced in the West. The Howard Journal of Communications, 16(4):243-265, 2005.
- [10] Barry S. Sapolsky, Fred Molitor, and Sarah Luque. Sex and violence in slasher films: Re-examining the assumptions. Journalism and Mass Communication Quarterly, 80(1):28-38, 2003.
- [11] Susannah R. Stern. Messages from teens on the big screen: Smoking, drinking, and drug use in teen-centered films. Journal of Health Communication, 10(4):331-346, 2005.
Footnotes:
1 Broadcast and Electronic Arts Department, San Francisco State University
