Introduction to Java
Final project proposal:
Computer-assisted soundtrack annotation

Pedro Silva
psilva@sfsu.edu
Broadcast & Electronic Communication Arts Department
San Francisco State University

Mauricio Ardila
mardila@sfsu.edu
Computer Science Department
San Francisco State University

Contents

1  Introduction
2  The team
3  Implementation
    3.1  Project.java
    3.2  Sound.java
    3.3  Classify.java
    3.4  proyectostart.java
    3.5  proyectogui.java
    3.6  proyecto5.java
4  Scope and limitations
5  Future work
A  Appendix

1  Introduction

This project aims to produce a computer-assisted sound annotation system. In the broad field of audio segmentation and classification, an audio stream often has to be classified and labelled into categories by hand. This manual step is essential for providing baseline results against which automatic classification algorithms can be compared, as is always the case with so-called supervised classification systems (see footnote 1). There are many such systems in use, but none meets the specific requirements at hand. Accordingly, this system implements the following:
Algorithm 1: Classify an audio signal into classes
Input: PCM-encoded mono sound file of length N
  1. Take segment length n and nominal sound labels cat[] as parameters
  2. Create hierarchical folder structure according to cat[]
  3. Split input file into N/n sub-segments seg[]
  4. Play seg_i
  5. Take cat as label for seg_i
  6. Move seg_i^cat to cat_i folder
  7. Store results in plain text file resultsfile
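
The splitting step of the algorithm can be sketched as follows. This is only an illustration, not the project's Sound.split code: the class SegmentPlan and its methods are hypothetical names, and the sketch assumes that when N is not a multiple of n the last segment is simply shorter, which the proposal does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the project): plans how a signal is cut into segments. */
final class SegmentPlan {

    /**
     * Returns [start, end) sample-index pairs that cover a signal of
     * totalSamples samples in chunks of segmentSamples samples.
     * Assumption: a trailing partial segment is kept rather than dropped.
     */
    static List<long[]> ranges(long totalSamples, long segmentSamples) {
        if (totalSamples < 0 || segmentSamples <= 0) {
            throw new IllegalArgumentException("totalSamples must be >= 0 and segmentSamples > 0");
        }
        List<long[]> out = new ArrayList<>();
        for (long start = 0; start < totalSamples; start += segmentSamples) {
            long end = Math.min(start + segmentSamples, totalSamples);
            out.add(new long[] {start, end});
        }
        return out;
    }

    /** Converts a segment length in seconds to a length in samples (per channel). */
    static long samplesPerSegment(int seconds, float sampleRate) {
        return Math.round((double) seconds * sampleRate);
    }
}
```

For example, a 10-sample signal cut into 4-sample segments yields three ranges, the last one covering samples 8 and 9 only.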

2  The team

Pedro Silva is focusing on the backend engine. This includes: a Project.java class that generates a new project by initializing the necessary instance variables, or resumes an existing project by reading from a plain text file; a Sound.java class that contains all methods necessary for handling audio files and streams using the javax.sound.sampled API; and a Classify.java class that performs the actual classification process. Mauricio Ardila is implementing a graphical user interface to the backend, using the javax.swing and java.awt APIs, with calls to the necessary backend components.

3  Implementation

3.1  Project.java

An object in this class is initialized by a call from Sound.java to the following constructor:
public Project(String soundfile, int segmentlength, String[] categories)
{
    filename=soundfile;
    seglen=segmentlength;
    cat=categories;
    currseg=null;
    statefile=filename + "-state.log";
    workdir=filename + "-work";
    storeState();
    makeDirs();
}

If resuming from a previously saved project, an alternative constructor is called:
public Project(String newstatefile)
{
    recoverState(newstatefile);
}

In addition to accessors and mutators, the following methods are implemented:
public void setStateVariables(String soundfile, int segmentlength, String categories[], String currentsegment, String workingdirectory);
public void storeState();
public void recoverState(String newstatefile);
public void makeDirs();

setStateVariables sets all configuration variables for the project. storeState saves all configuration variables to a plain text file. recoverState reads them back from a plain text file and calls setStateVariables. makeDirs builds a hierarchical folder structure for storing each classified segment.
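
A minimal sketch of how such a plain-text state file could be written and read back is given below. It is not the Project.java source: the class ProjectState, its field layout, and the one-value-per-line file format (borrowed from the state.log of the shell script in the appendix) are assumptions, and it uses the java.nio.file API, which postdates this proposal.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the state handling described for Project.java. */
final class ProjectState {
    final String filename;   // input sound file
    final int seglen;        // segment length
    final String[] cat;      // category labels
    final String currseg;    // current segment, empty if none yet

    ProjectState(String filename, int seglen, String[] cat, String currseg) {
        this.filename = filename;
        this.seglen = seglen;
        this.cat = cat.clone();
        this.currseg = currseg == null ? "" : currseg;
    }

    /** Writes one value per line; categories share a line, separated by spaces. */
    void store(Path statefile) {
        List<String> lines = Arrays.asList(
                filename, currseg, Integer.toString(seglen), String.join(" ", cat));
        try {
            Files.write(statefile, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the four lines back in the same order that store wrote them. */
    static ProjectState recover(Path statefile) {
        try {
            List<String> lines = Files.readAllLines(statefile, StandardCharsets.UTF_8);
            if (lines.size() < 4) {
                throw new IOException("incomplete state file: " + statefile);
            }
            String cats = lines.get(3).trim();
            String[] cat = cats.isEmpty() ? new String[0] : cats.split("\\s+");
            return new ProjectState(
                    lines.get(0), Integer.parseInt(lines.get(2).trim()), cat, lines.get(1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates workdir and one sub-folder per category, as makeDirs is described to do. */
    static void makeDirs(Path workdir, String[] cat) {
        try {
            Files.createDirectories(workdir);
            for (String c : cat) {
                Files.createDirectories(workdir.resolve(c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the categories are stored on a single space-separated line, this particular format would not survive category names that contain spaces.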

3.2  Sound.java

This class extends Project.java and implements javax.sound.sampled.LineListener. An object in this class is initialized by calling the following:
public Sound(String soundfile, int segmentlength, String[] categories)
{
    super(soundfile,segmentlength,categories);
    File inputsoundfile=new File(super.getFilename());
    AudioInputStream ais=null;
    try {
        ais=AudioSystem.getAudioInputStream(inputsoundfile);
    }
    catch(Exception e) {
        // Without a readable stream the format cannot be queried; fail here
        // instead of with a NullPointerException on ais.getFormat() below
        throw new IllegalArgumentException("Cannot read " + inputsoundfile + ": " + e.getMessage(), e);
    }
    AudioFormat aff=ais.getFormat();
    CHANNELS=aff.getChannels();
    SAMPLE_RATE=aff.getSampleRate();
    BITS_PER_SAMPLE=aff.getSampleSizeInBits();
    split();
}

If resuming from a previously saved project, an alternative constructor is called:
public Sound(String newstatefile)
{
    super(newstatefile);
    File soundfile=new File(super.getFilename());
    AudioInputStream ais=null;
    try {
        ais=AudioSystem.getAudioInputStream(soundfile);
    }
    catch(Exception e) {
        // Without a readable stream the format cannot be queried; fail here
        // instead of with a NullPointerException on ais.getFormat() below
        throw new IllegalArgumentException("Cannot read " + soundfile + ": " + e.getMessage(), e);
    }
    AudioFormat aff=ais.getFormat();
    CHANNELS=aff.getChannels();
    SAMPLE_RATE=aff.getSampleRate();
    BITS_PER_SAMPLE=aff.getSampleSizeInBits();
    split();
}

Both constructors call the relevant super constructor from Project.java.
In addition to accessors and mutators, the following methods are implemented:
public double[] read(String filename);
public void write(double[] s, int n);
public void split();
public void play(String filename);
public void update(LineEvent event);

read takes an audio file name as input and returns a double array with the amplitude value of each sample in that file. write takes a double array as input and writes a binary file with the correct WAVE file headers. split calls write for each batch of samples that constitutes a segment. play takes a segment name, creates an AudioInputStream object, obtains a DataLine from the system's sound card, and writes the data stream to it. update implements the LineListener callback: it inspects the event type, distinguishes STOP, START, and CLOSE events, and acts accordingly.
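
The conversion between raw sample bytes and a double array, which read and write both depend on, might look like the sketch below. It is an illustration under stated assumptions rather than the Sound.java code: the class PcmCodec and its methods are hypothetical, samples are assumed to be signed 16-bit little-endian mono PCM (the usual layout of 16-bit WAVE data), and amplitudes are assumed to be scaled to the range [-1, 1).

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

/** Hypothetical helper: 16-bit little-endian mono PCM to and from double samples. */
final class PcmCodec {

    /** Decodes signed 16-bit little-endian PCM bytes to amplitudes in [-1, 1). */
    static double[] decode(byte[] pcm) {
        double[] s = new double[pcm.length / 2];
        for (int i = 0; i < s.length; i++) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, treated as unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps its sign
            s[i] = ((hi << 8) | lo) / 32768.0;
        }
        return s;
    }

    /** Encodes amplitudes back to signed 16-bit little-endian PCM, clipping out-of-range values. */
    static byte[] encode(double[] s) {
        byte[] pcm = new byte[s.length * 2];
        for (int i = 0; i < s.length; i++) {
            long v = Math.round(s[i] * 32768.0);
            v = Math.max(-32768L, Math.min(32767L, v));
            pcm[2 * i] = (byte) (v & 0xFF);
            pcm[2 * i + 1] = (byte) ((v >> 8) & 0xFF);
        }
        return pcm;
    }

    /** Writes samples as a mono 16-bit WAVE file; AudioSystem.write supplies the header. */
    static void writeWav(double[] s, float sampleRate, File out) throws IOException {
        byte[] pcm = encode(s);
        AudioFormat format = new AudioFormat(sampleRate, 16, 1, true, false);
        // The frame count equals the sample count: one mono 16-bit frame is one sample
        try (AudioInputStream ais =
                new AudioInputStream(new ByteArrayInputStream(pcm), format, s.length)) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        }
    }
}
```

Delegating the header to AudioSystem.write is a design choice of this sketch; the project's own write method is described as producing the WAVE headers itself.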

3.3  Classify.java

This class extends Sound.java. A Classify object is initialized when the classification task starts, by calling the following constructor:
public Classify()
{
    super();
    setSegments();
    setResultsfile();
    setCurrseg(getNextSeg());
    super.play(currseg);
}

In addition to accessors and mutators, the following methods are implemented:
public void setSegments();
public void setResultsfile();
public String getNextSeg();
public void storeSegclass();

setSegments lists every non-directory file in the working directory and populates segments[] with them. setResultsfile creates a new {filename}-results.log file. getNextSeg keeps a count of the iteration over all segments and returns the corresponding segment. storeSegclass appends the current segment name and class to the results file, and moves the current segment file into the folder of the corresponding class label.
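
The bookkeeping described for storeSegclass can be sketched as below. The class SegmentFiler, its method signature, and the tab-separated row format (taken from classification.log in the appendix script) are assumptions, not the Classify.java source. The sketch moves the file before logging, so that only successful moves are recorded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

/** Hypothetical sketch of the bookkeeping that storeSegclass is described to perform. */
final class SegmentFiler {

    /**
     * Moves the segment into workdir/label, then appends "segment<TAB>label"
     * to resultsFile. Returns the new location of the segment.
     */
    static Path storeSegclass(Path segment, String label, Path workdir, Path resultsFile) {
        try {
            Path classDir = Files.createDirectories(workdir.resolve(label));
            Path target = classDir.resolve(segment.getFileName());
            Files.move(segment, target, StandardCopyOption.REPLACE_EXISTING);
            String row = segment.getFileName() + "\t" + label;
            Files.write(resultsFile, Collections.singletonList(row), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```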

3.4  proyectostart.java

This class asks the user whether to start a new project or resume an existing one. The same window also lets a new user read a help file. The class has three JButtons and its own listener.
When the program is executed, the constructor public proyectostart() is called. This constructor calls the function initialize, which initializes all the instance variables and builds the window.
public proyectostart()
{
    super();
    this.setVisible(true);
    initialize(); // function called inside the constructor to create the window
}

Furthermore, when the Start button is clicked, the handler creates a new instance of the class proyectogui.
private void start(java.awt.event.MouseEvent evt)
{
    setVisible(false);
    new proyectogui();
}

When the Resume button is clicked, the handler creates a new instance of the class proyecto5, which in turn uses the Sound and Classify classes.
private void resume(java.awt.event.MouseEvent evt)
{
    new proyecto5(categories2, length1, file);
}

Additionally, when the Help button is clicked, the handler opens the proposal.pdf file.
private void help(java.awt.event.MouseEvent evt)
{
    try {
        // open the file proposal.pdf with the default Windows handler
        Runtime.getRuntime().exec("rundll32 url.dll,FileProtocolHandler " + "C:\\Documents and Settings\\Mauricio Ardila\\workspace\\audio\\src\\proposal.pdf");
    }
    catch (Exception e) // catch any exceptions here
    {
        System.out.println("Error" + e); // print the error
    }
}
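
The handler above relies on rundll32, which exists only on Windows, and on an absolute path. A more portable way to open the help file, assuming Java SE 6 or later, is the java.awt.Desktop class. The sketch below is a suggestion, not the project's code; the class name HelpOpener is hypothetical.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: opens a document with the platform's default viewer. */
final class HelpOpener {

    /** Returns true if the file was handed to the desktop, false if that was not possible. */
    static boolean open(File document) {
        if (!document.isFile()) {
            return false; // nothing to open
        }
        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()) {
            return false; // no desktop integration on this system
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.OPEN)) {
            return false; // the platform cannot open files this way
        }
        try {
            desktop.open(document);
            return true;
        } catch (IOException e) {
            System.err.println("Error " + e);
            return false;
        }
    }
}
```

A call such as HelpOpener.open(new File("proposal.pdf")) would then resolve the file relative to the working directory instead of a fixed user folder.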

3.5  proyectogui.java

This class asks the user for the sound file with which to start a new project. The same window lets the user select the segment length from a drop-down list and enter the names of the categories to use.
JTextField filename; // text field to input the file name
JTextField categories; // text field to input folder name
JComboBox lenght; // drop down menu
JTextArea welcome; // text area to display the welcome message
JScrollPane jScrollPane1;
JTextArea jTextArea2;
JScrollPane jScrollPane2;
JButton Browse;
JButton Done;// button 1
JButton clear;// button 2
JPanel contentPane;

public proyectogui()
{
    super();
    this.setVisible(true);
    initialize(); // function called inside the constructor to create the window
}

Furthermore, when the Browse button is clicked, its listener calls the openfilechooser method.
Browse.addActionListener(new java.awt.event.ActionListener()
{
    public void actionPerformed(java.awt.event.ActionEvent evt)
    {
        openfilechooser(evt);
    }
});

When the Done button is clicked, the handler creates a new instance of the class proyecto5, which in turn uses the Sound and Classify classes.
private void done(java.awt.event.MouseEvent evt)
{
    new proyecto5(categories2, length1, file);
}

3.6  proyecto5.java

This class asks the user to select the corresponding category for each segment. The same window lets the user pause and play each segment and change the volume. It also displays the names of the current and previous segments. This class has:
JPanel contentPane;
String [] numbers;
JButton[] Buton;
JButton Bpause;
JButton Bplay;
JButton Bsave;
javax.swing.JProgressBar progress;
javax.swing.JSlider svol;
javax.swing.JLabel Lvol;
javax.swing.JLabel now1;
javax.swing.JLabel next1;
javax.swing.JLabel before1;
javax.swing.JTextField now;
javax.swing.JTextField before;
javax.swing.JTextField next;

When proyecto5 is instantiated, its constructor initializes all the instance variables using the function initialize. The constructor also creates new objects of the classes Sound and Classify.
public proyecto5(String a[], int lenght, String filename)
{
    super();
    numbers=a;
    initialize();

    this.setVisible(true);
    Sound project = new Sound (filename, lenght, a);
    Classify task = new Classify();
}

4  Scope and limitations

Currently the most limiting factor resides in the Sound.java implementation. Due to the nature of the javax.sound.sampled API, the JVM supports only a restricted number of sound file formats and encodings. Only PCM-encoded WAVE and AIFF files and the companded μ-law and A-law encodings are supported, unless the Java Media Framework plugin is installed on the system, in which case there is also support for discrete cosine transform based encodings (MPEG-1 Layer 3 and Ogg Vorbis). For maximum stability of the system, the only format currently supported is single-channel (mono) WAVE files encoded with a 16-bit word length, at any sample rate.

Time was also a major constraint in this project. In proyecto5, each button and its listener is hard-coded, which I am not happy with. I would like to change this part of the code by writing a class called buttonproyecto that accepts a String as a parameter.
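
The format restriction stated above could be enforced with a small guard such as the one sketched here. The class FormatCheck is a hypothetical name, and the sketch assumes that 16-bit WAVE data is signed PCM. It inspects only the AudioFormat, so it does not verify that the container is actually a WAVE file.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical guard for the single format the proposal declares supported. */
final class FormatCheck {

    /** True for signed PCM, one channel, 16 bits per sample; the sample rate is not constrained. */
    static boolean isSupported(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getChannels() == 1
                && f.getSampleSizeInBits() == 16;
    }
}
```

Such a check could run right after the AudioFormat is obtained in the Sound constructors, so that an unsupported file is rejected before any splitting takes place.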

5  Future work

Given the limitations enumerated above, other possibilities for the implementation of this system are:

A  Appendix

The following script is the original implementation of CASA (Computer Assisted Soundtrack Annotation). It is a Bourne-Again Shell (BASH) script that requires the mp3splt and mpg321 POSIX programs. It supports MPEG-1 Layer 3 exclusively. It is included as a reference to the spirit in which the project began.
#!/bin/bash
# Process: Performs computer-assisted annotation on an mp3 soundtrack (designed for classification of feature-length soundtracks)
# Input: MPEG1-Layer3 file. Prefers constant bit rate (CBR) because of segmentation process
# Output: Classifies audio segments grouped in categorical folder structure
# Requirements: bash, mp3splt, mpg321
# Revision: 0.1 - 2007/07/30
# Author: Pedro Silva
#         psilva@sfsu.edu
#	  http://thecity.sfsu.edu/~psilva
# 	  Broadcast and Electronic Communication Arts Department
# 	  San Francisco State University
# Copyright Notice: Copyright (C) Pedro Silva 2007. Licensed under GNU GPLv3 - GNU General Public License version 3 (http://www.gnu.org/copyleft/gpl.html). This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses


##### FUNCTION DEFINITIONS #####

# Checks user input ([r]eplay, [s]plit segment, [p]ause, classification), and calls relevant function (playback, split, store, classify)
validate() {
    case "$class" in
	r|R) playback;;
	s|S) split;;
	p|P) store;;
	*) 
	    catExists="no" # reset on every call so that an earlier match is not reused
	    for c in "${cat[@]}" # checks that input category exists and classifies, else calls playback function; the loop variable must not be named cat, or it overwrites cat[0]
	    do
		if [ "$class" == "$c" ]
		then
		    catExists="yes"
		    break
		fi  
	    done
	    if [ "$catExists" == "yes" ]
	    then
		classify
	    else
		echo "$class" 'does not match any of' "${cat[@]}"
		playback
	    fi;;
    esac
}

# Plays back segment, asks to replay, split again, or classify, reads stdin
playback() {
    echo ''
    echo 'Playing back' $segment 'using mpg321...'
    echo ''
    mpg321 -q $segment		# Calls mpg321
    echo 'Possible classes: ' ${cat[@]}
    echo -n 'Classification ([r]eplay to listen to segment again, [s]plit to split file in half, [p]ause to store current state)? '
    #9menu -label CASA -shell bash 'replay:ls' 'split:class=s' 'pause:class=p' exit # Testing 9menu's X menus. How to pass variable assignment as a command in form button:command?
    read class
    validate
}	

# Splits current segment in half, if user thinks original segment is not statically categorizable
split() {
    mkdir -p tmp
    let splitseglen=$seglen/2
    mp3splt -nt 0.$splitseglen -d tmp $segment # Calls mp3splt
    cd tmp
    for segment in *mp3
      do
      cd ..
      playback
    done
}

# Saves current execution state (file name, segment name, segment length, possible categories) into state.log
store() {
    echo "Storing current script execution state in state.log..."
    echo $ipfile > state.log
    echo $segment >> state.log
    echo $seglen >> state.log
    echo ${cat[*]} >> state.log
    echo "Done! Resume script by calling it with 'resume' and '$ipfile' parameters"
    echo ''
    exit
}

# Classifies current segment into one of the available categories, by moving the current segment into its corresponding categorical folder, and appending its name and category into a row in classification.log
classify() {
    echo ''
    echo '=========================================='
    echo $segment 'has category' $class
    echo '=========================================='
    mv $segment $class/$i-${segment}
    echo $segment$'\t'$class >> classification.log 
    echo ''
}

##### PRE-PROCESS #####
process() {
    catnum=${#cat[@]}				 # Counts number of categories
    mkdir -p ${ipfile}-work				 # Creates working directory
    mp3splt -nt 0.$seglen -d ${ipfile}-work $ipfile	 # Splits given input file in segments with given length
    cd ${ipfile}-work			         # Move into working directory
    index=0
    while [ "$index" -lt "$catnum" ]		 # Creates a directory for each category
      do
      mkdir -p ${cat[$index]}
      let "index = $index + 1"
    done
}

##### RUN #####
run() {
    for segment in *mp3				 # Loops through every mp3 file in working directory
      do
      playback
    done
    echo ''			# Wrap up
    echo -n 'Done!'
    echo "Listing results..."
    rm -r ${ipfile}-work/tmp
    echo `ls ${ipfile}-work/*` > results.log # Iterate over categorical directories and list their contents to results.log
    echo ''
    exit
}


##### MAIN PROGRAM #####
# Check for arguments
if [ "$1" == "resume" ]
    then
    cat << EOF

    CASA - Computer assisted soundtrack annotation.
    Copyright (C) Pedro Silva 2007.
    This program comes with ABSOLUTELY NO WARRANTY.
    This is free software, and you are welcome to redistribute it
    under certain conditions; See the copyright notice for details.

EOF
    echo 'Resuming from a previous run...'
    ipfile=$2
    segment=`head -2 ${ipfile}-work/state.log | tail -1`
    seglen=`head -3 ${ipfile}-work/state.log | tail -1`
    cat=(`head -4 ${ipfile}-work/state.log | tail -1`)
    cd ${ipfile}-work			         # Move into working directory
    run
else
    # No arguments
    if [ $# -eq 0 ]
    then
    cat << EOF
    
    CASA - Computer assisted soundtrack annotation.
    Copyright (C) Pedro Silva 2007.
    This program comes with ABSOLUTELY NO WARRANTY.
    This is free software, and you are welcome to redistribute it
    under certain conditions; See the copyright notice for details.

EOF
#####  TEST PARAMETERS. THESE CAN BE CHANGED TO SUIT NEEDS (RUN CASA WITHOUT ARGUMENTS IN THAT CASE) #####
        ipfile=toy_story.mp3
	seglen=6
	cat=(sp mus env sil sp-mus sp-env mus-env fuz)
	echo "Running with following default parameters: "
	echo "ipfile = " $ipfile
	echo "seglen (secs) =" $seglen
	echo "cat =" ${cat[@]}
	echo -n "Parameters ok? ([y] or [n]) "
	read runDefaultParameters
	if [ $runDefaultParameters == "n" ]
	then
	    echo -n "Enter input file: "
	    read ipfile
	    echo -n "Enter segment length in seconds: "
	    read seglen
	    echo -n "Enter categories separated by spaces: "
	    declare -a cat
	    read -a cat
	else
	    if [ $runDefaultParameters == "y" ]
	    then
		echo ''
	    else
		echo $runDefaultParameters 'is not a valid choice. Exiting...'
		echo ''
		exit
	    fi
	fi
	process
	run
    else
##### INPUT PROMPT #####
	if [ $# -eq 2 ]
	then
	cat << EOF

    CASA - Computer assisted soundtrack annotation.
    Copyright (C) Pedro Silva 2007.
    This program comes with ABSOLUTELY NO WARRANTY.
    This is free software, and you are welcome to redistribute it
    under certain conditions; See the copyright notice for details.

EOF
	    ipfile=$1
	    seglen=$2
	    echo -n 'Enter category names (separated by a space): '
	    declare -a cat
	    read -a cat				 # Reads categories until return is pressed
	    echo ''
	    process
	    run
	fi
    fi
fi


Footnotes:

1. Supervised here means that each input instance used in training a learning model is categorically labeled.


File translated from TEX by TTH, version 3.79.
On 11 Dec 2007, 13:57.


This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.