This book constitutes the thoroughly refereed joint post-workshop proceedings of two co-located events: the Second International Workshop on Classification of Events, Activities and Relationships, CLEAR 2007, and the 5th Rich Transcription 2007 Meeting Recognition evaluation, RT 2007, held in succession in Baltimore, MD, USA, in May 2007.
The workshops had complementary evaluation efforts; CLEAR for the evaluation of human activities, events, and relationships in multiple multimodal data domains; and RT for the evaluation of speech transcription-related technologies from meeting room audio collections. The 35 revised full papers presented from CLEAR 2007 cover 3D person tracking, 2D face detection and tracking, person and vehicle tracking on surveillance data, vehicle and person tracking aerial videos, person identification, head pose estimation, and acoustic event detection. The 15 revised full papers presented from RT 2007 are organized in topical sections on speech-to-text, and speaker diarization.