This user study evaluates the effectiveness of the VisAug system in enhancing navigation and engagement with speech-rich video content. We hypothesize that VisAug will improve user efficiency in video content navigation and increase content comprehension through its visual enhancements.
The study employs a within-subjects design in which each participant experiences two conditions: a standard video player (control) and the VisAug system with visual enhancements enabled (treatment). The experimental materials are diverse speech-rich video clips selected to exercise the system's features comprehensively.
For task design, participants complete information-location tasks that assess navigation efficiency and content engagement: locating video segments discussing public concerns about AI's societal implementation, identifying content about government AI policy initiatives and responses to AI development challenges, and finding discussions of AI's influence on educational models and learning methods.
All tasks were designed to support controlled data analysis while maintaining ecological validity, so that VisAug's ability to enhance video content navigation and user engagement could be evaluated under realistic conditions.
Familiarization Phase: Introduce participants to study procedures and allow time for environmental acclimation. Baseline Phase: Participants view video clips using standard player without visual aids. Researchers record task completion times and issues encountered.
Intervention Phase: Same participants watch identical clips with VisAug system's visual and text enhancement features enabled (enhancement coefficient: 5). Performance metrics continue to be tracked.
Personalization Phase: Participants adjust visual enhancement settings according to preferences and view additional videos. Post-viewing questionnaire collects subjective ratings on task load, sense of control, and system intrusiveness. Open-ended discussion allows participants to share insights and experiences.
Based on a statistical power analysis, we determined the required sample size and recruited 20 participants: 2 doctoral students, 7 master's students, and 11 undergraduates. Participants had diverse backgrounds, including computer science, artificial intelligence, interaction design, journalism, law, and product management at internet companies.
Preliminary interviews indicated that participants had strong information-processing and critical-thinking skills, which supported high-quality feedback for our research. To ensure sample diversity, we balanced gender, age, and technical proficiency during recruitment: 12 female and 8 male participants, all aged 20-30, with good English listening and reading abilities.
The participants (labeled P1 to P20) used the system to complete the information-location tasks described above and then took part in a structured usability interview probing their experience and suggestions for improvement. Each participant completed the full procedure in approximately 1.5-2 hours and received $15 as compensation at the end of the study.
Task 1 evaluated participants' ability to locate video segments discussing AI's educational impact, with target time windows at 07:25 and 19:22-21:15. Before using VisAug, 14 participants achieved exact matches and 3 achieved partial matches, for a weighted accuracy of 77.5% (weights: exact match = 1.0, partial match = 0.5). With VisAug, accuracy rose to 92.5% (18 exact matches and 1 partial match). This 15-percentage-point gain suggests VisAug substantially enhanced users' content navigation and location capabilities.
Task 2 evaluated locating content about government AI policies, with target windows at 13:43-14:30 and 18:27-19:20. Without VisAug, participants achieved 15 exact matches, 3 partial matches, and 2 non-matches, yielding 82.5% weighted accuracy. With VisAug, results improved to 16 exact matches, 2 partial matches, and 1 non-match, for 85% accuracy. This modest 2.5-point improvement, smaller than Task 1's gain, may reflect the content's complexity or lower topic engagement among participants.
Task 3 assessed participants' ability to locate discussions of AI developers' societal concerns within a short segment (2:35-3:53). Without VisAug, participants achieved 3 exact matches, 4 partial matches, and 13 non-matches (25% weighted accuracy). With VisAug, results improved modestly to 5 exact matches, 2 partial matches, and 13 non-matches (30% accuracy). This limited 5-point gain, smaller than in the previous tasks, likely reflects the difficulty of navigating brief, concentrated dialogue segments.
In summary, VisAug improved user efficiency and accuracy in processing speech-rich video content, with accuracy gains across all tasks (Task 1: +15 points; Task 2: +2.5 points; Task 3: +5 points). However, gains were least pronounced for brief dialogue segments, suggesting a direction for system optimization.
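For clarity, the weighted-accuracy metric used in Tasks 1-3 can be sketched as follows. This is a minimal illustration; the function name is ours, while the weights (exact match = 1.0, partial match = 0.5, non-match = 0) and the participant count of 20 are those stated above.

```python
def weighted_accuracy(exact: int, partial: int, n: int = 20) -> float:
    """Weighted accuracy as a percentage: exact matches count fully,
    partial matches count half, non-matches count zero."""
    return 100.0 * (1.0 * exact + 0.5 * partial) / n

# Reproducing the figures reported above:
print(weighted_accuracy(14, 3))  # Task 1, baseline: 77.5
print(weighted_accuracy(18, 1))  # Task 1, VisAug:   92.5
print(weighted_accuracy(15, 3))  # Task 2, baseline: 82.5
print(weighted_accuracy(16, 2))  # Task 2, VisAug:   85.0
print(weighted_accuracy(3, 4))   # Task 3, baseline: 25.0
print(weighted_accuracy(5, 2))   # Task 3, VisAug:   30.0
```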
Please rate your agreement with the following statements about VisAug system usage (5-point Likert scale):
How mentally demanding was using the VisAug system?
How physically demanding was using the VisAug system?
How frustrating was using the VisAug system?
How much time pressure did you feel using the VisAug system?
How much effort did using VisAug require?
Purpose: Evaluate time and energy required for system adaptation
How efficient and effective was VisAug in task completion?
Purpose: Assess speed and quality improvements in task completion
System Control: Level of control over visual enhancements?
Viewing Experience: Did VisAug interrupt video viewing?
Rate the following (1-5 scale):
Evaluate:
Rate helpfulness of:
Rate satisfaction with:
Task Difficulty and Time Investment - Initial assessment of whether the VisAug system reduced task difficulty and saved time.
Understanding and Engagement with Visual Enhancements
Task Performance: Participants reported moderate task difficulty and time investment with the VisAug system, with consistent ratings across participants.
Content Engagement: Visual enhancements improved information comprehension and engagement.
Analyzing visual enhancement correlations for cognitive and interactive benefits:
| Variables | Engagement | Clarification | Understanding |
|---|---|---|---|
| Engagement | 1 | r = 0.4387, p = 0.0530 | r = 0.6331, p = 0.0027 |
| Clarification | | 1 | r = 0.7808, p < 0.0001 |
| Understanding | | | 1 |
Correlation analysis revealed significant relationships between the visual-enhancement metrics. The strong positive correlation between understanding new information and clarifying complex concepts (r = 0.7808, p < 0.0001) indicates that gains in understanding closely align with concept clarification. Understanding new information also showed a moderate positive correlation with engagement (r = 0.6331, p = 0.0027), suggesting that better understanding accompanies increased engagement. The correlation between concept clarification and engagement was moderate (r = 0.4387, p = 0.0530) but did not reach statistical significance. These results suggest that visual enhancements most strongly link understanding with concept clarification, while their effect on engagement varies.
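The pairwise coefficients reported above are standard Pearson correlations over participants' Likert ratings. A minimal sketch of the computation follows; the rating lists here are hypothetical placeholders, not the study data, and the reported p-values would additionally require a t-test (e.g. via `scipy.stats.pearsonr`).

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical 1-5 Likert ratings for two questionnaire items (illustration only):
understanding = [4, 5, 3, 4, 5, 2, 4, 3]
clarification = [4, 5, 3, 3, 5, 2, 4, 3]
print(round(pearson_r(understanding, clarification), 4))
```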
The interview data analysis involved coding open-ended responses to identify recurring themes and patterns. This qualitative analysis complemented the quantitative findings, providing comprehensive insight into user experiences and valuable direction for VisAug system improvements.
The subtitle panel analysis revealed strong user appreciation for comprehension support, particularly for non-native content. Users (P2&3, P10) highlighted precise dialogue location and keyword highlighting benefits. However, several usability issues emerged: automatic scrolling constraints frustrated users (P11&12), while some requested customizable scroll controls (P15). Users suggested integrating mouseover image enhancements (P17) and improving subtitle marker visibility (P19). Interface preferences varied, with requests for vertical timeline layouts (P17) and customizable display options (P15). Overall, users confirmed subtitles' effectiveness for content comprehension, with P19 noting their particular value for quick content overview and theme identification.
Keyword highlighting analysis revealed mixed user feedback. While generally helpful for content navigation, users identified several limitations. P5 praised visualization of abstract terms, but P7 and P11 noted issues with term selection relevance. Users found common terms like "technology" and "GPT" (P20) highlighted too frequently while missing contextually important phrases like "potential damage." Suggestions included user-customizable keywords (P5), improved algorithmic selection, and timestamp-based context summaries (P14). P12 valued highlighting for identifying key content and emotional tone, though P13 noted occasional inaccuracies. Overall, users found the feature valuable but suggested more precise term selection focusing on topic-specific rather than general terminology.
Users reported navigation challenges, particularly with non-native content, highlighting the need for improved content organization. P2&3 and P9 requested topic-based segmentation for easier navigation, while P7 suggested adding annotation capabilities for marking key points. Layout concerns emerged, with P11 noting the split-screen design dispersed attention and suggesting a centered video layout with peripheral subtitle and image placement. Most users relied heavily on subtitles and keywords for navigation, with P12 requesting enhanced timestamp functionality for better content location. Users consistently expressed desire for improved context viewing capabilities and more efficient navigation tools.
Image enhancement received mixed feedback, with users finding limited utility for concept comprehension. P2&3 and P5 noted image content was often unclear and added cognitive load, particularly for abstract concepts. Users suggested several improvements: event-based visualization (P9), data-specific charts (P10, P15), and context-appropriate imagery (P11, P12). P13 highlighted tone mismatches, noting cartoonish images undermined serious topics. P18 valued relevant academic concept visualization, while P20 requested summary functionality with expandable details. Overall, users sought more contextually relevant and purpose-driven visual enhancements that align with content tone and complexity.