VisAug System User Study

1. Research Objectives and Hypotheses

This user study evaluates the effectiveness of the VisAug system in enhancing navigation and engagement with speech-rich video content. We hypothesize that VisAug will improve user efficiency in video content navigation and increase content comprehension through its visual enhancements.

2. Study Design

The study employs a within-subjects design where each participant experiences two conditions: using a standard video player (control) and using the VisAug system with visual enhancements (treatment). The experimental materials comprise diverse speech-rich video clips selected to evaluate VisAug system performance comprehensively.

Task Design:

Participants complete information location tasks that assess navigation efficiency and content engagement. They locate video segments on three topics: public concerns about AI's deployment in society, government responses to AI development challenges (policy initiatives), and AI's influence on educational models and learning methods.

All tasks are designed to permit controlled data analysis while preserving ecological validity, so that system functionality is tested under realistic conditions. This allows us to evaluate how well the VisAug system supports video content navigation and user engagement.

Experimental Procedure:

Familiarization Phase: Participants are introduced to the study procedures and given time to acclimate to the environment.

Baseline Phase: Participants view the video clips using the standard player without visual aids. Researchers record task completion times and any issues encountered.

Intervention Phase: The same participants watch the identical clips with the VisAug system's visual and text enhancement features enabled (enhancement coefficient: 5). Performance metrics continue to be tracked.

Personalization Phase: Participants adjust the visual enhancement settings according to their preferences and view additional videos. A post-viewing questionnaire collects subjective ratings of task load, sense of control, and system intrusiveness. An open-ended discussion allows participants to share insights and experiences.

Participant Recruitment:

Based on a statistical power analysis, we determined the required sample size and recruited 20 participants: 2 doctoral students, 7 master's students, and 11 undergraduates. Their backgrounds were diverse, spanning computer science, artificial intelligence, interaction design, journalism, and law, and included experienced product managers from internet companies.
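The source does not report the parameters of its power analysis. As a rough illustration only, the sketch below uses the normal approximation for a paired (within-subjects) comparison. The effect sizes, the significance level of 0.05, the target power of 0.80, and the function names `required_n` and `detectable_effect` are all assumptions made for this example, not values taken from the study.

```python
import math
from statistics import NormalDist


def required_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of participants for a two-sided paired comparison.

    effect_size is the standardized mean of the paired differences (d_z).
    Uses the normal approximation, which slightly underestimates the
    sample size an exact t-based calculation would give.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)


def detectable_effect(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized effect detectable with n participants (same approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / math.sqrt(n)


# Hypothetical effect sizes, chosen only to show how the required n changes.
for d in (0.5, 0.8):
    print(f"assumed d_z = {d}: about {required_n(d)} participants")

# Reading the calculation in reverse for the 20 participants recruited here.
print(f"n = 20: detectable d_z is roughly {detectable_effect(20):.2f}")
```

Under these assumed settings, 20 participants correspond to a detectable standardized effect of roughly 0.63. An exact calculation based on the t distribution would give a slightly larger figure.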

Through preliminary interviews, we found that the participants had strong information processing and critical thinking skills, which made their feedback valuable for our research. To ensure sample diversity, we paid special attention to balancing gender, age, and technical proficiency during recruitment. There were 12 female and 8 male participants, all aged 20 to 30, with good English listening and reading abilities.

The participants (labeled P1 to P20) were asked to use the system to complete the location tasks described above and then took part in a structured usability interview covering their experience and suggestions for improvement. Each participant completed the entire process in approximately 1.5 to 2 hours and received $15 as compensation at the end of the study.

3. Accuracy Analysis

Task 1 evaluated participants' ability to locate video segments discussing AI's educational impact, with target time windows at 07:25 and 19:22-21:15. With the standard player, 14 participants achieved exact matches and 3 achieved partial matches, yielding a weighted accuracy of 77.5% (weights: exact match = 1.0, partial match = 0.5, non-match = 0). With VisAug, accuracy rose to 92.5%, with 18 exact matches and 1 partial match. This gain of 15 percentage points suggests that VisAug helped users navigate to and locate content.

Task 2 evaluated participants' ability to locate content about government AI policies, with target windows at 13:43-14:30 and 18:27-19:20. Without VisAug, participants achieved 15 exact matches, 3 partial matches, and 2 non-matches, yielding 82.5% weighted accuracy (same weights as in Task 1). With VisAug, results improved to 16 exact matches, 2 partial matches, and 1 non-match, reaching 85% weighted accuracy. This modest gain of 2.5 percentage points, smaller than Task 1's, may reflect the content's complexity or lower participant engagement with the topic.

Task 3 assessed participants' ability to locate discussions of AI developers' societal concerns within a short segment (02:35-03:53). Without VisAug, participants achieved 3 exact matches, 4 partial matches, and 13 non-matches (25% weighted accuracy). With VisAug, results improved modestly to 5 exact matches, 2 partial matches, and 13 non-matches (30% weighted accuracy). This gain of 5 percentage points, far smaller than Task 1's, likely reflects the difficulty of navigating to brief, concentrated dialogue segments.

In summary, weighted accuracy improved with VisAug on all three tasks (Task 1: +15 percentage points, Task 2: +2.5, Task 3: +5), indicating that VisAug helped users locate content in speech-rich video more accurately. However, gains were smaller for brief dialogue segments, which suggests an area for system optimization.
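The weighted accuracies and gains reported above can be reproduced from the match counts. The following is a minimal sketch that assumes all 20 participants count toward the denominator in every condition. The function name `weighted_accuracy` and the data layout are illustrative choices, not part of the VisAug system.

```python
def weighted_accuracy(exact: int, partial: int, total: int = 20) -> float:
    """Weighted accuracy in percent: exact match = 1.0, partial match = 0.5, non-match = 0."""
    return 100 * (exact + 0.5 * partial) / total


# (exact, partial) match counts as reported for each task and condition.
counts = {
    "Task 1": {"standard": (14, 3), "visaug": (18, 1)},
    "Task 2": {"standard": (15, 3), "visaug": (16, 2)},
    "Task 3": {"standard": (3, 4), "visaug": (5, 2)},
}

gains = {}
for task, conditions in counts.items():
    before = weighted_accuracy(*conditions["standard"])
    after = weighted_accuracy(*conditions["visaug"])
    gains[task] = after - before  # difference in percentage points
    print(f"{task}: {before:.1f}% -> {after:.1f}% ({gains[task]:+.1f} points)")
```

Expressing the gains as differences between two percentages makes it explicit that they are percentage points rather than relative improvements.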

4. User Study Questionnaire

Task Load Assessment:

Please rate each of the following items about your use of the VisAug system (5-point scale, 1 = Very Low, 5 = Very High):

Mental Demand

How mentally demanding was using the VisAug system?

1 (Very Low)   2   3   4   5 (Very High)

Physical Demand

How physically demanding was using the VisAug system?

1 (Very Low)   2   3   4   5 (Very High)

Frustration Level

How frustrating was using the VisAug system?

1 (Very Low)   2   3   4   5 (Very High)

Temporal Demand

How much time pressure did you feel using the VisAug system?

1 (Very Low)   2   3   4   5 (Very High)

Effort & Effectiveness Assessment:

Effort Level

How much effort did using VisAug require?

1 (Very Low)   2   3   4   5 (Very High)

Purpose: Evaluate time and energy required for system adaptation

Efficiency and Effectiveness

How efficient and effective was VisAug in task completion?

1 (Very Low)   2   3   4   5 (Very High)

Purpose: Assess speed and quality improvements in task completion

System Control & Viewing Experience:

System Control: How much control did you have over the visual enhancements?

Viewing Experience: Did VisAug interrupt your video viewing?

Enhancement Effectiveness:

Rate the following (1-5 scale):

  • Understanding new information
  • Clarifying complex concepts
  • Engagement level
  • Conversation stimulation

Enhancement Quality:

Evaluate:

  • Content relevance
  • Quality and clarity
  • Visual appeal

Feature Usage:

Rate helpfulness of:

  • Transcript feature
  • Timeline feature
  • Image storyboard

Overall Satisfaction (After-Scenario Questionnaire, ASQ):

Rate satisfaction with:

  • Task difficulty
  • Time spent
  • Support information

5. Statistical Summary of the VisAug Questionnaire Results

Descriptive Statistics Analysis:

Task Difficulty and Time Investment - Initial assessment of whether the VisAug system reduced task difficulty and saved time.

Task Difficulty
Time Investment

Descriptive Statistics:

Understanding and Engagement with Visual Enhancements

Understanding New Information:
Engagement:

Summary:

Task Performance: Users rated task difficulty and time investment with the VisAug system as moderate, and their ratings were consistent.

Content Engagement: Participants reported that the visual enhancements improved information comprehension and engagement.

Correlation Analysis:

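We analyzed correlations among the visual enhancement ratings (engagement, clarification of complex concepts, and understanding of new information) to assess their cognitive and interactive benefits: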

Variables        Engagement   Clarification             Understanding
Engagement       1            r = 0.4387, p = 0.0530    r = 0.6331, p = 0.0027
Clarification                 1                         r = 0.7808, p < 0.0001
Understanding                                           1

Correlation analysis revealed significant relationships among the visual enhancement metrics. A strong positive correlation between understanding new information and clarifying complex concepts (r=0.7808, p<0.0001) indicates that improvements in understanding closely align with concept clarification. Understanding new information also showed a moderate positive correlation with engagement (r=0.6331, p=0.0027), suggesting that better understanding is associated with higher engagement. The correlation between concept clarification and engagement was moderate (r=0.4387, p=0.0530) but not statistically significant at the 0.05 level. These results indicate that the visual enhancements most consistently link understanding with concept clarification, while their relationship to engagement is less consistent.
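The significance pattern above can be checked from the reported coefficients alone. The sketch below assumes that the coefficients are Pearson correlations computed over all 20 participants and tested two-sided, which the source does not state explicitly. It converts each r to a t statistic with n - 2 degrees of freedom and compares it with the standard table value for a significance level of 0.05. The function name `t_statistic` is illustrative.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 20          # assumed sample size (participants P1 to P20)
T_CRIT = 2.101  # two-sided critical t for alpha = 0.05 and df = 18 (t-table value)

reported = {
    ("Engagement", "Clarification"): 0.4387,
    ("Engagement", "Understanding"): 0.6331,
    ("Clarification", "Understanding"): 0.7808,
}

for (a, b), r in reported.items():
    t = t_statistic(r, N)
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{a} vs {b}: r = {r:.4f}, t = {t:.2f}, {verdict} at alpha = 0.05")
```

Under these assumptions the t statistics are roughly 2.07, 3.47, and 5.30. Only the engagement and clarification pair falls below the critical value, which is consistent with its reported p-value of 0.0530.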

6. Figures

7. Interview Data Analysis

The interview data analysis involved coding open-ended responses to identify recurring themes and patterns. This qualitative analysis complemented the quantitative findings, providing comprehensive insight into user experiences and valuable direction for VisAug system improvements.

Subtitle Panel Usability Analysis:

The subtitle panel analysis revealed strong user appreciation for comprehension support, particularly for non-native content. Users (P2&3, P10) highlighted precise dialogue location and keyword highlighting benefits. However, several usability issues emerged: automatic scrolling constraints frustrated users (P11&12), while some requested customizable scroll controls (P15). Users suggested integrating mouseover image enhancements (P17) and improving subtitle marker visibility (P19). Interface preferences varied, with requests for vertical timeline layouts (P17) and customizable display options (P15). Overall, users confirmed subtitles' effectiveness for content comprehension, with P19 noting their particular value for quick content overview and theme identification.

Analysis of Keyword Highlighting Effectiveness:

Keyword highlighting drew mixed feedback. While users found it generally helpful for content navigation, they identified several limitations. P5 praised the visualization of abstract terms, but P7 and P11 noted problems with the relevance of the selected terms. Common terms such as "technology" and "GPT" were highlighted too frequently (P20), while contextually important phrases such as "potential damage" were missed. Suggestions included user-customizable keywords (P5), improved algorithmic selection, and timestamp-based context summaries (P14). P12 valued highlighting for identifying key content and emotional tone, though P13 noted occasional inaccuracies. Overall, users found the feature valuable but wanted more precise term selection that favors topic-specific over general terminology.

Video Navigation Challenges:

Users reported navigation challenges, particularly with non-native content, highlighting the need for improved content organization. P2&3 and P9 requested topic-based segmentation for easier navigation, while P7 suggested adding annotation capabilities for marking key points. Layout concerns also emerged: P11 noted that the split-screen design dispersed attention and suggested a centered video layout with subtitles and images placed at the periphery. Most users relied heavily on subtitles and keywords for navigation, and P12 requested enhanced timestamp functionality for locating content. Users consistently expressed a desire for better context viewing capabilities and more efficient navigation tools.

Image Enhancement Analysis:

Image enhancement received mixed feedback, with users finding limited utility for concept comprehension. P2&3 and P5 noted image content was often unclear and added cognitive load, particularly for abstract concepts. Users suggested several improvements: event-based visualization (P9), data-specific charts (P10, P15), and context-appropriate imagery (P11, P12). P13 highlighted tone mismatches, noting cartoonish images undermined serious topics. P18 valued relevant academic concept visualization, while P20 requested summary functionality with expandable details. Overall, users sought more contextually relevant and purpose-driven visual enhancements that align with content tone and complexity.