A Dynamic Assessment Protocol (DAP) Design for Signal Clarity in Generative AI Education: An HCI-Ethical Approach

Master’s Thesis Proposal
From: YIN Renlong
Date: February 15, 2026

Title: A Dynamic Assessment Protocol (DAP) Design for Signal Clarity in Generative AI Education: An HCI-Ethical Approach

1. Abstract

The integration of Generative AI into Computer Science and Digital Humanities education has created a systemic “Socio-Technical Evaluation Gap.” While institutional policies increasingly permit AI use, the assessment interface remains so underspecified that a semantic and epistemological misalignment arises. This creates a collision between traditional “Process Certainty” (evaluating manual labor) and emerging “Outcome Certainty” (evaluating architectural verification). Consequently, a student’s transparent disclosure of a hybrid workflow (e.g., delegating implementation to an agentic runtime) can easily be misinterpreted by assessors as insufficient engagement or a breach of academic integrity.

This creates a mutual failure of understanding, leading to the “Transparency Paradox.” This thesis proposes a Dynamic Assessment Protocol (DAP), a multi-layered framework designed to architect clarity within this system. The core is the Weighted Authorship Matrix (WAM), an instrument (conceptually, a state machine for grading) that creates a binding alignment between permitted tooling, assessment weights, and required evidence, similar to a relational database schema mapping Entity A (Tooling) to Entity B (Evidence) to Entity C (Weight). By replacing ambiguous natural language guidelines with a structured system, the DAP aims to increase reliability, ensure equity-aware design, and support the transition to an era where new proxies for student mastery are required.

2. Theoretical Framework & Ethical Design

This thesis anchors the DAP in established frameworks to move beyond a simple technical fix, positing that software engineering pedagogy is undergoing a ‘Clinical Shift.’ Traditional assessment frameworks, rooted in a deterministic, ‘Structural Engineering’ paradigm that values manual construction, are increasingly misaligned with the emerging professional practice of AI-assisted development. This new practice more closely resembles a ‘Clinical’ model, where practitioners diagnose, manage, and verify complex, probabilistic systems they did not create from first principles.

Consequently, this proposal introduces a new pedagogical goal: the cultivation of the ‘System Auditor.’ Unlike an ‘AI Clinician,’ who may only be equipped to resolve superficial syntax errors (analogous to treating symptoms), the System Auditor possesses a deep, fundamental understanding of principles. This enables them to diagnose and remediate latent logical and architectural flaws (analogous to diagnosing the underlying disease) in AI-generated code.

Practitioner analyses of AI-native development suggest that the introduction of generative AI shifts both (a) the economics of implementation and (b) the basis on which trust in software is established. Wang (2026), writing from an industrial practitioner perspective, argues that when the marginal cost of producing code and auxiliary tooling approaches zero, established “best practices” that treated code as a scarce, durable asset (e.g., DRY and manual process control) become less central. Instead, effective work increasingly emphasizes rapid orchestration, high-resolution observability, and explicit acceptance criteria (“outcome certainty”) supported by agentic runtimes and verification loops. In parallel, Wang’s observations on AI education highlight that learner attrition often occurs at incidental “friction” points (e.g., configuration, accounts, deployment) rather than core reasoning tasks, motivating infrastructure-oriented teaching designs that remove non-essential barriers while increasing opportunities for authentic practice and delivery. While these practitioner claims are not presented as controlled educational studies, they provide a coherent account of changing professional expectations that this thesis uses as a complementary lens to guide the DAP’s design choices—particularly the weighting of verification, documentation, and contextual justification over manual syntactic production.

As an HCI-informed socio-technical artifact, the DAP/WAM also functions as a boundary object (Star & Griesemer, 1989) — a shared representational form that coordinates multiple stakeholders (students, instructors, and academic integrity offices) who may hold different interpretations of permissible AI use. By standardizing the disclosure vocabulary and expected evidence at the interface level, the WAM enables consistent adjudication without requiring full philosophical agreement about AI, authorship, or “good” process.

The DAP is designed to assess these competencies through the following frameworks:

  • Procedural Justice (Tyler): The DAP is designed to ensure the assessment process is fair, consistent, and transparent, shifting from a model of high discretion to one of fair procedure.

  • Agentic Software Engineering & Orchestration: The thesis is grounded in the emerging paradigm of ‘Agentic Software Engineering,’ which redefines the human role from a direct implementer to an ‘Agent Coach’ or ‘Orchestrator’ (Hassan et al. 2025; Elarde et al. 2024). The DAP provides a mechanism to assess a student’s ability to define the architectural ‘Contract Layer’ and verify outcomes, rather than focusing solely on their manual production of syntax in the ‘Runtime Layer’ (Wang 2026).

  • The Separation of Friction from Competency: Drawing on Wang’s (2026) distinction between “False Difficulties” (e.g., configuration, syntax, environment setup) and “Core Competencies” (e.g., architectural design, handling ambiguity), this thesis argues that AI allows education to bypass the former to focus on the latter. Traditional assessment often conflates “overcoming friction” with “learning.” The DAP separates these. It allows detailed implementation to be treated as “disposable friction” (delegated to AI) so that assessment can focus on the student’s ability to act as a “Master Builder”—one who orchestrates systems to solve high-level problems rather than merely executing low-level code.

  • Value Sensitive Design (Friedman): As an HCI-informed protocol, the DAP is explicitly designed to embody key values: Clarity (reducing ambiguity), Equity (reducing the disparate impact of unwritten rules), and Privacy (preferring interaction over surveillance).

  • Epistemic Injustice (Fricker): The current system creates “Hermeneutical Injustice,” where students lack the shared concepts to make their hybrid workflow legible. The WAM functions as a “hermeneutic repair,” providing the necessary vocabulary for accurate disclosure.

  • Epistemic Responsibility: The protocol enables a rebalancing of assessment focus. While foundational skills (“manual labor” or “bricklaying”) remain valuable and can be weighted accordingly, the DAP provides a formal mechanism to also assess higher-order skills such as verification, architectural integrity (“epistemic responsibility”), and the formal specification of intent (Endres et al. 2024), which are critical in an AI-augmented workflow.

3. Problem Statement

3.1 The Socio-Technical Evaluation Gap
Current university GenAI policies typically rely on “Natural Language Disclosure.” This underspecified approach creates a systemic gap that places both assessors and students in an untenable position, characterized by critical flaws:

  • The Collapse of the “Runtime Layer”: Modern GenAI tools have evolved from passive assistants into “Agentic Runtimes” (Hassan et al. 2025; Wang 2026) that commoditize the execution of code. The central problem this research addresses is the decoupling of implementation from authorial intent. In the absence of ‘Process Certainty’ (i.e., the verifiable struggle of manual coding), a new proxy for mastery is required. This thesis argues that the locus of essential cognitive effort is rebalancing from implementation (syntactic fluency) toward verification (the rigorous auditing of AI-generated artifacts). Current policies fail because they lack instruments to measure this new form of intellectual labor.

  • Interpretive Ambiguity: Subjective terms like “substantive contribution” lack operational definitions. This forces assessors to make high-stakes judgment calls without clear guidelines, leading to significant variance in how disclosures are interpreted across different courses.

  • Semantic Misalignment (The Taxonomical Gap): Modern GenAI tools simultaneously handle syntax, logic, and implementation, causing the traditional academic distinction between “correction” (assistance) and “creation” (authorship) to collapse. Consequently, neither students nor faculty possess a shared vocabulary to accurately define these new hybrid workflows, creating a high risk of misclassification where structural ambiguity is interpreted as academic misconduct.
  • Epistemic Opacity: The current submission interface does not capture the metadata required to “see” the cognitive process. Without a shared framework, assessors face an epistemological blind spot: they cannot reliably distinguish between “Passive Generation” (uncritical adoption) and “Iterative Orchestration” (critical refinement). The latter is a formal process in modern AI systems, often enabled by mechanisms such as a ‘Reflexion loop’ that supports iterative self-correction under human guidance (Shinn et al. 2023); a minimal sketch of such a loop follows this list.
  • Low Inter-Rater Reliability: Without a standardized instrument, assessment outcomes depend heavily on an individual assessor’s personal philosophy regarding AI rather than pre-defined institutional criteria, undermining the consistency of the grading system.
  • High Interpretation Latitude: Underspecified policies create a vacuum of reliability, forcing assessors to rely on intuition (high discretion, low reliability) rather than defined criteria, which exposes the institution to challenges regarding grading fairness.
  • Asymmetry of Risk & Labor: This ambiguity creates a dual burden. For students, particularly non-native speakers, it creates the risk of high-stakes adverse outcomes for misinterpreting unwritten rules. For faculty, it creates significant emotional and administrative labor, transforming educators into “forensic investigators” tasked with policing grey zones rather than teaching.
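
To make the distinction between “Passive Generation” and “Iterative Orchestration” concrete, the following is a minimal Python sketch of a generate-test-revise loop in the spirit of Reflexion (Shinn et al. 2023). The functions call_model and run_acceptance_tests are simulated stand-ins for a generative model API and a course test harness (they are not real APIs); the point is that the resulting trace of prompts, test results, and revisions is exactly the provenance metadata that current submission interfaces fail to capture.

```python
# Minimal sketch: an iterative generate-test-revise ("Reflexion-style") loop whose
# trace records the provenance metadata discussed above. `call_model` and
# `run_acceptance_tests` are simulated stand-ins, not real APIs.
from dataclasses import dataclass, field

@dataclass
class OrchestrationTrace:
    steps: list[dict] = field(default_factory=list)  # one entry per round

def call_model(prompt: str) -> str:
    # Stand-in for a generative model call: returns a buggy draft first,
    # and a corrected one once a failure report appears in the prompt.
    return "def add(a, b): return a - b" if "failed" not in prompt else "def add(a, b): return a + b"

def run_acceptance_tests(code: str) -> tuple[bool, str]:
    # Stand-in for the assignment's acceptance test suite.
    namespace: dict = {}
    exec(code, namespace)
    passed = namespace["add"](2, 3) == 5
    return passed, "ok" if passed else "add(2, 3) returned the wrong value"

def orchestrate(task: str, max_rounds: int = 3) -> tuple[str, OrchestrationTrace]:
    trace, prompt, code = OrchestrationTrace(), task, ""
    for round_no in range(1, max_rounds + 1):
        code = call_model(prompt)
        passed, report = run_acceptance_tests(code)
        trace.steps.append({"round": round_no, "prompt": prompt, "passed": passed, "report": report})
        if passed:
            break
        # The student's critique of the failure report is the assessable artifact
        # that distinguishes orchestration from passive generation.
        prompt = f"{task}\nThe previous attempt failed: {report}. Revise accordingly."
    return code, trace

if __name__ == "__main__":
    _, trace = orchestrate("Write add(a, b) returning the sum of a and b.")
    for step in trace.steps:
        print(step["round"], step["passed"], step["report"])
```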

3.2 The “Underspecified Interface” (HCI Perspective)
Current submission protocols function as an “Underspecified Interface.” They lack the metadata fields necessary to distinguish “uncritical adoption of output” from “Transparent Provenance.”

  • The Signal Failure: The current submission interface suffers from a lack of “Semiotic Clarity.” It fails to transmit the student’s intent (provenance) to the receiver (assessor), leading to systemic errors in judgment. For example, unedited AI-generated comments are often read by assessors as a signal of low effort rather than as a “Transparency Signal” (proof of provenance). This aligns with research suggesting that static analysis of AI-generated code can be unreliable and prone to error, pointing to a need for more robust evaluation methods (Peng et al. 2025).
  • Unbounded Inference Risk: The system does not bound the assessor’s discretion with explicit data fields, allowing transparent disclosure to be misread as negligence.

3.3 The Agentic Threat & The Limits of Static Evidence
This thesis anticipates the imminent rise of Agentic AI (autonomous coding agents). As these agents become capable of simulating human behaviors—generating fake Git commit histories and mimicking debugging cycles—we face a critical realization:

This trend suggests that the evidentiary value of static artifacts is diminishing, creating a significant risk for assessment integrity in the near future.
Therefore, the DAP treats logs as a “bridge” (Low-Strength Evidence) while moving toward Contextual Verification (High-Strength Evidence) for ambiguous cases. This shift is consistent with calls in the literature for ‘outcome-driven evaluation’ that prioritizes the verifiable behavior of the final product over the analysis of its static artifacts (Peng et al. 2025).

4. The Proposed Solution: The Dynamic Assessment Protocol (DAP)

A comprehensive system for clarity, fairness, and pedagogical validity, the DAP operationalizes the pedagogical goal introduced in Section 2: the cultivation of the ‘System Auditor,’ who possesses the fundamental understanding of principles needed to diagnose and remediate latent logical and architectural flaws in AI-generated code. The DAP is structured to explicitly assess the competencies of a System Auditor. These competencies include, but are not limited to, the student’s ability to:

  • Explain the architectural choices and data model invariants.
  • Justify the selection and implementation of core logic and potential edge cases.
  • Design and interpret verification tests.
  • Efficiently localize and repair seeded bugs within an AI-generated codebase.

Layer 1: The “Tiered Use Framework” (Policy Alignment)

Note on Institutional Alignment: This framework aligns with the specific distinction in institutional guidelines (e.g., KU Leuven 2025-2026) between “Verbatim Copying” and “Functional Generation.” While regulations often limit the verbatim copying of text to an “absolute minimum,” policies regarding code generation often authorize broader use provided transparency is maintained. The Tiered system operationalizes this distinction, ensuring that “absolute minimum” rules are applied to Tier 1 (Foundational) tasks, while Tier 3 (Architectural) tasks leverage the broader allowance to teach System Auditing skills.

This layer preserves academic freedom by allowing instructors to select a pedagogical approach, which is declared in the ECTS fiche.

  • Tier 1: Foundational: AI prohibited. Assessment focuses on manual proficiency.
  • Tier 2: Hybrid: AI as Junior Assistant. Assessment focuses on logic and integration.
  • Tier 3: Architectural: AI as Implementer. Assessment focuses on system design and verification, aligning with the ‘Software Orchestration’ paradigm where the student acts as the ‘conductor’ of the AI ‘orchestra’ (Elarde et al. 2024).
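
To make the declaration concrete, the following is a minimal sketch of how a course’s tier selection could be recorded as a small machine-readable policy object alongside the ECTS fiche. The field names and the course code are illustrative assumptions, not an institutional schema.

```python
# Minimal sketch: a course-level declaration of the Tiered Use Framework.
# Field names and the course code are illustrative, not an official schema.
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    FOUNDATIONAL = 1   # AI prohibited; assessment of manual proficiency
    HYBRID = 2         # AI as junior assistant; assessment of logic and integration
    ARCHITECTURAL = 3  # AI as implementer; assessment of design and verification

@dataclass(frozen=True)
class CourseAIPolicy:
    course_code: str
    tier: Tier
    permitted_tools: tuple[str, ...]
    disclosure_required: bool = True

# Hypothetical example declaration for a Tier 3 course.
example_policy = CourseAIPolicy(
    course_code="H0X00A",
    tier=Tier.ARCHITECTURAL,
    permitted_tools=("code-generation", "documentation", "debugging"),
)
```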

Layer 2: The “Weighted Authorship Matrix” (The Core Instrument)

The WAM serves as an instrument of Constructive Alignment. It creates a pre-commitment to criteria, ensuring that grading is anchored to learning outcomes rather than retrospective judgment. It explicitly uncouples “Manual Typing” from “Authorship,” allowing instructors to assign high weight to “Verification/Contract” while assigning low weight to “Runtime/Syntax,” provided the latter is disclosed.

Example WAM Configuration (Tier 3: Architectural Persona):

Component of Work | Permitted Tooling | Disclosure Granularity | Evidence Strength Required | Assessment Weight
System Architecture | Student Only | Diagram-Level | E3 (Design rationale) | 40% (Critical)
Core Logic | Hybrid / Co-Pilot | Function-Level | E2 (Verification log) | 30% (High)
Syntax/Boilerplate | AI Allowed | File-Level | E1 (Execution check) | 10% (Low)
Documentation | AI Allowed | Assignment-Level | Provenance Label (Y/N) | 5% (Low)
Debugging | Hybrid | Function-Level | E2 (Unit tests) | 15% (High)
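
The following is a minimal sketch of how the example configuration above could be encoded as a machine-readable structure, with a consistency check that the assessment weights sum to 100%. The class and field names are illustrative assumptions; the row values mirror the table.

```python
# Minimal sketch: the example Tier 3 WAM as data, plus a weight-consistency check.
# Class and field names are illustrative; values mirror the table above.
from dataclasses import dataclass

@dataclass(frozen=True)
class WAMRow:
    component: str
    permitted_tooling: str
    disclosure_granularity: str
    evidence_required: str   # E1-E4, or a provenance label
    weight: float            # fraction of the final grade

TIER3_WAM = (
    WAMRow("System Architecture", "Student Only", "Diagram-Level", "E3 (Design rationale)", 0.40),
    WAMRow("Core Logic", "Hybrid / Co-Pilot", "Function-Level", "E2 (Verification log)", 0.30),
    WAMRow("Syntax/Boilerplate", "AI Allowed", "File-Level", "E1 (Execution check)", 0.10),
    WAMRow("Documentation", "AI Allowed", "Assignment-Level", "Provenance label (Y/N)", 0.05),
    WAMRow("Debugging", "Hybrid", "Function-Level", "E2 (Unit tests)", 0.15),
)

def validate_wam(rows: tuple[WAMRow, ...]) -> None:
    total = sum(r.weight for r in rows)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"WAM weights must sum to 100%, got {total:.0%}")

validate_wam(TIER3_WAM)  # passes: 40 + 30 + 10 + 5 + 15 = 100
```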

The “Definition of Mastery” Selector:
To ensure assessment criteria align with the institutional goal of “encouraging responsible use,” and to prevent interpretative ambiguity (where “Architecture” might be interpreted to imply manual coding), the DAP requires instructors to explicitly define the scope of mastery:

  • Mode A (Foundational/Manual): Mastery is defined as the ability to manually construct the artifact (e.g., for introductory syntax courses).
  • Mode B (Orchestration/System Auditor): Mastery is defined as the ability to design, verify, and orchestrate the artifact using AI tools. (Manual coding is NOT required; evidence is based on Design Rationale and Verification Logs).

The Evidence Strength Ladder:
To address the Agentic AI threat without creating a surveillance regime, the DAP prioritizes interactive verification over invasive monitoring:

  • E1 (Weak): Self-attestation + executable output (Low Intrusion).
  • E2 (Moderate): Git diffs, unit tests, and documented logic fixes (Low Intrusion).
  • E3 (Strong): Tool-exported metadata and design artifacts (Low Intrusion).
  • E4 (Strongest): Live Contextual Verification (Layer 3) (High Accuracy).

Interpretation Constraint:
AI use in low-weight or non-core components (e.g., comments or style), when disclosed per the WAM, is evaluated within the framework of Procedural Justice, which ensures that adverse decisions are based on evidence of missing mastery rather than on the mere presence of AI artifacts.

Layer 3: The Contextual Verification Step (Procedural Safeguard)

To ensure scalability, this is a targeted mechanism governed by specific rules:

  • Trigger Conditions: Activated only when (a) a discrepancy exceeds a learning outcome threshold, (b) a grade sits at a critical boundary (Pass/Fail), or (c) a formal integrity concern is raised.
  • Standard Script: A brief, 10-minute code walkthrough with fixed questions aligned to WAM components (e.g., “Explain the logic of this Join”).
  • Outcome Types: (1) Confirm Mastery, (2) Request Revision, or (3) Refer to Integrity Committee.
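
The trigger conditions above can be expressed as a simple decision function. The following is a minimal sketch; the threshold values, the 0-20 grading scale, and the field names are illustrative assumptions, not institutional parameters.

```python
# Minimal sketch: the Layer 3 trigger logic as a decision function. Thresholds,
# the 0-20 grade scale, and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class SubmissionSignals:
    outcome_discrepancy: float      # gap between evidenced and claimed mastery (0-1)
    grade: float                    # provisional grade on a 0-20 scale
    pass_mark: float = 10.0
    boundary_margin: float = 0.5    # how close to the pass/fail line counts as "critical"
    integrity_concern_raised: bool = False

def requires_contextual_verification(s: SubmissionSignals,
                                     discrepancy_threshold: float = 0.3) -> bool:
    at_critical_boundary = abs(s.grade - s.pass_mark) <= s.boundary_margin
    return (s.outcome_discrepancy > discrepancy_threshold
            or at_critical_boundary
            or s.integrity_concern_raised)

# Example: a submission sitting exactly on the pass/fail boundary triggers the step.
print(requires_contextual_verification(SubmissionSignals(outcome_discrepancy=0.1, grade=10.0)))
```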

Layer 4: The “Inclusive Assessment Pathway” (UDL)

Aligning with Universal Design for Learning (UDL), this layer offers multiple evidence modes by design. It allows students (including those with documented crises via STUVO) to utilize an Alternative Evidence Mode—shifting weight from “Manual Output” to “Verification Logic”—without altering the core learning outcomes.

Layer 5: The “Frontier Assessment” Model (The Newton Protocol)

For advanced courses, this extension replaces the “Viva” with a project-based metric.

  • The Principle: Standing on the shoulders of giants. The AI’s output is the baseline.
  • The Metric: Grades are assigned based on the Delta (Δ), the value added beyond the AI baseline, measured across three dimensions (a scoring sketch follows this list):

    • Technical Complexity: The architectural value added beyond what the AI could solve alone.
    • HCI / Orchestration Quality: The sophistication of the student’s interaction strategy (prompt engineering, iterative refinement, and debugging logic) used to guide the AI to the solution.
    • Evaluation-First Mindset: The sophistication of the student’s “Acceptance Criteria”—their ability to verify and refine the AI’s output (Outcome Certainty) rather than merely generating it.

  • Controlled Environment: To ensure equity and reproducibility, the baseline is computed using an institution-provided model chain, version-locked for the assessment window.
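
The following is a minimal sketch of how the Delta could be computed as a weighted improvement over the version-locked AI baseline across the three dimensions above. The baseline values, weights, and 0-1 scoring scale are illustrative assumptions rather than a calibrated rubric.

```python
# Minimal sketch: the "Newton Protocol" delta as a weighted improvement over the
# AI baseline. Baseline values, weights, and the 0-1 scale are illustrative.
BASELINE = {"technical_complexity": 0.55, "orchestration_quality": 0.0, "acceptance_criteria": 0.0}
WEIGHTS  = {"technical_complexity": 0.40, "orchestration_quality": 0.30, "acceptance_criteria": 0.30}

def delta_score(student: dict[str, float],
                baseline: dict[str, float] = BASELINE,
                weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-dimension improvement over the AI baseline (clamped at 0)."""
    delta = {k: max(student[k] - baseline[k], 0.0) for k in weights}
    return sum(weights[k] * delta[k] for k in weights)

# Example: a student who substantially improves on architecture and verification.
print(delta_score({"technical_complexity": 0.85,
                   "orchestration_quality": 0.70,
                   "acceptance_criteria": 0.80}))
```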

5. Institutional Feasibility: Application to the KU Leuven Case

While the DAP is designed as a universal framework for higher education, it demonstrates immediate utility in resolving interpretative ambiguities within specific contexts, such as KU Leuven’s current policy framework (2025-2026):

  1. Distinguishing Text from Function: Current guidelines differentiate between “verbatim copying” of text (restricted to an “absolute minimum”) and “generating code” (authorized provided transparency is maintained). The DAP explicitly categorizes assignments to align with these distinctions, ensuring that assessment protocols are congruent with the specific technical nature of the workflow (Textual vs. Functional).
  2. Operationalizing “Encouragement”: The university’s vision “encourages students… to handle this technology.” The DAP provides the concrete grading instrument (WAM) necessary to implement this vision, shifting the evaluative focus from ‘Process Certainty’ (evaluating the manual construction of the artifact) to ‘Outcome Certainty’ (evaluating the rigorous verification of the system’s logic).
  3. Structuring “Transparency”: KU Leuven guidelines require students to “explain the use in the methods section.” The DAP transforms this from a subjective narrative into a structured data declaration, ensuring compliance with Article 84 by providing an objective standard for disclosure.
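
As an illustration of what such a declaration could look like, the following is a minimal sketch of a structured disclosure record keyed to the WAM vocabulary. The field names and tool names are illustrative assumptions, not an official KU Leuven schema.

```python
# Minimal sketch: a structured disclosure declaration replacing a free-text
# methods-section narrative. Field and tool names are illustrative only.
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class DisclosureEntry:
    component: str          # WAM component, e.g. "Core Logic"
    tool_used: str          # e.g. "GitHub Copilot", or "none"
    granularity: str        # Diagram-, Function-, File-, or Assignment-Level
    evidence_attached: str  # e.g. "E2: unit tests + git diff"

declaration = [
    DisclosureEntry("System Architecture", "none", "Diagram-Level", "E3: design rationale"),
    DisclosureEntry("Core Logic", "GitHub Copilot", "Function-Level", "E2: verification log"),
    DisclosureEntry("Documentation", "ChatGPT", "Assignment-Level", "Provenance label: Y"),
]

# Serialized for submission alongside the assignment.
print(json.dumps([asdict(e) for e in declaration], indent=2))
```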

6. Research Questions

  1. Clarity: To what extent does the WAM improve perceived policy clarity and reduce anxiety for students compared to natural language disclosures?
  2. Reliability: Does a WAM-based assessment model reduce inter-rater variability (grading variance) when assessing AI-assisted submissions?
  3. Behavioral Validity: Does the WAM successfully shift student effort from “Process Emulation” (mimicking manual coding and producing “human-like” artifacts) toward high-value learning outcomes such as “Outcome Verification” (architectural design and validation) as the primary evidence of learning?
  4. Feasibility: What are the time-cost implications and faculty barriers to implementing the DAP?

7. Proposed Methodology (Mixed-Methods)

This thesis will employ a design-based research approach.

Phase 1: Qualitative Inquiry (Core Methodology)
Semi-structured interviews with N=5-8 professors (CS/Digital Humanities) to validate the problem statement, identify adoption barriers, and refine the DAP design based on expert feedback.

Phase 2: Empirical Validation (Optional Extension)
To provide rigorous evidence, a controlled experiment is proposed.

  • Participants: Target N=15-20 assessors.

  • Design: Counterbalanced design where assessors grade identical “borderline” submissions using (A) current policies and (B) the DAP/WAM.

  • Metrics:

    • Reliability: Intraclass Correlation Coefficient (ICC) and Krippendorff’s α (an analysis sketch follows this list).
    • Efficiency: Average grading time per submission.
    • Confidence: Self-reported decision confidence (Likert scale).
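
The following is a minimal sketch of the Phase 2 reliability comparison, assuming the third-party krippendorff package is installed (ICC could be computed analogously with, e.g., the pingouin package). The rating matrices are synthetic illustrative data, not study results.

```python
# Minimal sketch: comparing inter-rater agreement under the two grading conditions.
# Assumes the third-party `krippendorff` package (pip install krippendorff).
# The rating matrices below are synthetic illustrative data, not study results.
import numpy as np
import krippendorff

# Rows = assessors, columns = borderline submissions; values = grades (0-20 scale).
grades_policy_A = np.array([[12,  9, 14, 10],
                            [15, 12, 11, 13],
                            [10, 14, 15,  9]], dtype=float)
grades_policy_B = np.array([[12, 10, 13, 11],
                            [13, 11, 13, 12],
                            [12, 11, 14, 11]], dtype=float)

for label, grades in (("Current policy", grades_policy_A), ("DAP/WAM", grades_policy_B)):
    alpha = krippendorff.alpha(reliability_data=grades, level_of_measurement="interval")
    spread = grades.std(axis=0).mean()  # mean per-submission disagreement across raters
    print(f"{label}: Krippendorff's alpha = {alpha:.2f}, mean rater SD = {spread:.2f}")
```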

8. Ethics & Data Handling

All data collection (interviews, grading vignettes) will adhere to university ethical standards. Data will be anonymized to protect participants and stored on secure university servers. The DAP design itself follows a “Data Minimization” principle, relying on targeted human verification rather than continuous algorithmic surveillance.

9. Tangible Deliverables

This research will produce a toolkit for institutional adoption:

  1. The WAM Template: A customizable matrix for course syllabi.
  2. The Verification Trigger Decision Tree: A 1-page guide for instructors.
  3. The Verification Script: Standardized questions for assessing authorship.
  4. The Student Guide: A “How-To” for disclosing provenance under the DAP.

10. Expected Contributions

  • For the Institution: A scalable instrument to increase assessment consistency and reduce the administrative burden of grade appeals.
  • For Faculty: A defensible and transparent grading framework that reduces the emotional labor of navigating ambiguity in AI use.
  • For Students: Clear expectations, procedural fairness, reduced anxiety related to ambiguity, and better preparation for modern professional workflows.
  • For Equity: An equity-aware design that reduces the disparate impact of underspecified policies on diverse student populations.

11. References

Elarde, J., Bruster, B., & Hasan, M. (2024). Software orchestration: A paradigm for software development and security assessment using ChatGPT requirements. The Journal of Computing Sciences in Colleges, 39(8), 44–53. https://dl.acm.org/doi/10.5555/3717781.3717790

Endres, M., Fakhoury, S., Chakraborty, S., & Lahiri, S. K. (2024). Can large language models transform natural language intent into formal method postconditions? Proceedings of the ACM on Software Engineering, 1(FSE), Article 84, 1889–1912. https://doi.org/10.1145/3660791

Hassan, A. E., Li, H., Lin, D., Adams, B., Chen, T.-H., Kashiwa, Y., & Qiu, D. (2025). Agentic software engineering: Foundational pillars and a research roadmap. arXiv. https://arxiv.org/abs/2509.06216v2

Peng, J., Cui, L., Huang, K., Yang, J., & Ray, B. (2025). CWEval: Outcome-driven evaluation on functionality and security of LLM code generation. arXiv. https://arxiv.org/abs/2501.08200v1

Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems 36: Proceedings of the 37th International Conference on Neural Information Processing Systems (Article 377, pp. 8634–8652). https://dl.acm.org/doi/10.5555/3666122.3666499

Wang, Y. (2026, January 25). From process certainty to outcome certainty: A different kind of confidence in the age of AI. Computing Life. https://yage.ai/result-certainty-en.html

Wang, Y. (2026, February 02). Why AI education should go beyond content creation to engineering infrastructure. Computing Life. https://yage.ai/ai-builder-space-en.html

Star, S. L., & Griesemer, J. R. (1989). Institutional ecology, ‘translations’ and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39. Social Studies of Science, 19(3), 387–420. https://doi.org/10.1177/030631289019003001

Tyler, T. R. (1990). Why people obey the law. Yale University Press.

Friedman, B., & Hendry, D. G. (2019). Value sensitive design: Shaping technology with moral imagination. MIT Press.

Fricker, M. (2007). Epistemic injustice: Power and the ethics of knowing. Oxford University Press.

KU Leuven. (2025). Guidelines for students on the authorised use of GenAI. KU Leuven Education & Examination Regulations. https://www.kuleuven.be/english/education/student/educational-tools/guidelines-for-students-on-the-authorised-use-of-genai

KU Leuven. (2025). Regulations on Education and Examinations (OER), Article 84: Irregularities. https://www.kuleuven.be/education/regulations/2025/#1306ee1f-c214-487c-9a85-b61e0c01f84e

KU Leuven. (2025). Responsible use of generative Artificial Intelligence: Vision & Policy. https://www.kuleuven.be/english/education/student/educational-tools/responsible-use-of-generative-artificial-intelligence

File Share authentication issue in macOS (including SMB and AFP) in an unusual circumstance

*updated on September 19, 2022

I tried the macOS native File Sharing feature because I needed to share data between two Macs over the LAN. This feature is based on SMB by default and also supports AFP.

During the authentication attempt, something went wrong that I could not undo: the error dialog claimed the user name or password was incorrect, even though I was confident the credentials were right. Even after reinstalling the system and upgrading from High Sierra to Catalina (a procedure that took a long time), the error persisted. This outcome looked absurd.

The error message in the console is:

smbd transact: gss_accept_sec_context: major_status: 0xd0000, minor_status: 0xa2e9a74a

After looking around, I found this thread quite helpful: https://discussions.apple.com/thread/8318535

Solution: synchronize both the time and the time zone on the two Macs. This resolved the issue.
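
As a sanity check before resorting to a reinstall, one can verify whether either Mac’s clock is skewed. This is a minimal sketch assuming the third-party ntplib package; Kerberos/GSSAPI authentication (the gss_accept_sec_context error above) is known to be sensitive to clock skew, so a large offset on either machine points to the time-sync fix.

```python
# Minimal sketch, assuming the third-party `ntplib` package, for checking a
# machine's clock offset against an NTP server. Kerberos/GSSAPI authentication
# is sensitive to clock skew, so a large offset suggests the time-sync fix above.
import ntplib  # pip install ntplib (assumed available)

def clock_offset_seconds(server: str = "time.apple.com") -> float:
    response = ntplib.NTPClient().request(server, version=3)
    return response.offset  # local clock minus server clock, in seconds

if __name__ == "__main__":
    offset = clock_offset_seconds()
    print(f"Local clock offset: {offset:+.2f} s")
    if abs(offset) > 60:
        print("Clock skew is large enough to break time-sensitive authentication.")
```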

Consideration: Rather than starting over with a fresh installation of the system whenever an unclear issue arises, it is more productive to search Console for the relevant logs first.