Update slide deck 3 #155

Open
calderonjesus wants to merge 4 commits into main from update/slides-3

Conversation

@calderonjesus
Collaborator

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

This pull request makes a number of improvements and corrections to the AI evaluation lecture slides (03_evaluation.md). The changes focus on clarifying language, fixing typos, improving formatting, and enhancing the accuracy and professionalism of the content. The most important changes are grouped below:

Content and Clarity Improvements:

  • Added a new "Main Points" section summarizing the key takeaways about evaluation in AI engineering, emphasizing practical hurdles, evaluation methods, and the importance of systematic pipelines.
  • Clarified explanations for evaluation metrics (e.g., cross entropy, perplexity, BPB), downstream evaluation approaches, and the process of factual consistency verification. [1] [2] [3] [4] [5] [6]
  • Improved descriptions of evaluation biases (self, position, verbosity) and the limitations of multiple-choice questions (MCQs) for evaluating language models. [1] [2]

Corrections and Typo Fixes:

  • Fixed numerous typos and grammatical errors throughout the document (e.g., "practictioners" → "practitioners", "simiarlity" → "similarity", "Consisntency" → "Consistency"). [1] [2] [3]
  • Updated references and dates for accuracy (e.g., changed "(Huyen, 2025)" to "(Huyen, 2024)"). [1] [2]

Formatting and Style Enhancements:

  • Standardized section titles and improved markdown formatting for readability (e.g., "n-gram similarity" → "n-gram Similarity").
  • Improved example prompts and code block formatting for clarity. [1] [2]

Terminology and Consistency:

  • Made terminology more precise and consistent (e.g., "AI-as-a-judge", "human-designed similarity measures"). [1] [2]
  • Clarified the scope and intent of evaluation methods, such as the difference between lexical and semantic similarity. [1] [2]

These changes collectively enhance the quality, accuracy, and professionalism of the instructional material.

What did you learn from the changes you have made?

N/A

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

N/A

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

N/A

How were these changes tested?

N/A

A reference to a related issue in your repository (if applicable)

N/A

Checklist

  • [X] I can confirm that my changes are working as intended

- Add references
- Update open vs closed model comparison
@github-actions

github-actions Bot commented Jun 6, 2026

Hello, thank you for your contribution. If you are a participant, please close this pull request and open it in your own forked repository instead of here. Please read the instructions on your onboarding Assignment Submission Guide more carefully. If you are not a participant, please give us up to 72 hours to review your PR. Alternatively, you can reach out to us directly to expedite the review process.
