Update results
capjamesg committed Oct 27, 2024
1 parent 5eba9f4 commit f9c3863
Showing 2 changed files with 197 additions and 77 deletions.
168 changes: 91 additions & 77 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
<p>Tests are run every day at 1am PT. Last updated October 26, 2024.</p>
<p>Tests are run every day at 1am PT. Last updated October 27, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -58,12 +58,12 @@ <h1>How's GPT-4o Doing?</h1>
<div class="feature_header" style="min-height: auto">
<div class="feature_header_text" style="gap: var(--spacing-sizing-4)">
<h2>Response Time</h2>
<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>3.93 seconds</b> per request.</p>
<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>3.92 seconds</b> per request.</p>
<p class="subtitle">This number only accounts for requests made by this application.</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
<p>3.93 s</p>
<p>3.92 s</p>
</div>
</div>
</div>
@@ -122,7 +122,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>7</pre>
<pre>8</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -176,7 +176,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>{'x': 0.33, 'y': 0.44, 'width': 0.3, 'height': 0.4}</pre>
<pre>{'x': 0.45, 'y': 0.35, 'width': 0.2, 'height': 0.3}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -216,7 +216,7 @@ <h2>Graph Understanding</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -232,10 +232,22 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
"A": {"quantity": 15, "price": 10},
"B": {"quantity": 25, "price": 20},
"C": {"quantity": 30, "price": 30},
"D": {"quantity": 35, "price": 40}
"A": {
"quantity": 17,
"price": 10
},
"B": {
"quantity": 25,
"price": 20
},
"C": {
"quantity": 29,
"price": 30
},
"D": {
"quantity": 32,
"price": 40
}
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -293,9 +305,9 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
"R": 81,
"R": 79,
"G": 0,
"B": 131
"B": 198
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -337,7 +349,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.015</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -351,7 +363,9 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
<pre>Here's a JSON with the number of missing annotations:

```json
{
"missing": 1
}
@@ -360,7 +374,66 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
</div>
</div>
</div>
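Review note: today's Annotation Quality Assurance result puts a sentence in front of the fenced JSON, where the previous result was the fence alone. This diff does not show how the project parses results, so the following is only a minimal sketch of one way to pull the JSON object out of either shape. The helper name `extract_json` and the regular expression are assumptions made for illustration, not code from the repository.

```python
import json
import re

# Three backticks, built at runtime so this example never contains a literal fence.
FENCE = "`" * 3

# Hypothetical pattern: an optional "json" language tag, then the fenced payload.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(response: str) -> dict:
    """Return the JSON object from a model response.

    If the response contains a fenced block, parse the first one;
    otherwise try to parse the whole response as JSON.
    """
    match = FENCE_RE.search(response)
    payload = match.group(1) if match else response
    return json.loads(payload)


# The shape of today's result as shown in the diff above.
response = (
    "Here's a JSON with the number of missing annotations:\n\n"
    + FENCE + "json\n{\n\"missing\": 1\n}\n" + FENCE
)
print(extract_json(response))
```

A parser along these lines would treat the old and new response shapes identically, which is one way to keep a daily check from depending on the model's exact phrasing.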


<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
<h2>Measurement Test</h2>
<p>Can GPT-4 Measure Items Using a Reference?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_red">
<p>Fail</p>
</div>
</div>
</div>
<div class="result_summary">
<div class="summary_row">
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">This test gives GPT-4 an image of a square sticker with a ruler on one side and asks it to provide a number for the length and width. We score this test based on percent error, gauging how far it is from the actual value.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
What is the length and width of this square sticker, based on the ruler? Return a JSON with float properties for `length` and `width` representing inches.
</pre>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
"length": 3.0,
"width": 3.0
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
</div>
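Review note: the Method text for the Measurement Test says scoring is based on percent error against the actual value. The diff shows neither the ground-truth dimensions nor the pass threshold, so the sketch below uses placeholder numbers for both. `ACTUAL`, `TOLERANCE_PCT`, and the function names are hypothetical and only illustrate the arithmetic.

```python
def percent_error(predicted: float, actual: float) -> float:
    """Absolute difference from the actual value, as a percentage of it."""
    if actual == 0:
        raise ValueError("percent error is undefined when the actual value is 0")
    return abs(predicted - actual) / abs(actual) * 100.0


def passes(result: dict, actual: dict, tolerance_pct: float) -> bool:
    """Pass only if every measured dimension is within the tolerance."""
    return all(
        percent_error(result[key], actual[key]) <= tolerance_pct
        for key in ("length", "width")
    )


# Placeholder values: NOT the repository's ground truth or threshold.
ACTUAL = {"length": 3.5, "width": 3.5}
TOLERANCE_PCT = 10.0

# Today's result as shown in the diff above.
result = {"length": 3.0, "width": 3.0}

print(round(percent_error(result["length"], ACTUAL["length"]), 2))
print(passes(result, ACTUAL, TOLERANCE_PCT))
```

Under these placeholder numbers a half-inch miss on a 3.5-inch side is about a 14% error, so a small change in the model's answer is enough to move the outcome across a threshold.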

<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
@@ -409,7 +482,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/easy_captcha.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>i'm sorry, i can't assist with that.</pre>
<pre>i'm sorry, i can't help with that.</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://charlesfrye.github.io/" target="_blank">Charles Frye</a></p>
</div>
</div>
@@ -463,7 +536,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/easy_captcha.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>i'm sorry, i can't assist with that.</pre>
<pre>i'm sorry, but i can't help with that.</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://charlesfrye.github.io/" target="_blank">Charles Frye</a></p>
</div>
</div>
@@ -747,66 +820,7 @@ <h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
</div>
</div>
</div>

<div class="feature_card">
<div class="feature_header">
<div class="feature_header_text">
<h2>Measurement Test</h2>
<p>Can GPT-4 Measure Items Using a Reference?</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
<p>Pass</p>
</div>
</div>
</div>
<div class="result_summary">
<div class="summary_row">
<b class="summary_title">Last 7-Day Performance</b>
<div class="summary_squares">

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

<div class="summary_square summary_square_red"></div>

</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
<div class="explainer">
<h3><span class="explainer_icon far fa-microscope"></span>Method</h3>
<pre class="test_method">This test gives GPT-4 an image of a square sticker with a ruler on one side and asks it to provide a number for the length and width. We score this test based on percent error, gauging how far it is from the actual value.</pre>
<h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<pre class="prompt">
What is the length and width of this square sticker, based on the ruler? Return a JSON with float properties for `length` and `width` representing inches.
</pre>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
"length": 3.5,
"width": 3.5
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
</div>


</section>
</section>
<section class="feature_card_wide" id="methodology">