Update results
capjamesg committed Aug 20, 2024
1 parent dec4d49 commit 63ae4a5
Showing 2 changed files with 129 additions and 55 deletions.
78 changes: 23 additions & 55 deletions index.html
@@ -40,7 +40,7 @@ <h1>How's GPT-4o Doing?</h1>
<p>You can contribute your own tests, too! See the <a href="https://github.com/roboflow/gpt-checkup?tab=readme-ov-file#-contribute">GitHub README</a> for contributing instructions.</p>
</div>
<div class="header_subtitle">
-<p>Tests are run every day at 1am PT. Last updated August 19, 2024.</p>
+<p>Tests are run every day at 1am PT. Last updated August 20, 2024.</p>
<p>Made with ❤️ by the team at <a href="https://roboflow.com">Roboflow</a>.</p>
</div>
<div class="header_cta">
@@ -58,12 +58,12 @@ <h1>How's GPT-4o Doing?</h1>
<div class="feature_header" style="min-height: auto">
<div class="feature_header_text" style="gap: var(--spacing-sizing-4)">
<h2>Response Time</h2>
-<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>4.15 seconds</b> per request.</p>
+<p style="font-size: 16px; color: var(--gray-700)">Today, the average response time to receive results from our tests was <b>4.14 seconds</b> per request.</p>
<p class="subtitle">This number only accounts for requests made by this application.</p>
</div>
<div class="chart">
<div class="chart_box chart_box_green">
-<p>4.15 s</p>
+<p>4.14 s</p>
</div>
</div>
</div>
@@ -176,7 +176,7 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/fruit.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>{'x': 0.45, 'y': 0.4, 'width': 0.25, 'height': 0.21}</pre>
+<pre>{'x': 0.45, 'y': 0.43, 'width': 0.28, 'height': 0.42}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -216,7 +216,7 @@ <h2>Graph Understanding</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.011</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -232,22 +232,10 @@ <h3><span class="explainer_icon far fa-image"></span>Image</h3>
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
<pre>```json
{
-"A": {
-"quantity": 20,
-"price": 10
-},
-"B": {
-"quantity": 25,
-"price": 20
-},
-"C": {
-"quantity": 30,
-"price": 30
-},
-"D": {
-"quantity": 35,
-"price": 40
-}
+"A": {"quantity": 20, "price": 15},
+"B": {"quantity": 25, "price": 27},
+"C": {"quantity": 30, "price": 35},
+"D": {"quantity": 33, "price": 40}
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -303,13 +291,11 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/color.png" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>```json
-{
-"R": 85,
-"G": 16,
-"B": 123
-}
-```</pre>
+<pre>Failed to produce a valid JSON output: {
+"R": 79,
+"G": 0,
+"B": 132
+}</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
</div>
</div>
@@ -349,7 +335,7 @@ <h2>Annotation Quality Assurance</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.02</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -363,27 +349,13 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/annotationqa.jpeg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>To determine if there are any missing annotations, let's count the visible cars in the image and compare that to the number of red bounding boxes.
-
-Visible cars:
-1. The white car on the right side of the image
-2. The black car on the left side of the image (partially visible)
-3. The black car in the middle-left lane
-4. The car in the middle lane
-5. Four cars further down the middle-right lane (each with a bounding box)
-
-There are a total of eight cars in the image.
-
-Bounding boxes:
-There are six bounding boxes labeling cars.
+<pre>Upon inspection of the image, there is one missing annotation. The white car in the right lane closest to the camera is not annotated.

-There are 2 cars in the images that are not labeled with red bounding boxes.
-
-Thus, the JSON output for the missing annotations would be:
+Here's the JSON with the number of missing annotations:

```json
{
-"missing": 2
+"missing": 1
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -425,7 +397,7 @@ <h2>Measurement Test</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>0%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.01</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.009</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
@@ -439,14 +411,10 @@ <h3><span class="explainer_icon far fa-comment-dots"></span>Prompt</h3>
<h3><span class="explainer_icon far fa-image"></span>Image</h3>
<img class="test_image" src="images/measurement.jpg" alt="Image of the input into GPT-4" />
<h3><span class="explainer_icon far fa-sparkles"></span>Result</h3>
-<pre>Based on the ruler in the image, the square sticker appears to be approximately 3 inches on each side.
-
-Here is the JSON representation:
-
-```json
+<pre>```json
{
-"length": 3.0,
-"width": 3.0
+"length": 2.75,
+"width": 2.75
}
```</pre>
<p class="subtitle" style="margin-top: 16px; text-align: center">Test submitted by <a href="https://roboflow.com" target="_blank">Roboflow</a></p>
@@ -713,7 +681,7 @@ <h2>Math OCR</h2>
</div>
</div>
<p class="result_text">Of the last 7 tests, conducted daily, this test has passed <b>100%</b> of the time.</p>
-<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.016</p>
+<p class="request_price"><i class="far fa-coins"></i>Today's request cost $0.015</p>
</div>
<div class="explainer_dropdown">
<button type="button" class="dropdown dropdown_learn active">Learn about this test</button>
106 changes: 106 additions & 0 deletions results/2024-08-20.json
@@ -0,0 +1,106 @@
{
"zero_shot_classification": {
"score": 1,
"success": true,
"price": 0.00481,
"pass_fail": "Pass",
"response_time": 1.872744083404541,
"result": "Toyota Camry"
},
"count_fruit": {
"score": 0,
"success": false,
"price": 0.007870000000000002,
"pass_fail": "Fail",
"response_time": 3.2203755378723145,
"result": "8"
},
"document_ocr": {
"score": 1,
"success": true,
"price": 0.008539999999999999,
"pass_fail": "Pass",
"response_time": 3.425847053527832,
"result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
},
"handwriting_ocr": {
"score": 1,
"success": true,
"price": 0.00876,
"pass_fail": "Pass",
"response_time": 5.5869879722595215,
"result": "The words of songs on the album have been echoing in my head all week. \"Fades into the grey of my day old tea.\""
},
"extraction_ocr": {
"score": 1.0,
"success": true,
"price": 0.00719,
"pass_fail": "Pass",
"response_time": 3.0273895263671875,
"result": "[{'name': 'Mary Thomas', 'time_per_day': 1, 'medication': 'Atenolol', 'dosage': 100, 'rx_number': '1234567-12345'}]"
},
"math_ocr": {
"score": 1.0,
"success": true,
"price": 0.015290000000000002,
"pass_fail": "Pass",
"response_time": 2.972666025161743,
"result": "3x^2-6x+2"
},
"object_detection": {
"score": 0.700922266139658,
"success": false,
"price": 0.009490000000000002,
"pass_fail": "Fail",
"response_time": 3.71329927444458,
"result": "{'x': 0.45, 'y': 0.43, 'width': 0.28, 'height': 0.42}"
},
"graph_understanding": {
"score": 0.905,
"success": false,
"price": 0.01019,
"pass_fail": "Fail",
"response_time": 2.601107358932495,
"result": "```json\n{\n \"A\": {\"quantity\": 20, \"price\": 15},\n \"B\": {\"quantity\": 25, \"price\": 27},\n \"C\": {\"quantity\": 30, \"price\": 35},\n \"D\": {\"quantity\": 33, \"price\": 40}\n}\n```"
},
"color_recognition": {
"score": 0,
"success": false,
"price": 0.00875,
"pass_fail": "Fail",
"response_time": 1.995676040649414,
"result": "Failed to produce a valid JSON output: {\n \"R\": 79,\n \"G\": 0,\n \"B\": 132\n}"
},
"annotation_qa": {
"score": 0.33333333333333337,
"success": false,
"price": 0.01641,
"pass_fail": "Fail",
"response_time": 6.796298980712891,
"result": "Upon inspection of the image, there is one missing annotation. The white car in the right lane closest to the camera is not annotated.\n\nHere's the JSON with the number of missing annotations:\n\n```json\n{\n \"missing\": 1\n}\n```"
},
"measurement": {
"score": 0.7857142857142857,
"success": false,
"price": 0.00877,
"pass_fail": "Fail",
"response_time": 5.766342401504517,
"result": "```json\n{\n \"length\": 2.75,\n \"width\": 2.75\n}\n```"
},
"easy_captcha": {
"score": 1,
"success": true,
"price": 0.004790000000000001,
"pass_fail": "Pass",
"response_time": 1.646939754486084,
"result": "charybdis indubitable"
},
"easy_captcha_persuade": {
"score": 1,
"success": true,
"price": 0.00529,
"pass_fail": "Pass",
"response_time": 1.285482406616211,
"result": "charybdis indubitable"
}
}
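
Each entry in the results file carries the same fields (`score`, `success`, `price`, `pass_fail`, `response_time`, `result`), so a day's run can be summarized in a few lines. The following is a minimal sketch and not code from this repository: the `summarize` helper and the trimmed `sample` are hypothetical names introduced for illustration, and this commit does not show how the figures on the page (such as the 4.14 s average) are actually computed.

```python
import json
from statistics import mean


def summarize(results: dict) -> dict:
    """Aggregate one day's results, keyed by test name (hypothetical helper)."""
    entries = list(results.values())
    return {
        "tests": len(entries),
        "passed": sum(1 for entry in entries if entry["success"]),
        "mean_response_time": mean(entry["response_time"] for entry in entries),
        "total_price": sum(entry["price"] for entry in entries),
    }


# Two entries copied from results/2024-08-20.json, with fields trimmed for brevity.
sample = json.loads("""
{
  "zero_shot_classification": {"success": true, "price": 0.00481,
                               "response_time": 1.872744083404541},
  "count_fruit": {"success": false, "price": 0.007870000000000002,
                  "response_time": 3.2203755378723145}
}
""")

summary = summarize(sample)
print(summary["passed"], "of", summary["tests"], "tests passed")
print("mean response time:", summary["mean_response_time"])
print("total price:", summary["total_price"])
```

To summarize a full day, the same function could be applied to `json.load(open("results/2024-08-20.json"))`. The per-request costs changed in `index.html` ($0.01, $0.016, $0.009, $0.015) are consistent with rounding each `price` to three decimal places, for example `round(0.01641, 3)`, though the rendering code itself is not part of this diff.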
