added vanila js project ImgToPDFly #389
Walkthrough
A new client-side web application, ImgToPDFly, has been introduced. It enables users to convert images to PDFs and extract text from images using OCR directly in the browser. The update includes a README, HTML structure, JavaScript logic for file handling and processing, and CSS for UI styling and animations.
Sequence Diagram(s)
sequenceDiagram
    participant User
    participant UI
    participant JS_Main
    participant jsPDF
    participant TesseractJS
    User->>UI: Selects images or drags & drops
    UI->>JS_Main: Triggers file handling
    JS_Main->>UI: Updates image previews
    alt Image to PDF mode
        User->>UI: Clicks "Generate PDF"
        UI->>JS_Main: Start PDF generation
        loop For each image
            JS_Main->>jsPDF: Add image to PDF
            JS_Main->>UI: Update progress
        end
        JS_Main->>UI: Save PDF, update UI
    else Extract Text mode
        User->>UI: Clicks "Extract Text"
        UI->>JS_Main: Start OCR process
        loop For each image
            JS_Main->>TesseractJS: Perform OCR
            TesseractJS-->>JS_Main: Return extracted text
            JS_Main->>UI: Update OCR progress
        end
        JS_Main->>jsPDF: Add text to PDF
        JS_Main->>UI: Save PDF, update UI
    end
Actionable comments posted: 5
🧹 Nitpick comments (4)
PROJECTS/ImgToPDFly/scripts/main.js (2)
77-86: File type validation could be more robust.
The current validation only checks if the file type starts with "image/" but doesn't validate against specific supported formats or check for potential security issues.
+ const SUPPORTED_TYPES = ['image/jpeg', 'image/jpg', 'image/png', 'image/gif', 'image/bmp', 'image/webp'];
+
  const imageFiles = droppedFiles.filter((file) =>
-   file.type.startsWith("image/")
+   SUPPORTED_TYPES.includes(file.type.toLowerCase())
  );
  if (imageFiles.length > 0) {
    files = [...files, ...imageFiles];
    updatePreview();
  } else {
-   alert("Please drop only image files.");
+   alert("Please drop only supported image files (JPEG, PNG, GIF, BMP, WebP).");
  }
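The allow-list idea above can be sketched as a small pure function that is easy to test outside the browser. This is only an illustration: `partitionByType` and the exact contents of `SUPPORTED_TYPES` are assumptions, not code from the PR. The sketch leaves out `image/jpg` because browsers report JPEG files as `image/jpeg`.

```javascript
// Hypothetical allow-list; adjust to the formats the app actually supports.
const SUPPORTED_TYPES = new Set([
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/bmp",
  "image/webp",
]);

// Split File-like objects ({ name, type }) into accepted and rejected lists,
// so the UI can report exactly which files were skipped.
function partitionByType(fileList) {
  const accepted = [];
  const rejected = [];
  for (const file of fileList) {
    const type = (file.type || "").toLowerCase();
    (SUPPORTED_TYPES.has(type) ? accepted : rejected).push(file);
  }
  return { accepted, rejected };
}
```

Returning both lists lets the drop handler add the accepted files and name the rejected ones in its message, rather than rejecting the whole drop.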
1-26: Consider adding defensive checks for DOM elements.
While the code assumes all DOM elements exist, adding defensive checks would make the application more robust against HTML structure changes.
  // DOM Elements
  const dropzone = document.getElementById("dropzone");
  const fileInput = document.getElementById("fileInput");
  // ... other elements
+ // Defensive check
+ const requiredElements = [dropzone, fileInput, /* ... other critical elements */];
+ const missingElements = requiredElements.filter(el => !el);
+ if (missingElements.length > 0) {
+   console.error('Required DOM elements are missing');
+   return;
+ }
PROJECTS/ImgToPDFly/index.html (2)
39-39: Consider adding file size validation.
The file input doesn't have any size restrictions, which could lead to performance issues with very large images.
Add file size validation to the input:
- <input type="file" id="fileInput" class="hidden" accept="image/*" multiple>
+ <input type="file" id="fileInput" class="hidden" accept="image/*" multiple data-max-size="10485760">
Then add validation in the JavaScript file to check the data-max-size attribute.
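One way the JavaScript side of this check might look is sketched below. The helper names `parseMaxSize` and `filterBySize` are hypothetical, and the wiring to `fileInput` is shown only in comments because it needs a browser. The value 10485760 in the suggested attribute is 10 MiB (10 × 1024 × 1024 bytes).

```javascript
// Parse a data-max-size attribute value into a byte limit.
// Returns Infinity (no limit) when the value is missing or not a positive number.
function parseMaxSize(attrValue) {
  const n = Number.parseInt(attrValue, 10);
  return Number.isFinite(n) && n > 0 ? n : Infinity;
}

// Split File-like objects ({ name, size }) by whether they fit under the limit.
function filterBySize(fileList, maxBytes) {
  const accepted = [];
  const tooLarge = [];
  for (const file of fileList) {
    (file.size <= maxBytes ? accepted : tooLarge).push(file);
  }
  return { accepted, tooLarge };
}

// Possible browser wiring, assuming the fileInput element from index.html:
//   const limit = parseMaxSize(fileInput.dataset.maxSize);
//   const { accepted, tooLarge } = filterBySize(Array.from(fileInput.files), limit);
```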
160-162: Consider specifying script integrity for enhanced security.
❓ Verification inconclusive
The external scripts are loaded from CDNs without Subresource Integrity (SRI) hashes, which could pose a security risk if the CDN is compromised.
Consider adding SRI hashes for the external scripts:
- <script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
- <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
- <script src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js"></script>
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js" integrity="sha512-..." crossorigin="anonymous"></script>
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js" integrity="sha512-..." crossorigin="anonymous"></script>
+ <script src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js" integrity="sha512-..." crossorigin="anonymous"></script>
You can generate the SRI hashes using online tools or by running:
🌐 Web query:
What are the current SRI hashes for jsPDF 2.5.1, html2canvas 1.4.1, and tesseract.js version 4?
💡 Result:
Subresource Integrity (SRI) hashes are used to ensure that files fetched from a CDN have not been tampered with. To generate the SRI hash for a specific version of a library, you can use online tools like SRI Hash Generator.
For the libraries you've mentioned, here are the SRI hashes for their respective versions:
jsPDF 2.5.1:
- CDN URL: https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js
- SRI Hash: sha384-...
html2canvas 1.4.1:
- CDN URL: https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js
- SRI Hash: sha512-fPU6BHNpsyIhr8yyMpTLLxAbkaK8ArIBcmZIRiBLiDhjeqvXolaEmDGmELFuX9I4xDcaKKcJl+TKZLqruBbmWA==
tesseract.js 4.0.0:
- CDN URL: https://cdn.jsdelivr.net/npm/tesseract.js@4.0.0/dist/tesseract.min.js
- SRI Hash: sha384-...
Please note that the SRI hashes for jsPDF 2.5.1 and tesseract.js 4.0.0 are not provided in the available sources. To obtain them, download the minified JavaScript files from their respective CDNs and run them through the SRI Hash Generator.
Ensure that the SRI hashes match the files you intend to use to maintain the integrity and security of your application.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Compute the SRI hash (sha384) for jsPDF 2.5.1
curl -sSL https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js \
  | openssl dgst -sha384 -binary \
  | openssl base64 -A \
  | awk '{ print "sha384-" $0 }'
# Compute the SRI hash (sha384) for tesseract.js 4.x
curl -sSL https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js \
  | openssl dgst -sha384 -binary \
  | openssl base64 -A \
  | awk '{ print "sha384-" $0 }'
Length of output: 426
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
python3 - << 'EOF'
import hashlib, base64, urllib.request

libraries = [
    ("jsPDF 2.5.1", "https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js", hashlib.sha384),
    ("html2canvas 1.4.1", "https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js", hashlib.sha512),
    ("tesseract.js 4", "https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js", hashlib.sha384),
]
for name, url, hashfunc in libraries:
    data = urllib.request.urlopen(url).read()
    digest = hashfunc(data).digest()
    algo = hashfunc().name
    sri = f"{algo}-{base64.b64encode(digest).decode()}"
    print(f"{name}: {sri}")
EOF
Length of output: 2614
Add Subresource Integrity (SRI) to external scripts
To prevent CDN-side compromises, include SRI hashes on your <script> tags in PROJECTS/ImgToPDFly/index.html (around lines 160–162):
- <script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
- <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
- <script src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js"></script>
+ <script
+   src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"
+   integrity="sha384-<GENERATE_JSPDF_HASH>"
+   crossorigin="anonymous">
+ </script>
+ <script
+   src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"
+   integrity="sha512-fPU6BHNpsyIhr8yyMpTLLxAbkaK8ArIBcmZIRiBLiDhjeqvXolaEmDGmELFuX9I4xDcaKKcJl+TKZLqruBbmWA=="
+   crossorigin="anonymous">
+ </script>
+ <script
+   src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js"
+   integrity="sha384-<GENERATE_TESSERACT_HASH>"
+   crossorigin="anonymous">
+ </script>
• html2canvas’s SRI is shown above.
• Please generate the sha384 hashes for jsPDF 2.5.1 and tesseract.js 4.x (e.g. via https://www.srihash.org/) and replace <GENERATE_…_HASH>.
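For anyone who prefers to compute the hashes locally instead of using the website, a minimal Node.js sketch follows. The function name `sriHash` is an assumption made for illustration; it relies only on Node's built-in `crypto` module. One caveat: an `integrity` value only stays valid while the file behind the URL does not change. The `tesseract.js@4` URL is a version range, so pinning an exact version (as the web query result above does with `@4.0.0`) is likely necessary before adding a hash.

```javascript
const { createHash } = require("node:crypto");

// Build an SRI integrity value of the form "<algorithm>-<base64 digest>"
// from file contents (a string or Buffer).
function sriHash(content, algorithm = "sha384") {
  const digest = createHash(algorithm).update(content).digest("base64");
  return `${algorithm}-${digest}`;
}

// Possible usage, after downloading the script to disk (path is illustrative):
//   const fs = require("node:fs");
//   console.log(sriHash(fs.readFileSync("jspdf.umd.min.js")));
```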
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
package-lock.json is excluded by !**/package-lock.json
yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (4)
PROJECTS/ImgToPDFly/README.md (1 hunks)
PROJECTS/ImgToPDFly/index.html (1 hunks)
PROJECTS/ImgToPDFly/scripts/main.js (1 hunks)
PROJECTS/ImgToPDFly/styles/style.css (1 hunks)
🧰 Additional context used
🪛 LanguageTool
PROJECTS/ImgToPDFly/README.md
[misspelling] ~23-~23: This word is normally spelled as one.
Context: ...📄 Image to PDF Conversion - Generate a multi-page PDF from uploaded images. - Automatic i...
(EN_COMPOUNDS_MULTI_PAGE)
🔇 Additional comments (4)
PROJECTS/ImgToPDFly/styles/style.css (1)
1-51: LGTM! Clean and well-structured CSS.
The CSS file demonstrates good practices with smooth transitions, proper hover effects, and a well-defined pulse animation. The styling complements the interactive features described in the JavaScript implementation.
PROJECTS/ImgToPDFly/README.md (2)
23-23: Static analysis false positive - "multi-page" is correct.
The static analysis tool flagged "multi-page" as potentially incorrect, but this is actually the correct hyphenated form when used as an adjective before a noun (e.g., "multi-page PDF"). No change needed.
1-61: Excellent documentation structure and content.
The README provides comprehensive coverage of the project features, tech stack, and usage instructions. The organization with emojis and clear sections makes it very readable and informative.
PROJECTS/ImgToPDFly/index.html (1)
1-166: Excellent HTML structure with good accessibility practices.
The HTML is well-structured with semantic elements, proper headings hierarchy, accessible form controls with labels, and good user experience features like the loading modal and progress indicators.
const y = (pageHeight - height) / 2;

// Add image to PDF
pdf.addImage(img, "JPEG", x, y, width, height, null, "FAST");
🛠️ Refactor suggestion
Consider adding error handling for image compression.
The addImage method uses "FAST" compression mode, but there's no error handling if the image fails to be added to the PDF.
- pdf.addImage(img, "JPEG", x, y, width, height, null, "FAST");
+ try {
+ pdf.addImage(img, "JPEG", x, y, width, height, null, "FAST");
+ } catch (error) {
+ console.error(`Failed to add image ${i + 1} to PDF:`, error);
+ // Consider showing user feedback
+ }
🤖 Prompt for AI Agents
In PROJECTS/ImgToPDFly/scripts/main.js at line 193, the call to pdf.addImage
with "FAST" compression lacks error handling. Wrap the addImage call in a
try-catch block to catch any exceptions thrown if the image fails to be added.
In the catch block, log or handle the error appropriately to ensure the program
can respond gracefully to failures during image compression.
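To go beyond logging, the failures could be collected and reported to the user once the loop finishes. The sketch below assumes a hypothetical helper `addImagesSafely` and a `placements` array of `{ img, x, y, width, height }` objects; neither appears in the PR. The `pdf` argument is any object exposing jsPDF's `addImage` method, called with the same arguments as in main.js, and page handling is omitted for brevity.

```javascript
// Add each image to the PDF, collecting failures instead of aborting the run.
// Returns an array of { index, message } entries for images that could not be added.
function addImagesSafely(pdf, placements) {
  const failures = [];
  placements.forEach(({ img, x, y, width, height }, index) => {
    try {
      pdf.addImage(img, "JPEG", x, y, width, height, null, "FAST");
    } catch (error) {
      failures.push({ index, message: error.message });
    }
  });
  return failures;
}
```

The caller can then show a single summary, for example listing the skipped image numbers, rather than failing silently.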
const compress = document.getElementById("compressOption").checked;
const grayscale = document.getElementById("grayscaleOption").checked;
const pageSize = document.getElementById("pageSize").value;
Options are retrieved but not utilized in PDF generation.
The compress and grayscale options are retrieved from the UI but are not actually applied to the PDF generation process. This creates a disconnect between the UI options and functionality.
Consider implementing the compression and grayscale features:
  // Create PDF
  const pdf = new jsPDF({
    orientation: "portrait",
    unit: "mm",
    format: pageSize,
+   compress: compress
  });
  // Process each image
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
-   const img = await loadImage(file);
+   let img = await loadImage(file);
+
+   // Apply grayscale if needed
+   if (grayscale) {
+     img = convertToGrayscale(img);
+   }
You'll also need to implement the convertToGrayscale function using canvas manipulation.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In PROJECTS/ImgToPDFly/scripts/main.js around lines 167 to 169, the compress and
grayscale options are retrieved from the UI but not applied in the PDF
generation. To fix this, integrate these options into the PDF creation logic by
applying compression settings when adding images to the PDF and implementing a
convertToGrayscale function that uses canvas manipulation to transform images to
grayscale before adding them. Ensure the PDF generation process uses these
processed images accordingly.
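The review leaves `convertToGrayscale` unimplemented. A minimal sketch under stated assumptions follows: the pixel math lives in a pure helper (`grayscalePixels`, a hypothetical name) that uses the common Rec. 601 luma weights, and the canvas wrapper returns a canvas element, which jsPDF's `addImage` accepts in place of an image. The wrapper runs only in a browser, and callers that read `img.width` or `img.height` would get the canvas dimensions instead.

```javascript
// Convert RGBA pixel data (the layout used by ImageData.data) to grayscale in place.
// Each pixel occupies four entries: red, green, blue, alpha.
function grayscalePixels(data) {
  for (let i = 0; i < data.length; i += 4) {
    const luma = Math.round(0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]);
    data[i] = luma;
    data[i + 1] = luma;
    data[i + 2] = luma;
    // data[i + 3] (alpha) is left unchanged.
  }
  return data;
}

// Browser-only wrapper: draw the image on a canvas, convert its pixels, return the canvas.
function convertToGrayscale(img) {
  const canvas = document.createElement("canvas");
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);
  const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
  grayscalePixels(imageData.data);
  ctx.putImageData(imageData, 0, 0);
  return canvas;
}
```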
const result = await Tesseract.recognize(img, language, {
  logger: (m) => console.log(m),
});
🛠️ Refactor suggestion
Add error handling for OCR processing.
The Tesseract OCR operation can fail for various reasons (corrupted images, unsupported formats, network issues), but there's no error handling around the recognition process.
- const result = await Tesseract.recognize(img, language, {
- logger: (m) => console.log(m),
- });
+ let result;
+ try {
+ result = await Tesseract.recognize(img, language, {
+ logger: (m) => console.log(m),
+ });
+ } catch (error) {
+ console.error(`OCR failed for image ${i + 1}:`, error);
+ extractedText += `[Error processing image ${file.name}]\n\n`;
+ continue;
+ }
🤖 Prompt for AI Agents
In PROJECTS/ImgToPDFly/scripts/main.js around lines 254 to 257, the
Tesseract.recognize call lacks error handling, which can cause unhandled
exceptions if OCR processing fails. Wrap the await Tesseract.recognize call in a
try-catch block to catch any errors during recognition, and handle or log the
error appropriately to prevent the application from crashing and to provide
useful debugging information.
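The same pattern can be pulled out of the loop into a helper that is testable without loading Tesseract. In this sketch, `extractTextFromAll` is a hypothetical name and `recognize` is an injected function; in the app it might be `(file) => loadImage(file).then((img) => Tesseract.recognize(img, language))`. The sketch assumes the result exposes the recognized text at `result.data.text`.

```javascript
// Run OCR over every file, inserting a placeholder when one fails
// so a single bad image does not abort the whole batch.
async function extractTextFromAll(files, recognize) {
  let extractedText = "";
  const failed = [];
  for (const file of files) {
    try {
      const result = await recognize(file);
      extractedText += `${result.data.text}\n\n`;
    } catch (error) {
      failed.push(file.name);
      extractedText += `[Error processing image ${file.name}]\n\n`;
    }
  }
  return { extractedText, failed };
}
```

Returning the `failed` list alongside the text gives the UI enough information to tell the user which images were skipped.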
function loadImage(file) {
  return new Promise((resolve) => {
    const img = new Image();
    const reader = new FileReader();

    reader.onload = (e) => {
      img.src = e.target.result;
      img.onload = () => resolve(img);
    };

    reader.readAsDataURL(file);
  });
}
🛠️ Refactor suggestion
Add error handling to image loading function.
The loadImage function doesn't handle cases where the image fails to load, which could cause the application to hang indefinitely.
  function loadImage(file) {
    return new Promise((resolve, reject) => {
      const img = new Image();
      const reader = new FileReader();
+     reader.onerror = () => reject(new Error(`Failed to read file: ${file.name}`));
+     img.onerror = () => reject(new Error(`Failed to load image: ${file.name}`));
      reader.onload = (e) => {
        img.src = e.target.result;
        img.onload = () => resolve(img);
      };
      reader.readAsDataURL(file);
    });
  }
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In PROJECTS/ImgToPDFly/scripts/main.js around lines 285 to 297, the loadImage
function lacks error handling for image loading failures. Modify the function to
add event listeners for the image's error event and the FileReader's error
event, and reject the promise with an appropriate error message if either fails.
This will prevent the application from hanging indefinitely when image loading
fails.
<!-- Third party scripts -->

<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
💡 Verification agent
🧩 Analysis chain
html2canvas script appears unused.
The html2canvas library is loaded but doesn't appear to be used anywhere in the main.js script. This adds unnecessary overhead to the page.
Consider removing the unused html2canvas script to improve performance:
  <script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
- <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js"></script>
🏁 Script executed:
#!/bin/bash
# Check if html2canvas is used in the JavaScript files
rg -i "html2canvas|html2Canvas" --type js
Length of output: 43
I’d like to broaden the search to include HTML as well as JS/TS files. Let’s rerun with a more comprehensive pattern:
This will help confirm whether the library is ever invoked. If this still yields no hits, we can conclude it’s unused.
🏁 Script executed:
#!/bin/bash
# Search for any html2canvas references or calls across HTML, JS, and TS files
rg -i "html2canvas" -n --glob "*.{html,js,ts}"
Length of output: 185
Remove unused html2canvas import
It looks like html2canvas is only ever loaded in PROJECTS/ImgToPDFly/index.html (line 161) and never actually invoked in your HTML, JS or TS files. You can safely drop it to reduce page weight:
• PROJECTS/ImgToPDFly/index.html: line 161
Suggested diff:
<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
- <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/tesseract.js@4/dist/tesseract.min.js"></script>
🤖 Prompt for AI Agents
In PROJECTS/ImgToPDFly/index.html at line 161, the html2canvas library is loaded
but not used anywhere in the HTML, JS, or TS files. To fix this, remove the
script tag that imports html2canvas from the HTML file to reduce unnecessary
page load and improve performance.
An image-to-PDF generation project using vanilla JS.