diff --git a/.github/workflows/spell-check.yaml b/.github/workflows/spell-check.yaml
new file mode 100644
index 0000000000..04b147e0a6
--- /dev/null
+++ b/.github/workflows/spell-check.yaml
@@ -0,0 +1,17 @@
+name: Spell check
+
+on:
+  pull_request:
+  push:
+    branches:
+      - master
+
+jobs:
+  spell_check:
+    name: Spell check
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+      - name: Check spelling with typos
+        uses: crate-ci/typos@v1.42.0
diff --git a/examples/ts-parallel-scraping/orchestrator/src/main.ts b/examples/ts-parallel-scraping/orchestrator/src/main.ts
index 6c9d2c287e..eb1e0ab440 100644
--- a/examples/ts-parallel-scraping/orchestrator/src/main.ts
+++ b/examples/ts-parallel-scraping/orchestrator/src/main.ts
@@ -36,7 +36,7 @@ if (state.isInitialized) {
     const runClient = apifyClient.run(runId);
     const run = await runClient.get();
-    // This should happen only if the run was deleted or the state was incorectly saved.
+    // This should happen only if the run was deleted or the state was incorrectly saved.
     if (!run) throw await Actor.fail(`The run ${runId} from state does not exists.`);
     if (run.status === 'RUNNING') {
diff --git a/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md b/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md
index 15e5e42266..e4ade9c4fb 100644
--- a/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md
+++ b/sources/academy/platform/expert_scraping_with_apify/apify_api_and_client.md
@@ -49,7 +49,7 @@ The new Actor should take the following input values, which be mapped to paramet
     "useClient": false,
     // The fields in each item to return back. All other
-    // fields should be ommitted
+    // fields should be omitted
     "fields": ["title", "itemUrl", "offer"],
     // The maximum number of items to return back
diff --git a/sources/academy/tutorials/node_js/caching_responses_in_puppeteer.md b/sources/academy/tutorials/node_js/caching_responses_in_puppeteer.md
index 3ce0a59d2c..50a86ff4d8 100644
--- a/sources/academy/tutorials/node_js/caching_responses_in_puppeteer.md
+++ b/sources/academy/tutorials/node_js/caching_responses_in_puppeteer.md
@@ -77,7 +77,7 @@ page.on('response', async (response) => {
     try {
         buffer = await response.buffer();
     } catch (error) {
-        // some responses do not contain buffer and do not need to be catched
+        // some responses do not contain buffer and do not need to be caught
         return;
     }
diff --git a/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md b/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md
index da57305df1..13fcdb4f4c 100644
--- a/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md
+++ b/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md
@@ -7,7 +7,7 @@ slug: /node-js/filter-blocked-requests-using-sessions
 _This article explains how the problem was solved before the [SessionPool](/sdk/js/docs/api/session-pool) class was added into [Apify SDK](/sdk/js/). We are keeping the article here as it might be interesting for people who want to see how to work with sessions on a lower level. For any practical usage of sessions, follow the documentation and examples of SessionPool._

-### Overview of the problem
+## Overview of the problem

 You want to crawl a website with a proxy pool, but most of your proxies are blocked. It's a very common situation. Proxies can be blocked for many reasons:
@@ -25,7 +25,7 @@ Nobody can make sure that a proxy will work infinitely. The only real solution t
 However, usually, at least some of our proxies work. To crawl successfully, it is therefore imperative to handle blocked requests properly. You first need to discover that you are blocked, which usually means that either your request returned status greater or equal to 400 (it didn't return the proper response) or that the page displayed a captcha. To ensure that this bad request is retried, you usually throw an error and it gets automatically retried later (our [SDK](/sdk/js/) handles this for you). Check out [this article](https://docs.apify.com/academy/node-js/handle-blocked-requests-puppeteer) as inspiration for how to handle this situation with `PuppeteerCrawler` class.

-### Solution
+## Solution

 Now we are able to retry bad requests and eventually unless all of our proxies get banned, we should be able to successfully crawl what we want. The problem is that it takes too long and our log is full of errors. Fortunately, we can overcome this with [proxy sessions](/platform/proxy/datacenter-proxy#username-parameters) (look at the proxy and SDK documentation for how to use them in your Actors.)
@@ -50,7 +50,7 @@ Apify.main(async () => {
 });
 ```

-### Algorithm
+## Algorithm

 You don't necessarily need to understand the solution below - it should be fine to copy/paste it to your Actor.
@@ -121,7 +121,7 @@ const pickSession = (sessions, maxSessions = 100) => {
 };
 ```

-### Puppeteer example
+## Puppeteer example

 We then use this function whenever we want to get the session for our request. Here is an example of how we would use it for bare bones Puppeteer (for example as a part of `BasicCrawler` class).
@@ -142,11 +142,11 @@ After success:
 After failure (captcha, blocked request, etc.): `delete sessions[session.name]`

-### PuppeteerCrawler example
+## PuppeteerCrawler example

 Now you might start to wonder, "I have already prepared an Actor using PuppeteerCrawler, can I make it work there?". The problem is that with PuppeteerCrawler we don't have everything nicely inside one function scope like when using pure Puppeteer or BasicCrawler. Fortunately, there is a little hack that enables passing the session name to where we need it.

-First we define `lauchPuppeteerFunction` which tells the crawler how to create new browser instances and we pass the picked session there.
+First we define `launchPuppeteerFunction` which tells the crawler how to create new browser instances and we pass the picked session there.

 ```js
 const crawler = new Apify.PuppeteerCrawler({
diff --git a/sources/academy/tutorials/node_js/multiple-runs-scrape.md b/sources/academy/tutorials/node_js/multiple-runs-scrape.md
index f56edce6ae..be451307be 100644
--- a/sources/academy/tutorials/node_js/multiple-runs-scrape.md
+++ b/sources/academy/tutorials/node_js/multiple-runs-scrape.md
@@ -114,7 +114,7 @@ if (state.isInitialized) {
     const runClient = apifyClient.run(runId);
     const run = await runClient.get();
-    // This should happen if the run was deleted or the state was incorectly saved.
+    // This should happen if the run was deleted or the state was incorrectly saved.
     if (!run) throw new Error(`The run ${runId} from state does not exists.`);
     if (run.status === 'RUNNING') {
diff --git a/sources/academy/webscraping/scraping_basics_javascript/08_saving_data.md b/sources/academy/webscraping/scraping_basics_javascript/08_saving_data.md
index 6a312d9570..30e5acbd51 100644
--- a/sources/academy/webscraping/scraping_basics_javascript/08_saving_data.md
+++ b/sources/academy/webscraping/scraping_basics_javascript/08_saving_data.md
@@ -112,7 +112,7 @@ $ node index.js
 ## Saving data as JSON

-The JSON format is popular primarily among developers. We use it for storing data, configuration files, or as a way to transfer data between programs (e.g., APIs). Its origin stems from the syntax of JavaScript objects, but people now use it accross programming languages.
+The JSON format is popular primarily among developers. We use it for storing data, configuration files, or as a way to transfer data between programs (e.g., APIs). Its origin stems from the syntax of JavaScript objects, but people now use it across programming languages.

 We'll begin with importing the `writeFile` function from the Node.js standard library, so that we can, well, write files:
diff --git a/sources/legal/latest/terms/store-publishing-terms-and-conditions.md b/sources/legal/latest/terms/store-publishing-terms-and-conditions.md
index 84bc778483..537db9de8f 100644
--- a/sources/legal/latest/terms/store-publishing-terms-and-conditions.md
+++ b/sources/legal/latest/terms/store-publishing-terms-and-conditions.md
@@ -85,7 +85,7 @@ We are authorized to unpublish and/or delete such an Actor, in our sole discreti
 1. "**Monthly Rental**" which means that each User of your Actor will pay a flat monthly rental fee for use of that Actor. You will set the price as X USD per month;
 2. "**Pay per Result**" which means that each User of your Actor will pay a fee calculated according to the number of results of each run of that Actor. You will set the price as X USD per 1,000 results. In this model, the Users do not pay for the Platform usage; or
-3. "**Pay per Event**" which allows you to programatically charge for events in your Actor source code. You need to pre-define the events first when setting the Actor pricing and configure whether Users pay for the Platform usage or not.
+3. "**Pay per Event**" which allows you to programmatically charge for events in your Actor source code. You need to pre-define the events first when setting the Actor pricing and configure whether Users pay for the Platform usage or not.

 11.2. If you set your Actor as monetized, you will be entitled to receive remuneration calculated as follows:
diff --git a/sources/platform/integrations/ai/langchain.md b/sources/platform/integrations/ai/langchain.md
index 8d78ea758c..b9f722a4b9 100644
--- a/sources/platform/integrations/ai/langchain.md
+++ b/sources/platform/integrations/ai/langchain.md
@@ -128,8 +128,7 @@ After running the code, you should see the following output:
 ```text
 answer: LangChain is a framework designed for developing applications powered by large language models (LLMs). It simplifies the
- entire application lifecycle, from development to productionization and deployment. LangChain provides open-source components a
-nd integrates with various third-party tools, making it easier to build and optimize applications using language models.
+ entire application lifecycle, from development to productionization and deployment. LangChain provides open-source components and integrates with various third-party tools, making it easier to build and optimize applications using language models.
 source: https://python.langchain.com/docs/get_started/introduction
 ```
diff --git a/sources/platform/integrations/data-storage/airtable/index.md b/sources/platform/integrations/data-storage/airtable/index.md
index fb7e280435..d2551837ad 100644
--- a/sources/platform/integrations/data-storage/airtable/index.md
+++ b/sources/platform/integrations/data-storage/airtable/index.md
@@ -39,9 +39,9 @@ Go to [Airtable](https://airtable.com) and open the base you would like to work
 ![Access the extensions tab on Airtable UI by pressing tools button](../../images/airtable/airtable_tools_button.png)

-Search for Apify extenison and install it
+Search for Apify extension and install it

-![Search for the Apify extension on Airtable](../../images/airtable/airtable_search_apify_extenison.png)
+![Search for the Apify extension on Airtable](../../images/airtable/airtable_search_apify_extension.png)

 Open the Apify extension and login using OAuth 2.0 with your Apify account. If you dont have an account, visit [Apify registration](https://console.apify.com/sign-up) page.
diff --git a/sources/platform/integrations/images/airtable/airtable_search_apify_extenison.png b/sources/platform/integrations/images/airtable/airtable_search_apify_extension.png
similarity index 100%
rename from sources/platform/integrations/images/airtable/airtable_search_apify_extenison.png
rename to sources/platform/integrations/images/airtable/airtable_search_apify_extension.png
diff --git a/src/components/PlatformCard.jsx b/src/components/PlatformCard.jsx
index a479db6e42..8bb4fcf114 100644
--- a/src/components/PlatformCard.jsx
+++ b/src/components/PlatformCard.jsx
@@ -21,7 +21,7 @@ const PlatformLink = ({ cardItem, href, isExternalLink }) => (
 );

-const PlaftormCard = ({ title, items }) => {
+const PlatformCard = ({ title, items }) => {
     return (

{title}

@@ -36,4 +36,4 @@ const PlaftormCard = ({ title, items }) => {
     );
 };

-export default PlaftormCard;
+export default PlatformCard;
diff --git a/typos.toml b/typos.toml
new file mode 100644
index 0000000000..9a36a2d6b6
--- /dev/null
+++ b/typos.toml
@@ -0,0 +1,40 @@
+# Configuration for typos spell checker
+# https://github.com/crate-ci/typos
+
+[default]
+extend-ignore-re = [
+    "https?://[^\\s]+", # Ignore URLs
+]
+
+[files]
+# Extend the default exclude list
+extend-exclude = [
+    "*.lock",
+    "*.min.js",
+    "*.min.css",
+    "CHANGELOG.md",
+]
+
+# Add project-specific identifiers that should not be treated as typos
+[default.extend-identifiers]
+# Actor ID example
+vKg4IjxZbEYTYeW8T = "vKg4IjxZbEYTYeW8T"
+# YouTube video ID
+HV6OlMPn5sI = "HV6OlMPn5sI"
+# Webhook ID example
+pVJtoTelgYUq4qJOt = "pVJtoTelgYUq4qJOt"
+# Facebook ID example
+ZmVlZGJhY2s6MTA1NTQwMzA4MzM2ODM5NV8yMzg2MDgyOTg1MTIyNDUx = "ZmVlZGJhY2s6MTA1NTQwMzA4MzM2ODM5NV8yMzg2MDgyOTg1MTIyNDUx"
+# Git hash
+9ba8df137936 = "9ba8df137936"
+
+# Add project-specific words that should not be treated as typos
+[default.extend-words]
+# Vietnamese dish name "Bún bò Nam Bô"
+Nam = "Nam"
+# Search Engine Result (Page/Pages)
+SER = "SER"
+SERPs = "SERPs"
+# Czech legal documents
+dne = "dne"
+tak = "tak"