Merged
17 changes: 17 additions & 0 deletions .github/workflows/spell-check.yaml
@@ -0,0 +1,17 @@
name: Spell check

on:
  pull_request:
  push:
    branches:
      - master

jobs:
  spell_check:
    name: Spell check
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
      - name: Check spelling with typos
        uses: crate-ci/typos@v1.42.0
2 changes: 1 addition & 1 deletion examples/ts-parallel-scraping/orchestrator/src/main.ts
@@ -36,7 +36,7 @@ if (state.isInitialized) {
const runClient = apifyClient.run(runId);
const run = await runClient.get();

- // This should happen only if the run was deleted or the state was incorectly saved.
+ // This should happen only if the run was deleted or the state was incorrectly saved.
if (!run) throw await Actor.fail(`The run ${runId} from state does not exists.`);

if (run.status === 'RUNNING') {
@@ -49,7 +49,7 @@ The new Actor should take the following input values, which be mapped to paramet
"useClient": false,

// The fields in each item to return back. All other
- // fields should be ommitted
+ // fields should be omitted
"fields": ["title", "itemUrl", "offer"],

// The maximum number of items to return back
@@ -77,7 +77,7 @@ page.on('response', async (response) => {
try {
buffer = await response.buffer();
} catch (error) {
- // some responses do not contain buffer and do not need to be catched
+ // some responses do not contain buffer and do not need to be caught
return;
}

@@ -7,7 +7,7 @@ slug: /node-js/filter-blocked-requests-using-sessions

_This article explains how the problem was solved before the [SessionPool](/sdk/js/docs/api/session-pool) class was added into [Apify SDK](/sdk/js/). We are keeping the article here as it might be interesting for people who want to see how to work with sessions on a lower level. For any practical usage of sessions, follow the documentation and examples of SessionPool._

- ### Overview of the problem
+ ## Overview of the problem

You want to crawl a website with a proxy pool, but most of your proxies are blocked. It's a very common situation. Proxies can be blocked for many reasons:

@@ -25,7 +25,7 @@ Nobody can make sure that a proxy will work infinitely. The only real solution t

However, usually, at least some of our proxies work. To crawl successfully, it is therefore imperative to handle blocked requests properly. You first need to discover that you are blocked, which usually means that either your request returned status greater or equal to 400 (it didn't return the proper response) or that the page displayed a captcha. To ensure that this bad request is retried, you usually throw an error and it gets automatically retried later (our [SDK](/sdk/js/) handles this for you). Check out [this article](https://docs.apify.com/academy/node-js/handle-blocked-requests-puppeteer) as inspiration for how to handle this situation with `PuppeteerCrawler` class.

- ### Solution
+ ## Solution

Now we are able to retry bad requests and eventually unless all of our proxies get banned, we should be able to successfully crawl what we want. The problem is that it takes too long and our log is full of errors. Fortunately, we can overcome this with [proxy sessions](/platform/proxy/datacenter-proxy#username-parameters) (look at the proxy and SDK documentation for how to use them in your Actors.)

@@ -50,7 +50,7 @@ Apify.main(async () => {
});
```

- ### Algorithm
+ ## Algorithm

You don't necessarily need to understand the solution below - it should be fine to copy/paste it to your Actor.

@@ -121,7 +121,7 @@ const pickSession = (sessions, maxSessions = 100) => {
};
```

- ### Puppeteer example
+ ## Puppeteer example

We then use this function whenever we want to get the session for our request. Here is an example of how we would use it for bare bones Puppeteer (for example as a part of `BasicCrawler` class).

@@ -142,11 +142,11 @@ After success:
After failure (captcha, blocked request, etc.):
`delete sessions[session.name]`

- ### PuppeteerCrawler example
+ ## PuppeteerCrawler example

Now you might start to wonder, "I have already prepared an Actor using PuppeteerCrawler, can I make it work there?". The problem is that with PuppeteerCrawler we don't have everything nicely inside one function scope like when using pure Puppeteer or BasicCrawler. Fortunately, there is a little hack that enables passing the session name to where we need it.

- First we define `lauchPuppeteerFunction` which tells the crawler how to create new browser instances and we pass the picked session there.
+ First we define `launchPuppeteerFunction` which tells the crawler how to create new browser instances and we pass the picked session there.

```js
const crawler = new Apify.PuppeteerCrawler({
@@ -114,7 +114,7 @@ if (state.isInitialized) {
const runClient = apifyClient.run(runId);
const run = await runClient.get();

- // This should happen if the run was deleted or the state was incorectly saved.
+ // This should happen if the run was deleted or the state was incorrectly saved.
if (!run) throw new Error(`The run ${runId} from state does not exists.`);

if (run.status === 'RUNNING') {
@@ -112,7 +112,7 @@ $ node index.js

## Saving data as JSON

- The JSON format is popular primarily among developers. We use it for storing data, configuration files, or as a way to transfer data between programs (e.g., APIs). Its origin stems from the syntax of JavaScript objects, but people now use it accross programming languages.
+ The JSON format is popular primarily among developers. We use it for storing data, configuration files, or as a way to transfer data between programs (e.g., APIs). Its origin stems from the syntax of JavaScript objects, but people now use it across programming languages.

We'll begin with importing the `writeFile` function from the Node.js standard library, so that we can, well, write files:

@@ -85,7 +85,7 @@ We are authorized to unpublish and/or delete such an Actor, in our sole discreti

1. "**Monthly Rental**" which means that each User of your Actor will pay a flat monthly rental fee for use of that Actor. You will set the price as X USD per month;
2. "**Pay per Result**" which means that each User of your Actor will pay a fee calculated according to the number of results of each run of that Actor. You will set the price as X USD per 1,000 results. In this model, the Users do not pay for the Platform usage; or
- 3. "**Pay per Event**" which allows you to programatically charge for events in your Actor source code. You need to pre-define the events first when setting the Actor pricing and configure whether Users pay for the Platform usage or not.
+ 3. "**Pay per Event**" which allows you to programmatically charge for events in your Actor source code. You need to pre-define the events first when setting the Actor pricing and configure whether Users pay for the Platform usage or not.

11.2. If you set your Actor as monetized, you will be entitled to receive remuneration calculated as follows:

3 changes: 1 addition & 2 deletions sources/platform/integrations/ai/langchain.md
@@ -128,8 +128,7 @@ After running the code, you should see the following output:

```text
answer: LangChain is a framework designed for developing applications powered by large language models (LLMs). It simplifies the
- entire application lifecycle, from development to productionization and deployment. LangChain provides open-source components a
- nd integrates with various third-party tools, making it easier to build and optimize applications using language models.
+ entire application lifecycle, from development to productionization and deployment. LangChain provides open-source components and integrates with various third-party tools, making it easier to build and optimize applications using language models.

source: https://python.langchain.com/docs/get_started/introduction
```
@@ -39,9 +39,9 @@ Go to [Airtable](https://airtable.com) and open the base you would like to work
![Access the extensions tab on Airtable UI by pressing tools button](../../images/airtable/airtable_tools_button.png)

<!-- TODO: improve pictures when Apify integration is published -->
- Search for Apify extenison and install it
+ Search for Apify extension and install it

- ![Search for the Apify extension on Airtable](../../images/airtable/airtable_search_apify_extenison.png)
+ ![Search for the Apify extension on Airtable](../../images/airtable/airtable_search_apify_extension.png)

Open the Apify extension and login using OAuth 2.0 with your Apify account. If you dont have an account, visit [Apify registration](https://console.apify.com/sign-up) page.

4 changes: 2 additions & 2 deletions src/components/PlatformCard.jsx
@@ -21,7 +21,7 @@ const PlatformLink = ({ cardItem, href, isExternalLink }) => (
</Link>
);

- const PlaftormCard = ({ title, items }) => {
+ const PlatformCard = ({ title, items }) => {
return (
<div className={styles.card}>
<h4 className={styles['card-header']}>{title}</h4>
@@ -36,4 +36,4 @@ const PlaftormCard = ({ title, items }) => {
);
};

- export default PlaftormCard;
+ export default PlatformCard;
40 changes: 40 additions & 0 deletions typos.toml
@@ -0,0 +1,40 @@
# Configuration for typos spell checker
# https://github.com/crate-ci/typos

[default]
extend-ignore-re = [
"https?://[^\\s]+", # Ignore URLs
]

[files]
# Extend the default exclude list
extend-exclude = [
"*.lock",
"*.min.js",
"*.min.css",
"CHANGELOG.md",
]

# Add project-specific identifiers that should not be treated as typos
[default.extend-identifiers]
# Actor ID example
vKg4IjxZbEYTYeW8T = "vKg4IjxZbEYTYeW8T"
# YouTube video ID
HV6OlMPn5sI = "HV6OlMPn5sI"
# Webhook ID example
pVJtoTelgYUq4qJOt = "pVJtoTelgYUq4qJOt"
# Facebook ID example
ZmVlZGJhY2s6MTA1NTQwMzA4MzM2ODM5NV8yMzg2MDgyOTg1MTIyNDUx = "ZmVlZGJhY2s6MTA1NTQwMzA4MzM2ODM5NV8yMzg2MDgyOTg1MTIyNDUx"
# Git hash
9ba8df137936 = "9ba8df137936"

# Add project-specific words that should not be treated as typos
[default.extend-words]
# Vietnamese dish name "Bún bò Nam Bô"
Nam = "Nam"
# Search Engine Result (Page/Pages)
SER = "SER"
SERPs = "SERPs"
# Czech legal documents
dne = "dne"
tak = "tak"
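To make the effect of `typos.toml` concrete, here is a toy TypeScript model of the settings it uses. It is a hypothetical sketch, not how crate-ci/typos works internally: `checkText`, `Finding`, and the misspelling table are invented for illustration (the table holds words fixed in this PR plus one made-up entry, `nam`), and `extend-identifiers` is not modeled. Per the typos documentation, mapping a word to itself in `extend-words` marks it as always valid, and text matching an `extend-ignore-re` pattern is skipped; the sketch mimics both, and roughly imitates how typos splits camelCase identifiers into words.

```typescript
// Toy model of typos.toml. Hypothetical sketch, NOT the crate-ci/typos implementation.

// Hypothetical misspelling table: words fixed in this PR, plus one made-up entry.
const knownTypos = new Map<string, string>([
  ["incorectly", "incorrectly"],
  ["ommitted", "omitted"],
  ["catched", "caught"],
  ["lauch", "launch"],
  ["accross", "across"],
  ["programatically", "programmatically"],
  ["extenison", "extension"],
  ["plaftorm", "platform"],
  ["nam", "name"], // made-up entry, only here to show that the allow-list wins
]);

// Mirrors [default.extend-words]: a word mapped to itself is always accepted.
const extendWords: Record<string, string> = {
  Nam: "Nam",
  SER: "SER",
  SERPs: "SERPs",
  dne: "dne",
  tak: "tak",
};
const allowed = new Set(
  Object.entries(extendWords)
    .filter(([word, fix]) => word === fix)
    .map(([word]) => word.toLowerCase()),
);

// Mirrors extend-ignore-re: URL matches are blanked out before checking.
const ignorePatterns: RegExp[] = [/https?:\/\/[^\s]+/g];

interface Finding {
  word: string;
  correction: string;
}

function checkText(text: string): Finding[] {
  let cleaned = text;
  for (const re of ignorePatterns) cleaned = cleaned.replace(re, " ");

  const findings: Finding[] = [];
  for (const token of cleaned.match(/[A-Za-z]+/g) ?? []) {
    if (allowed.has(token.toLowerCase())) continue;
    // Split camelCase identifiers, e.g. "PlaftormCard" -> ["Plaftorm", "Card"].
    for (const part of token.split(/(?=[A-Z][a-z])/)) {
      const key = part.toLowerCase();
      if (allowed.has(key)) continue;
      const correction = knownTypos.get(key);
      if (correction !== undefined) findings.push({ word: part, correction });
    }
  }
  return findings;
}

console.log(checkText("const PlaftormCard = 1; // state was incorectly saved"));
```

Running it reports `Plaftorm` (split out of `PlaftormCard`) and `incorectly`, the same kinds of typos this PR fixes by hand, while `Nam` and anything inside a URL are left alone.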