
Fink


Fink (pronounced "Phpink") is a command line tool, written in PHP, for checking HTTP links.

  • Checks websites for broken links or error pages.
  • Makes asynchronous HTTP requests.


Installation

Install as a stand-alone tool or as a project dependency:

Installing as a project dependency

$ composer require dantleech/fink --dev

Installing from a PHAR

Download the PHAR from the Releases page.

Building your own PHAR with Box

You can build your own PHAR by cloning this repository, installing the dependencies with composer install, and running:

$ ./vendor/bin/box compile

Usage

Run the command with a single URL to start crawling:

$ ./vendor/bin/fink https://www.example.com

Use --output=somefile to log verbose information for each URL as a line of JSON, including the following fields (see the sample record after this list):

  • url: The tested URL.
  • status: The HTTP status code.
  • referrer: The page which linked to the URL.
  • referrer_title: The value (e.g. link title) of the referring element.
  • referrer_xpath: The path to the node in the referring document.
  • distance: The number of links away from the start document.
  • request_time: Number of microseconds taken to make the request.
  • timestamp: The time that the request was made.
  • exception: Any runtime exception encountered (e.g. malformed URL, etc).
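
A single line of the report might look like this (the field values are illustrative, not actual fink output):

{"url": "https://www.example.com/about", "status": 200, "referrer": "https://www.example.com/", "referrer_title": "About us", "referrer_xpath": "/html/body/nav/a[2]", "distance": 1, "request_time": 123456, "timestamp": "2024-03-16T12:00:00+00:00", "exception": null}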

Arguments

  • url (multiple) Specify one or more base URLs to crawl (mandatory).
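
Since url accepts multiple values, you can seed a crawl with more than one base URL in a single run (the URLs below are placeholders):

$ fink https://www.example.com https://blog.example.com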

Options

  • --client-max-body-size: Max body size for HTTP client (in bytes).
  • --client-max-header-size: Max header size for HTTP client (in bytes).
  • --client-redirects=5: Set the maximum number of times the client should redirect (0 to never redirect).
  • --client-security-level=1: Set the default SSL security level.
  • --client-timeout=15000: Set the maximum time (in milliseconds) the client should wait for a response (defaults to 15000, i.e. 15 seconds).
  • --concurrency: Number of simultaneous HTTP requests to use.
  • --display-bufsize=10: Set the number of URLs to consider when showing the display.
  • --display=+memory: Set, add or remove elements of the runtime display (prefix with - or + to modify the default set).
  • --exclude-url=logout: (multiple) Exclude URLs matching the given PCRE pattern.
  • --header="Foo: Bar": (multiple) Specify custom header(s).
  • --help: Display available options.
  • --include-link=foobar.html: Include given link as if it were linked from the base URL.
  • --insecure: Do not verify SSL certificates.
  • --load-cookies=cookies.txt: Load cookies from the given cookies.txt file.
  • --max-distance: Maximum allowed distance from base URL (if not specified then there is no limitation).
  • --max-external-distance: Limit the external (disjoint) distance from the base URL.
  • --no-dedupe: Do not filter duplicate URLs (can result in a non-terminating process).
  • --output=out.json: Output JSON report for each URL to given file (truncates existing content).
  • --publisher=csv: Set the publisher format (defaults to json); can be either json or csv.
  • --rate: Set a maximum number of requests to make in a second.
  • --stdout: Stream results directly to STDOUT; disables the display and any specified output file.
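
As an illustrative combination of the options above, the following invocation limits concurrency and request rate, skips logout links, and sends a custom header (the values are examples, not recommendations):

$ fink https://www.example.com --concurrency=5 --rate=10 --exclude-url=logout --header="X-Crawler: fink"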

Examples

Crawl a single website

$ fink http://www.example.com --max-external-distance=0

Crawl a single website and check the status of external links

$ fink http://www.example.com --max-external-distance=1

Use jq to analyse results

jq is a command-line tool for querying and manipulating JSON data.

$ fink http://www.example.com -x0 -oreport.json
$ cat report.json | jq -c '. | select(.status==404) | {url: .url, referrer: .referrer}' | jq
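
Because each report line includes request_time in microseconds, jq can also surface slow pages. A sketch (the one-second threshold is arbitrary):

$ jq -c 'select(.request_time > 1000000) | {url, request_time}' report.json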

Crawl pages behind a login

# Create a cookies file for later re-use (here we simulate a login via HTTP POST)
$ curl -L --cookie-jar mycookies.txt -d username=myLogin -d password=MyP4ssw0rd https://www.example.org/my/login/url

# Re-use the cookies file with your fink crawl command
$ fink https://www.example.org/myaccount --load-cookies=mycookies.txt

Note: you cannot create the cookie jar on one machine (e.g. your laptop), store it, and then re-use it on another (e.g. a Linux server). Create the cookie file from the same IP address that will run the crawl; otherwise server-side session handling may refuse to continue the HTTP session because of the IP mismatch.

Exit Codes

  • 0: All URLs were successful.
  • 1: Unexpected runtime error.
  • 2: At least one URL failed to resolve successfully.
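
This makes it easy to gate a CI job on a fink run; a minimal sketch (the URL and report path are placeholders):

$ fink https://www.example.com --max-external-distance=0 --output=report.json || exit 1

Here any non-zero exit code (an unexpected runtime error or a failing URL) fails the build; test for exit code 2 specifically if you only want to fail on unsuccessful URLs.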