NAME
Spider - web spider searching for keywords
SYNOPSIS
use Spider;

my $spider = Spider->new(
    output_file      => $opened_filehandle,
    links            => \%links,
    keywords         => \@keywords,
    allowed_keywords => \%allowed_keywords,
    debug_enabled    => 1,
    web_depth        => 5,
);
DESCRIPTION
Spider is a web spider that crawls links and matches their content against keywords.
A matching keyword triggers an ALERT, which is written to output_file.
Allowed keywords do not trigger an ALERT.
Websites are marked by the 'want_spider' parameter in the links hash.
They are spidered to 'web_depth' (default 3), and links found in their content are added to the links hash.
Other links are only checked for keywords and are not spidered.
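
The matching rule described above can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not the module's actual code: the helper name classify_content is hypothetical, keywords are matched as case-insensitive literal substrings, and a page is treated as an ALERT when at least one matched keyword is missing from the allowed-keywords hash. The keywords and content used here are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Spider: split the keywords found in
# $content into those that would raise an ALERT and those that are allowed.
sub classify_content {
    my ($content, $keywords, $allowed) = @_;
    my (@alerting, @permitted);
    for my $keyword (@$keywords) {
        # \Q...\E treats the keyword as a literal string; /i ignores case
        # (both are assumptions about how matching might work).
        next unless $content =~ /\Q$keyword\E/i;
        if ($allowed->{$keyword}) { push @permitted, $keyword }
        else                      { push @alerting,  $keyword }
    }
    return (\@alerting, \@permitted);
}

# Made-up example data.
my @keywords         = ('casino', 'pills', 'loan');
my %allowed_keywords = (loan => 1);

my ($alerting, $permitted) = classify_content(
    'Cheap pills and a quick loan', \@keywords, \%allowed_keywords,
);

print "ALERT @$alerting\n"                if @$alerting;
print "allowed, no ALERT: @$permitted\n" if @$permitted;
```

Keeping allowed keywords in a hash makes the lookup a single key test, which matches the shape of the allowed_keywords argument.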
ARGUMENTS
output_file
an already opened file handle to which ALERT messages are written
keywords
reference to an array of keywords you want to find
allowed_keywords
reference to a hash of keywords that do not trigger an ALERT, for example:
my %allowed_keywords = (
    wuord1 => 1,
);
links
reference to a hash of websites and referer URLs you want to spider, for example:
my %links = (
    'http://website.sk' => {
        'want_spider' => 1,
        'depth' => 0,
    },
    'http://referer.sk' => {
        'depth' => 0,
    },
);
Note that the links hash is modified while the spider runs.
debug_enabled
if true, debug messages are printed to standard output
web_depth
depth to which a website is scanned. The default is 3.
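
Since output_file expects a handle that is already open, the caller has to open it before constructing the spider. A minimal sketch, assuming a hypothetical log file named spider_alerts.log in the system temporary directory and append mode (neither comes from the original):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical location for the alert log; any writable path works.
my $path = File::Spec->catfile(File::Spec->tmpdir, 'spider_alerts.log');

# Open for appending so earlier alerts survive between runs
# (a choice made for this sketch, not a requirement of Spider).
open(my $opened_filehandle, '>>', $path)
    or die "Cannot open $path: $!";

# Illustrative write only, to show the handle is usable. This handle is
# what would be passed as: output_file => $opened_filehandle
print {$opened_filehandle} "spider run started\n";

close($opened_filehandle) or die "Cannot close $path: $!";
```

The three-argument form of open with a lexical handle keeps the handle scoped and avoids surprises from special characters in the path.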
METHODS
spider_links
main method
settle_website WEBSITE
prepares the settings needed to spider WEBSITE
spider_website
scans website according to settings
check_website
checks whether the content of a URL matches keywords
add_links_from_root
adds links found in the URL's content to the links hash
debug
prints a string to standard output if debug is enabled
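
To illustrate what add_links_from_root is described as doing, and why the links hash changes during a run, here is a rough sketch. It is an assumption about the mechanism, not the module's code: add_found_links is a hypothetical helper, links are extracted with a deliberately naive regular expression, and each new link is stored one level deeper than the page it was found on.

```perl
use strict;
use warnings;

# Hypothetical helper: record links found in $content under %$links,
# one level deeper than the page ($root) they were found on.
sub add_found_links {
    my ($links, $root, $content, $web_depth) = @_;
    my $next_depth = $links->{$root}{depth} + 1;
    return if $next_depth > $web_depth;      # stop at the depth limit

    # Naive href extraction; a real crawler would use an HTML parser.
    while ($content =~ /href="(http[^"]+)"/g) {
        my $url = $1;
        next if exists $links->{$url};       # keep entries already known
        $links->{$url} = { depth => $next_depth };
    }
    return;
}

my %links = (
    'http://website.sk' => { want_spider => 1, depth => 0 },
);

# Made-up page content with one new link and one already-known link.
my $html = '<a href="http://website.sk/a.html">a</a> '
         . '<a href="http://website.sk">home</a>';
add_found_links(\%links, 'http://website.sk', $html, 3);

print "$_ (depth $links{$_}{depth})\n" for sort keys %links;
```

Passing the hash by reference is what lets the caller see the newly added entries afterwards, which is consistent with the links hash being changed while the spider runs.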
SAMPLE OUTPUT
SPIDER http://domain.sk
this IS NOT counted as alerted
----------------------------------------------------------------------
SPIDER LINKS
SPIDER http://trololo.sk
ERROR:404 Not Found
this IS NOT counted as alerted
SPIDER LINKS
SPIDER http://domain.sk/old.html
possible bad content http://domain.sk/old.html word2
found keywords: 1
fetching http://domain.sk/new.html
ALERT possible bad content http://domain.sk/new.html wuord1 word2
found keywords: 2
fetching http://domain.sk/lala.txt
SKIPPING because of content type or length
SPIDER http://domain.sk
this IS counted as alerted
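
Given output in the format shown above, the flagged URLs can be pulled out with a short script. This is a sketch, assuming ALERT lines always have the form "ALERT possible bad content URL KEYWORDS" as in the sample; the here-document stands in for the real output file.

```perl
use strict;
use warnings;

# Lines copied from the sample output; in practice, read the real file.
my $output = <<'END';
possible bad content http://domain.sk/old.html word2
found keywords: 1
fetching http://domain.sk/new.html
ALERT possible bad content http://domain.sk/new.html wuord1 word2
found keywords: 2
END

my %alerts;    # URL => array reference of keywords reported for it
for my $line (split /\n/, $output) {
    # Assumed layout: "ALERT possible bad content <url> <keyword> ..."
    my ($url, $words) = $line =~ /^ALERT possible bad content (\S+)\s+(.*)$/
        or next;
    $alerts{$url} = [ split ' ', $words ];
}

print "$_: @{ $alerts{$_} }\n" for sort keys %alerts;
```

Anchoring the pattern at the start of the line keeps the unprefixed "possible bad content" lines out of the result.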
SEE ALSO
KeywordsSpider -- takes files as arguments and prepares attributes for Spider
COPYRIGHT
Copyright 2013 katkad
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.