nepsho/webthief


WebThief


A promise- and callback-based website info getter that reads the metadata of websites.

Features

  • Get the source code of any web page
  • Get the logo, title and description of any website
  • Supports modern meta tag scraping
  • Fully promise and callback based
  • Works with ES6 async/await
  • Supports scraping multiple meta tags at once

Support

ES5 ES6 Callback Promise async/await

Installing

$ npm install webthief

Some Basic Meta Tags in HTML

<meta name="description" content="Website info api"/>
<meta name="keywords" content="webthief, api, nodejs, python"/>
<meta name="subject" content="website subject">
<meta name="copyright" content="nepsho">
<meta name="language" content="en">
<meta name="robots" content="index,follow" />
<meta name="revised" content="Saturday, May 9th, 2019, 0:00 am" />
<meta name="abstract" content="any abstract">
<meta name="topic" content="any topic">
<meta name="summary" content="any summary">
<meta name="author" content="bcrazydreamer, [email protected]">
<meta name="designer" content="bcrazydreamer">
<meta name="reply-to" content="[email protected]">
<meta name="url" content="https://nepsho.github.io/">
<meta name="category" content="any category">

Some OpenGraph Meta Tags in HTML

<meta name="og:title" content="webthief"/>
<meta name="og:type" content="API"/>
<meta name="og:url" content="https://nepsho.github.io/"/>
<meta name="og:image" content="https://nepsho.github.io/lib/img/logo.png"/>
<meta name="og:email" content="[email protected]"/>
<meta name="og:phone_number" content="123-456-7890"/>

Supported meta fields by webthief

S. No a b c d
1 logo description title keywords
2 copyright language robots revised
3 reply-to topic summary author
4 country-name url category site_name
5 phone_number
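
To illustrate how fields like these relate to the meta tags shown earlier, here is a minimal sketch that collects name/content pairs from an HTML string. It is not webthief's actual implementation: `extractMeta` is a hypothetical helper, the regular expression assumes double-quoted attributes with `name` before `content`, and stripping the `og:` prefix is an assumption made for the example.

```javascript
// Hypothetical helper, not part of webthief: collect name/content pairs from meta tags.
// Assumes double-quoted attributes with `name` before `content`, as in the samples above.
function extractMeta(html, fields) {
    const pattern = /<meta\s+name="([^"]+)"\s+content="([^"]*)"\s*\/?>/gi;
    const found = new Map();
    for (const match of html.matchAll(pattern)) {
        // Assumption: drop an "og:" prefix so og:title and title share one key.
        const key = match[1].replace(/^og:/, "");
        if (!found.has(key)) found.set(key, match[2]);
    }
    // "*" returns everything found; otherwise keep only the requested fields.
    const wanted = fields.includes("*") ? [...found.keys()] : fields;
    return Object.fromEntries(
        wanted.filter((f) => found.has(f)).map((f) => [f, found.get(f)])
    );
}

const sample = `
<meta name="description" content="Website info api"/>
<meta name="language" content="en">
<meta name="og:title" content="webthief"/>
`;
console.log(extractMeta(sample, ["title", "description"]));
```

A real scraper would normally use an HTML parser rather than a regular expression, since attribute order and quoting vary between sites.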

Examples

const webthief = require("webthief");

To get the HTML of any web page:

/* Callback method */
webthief.getHtml("https://nepsho.github.io/example/meta_tags.html",(data)=>{
    console.log(data);
})

/* Promise method */
webthief.getHtml("https://nepsho.github.io/example/meta_tags.html").then(function(data) {
	console.log(data);
}).catch(function(error) {
	console.log(error);
});

/* async/await method */
async function demo(){
    var result = await webthief.getHtml("https://nepsho.github.io/example/meta_tags.html");
    console.log(result);
} 

/* Sample output 
    { 
        url : 'https://nepsho.github.io/example/meta_tags.html',
        status : 200,
        success : true,
        html : "<html></html>"
    }
*/

To get the metadata of any web page: a meta request requires an option object, which controls and specifies the desired output.

var option = {
    fields: ["logo","description","title"] /*fields you want*/
};

or

var option = {
    fields: ["*"] /*for all supported field*/
};
/* Callback method */
webthief.getMeta("https://nepsho.github.io/example/meta_tags.html",option,(data)=>{
    console.log(data);
})

/* Promise method */
webthief.getMeta("https://nepsho.github.io/example/meta_tags.html",option).then(function(data){
    console.log(data)
}).catch(function(error) {
	console.log(error);
});

/* async/await method */
async function demo(){
    var result = await webthief.getMeta("https://nepsho.github.io/example/meta_tags.html",option);
    console.log(result);
} 

/* Sample output 
    {
        success: true,
        response: {
            logo : "https://nepsho.github.io/lib/img/logo.png",
            title : "NepSho",
            description : "Promise and callback based website-info getter using metadata of websites..."
        }
    }
*/

To get images from webpage:

/* Callback method */
webthief.getSiteImages("https://nepsho.github.io/example/meta_tags.html",(data)=>{
    console.log(data);
})

/* Promise method */
webthief.getSiteImages("https://nepsho.github.io/example/meta_tags.html").then(function(data) {
	console.log(data);
}).catch(function(error) {
	console.log(error);
});

/* async/await method */
async function demo(){
    var result = await webthief.getSiteImages("https://nepsho.github.io/example/meta_tags.html");
    console.log(result);
} 

/* Sample output 
    {
        success: true,
        response: [ArrayOfImages]
    }
*/

Error callback data (returned if any error occurs):

//Error return object type
{
    success: false,
    error: "ErrorType",
    detail: "detail message of error"
}
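
Because both the sample outputs and the error object carry a `success` flag, a caller can branch on it in one place. The sketch below assumes the result shapes shown above; `unwrap` is a hypothetical helper and not part of webthief.

```javascript
// Hypothetical helper, not part of webthief.
// Returns the payload of a successful result, or throws an Error built from the error object.
function unwrap(result) {
    if (result && result.success === true) {
        // getHtml results carry `html`; getMeta and getSiteImages results carry `response`.
        return "html" in result ? result.html : result.response;
    }
    const type = (result && result.error) || "UnknownError";
    const detail = (result && result.detail) || "no detail provided";
    throw new Error(`${type}: ${detail}`);
}

// Objects copied from the sample outputs above.
console.log(unwrap({ success: true, html: "<html></html>" }));
try {
    unwrap({ success: false, error: "ErrorType", detail: "detail message of error" });
} catch (err) {
    console.log(err.message);
}
```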

If the option is empty, a default option is set automatically, containing logo, title and description. The core functions of this API are designed so that each can be used either as a promise or with a callback.
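
The default option and the callback-or-promise design could be implemented along the following lines. This is a sketch under assumptions, not webthief's source: `normalizeOption`, `dualApi` and `fakeGetMeta` are hypothetical names, "empty" is taken to mean a missing option or one without fields, and the stand-in worker returns canned data instead of making a request.

```javascript
const DEFAULT_FIELDS = ["logo", "title", "description"];

// Fall back to the default fields when no usable option is given.
function normalizeOption(option) {
    if (!option || !Array.isArray(option.fields) || option.fields.length === 0) {
        return { fields: DEFAULT_FIELDS.slice() };
    }
    return { fields: option.fields.slice() };
}

// Wrap a promise-returning worker so that it also accepts a trailing callback.
function dualApi(worker) {
    return function (url, option, callback) {
        // Allow fn(url, callback) with the option omitted.
        if (typeof option === "function") {
            callback = option;
            option = undefined;
        }
        const promise = worker(url, normalizeOption(option));
        if (typeof callback !== "function") return promise;
        // Callback style: results and errors are both delivered as the single argument.
        promise.then(callback, callback);
        return undefined;
    };
}

// Stand-in worker (assumption): echoes the requested fields instead of fetching a page.
const fakeGetMeta = dualApi(async (url, option) => ({
    success: true,
    response: { url, fields: option.fields },
}));

fakeGetMeta("https://example.com", (data) => console.log("callback:", data));
fakeGetMeta("https://example.com", { fields: ["title"] })
    .then((data) => console.log("promise:", data));
```

Returning the promise only when no callback is supplied keeps the two styles from both firing for the same call.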

CLI Usage

$ npm install webthief -g

Valid methods: [meta|getmeta], [html|gethtml], [images|getsiteimages] (these are used by the CLI)

$ webthief [-method-] [-input-] [-option-]

method:

  • Get HTML
    • html | gethtml
  • Get Meta
    • meta | getmeta
  • Get Images
    • images | getsiteimages

input: A valid URL.

option: An optional flag. Use -d to download the HTML file or the images.
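
How these three arguments might be interpreted can be sketched as follows. This is an illustration only, not webthief's CLI code: `parseCli` and the shape of the object it returns are hypothetical, while the aliases and the -d flag are the ones listed above.

```javascript
// Alias table taken from the method list above; values are the library function names.
const METHODS = new Map([
    ["html", "getHtml"], ["gethtml", "getHtml"],
    ["meta", "getMeta"], ["getmeta", "getMeta"],
    ["images", "getSiteImages"], ["getsiteimages", "getSiteImages"],
]);

// Hypothetical parser for: webthief [method] [input] [option]
function parseCli(args) {
    const [method, input, option] = args;
    const name = METHODS.get(method);
    if (!name) throw new Error(`Unknown method: ${method}`);
    // new URL() throws a TypeError when the input is not a valid absolute URL.
    const url = new URL(input).href;
    return { method: name, url, download: option === "-d" };
}

console.log(parseCli(["images", "https://nepsho.github.io/example/meta_tags.html", "-d"]));
```

In a real command-line script the array would typically come from `process.argv.slice(2)`.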

CLI Examples

$ webthief html https://nepsho.github.io/example/meta_tags.html
or, to also download the page:
$ webthief html https://nepsho.github.io/example/meta_tags.html -d
$ webthief meta https://nepsho.github.io/example/meta_tags.html
$ webthief images https://nepsho.github.io/example/meta_tags.html
or, to also download the images:
$ webthief images https://nepsho.github.io/example/meta_tags.html -d

License

MIT License

Author

@BCrazyDreamer
