Using -v should output the status code of the url #248

Closed
yuzhe-Mortal opened this issue Dec 22, 2022 · 3 comments
Labels
Status: Abandoned This issue is no longer important to the requestor and no one else has shown an interest in it. Type: Enhancement Most issues will probably ask for additions or changes. Type: Question A query or seeking clarification on parts of the spec. Probably doesn't need the attention of all.

Comments

@yuzhe-Mortal
Contributor

yuzhe-Mortal commented Dec 22, 2022

image

image

func (c *Crawler) getRequest(ctx context.Context, request navigation.Request, rootHostname string, depth int, httpclient *retryablehttp.Client) (navigation.Response, error) {
	response := navigation.Response{
		Depth:        request.Depth + 1,
		Options:      c.options,
		RootHostname: rootHostname,
	}
	ctx = context.WithValue(ctx, navigation.Depth{}, depth)
	httpReq, err := http.NewRequestWithContext(ctx, request.Method, request.URL, nil)
	if err != nil {
		return response, err
	}
	if request.Body != "" && request.Method != "GET" {
		httpReq.Body = io.NopCloser(strings.NewReader(request.Body))
	}
	req, err := retryablehttp.FromRequest(httpReq)
	if err != nil {
		return response, err
	}
	req.Header.Set("User-Agent", utils.WebUserAgent())

	for k, v := range request.Headers {
		req.Header.Set(k, v)
	}
	for k, v := range c.headers {
		req.Header.Set(k, v)
	}
	resp, err := httpclient.Do(req)
	if resp != nil {
		// Drain up to 8 KiB of any unread body before closing so the
		// underlying connection can be reused.
		defer func() {
			if resp.Body != nil && resp.StatusCode != http.StatusSwitchingProtocols {
				_, _ = io.CopyN(io.Discard, resp.Body, 8*1024)
			}
			_ = resp.Body.Close()
		}()
	}
	if err != nil {
		return response, err
	}
	if resp.StatusCode == http.StatusSwitchingProtocols {
		return response, nil
	}
	limitReader := io.LimitReader(resp.Body, int64(c.options.Options.BodyReadSize))
	data, err := io.ReadAll(limitReader)
	if err != nil {
		return response, err
	}

	response.Body = data
	// The raw *http.Response is kept, so resp.StatusCode stays available to callers.
	response.Resp = resp
	response.Reader, err = goquery.NewDocumentFromReader(bytes.NewReader(data))
	if err != nil {
		return response, errors.Wrap(err, "could not make document from reader")
	}
	return response, nil
}
@yuzhe-Mortal yuzhe-Mortal added the Type: Enhancement Most issues will probably ask for additions or changes. label Dec 22, 2022
@Mzack9999 Mzack9999 self-assigned this Jan 8, 2023
@Mzack9999
Member

@yuzhe-Mortal Most likely, every URL listed in the output has a status code of 200. Do you have a particular use case for filtering responses by status code or response body length?

@Mzack9999 Mzack9999 added the Type: Question A query or seeking clarification on parts of the spec. Probably doesn't need the attention of all. label Jan 8, 2023
@yuzhe-Mortal
Contributor Author

yuzhe-Mortal commented Jan 9, 2023

Some links may not return 200. They may return 500, 404, or 403 instead, because some links require permissions the crawler does not have and others do not exist.
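
A small sketch of that situation, using only the Go standard library: a local test server answers different paths with different status codes, and a helper reports what each link returns. The paths and the names `statusOf` and `newTestServer` are made up for this example and have nothing to do with katana's internals.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// statusOf sends a GET request and returns the response status code.
func statusOf(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain the body before closing
	return resp.StatusCode, nil
}

// newTestServer starts a local server whose (made-up) paths return
// 200, 403 and 500. Any unregistered path falls through to 404.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/admin", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusForbidden)
	})
	mux.HandleFunc("/broken", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusInternalServerError)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	for _, path := range []string{"/ok", "/admin", "/missing", "/broken"} {
		code, err := statusOf(srv.Client(), srv.URL+path)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("%d %s\n", code, path)
	}
}
```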

@Mzack9999
Member

After review, I doubt this can be supported at present, because the tool's output is extracted from previously received responses. The listed URLs are future outgoing requests that have not been sent yet, so no status code exists for them. This might make more sense after #174 is implemented, where the combination of requests and responses will determine unique web statuses.
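
To illustrate the point: links printed by a crawler are parsed out of a response that has already been received, so the only status code known at that moment belongs to the page that contained the link. The following is a deliberately simplified sketch under that assumption. It uses a regular expression rather than katana's actual parser, and `discovered` and `extractLinks` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefPattern is a simplification for the example. Real HTML parsing is
// more involved than matching double-quoted href attributes.
var hrefPattern = regexp.MustCompile(`href="([^"]+)"`)

// discovered describes a link found in a page. Only the status of the
// source page is known; the link itself has not been requested yet.
type discovered struct {
	URL          string
	Source       string
	SourceStatus int
}

// extractLinks collects href values from the body of an already received response.
func extractLinks(source string, sourceStatus int, body string) []discovered {
	var out []discovered
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		out = append(out, discovered{URL: m[1], Source: source, SourceStatus: sourceStatus})
	}
	return out
}

func main() {
	body := `<a href="/login">Login</a> <a href="/admin">Admin</a>`
	for _, d := range extractLinks("https://example.com/", 200, body) {
		fmt.Printf("%s (found on %s, which returned %d; own status unknown)\n",
			d.URL, d.Source, d.SourceStatus)
	}
}
```

Reporting a status for `/admin` itself would require sending a request to it first.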

@Mzack9999 Mzack9999 added Status: Abandoned This issue is no longer important to the requestor and no one else has shown an interest in it. and removed Status: Abandoned This issue is no longer important to the requestor and no one else has shown an interest in it. labels Feb 1, 2023

3 participants