
Keep external repositories' statistics (i.e. client, tools and modules) up to date #1030

Closed
itamarhaber wants to merge 1 commit

Conversation

itamarhaber
Member

@antirez please let me know your thoughts before I implement the changes to the UI in redis-io (and probably port the script to Ruby). Possible triggers: manual, every push, daily...


Using this script:

```python
import datetime
import requests
import json
import re
import os

GITHUB_TOKEN = os.environ['GITHUB_TOKEN']
REDIS_DOC_PATH = os.environ['REDIS_DOC_PATH']

headers = {'Authorization': 'Bearer {}'.format(GITHUB_TOKEN)}

def run_query(query, variables):
    request = requests.post('https://api.github.com/graphql', json={'query': query, 'variables': variables}, headers=headers)
    if request.status_code == 200:
        return request.json()
    else:
        raise Exception("GitHub GraphQL query failed to run - status code: {}. {}".format(request.status_code, query))

def get_repo(owner, name, days=6*30):
    query = """
query ($owner: String!, $name: String!, $since: GitTimestamp!) {
  repository(owner: $owner, name: $name) {
    nameWithOwner
    description
    isArchived
    createdAt
    forkCount
    watchers {
      totalCount
    }
    stargazers {
      totalCount
    }
    openIssues: issues(states: OPEN) {
      totalCount
    }
    closedIssues: issues(states: CLOSED) {
      totalCount
    }
    openPullRequests: pullRequests(states: OPEN) {
      totalCount
    }
    closedPullRequests: pullRequests(states: CLOSED) {
      totalCount
    }
    mergedPullRequests: pullRequests(states: MERGED) {
      totalCount
    }
    licenseInfo {
      name
    }
    languages(first: 1, orderBy: {field: SIZE, direction: DESC}) {
      nodes {
        name
      }
    }
    periodCommits: defaultBranchRef {
      target {
        ... on Commit {
          history(since: $since) {
            totalCount
          }
        }
      }
    }
    lastCommit: defaultBranchRef {
      target {
        ... on Commit {
          history(first: 1) {
            edges {
              node {
                author {
                  date
                }
              }
            }
          }
        }
      }
    }
  }
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}
    """
    since = datetime.datetime.today() - datetime.timedelta(days=days)
    # Pass the variables as a dict; requests serializes the whole payload to
    # JSON, which avoids hand-building (and escaping) a JSON string
    variables = {
        'owner': owner,
        'name': name,
        'since': since.isoformat()
    }

    reply = run_query(query, variables)
    repository = reply['data']['repository']
    if repository is None:
        return None
    return {
        'isArchived': repository['isArchived'],
        'createdAt': repository['createdAt'],
        'periodCommits': repository['periodCommits']['target']['history']['totalCount'],
        'committedAt': repository['lastCommit']['target']['history']['edges'][0]['node']['author']['date'],
        'fetchedAt': datetime.datetime.utcnow().replace(microsecond=0).isoformat(),
        'forks': repository['forkCount'],
        'watchers': repository['watchers']['totalCount'],
        'stargazers': repository['stargazers']['totalCount'],
        'openPullRequests': repository['openPullRequests']['totalCount'],
        'openIssues': repository['openIssues']['totalCount'],
    }

if __name__ == '__main__':
    jsons = [
      'clients.json',
      'tools.json',
      'modules.json'
    ]

    for jfile in jsons:
        # Skip files already generated by a previous run
        if os.path.exists(jfile):
            continue

        with open('{}/{}'.format(REDIS_DOC_PATH, jfile)) as f:
            elems = json.load(f)

        ghpat = re.compile(r'^(https://github\.com/|git@github\.com:)(.*)/(.*)$')
        for el in elems:
            if 'repository' in el:
                repository = str(el['repository'])
                mat = ghpat.search(repository)
                if mat:
                    stats = get_repo(mat.group(2), mat.group(3))
                    if stats is not None:
                        el['stats'] = stats
                        if 'active' in el:
                            el['active'] = stats['periodCommits'] > 0
                        if 'stars' in el:
                            el['stars'] = stats['stargazers']
                        print('touched {} {}'.format(jfile, repository))

        with open(jfile, 'w') as f:
            json.dump(elems, f, indent=4)

```
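
The owner/name extraction in the script can be exercised in isolation. This snippet repeats the pattern from the script above and prints the captured owner and repository name for both HTTPS and SSH remotes:

```python
import re

# Same pattern as in the script: matches HTTPS and SSH GitHub remotes;
# group 2 is the owner, group 3 the repository name
ghpat = re.compile(r'^(https://github\.com/|git@github\.com:)(.*)/(.*)$')

for url in ('https://github.com/redis/redis-doc',
            'git@github.com:antirez/redis'):
    m = ghpat.search(url)
    print(m.group(2), m.group(3))
```

Note that a repository URL ending in `.git` would leave that suffix in the name group, since the pattern does not strip it.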

Signed-off-by: Itamar Haber <[email protected]>
@f0rmiga
Contributor

f0rmiga commented Oct 17, 2019

@itamarhaber A suggestion: run it daily using GitHub Actions. With GitHub Actions, we can open a PR with the changes, or push directly to master if we trust ourselves with the generated code. :)
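
A scheduled workflow along these lines could drive the daily run; the file name, script name, and checkout step here are illustrative assumptions, not part of this PR:

```yaml
# .github/workflows/update-stats.yml (hypothetical)
name: Update repository statistics
on:
  schedule:
    - cron: '0 3 * * *'   # daily at 03:00 UTC
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumed script name; opening a PR or pushing the result
      # would be an additional step
      - run: python update_stats.py
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REDIS_DOC_PATH: .
```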

@itamarhaber
Member Author

That's a great idea for the daily update - thanks!

@zuiderkwast
Contributor

With GitHub Actions, we can open a PR with the changes, or push directly to master if we trust ourselves with the generated code.

The commit history would grow very fast in this way.

The stars don't need to be committed to the repo. Why not let the website itself fetch the data and cache it in Redis with a TTL of 24 hours or so? Alternatively, let them be updated when redis.io is deployed.
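
The cache-aside pattern described here could look roughly like this sketch. `TTLCache`, `get_stars`, and `fetch` are illustrative names, not code from this PR; with redis-py, `cache` could simply be a `redis.Redis()` client, since it exposes the same `setex`/`get` methods:

```python
import time

class TTLCache:
    """In-memory stand-in for the Redis SETEX/GET calls (hypothetical sketch)."""
    def __init__(self):
        self._store = {}

    def setex(self, key, ttl, value):
        # Store the value together with its expiry time, like Redis SETEX
        self._store[key] = (time.time() + ttl, value)

    def get(self, key):
        # Return None for missing or expired keys, like Redis GET
        item = self._store.get(key)
        if item is None or item[0] < time.time():
            return None
        return item[1]

def get_stars(cache, repo, fetch, ttl=24 * 3600):
    """Cache-aside lookup: serve from cache, hit the API at most once per
    TTL window. `fetch` stands for the GitHub API call."""
    key = 'stars:{}'.format(repo)
    cached = cache.get(key)
    if cached is not None:
        return int(cached)
    stars = fetch(repo)
    cache.setex(key, ttl, stars)
    return stars
```

With this shape, the website never commits statistics to the repo; stale values simply expire and are re-fetched on the next request.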

@huangzhw
Collaborator

With GitHub Actions, we can open a PR with the changes, or push directly to master if we trust ourselves with the generated code.

The commit history would grow very fast in this way.

The stars don't need to be committed to the repo. Why not let the website itself fetch the data and cache it in Redis with a TTL of 24 hours or so? Alternatively, let them be updated when redis.io is deployed.

Agree, fetching data is better.

@zuiderkwast
Contributor

Agree, fetching data is better.

Great @huangz1990. I think the GitHub API fetching and caching could go into app.rb in the redis-io repo. Here's a related PR: redis/redis-io#230.

@madolson added the to-be-closed label ("should probably be dismissed sooner or later") on Jun 21, 2021
@madolson
Contributor

100% agree with @zuiderkwast, let's focus on the other PR.

@nermiller
Contributor

@zuiderkwast, ok to close this PR?

@zuiderkwast
Contributor

Yeah, this is built into the new website, right? I don't know enough about the new website since I wasn't much involved.

@madolson
Contributor

@itamarhaber When are we open sourcing the new website?
