
Commit 5fdd152

fix reviews
Linh Vo committed
1 parent 1a4ffdd, commit 5fdd152

2 files changed: +13 -13 lines changed

README.md

+11 -11
@@ -11,8 +11,8 @@ The sample dataset contains stargazer and language data for Github projects whic
 * `language.txt`: Language name to languageID mapping. The line number corresponds to the languageID.
 * `language.csv`: languageID, projectID
 * `stargazer.csv`: stargazerID, projectID, timestamp(starred)
-* `input_definition.json`: input definition in json format
-* `json_input.json`: data in json format with repo_id, language_id, stargazer_id defined in input_defintion.txt
+* `input_definition.json`: input definition in JSON format
+* `json_input.json`: data in JSON format with repo_id, language_id, stargazer_id defined in input_defintion.json
 ## Usage
 
 1. Pilosa server should be running: [Starting Pilosa](https://www.pilosa.com/docs/getting-started/#starting-pilosa)
@@ -21,7 +21,7 @@ The sample dataset contains stargazer and language data for Github projects whic
 
 ## Sample Projects
 
-* [Python](https://github.com/pilosa/getting-started/python)
+* [Python](https://github.com/pilosa/getting-started/tree/master/python)
 
 ## Generating the Dataset
 Using a Github token is strongly recommended for avoiding throttling. If you don't already have a token for the [GitHub API](https://developer.github.com/v3/), see [Creating a personal access token for the command line](https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/).
@@ -42,18 +42,18 @@ Below are the steps to run commands:
 
 #### To generate csv files:
 
-`fetch.py` script searches Github for a given keyword, and creates the dataset explained in *The Dataset* section.
+The `fetch.py` script searches Github for a given keyword and creates the dataset explained in *The Dataset* section.
 
-Run the script: `python fetch.py KEYWORD`
-`KEYWORD` is repository's name for searching
+Run the script: `python fetch.py KEYWORD`.
+`KEYWORD` is the search term to use for searching repository names.
 
 #### To generate input_defintion and json_input files:
 
-`build_definition.py` script build JSON format input-defintion, using `language.txt` to map language string to id
+The `build_definition.py` script builds a JSON formatted input-defintion. It uses `language.txt` to map a language string to an id.
 
-Run the script: `python build_definition.py <input_definition.json>`
-If `input_definitioon.json` isn't set, print out JSON input-definition
+Run the script: `python build_definition.py <input_definition.json>`.
+If `input_definition.json` isn't set, print out JSON input-definition.
 
-`build_json_input.json` script searches Github for a given keyword then create json data set to import data that adheres to that `input_definition.json`
-Run the script: `python build_json_input.py <reository_name>`
+`build_json_input.py` script searches Github for a given keyword and creates a JSON data set that adheres to `input_definition.json`.
+Run the script: `python build_json_input.py <reository_name>`.
 
fetch.py

+2 -2
@@ -1,8 +1,8 @@
-
 import os
 import sys
 from github import Github
 
+
 TIME_FORMAT = "%Y-%m-%dT%H:%S"
 
 
@@ -79,7 +79,7 @@ def _add_or_get(cls, external_id, store):
 
 def main():
     if len(sys.argv) != 2:
-        print("Usage: python fetch.py keyword", file=sys.stderr)
+        print("Usage: python fetch.py keyword")
         sys.exit(1)
 
     if os.path.exists("token"):
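
The second hunk moves the usage message from stderr to stdout. On Python 2, `print(..., file=sys.stderr)` is a syntax error unless `from __future__ import print_function` is present. The sketch below shows one way the same argument check could keep the message on stderr on both Python 2 and 3. `check_args` and its `stream` parameter are hypothetical names used for illustration; they are not part of `fetch.py`.

```python
import sys


def check_args(argv, stream=None):
    """Return the search keyword, or write a usage line and return None.

    Hypothetical helper: mirrors the `len(sys.argv) != 2` check in main().
    """
    # Default to stderr so the usage text stays out of redirected stdout.
    if stream is None:
        stream = sys.stderr
    if len(argv) != 2:
        # stream.write() behaves the same on Python 2 and Python 3.
        stream.write("Usage: python fetch.py keyword\n")
        return None
    return argv[1]
```

In `main()`, this could be called as `keyword = check_args(sys.argv)`, followed by `sys.exit(1)` when the result is `None`.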
