
Commit 029c06e

VPerrollaz and ericmjl authored on May 15, 2020
[DOC] Pep8 the notebooks and markdown files in examples #415 (#655)
* Pep8 convert_currency.md
* Pep8 filter_date.md
* Pep8 make_currency_column_numeric.md
* Pep8 round_to_fraction.md
* Pep8 row_to_names.md
* Pep8 then.md
* Pep8 cells of examples/notebooks/anime.ipynb
* Pep8 examples/notebooks/bad_values.ipynb
* Pep8 the cells of examples/notebooks/bird_call.ipynb
* Pep8 cells of examples/notebooks/board_games.ipynb
* Pep8 the cells of examples/notebooks/dirty_data.ipynb
* Pep8 the cells of examples/notebooks/french_trains.ipynb
* Pep8 the cells of examples/notebooks/groupby_agg.ipynb
* Pep8 the cells of examples/notebooks/inflating_converting_currency.ipynb
* Pep8 the cells of examples/notebooks/medium_franchise.ipynb
* Pep8 the cells of examples/notebooks/normalize.ipynb
* Pep8 the cells and code in markdown of pyjanitor_intro.ipynb
* Pep8 Row_to_Names.ipynb
* Pep8 the cells of sort_naturally.ipynb
* Pep8 the cells of teacher_pupil.ipynb
* Final pep8 commit
  - Added author
  - Added change to CHANGELOG
  - Went over all notebooks with the dev version

Co-authored-by: Eric Ma <[email protected]>
1 parent 56c6f48 commit 029c06e

23 files changed: +2379 −751 lines changed
 

‎AUTHORS.rst

+2-1
@@ -87,4 +87,5 @@ Contributors
 - `@DollofCuty <https://github.com/DollofCuty>`_ | `contributions <https://github.com/ericmjl/pyjanitor/pulls?utf8=%E2%9C%93&q=is%3Aclosed+mentions%3ADollofCuty>`_
 - `@bdice <https://github.com/bdice>`_ | `contributions <https://github.com/ericmjl/pyjanitor/pulls?utf8=%E2%9C%93&q=is%3Aclosed+mentions%3Abdice>`_
 - `@evan-anderson <https://github.com/evan-anderson>`_ | `contributions <https://github.com/ericmjl/pyjanitor/pulls?utf8=%E2%9C%93&q=is%3Aclosed+mentions%3evan-anderson>`_
-`@smu095 <https://github.com/smu095>`_ | `contributions <https://github.com/ericmjl/pyjanitor/issues?q=is%3Aclosed+mentions%3smu095>`_
+- `@smu095 <https://github.com/smu095>`_ | `contributions <https://github.com/ericmjl/pyjanitor/issues?q=is%3Aclosed+mentions%3smu095>`_
+- `@VPerrollaz <https://github.com/VPerrollaz>`_ | `contributions <https://github.com/ericmjl/pyjanitor/issues?q=is%3Aclosed+mentions%3AVPerrollaz>`_

‎CHANGELOG.rst

+1
@@ -1,5 +1,6 @@
 new version (on deck)
 =====================
+- [DOC] pep8 all examples. @VPerrollaz
 - [TST]: Add docstrings to tests @hectormz
 - [INF]: Add ``debug-statements``, ``requirements-txt-fixer``, and ``interrogate`` to ``pre-commit``. @hectormz
 - [ENH]: Upgraded transform_column to use df.assign underneath the hood,

‎examples/convert_currency.md

+6-1
@@ -42,7 +42,12 @@ data_dict = {
 ```python
 example_dataframe = pd.DataFrame(data_dict)
 
-example_dataframe.convert_currency('a', from_currency='USD', to_currency='EUR', historical_date=date(2018,1,1))
+example_dataframe.convert_currency(
+    'a',
+    from_currency='USD',
+    to_currency='EUR',
+    historical_date=date(2018, 1, 1)
+)
 ```
 
 ### Output
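For readers skimming this hunk: a stdlib-only sketch of what a historical currency conversion amounts to, not pyjanitor's implementation (which queries a live exchange-rate API). The `RATES` table and the 0.83 figure are made up for illustration.

```python
from datetime import date

# Hypothetical fixed rate table standing in for the live exchange-rate
# lookup that convert_currency performs; 0.83 is an illustrative number.
RATES = {("USD", "EUR", date(2018, 1, 1)): 0.83}

def convert_currency(values, from_currency, to_currency, historical_date):
    """Multiply each amount by the looked-up historical rate."""
    rate = RATES[(from_currency, to_currency, historical_date)]
    return [round(v * rate, 2) for v in values]

converted = convert_currency([100.0, 200.0], "USD", "EUR", date(2018, 1, 1))
```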

‎examples/filter_date.md

+2-6
@@ -66,8 +66,7 @@ date_list = [
     [26, "03/11/20"],
     [27, "03/12/20"]]
 
-example_dataframe = pd.DataFrame(date_list, columns = ['AMOUNT', 'DATE'])
-
+example_dataframe = pd.DataFrame(date_list, columns=['AMOUNT', 'DATE'])
 ```
 
 ## Example 1: Filter dataframe between two dates
@@ -102,7 +101,6 @@ example_dataframe.filter_date('DATE', end=end, format=format)
 ## Example 3: Filtering by year
 
 ```python
-
 years = [2019]
 
 example_dataframe.filter_date('DATE', years=years)
@@ -125,7 +123,6 @@ example_dataframe.filter_date('DATE', years=years)
 ## Example 4: Filtering by year and month
 
 ```python
-
 years = [2020]
 months = [3]
 
@@ -144,9 +141,8 @@ example_dataframe.filter_date('DATE', years=years, months=months)
 ## Example 5: Filtering by year and day
 
 ```python
-
 years = [2020]
-days = range(10,12)
+days = range(10, 12)
 
 example_dataframe.filter_date('DATE', years=years, days=days)
 ```
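A stdlib-only sketch of the year filter these examples exercise, assuming the same `"%m/%d/%y"` date format as the markdown file; this is illustrative, not pyjanitor's implementation.

```python
from datetime import datetime

rows = [[25, "03/10/20"], [26, "03/11/20"], [27, "03/12/20"], [1, "01/01/19"]]

def filter_by_years(rows, years, fmt="%m/%d/%y"):
    # Keep only rows whose DATE field (second element) falls in the given years.
    return [r for r in rows if datetime.strptime(r[1], fmt).year in years]

kept = filter_by_years(rows, years=[2019])
```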

‎examples/make_currency_column_numeric.md

+10-3
@@ -100,8 +100,11 @@ df.make_currency_column_numeric("a", fill_all_non_numeric=35)
 ## Example 4: Coerce numeric values in column to float, replace a string value with a specific value, and replace remaining string values with a specific value
 
 ```python
-df.make_currency_column_numeric("a", cast_non_numeric=cast_non_numeric, fill_all_non_numeric=35)
-
+df.make_currency_column_numeric(
+    "a",
+    cast_non_numeric=cast_non_numeric,
+    fill_all_non_numeric=35
+)
 ```
 
 ## Output
@@ -138,7 +141,11 @@ df.make_currency_column_numeric("a", remove_non_numeric=True)
 ## Example 6: Coerce numeric values in column to float, replace a string value with a specific value, and remove remaining string values
 
 ```python
-df.make_currency_column_numeric("a", cast_non_numeric=cast_non_numeric, remove_non_numeric=True)
+df.make_currency_column_numeric(
+    "a",
+    cast_non_numeric=cast_non_numeric,
+    remove_non_numeric=True
+)
 ```
 
 ## Output
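The semantics of `cast_non_numeric` vs `fill_all_non_numeric` can be sketched in plain Python (illustrative data and function name; not pyjanitor's implementation):

```python
def make_numeric(values, cast_non_numeric=None, fill_all_non_numeric=None):
    cast_non_numeric = cast_non_numeric or {}
    out = []
    for v in values:
        try:
            out.append(float(v))  # numeric strings coerce to float
        except ValueError:
            # a string with an explicit mapping gets that value; anything
            # else falls back to the catch-all fill value
            out.append(cast_non_numeric.get(v, fill_all_non_numeric))
    return out

result = make_numeric(
    ["1.23", "foo", "bar"],
    cast_non_numeric={"foo": 42},
    fill_all_non_numeric=35,
)
```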

‎examples/notebooks/Row_to_Names.ipynb

+17-19
@@ -23,7 +23,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -34,7 +34,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -48,7 +48,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {},
 "outputs": [
 {
@@ -121,13 +121,13 @@
 "4 bag 305 25"
 ]
 },
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"temp = pd.read_csv(StringIO(data), header= None)\n",
+"temp = pd.read_csv(StringIO(data), header=None)\n",
 "temp"
 ]
 },
@@ -145,7 +145,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -211,15 +211,15 @@
 "4 bag 305 25"
 ]
 },
-"execution_count": 4,
+"execution_count": 5,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"temp.columns = temp.iloc[2,:]\n",
+"temp.columns = temp.iloc[2, :]\n",
 "temp.columns = temp.columns.str.strip()\n",
-"temp = temp.drop(2,axis=0)\n",
+"temp = temp.drop(2, axis=0)\n",
 "temp = temp.rename_axis(None, axis='columns')\n",
 "temp"
 ]
@@ -233,7 +233,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 6,
 "metadata": {},
 "outputs": [
 {
@@ -299,28 +299,26 @@
 "4 bag 305 25"
 ]
 },
-"execution_count": 5,
+"execution_count": 6,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"df = (pd\n",
-" .read_csv(StringIO(data),\n",
-" header= None)\n",
-" .row_to_names(row_number=2,\n",
-" remove_row=True)\n",
-" )\n",
+"df = (\n",
+" pd.read_csv(StringIO(data), header=None)\n",
+" .row_to_names(row_number=2, remove_row=True)\n",
+")\n",
 "\n",
 "df"
 ]
 }
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "PyJanitor development",
+"display_name": "Python 3",
 "language": "python",
-"name": "pyjanitor-dev"
+"name": "python3"
 },
 "language_info": {
 "codemirror_mode": {
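What `row_to_names(row_number=2, remove_row=True)` does can be sketched with the stdlib `csv` module (illustrative data; not pyjanitor's implementation, which works on a DataFrame):

```python
import csv
import io

# Two junk rows, then the real header on row 2, mirroring the notebook.
raw = "x,y,z\nx,y,z\n item, MRP ,number_sold\nshoe,220,100\nbag,305,25\n"
rows = list(csv.reader(io.StringIO(raw)))

row_number = 2
header = [c.strip() for c in rows[row_number]]    # promote row 2 to column names
body = rows[:row_number] + rows[row_number + 1:]  # remove_row=True drops that row only
```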

‎examples/notebooks/anime.ipynb

+49-48
Large diffs are not rendered by default.

‎examples/notebooks/bad_values.ipynb

+28-29
@@ -15,7 +15,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -33,7 +33,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 4,
 "metadata": {},
 "outputs": [
 {
@@ -231,13 +231,15 @@
 "[5 rows x 24 columns]"
 ]
 },
-"execution_count": 2,
+"execution_count": 4,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"wind = pd.read_csv(\"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-06/us_wind.csv\")\n",
+"wind = pd.read_csv(\n",
+" \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-11-06/us_wind.csv\"\n",
+")\n",
 "wind.head()"
 ]
 },
@@ -250,7 +252,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -259,7 +261,7 @@
 "-1069.986537767466"
 ]
 },
-"execution_count": 3,
+"execution_count": 5,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -278,7 +280,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 6,
 "metadata": {
 "scrolled": true
 },
@@ -289,7 +291,7 @@
 "['usgs_pr_id', 'p_year', 'p_cap', 't_cap', 't_hh', 't_rd', 't_rsa', 't_ttlh']"
 ]
 },
-"execution_count": 4,
+"execution_count": 6,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -315,7 +317,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 7,
 "metadata": {},
 "outputs": [
 {
@@ -513,7 +515,7 @@
 "[5 rows x 24 columns]"
 ]
 },
-"execution_count": 5,
+"execution_count": 7,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -523,8 +525,8 @@
 "wind2 = (\n",
 " wind\n",
 " .find_replace(\n",
-" usgs_pr_id=mapping, \n",
-" p_tnum=mapping, \n",
+" usgs_pr_id=mapping,\n",
+" p_tnum=mapping,\n",
 " p_cap=mapping,\n",
 " t_cap=mapping,\n",
 " t_hh=mapping,\n",
@@ -545,7 +547,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 8,
 "metadata": {},
 "outputs": [
 {
@@ -554,7 +556,7 @@
 "77.31203064391"
 ]
 },
-"execution_count": 6,
+"execution_count": 8,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -596,21 +598,18 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "outputs": [],
 "source": [
 "# Note that update_where mutates the original dataframe\n",
-"(wind\n",
-" .update_where(\n",
-" (wind['p_year'] < 1887) | (wind['p_year'] > 2018),\n",
-" 'p_year', np.nan)\n",
-" .update_where(\n",
-" (wind['t_hh'] <= 0) | (wind['t_hh'] >= 1000),\n",
-" 't_hh', np.nan)\n",
-" .update_where(\n",
-" (wind['xlong'] < -161.76) | (wind['xlong'] > -68.01),\n",
-" 'xlong', np.nan));"
+"(\n",
+" wind.update_where(\n",
+" (wind['p_year'] < 1887) | (wind['p_year'] > 2018), 'p_year', np.nan\n",
+" )\n",
+" .update_where((wind['t_hh'] <= 0) | (wind['t_hh'] >= 1000), 't_hh', np.nan)\n",
+" .update_where((wind['xlong'] < -161.76) | (wind['xlong'] > -68.01), 'xlong', np.nan)\n",
+");"
 ]
 },
 {
@@ -622,7 +621,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 10,
 "metadata": {},
 "outputs": [
 {
@@ -631,7 +630,7 @@
 "77.31203064391"
 ]
 },
-"execution_count": 8,
+"execution_count": 10,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -643,9 +642,9 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "pyjanitor-dev",
+"display_name": "Python 3",
 "language": "python",
-"name": "pyjanitor-dev"
+"name": "python3"
 },
 "language_info": {
 "codemirror_mode": {
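The `update_where` chain in this notebook nulls out-of-range sensor values in place. A stdlib-only sketch of that pattern over records (illustrative data; not pyjanitor's DataFrame implementation):

```python
def update_where(records, condition, column, new_value):
    """Set `column` to `new_value` on every record matching `condition` (mutates in place)."""
    for rec in records:
        if condition(rec):
            rec[column] = new_value
    return records

# Years outside 1887-2018 are treated as bad values, as in the notebook.
turbines = [{"p_year": 1850}, {"p_year": 1995}, {"p_year": 2030}]
update_where(
    turbines,
    lambda r: r["p_year"] < 1887 or r["p_year"] > 2018,
    "p_year",
    None,
)
```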

‎examples/notebooks/bird_call.ipynb

+47-34
@@ -34,7 +34,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -53,13 +53,20 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 4,
 "metadata": {},
 "outputs": [],
 "source": [
-"raw_birds = pd.read_csv(\"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Chicago_collision_data.csv\")\n",
-"raw_call = pd.read_csv(\"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/bird_call.csv\", sep=\" \")\n",
-"raw_light = pd.read_csv(\"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Light_levels_dryad.csv\")"
+"raw_birds = pd.read_csv(\n",
+" \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Chicago_collision_data.csv\"\n",
+")\n",
+"raw_call = pd.read_csv(\n",
+" \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/bird_call.csv\", \n",
+" sep=\" \"\n",
+")\n",
+"raw_light = pd.read_csv(\n",
+" \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Light_levels_dryad.csv\"\n",
+")"
 ]
 },
 {
@@ -73,7 +80,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -152,7 +159,7 @@
 "4 Ammodramus nelsoni 1986-09-10 MP"
 ]
 },
-"execution_count": 3,
+"execution_count": 5,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -163,7 +170,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 6,
 "metadata": {},
 "outputs": [
 {
@@ -260,7 +267,7 @@
 "4 Seiurus aurocapilla Parulidae 4580 Yes Forest Lower"
 ]
 },
-"execution_count": 4,
+"execution_count": 6,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -271,7 +278,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 7,
 "metadata": {},
 "outputs": [
 {
@@ -338,7 +345,7 @@
 "4 2000-04-02 17"
 ]
 },
-"execution_count": 5,
+"execution_count": 7,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -363,7 +370,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 8,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -372,7 +379,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "outputs": [
 {
@@ -439,7 +446,7 @@
 "4 2000-04-02 17"
 ]
 },
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -457,20 +464,20 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 10,
 "metadata": {},
 "outputs": [],
 "source": [
 "clean_call = (\n",
 " raw_call\n",
-" .rename_column(\"Species\", \"Genus\") # rename 'Species' column to 'Genus'\n",
-" .rename_column(\"Family\", \"Species\") # rename 'Family' columnto 'Species'\n",
+" .rename_column(\"Species\", \"Genus\")  # rename 'Species' column to 'Genus'\n",
+" .rename_column(\"Family\", \"Species\")  # rename 'Family' columnto 'Species'\n",
 ")"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 11,
 "metadata": {},
 "outputs": [
 {
@@ -567,7 +574,7 @@
 "4 Seiurus aurocapilla Parulidae 4580 Yes Forest Lower"
 ]
 },
-"execution_count": 9,
+"execution_count": 11,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -585,24 +592,35 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": 12,
 "metadata": {},
 "outputs": [],
 "source": [
 "clean_birds = (\n",
 " raw_birds\n",
-" .merge(clean_call, how='left') # merge the raw_birds dataframe with clean_raw dataframe\n",
-" .select_columns([\"Genus\", \"Species\", \"Date\", \"Locality\", \"Collisions\", \"Call\", \"Habitat\", \"Stratum\"]) # include list of cols\n",
-" .clean_names() \n",
-" .rename_column(\"collisions\", \"family\") # rename 'collisions' column to 'family' in merged dataframe\n",
+" .merge(clean_call, how='left')  # merge the raw_birds dataframe with clean_raw dataframe\n",
+" .select_columns(\n",
+" [\n",
+" \"Genus\",\n",
+" \"Species\",\n",
+" \"Date\",\n",
+" \"Locality\",\n",
+" \"Collisions\",\n",
+" \"Call\",\n",
+" \"Habitat\",\n",
+" \"Stratum\"\n",
+" ]\n",
+" )  # include list of cols\n",
+" .clean_names()\n",
+" .rename_column(\"collisions\", \"family\")  # rename 'collisions' column to 'family' in merged dataframe\n",
 " .rename_column(\"call\", \"flight_call\")\n",
-" .dropna() # drop all rows which contain a NaN\n",
+" .dropna()  # drop all rows which contain a NaN\n",
 ")"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 11,
+"execution_count": 13,
 "metadata": {},
 "outputs": [
 {
@@ -712,19 +730,14 @@
 "93 Yes Open Lower\\t "
 ]
 },
-"execution_count": 11,
+"execution_count": 13,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
 "clean_birds.head()"
 ]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": []
 }
 ],
 "metadata": {
@@ -743,9 +756,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.3"
+"version": "3.7.6"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
+"nbformat_minor": 4
 }
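The select/rename/dropna chain in this notebook can be sketched over plain dicts (illustrative records and helper names; pyjanitor's verbs operate on DataFrames):

```python
records = [
    {"Genus": "Seiurus", "Species": "aurocapilla", "Collisions": 4580, "Call": "Yes"},
    {"Genus": "Ammodramus", "Species": "nelsoni", "Collisions": None, "Call": "No"},
]

def select_columns(records, columns):
    # Keep only the listed keys, in order.
    return [{k: r[k] for k in columns} for r in records]

def rename_column(records, old, new):
    return [{(new if k == old else k): v for k, v in r.items()} for r in records]

def dropna(records):
    # Drop any record containing a missing value.
    return [r for r in records if all(v is not None for v in r.values())]

clean = dropna(
    rename_column(select_columns(records, ["Genus", "Collisions"]), "Collisions", "family")
)
```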

‎examples/notebooks/board_games.ipynb

+37-27
Large diffs are not rendered by default.

‎examples/notebooks/dirty_data.ipynb

+21-18
@@ -27,7 +27,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -44,7 +44,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 4,
 "metadata": {},
 "outputs": [
 {
@@ -315,7 +315,7 @@
 "12 NaN "
 ]
 },
-"execution_count": 2,
+"execution_count": 4,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -336,7 +336,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -416,7 +416,7 @@
 "1 Yes NaN Physical ed Theater NaN "
 ]
 },
-"execution_count": 3,
+"execution_count": 5,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -442,7 +442,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 6,
 "metadata": {},
 "outputs": [
 {
@@ -544,7 +544,7 @@
 "8 No PENDING NaN "
 ]
 },
-"execution_count": 4,
+"execution_count": 6,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -572,7 +572,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 7,
 "metadata": {},
 "outputs": [
 {
@@ -688,7 +688,7 @@
 "4 1.00 Yes PENDING NaN "
 ]
 },
-"execution_count": 5,
+"execution_count": 7,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -723,7 +723,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 8,
 "metadata": {},
 "outputs": [
 {
@@ -832,7 +832,7 @@
 "11 Vocal music English"
 ]
 },
-"execution_count": 6,
+"execution_count": 8,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -850,7 +850,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "outputs": [
 {
@@ -1051,7 +1051,7 @@
 "11 0.80 No Vocal music "
 ]
 },
-"execution_count": 7,
+"execution_count": 9,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1063,7 +1063,10 @@
 " .remove_empty()\n",
 " .rename_column(\"%_allocated\", \"percent_allocated\")\n",
 " .rename_column(\"full_time_\", \"full_time\")\n",
-" .coalesce(column_names=['certification', 'certification_1'], new_column_name='certification')\n",
+" .coalesce(\n",
+" column_names=['certification', 'certification_1'],\n",
+" new_column_name='certification'\n",
+" )\n",
 ")\n",
 "\n",
 "df_clean"
@@ -1088,7 +1091,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 10,
 "metadata": {},
 "outputs": [
 {
@@ -1289,7 +1292,7 @@
 "11 0.80 No Vocal music "
 ]
 },
-"execution_count": 8,
+"execution_count": 10,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -1331,9 +1334,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.3"
+"version": "3.7.6"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
+"nbformat_minor": 4
 }
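The `coalesce` call reformatted in this hunk merges two sparse columns into one by taking the first non-missing value per row. A stdlib-only sketch of that behavior (illustrative data; not pyjanitor's implementation):

```python
def coalesce(*columns):
    """First non-None value per row across parallel columns."""
    return [next((v for v in row if v is not None), None) for row in zip(*columns)]

certification = ["Physics", None, None]
certification_1 = [None, "Theater", None]
merged = coalesce(certification, certification_1)
```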

‎examples/notebooks/french_trains.ipynb

+60-42
Large diffs are not rendered by default.

‎examples/notebooks/groupby_agg.ipynb

+19-15
@@ -32,18 +32,18 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
-"#load modules\n",
+"# load modules\n",
 "import pandas as pd\n",
 "from janitor import groupby_agg"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 3,
 "metadata": {},
 "outputs": [
 {
@@ -116,15 +116,17 @@
 "4 bag 305 25"
 ]
 },
-"execution_count": 2,
+"execution_count": 3,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"data = {'item':['shoe','shoe','bag','shoe','bag'],\n",
-" 'MRP':[220,450,320,200,305],\n",
-" 'number_sold':[100,40,56,38,25]}\n",
+"data = {\n",
+" 'item': ['shoe', 'shoe', 'bag', 'shoe', 'bag'],\n",
+" 'MRP': [220, 450, 320, 200, 305],\n",
+" 'number_sold': [100, 40, 56, 38, 25]\n",
+"}\n",
 "\n",
 "df = pd.DataFrame(data)\n",
 "\n",
@@ -140,7 +142,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {},
 "outputs": [
 {
@@ -219,26 +221,28 @@
 "4 bag 305 25 312.5"
 ]
 },
-"execution_count": 3,
+"execution_count": 4,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"df = df.groupby_agg(by='item',\n",
-" agg='mean',\n",
-" agg_column_name='MRP',\n",
-" new_column_name='Avg_MRP')\n",
+"df = df.groupby_agg(\n",
+" by='item',\n",
+" agg='mean',\n",
+" agg_column_name='MRP',\n",
+" new_column_name='Avg_MRP'\n",
+")\n",
 "\n",
 "df"
 ]
 }
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "PyJanitor development",
+"display_name": "Python 3",
 "language": "python",
-"name": "pyjanitor-dev"
+"name": "python3"
 },
 "language_info": {
 "codemirror_mode": {
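`groupby_agg` computes a group-wise aggregate and broadcasts it back as a new column the same length as the frame. A stdlib-only sketch using the notebook's data (not pyjanitor's implementation):

```python
from collections import defaultdict

items = ["shoe", "shoe", "bag", "shoe", "bag"]
mrp = [220, 450, 320, 200, 305]

# Group-wise mean of MRP per item.
totals, counts = defaultdict(float), defaultdict(int)
for key, value in zip(items, mrp):
    totals[key] += value
    counts[key] += 1
means = {k: totals[k] / counts[k] for k in totals}

# Broadcast back to row length, like groupby_agg's Avg_MRP column.
avg_mrp = [means[k] for k in items]
```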

‎examples/notebooks/inflating_converting_currency.ipynb

+50-41
Large diffs are not rendered by default.

‎examples/notebooks/medium_franchise.ipynb

+224-234
Large diffs are not rendered by default.

‎examples/notebooks/normalize.ipynb

+1,575-60
Large diffs are not rendered by default.

‎examples/notebooks/pyjanitor_intro.ipynb

+99-98
Large diffs are not rendered by default.

‎examples/notebooks/sort_naturally.ipynb

+11-18
@@ -9,13 +9,13 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
 "import pandas_flavor as pf\n",
 "import pandas as pd\n",
-"import janitor\n"
+"import janitor"
 ]
 },
 {
@@ -27,7 +27,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 3,
 "metadata": {},
 "outputs": [
 {
@@ -100,15 +100,15 @@
 "5 B12 7"
 ]
 },
-"execution_count": 7,
+"execution_count": 3,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
 "data = {\n",
 " \"Well\": [\"A21\", \"A3\", \"A21\", \"B2\", \"B51\", \"B12\"],\n",
-" \"Value\":[ 1, 2, 13, 3, 4, 7],\n",
+" \"Value\": [1, 2, 13, 3, 4, 7],\n",
 "}\n",
 "df = pd.DataFrame(data)\n",
 "df"
@@ -127,7 +127,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 4,
 "metadata": {},
 "outputs": [
 {
@@ -200,7 +200,7 @@
 "4 B51 4"
 ]
 },
-"execution_count": 8,
+"execution_count": 4,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -218,7 +218,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -291,7 +291,7 @@
 "4 B51 4"
 ]
 },
-"execution_count": 9,
+"execution_count": 5,
 "metadata": {},
 "output_type": "execute_result"
 }
@@ -306,20 +306,13 @@
 "source": [
 "Now we're in sorting bliss! :)"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "pyjanitor-dev",
+"display_name": "Python 3",
 "language": "python",
-"name": "pyjanitor-dev"
+"name": "python3"
 },
 "language_info": {
 "codemirror_mode": {
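The "natural" ordering this notebook demonstrates (A3 before A21, B2 before B12) can be sketched with a stdlib sort key; this is a common natural-sort idiom, not pyjanitor's implementation:

```python
import re

def natural_key(s):
    # Split "B12" into ["B", 12, ""] so numeric runs compare as integers,
    # not character by character.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", s)]

wells = ["A21", "A3", "A21", "B2", "B51", "B12"]
ordered = sorted(wells, key=natural_key)
```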

‎examples/notebooks/teacher_pupil.ipynb

+28-18
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@
2323
},
2424
{
2525
"cell_type": "code",
26-
"execution_count": 1,
26+
"execution_count": 2,
2727
"metadata": {
2828
"pycharm": {
2929
"is_executing": false,
@@ -145,7 +145,7 @@
145145
"4 Democratic Republic of the Congo 2012 2012 34.74758 NaN NaN "
146146
]
147147
},
148-
"execution_count": 1,
148+
"execution_count": 2,
149149
"metadata": {},
150150
"output_type": "execute_result"
151151
}
@@ -155,8 +155,10 @@
155155
"import pandas as pd\n",
156156
"import pandas_flavor as pf\n",
157157
"\n",
158-
"dirty_csv = \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/EDULIT_DS_06052019101747206.csv\"\n",
159-
"dirty_df= pd.read_csv(dirty_csv)\n",
158+
"dirty_csv = (\n",
159+
" \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/EDULIT_DS_06052019101747206.csv\"\n",
160+
")\n",
161+
"dirty_df = pd.read_csv(dirty_csv)\n",
160162
"dirty_df.head()\n"
161163
]
162164
},
@@ -186,7 +188,7 @@
186188
},
187189
{
188190
"cell_type": "code",
189-
"execution_count": 2,
191+
"execution_count": 3,
190192
"metadata": {},
191193
"outputs": [
192194
{
@@ -201,7 +203,7 @@
201203
" 'Pupil-teacher ratio in upper secondary education (headcount basis)'}"
202204
]
203205
},
204-
"execution_count": 2,
206+
"execution_count": 3,
205207
"metadata": {},
206208
"output_type": "execute_result"
207209
}
@@ -229,7 +231,7 @@
229231
},
230232
{
231233
"cell_type": "code",
232-
"execution_count": 3,
234+
"execution_count": 4,
233235
"metadata": {
234236
"pycharm": {
235237
"is_executing": false,
@@ -264,14 +266,20 @@
264266
"def drop_duplicated_column(df, column_name: str, column_order: int=0):\n",
265267
" \"\"\"Remove duplicated columns and retain only a column given its order.\n",
266268
" Order 0 is to remove the first column, Order 1 is to remove the second column, and etc\"\"\"\n",
267-
" \n",
269+
"\n",
268270
" cols = list(df.columns)\n",
269-
" col_indexes = [col_idx for col_idx, col_name in enumerate(cols) if col_name == column_name]\n",
270-
" \n",
271+
" col_indexes = [\n",
272+
" col_idx for col_idx,\n",
273+
" col_name in enumerate(cols) if col_name == column_name\n",
274+
" ]\n",
275+
"\n",
271276
" # given that a column could be duplicated, user could opt based on its order\n",
272277
" removed_col_idx = col_indexes[column_order]\n",
273278
" # get the column indexes without column that is being removed\n",
274-
" filtered_cols = [c_i for c_i, c_v in enumerate(cols) if c_i != removed_col_idx]\n",
279+
" filtered_cols = [\n",
280+
" c_i for c_i,\n",
281+
" c_v in enumerate(cols) if c_i != removed_col_idx\n",
282+
" ]\n",
275283
" return df.iloc[:, filtered_cols]\n",
276284
"\n"
277285
]
@@ -285,7 +293,7 @@
285293
},
286294
{
287295
"cell_type": "code",
288-
"execution_count": 4,
296+
"execution_count": 5,
289297
"metadata": {},
290298
"outputs": [
291299
{
@@ -395,7 +403,7 @@
395403
"4 Democratic Republic of the Congo 2012 34.74758 NaN NaN "
396404
]
397405
},
398-
"execution_count": 4,
406+
"execution_count": 5,
399407
"metadata": {},
400408
"output_type": "execute_result"
401409
}
@@ -412,7 +420,7 @@
412420
" .str_trim(\"country\")\n",
413421
" .str_title(\"indicator\")\n",
414422
" # remove `time` column (which is duplicated). The second `time` is being removed\n",
415-
" .drop_duplicated_column(\"time\", 1) \n",
423+
" .drop_duplicated_column(\"time\", 1)\n",
416424
" # renaming columns\n",
417425
" .rename_column(\"location\", \"country_code\")\n",
418426
" .rename_column(\"value\", \"student_ratio\")\n",
@@ -424,7 +432,7 @@
424432
},
425433
{
426434
"cell_type": "code",
427-
"execution_count": 5,
435+
"execution_count": 6,
428436
"metadata": {
429437
"pycharm": {
430438
"is_executing": false,
@@ -435,7 +443,9 @@
435443
"outputs": [],
436444
"source": [
437445
"# ensure that the output from janitor is similar with the clean r's janitor\n",
438-
"r_clean_csv = \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv\"\n",
446+
"r_clean_csv = (\n",
447+
" \"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv\"\n",
448+
")\n",
439449
"r_clean_df = pd.read_csv(r_clean_csv)\n",
440450
"\n",
441451
"pd.testing.assert_frame_equal(r_clean_df, py_clean_df)"
@@ -458,7 +468,7 @@
458468
"name": "python",
459469
"nbconvert_exporter": "python",
460470
"pygments_lexer": "ipython3",
461-
"version": "3.7.3"
471+
"version": "3.7.6"
462472
},
463473
"stem_cell": {
464474
"cell_type": "raw",
@@ -471,5 +481,5 @@
471481
}
472482
},
473483
"nbformat": 4,
474-
"nbformat_minor": 2
484+
"nbformat_minor": 4
475485
}
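The helper reflowed at the top of this diff drops one occurrence of a duplicated column by filtering positional indexes. As a standalone sketch of that technique (pyjanitor ships its own `drop_duplicated_column`; the function below is a hypothetical reimplementation for illustration, with made-up data):

```python
import pandas as pd


def drop_duplicated_column(df, column_name, nth_index=0):
    """Drop the nth occurrence of a duplicated column name (sketch)."""
    # positional indexes of every column whose name matches
    col_indexes = [i for i, c in enumerate(df.columns) if c == column_name]
    removed_col_idx = col_indexes[nth_index]
    # keep every positional index except the one being removed
    filtered_cols = [i for i in range(len(df.columns)) if i != removed_col_idx]
    return df.iloc[:, filtered_cols]


df = pd.DataFrame([[1, 2, 3]], columns=["time", "value", "time"])
cleaned = drop_duplicated_column(df, "time", 1)  # drop the second `time`
```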

‎examples/notebooks/transform_column.ipynb

+88 -31
@@ -2,7 +2,7 @@
 "cells": [
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -32,7 +32,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -50,7 +50,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 4,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -59,7 +59,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 5,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -75,9 +75,17 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 6,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"1.86 s ± 102 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "# We are using a lambda function that operates on each element,\n",
@@ -94,9 +102,17 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 7,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"15.7 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "df.transform_column(\"0\", lambda s: np.abs(s), elementwise=False)"
@@ -126,7 +142,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 8,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -149,7 +165,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 9,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -159,39 +175,71 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 10,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"408 ms ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "stringdf.assign(data=first_five(stringdf[\"data\"]))"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 11,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"293 ms ± 4.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "first_five(stringdf[\"data\"])"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 12,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"295 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "stringdf[\"data\"].str[0:5]"
 ]
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 13,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"301 ms ± 7.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "stringdf[\"data\"].apply(lambda x: x[0:5])"
@@ -208,9 +256,17 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 14,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"409 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "stringdf.transform_column(\"data\", lambda x: x[0:5])"
@@ -225,27 +281,28 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 15,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"403 ms ± 7.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
+]
+}
+],
 "source": [
 "%%timeit\n",
 "stringdf.transform_column(\"data\", first_five, elementwise=False)"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "pyjanitor-dev",
+"display_name": "Python 3",
 "language": "python",
-"name": "pyjanitor-dev"
+"name": "python3"
 },
 "language_info": {
 "codemirror_mode": {

‎examples/round_to_fraction.md

-2
@@ -50,7 +50,6 @@ example_dataframe.round_to_fraction('a', 2)
 ## Example 2: Rounding the first column to nearest third
 
 ```python
-
 example_dataframe2 = pd.DataFrame(data_dict)
 
 example_dataframe2.limit_column_characters('a', 3)
@@ -72,7 +71,6 @@ example_dataframe2.limit_column_characters('a', 3)
 ## Example 3: Rounding the first column to the nearest third and rounding each value to the 10,000th place
 
 ```python
-
 example_dataframe2 = pd.DataFrame(data_dict)
 
 example_dataframe2.limit_column_characters('a', 3, 4)
‎examples/row_to_names.md

-3
@@ -25,7 +25,6 @@ Remove the rows from the index above `row_number`.
 ## Setup
 
 ```python
-
 import pandas as pd
 import janitor
 
@@ -63,7 +62,6 @@ example_dataframe.row_to_names(0)
 ## Example2: Move first row to column names and remove row
 
 ```python
-
 example_dataframe = pd.DataFrame(data_dict)
 
 example_dataframe.row_to_names(0, remove_row=True)
@@ -84,7 +82,6 @@ example_dataframe.row_to_names(0, remove_row=True)
 ## Example3: Move first row to column names, remove row, and remove rows above selected row
 
 ```python
-
 example_dataframe = pd.DataFrame(data_dict)
 
 example_dataframe.row_to_names(2, remove_row=True, remove_rows_above=True)
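The `row_to_names` calls in this file promote a data row to the column header. The same shuffle in plain pandas, as a sketch with made-up data mirroring `row_to_names(2, remove_row=True, remove_rows_above=True)`:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y", "col1", 1], "b": ["u", "v", "col2", 2]})

row_number = 2
df.columns = df.iloc[row_number]  # promote row 2 to the header
df = df.iloc[row_number + 1:]     # remove_row + remove_rows_above drop
                                  # the promoted row and everything above it
```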

‎examples/then.md

+5 -3
@@ -80,9 +80,11 @@ example_dataframe2.then(remove_rows_3_and_4)
 
 ```python
 example_dataframe = pd.DataFrame(data_dict)
-example_dataframe = (example_dataframe
- .then(remove_first_two_letters_from_col_names)
- .then(remove_rows_3_and_4))
+example_dataframe = (
+ example_dataframe
+ .then(remove_first_two_letters_from_col_names)
+ .then(remove_rows_3_and_4)
+)
 ```
 
 ### Output

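The chain reflowed above uses pyjanitor's `then(fn)`, which simply calls `fn(df)` at that point in the chain. pandas' built-in `DataFrame.pipe` does the same thing; here is a self-contained sketch with made-up helper functions:

```python
import pandas as pd


def drop_first_row(df):
    return df.iloc[1:]


def double_values(df):
    return df * 2


df = pd.DataFrame({"a": [1, 2, 3]})

# Each .pipe(fn) passes the current DataFrame to fn and chains the result,
# exactly the pattern the reflowed .then() chain expresses.
result = (
    df
    .pipe(drop_first_row)
    .pipe(double_values)
)
```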