Commit 73fb664

Committed Mar 23, 2025
Removed unnecessary scraping scripts and updated package files
1 parent 00feaf8 commit 73fb664

11 files changed, +3 -1569 lines changed

pipeline/src/glossapi.egg-info/PKG-INFO

+3 -4
```diff
@@ -1,4 +1,4 @@
-Metadata-Version: 2.2
+Metadata-Version: 2.4
 Name: glossapi
 Version: 0.0.7
 Summary: A library for processing academic texts in Greek and other languages
@@ -11,7 +11,6 @@ Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 3 - Alpha
 Requires-Python: >=3.8
 Description-Content-Type: text/markdown
-License-File: LICENSE.md
 Requires-Dist: pandas
 Requires-Dist: numpy
 Requires-Dist: scikit-learn
@@ -64,8 +63,8 @@ corpus = Corpus(
 }
 )
 
-# Step 1: Filter documents (quality control)
-corpus.filter()
+# Step 1: Extract documents (quality control)
+corpus.extract()
 
 # Step 2: Extract sections from filtered documents
 corpus.section()
```
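PKG-INFO is the package's core metadata file, which uses an email-header (RFC 822 style) format, so the fields this commit changes can be read back with Python's standard-library `email.parser`. The following is a minimal sketch, assuming only the header lines visible in the diff above (the elided classifier lines are left out). The helper name `read_core_metadata` is illustrative and not part of glossapi.

```python
from email.parser import Parser

# Header lines of the updated PKG-INFO that are visible in the diff above.
# The remaining classifier lines are elided on the commit page and omitted here.
PKG_INFO = """\
Metadata-Version: 2.4
Name: glossapi
Version: 0.0.7
Summary: A library for processing academic texts in Greek and other languages
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scikit-learn
"""


def read_core_metadata(text: str) -> dict:
    """Hypothetical helper: parse core-metadata headers into a plain dict."""
    msg = Parser().parsestr(text, headersonly=True)
    return {
        "metadata_version": msg["Metadata-Version"],
        "name": msg["Name"],
        "version": msg["Version"],
        # Multi-use fields such as Requires-Dist need get_all(), not [].
        "requires_dist": msg.get_all("Requires-Dist", []),
        "license_files": msg.get_all("License-File", []),
    }


meta = read_core_metadata(PKG_INFO)
print(meta["metadata_version"])  # 2.4
print(meta["requires_dist"])     # ['pandas', 'numpy', 'scikit-learn']
print(meta["license_files"])     # []
```

Because this commit drops the `License-File: LICENSE.md` line, `get_all` falls back to the empty list for that field.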

pipeline/src/glossapi.egg-info/SOURCES.txt

-1
```diff
@@ -1,4 +1,3 @@
-LICENSE.md
 MANIFEST.in
 README.md
 pyproject.toml
```

scraping/download_and_extract_scripts/copy_paste_pdf.py

-67
This file was deleted.

scraping/download_and_extract_scripts/create_metadata_json.py

-36
This file was deleted.

scraping/download_and_extract_scripts/downloader10.py

-264
This file was deleted.

scraping/download_and_extract_scripts/extractor4.py

-99
This file was deleted.

scraping/download_and_extract_scripts/site_specific_scrapers/gr-land-studies.py

-79
This file was deleted.

scraping/download_and_extract_scripts/site_specific_scrapers/gr-lang-anthology.py

-89
This file was deleted.

scraping/download_and_extract_scripts/site_specific_scrapers/gr-lang-perilhpsh.py

-65
This file was deleted.

scraping/download_and_extract_scripts/site_specific_scrapers/lawspot2.py

-59
This file was deleted.

scraping/paragraphs_final.csv

-806
This file was deleted.
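The totals in the commit header can be checked against the per-file counts: the nine deleted files plus the two edited metadata files account for all 1569 deleted lines, and `scraping/paragraphs_final.csv` alone contributes 806 of them. A small sketch of that arithmetic, using only the counts reported on this page:

```python
# Per-file line counts as reported on the commit page.
DELETIONS = {
    "pipeline/src/glossapi.egg-info/PKG-INFO": 4,
    "pipeline/src/glossapi.egg-info/SOURCES.txt": 1,
    "scraping/download_and_extract_scripts/copy_paste_pdf.py": 67,
    "scraping/download_and_extract_scripts/create_metadata_json.py": 36,
    "scraping/download_and_extract_scripts/downloader10.py": 264,
    "scraping/download_and_extract_scripts/extractor4.py": 99,
    "scraping/download_and_extract_scripts/site_specific_scrapers/gr-land-studies.py": 79,
    "scraping/download_and_extract_scripts/site_specific_scrapers/gr-lang-anthology.py": 89,
    "scraping/download_and_extract_scripts/site_specific_scrapers/gr-lang-perilhpsh.py": 65,
    "scraping/download_and_extract_scripts/site_specific_scrapers/lawspot2.py": 59,
    "scraping/paragraphs_final.csv": 806,
}
ADDITIONS = {"pipeline/src/glossapi.egg-info/PKG-INFO": 3}

files_changed = len(DELETIONS)
total_deleted = sum(DELETIONS.values())
total_added = sum(ADDITIONS.values())

# Should match the header: "11 files changed, +3 -1569 lines changed"
print(files_changed, total_added, total_deleted)  # 11 3 1569
```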
