This repository was archived by the owner on Feb 8, 2024. It is now read-only.

Commit 91c8a8c

Cipher authored and committed
Reinstated stemmer
1 parent 0dacb12 commit 91c8a8c

4 files changed: 53 additions and 76 deletions


Janex.egg-info/PKG-INFO

Lines changed: 44 additions & 75 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
Metadata-Version: 2.1
22
Name: Janex
3-
Version: 0.0.79
3+
Version: 0.0.79b0
44
Home-page: https://github.com/Cipher58/Janex-Python
55
Download-URL: https://github.com/Cipher58/Janex-Python.git
66
Author: Cipher58
@@ -12,117 +12,86 @@ Description-Content-Type: text/markdown
1212
License-File: LICENSE
1313

1414
# Janex
15-
A free open-source framework which can be used to build Machine Learning tools, LLMs, and Natural Language Processing scripts with full simplicity.
1615

17-
<h2> What is the purpose of Janex? </h2>
16+
Janex-Python is a library which can be used to create Natural Language Processing-based programs and other forms of Artificial Intelligence.
1817

19-
Monopolistic companies are confining their Artificial Intelligence research to themselves, and capitalising on it. Even companies that swore to open source their software (OpenAI) went back on their morals, with the releases of GPT-3+, as did other powerful businesses.
18+
It is part of the Janex ecosystem, which developers can use in their own projects for free, and it is licensed under the Free Lily License.
2019

21-
Released under the **new** Free Lily License 1.0, Janex will improve and become a useful tool for users to conduct their own research in the potential of Artifical Intelligence.
20+
As of update 0.0.80, the internal structure of this library has been substantially reworked.
2221

23-
If you want to use a more heavyweight but more accurate version of Janex, I would recommend using Janex: PyTorch Edition, which uses Neural Network techniques from PyTorch and NLTK to enhance the prediction accuracy.
24-
```
25-
https://pypi.org/project/JanexPT/0.0.1/#files
26-
```
27-
28-
<h3> How to use </h3>
29-
30-
<h5> Adding to your project </h5>
22+
### How to use
3123

32-
Firstly, you'll need to install the library using the Python pip package manager.
24+
First, install Janex using pip.
3325

34-
```
26+
```bash
3527
python3 -m pip install Janex
36-
37-
```
38-
Secondly, you need to import the library into your Python script.
39-
4028
```
29+
Next, import it into your code:
30+
```python
4131
from Janex import *
4232
```
4333

44-
<h4>Using Janex in your code</h4>
45-
46-
<h5>Create an instance</h5>
34+
### Intent classifier
4735

48-
Before anything else, you need to create an instance of the IntentMatcher class. (If you do not have one made already, the program will automatically download a pre-written file created by @SoapDoesCode - big thanks to her for their intents file!)
36+
To use the pre-built intent classifier included with the package, you need to create an instance of it and then set the intents, vectors and dimensions.
4937

50-
```
51-
intents_file_path = "./intents.json"
38+
```python
5239

53-
thesaurus_file_path = "./thesaurus.json"
40+
from Janex.intentclassifier import *
5441

55-
matcher = IntentMatcher(intents_file_path, thesaurus_file_path)
56-
```
42+
Classifier = IntentClassifier()
5743

58-
Optional: If you would like to update your thesaurus to your most recent pre-written file, then you can add this code to check for new versions and to download them. Be careful though, this function removes your thesaurus file, which means any unsaved data which doesn't exist on the pre-written file will be erased. (But could possibly be restored in your bin directory)
44+
Classifier.set_intentsfp("intents.json")
45+
Classifier.set_vectorsfp("vectors.json")
46+
Classifier.set_dimensions(300)
5947

60-
```
61-
matcher.update_thesaurus()
48+
Classifier.train_vectors()
6249
```
6350

64-
<h5>Tokenizing:</h5>
51+
You can then determine which class a given input belongs to using the Classifier.classify() function.
6552

66-
To utilise the tokenizer feature, here is an example of how it can be used.
53+
```python
54+
Input = input("You: ")
6755

68-
```
69-
input_string = "Hello! What is your name?"
56+
classification = Classifier.classify(Input)
7057

71-
words = matcher.Tokenize(input_string)
58+
response = random.choice(classification["responses"])
7259

73-
print(words)
60+
print(response)
7461
```
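
The README snippet above calls `random.choice`, so `random` has to be in scope; this diff does not show whether the wildcard import provides it. The following is a minimal sketch of the response-selection step only. It uses a hand-written stand-in dictionary instead of a real `Classifier.classify()` result: the `"responses"` key comes from the README, while the `"tag"` field and the `pick_response` helper are assumptions made for illustration.

```python
import random

# Hypothetical stand-in for the value returned by Classifier.classify().
# Only the "responses" key appears in the README; "tag" is an assumption.
classification = {
    "tag": "greeting",
    "responses": ["Hello!", "Hi there.", "Good to see you."],
}


def pick_response(classification, fallback="Sorry, I didn't understand that."):
    """Return a random response, or a fallback if none are available."""
    responses = classification.get("responses") or []
    return random.choice(responses) if responses else fallback


print(pick_response(classification))
```

Guarding against an empty or missing response list avoids the `IndexError` that `random.choice` raises on an empty sequence.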
7562

76-
<h5>Intent classifying:</h5>
63+
### Data experimentation with vectors
7764

78-
To compare the input with the patterns from your intents.json storage file, you have to declare the intents file path.
65+
If you would like to tokenize, stem or otherwise preprocess data, the Janex library comes with some pre-made tools.
7966

80-
```
81-
intent_class, similarity = matcher.pattern_compare(input_string)
82-
83-
print(intent_class)
84-
```
67+
To tokenize:
68+
```python
69+
from Janex.word_manipulation import *
8570

86-
<h5>Response similarity:</h5>
71+
string = "Hello. My name is Brendon."
8772

88-
Sometimes a list of responses in a class can become varied in terms of context, and so in order to get the best possible response, we can use the 'responsecompare' function to compare the input string with your list of responses.
73+
tokens = tokenize(string)
8974

75+
print(tokens)
9076
```
91-
BestResponse = matcher.response_compare(input_string, intent_class)
92-
93-
print(BestResponse)
94-
```
95-
96-
<h5>Text Generation:</h5>
77+
To vectorize:
78+
```python
79+
from Janex.vectortoolkit import *
9780

98-
In experimental phase but included in 0.0.15 and above, the 'ResponseGenerator' function can absorb the response chosen by your response comparer from your intents.json file, and then modify it, replacing words with synonyms, to give it a more unscripted response.
81+
input_string = "Hello, my name is Sheila."
9982

100-
For this to be used, if you haven't got a thesaurus.json file already, the IntentMatcher will automatically download the pre-written example directly from Github and into your chatbot folder.
83+
vectors = string_vectorize(input_string)
10184

102-
After doing so, you may include the feature in your code like this.
85+
vectors = reshape_array_dimensions(vectors, 300) # To reshape the vector array
10386

104-
```
105-
generated_response = matcher.ResponseGenerator(BestResponse)
106-
107-
print(generated_response)
108-
```
87+
secondstring = "Hello, my name is Robert."
10988

110-
Warning: This feature is still work-in-progress, and will only be as effective per the size of your thesaurus file, so don't expect it to be fully stable until I have fully completed it. :)
89+
second_vectors = string_vectorize(secondstring)
11190

112-
<h3> Contributors </h3>
91+
second_vectors = reshape_array_dimensions(second_vectors, 300)
11392

114-
Many thanks to these Github developers for their contributions! :)
93+
similarity = calculate_cosine_similarity(vectors, second_vectors)
11594

116-
@Ethan-Barr
117-
@SoapDoesCode
95+
print(similarity)
11896

119-
<h3> Functionality </h3>
120-
121-
<h4>Version 0.0.17</h4>
122-
123-
- Word tokenizer ✓
124-
- Intent classifier ✓
125-
- Word Stemmer ✓
126-
- Support for Darwin, Linux (GNU) and Windows ✓
127-
- Custom Response Generator (development stage) ✓
128-
- Automatic intents & thesaurus builders ✓
97+
```
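
The vectorize example reshapes both arrays to the same dimension before comparing them, which matters because cosine similarity is only defined for vectors of equal length. The implementation of `calculate_cosine_similarity` is not part of this diff, so the sketch below is a generic, standard-library version of the measure rather than Janex's own code; the function name `cosine_similarity` and the zero-vector convention are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions, and negative values indicate opposing directions.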

Janex/word_manipulation.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,3 +16,11 @@ def tokenize(input_string):
1616
words = processed_string.split(" ")
1717

1818
return words
19+
20+
def stem(input_word):
21+
suffixes = ["ing", "ly", "ed", "es", "'s", "er", "est", "y", "ily", "able", "ful", "ness", "less", "ment", "ive", "ize", "ous"]
22+
for suffix in suffixes:
23+
if input_word.endswith(suffix):
24+
input_word = input_word[:-len(suffix)]
25+
break
26+
return input_word
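
The page extraction has dropped the indentation from the added `stem` function. The sketch below restores it in what appears to be the intended form (the suffix list is moved to a module-level constant purely for reuse) and shows how it behaves. Because the first matching suffix wins and `"ly"` and `"y"` come before `"ily"` in the list, the `"ily"` entry is never reached. `stem_longest` is a hypothetical variant, not part of the commit, that tries longer suffixes first.

```python
# Suffix list copied from the diff; moved to module level for reuse here.
SUFFIXES = ["ing", "ly", "ed", "es", "'s", "er", "est", "y", "ily",
            "able", "ful", "ness", "less", "ment", "ive", "ize", "ous"]


def stem(input_word):
    """The function from the diff, with indentation restored."""
    for suffix in SUFFIXES:
        if input_word.endswith(suffix):
            input_word = input_word[:-len(suffix)]
            break  # only the first matching suffix is stripped
    return input_word


def stem_longest(input_word):
    """Hypothetical variant (not in the commit): try longer suffixes first."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if input_word.endswith(suffix):
            return input_word[:-len(suffix)]
    return input_word


for word in ["running", "fastest", "happily", "cat"]:
    print(word, stem(word), stem_longest(word))
```

With the list order from the commit, `stem("happily")` strips `"ly"` and returns `"happi"`, whereas the longest-first variant strips `"ily"` and returns `"happ"`.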

dist/Janex-0.0.79b0.tar.gz

5.15 KB
Binary file not shown.

setup.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
name="Janex",
99

1010
# version of the module
11-
version="0.0.79",
11+
version="0.0.79b",
1212

1313
# Name of Author
1414
author="Cipher58",
