Reinstated stemmer

Cipher · Cipher · commit 91c8a8c048ce · 2023-08-31T13:15:43.000+01:00
diff --git a/Janex.egg-info/PKG-INFO b/Janex.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: Janex
-Version: 0.0.79
+Version: 0.0.79b0
 Home-page: https://github.com/Cipher58/Janex-Python
 Download-URL: https://github.com/Cipher58/Janex-Python.git
 Author: Cipher58
@@ -12,117 +12,86 @@ Description-Content-Type: text/markdown
 License-File: LICENSE
 
 # Janex
-A free open-source framework which can be used to build Machine Learning tools, LLMs, and Natural Language Processing scripts with full simplicity.
 
-<h2> What is the purpose of Janex? </h2>
+Janex-Python is a library which can be used to create Natural Language Processing-based programs and other forms of Artificial Intelligence.
 
-Monopolistic companies are confining their Artificial Intelligence research to themselves, and capitalising on it. Even companies that swore to open source their software (OpenAI) went back on their morals, with the releases of GPT-3+, as did other powerful businesses.
+It is part of the Janex ecosystem, which is designed for developers to use in their own projects for free and is licensed under the Free Lily License.
 
-Released under the **new** Free Lily License 1.0, Janex will improve and become a useful tool for users to conduct their own research in the potential of Artifical Intelligence.
+As of update 0.0.80, the internal structure of this library has been substantially reworked.
 
-If you want to use a more heavyweight but more accurate version of Janex, I would recommend using Janex: PyTorch Edition, which uses Neural Network techniques from PyTorch and NLTK to enhance the prediction accuracy.
-```
-https://pypi.org/project/JanexPT/0.0.1/#files
-```
-
-<h3> How to use </h3>
-
-<h5> Adding to your project </h5>
+### How to use
 
-Firstly, you'll need to install the library using the Python pip package manager.
+First, install Janex using pip.
 
-```
+```bash
 python3 -m pip install Janex
-
-```
-Secondly, you need to import the library into your Python script.
-
 ```
+Next, import it into your code:
+```python
 from Janex import *
 ```
 
-<h4>Using Janex in your code</h4>
-
-<h5>Create an instance</h5>
+### Intent classifier
 
-Before anything else, you need to create an instance of the IntentMatcher class. (If you do not have one made already, the program will automatically download a pre-written file created by @SoapDoesCode - big thanks to her for their intents file!)
+To use the pre-built intent classifier included with the package, create an instance of it, then set the intents file, the vectors file and the number of dimensions.
 
-```
-intents_file_path = "./intents.json"
+```python
 
-thesaurus_file_path = "./thesaurus.json"
+from Janex.intentclassifier import *
 
-matcher = IntentMatcher(intents_file_path, thesaurus_file_path)
-```
+Classifier = IntentClassifier()
 
-Optional: If you would like to update your thesaurus to your most recent pre-written file, then you can add this code to check for new versions and to download them. Be careful though, this function removes your thesaurus file, which means any unsaved data which doesn't exist on the pre-written file will be erased. (But could possibly be restored in your bin directory)
+Classifier.set_intentsfp("intents.json")
+Classifier.set_vectorsfp("vectors.json")
+Classifier.set_dimensions(300)
 
-```
-matcher.update_thesaurus()
+Classifier.train_vectors()
 ```
 
-<h5>Tokenizing:</h5>
+You can then determine which class a given input belongs to using the Classifier.classify() function.
 
-To utilise the tokenizer feature, here is an example of how it can be used.
+```python
+Input = input("You: ")
 
-```
-input_string = "Hello! What is your name?"
+classification = Classifier.classify(Input)
 
-words = matcher.Tokenize(input_string)
+response = random.choice(classification["responses"])
 
-print(words)
+print(response)
 ```
 
-<h5>Intent classifying:</h5>
+### Data experimentation with vectors
 
-To compare the input with the patterns from your intents.json storage file, you have to declare the intents file path.
+If you would like to tokenize, stem or otherwise preprocess data, the Janex library comes with some pre-made tools.
 
-```
-intent_class, similarity = matcher.pattern_compare(input_string)
-
-print(intent_class)
-```
+To tokenize:
+```python
+from Janex.word_manipulation import *
 
-<h5>Response similarity:</h5>
+string = "Hello. My name is Brendon."
 
-Sometimes a list of responses in a class can become varied in terms of context, and so in order to get the best possible response, we can use the 'responsecompare' function to compare the input string with your list of responses.
+tokens = tokenize(string)
 
+print(tokens)
 ```
-BestResponse = matcher.response_compare(input_string, intent_class)
-
-print(BestResponse)
-```
-
-<h5>Text Generation:</h5>
+To vectorize:
+```python
+from Janex.vectortoolkit import *
 
-In experimental phase but included in 0.0.15 and above, the 'ResponseGenerator' function can absorb the response chosen by your response comparer from your intents.json file, and then modify it, replacing words with synonyms, to give it a more unscripted response.
+input_string = "Hello, my name is Sheila."
 
-For this to be used, if you haven't got a thesaurus.json file already, the IntentMatcher will automatically download the pre-written example directly from Github and into your chatbot folder.
+vectors = string_vectorize(input_string)
 
-After doing so, you may include the feature in your code like this.
+vectors = reshape_array_dimensions(vectors, 300) # To reshape the vector array
 
-```
-generated_response = matcher.ResponseGenerator(BestResponse)
-
-print(generated_response)
-```
+secondstring = "Hello, my name is Robert."
 
-Warning: This feature is still work-in-progress, and will only be as effective per the size of your thesaurus file, so don't expect it to be fully stable until I have fully completed it. :)
+second_vectors = string_vectorize(secondstring)
 
-<h3> Contributors </h3>
+second_vectors = reshape_array_dimensions(second_vectors, 300)
 
-Many thanks to these Github developers for their contributions! :)
+similarity = calculate_cosine_similarity(vectors, second_vectors)
 
-@Ethan-Barr
-@SoapDoesCode
+print(similarity)
 
-<h3> Functionality </h3>
-
-<h4>Version 0.0.17</h4>
-
-- Word tokenizer ✓
-- Intent classifier ✓
-- Word Stemmer ✓
-- Support for Darwin, Linux (GNU) and Windows ✓
-- Custom Response Generator (development stage) ✓
-- Automatic intents & thesaurus builders ✓
+```
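For reference, the score printed by the README's vectorize example is a cosine similarity. The sketch below is a plain-Python illustration of that measure, not Janex's own `calculate_cosine_similarity`. The function name `cosine_similarity_sketch` and the zero-vector handling are assumptions made for illustration.

```python
import math


def cosine_similarity_sketch(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; it is not part of Janex.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumption: treat a zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0

    return dot / (norm_a * norm_b)


# Orthogonal vectors share no direction.
print(cosine_similarity_sketch([1, 0], [0, 1]))  # 0.0

# Vectors pointing the same way score close to 1.0, whatever their length.
print(cosine_similarity_sketch([1, 2, 3], [2, 4, 6]))
```

The measure only makes sense for vectors of equal length, which is presumably why the README example reshapes both arrays to the same number of dimensions before comparing them.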
diff --git a/Janex/word_manipulation.py b/Janex/word_manipulation.py
@@ -16,3 +16,11 @@ def tokenize(input_string):
     words = processed_string.split(" ")
 
     return words
+
+def stem(input_word):
+    suffixes = ["ing", "ly", "ed", "es", "'s", "er", "est", "y", "ily", "able", "ful", "ness", "less", "ment", "ive", "ize", "ous"]
+    for suffix in suffixes:
+        if input_word.endswith(suffix):
+            input_word = input_word[:-len(suffix)]
+            break
+    return input_word
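A note on suffix order in `stem`: the loop stops at the first match, and `"ly"` appears before `"ily"` in the list, so `"ily"` is never reached (`stem("happily")` yields `"happi"`). One possible variation, shown here only as a sketch, tries longer suffixes first and refuses to strip a word down to almost nothing. The name `stem_longest_first` and the `min_stem_length` parameter are hypothetical and are not part of this commit.

```python
# Same suffix list as the stem() function added in this commit.
SUFFIXES = ["ing", "ly", "ed", "es", "'s", "er", "est", "y", "ily", "able",
            "ful", "ness", "less", "ment", "ive", "ize", "ous"]


def stem_longest_first(input_word, min_stem_length=2):
    """Strip at most one suffix, preferring the longest matching one.

    Hypothetical variant for illustration; min_stem_length is an assumed
    safeguard so that a word such as "ing" is not reduced to an empty string.
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        remaining = len(input_word) - len(suffix)
        if input_word.endswith(suffix) and remaining >= min_stem_length:
            return input_word[:-len(suffix)]
    return input_word


print(stem_longest_first("happily"))   # happ
print(stem_longest_first("kindness"))  # kind
print(stem_longest_first("ing"))       # ing
```

Sorting by length is a small change that keeps the single-pass, strip-one-suffix behaviour of the original function while making every entry in the list reachable.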
diff --git a/dist/Janex-0.0.79b0.tar.gz b/dist/Janex-0.0.79b0.tar.gz
diff --git a/setup.py b/setup.py
@@ -8,7 +8,7 @@
     name="Janex",
 
     # version of the module
-    version="0.0.79",
+    version="0.0.79b",
 
     # Name of Author
     author="Cipher58",