Email cleaner #335
Draft: SvenjaKern wants to merge 6 commits into main from email_cleaner
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1 @@ | ||||||
This module removes certain parts of an email so that the remaining text focuses on the actual content. It removes sentences starting with "EXTERNAL EMAIL", bracketed references that start with "cid:image", everything from the disclaimer to the end of the text (or to the next mail in a reply chain), and everything after the bracketed signature marker to the end of the text (or to the next mail). | ||||||
Review comment: Typos
Suggested change
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
from pydantic import BaseModel | ||
import re | ||
|
||
INPUT_EXAMPLE = { | ||
    "email": """Hi Sofia, | ||
I hope this email finds you well. I have some exciting news to share with you regarding a potential new client for StellarDefense Insurance. We have recently received an application from a company called Bleyerstift and More, who are in need of insurance coverage. Bleyerstift and More is a reputable company in the manufacturing industry. They operate in the pharmaceutical sector, specializing in the production of medical supplies. With a workforce of approximately 500 employees, they are located at 123 Main Street, Anytown, USA. You can find more information about them on their website at www.bleyerstiftandmore.com. | ||
The client has requested a submission to be completed by April 1st, 2024. They are specifically interested in obtaining a comprehensive general liability insurance policy, with a coverage limit of $1 million for each occurrence. | ||
Please let me know if you require any additional information from them or if there are any specific questions you would like me to address. As for attachments, there is a document that provides a detailed breakdown of Bleyerstift and More's revenue and other pertinent financial information. | ||
I have included this attachment for your reference. I believe this opportunity has great potential for StellarDefense Insurance's growth and would appreciate your assistance in handling this case. If you have any questions or need any further information, please do not hesitate to reach out to me. Thank you for your time and support in this matter. | ||
[cid:[email protected]] | ||
Best regards, | ||
Amelia Smith Insurance Broker StellarDefense Insurance | ||
|
||
DISCLAIMER | ||
|
||
The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful. | ||
|
||
This email has been scanned for viruses and malware, and may have been automatically archived by blubb. | ||
|
||
From: Bender, Zoe <[email protected]> | ||
Sent: 22 September 2022 16:55 | ||
To: Smith, Amelia <[email protected]> | ||
Subject: Small question | ||
|
||
EXTERNAL EMAIL: This email originated from outside StellarDefense. | ||
Dear Amelia, | ||
I just wanted to know if you have new information for me. If I remember correctly, you told me about a great deal with a new company. Love to hear more about it. | ||
All best | ||
Zoe | ||
[signature]""" | ||
} | ||
|
||
class EmailCleanerModel(BaseModel): | ||
    email: str | ||
|
||
    class Config: | ||
        schema_extra = {"example": INPUT_EXAMPLE} | ||
|
||
|
||
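def email_cleaner(req: EmailCleanerModel): | ||
    text = req.email | ||
    # Drop the disclaimer up to the next quoted mail ("From:") or the end of the text. | ||
    text = re.sub(r"DISCLAIMER[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE) | ||
    # Drop the "EXTERNAL EMAIL" banner up to and including its first period. | ||
    text = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, flags=re.IGNORECASE) | ||
    # Drop inline image references such as [cid:image...]. | ||
    text = re.sub(r"\[cid:image.*?\]", "", text, flags=re.IGNORECASE) | ||
    # Drop everything from the signature marker up to the next quoted mail or the end. | ||
    text = re.sub(r"signature[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE) | ||
    return text | ||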
|
||
|
||
|
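The four substitutions in this module can be tried out without pydantic. The following is a minimal sketch, assuming a standalone helper named `clean_email` (a hypothetical name, not part of this PR) and a shortened, made-up reply chain rather than the full `INPUT_EXAMPLE`:

```python
import re


def clean_email(text: str) -> str:
    """Hypothetical standalone copy of the four substitutions, using raw strings."""
    text = re.sub(r"DISCLAIMER[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE)
    text = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\[cid:image.*?\]", "", text, flags=re.IGNORECASE)
    text = re.sub(r"signature[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE)
    return text


# Made-up two-mail chain that touches every rule once.
sample = (
    "Hi Sofia,\n"
    "please see the attachment.\n"
    "[cid:image001.png]\n"
    "Best regards,\nAmelia\n"
    "DISCLAIMER\nThis communication is confidential.\n"
    "From: Bender, Zoe\n"
    "EXTERNAL EMAIL: This email originated from outside StellarDefense.\n"
    "Dear Amelia,\nany news?\n"
    "[signature]"
)

cleaned = clean_email(sample)
print(cleaned)
```

With the patterns as written, the last rule matches from the word `signature` onward, so the opening bracket of `[signature]` stays in the output. Matching `\[signature\]` instead would be one way to avoid that, if the marker is always bracketed.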
48 changes: 48 additions & 0 deletions
generators/text_cleaning/email_cleaner/code_snippet_common.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
```python | ||
import re | ||
|
||
def email_cleaner(text): | ||
    # Pass flags by keyword: the fourth positional argument of re.sub is `count`. | ||
    text = re.sub(r"DISCLAIMER[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE) | ||
    text = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, flags=re.IGNORECASE) | ||
    text = re.sub(r"\[cid:image.*?\]", "", text, flags=re.IGNORECASE) | ||
    text = re.sub(r"signature[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE) | ||
    return text | ||
|
||
# ↑ necessary bricks stuff | ||
# ----------------------------------------------------------------------------------------- | ||
# ↓ example implementation | ||
|
||
emails = ["""Hi Sofia, | ||
I hope this email finds you well. I have some exciting news to share with you regarding a potential new client for StellarDefense Insurance. We have recently received an application from a company called Bleyerstift and More, who are in need of insurance coverage. Bleyerstift and More is a reputable company in the manufacturing industry. They operate in the pharmaceutical sector, specializing in the production of medical supplies. With a workforce of approximately 500 employees, they are located at 123 Main Street, Anytown, USA. You can find more information about them on their website at www.bleyerstiftandmore.com. | ||
The client has requested a submission to be completed by April 1st, 2024. They are specifically interested in obtaining a comprehensive general liability insurance policy, with a coverage limit of $1 million for each occurrence. | ||
Please let me know if you require any additional information from them or if there are any specific questions you would like me to address. As for attachments, there is a document that provides a detailed breakdown of Bleyerstift and More's revenue and other pertinent financial information. | ||
I have included this attachment for your reference. I believe this opportunity has great potential for StellarDefense Insurance's growth and would appreciate your assistance in handling this case. If you have any questions or need any further information, please do not hesitate to reach out to me. Thank you for your time and support in this matter. | ||
[cid:[email protected]] | ||
Best regards, | ||
Amelia Smith Insurance Broker StellarDefense Insurance | ||
|
||
DISCLAIMER | ||
|
||
The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful. | ||
|
||
This email has been scanned for viruses and malware, and may have been automatically archived by blubb. | ||
|
||
From: Bender, Zoe <[email protected]> | ||
Sent: 22 September 2022 16:55 | ||
To: Smith, Amelia <[email protected]> | ||
Subject: Small question | ||
|
||
EXTERNAL EMAIL: This email originated from outside StellarDefense. | ||
Dear Amelia, | ||
I just wanted to know if you have new information for me. If I remember correctly, you told me about a great deal with a new company. Love to hear more about it. | ||
All best | ||
Zoe | ||
[signature]"""] | ||
|
||
def example_integration(): | ||
    texts = emails | ||
    for text in texts: | ||
        print(f"The cleaned email looks like this:\n{email_cleaner(text)}") | ||
example_integration() | ||
|
||
``` |
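A note on the `re.sub` calls in this snippet: when `re.IGNORECASE` is passed as the fourth positional argument, Python reads it as `count`, not as `flags`. The following small sketch (the sample string is made up for illustration) shows the difference between the positional and the keyword form:

```python
import re

text = "External Email: first. EXTERNAL EMAIL: second. external email: third."

# Fourth positional argument is `count`: re.IGNORECASE (integer value 2) is read
# as "replace at most 2 matches", and matching stays case-sensitive.
positional = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, re.IGNORECASE)

# Keyword form: the flag is applied, so all three spellings are matched.
keyword = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, flags=re.IGNORECASE)

print(repr(positional))
print(repr(keyword))
```

Only the upper-case banner is removed in the positional call, while the keyword call removes all three. Recent Python versions may also emit a deprecation warning for passing `count` positionally.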
13 changes: 13 additions & 0 deletions
generators/text_cleaning/email_cleaner/code_snippet_refinery.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
```python | ||
import re | ||
|
||
ATTRIBUTE: str = "headline" #only text attributes | ||
Review comment: Change headline to text please.
||
|
||
def email_cleaner(record): | ||
    text = record[ATTRIBUTE].text | ||
    # Pass flags by keyword: the fourth positional argument of re.sub is `count`. | ||
    text = re.sub(r"DISCLAIMER[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE) | ||
    text = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, flags=re.IGNORECASE) | ||
    text = re.sub(r"\[cid:image.*?\]", "", text, flags=re.IGNORECASE) | ||
    text = re.sub(r"signature[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE) | ||
    return text | ||
``` |
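The refinery variant can be checked locally without a refinery instance. This is a rough sketch, assuming only what the snippet itself relies on: that `record[ATTRIBUTE]` exposes a `.text` property. The stand-in record built from `SimpleNamespace` and the sample text are hypothetical and only serve the illustration:

```python
import re
from types import SimpleNamespace

ATTRIBUTE: str = "text"  # only text attributes


def email_cleaner(record):
    text = record[ATTRIBUTE].text
    text = re.sub(r"DISCLAIMER[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE)
    text = re.sub(r"EXTERNAL EMAIL.*?\.", "", text, flags=re.IGNORECASE)
    text = re.sub(r"\[cid:image.*?\]", "", text, flags=re.IGNORECASE)
    text = re.sub(r"signature[\s\S]+?(?=From:|\Z)", "", text, flags=re.IGNORECASE)
    return text


# Hypothetical stand-in for a refinery record: a dict whose value has a `.text` property.
record = {
    ATTRIBUTE: SimpleNamespace(
        text="Hello.\n[cid:image001.png]\nDISCLAIMER\nConfidential."
    )
}

result = email_cleaner(record)
print(repr(result))
```

The image reference and the disclaimer block are removed; the line breaks around them remain, so callers may want to strip or collapse whitespace afterwards.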
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
from util.configs import build_generator_function_config | ||
from util.enums import State, RefineryDataType, BricksVariableType, SelectionType | ||
from . import email_cleaner, INPUT_EXAMPLE | ||
|
||
|
||
def get_config(): | ||
    return build_generator_function_config( | ||
        function=email_cleaner, | ||
        input_example=INPUT_EXAMPLE, | ||
        issue_id=328, | ||
        tabler_icon="square-rounded-letter-e", | ||
        min_refinery_version="1.7.0", | ||
        state=State.PUBLIC.value, | ||
        type="python_function", | ||
        kern_token_proxy_usable="false", | ||
        docker_image="none", | ||
        available_for=["refinery", "common"], | ||
        part_of_group=[ | ||
            "text_cleaning", | ||
        ],  # first entry should be parent directory | ||
        # bricks integrator information | ||
        integrator_inputs={ | ||
            "name": "email_cleaner", | ||
            "refineryDataType": RefineryDataType.TEXT.value, | ||
            "variables": { | ||
                "ATTRIBUTE": { | ||
                    "selectionType": SelectionType.CHOICE.value, | ||
                    "addInfo": [ | ||
                        BricksVariableType.ATTRIBUTE.value, | ||
                        BricksVariableType.GENERIC_STRING.value, | ||
                    ], | ||
                } | ||
            }, | ||
        }, | ||
    ) |
Review comment: Shouldn't be in this PR