# -*- coding: utf-8 -*-
"""ResumeNER.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1BXhWVpWgIW9Rl_YHkMiP3xGUlhLgWHZF
ENGINEERING OPERATIONS DIRECTOR
Executive Profile
Senior Software Executive who is a key contributor to strategic planning and product development. Highly skilled at
creating and implementing key software improvements and process changes by uncovering major process limitations, maximizing profitability,
scalability, and competition in the global marketplace. Accomplishments (over last 15 years +) Reported directly to C-level executives and Board
members, successfully bridging the gap between the business and Engineering, implementing strategic plans and ensuring that the engineering teams
are aligned to business goals. Agile expert and evangelist, running software development teams for over 17 years and specifically agile software
development for more than 13 years. Reorganized and transitioned many teams and companies to become smooth running agile groups, drastically
reducing delivery issues, making the work very transparent, empowering team members to become self-directed and accountable to their
commitments. Skilled at managing entire software development process and employees including QA, Project Management, Technical Support, on
and offshore teams, contractors, subsidiaries, and merger/acquisitions. Successfully on-boarded the engineers and technology from an acquired
company and quickly merged their intellectual property (IP) into the main product line. Advocate for strong Engineering Best Practices, including
design & code reviews, paired programming, unit tests and continuous integration testing through automation. Including, establishing leading and
trailing engineering metrics, which provide strong indicators of product quality and delivery schedule. Managed globally distributed teams ranging
from 15-60 people, with P&L responsibilities between $2M - $6M. Teams have been located in US, Germany, Hungary, Russia, China, and
Argentina. Consistently an early adopter of critical trends in methodologies and practices, which transform and refine processes to increase the
delivery of business value.
Skill Highlights
Technology · Java · Amazon AWS · Hibernate · PHP · EC2 · Elastic Search · C++ · JSMVC · JUnit · C · HTML ·
Selenium · CanJS · CSS · Aurora · PL/SQL · Bootstrap · Jenkins · Oracle · Python · Phabricator · My/SQL · AJAX ·
GitHub · JavaScript · Camel · Jira · REST and SOAP services · MongoDB · Perl
Professional Experience
Engineering Operations Director
January 2014 to Current Company Name - City, State
A high growth company, whose suite of services help researchers successfully communicate their work.
Identified misalignment between technical teams and business, reorganized the technical teams and aligned technical metrics to support
business KPIs, increasing revenue and cost savings.
Doubled team to 20 people in 4 months, by introducing a new improved hiring process that quickly filtered out non-qualified candidates and
increased our acceptance rate to over 90%.
Awarded Culture Champion Award.
Director of Software Development
January 2012 to January 2014 Company Name - City, State
A non-profit organization devoted to the advancement and well-being of dogs.
Turned around a multiyear software delivery failure, by re-architecting the approach taken, changing the technology used, and transitioning
the team to Agile; putting the software back on budget and on time.
Reduced technical dependency on old technologies by road mapping out a multiyear strategic technology plan, reducing number of
technologies used throughout the department by 50%.
Responsible for web based PCI compliant e-commerce software, connected to an enterprise database.
Chief Operating Officer
January 2010 to January 2012 Company Name - City, State
Public safety software and services company focused on enterprise-class software for Fire and EMS Departments.
Implemented a SaaS solution, allowing smaller towns and cities the ability to use and integrate with the Fire and EMS software.
Reduced customer's server upgrade time from 4 days to 4 hours.
Removed the requirement, caused by software limitations, that hard mounted mobile computers be removed from fire trucks and brought
into the IT dept for upgrades.
Reduced a mobile computer's install and upgrade times from 1 day per machine to 2 hours.
Vice President of Engineering
January 2001 to January 2010 Company Name - City, State
A mid-sized 3D software company for creating digital models of physical objects, including both 'off-the-shelf' and customized commercial
applications.
The software is used globally in markets such as: rapid prototyping, reverse engineering, inspection, and healthcare.
Grew revenue from $0 to over $16M with a CAGR greater than 30% for 6 consecutive years.
Integral in receiving 6 term sheets of similar valuation resulting in $8M in VC funds in 2008.
Expanded company organically from 22 to 110 employees, coordinated effectively with Sales, Product Development, and Marketing teams
to produce globally competitive products.
Conceived of and implemented critical changes in software architectural designs creating a partner ecosystem.
Director of Software Development
January 2000 to January 2001 Company Name - City, State
"""
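import spacy

# Load the small pretrained English pipeline and list the entities it finds.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Experience with C++ at Apple")
for entity in doc.ents:
    print(f"{entity.text} ({entity.label_})")
# Output depends on the model version; for this sentence it is typically:
# Apple (ORG)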
import json

import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

# Convert character-offset annotations into spaCy's binary training format.
nlp = spacy.blank("en")
db = DocBin()

with open("annotations.json", encoding="utf-8") as f:
    TRAIN_DATA = json.load(f)

for text, annot in tqdm(TRAIN_DATA["annotations"]):
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annot["entities"]:
        # "contract" snaps offsets that fall inside a token inward to token boundaries.
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents
    db.add(doc)

db.to_disk("./training_data_skills.spacy")
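
# --- Illustrative sketch (not part of the original notebook) ---
# The loop above implies that annotations.json has the layout
#   {"annotations": [[text, {"entities": [[start, end, label], ...]}], ...]}
# Assuming that layout, a check like the hypothetical helper below could flag
# out-of-range or overlapping offsets up front. Overlapping spans matter
# because spaCy does not allow them in doc.ents. The helper name and the
# reason strings are made up for this example.
def find_bad_entities(text, entities):
    """Return (entity, reason) pairs for offsets that look wrong."""
    problems = []
    previous_end = 0
    for start, end, label in sorted(entities, key=lambda e: (e[0], e[1])):
        if not (0 <= start < end <= len(text)):
            problems.append(((start, end, label), "out of range"))
            continue
        if start < previous_end:
            problems.append(((start, end, label), "overlaps previous entity"))
        previous_end = max(previous_end, end)
    return problems
# Possible usage: call find_bad_entities(text, annot["entities"]) for each
# record before building the DocBin and review anything it returns.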
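# Notebook shell commands used to create the config and train the model.
# Note: --paths.dev points at the training file, so the scores reported during
# training are measured on data the model has already seen.
#!python -m spacy init config config.cfg --lang en --pipeline ner --optimize efficiency
#!python -m spacy train config.cfg --output ./ --paths.train ./training_data_skills.spacy --paths.dev ./training_data_skills.spacy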
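
# --- Illustrative sketch (not part of the original notebook) ---
# The train command above passes the same file as both training and dev data.
# One common alternative is to hold out part of the annotations as a dev set
# and write two DocBin files. The helper name, the 80/20 ratio, and the seed
# are assumptions chosen for illustration.
import random

def split_annotations(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and return (train, dev) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    if len(shuffled) > 1:
        # Keep at least one dev example, but never take the whole dataset.
        dev_size = min(len(shuffled) - 1, max(1, int(len(shuffled) * dev_fraction)))
    else:
        dev_size = 0
    return shuffled[dev_size:], shuffled[:dev_size]
# Possible usage:
#   train_examples, dev_examples = split_annotations(TRAIN_DATA["annotations"])
# Each list could then go through the same DocBin loop and be saved to its
# own .spacy file for --paths.train and --paths.dev.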
nlp_ner = spacy.load("model-best")
doc = nlp_ner(''' Raheem - MECHANICAL ENGINEERING INTERN
Summary
CAD | CAM | Finite Element Analysis | Mechanical Design | Product Design and Development
Skills
5 years of experience with CAD packages (SolidWorks, Autodesk Inventor, AutoCAD, CATIA, PTC CREO)
2.5 years of experience with CAE Softwares (HyperMesh, Abaqus, ANSYS, Optistruct)
2.5 years of experience with Analysis (Linear & Non-linear Static, Dynamic, GD & T, Tolerance Analysis, Design Optimization)
Experience with Sheet metal, Design for manufacturing, generating Bill of Materials, DFMEA, Sculpting.
Experience with advanced material selection for rapid prototyping, advanced manufacturing, welding and 3D printing.
Experience
09/2013 to 05/2014
Company Name
Finite Element Analysis of Industrial Robotic Assembly, Illinois Institute of Technology, Chicago Jan - May 2016.
Conceptualized, brainstormed and designed a 6-axis SCARA Robot for pick and place operation in automotive industry.
Performed static analysis with stainless steel 304 to evaluate the maximum load an assembly can lift before yielding.
Also, analyzed Gripper and joints to eradicate future failures.
Optimized design using OptiStruct by varying mesh sizes and element order.
Simulated assembly with dynamic analysis to find distorted elements and to verify optimized structure.
Reliability Engineering Analysis on Automotive Oil Pump, Illinois Institute of Technology, Chicago Sept - Dec 2015.
Used industrial reliability specifications to select the power consumption and flow rate at three distinct levels of rpm to study its variability.
Improved system using Taguchi analysis by optimizing signal to noise ratio.
Conducted Failure Mode Effect Analysis (FMEA) to analyze potential causes of failures to deliver clean oil upon demand Abstracted and
designed Near Dry Machine with two inlet nozzles.
Performed fluid analysis and actual results on lathe machine.
Provided vegetable oil as a coolant with pressurized air on flank face of the tool, which resulted in unburnt and recyclable chips.
Gearbox Design, Narsee Monjee Institute of Management Studies, Mumbai Jan - May 2013.
Designed a gear box with different gears such as spur, helical worm by considering seals, lubricating oil and bearings.
Assigned materials and performed dynamic simulation to define contact surfaces.
06/2013 to 08/2013
Mechanical Engineering Intern Company Name
Initiated a project to perform a failure investigation in mufflers due to the low clearance of roads and provided feedback.
Established and coordinated maintenance, GD&T, safety procedures, service schedule and supply of materials in the maintenance shop.
Developed failure reports including feedback based on common failures from the automotive industry.
Set up and calibrated accelerometers on Hyundai cars to conduct tests to analyze the modes of vibration of vehicle and the steering column.
05/2012 to 07/2012
Manufacturing Engineering Intern Company Name
Analyzed automation, process parameters, different equipment to shape and control the profile of chips and Manufacturing process of Hot
Strip Coil.
Re-designed the existing shop floor to improve space utilization, increase material flow, optimize labor and reduce holding costs by 5% and
improved space utilization by 20%.
Performed statistical analysis on historical data of the operating parameters using SPC and DOE's to identify significant factors contributing
to process deviation and affecting the cold crushing strength of the pellet.
Generated Bill of Materials and calculated overall manufacturing cost.''')
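from spacy import displacy

# Highlight the predicted entities inline (renders only inside a notebook).
displacy.render(doc, style="ent", jupyter=True)
# Print all entities if the model found at least one SKILL.
if any(ent.label_ == "SKILL" for ent in doc.ents):
    print(doc.ents)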
import spacy
nlp_ner = spacy.load("model-best")
job_descriptions = [
    "We are looking for a Machine Learning Engineer with experience in TensorFlow and PyTorch.",
    "Seeking a Backend Developer skilled in Node.js and MongoDB.",
]
for text in job_descriptions:
    doc = nlp_ner(text)
    print(f"Text: {text}")
    for ent in doc.ents:
        print(f"Entity: {ent.text}, Label: {ent.label_}")
#pip install pdfplumber
#Add custom NER as a pipeline component
from spacy.language import Language
from spacy.tokens import Span
en_core_web_sm = spacy.load("en_core_web_sm")
@Language.component("custom_ner_component")
def custom_ner_component(doc):
    custom_doc = nlp_ner(doc.text)  # Run the custom NER model on the same text
    # Rebuild each span on `doc`: the spans in custom_doc belong to a different Doc.
    # This assumes both pipelines tokenize the text identically.
    doc.ents = [Span(doc, ent.start, ent.end, label=ent.label_) for ent in custom_doc.ents]
    return doc
#Add the custom NER component to the pre-trained pipeline
en_core_web_sm.add_pipe("custom_ner_component", name="custom_ner", last=True)
#Use the combined pipeline
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans
en_core_web_sm = spacy.load("en_core_web_sm")
custom_ner = spacy.load("model-best")
# Register the custom labels in the pretrained pipeline's string store.
for label in custom_ner.pipe_labels["ner"]:
    if label not in en_core_web_sm.pipe_labels["ner"]:
        en_core_web_sm.vocab.strings.add(label)

def combine_entities(doc, custom_doc):
    # Keep PERSON from the pretrained model and everything from the custom model.
    person_ents = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    custom_ents = list(custom_doc.ents)
    combined_ents = filter_spans(person_ents + custom_ents)
    doc.ents = [Span(doc, ent.start, ent.end, label=ent.label) for ent in combined_ents]
    return doc

@spacy.language.Language.component("combine_ner")
def combine_ner_component(doc):
    custom_doc = custom_ner(doc.text)  #Process text with the custom NER model
    doc = combine_entities(doc, custom_doc)
    return doc

en_core_web_sm.add_pipe("combine_ner", last=True)
text = "John Doe is a Software Engineer proficient in Python and JavaScript."
doc = en_core_web_sm(text)
print("Entities:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_})")
print(custom_ner.pipe_labels)
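
# --- Illustrative sketch (not part of the original notebook) ---
# combine_entities relies on spacy.util.filter_spans to resolve overlaps
# between the PERSON spans and the custom spans. Per the spaCy documentation,
# the (first) longest span is preferred over shorter ones. The plain-Python
# function below mimics that rule on (start, end, label) token-index tuples so
# the effect is easy to see. It is a simplified stand-in with a made-up name,
# not spaCy's implementation.
def resolve_overlaps(spans):
    """Keep the longest non-overlapping spans; ties go to the earlier span."""
    chosen = []
    taken = set()
    # Longest first, then earliest start.
    for start, end, label in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if not any(i in taken for i in range(start, end)):
            chosen.append((start, end, label))
            taken.update(range(start, end))
    return sorted(chosen)
# In this toy model, a two-token PERSON span that overlaps a three-token SKILL
# span is dropped, because the longer span wins.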
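import pdfplumber

def extract_text_from_pdf(file_path):
    """Return the text of every page in the PDF, joined by newlines."""
    with pdfplumber.open(file_path) as pdf:
        # extract_text() returns None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)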
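def extract_text_from_docx(file_path):
    # This helper was called but never defined in the original script.
    # Assumes the python-docx package is installed (pip install python-docx).
    from docx import Document
    return "\n".join(p.text for p in Document(file_path).paragraphs)

def parse_resume(file_path):
    if file_path.endswith('.pdf'):
        text = extract_text_from_pdf(file_path)
    elif file_path.endswith('.docx'):
        text = extract_text_from_docx(file_path)
    else:
        raise ValueError("Unsupported file format. Use PDF or DOCX files.")
    doc = en_core_web_sm(text)
    # Group entity texts by label, e.g. {"SKILL": [...], "PERSON": [...]}.
    parsed_entities = {}
    for ent in doc.ents:
        parsed_entities.setdefault(ent.label_, []).append(ent.text)
    return parsed_entities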
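
# --- Illustrative sketch (not part of the original notebook) ---
# parse_resume matches the extension with str.endswith, which is
# case-sensitive, so a file named "CV.PDF" would be rejected. Assuming the
# extension should be matched case-insensitively, a hypothetical helper based
# on pathlib could normalize the suffix first.
from pathlib import Path

def detect_resume_format(file_path):
    """Return "pdf" or "docx" based on the lowercased file suffix."""
    suffix = Path(file_path).suffix.lower()
    if suffix in (".pdf", ".docx"):
        return suffix[1:]
    raise ValueError("Unsupported file format. Use PDF or DOCX files.")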
file_path = "resume1.pdf"
entities = parse_resume(file_path)
print(entities)
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
def extract_entities_from_job_description(job_description):
    #Use the NER model to process the job description
    doc = en_core_web_sm(job_description)
    entities = {ent.label_: [] for ent in doc.ents}
    for ent in doc.ents:
        entities[ent.label_].append(ent.text)
    return entities
def calculate_match_score(resume_entities, job_description_entities, entity_type):
    # Percentage of the job description's entities of this type that also appear in the resume
    resume_set = set(resume_entities.get(entity_type, []))
    job_set = set(job_description_entities.get(entity_type, []))
    if not job_set:
        return 0
    return len(resume_set & job_set) / len(job_set) * 100
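
# --- Illustrative sketch (not part of the original notebook) ---
# calculate_match_score compares entity strings exactly, so "Python" and
# "python " count as different skills. Assuming case and surrounding
# whitespace should not matter, a variant could normalize both sides before
# intersecting. The function name is hypothetical.
def calculate_normalized_match_score(resume_terms, job_terms):
    """Percentage of job terms found in the resume, ignoring case and padding."""
    resume_set = {term.strip().lower() for term in resume_terms if term.strip()}
    job_set = {term.strip().lower() for term in job_terms if term.strip()}
    if not job_set:
        return 0.0
    return len(resume_set & job_set) / len(job_set) * 100
# Possible usage:
#   calculate_normalized_match_score(resume_entities.get("SKILL", []),
#                                    job_description_entities.get("SKILL", []))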
def calculate_text_similarity(resume_text, job_description):
    #TF-IDF vectorization for similarity calculation
    tfidf = TfidfVectorizer().fit_transform([resume_text, job_description])
    # Row 0 is the resume vector and row 1 is the job description vector
    return cosine_similarity(tfidf[0], tfidf[1])[0][0]
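
# --- Illustrative sketch (not part of the original notebook) ---
# To show what the cosine step measures, the function below computes cosine
# similarity over raw term counts using only the standard library. This is a
# plain bag-of-words version, not TF-IDF, so its values will differ from
# calculate_text_similarity. The name and the whitespace tokenization are
# assumptions made for the example.
import math
from collections import Counter

def bag_of_words_cosine(text_a, text_b):
    """Cosine similarity between the lowercase word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction to compare
    return dot / (norm_a * norm_b)
# The score is 0.0 when the texts share no words and approaches 1.0 as their
# word distributions become proportional.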
def main(file_path, job_description):
    #Parse resume and extract entities
    resume_entities = parse_resume(file_path)
    #Extract entities from job description
    job_description_entities = extract_entities_from_job_description(job_description)
    #Calculate skill and role match scores
    skill_match = calculate_match_score(resume_entities, job_description_entities, "SKILL")
    role_match = calculate_match_score(resume_entities, job_description_entities, "ROLE")
    #Calculate text similarity
    #Note: resume_text is rebuilt from the extracted entity strings, not from the full resume text
    resume_text = " ".join(text for texts in resume_entities.values() for text in texts)
    similarity_score = calculate_text_similarity(resume_text, job_description)
    #Print results
    print(f"Skill Match Score: {skill_match:.2f}%")
    print(f"Role Match Score: {role_match:.2f}%")
    print(f"Text Similarity Score: {similarity_score:.2f}")
    return {
        "skill_match": skill_match,
        "role_match": role_match,
        "similarity_score": similarity_score
    }
file_path = "resume2.pdf"
job_description = """
We are looking for a Software Engineer proficient in Python, machine learning, Java, C programming, and cloud technologies like AWS.
The candidate should have experience in leadership roles such as Team Lead or Project Manager.
"""
results = main(file_path, job_description)