Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Nov 25, 2025

  • Explore existing codebase and adapter patterns
  • Create GlassdoorJobScraper adapter class mirroring LinkedIn pattern
  • Add Glassdoor-specific types (GlassdoorJob, GlassdoorSearchFilters)
  • Implement URL builder utility for Glassdoor searches
  • Implement HTML parser utilities for job data extraction
  • Implement rating parser for company ratings
  • Add pagination support
  • Create unit tests for the adapter (88 tests passing)
  • Create parser tests with mock HTML fixtures
  • Add Dockerfile for containerized deployment
  • Add README with setup and usage documentation
  • Create adapter integration for agent-orchestrator
  • Integrate with existing PlatformManager
  • Run code review and security checks (0 issues)
  • Address code review feedback:
    • Updated User-Agent to Chrome/124 with env var override
    • Removed duplicate playwright from dependencies (kept in peerDependencies)
    • Renamed parseBasicInfoFromDetailPage to parseJobFromDetailPage
    • Fixed throttle() to use local variable instead of modifying shared config (avoiding race condition)
    • Refactored search() to use beforeParseJobList() hook instead of overriding entire method
    • Added thread-safety documentation to GlassdoorAdapter
    • Improved error logging with full stack traces
    • Refactored glassdoor-adapter.ts to use cheerio for robust HTML parsing instead of fragile regex patterns
    • Extracted User-Agent and salary parsing patterns as constants for maintainability
    • Fixed race condition in rate limiting with mutex pattern for thread-safe concurrent access
  • Fix build errors:
    • Updated TypeScript configs to include DOM lib and ES2020 target
    • Added playwright as dev dependency for glassdoor adapter
    • Configured proper type definitions for node and jest
    • All 88 tests passing in glassdoor adapter
    • Agent orchestrator tests passing
Original prompt

Objective

Develop Glassdoor platform adapter for job scraping, similar to the LinkedIn adapter, to expand job source coverage and provide users with more opportunities.

Related Issue

Closes #22

Requirements

  • Mirror LinkedIn adapter's class structure (extend BaseJobScraper)
  • Scrape jobs with title, company, location, compensation, etc.
  • Export to unified job pipeline JSON format
  • Add platform selection logic in orchestrator
  • Unit tests, usage docs

Technical Specifications

Directory Structure

/services/platform-adapters/
├── glassdoor/
│   ├── src/
│   │   ├── index.ts                  # Main export
│   │   ├── GlassdoorJobScraper.ts    # Core scraper class
│   │   ├── types.ts                  # Glassdoor-specific types
│   │   └── utils/
│   │       ├── urlBuilder.ts         # Build Glassdoor search URLs
│   │       ├── parser.ts             # Parse Glassdoor HTML
│   │       └── ratingParser.ts       # Parse company ratings
│   ├── tests/
│   │   ├── scraper.test.ts           # Unit tests
│   │   ├── parser.test.ts            # Parser tests
│   │   └── fixtures/                 # Mock HTML
│   │       ├── search-results.html
│   │       └── job-detail.html
│   ├── package.json
│   ├── tsconfig.json
│   ├── Dockerfile
│   └── README.md
├── base/
│   └── BaseJobScraper.ts             # Shared base class
└── types/
    └── job.ts                        # Unified Job interface

Glassdoor-Specific Job Interface

interface GlassdoorJob extends Job {
  // Glassdoor-specific fields
  companyRating?: number;           // 1-5 stars
  salaryEstimate?: {
    min: number;
    max: number;
    currency: string;
  };
  reviewCount?: number;
  easyApply: boolean;
  companySize?: string;
  industry?: string;
  headquarters?: string;
}

// Glassdoor search filters
interface GlassdoorSearchFilters extends SearchFilters {
  companyRating?: number;           // Minimum rating (1-5)
  companySize?: 'small' | 'medium' | 'large' | 'enterprise';
  industry?: string;
  easyApplyOnly?: boolean;
}

GlassdoorJobScraper Class

import { BaseJobScraper } from '../base/BaseJobScraper';
import { Browser, Page } from 'playwright';

export class GlassdoorJobScraper extends BaseJobScraper {
  readonly platform = 'glassdoor';
  readonly baseUrl = 'https://www.glassdoor.com';
  
  constructor(config: ScraperConfig) {
    super(config);
  }
  
  // Build Glassdoor search URL from filters
  protected buildSearchUrl(filters: GlassdoorSearchFilters): string {
    const params = new URLSearchParams();
    params.set('keyword', filters.keywords);
    if (filters.location) params.set('locT', 'C');
    if (filters.location) params.set('locKeyword', filters.location);
    if (filters.salary?.min) params.set('minSalary', filters.salary.min.toString());
    if (filters.datePosted) params.set('fromAge', this.mapDatePosted(filters.datePosted));
    
    return `${this.baseUrl}/Job/jobs.htm?${params.toString()}`;
  }
  
  // Parse job cards from search results
  protected async parseJobList(page: Page): Promise<GlassdoorJob[]> {
    const jobCards = await page.$$('[data-test="jobListing"]');
    const jobs: GlassdoorJob[] = [];
    
    for (const card of jobCards) {
      const job = await this.parseJobCard(card);
      jobs.push(job);
    }
    
    return jobs;
  }
  
  // Parse individual job card
  private async parseJobCard(card: ElementHandle): Promise<GlassdoorJob> {
    return {
      id: await card.getAttribute('data-id') || '',
      title: await card.$eval('[data-test="job-title"]', el => el.textContent?.trim() || ''),
      company: await card.$eval('[data-test="employer-name"]', el => el.textContent?.trim() || ''),
      location: await card.$eval('[data-test="emp-location"]', el => el.textContent?.trim() || ''),
      salary: await this.parseSalary(card),
      companyRating: await this.parseRating(card),
      url: await this.parseJobUrl(card),
      postingDate: await this.parseDate(card),
      tags: [],
      source: 'glassdoor',
      scrapedAt: new Date().toISOString(),
      easyApply: await this.hasEasyApply(card)
    };
  }
  
  // Get detailed job info
  async getJobDetails(jobUrl: string): Promise<GlassdoorJob> {
    const page = await this.browser.newPage();
    await page.goto(jobUrl);
    
    return {
      ...await this.parseBasicInfo(page),
      description: await page.$eval('[data-test="job-description"]', el => el.innerHTML),
      companySize: await this.parseCompanySize(page),
      industry: await this.parseIndustry(page),
      headquarters: await this.parseHeadquarters(page)
    };
  }
  
  // Handle pagination
  protected async hasNextPage(page: Page): Promise<boolean> {
    return await page.isVisible('[data-test="pagination-next"]');
  }
  
  protected async goToNextPage(page: Page): Promise<void> {
    await page.click('[data-test="pagination-next"]');
    await page.waitForLoadState('networkidle');
  }
}

URL Builder Utility


</details>

*This pull request was created as a result of the following prompt from Copilot chat.*
> ## Objective
> Develop Glassdoor platform adapter for job scraping, similar to the LinkedIn adapter, to expand job source coverage and provide users with more opportunities.
> 
> ## Related Issue
> Closes #22
> 
> ## Requirements
> - Mirror LinkedIn adapter's class structure (extend BaseJobScraper)
> - Scrape jobs with title, company, location, compensation, etc.
> - Export to unified job pipeline JSON format
> - Add platform selection logic in orchestrator
> - Unit tests, usage docs
> 
> ## Technical Specifications
> 
> ### Directory Structure
> ```
> /services/platform-adapters/
> ├── glassdoor/
> │   ├── src/
> │   │   ├── index.ts                  # Main export
> │   │   ├── GlassdoorJobScraper.ts    # Core scraper class
> │   │   ├── types.ts                  # Glassdoor-specific types
> │   │   └── utils/
> │   │       ├── urlBuilder.ts         # Build Glassdoor search URLs
> │   │       ├── parser.ts             # Parse Glassdoor HTML
> │   │       └── ratingParser.ts       # Parse company ratings
> │   ├── tests/
> │   │   ├── scraper.test.ts           # Unit tests
> │   │   ├── parser.test.ts            # Parser tests
> │   │   └── fixtures/                 # Mock HTML
> │   │       ├── search-results.html
> │   │       └── job-detail.html
> │   ├── package.json
> │   ├── tsconfig.json
> │   ├── Dockerfile
> │   └── README.md
> ├── base/
> │   └── BaseJobScraper.ts             # Shared base class
> └── types/
>     └── job.ts                        # Unified Job interface
> ```
> 
> ### Glassdoor-Specific Job Interface
> ```typescript
> interface GlassdoorJob extends Job {
>   // Glassdoor-specific fields
>   companyRating?: number;           // 1-5 stars
>   salaryEstimate?: {
>     min: number;
>     max: number;
>     currency: string;
>   };
>   reviewCount?: number;
>   easyApply: boolean;
>   companySize?: string;
>   industry?: string;
>   headquarters?: string;
> }
> 
> // Glassdoor search filters
> interface GlassdoorSearchFilters extends SearchFilters {
>   companyRating?: number;           // Minimum rating (1-5)
>   companySize?: 'small' | 'medium' | 'large' | 'enterprise';
>   industry?: string;
>   easyApplyOnly?: boolean;
> }
> ```
> 
> ### GlassdoorJobScraper Class
> ```typescript
> import { BaseJobScraper } from '../base/BaseJobScraper';
> import { Browser, Page } from 'playwright';
> 
> export class GlassdoorJobScraper extends BaseJobScraper {
>   readonly platform = 'glassdoor';
>   readonly baseUrl = 'https://www.glassdoor.com';
>   
>   constructor(config: ScraperConfig) {
>     super(config);
>   }
>   
>   // Build Glassdoor search URL from filters
>   protected buildSearchUrl(filters: GlassdoorSearchFilters): string {
>     const params = new URLSearchParams();
>     params.set('keyword', filters.keywords);
>     if (filters.location) params.set('locT', 'C');
>     if (filters.location) params.set('locKeyword', filters.location);
>     if (filters.salary?.min) params.set('minSalary', filters.salary.min.toString());
>     if (filters.datePosted) params.set('fromAge', this.mapDatePosted(filters.datePosted));
>     
>     return `${this.baseUrl}/Job/jobs.htm?${params.toString()}`;
>   }
>   
>   // Parse job cards from search results
>   protected async parseJobList(page: Page): Promise<GlassdoorJob[]> {
>     const jobCards = await page.$$('[data-test="jobListing"]');
>     const jobs: GlassdoorJob[] = [];
>     
>     for (const card of jobCards) {
>       const job = await this.parseJobCard(card);
>       jobs.push(job);
>     }
>     
>     return jobs;
>   }
>   
>   // Parse individual job card
>   private async parseJobCard(card: ElementHandle): Promise<GlassdoorJob> {
>     return {
>       id: await card.getAttribute('data-id') || '',
>       title: await card.$eval('[data-test="job-title"]', el => el.textContent?.trim() || ''),
>       company: await card.$eval('[data-test="employer-name"]', el => el.textContent?.trim() || ''),
>       location: await card.$eval('[data-test="emp-location"]', el => el.textContent?.trim() || ''),
>       salary: await this.parseSalary(card),
>       companyRating: await this.parseRating(card),
>       url: await this.parseJobUrl(card),
>       postingDate: await this.parseDate(card),
>       tags: [],
>       source: 'glassdoor',
>       scrapedAt: new Date().toISOString(),
>       easyApply: await this.hasEasyApply(card)
>     };
>   }
>   
>   // Get detailed job info
>   async getJobDetails(jobUrl: string): Promise<GlassdoorJob> {
>     const page = await this.browser.newPage();
>     await page.goto(jobUrl);
>     
>     return {
>       ...await this.parseBasicInfo(page),
>       description: await page.$eval('[data-test="job-description"]', el => el.innerHTML),
>       companySize: await this.parseCompanySize(page),
>       industry: await this.parseIndustry(page),
>       headquarters: await this.parseHeadquarters(page)
>     };
>   }
>   
>   // Handle pagination
>   protected async hasNextPage(page: Page): Promise<boolean> {
>     return await page.isVisible('[data-test="pagination-next"]');
>   }
>   
>   protected async goToNextPage(page: Page): Promise<void> {
>     await page.click('[data-test="pagination-next"]');
>     await page.waitForLoadState('networkidle');
>   }
> }
> ```
> 
> ### URL Builder Utility
> ```typescript
> // utils/urlBuilder.ts
> export function buildGlassdoorSearchUrl(filters: GlassdoorSearchFilters): string {
>   const baseUrl = 'https://www.glassdoor.com/Job/jobs.htm';
>   const params: Record<string, string> = {};
>   
>   if (filters.keywords) {
>     params.sc = `keywords=' + encodeURIComponent(filters.keywords);
>   }
>   
>   if (filters.location) {
>     params.locT = 'C';
>     params.locKeyword = filters.location;
>   }
>   
>   if (filters.remote) {
>     params.remoteWorkType = '1';
>   }
>   
>   if (filters.salary?.min) {
>     params.minSalary = filters.salary.min.toString();
>   }
>   
>   if (filters.datePosted) {
>     const dateMap = { '24h': '1', 'week': '7', 'month': '30' };
>     params.fromAge = dateMap[filters.datePosted];
>   }
>   
>   if (filters.easyApplyOnly) {
>     params.applicationType = '1';
>   }
>   
>   const queryString = Object.entries(params)
>     .map(([k, v]) => `${k}=${encodeURIComponent(v)}`)
>     .join('&');
>     
>   return `${baseUrl}?${queryString}`;
> }
> ```
> 
> ### Usage Example
> ```typescript
> import { GlassdoorJobScraper } from './GlassdoorJobScraper';
> 
> const scraper = new GlassdoorJobScraper({
>   headless: true,
>   throttleMs: 3000,
>   maxResults: 50
> });
> 
> const jobs = await scraper.search({
>   keywords: 'software engineer',
>   location: 'New York, NY',
>   remote: true,
>   companyRating: 4,      // 4+ stars only
>   easyApplyOnly: true,
>   datePosted: 'week'
> });
> 
> console.log(`Found ${jobs.length} jobs on Glassdoor`);
> 
> // Get detailed info for top job
> const details = await scraper.getJobDetails(jobs[0].url);
> console.log(`Company size: ${details.companySize}`);
> console.log(`Industry: ${details.industry}`);
> 
> await scraper.exportToJson(jobs, './output/glassdoor-jobs.json');
> ```
> 
> ### Platform Orchestrator Integration
> ```typescript
> // In /services/agent-orchestrator/src/JobSearchOrchestrator.ts
> import { LinkedInJobScraper } from '../platform-adapters/linkedin';
> import { GlassdoorJobScraper } from '../platform-adapters/glassdoor';
> import { Job, SearchFilters } from '../platform-adapters/types/job';
> 
> class JobSearchOrchestrator {
>   private scrapers: Map<string, BaseJobScraper> = new Map();
>   
>   constructor() {
>     this.scrapers.set('linkedin', new LinkedInJobScraper({ headless: true }));
>     this.scrapers.set('glassdoor', new GlassdoorJobScraper({ headless: true }));
>   }
>   
>   async searchAllPlatforms(filters: SearchFilters): Promise<Job[]> {
>     const results = await Promise.all(
>       Array.from(this.scrapers.values()).map(s => s.search(filters))
>     );
>     
>     // Deduplicate by job title + company
>     return this.deduplicateJobs(results.flat());
>   }
>   
>   async searchPlatform(platform: string, filters: SearchFilters): Promise<Job[]> {
>     const scraper = this.scrapers.get(platform);
>     if (!scraper) throw new Error(`Unknown platform: ${platform}`);
>     return scraper.search(filters);
>   }
> }
> ```
> 
> ## Acceptance Criteria
> - [ ] Extends BaseJobScraper from LinkedIn adapter
> - [ ] Scrapes 30+ jobs per search query
> - [ ] Parses company ratings and salary estimates
> - [ ] Handles Glassdoor's anti-bot measures (throttling, user-agent)
> - [ ] Easy Apply filter support
> - [ ] Pagination support for large result sets
> - [ ] Unit tests with mock HTML fixtures (>80% coverage)
> - [ ] Integration with JobSearchOrchestrator
> - [ ] README with setup and usage documentation
> - [ ] Dockerfile for containerized deployment

<!-- START COPILOT CODING AGENT TIPS -->
---

✨ Let Copilot coding agent [set things up for you](https://github.com/groupthinking/AJOB4AGENT/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo.

@jazzberry-ai
Copy link

jazzberry-ai bot commented Nov 25, 2025

This repository is associated with groupthinking whose free trial has ended. Subscribe at jazzberry.ai.
If this is an error contact us at [email protected].

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 25, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI changed the title [WIP] Add Glassdoor platform adapter for job scraping Add Glassdoor platform adapter for job scraping Nov 25, 2025
Copilot AI requested a review from groupthinking November 25, 2025 23:52
@groupthinking groupthinking marked this pull request as ready for review November 26, 2025 20:39
Copilot AI review requested due to automatic review settings November 26, 2025 20:39
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

// Phase 1: JobSpy platforms (4 core platforms)
const jobSpyPlatforms = params.platforms.filter(p =>
['indeed', 'linkedin', 'glassdoor', 'ziprecruiter'].includes(p)
);

P1 Badge Glassdoor direct adapter never used in multi-platform search

In searchAllPlatforms the Glassdoor platform is always grouped with the JobSpy MCP platforms and sent to jobSpyAdapter (lines 80‑83), and the method has no branch for the new 'glassdoor-direct' option. As a result, multi-platform searches ignore USE_GLASSDOOR_DIRECT=true and drop any 'glassdoor-direct' entry from params.platforms, returning either JobSpy data or no Glassdoor results at all even though a dedicated GlassdoorAdapter was added elsewhere. Consider routing Glassdoor through the dedicated adapter when requested so the new scraper can actually run during parallel searches.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a comprehensive Glassdoor platform adapter for job scraping, extending the AJOB4AGENT system with a new job source. The implementation follows the established LinkedIn adapter pattern with a Playwright-based scraper, utility functions, comprehensive tests, and integration with the agent orchestrator.

Key Changes

  • New Glassdoor scraper with Playwright-based automation, pagination support, and rate limiting with jitter
  • Comprehensive utility functions for URL building, HTML parsing, and rating/review count parsing
  • 88+ unit tests with HTML fixtures achieving 80%+ coverage targets
  • Agent orchestrator integration with dual routing (dedicated adapter + JobSpy fallback)

Reviewed changes

Copilot reviewed 23 out of 25 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
services/platform-adapters/types/job.ts Shared type definitions for unified Job interface and search filters
services/platform-adapters/glassdoor/src/types.ts Glassdoor-specific types extending base Job interface with rating, salary, and Easy Apply fields
services/platform-adapters/glassdoor/src/GlassdoorJobScraper.ts Core scraper class with modal dismissal, throttling, and pagination logic
services/platform-adapters/glassdoor/src/utils/*.ts URL builder, HTML parser, and rating parser utilities
services/platform-adapters/glassdoor/src/base/* Base scraper class and duplicated type definitions
services/platform-adapters/glassdoor/tests/*.test.ts Comprehensive unit tests for scraper, parsers, and utilities
services/platform-adapters/glassdoor/tests/fixtures/*.html Mock HTML fixtures for testing
services/platform-adapters/glassdoor/package.json Package configuration with dependencies and test scripts
services/platform-adapters/glassdoor/Dockerfile Multi-stage Docker build with Playwright support
services/platform-adapters/glassdoor/README.md Comprehensive documentation with API reference and examples
services/agent-orchestrator/src/adapters/glassdoor-adapter.ts HTTP-based adapter for agent orchestrator (simplified regex parsing)
services/agent-orchestrator/src/adapters/platform-manager.ts Integration with dual routing for glassdoor/glassdoor-direct platforms
.gitignore Excludes TypeScript build artifacts from platform-adapters

Comment on lines 244 to 297
async search(filters: GlassdoorSearchFilters): Promise<GlassdoorJob[]> {
await this.initialize();

const page = await this.newPage();
const allJobs: GlassdoorJob[] = [];

try {
const searchUrl = this.buildSearchUrl(filters);
console.log(`🔍 Searching Glassdoor: ${searchUrl}`);

await this.throttle();
await page.goto(searchUrl, {
waitUntil: 'networkidle',
timeout: this.config.timeout
});

// Handle potential modal or overlay
await this.dismissModals(page);

let pageNum = 1;
const maxResults = this.config.maxResults || 50;

while (allJobs.length < maxResults) {
console.log(`📄 Processing page ${pageNum}...`);

// Handle modals that might appear mid-session
await this.dismissModals(page);

const jobs = await this.parseJobList(page);
allJobs.push(...jobs);

console.log(`✅ Total jobs collected: ${allJobs.length}`);

if (allJobs.length >= maxResults) {
break;
}

const hasMore = await this.hasNextPage(page);
if (!hasMore) {
console.log('📍 No more pages available');
break;
}

await this.throttle();
await this.goToNextPage(page);
pageNum++;
}

return allJobs.slice(0, maxResults);

} finally {
await page.close();
}
}
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search() method is fully overridden in the subclass and doesn't call super.search(), which duplicates pagination logic from the base class. This violates the DRY principle and makes maintenance harder since changes to search logic would need to be duplicated. Consider either calling super.search() and only customizing the modal dismissal, or refactoring the base class to allow hooks for platform-specific behavior.

Copilot uses AI. Check for mistakes.
Comment on lines 156 to 205
private async parseBasicInfoFromDetailPage(page: Page): Promise<GlassdoorJob> {
const title = await page.$eval('[data-test="job-title"], .JobDetails_jobTitle__bFQf_, h1',
el => el.textContent?.trim() || ''
).catch(() => 'Unknown Title');

const company = await page.$eval('[data-test="employer-name"], .EmployerProfile_employerName__Xemli',
el => el.textContent?.trim() || ''
).catch(() => 'Unknown Company');

const location = await page.$eval('[data-test="emp-location"], .JobDetails_location__4j4Qv',
el => el.textContent?.trim() || ''
).catch(() => '');

const salary = await page.$eval('[data-test="detailSalary"], .JobDetails_salaryEstimate__cPQyl',
el => el.textContent?.trim() || ''
).catch(() => undefined);

const ratingText = await page.$eval('[data-test="rating"], .EmployerProfile_ratingValue__2BBWA',
el => el.textContent?.trim() || ''
).catch(() => '');

const reviewCountText = await page.$eval('[data-test="review-count"]',
el => el.textContent?.trim() || ''
).catch(() => '');

const easyApply = await page.$('[data-test="easy-apply"], .JobDetails_easyApply__YZw6j')
.then(el => el !== null)
.catch(() => false);

// Extract job ID from URL
const url = page.url();
const idMatch = url.match(/jl=(\d+)/) || url.match(/-JV_(\d+)/);
const id = idMatch ? idMatch[1] : `gd-${Date.now()}`;

return {
id,
title,
company,
location,
salary: salary || undefined,
companyRating: ratingText ? parseRating(ratingText) : undefined,
reviewCount: reviewCountText ? parseReviewCount(reviewCountText) : undefined,
easyApply,
salaryEstimate: salary ? parseSalaryEstimate(salary) : undefined,
url,
tags: [],
source: 'glassdoor',
scrapedAt: new Date().toISOString()
};
}
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parseBasicInfoFromDetailPage method returns a GlassdoorJob object with potentially empty/default values when elements are not found (lines 157-203). However, the method name suggests it only parses "basic info", not a complete job object. This could lead to confusion about what data is guaranteed to be present. Consider renaming to parseJobFromDetailPage or returning a partial type to clarify expectations.

Copilot uses AI. Check for mistakes.
Comment on lines 97 to 98
const errorMessage = error instanceof Error ? error.message : 'Unknown error';
console.error('❌ Glassdoor search failed:', errorMessage);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing error types in the catch block - using Error type check but also catching all errors with unknown. The pattern on line 97 error instanceof Error ? error.message : 'Unknown error' is correct, but this could be simplified using optional chaining: (error as Error)?.message || 'Unknown error'. More importantly, errors should be logged with full stack traces for debugging, not just the message.

Suggested change
const errorMessage = error instanceof Error ? error.message : 'Unknown error';
console.error('❌ Glassdoor search failed:', errorMessage);
const errorMessage = (error as Error)?.message || 'Unknown error';
console.error('❌ Glassdoor search failed:', errorMessage, error);

Copilot uses AI. Check for mistakes.
Comment on lines 49 to 72
private async rateLimitInterceptor(config: any) {
const now = Date.now();
const timeSinceLastRequest = now - this.lastRequestTime;

if (timeSinceLastRequest < this.minRequestInterval) {
const delay = this.minRequestInterval - timeSinceLastRequest;
console.log(`⏳ Glassdoor rate limiting: waiting ${delay}ms`);
await new Promise(resolve => setTimeout(resolve, delay));
}

if (this.requestCount >= this.maxRequestsPerWindow) {
const windowReset = this.lastRequestTime + this.rateLimitWindow;
if (now < windowReset) {
const delay = windowReset - now;
console.log(`⚠️ Glassdoor rate limit reached: waiting ${Math.ceil(delay / 1000)}s`);
await new Promise(resolve => setTimeout(resolve, delay));
this.requestCount = 0;
}
}

this.lastRequestTime = Date.now();
this.requestCount++;

return config;
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rate limiting logic has a potential race condition. The requestCount is checked and incremented in the interceptor (lines 59-70), but if multiple requests are made concurrently, they could all pass the check before any increments the counter. Consider using an atomic operation or mutex to ensure thread-safety, or document that this adapter should not be used concurrently.

Copilot uses AI. Check for mistakes.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot @claude Provide a fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 2dca7cc. Implemented a mutex pattern using Promise chaining for thread-safe rate limiting:

  • Added acquireRateLimitLock() method that ensures only one request processes rate limiting at a time
  • Uses Promise chaining to serialize access - each request waits for the previous to complete before checking/updating counters
  • Added windowStartTime for proper rate limit window tracking
  • Updated getRateLimitStatus() to use the correct window timing

The adapter is now safe for concurrent access.

Comment on lines 56 to 234
console.log('⚠️ No job listings found on page');
return [];
}

// Get page HTML and parse job cards
const html = await page.content();
const rawJobs = parseJobListHtml(html);

// Transform raw job cards to GlassdoorJob format
const jobs: GlassdoorJob[] = rawJobs.map(raw => {
const job = transformRawJobCard(raw);

// Parse salary estimate if available
if (raw.salary) {
job.salaryEstimate = parseSalaryEstimate(raw.salary);
}

return job;
});

console.log(`📋 Parsed ${jobs.length} jobs from current page`);
return jobs;
}

/**
* Check if there's a next page available
*/
protected async hasNextPage(page: Page): Promise<boolean> {
try {
const html = await page.content();
return hasNextPageButton(html);
} catch {
return false;
}
}

/**
* Navigate to the next page
*/
protected async goToNextPage(page: Page): Promise<void> {
const nextButton = await page.$('[data-test="pagination-next"], .nextButton, [aria-label="Next"]');

if (nextButton) {
await nextButton.click();
await page.waitForLoadState('networkidle');
}
}

/**
* Get detailed job information from a job detail page
*/
async getJobDetails(jobUrl: string): Promise<GlassdoorJob> {
await this.initialize();

const page = await this.newPage();

try {
await this.throttle();
await page.goto(jobUrl, {
waitUntil: 'networkidle',
timeout: this.config.timeout
});

// Wait for job description to load
await page.waitForSelector('[data-test="job-description"], .JobDetails_jobDescription__uW_fK, #JobDescriptionContainer', {
timeout: 10000
}).catch(() => {
console.log('⚠️ Job description selector not found, continuing...');
});

// Parse the page content
const html = await page.content();
const detail = parseJobDetailHtml(html);

// Get basic info from the page
const basicInfo = await this.parseBasicInfoFromDetailPage(page);

return {
...basicInfo,
descriptionHtml: detail.description,
description: this.stripHtml(detail.description),
companySize: detail.companySize,
industry: detail.industry,
headquarters: detail.headquarters,
benefits: detail.benefits,
skills: detail.skills,
employmentType: detail.employmentType,
url: jobUrl,
source: 'glassdoor',
scrapedAt: new Date().toISOString()
};

} finally {
await page.close();
}
}

/**
* Parse basic job info from the detail page
*/
private async parseBasicInfoFromDetailPage(page: Page): Promise<GlassdoorJob> {
const title = await page.$eval('[data-test="job-title"], .JobDetails_jobTitle__bFQf_, h1',
el => el.textContent?.trim() || ''
).catch(() => 'Unknown Title');

const company = await page.$eval('[data-test="employer-name"], .EmployerProfile_employerName__Xemli',
el => el.textContent?.trim() || ''
).catch(() => 'Unknown Company');

const location = await page.$eval('[data-test="emp-location"], .JobDetails_location__4j4Qv',
el => el.textContent?.trim() || ''
).catch(() => '');

const salary = await page.$eval('[data-test="detailSalary"], .JobDetails_salaryEstimate__cPQyl',
el => el.textContent?.trim() || ''
).catch(() => undefined);

const ratingText = await page.$eval('[data-test="rating"], .EmployerProfile_ratingValue__2BBWA',
el => el.textContent?.trim() || ''
).catch(() => '');

const reviewCountText = await page.$eval('[data-test="review-count"]',
el => el.textContent?.trim() || ''
).catch(() => '');

const easyApply = await page.$('[data-test="easy-apply"], .JobDetails_easyApply__YZw6j')
.then(el => el !== null)
.catch(() => false);

// Extract job ID from URL
const url = page.url();
const idMatch = url.match(/jl=(\d+)/) || url.match(/-JV_(\d+)/);
const id = idMatch ? idMatch[1] : `gd-${Date.now()}`;

return {
id,
title,
company,
location,
salary: salary || undefined,
companyRating: ratingText ? parseRating(ratingText) : undefined,
reviewCount: reviewCountText ? parseReviewCount(reviewCountText) : undefined,
easyApply,
salaryEstimate: salary ? parseSalaryEstimate(salary) : undefined,
url,
tags: [],
source: 'glassdoor',
scrapedAt: new Date().toISOString()
};
}

/**
* Strip HTML tags from a string
*/
private stripHtml(html: string): string {
return html
.replace(/<[^>]*>/g, ' ')
.replace(/\s+/g, ' ')
.trim();
}

/**
* Handle Glassdoor's anti-bot measures
* Override the throttle method with more aggressive rate limiting
*/
protected async throttle(): Promise<void> {
const now = Date.now();
const elapsed = now - this.lastRequestTime;

// Use a longer throttle time for Glassdoor (default 3 seconds)
const throttleMs = this.config.throttleMs || 3000;

// Add some randomness to avoid detection patterns
const jitter = Math.floor(Math.random() * 1000);
const totalDelay = throttleMs + jitter;

if (elapsed < totalDelay) {
const delay = totalDelay - elapsed;
console.log(`⏳ Throttling: waiting ${delay}ms`);
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The console.log statements should be replaced with proper logging using a logger instance. Using console.log directly in production code makes it difficult to control log levels, format output consistently, or redirect logs to different destinations. Consider using a logging library or creating a logger interface.

Copilot uses AI. Check for mistakes.
await new Promise(resolve => setTimeout(resolve, delay));
}

this.lastRequestTime = Date.now();
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lastRequestTime property is declared as protected in the base class, but the throttle method in GlassdoorJobScraper (line 223) attempts to access it. While this is technically correct for protected members, the overridden throttle() method never calls super.throttle(), which means the base class's lastRequestTime won't be updated. This could cause issues if other methods in the base class rely on lastRequestTime being current.

Suggested change
this.lastRequestTime = Date.now();
this.lastRequestTime = Date.now();
await super.throttle();

Copilot uses AI. Check for mistakes.
@vercel
Copy link

vercel bot commented Nov 26, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
ajob-4-agent Ready Ready Preview, Comment Dec 20, 2025 2:11am

@groupthinking
Copy link
Owner

@copilot apply changes based on the comments in this thread

…efactor search() to use hooks, add thread-safety docs

Co-authored-by: groupthinking <[email protected]>
Copy link
Contributor Author

Copilot AI commented Nov 26, 2025

@copilot apply changes based on the comments in this thread

Applied all code review changes in commit d54b047:

  • Updated User-Agent to Chrome/124, made configurable via SCRAPER_USER_AGENT env var
  • Removed duplicate playwright from dependencies (kept only in peerDependencies)
  • Renamed parseBasicInfoFromDetailPageparseJobFromDetailPage with clarified docs
  • Refactored throttle() to properly call super.throttle()
  • Refactored search() to use new beforeParseJobList() hook instead of duplicating base class logic
  • Added thread-safety documentation to GlassdoorAdapter
  • Improved error logging to include full stack traces

All 88 tests still passing.

groupthinking and others added 2 commits November 26, 2025 21:28
Co-authored-by: Copilot <[email protected]>
…tead of fragile regex patterns

Co-authored-by: groupthinking <[email protected]>
Copilot AI and others added 2 commits November 27, 2025 03:47
Co-authored-by: vercel[bot] <35613825+vercel[bot]@users.noreply.github.com>
groupthinking
groupthinking previously approved these changes Nov 27, 2025
const searchParams: Record<string, string> = {};

if (params.searchTerm) {
searchParams.sc = 'keyword=' + encodeURIComponent(params.searchTerm);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search parameter construction double-encodes the search term, causing malformed query parameters that Glassdoor won't understand.

View Details
📝 Patch Details
diff --git a/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts b/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts
index 45e1593..5e70728 100644
--- a/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts
+++ b/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts
@@ -166,7 +166,7 @@ export class GlassdoorAdapter {
     const searchParams: Record<string, string> = {};
 
     if (params.searchTerm) {
-      searchParams.sc = 'keyword=' + encodeURIComponent(params.searchTerm);
+      searchParams.sc = 'keyword=' + params.searchTerm;
     }
 
     if (params.location) {
diff --git a/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts b/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
index 83a9553..6776cb7 100644
--- a/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
+++ b/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
@@ -55,7 +55,7 @@ export function buildGlassdoorSearchUrl(filters: GlassdoorSearchFilters): string
 
   // Keywords/search term
   if (filters.keywords) {
-    params.sc = 'keyword=' + encodeURIComponent(filters.keywords);
+    params.sc = 'keyword=' + filters.keywords;
   }
 
   // Location handling

Analysis

Double-encoding of search parameters causes malformed Glassdoor API requests

What fails: Search parameter construction in GlassdoorAdapter.buildSearchParams() and buildGlassdoorSearchUrl() double-encodes the search term, resulting in malformed query parameters that Glassdoor cannot parse.

How to reproduce:

// In services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
const filters = { keywords: 'software engineer' };
const url = buildGlassdoorSearchUrl(filters);
// Result: ?sc=keyword%3Dsoftware%2520engineer

What happens: When axios/URL API receives a pre-encoded parameter value like sc='keyword=software%20engineer', it automatically URL-encodes all parameter values again, resulting in:

  • Original space: 20 → encoded to %20 → re-encoded to %2520
  • Original = sign: → encoded to %3D
  • Final URL: ?sc=keyword%3Dsoftware%2520engineer
  • Glassdoor receives: sc parameter value = "keyword=software%20engineer" (with literal %20 instead of space)

Expected behavior: axios should handle all URL encoding automatically. Parameters passed to axios should NOT be pre-encoded. Let axios/URL API do the single encoding pass:

  • With fix: ?sc=keyword%3Dsoftware+engineer
  • Glassdoor receives: sc parameter value = "keyword=software engineer" (properly decoded)

Fix applied: Removed encodeURIComponent() calls in both:

  1. services/agent-orchestrator/src/adapters/glassdoor-adapter.ts line 169
  2. services/platform-adapters/glassdoor/src/utils/urlBuilder.ts line 58

Changed from: searchParams.sc = 'keyword=' + encodeURIComponent(params.searchTerm); Changed to: searchParams.sc = 'keyword=' + params.searchTerm;

This allows axios to perform a single, correct URL encoding rather than double-encoding the parameter values.

Verification: All existing tests pass, URL now correctly generates sc=keyword%3Dsoftware+engineer instead of sc=keyword%3Dsoftware%2520engineer

Comment on lines +58 to +126
params.sc = 'keyword=' + encodeURIComponent(filters.keywords);
}

// Location handling
if (filters.location) {
params.locT = 'C'; // City type
params.locKeyword = filters.location;
}

// Remote work filter
if (filters.remote) {
params.remoteWorkType = '1';
}

// Salary filter
if (filters.salary?.min) {
params.minSalary = filters.salary.min.toString();
}
if (filters.salary?.max) {
params.maxSalary = filters.salary.max.toString();
}

// Date posted filter
const fromAge = mapDatePosted(filters.datePosted);
if (fromAge) {
params.fromAge = fromAge;
}

// Easy Apply filter
if (filters.easyApplyOnly) {
params.applicationType = '1';
}

// Company rating filter (minimum stars)
if (
filters.companyRating !== undefined &&
filters.companyRating !== null &&
filters.companyRating >= 1 &&
filters.companyRating <= 5
) {
params.minRating = filters.companyRating.toString();
}

// Company size filter
const employeeCount = mapCompanySize(filters.companySize);
if (employeeCount) {
params.employerSizes = employeeCount;
}

// Job type filter
const jobType = mapJobType(filters.jobType);
if (jobType) {
params.jobType = jobType;
}

// Industry filter
if (filters.industry) {
params.industry = filters.industry;
}

// Company name filter
if (filters.companyName) {
params.employer = filters.companyName;
}

// Build query string
const queryString = Object.entries(params)
.map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
.join('&');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL builder double-encodes the search term by first encoding it, then encoding it again when building the final query string.

View Details
📝 Patch Details
diff --git a/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts b/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts
index 45e1593..5e70728 100644
--- a/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts
+++ b/services/agent-orchestrator/src/adapters/glassdoor-adapter.ts
@@ -166,7 +166,7 @@ export class GlassdoorAdapter {
     const searchParams: Record<string, string> = {};
 
     if (params.searchTerm) {
-      searchParams.sc = 'keyword=' + encodeURIComponent(params.searchTerm);
+      searchParams.sc = 'keyword=' + params.searchTerm;
     }
 
     if (params.location) {
diff --git a/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts b/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
index 83a9553..6776cb7 100644
--- a/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
+++ b/services/platform-adapters/glassdoor/src/utils/urlBuilder.ts
@@ -55,7 +55,7 @@ export function buildGlassdoorSearchUrl(filters: GlassdoorSearchFilters): string
 
   // Keywords/search term
   if (filters.keywords) {
-    params.sc = 'keyword=' + encodeURIComponent(filters.keywords);
+    params.sc = 'keyword=' + filters.keywords;
   }
 
   // Location handling

Analysis

Double-encoding of search terms in Glassdoor URL builder

What fails: The URL builder pre-encodes the search term keyword using encodeURIComponent(), then encodes it again when building the query string. This results in %20 (space character) being double-encoded as %2520, which malforms the search parameters sent to Glassdoor.

How to reproduce:

import { buildGlassdoorSearchUrl } from './services/platform-adapters/glassdoor/src/utils/urlBuilder';

const filters = { keywords: 'python developer' };
const url = buildGlassdoorSearchUrl(filters);
console.log(url);
// Actual:   https://www.glassdoor.com/Job/jobs.htm?sc=keyword%3Dpython%2520developer
// Expected: https://www.glassdoor.com/Job/jobs.htm?sc=keyword%3Dpython%20developer

What happens vs. expected behavior:

  • Current (broken): sc=keyword%3Dpython%2520developer - the space becomes %2520 (double-encoded)
  • When decoded once: keyword=python%20developer (still has encoded space)
  • Expected: sc=keyword%3Dpython%20developer - space should be %20 only (single-encoded)

Root cause: At line 58 of services/platform-adapters/glassdoor/src/utils/urlBuilder.ts, the code pre-encodes the search term:

params.sc = 'keyword=' + encodeURIComponent(filters.keywords);

Then at lines 124-126, it encodes all parameter values again:

const queryString = Object.entries(params)
  .map(([key, value]) => ` 

This causes double-encoding of the entire keyword= parameter value. Since axios automatically applies encodeURIComponent to param values, the same issue exists in services/agent-orchestrator/src/adapters/glassdoor-adapter.ts at line 169.

Fix: Remove the pre-encoding at the parameter construction stage and let the query string builder handle all encoding:

In services/platform-adapters/glassdoor/src/utils/urlBuilder.ts (line 58):

params.sc = 'keyword=' + filters.keywords;  // Remove encodeURIComponent

In services/agent-orchestrator/src/adapters/glassdoor-adapter.ts (line 169):

searchParams.sc = 'keyword=' + params.searchTerm;  // Remove encodeURIComponent

Both locations use the same pattern and require the same fix.

@groupthinking
Copy link
Owner

@copilot Fix this repo

Copy link
Contributor Author

Copilot AI commented Dec 20, 2025

@copilot Fix this repo

I see build errors in both services. The main issues are:

Glassdoor adapter:

  • Missing dependencies: playwright, cheerio, @types/node
  • TypeScript configuration needs adjustment (lib settings for DOM/ES2019+)

Agent orchestrator:

  • Missing type definitions: @types/node, @types/jest
  • Missing dependencies: axios, cheerio
  • Similar TypeScript lib configuration issues

Would you like me to:

  1. Fix all missing dependencies and type definitions?
  2. Update TypeScript configurations to resolve lib/target issues?
  3. Address any specific functionality issues?

Or should I focus on a specific area?

…nd playwright dev dependency

Co-authored-by: groupthinking <[email protected]>
Comment on lines 93 to 136
/**
* Export jobs to JSON file
* Search for jobs with the given filters
*/
async exportToJson(jobs: Job[], filePath: string): Promise<void> {
const fs = await import('fs/promises');
const path = await import('path');
async search(filters: F): Promise<T[]> {
await this.initialize();

const page = await this.newPage();
const allJobs: T[] = [];

try {
const searchUrl = this.buildSearchUrl(filters);
await this.throttle();
await page.goto(searchUrl, {
waitUntil: 'networkidle',
timeout: this.config.timeout
});

// Ensure directory exists
const dir = path.dirname(filePath);
await fs.mkdir(dir, { recursive: true });
let pageNum = 1;
const maxResults = this.config.maxResults || 50;

// Write jobs to file
while (allJobs.length < maxResults) {
const jobs = await this.parseJobList(page);
allJobs.push(...jobs);

if (allJobs.length >= maxResults) {
break;
}

const hasMore = await this.hasNextPage(page);
if (!hasMore) {
break;
}

await this.throttle();
await this.goToNextPage(page);
pageNum++;
}

return allJobs.slice(0, maxResults);

} finally {
await page.close();
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared BaseJobScraper class is missing the beforeParseJobList() hook and its integration into the search() method, making it architecturally incomplete compared to the improved version in the Glassdoor module.

View Details
📝 Patch Details
diff --git a/services/platform-adapters/base/BaseJobScraper.ts b/services/platform-adapters/base/BaseJobScraper.ts
index 4f7c267..9c4ddb8 100644
--- a/services/platform-adapters/base/BaseJobScraper.ts
+++ b/services/platform-adapters/base/BaseJobScraper.ts
@@ -90,6 +90,14 @@ export abstract class BaseJobScraper<T extends Job = Job, F extends SearchFilter
     this.lastRequestTime = Date.now();
   }
 
+  /**
+   * Hook called before parsing job list on each page.
+   * Subclasses can override to add platform-specific behavior (e.g., dismissing modals).
+   */
+  protected async beforeParseJobList(_page: Page): Promise<void> {
+    // Default implementation does nothing
+  }
+
   /**
    * Search for jobs with the given filters
    */
@@ -107,10 +115,16 @@ export abstract class BaseJobScraper<T extends Job = Job, F extends SearchFilter
         timeout: this.config.timeout 
       });
 
+      // Allow subclasses to perform platform-specific setup
+      await this.beforeParseJobList(page);
+
       let pageNum = 1;
       const maxResults = this.config.maxResults || 50;
 
       while (allJobs.length < maxResults) {
+        // Allow subclasses to handle modals or other interruptions before parsing
+        await this.beforeParseJobList(page);
+        
         const jobs = await this.parseJobList(page);
         allJobs.push(...jobs);
 

Analysis

Missing beforeParseJobList() hook in root BaseJobScraper class

What fails: The root BaseJobScraper class in services/platform-adapters/base/BaseJobScraper.ts is missing the beforeParseJobList() extension hook and its integration into the search() method, causing architectural inconsistency with the improved Glassdoor adapter implementation.

How to reproduce: Compare the two files:

  • Root base: services/platform-adapters/base/BaseJobScraper.ts
  • Glassdoor base: services/platform-adapters/glassdoor/src/base/BaseJobScraper.ts

The Glassdoor version includes (lines 93-136 in complete form):

  1. A protected beforeParseJobList(_page: Page): Promise<void> hook method with default no-op implementation
  2. A call to await this.beforeParseJobList(page) after the initial page navigation
  3. A call to await this.beforeParseJobList(page) before each iteration of the job parsing loop

The root base class was missing all three elements.

Result before fix: Subclasses using the root base class cannot override beforeParseJobList() to handle platform-specific setup (like dismissing modals) without reimplementing the entire search() method.

Expected behavior: Both base classes should be architecturally identical, providing the same extension points. The beforeParseJobList() hook allows subclasses to add platform-specific behavior (demonstrated in GlassdoorJobScraper.beforeParseJobList() which dismisses modals) without needing to override the entire search() method.

Fix applied: Added the missing hook method and integrated it into the search() method at two points:

  1. After initial page load (allows platform-specific setup before first parse)
  2. Before each parsing loop iteration (allows cleanup between pages, e.g., dismissing modals)

This provides a consistent, extensible base class for all platform adapters and matches the architectural pattern established in the Glassdoor implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Glassdoor Job Scraper Adapter

2 participants