Regex and AST patterns for extracting API endpoints and authentication details from JavaScript bundles and HTML files.
- HTML Patterns
- JavaScript AST Patterns
- JavaScript Regex Patterns
- Auth Header Patterns
- Pattern Ranking & Analysis
## HTML Patterns

### P1: `<script>` src Attributes
Pattern:
```
<script[^>]+src=["']([^"']+?)["']
```
Examples Found:
```html
<script src="https://api.example.com/v1/config.js"></script>
<script src="/api/static/bundle.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
```
Pros:
- Extremely high precision: matches are almost always real script URLs, i.e. the bundles worth scanning for endpoints
- Captures both absolute and relative URLs
- Simple and fast
- Low false positive rate
Cons:
- May include CDN URLs that are not API endpoints (filtering required)
- Won't catch inline script endpoints
- May miss data attributes with endpoint info
False Positives:
- CDN library URLs (e.g., cdn.jsdelivr.net, cdnjs.cloudflare.com)
- Static asset URLs (e.g., .css, .png, .jpg)
- Third-party analytics scripts (e.g., google-analytics.com)
Confidence Score: 9.5/10
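A minimal Python sketch of P1 combined with the CDN and analytics filtering described above. The host list is an illustrative assumption built from the false-positive examples, not a complete blocklist; hosts are matched by exact name or subdomain, so relative URLs always pass through.

```python
import re
from urllib.parse import urlparse

# P1 pattern; IGNORECASE also catches upper-case tags such as <SCRIPT SRC=...>.
SCRIPT_SRC = re.compile(r"""<script[^>]+src=["']([^"']+?)["']""", re.IGNORECASE)

# Illustrative blocklist (assumption); subdomains of these hosts are blocked too.
THIRD_PARTY_HOSTS = ("cdn.jsdelivr.net", "cdnjs.cloudflare.com", "unpkg.com",
                     "google-analytics.com", "googletagmanager.com")


def is_third_party(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()  # relative URLs have no host
    return any(host == h or host.endswith("." + h) for h in THIRD_PARTY_HOSTS)


def script_sources(html: str) -> list[str]:
    """Return first-party script URLs: the bundles worth scanning for endpoints."""
    return [src for src in SCRIPT_SRC.findall(html) if not is_third_party(src)]


page = """
<script src="https://api.example.com/v1/config.js"></script>
<script src="/api/static/bundle.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
"""
print(script_sources(page))
# ['https://api.example.com/v1/config.js', '/api/static/bundle.min.js']
```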
### P2: `<a>` href Attributes
Pattern:
```
<a[^>]+href=["']([^"']*(?:api|v[0-9]+|rest|graphql|endpoint)[^"']*)["']
```
Examples Found:
```html
<a href="https://api.example.com/v1/users">Users API</a>
<a href="/api/v2/products">Products</a>
<a href="https://example.com/rest/search">Search</a>
```
Pros:
- Captures documented API endpoints in navigation
- Good for discovering public APIs
- Works with anchor tags in documentation pages
Cons:
- May include navigation links that aren't actual API endpoints
- Requires filtering for /api/, /v1/, /rest/ keywords
- Misses hidden/internal endpoints
False Positives:
- Documentation links (e.g., href="/api/docs")
- Help pages (e.g., href="/api/v1/help")
- Section anchors (e.g., href="#api-section")
Confidence Score: 7/10
### P3: `<form>` action Attributes
Pattern:
```
<form[^>]+action=["']([^"']+)["'][^>]*>
```
Examples Found:
```html
<form action="https://api.example.com/v1/submit" method="POST">
<form action="/api/login" method="post">
<form action="/graphql" method="POST">
```
Pros:
- Captures form submission endpoints (often critical APIs)
- Can discover POST endpoints that are otherwise hidden
- The enclosing tag usually carries a method attribute, useful for HTTP verb discovery (the pattern itself captures only the action URL)
Cons:
- May include non-API forms (search, contact, newsletter)
- Some forms submit to frontend handlers, not backend APIs
- AJAX forms may not use action attribute
False Positives:
- Frontend-only forms (e.g., search bar filters)
- Newsletter signup forms (e.g., form.substack.com)
- Contact forms using third-party services
Confidence Score: 6/10
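Because the P3 regex captures only the action URL, the method has to be read separately. A minimal sketch that swaps the regex for Python's standard `html.parser` (a different technique from the pattern above), which reads both attributes regardless of their order:

```python
from html.parser import HTMLParser


class FormActionParser(HTMLParser):
    """Collect (METHOD, action) pairs from every <form> tag in a document."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":  # html.parser lower-cases tag names
            return
        attributes = dict(attrs)
        action = attributes.get("action")
        if not action:
            return  # AJAX forms often omit action; nothing to record
        # HTML forms submit with GET when the method attribute is missing.
        method = (attributes.get("method") or "get").upper()
        self.forms.append((method, action))


def form_endpoints(html: str) -> list[tuple[str, str]]:
    parser = FormActionParser()
    parser.feed(html)
    return parser.forms


page = """
<form action="https://api.example.com/v1/submit" method="POST">
<form action="/api/login" method="post">
<form action="/graphql" method="POST">
<form action="/search">
"""
print(form_endpoints(page))
```

Normalizing the method to upper case makes `post` and `POST` deduplicate cleanly; the last form shows the implicit GET default.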
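### P4: `<img>` src Attributes
Pattern:
```
<img[^>]+src=["'](?![^"']*\.(?:jpe?g|png|gif|svg|webp)["'])([^"']*(?:api|avatar|profile|image|upload)[^"']*)["']
```
The image-extension check is a negative lookahead at the start of the value. A lookbehind such as `(?<!\.(?:jpg|jpeg))` is variable-width, which Python's `re` module rejects.
Examples Found:
```html
<img src="https://api.example.com/v1/avatar/user123">
<img src="/api/v2/image/preview/abc123">
```
Pros: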
- Can discover image upload/processing APIs
- Good for finding avatar/profile endpoints
Cons:
- High false positive rate (most `<img>` src values are static images)
- Requires negative lookahead to exclude actual image files
- Many image APIs use dynamic URLs with IDs/tokens
False Positives:
- Static image files (e.g., /images/logo.png)
- CDN-hosted images
- Placeholder images
Confidence Score: 4/10
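A quick Python check of the P4 pattern in its lookahead form: extension-less API URLs are kept, and plain image files are rejected even when their path contains a keyword such as `images` or `avatar`. The sample tags are made up for illustration.

```python
import re

IMG_API_SRC = re.compile(
    r"""<img[^>]+src=["'](?![^"']*\.(?:jpe?g|png|gif|svg|webp)["'])"""
    r"""([^"']*(?:api|avatar|profile|image|upload)[^"']*)["']"""
)


def api_image_src(tag: str):
    """Return the API-looking src of an <img> tag, or None for static images."""
    match = IMG_API_SRC.search(tag)
    return match.group(1) if match else None


tags = [
    '<img src="https://api.example.com/v1/avatar/user123">',
    '<img src="/api/v2/image/preview/abc123">',
    '<img src="/images/logo.png">',
    '<img src="https://cdn.example.com/avatar/42.webp">',
]
print([api_image_src(tag) for tag in tags])
```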
### P5: `<meta>` API Configuration Tags
Pattern (apply with the case-insensitive flag):
```
<meta[^>]+(?:name|property|http-equiv)=["'](?:x-)?api[-:_]?(?:url|endpoint|base)["'][^>]+content=["']([^"']+)["']
```
Examples Found:
```html
<meta name="api-url" content="https://api.example.com/v1">
<meta property="api:base" content="https://example.com/api">
<meta http-equiv="X-API-Endpoint" content="/graphql">
```
Pros:
- Captures configuration-level API endpoints
- Good for discovering base URLs
- Often includes versioning information
Cons:
- Rare pattern (not commonly used)
- May include example/placeholder URLs in documentation
- Requires specific meta tag names
False Positives:
- Placeholder URLs in templates
- Documentation meta tags
- Example URLs in code snippets
Confidence Score: 5/10
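A short Python check that the P5 pattern, compiled with `re.IGNORECASE`, matches all three example tags (including the `api:base` and `X-API-Endpoint` spellings) while ignoring ordinary meta tags. The extra description tag is a made-up negative case.

```python
import re

META_API = re.compile(
    r"""<meta[^>]+(?:name|property|http-equiv)=["'](?:x-)?api[-:_]?(?:url|endpoint|base)["']"""
    r"""[^>]+content=["']([^"']+)["']""",
    re.IGNORECASE,
)

head = """
<meta name="api-url" content="https://api.example.com/v1">
<meta property="api:base" content="https://example.com/api">
<meta http-equiv="X-API-Endpoint" content="/graphql">
<meta name="description" content="Our public API">
"""
print(META_API.findall(head))
# ['https://api.example.com/v1', 'https://example.com/api', '/graphql']
```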
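### P6: `<link>` Discovery Links
Pattern:
```
<link(?=[^>]*(?:rel=["'](?:alternate|api|endpoint)|href=["'][^"']*(?:api|v[0-9]+)))[^>]*href=["']([^"']+)["']
```
The lookahead tests the rel/href condition anywhere in the tag, so attribute order does not matter and the tag needs only one href.
Examples Found:
```html
<link rel="api" href="https://api.example.com/v1">
<link rel="alternate" href="https://example.com/api/v1/feed.xml">
```
Pros: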
- Captures API discovery links (HAL, HATEOAS patterns)
- Good for REST API root endpoints
- Can find GraphQL endpoints via rel="api"
Cons:
- Rare pattern in modern SPAs
- May include non-API alternate links
- Requires understanding of link relationships
False Positives:
- RSS/Atom feeds (e.g., href="/api/feed.xml")
- Sitemap links (e.g., href="/sitemap.xml")
- PWA manifest links
Confidence Score: 5.5/10
### P7: Data Attributes
Pattern:
```
data-(?:api|endpoint|url|action)=["']([^"']+)["']
```
Examples Found:
```html
<div data-api="https://api.example.com/v1/users">
<div data-endpoint="/api/v2/products">
<button data-action="/api/submit">
```
Pros:
- Common pattern in modern frameworks (React, Vue, Angular)
- Captures endpoints bound to DOM elements
- Good for discovering click handlers
Cons:
- May include non-API data attributes
- Requires context to distinguish from other uses
- Attribute names are app-specific (data-* is standard HTML, but there is no standard name for endpoint attributes)
False Positives:
- Frontend routing data (e.g., data-route="/home")
- Analytics tracking data (e.g., data-track="button_click")
- UI state data (e.g., data-toggle="modal")
Confidence Score: 6/10
## JavaScript AST Patterns

### J1: Fetch API Calls
AST Pattern (ast_grep_search):
```
fetch($URL, $$$OPTIONS)
```
Examples Found:
```javascript
fetch('https://api.example.com/v1/users')
fetch('/api/v2/products', { method: 'POST' })
fetch(`${API_BASE}/items/${itemId}`)
fetch(new URL('/endpoint', window.location.origin))
```
Pros:
- Captures modern fetch() API usage
- Includes both URL and options (method, headers, body)
- Works with string concatenation and template literals
- Can extract HTTP verbs from options
Cons:
- May miss wrapped fetch calls (e.g., custom http() function)
- Dynamic URL construction may require additional analysis
- Some fetch calls use variables that need tracing
False Positives:
- Fetching static assets (e.g., fetch('/data.json'))
- Fetching local files in dev environments
- Service worker cache updates
Confidence Score: 9/10
### J2: XMLHttpRequest
AST Pattern:
```
new XMLHttpRequest()
```
Follow-up Pattern (for the open() call):
```
$XHR.open($METHOD, $URL, $$$ARGS)
```
Examples Found:
```javascript
const xhr = new XMLHttpRequest()
xhr.open('GET', 'https://api.example.com/v1/users')
xhr.open('POST', '/api/v2/products')
```
Pros:
- Captures legacy XHR usage
- Includes HTTP method from open() call
- Works in older codebases
- Can find XHR wrappers
Cons:
- Verbose pattern (requires tracking variable)
- Some XHR calls are in wrapper functions
- URL and method may be dynamic
False Positives:
- XHR calls for local file loading
- SSE (Server-Sent Events) connections
- Upload progress polling
Confidence Score: 8.5/10
### J3: Axios Methods
AST Pattern:
```
axios.get($URL, $$$OPTIONS)
axios.post($URL, $$$DATA, $$$OPTIONS)
axios.put($URL, $$$DATA, $$$OPTIONS)
axios.patch($URL, $$$DATA, $$$OPTIONS)
axios.delete($URL, $$$OPTIONS)
axios.request($OPTIONS)
```
Examples Found:
```javascript
axios.get('https://api.example.com/v1/users')
axios.post('/api/v2/products', { name: 'New Product' })
axios.patch(`${API_BASE}/items/${itemId}`, { status: 'active' })
axios.request({
  method: 'GET',
  url: 'https://api.example.com/v1/data'
})
```
Pros:
- Explicit HTTP verbs
- Very common in modern web apps
- Clean URL extraction
- Options object often contains headers/auth
Cons:
- May miss wrapped axios instances (e.g., api.get())
- Some apps use custom axios wrappers
- Base URL may be in axios instance config
False Positives:
- Mock data requests in tests
- Axios instance creation (not actual requests)
- Interceptor configurations
Confidence Score: 9.5/10
### J4: jQuery AJAX
AST Pattern:
```
$.ajax($$$OPTIONS)
$.get($URL, $$$ARGS)
$.post($URL, $$$DATA, $$$ARGS)
$.getJSON($URL, $$$ARGS)
```
Examples Found:
```javascript
$.ajax({ url: 'https://api.example.com/v1/users', method: 'GET' })
$.get('/api/v2/products', function(data) { ... })
$.post('/api/submit', formData)
```
Pros:
- Common in older web apps and WordPress
- URL is often clearly specified in options
- Can extract method from options object
Cons:
- URL may be in options property
- Some jQuery calls are for DOM manipulation, not API
- Less common in modern apps
False Positives:
- Loading HTML fragments
- JSONP requests (different mechanism)
- Static asset loading
Confidence Score: 8/10
### J5: WebSocket Connections
AST Pattern:
```
new WebSocket($URL, $$$PROTOCOLS)
new WebSocket(`wss://${HOST}/path`)
```
Examples Found:
```javascript
const ws = new WebSocket('wss://api.example.com/v1/realtime')
new WebSocket(`${WS_URL}/notifications`)
new WebSocket('ws://localhost:8080/chat')
```
Pros:
- Explicit WebSocket endpoint discovery
- WSS vs WS indicates secure/non-secure
- URL is always the first argument
Cons:
- May miss wrapped WebSocket classes
- Dynamic URL construction requires variable tracing
- Some WS connections are to local dev servers
False Positives:
- WebSocket test connections
- Local development servers
- WebSocket service mocks
Confidence Score: 9/10
### J6: GraphQL Clients
AST Pattern (Apollo Client):
```
$CLIENT.query({ query: $QUERY, variables: $VARS, $$$OPTIONS })
$CLIENT.mutate({ mutation: $MUTATION, variables: $VARS, $$$OPTIONS })
```
AST Pattern (urql):
```
useQuery({ query: $QUERY, $$$OPTIONS })
useMutation($MUTATION, $$$OPTIONS)
```
AST Pattern (graphql-request):
```
request($URL, $QUERY, $VARS)
```
Examples Found:
```javascript
const { data } = await client.query({
  query: GET_USERS,
  variables: { limit: 10 }
})
await request('https://api.example.com/graphql', GET_PRODUCTS, { id: 123 })
```
Pros:
- Discovers GraphQL endpoints
- Often includes query/mutation names
- Can extract operation types (query vs mutation)
Cons:
- Endpoint URL may be in client initialization, not each query
- Query/mutation strings may be separate files
- Requires understanding of GraphQL architecture
False Positives:
- Local schema introspection queries
- Mock client calls in tests
- GraphQL playground queries
Confidence Score: 8/10
### J7: gRPC Clients
AST Pattern:
```
new $CLIENT($HOST, $$$OPTIONS)
$CLIENT.$METHOD($REQUEST, $$$METADATA)
```
Examples Found:
```javascript
const client = new UserServiceClient('https://api.example.com:443')
client.getUser(new GetUserRequest({ userId: '123' }))
```
Pros:
- Discovers gRPC endpoints (uncommon but valuable)
- Can extract service and method names
- Good for modern microservice architectures
Cons:
- Rare pattern (less common than REST)
- May be code-generated and minified
- Requires understanding of gRPC service definitions
False Positives:
- Local gRPC dev servers
- Mock service clients in tests
- Service discovery endpoints
Confidence Score: 7/10
### J8: Dynamic URL Construction
AST Pattern (Template Literals):
```
`$BASE/$PATH$SUFFIX`
`${$VAR1}/${$VAR2}`
```
AST Pattern (String Concatenation):
```
$VAR1 + $VAR2 + $VAR3
$VAR1 + "/" + $VAR2
```
AST Pattern (URL Constructor):
```
new URL($PATH, $BASE)
```
Examples Found:
```javascript
const url = `${API_BASE}/v1/users/${userId}`
const endpoint = baseURL + '/items/' + itemId
const url = new URL(`/api/v2/products/${id}`, window.location.origin)
```
Pros:
- Captures dynamic endpoint patterns
- Good for RESTful API discovery (e.g., /users/{id})
- Works with variable tracing
Cons:
- Requires variable tracking for full URL
- May include non-API URL construction
- Template literals can be complex
False Positives:
- Frontend route construction
- Static asset URL building
- Query parameter assembly
Confidence Score: 7.5/10
### J9: Exported API Functions
AST Pattern (function declarations and arrow functions):
```
export async function $FUNC($$$ARGS) {
  $$$BODY
}
export const $FUNC = async ($$$ARGS) => {
  $$$BODY
}
```
Pattern Matching for API calls in the body:
```
fetch($URL, $$$OPTIONS)
axios.$METHOD($URL, $$$ARGS)
```
Examples Found:
```javascript
export async function getUsers() {
  return fetch('https://api.example.com/v1/users')
}
export const createProduct = async (data) => {
  return axios.post('/api/v2/products', data)
}
```
Pros:
- Captures business logic layer
- Often reveals higher-level API operations
- Good for understanding API usage patterns
Cons:
- Requires analyzing function body
- May not include actual HTTP calls (wrappers on wrappers)
- Function names may not indicate API calls
False Positives:
- Frontend-only functions
- Local storage operations
- Utility functions
Confidence Score: 7/10
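### J10: Base URL Configuration
AST Pattern:
```
const $VAR = $URL
const baseURL = $URL
const API_URL = $URL
const ENDPOINT = $URL
```
For the generic `const $VAR = $URL` form, keep only matches whose `$URL` is a string literal starting with `http://` or `https://`.
AST Pattern (Object Properties; each property gets its own metavariable, since reusing one name forces the values to be identical):
```
{
  baseURL: $BASE_URL,
  apiEndpoint: $API_ENDPOINT,
  host: $HOST,
  $$$REST
}
```
Examples Found:
```javascript
const API_BASE = 'https://api.example.com/v1'
const config = {
  baseURL: 'https://api.example.com',
  timeout: 5000
}
axios.create({ baseURL: 'https://api.example.com/v2' })
```
Pros: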
- Critical for understanding API structure
- Often includes versioning
- Can be combined with endpoint patterns
Cons:
- May include non-API URLs
- Some base URLs are for different services
- Configuration may be in separate files
False Positives:
- CDN base URLs
- Frontend route base URLs
- WebSocket base URLs (different protocol)
Confidence Score: 8.5/10
## JavaScript Regex Patterns

### R1: Fetch Call Regex
Pattern:
```
\bfetch\s*\(\s*["']([^"']+)["']
```
Extended Pattern (with template literals):
```
\bfetch\s*\(\s*(?:["']([^"']+)["']|`([^`]+)`|\w+)
```
Examples Found:
```javascript
fetch('https://api.example.com/v1/users')
fetch("/api/v2/products")
fetch(`${API_BASE}/items`)
fetch(`https://${host}/api/v1/data`)
```
Pros:
- Simple and fast
- Works on minified code
- Captures URL directly
Cons:
- Misses URL in variables
- May match other functions named fetch
- Template literal handling is complex
False Positives:
- Fetching static JSON files
- Non-fetch() functions named fetch
- Code comments containing fetch
Confidence Score: 8.5/10
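A Python sketch of the extended R1 pattern that reports which branch matched, so template literals and bare variables can be routed to variable tracing. As an assumption beyond the pattern above, this sketch also captures the bare-variable branch `(\w+)`. The leading `\b` is what skips calls like `prefetch(...)`.

```python
import re

# Extended R1 pattern with the bare-variable branch captured as group 3.
FETCH_CALL = re.compile(r"""\bfetch\s*\(\s*(?:["']([^"']+)["']|`([^`]+)`|(\w+))""")


def fetch_targets(source: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs where kind is 'string', 'template' or 'variable'."""
    results = []
    for literal, template, variable in FETCH_CALL.findall(source):
        if literal:
            results.append(("string", literal))
        elif template:
            results.append(("template", template))  # needs variable resolution
        else:
            results.append(("variable", variable))  # needs variable tracing
    return results


# A minified-style bundle: several calls on one line, no whitespace.
bundle = (
    "fetch('https://api.example.com/v1/users');"
    'fetch("/api/v2/products");'
    "fetch(`${API_BASE}/items`);"
    "fetch(url,{method:'POST'});"
    "prefetch('/ignored')"
)
for kind, value in fetch_targets(bundle):
    print(kind, value)
```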
### R2: Axios Method Regex
Pattern:
```
axios\.(get|post|put|patch|delete)\s*\(\s*["']([^"']+)["']
```
Extended Pattern:
```
\baxios\.(get|post|put|patch|delete)\s*\(\s*(?:["']([^"']+)["']|`([^`]+)`|\w+)
```
Examples Found:
```javascript
axios.get('https://api.example.com/v1/users')
axios.post("/api/v2/products", data)
axios.patch(`${API_BASE}/items/123`, update)
axios.delete(`/api/v1/users/${userId}`)
```
Pros:
- Explicit HTTP verb
- High precision for API calls
- Works on minified code
Cons:
- Misses axios.request() (more complex)
- Won't catch wrapped axios instances
- Template literal URLs need variable tracing
False Positives:
- Mock axios calls in tests
- Axios instance creation (not requests)
- Static method references
Confidence Score: 9/10
### R3: Generic API URL Regex
Pattern:
```
["'](?:https?|wss?)://[^"']*/(api|v[0-9]+|rest|graphql|endpoint)[^"']*["']
```
Extended Pattern:
```
["'](?:https?|wss?)://[^"']*(?:/api(?:/v[0-9]+)?|/rest|/graphql|/ws|/wss)[^"']*["']
```
Examples Found:
```javascript
const url = 'https://api.example.com/v1/users'
const endpoint = "https://example.com/rest/search"
const wsUrl = 'wss://api.example.com/v1/realtime'
const graphqlUrl = "https://example.com/graphql"
```
Pros:
- Broad coverage of API URL patterns
- Captures URLs in any context
- Works on minified code
Cons:
- High false positive rate
- Includes example/placeholder URLs
- May catch non-API URLs with similar patterns
False Positives:
- Documentation examples
- Placeholder URLs in comments
- Non-API URLs with /api/ path
- CDN URLs with version numbers
Confidence Score: 5/10
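A Python run of the base R3 pattern over the four examples plus two made-up strings. It shows why the score is low: a CDN asset with a version segment (`/v2/`) matches just as readily as a real endpoint, while a plain page URL does not.

```python
import re

API_URL = re.compile(r"""["'](?:https?|wss?)://[^"']*/(api|v[0-9]+|rest|graphql|endpoint)[^"']*["']""")

snippet = """
const url = 'https://api.example.com/v1/users'
const endpoint = "https://example.com/rest/search"
const wsUrl = 'wss://api.example.com/v1/realtime'
const graphqlUrl = "https://example.com/graphql"
const about = 'https://example.com/about'
const lib = 'https://cdn.example.net/lib/v2/app.js'
"""
# group(0) includes the quotes, so strip one character from each end.
found = [m.group(0)[1:-1] for m in API_URL.finditer(snippet)]
print(found)
```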
### R4: WebSocket URL Regex
Pattern:
```
["']wss?://[^"']+["']
```
Contextual Pattern:
```
new WebSocket\s*\(\s*["']([^"']+)["']
```
Examples Found:
```javascript
const ws = new WebSocket('wss://api.example.com/v1/realtime')
const wsUrl = 'ws://localhost:8080/chat'
connect('wss://api.example.com:443/notifications')
```
Pros:
- Extremely specific to WebSocket endpoints
- WSS vs WS distinction is valuable
- High precision
Cons:
- Won't catch wrapped WebSocket classes
- Dynamic URL construction needs variable tracing
- Rare pattern overall
False Positives:
- WebSocket test URLs
- Local dev server URLs
- Mock WebSocket URLs
Confidence Score: 9/10
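### R5: GraphQL Operation Regex
Pattern (captures operation type and name; the selection set is read separately because braces nest):
```
\b(query|mutation|subscription)\s+(\w+)\s*(?:\([^)]*\))?\s*\{
```
Variable Pattern:
```
(?:query|mutation):\s*`[^`]+`
```
Examples Found:
```javascript
const GET_USERS = gql`
  query GetUsers {
    users {
      id
      name
    }
  }
`
const mutation = `mutation CreateUser($name: String!) {
  createUser(name: $name) {
    id
  }
}`
```
Pros: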
- Discovers GraphQL operations
- Can extract field names
- Good for understanding API capabilities
Cons:
- Doesn't directly give endpoint URL
- Query strings may be in separate files
- May include fragments that complicate parsing
False Positives:
- Commented-out queries
- Example queries in documentation
- Test query definitions
Confidence Score: 6.5/10
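A Python check of the R5 operation pattern on the two examples. The optional `\([^)]*\)` group is what allows an operation with a variable list, such as `CreateUser($name: String!)`, to match. An assignment like `const mutation = ...` is ignored because no operation name follows the keyword.

```python
import re

# Operation type, operation name, optional variable list, then the opening brace.
GRAPHQL_OP = re.compile(r"\b(query|mutation|subscription)\s+(\w+)\s*(?:\([^)]*\))?\s*\{")

source = '''
const GET_USERS = gql`
  query GetUsers {
    users { id name }
  }
`
const mutation = `mutation CreateUser($name: String!) {
  createUser(name: $name) { id }
}`
'''
print(GRAPHQL_OP.findall(source))
# [('query', 'GetUsers'), ('mutation', 'CreateUser')]
```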
### R6: URL Variable Assignments
Pattern:
```
(?:const|let|var)\s+(\w+)\s*=\s*["']([^"']*(?:api|v[0-9]+|rest|graphql)[^"']*)["']
```
Examples Found:
```javascript
const API_BASE = 'https://api.example.com/v1'
const userEndpoint = "/api/v2/users"
const graphqlUrl = "https://example.com/graphql"
```
Pros:
- Captures configuration-level endpoints
- Often includes versioning
- Can be combined with other patterns
Cons:
- May include non-API URLs
- Variable names can be arbitrary
- Some assignments are dynamic
False Positives:
- Frontend route variables
- CDN URL variables
- Example URLs in code
Confidence Score: 7.5/10
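### R7: XHR open() Regex
Pattern (the URL may be a quoted string or a template literal):
```
\.open\s*\(\s*["'](\w+)["']\s*,\s*["'`]([^"'`]+)["'`]
```
Examples Found:
```javascript
xhr.open('GET', 'https://api.example.com/v1/users')
xhr.open("POST", "/api/v2/products")
xhr.open('PUT', `${API_BASE}/items/123`)
```
Pros: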
- Captures HTTP method
- Works on legacy code
- Clear endpoint extraction
Cons:
- Requires tracking XHR variable
- May miss open() calls in wrappers
- URL may be in variable
False Positives:
- XHR calls for static assets
- Local file loading
- SSE connections
Confidence Score: 8.5/10
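A Python sketch of R7 with one extra guard, which is an assumption rather than part of the pattern: keep a match only if its first argument is an HTTP verb. This drops look-alikes such as `window.open('help', 'popup')`, which the bare regex also matches.

```python
import re

XHR_OPEN = re.compile(r"""\.open\s*\(\s*["'](\w+)["']\s*,\s*["'`]([^"'`]+)["'`]""")
HTTP_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"}


def xhr_calls(source: str) -> list[tuple[str, str]]:
    """Return (METHOD, url) pairs, skipping .open() calls that are not XHR verbs."""
    return [
        (method.upper(), url)
        for method, url in XHR_OPEN.findall(source)
        if method.upper() in HTTP_METHODS
    ]


code = """
xhr.open('GET', 'https://api.example.com/v1/users')
xhr.open("POST", "/api/v2/products")
xhr.open('PUT', `${API_BASE}/items/123`)
window.open('help', 'popup')
"""
print(xhr_calls(code))
```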
### R8: jQuery AJAX Regex
Pattern:
```
\$\.ajax\s*\(\s*\{[^}]*url\s*:\s*["']([^"']+)["']
```
Method Pattern:
```
\$\.get\s*\(\s*["']([^"']+)["']
\$\.post\s*\(\s*["']([^"']+)["']
```
Examples Found:
```javascript
$.ajax({ url: 'https://api.example.com/v1/users', method: 'GET' })
$.get("/api/v2/products", callback)
$.post('/api/submit', formData)
```
Pros:
- Captures jQuery AJAX usage
- Common in older apps
- Can extract method from options
Cons:
- URL may be in variable
- Options object structure varies
- Less common in modern apps
False Positives:
- Loading HTML fragments
- JSONP requests
- Static asset loading
Confidence Score: 8/10
### R9: Template Literal Regex
Pattern:
```
`[^`]*(?:/api(?:/v[0-9]+)?|/rest|/graphql|/ws)[^`]*`
```
Variable Pattern:
```
`\$\{[^}]+\}[^`]*`
```
Examples Found:
```javascript
`${API_BASE}/v1/users/${userId}`
`/api/v2/products/${productId}/reviews`
`https://example.com/rest/search?q=${query}`
```
Pros:
- Captures dynamic URL construction
- Good for RESTful endpoints
- Reveals parameter patterns
Cons:
- Requires variable tracing for full URL
- May include non-API template literals
- Complex to parse nested expressions
False Positives:
- Frontend route templates
- Query parameter construction
- Dynamic asset URLs
Confidence Score: 7/10
### R10: REST Resource Path Regex
Pattern:
```
["'][^"']*/(?:users?|products?|items?|orders?|accounts?|auth|login|logout)\b[^"']*["']
```
Extended Pattern:
```
["'](?:https?://[^/"']+)?/?api/v?[0-9]*/[^"']*(?:users|products|items|orders|accounts|auth)[^"']*["']
```
Examples Found:
```javascript
'/api/v1/users'
"/api/v2/products"
'https://example.com/api/v1/auth/login'
'/items/123/reviews'
```
Pros:
- Broad coverage of REST endpoints
- Captures common resource patterns
- Works on minified code
Cons:
- High false positive rate
- Includes non-API paths
- Resource names may vary
False Positives:
- Frontend route paths
- Static resource paths
- Example paths in comments
Confidence Score: 5/10
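A Python check of the base R10 pattern. The `\b` after the resource name lets a path end right after the resource (`/api/v1/users`) while still rejecting look-alikes such as `/username`. The last two candidates are made-up negatives.

```python
import re

REST_PATH = re.compile(
    r"""["'][^"']*/(?:users?|products?|items?|orders?|accounts?|auth|login|logout)\b[^"']*["']"""
)

candidates = [
    "'/api/v1/users'",
    '"/api/v2/products"',
    "'https://example.com/api/v1/auth/login'",
    "'/items/123/reviews'",
    "'/username/settings'",
    "'/assets/logo.png'",
]
hits = [c for c in candidates if REST_PATH.search(c)]
print(hits)
```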
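## Auth Header Patterns

### A1: Authorization Header
Pattern (apply with the case-insensitive flag; the header name may be quoted and the value may be a template literal):
```
["']?authorization["']?\s*:\s*["'`]?(Bearer|Basic|API\s+Key|Digest)\s+([^"'`\s]+)
```
AST Pattern:
```
{
  headers: {
    Authorization: $VALUE
  }
}
headers: {
  ['Authorization']: $VALUE
}
```
Examples Found:
```javascript
headers: { 'Authorization': 'Bearer abc123...' }
headers: { Authorization: 'Basic dXNlcjpwYXNz' }
headers: { 'authorization': 'API Key xyz789' }
fetch(url, { headers: { Authorization: `Bearer ${token}` } })
```
Pros: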
- Explicit auth mechanism discovery
- Reveals token type (Bearer, Basic, API Key)
- Critical for understanding authentication flow
Cons:
- May include example/placeholder tokens
- Tokens may be in variables
- Some apps use cookies instead
False Positives:
- Template placeholders (e.g., 'Bearer ${token}')
- Example tokens in documentation
- Mock auth in tests
Confidence Score: 9/10
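A Python sketch of A1 that also flags the "template placeholder" false positive listed above: a captured value starting with `${` is computed at runtime, so it points to a variable to trace rather than to a real credential. The `placeholder` field is an illustrative addition.

```python
import re

AUTH_HEADER = re.compile(
    r"""["']?authorization["']?\s*:\s*["'`]?(Bearer|Basic|API\s+Key|Digest)\s+([^"'`\s]+)""",
    re.IGNORECASE,
)


def auth_headers(source: str) -> list[dict]:
    """Return scheme/value pairs found in Authorization headers."""
    return [
        {"scheme": scheme, "value": value, "placeholder": value.startswith("${")}
        for scheme, value in AUTH_HEADER.findall(source)
    ]


code = """
headers: { 'Authorization': 'Bearer abc123' }
headers: { Authorization: 'Basic dXNlcjpwYXNz' }
headers: { 'authorization': 'API Key xyz789' }
fetch(url, { headers: { Authorization: `Bearer ${token}` } })
"""
for header in auth_headers(code):
    print(header)
```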
### A2: Bearer Token Pattern
Pattern:
```
["']Bearer\s+[^"']{20,}["']
```
Contextual Pattern:
```
token\s*[:=]\s*["']([^"']{20,})["']
accessToken\s*[:=]\s*["']([^"']{20,})["']
```
Examples Found:
```javascript
const token = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...'
accessToken: 'Bearer ' + token
localStorage.setItem('token', 'abc123xyz789...')
```
Pros:
- Captures actual tokens
- Good for JWT identification (starts with eyJ)
- Critical for auth bypass testing
Cons:
- May capture placeholder tokens
- Real tokens may be in variables
- Privacy/security concern
False Positives:
- Example tokens (e.g., 'YOUR_TOKEN_HERE')
- Mock tokens in tests
- Encrypted non-JWT strings
Confidence Score: 8.5/10
### A3: API Key Pattern
Pattern (apply with the case-insensitive flag, so apiKey, api_key and API-KEY all match):
```
["']?api[_-]?key["']?\s*[:=]\s*["']?([a-zA-Z0-9_-]{16,})["']?
```
Contextual Pattern (case-insensitive):
```
["']?x-api-key["']?\s*:\s*["']?([^"']{10,})["']?
```
Examples Found:
```javascript
headers: { 'X-API-Key': 'sk_abc123xyz789...' }
const apiKey = 'pk_test_51M...'
api_key: 'live_abc123xyz789...'
```
Pros:
- Identifies API key usage
- Different from Bearer tokens
- Good for key discovery
Cons:
- Many keys are placeholders
- May capture non-auth keys
- Privacy/security concern
False Positives:
- Example keys (e.g., 'YOUR_API_KEY')
- Mock keys in tests
- Configuration templates
Confidence Score: 8/10
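A Python check of the A3 main pattern against full-length versions of the three example shapes. The keys below are made up for illustration. The last line shows that a label containing the words "API key" is not captured, because the pattern requires an assignment right after the key name.

```python
import re

API_KEY = re.compile(
    r"""["']?api[_-]?key["']?\s*[:=]\s*["']?([a-zA-Z0-9_-]{16,})["']?""",
    re.IGNORECASE,
)

# Made-up keys for illustration only.
code = """
headers: { 'X-API-Key': 'sk_abc123xyz789abcd' }
const apiKey = 'pk_test_51Mabcdefghij'
api_key: 'live_abc123xyz789'
const apiKeyLabel = 'Your API key'
"""
print(API_KEY.findall(code))
```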
### A4: Cookie Credentials
Pattern:
```
credentials\s*:\s*["']?(include|same-origin)["']?
```
Related Pattern (axios):
```
withCredentials\s*[:=]\s*true
```
AST Pattern:
```
credentials: 'include'
credentials: 'same-origin'
```
Examples Found:
```javascript
fetch(url, { credentials: 'include' })
axios.defaults.withCredentials = true
fetch(url, { credentials: 'same-origin' })
```
Pros:
- Indicates cookie-based auth
- Common in session-based auth
- Important for CSRF understanding
Cons:
- Doesn't reveal actual cookies
- May be default in some frameworks
- Not unique to authentication
False Positives:
- CORS configuration
- Default settings
- Documentation examples
Confidence Score: 7.5/10
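### A5: Custom Auth Headers
Pattern (apply with the case-insensitive flag; the optional prefix allows names such as X-Auth-Token and Custom-Auth):
```
["'](?:[\w-]*-)?(?:auth|token|session)[\w-]*["']\s*:\s*["']?([^"'\s]{8,})["']?
```
Examples Found:
```javascript
headers: { 'X-Auth-Token': 'abc123xyz...' }
headers: { 'x-session-id': 'session_abc123' }
headers: { 'Custom-Auth': 'secret123' }
```
Pros: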
- Captures custom auth schemes
- Good for discovering non-standard auth
- Broad coverage
Cons:
- High false positive rate
- Many headers are not auth-related
- Naming conventions vary
False Positives:
- Tracking IDs (e.g., X-Request-ID)
- CSRF tokens
- Session IDs for analytics
Confidence Score: 6/10
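A Python run of the A5 pattern on the three example headers plus two made-up ones. It shows the CSRF false positive listed above in practice: `X-CSRF-Token` matches exactly like an auth header, while `Content-Type` does not.

```python
import re

CUSTOM_AUTH = re.compile(
    r"""["'](?:[\w-]*-)?(?:auth|token|session)[\w-]*["']\s*:\s*["']?([^"'\s]{8,})["']?""",
    re.IGNORECASE,
)

headers = """
headers: { 'X-Auth-Token': 'abc123xyz' }
headers: { 'x-session-id': 'session_abc123' }
headers: { 'Custom-Auth': 'secret123' }
headers: { 'X-CSRF-Token': 'csrf_abcdef12' }
headers: { 'Content-Type': 'application/json' }
"""
print(CUSTOM_AUTH.findall(headers))
```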
### A6: JWT Pattern
Pattern:
```
["']?eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+["']?
```
Contextual Pattern:
```
token\s*[:=]\s*["']?eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+["']?
```
Examples Found:
```javascript
const token = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM...s'
localStorage.setItem('jwt', 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...')
```
Pros:
- Unmistakable JWT identification
- Can be decoded to understand claims
- Very high precision
Cons:
- May be in variables
- Some JWTs are in cookies
- Privacy/security concern
False Positives:
- Example JWTs in documentation
- Mock JWTs in tests
- JWT-shaped strings whose segments do not decode to valid JSON
Confidence Score: 9.5/10
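The "can be decoded to understand claims" point, sketched in Python: JWT segments are base64url without padding, so padding is restored before decoding. This reads the header and payload only and does not verify the signature. The demo token is built inside the snippet with a dummy signature rather than taken from a real target.

```python
import base64
import json
import re

JWT = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def b64url_decode(segment: str) -> bytes:
    # Restore the "=" padding that JWTs strip before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_jwt(token: str) -> tuple[dict, dict]:
    """Decode header and payload WITHOUT verifying the signature."""
    header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(header)), json.loads(b64url_decode(payload))


def b64url_encode(obj: dict) -> str:
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Throwaway token for the demo; "c2lnbmF0dXJl" is a dummy signature.
demo = ".".join([b64url_encode({"alg": "HS256", "typ": "JWT"}),
                 b64url_encode({"sub": "123"}), "c2lnbmF0dXJl"])
bundle = f"const token = '{demo}'"

for found in JWT.findall(bundle):
    print(decode_jwt(found))
# ({'alg': 'HS256', 'typ': 'JWT'}, {'sub': '123'})
```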
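### A7: OAuth Configuration
Pattern:
```
(client[_-]?id|redirect[_-]?uri|grant[_-]?type)["']?\s*[:=]\s*["']?([^"'\s,&]+)
```
The `[:=]` separator covers both object literals (`client_id: 'abc'`) and query strings (`client_id=abc`). To focus on OAuth traffic, also require an `/oauth`, `/authorize`, `/token`, or `/callback` path in the same string or nearby.
Examples Found:
```javascript
client_id: 'abc123...'
redirect_uri: 'https://example.com/callback'
grant_type: 'authorization_code'
```
Pros: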
- Discovers OAuth configuration
- Critical for understanding auth flow
- Reveals client ID
Cons:
- May include example client IDs
- Configuration may be in separate files
- OAuth flows vary
False Positives:
- Example configurations
- Mock OAuth in tests
- Documentation snippets
Confidence Score: 7/10
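### A8: Session Cookies
Pattern (apply case-insensitively; cookie names such as PHPSESSID and JSESSIONID are usually upper-case):
```
sessionid|session[_-]?id|phpsessid|jsessionid
```
Contextual Pattern:
```
document\.cookie\s*=\s*["'][^"']*(?:session|SESSION)[^"']*["']
```
Examples Found:
```javascript
document.cookie = 'sessionid=abc123...'
const sessionId = 'session_abc123...'
headers: { 'Cookie': 'jsessionid=xyz789...' }
```
Pros: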
- Captures session-based auth
- Common in traditional web apps
- Good for CSRF analysis
Cons:
- May include non-auth cookies
- Cookie values often dynamic
- Privacy/security concern
False Positives:
- Analytics session IDs
- A/B testing cookies
- Tracking cookies
Confidence Score: 6.5/10
## Pattern Ranking & Analysis

| Rank | Pattern | Confidence | Why Top |
|---|---|---|---|
| 1 | J3: Axios Methods | 9.5/10 | Explicit HTTP verbs, clean URL extraction |
| 2 | A6: JWT Pattern | 9.5/10 | Unmistakable JWT structure |
| 3 | J5: WebSocket Connections | 9.0/10 | Explicit WS endpoints, WSS distinction |
| 4 | J1: Fetch API Calls | 9.0/10 | Modern standard, includes options |
| 5 | P1: `<script>` src Attributes | 9.5/10 | High precision, low false positives |
| 6 | A1: Authorization Header | 9.0/10 | Explicit auth mechanism |
| 7 | R2: Axios Method Regex | 9.0/10 | High precision, works on minified code |
| 8 | R4: WebSocket URL Regex | 9.0/10 | Extremely specific to WS |
| 9 | J2: XMLHttpRequest | 8.5/10 | Captures legacy XHR usage |
| 10 | A2: Bearer Token Pattern | 8.5/10 | Captures actual tokens |
- Phase 1: High Confidence (AST + Regex)
  - Run the ten top-ranked patterns from the table above (confidence 8.5 to 9.5)
  - Combine J3, J1, J5, J2, P1, A1, A6, R2, R4, A2
  - Expected yield: 80-90% of endpoints
- Phase 2: Medium Confidence (AST + Regex)
  - Run the remaining AST and regex patterns (confidence 7 to 8.5)
  - Combine J4, J6, J7, J8, J9, J10, R1, R6, R7, R8, R9, A4, A7
  - Manually review the results
- Phase 3: Broad Discovery (Regex)
  - Run the noisier, lower-precision patterns (mostly confidence 4 to 6.5) to catch edge cases
  - Combine P2, P3, P4, P5, P6, P7, R3, R5, R10, A3, A5, A8
  - Heavy filtering required
URL Filtering Rules:
- Exclude CDN domains (cdn.jsdelivr.net, cdnjs.cloudflare.com, unpkg.com)
- Exclude file extensions (.css, .png, .jpg, .gif, .svg, .woff, .ttf)
- Exclude known analytics domains (google-analytics.com, googletagmanager.com)
- Exclude placeholder URLs (localhost, 127.0.0.1, example.com, your-api.com)
- Exclude common non-API paths (/docs, /help, /#)
Token Filtering Rules:
- Exclude obvious placeholders (YOUR_TOKEN, YOUR_API_KEY, INSERT_HERE)
- Exclude short tokens (< 20 chars)
- Exclude test/mock identifiers (mock, test, example, demo)
- Exclude repeated patterns (aaaa..., 1111...)
Context Analysis:
- Check for comments nearby (//, /* */)
- Check for test files (*.test.js, *.spec.js, tests)
- Check for example/documentation keywords (example, demo, sample)
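The URL, token, and context rules above can be sketched as three predicates. The lists are copied from the rules; the subdomain matching and the "one unit repeated" test for patterns like `aaaa...` are my own interpretation. One consequence worth knowing: the literal `test` marker also drops keys such as `pk_test_...`, which may or may not be what you want.

```python
import re
from urllib.parse import urlparse

# Values copied from the filtering rules above; extend them per target.
BLOCKED_HOSTS = ("cdn.jsdelivr.net", "cdnjs.cloudflare.com", "unpkg.com",
                 "google-analytics.com", "googletagmanager.com",
                 "localhost", "127.0.0.1", "example.com", "your-api.com")
STATIC_EXTENSIONS = (".css", ".png", ".jpg", ".gif", ".svg", ".woff", ".ttf")
NON_API_SEGMENTS = {"docs", "help"}
TOKEN_PLACEHOLDERS = ("YOUR_TOKEN", "YOUR_API_KEY", "INSERT_HERE")
TEST_MARKERS = ("mock", "test", "example", "demo")


def keep_url(url: str) -> bool:
    if url.startswith(("#", "/#")):
        return False  # in-page anchor such as "#api-section"
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS):
        return False  # CDN, analytics, or placeholder host (subdomains included)
    path = parsed.path.lower()
    if path.endswith(STATIC_EXTENSIONS):
        return False  # static asset
    if NON_API_SEGMENTS & set(path.split("/")):
        return False  # documentation or help pages, e.g. /api/docs
    return True


def keep_token(token: str) -> bool:
    if any(p in token.upper() for p in TOKEN_PLACEHOLDERS):
        return False
    if len(token) < 20:
        return False
    if any(m in token.lower() for m in TEST_MARKERS):
        return False
    if re.fullmatch(r"(.+?)\1+", token):  # one unit repeated: aaaa..., 1111...
        return False
    return True


def is_test_file(path: str) -> bool:
    p = path.replace("\\", "/").lower()
    return p.endswith((".test.js", ".spec.js")) or "/tests/" in f"/{p}"
```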
For patterns that reference variables (e.g., `${API_BASE}/users`):
1. Identify variable declarations:
   `const API_BASE = 'https://api.example.com/v1'`
2. Track variable assignments:
   `let endpoint = API_BASE; endpoint += '/users'`
3. Resolve template literals (here with `userId` = 123):
   `` `${API_BASE}/users/${userId}` `` → `'https://api.example.com/v1/users/123'`
4. Handle object properties:
   `config.baseURL` → `'https://api.example.com'`
Advantages of Regex on Minified Code:
- Consistent formatting
- No comments to filter
- Predictable patterns
Challenges with Minified Code:
- Variable names are meaningless (a, b, c)
- Template literals may be compressed
- AST patterns still work better than regex
Recommended Approach:
- Prioritize AST patterns (ast_grep_search) for minified code
- Use regex as fallback when AST fails
- Combine multiple regex patterns for coverage
Pattern Categories:
- Simple concatenation: `base + '/path'`
- Template literals: `` `${base}/path/${id}` ``
- URL objects: `new URL(path, base)`
- Array joining: `[base, path, id].join('/')`
Resolution Strategy:
- Extract variable names from pattern
- Search for variable declarations
- Resolve values recursively
- Handle conditional assignments
- Cache resolved values
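A minimal Python sketch of the resolution strategy, working on declarations that are assumed to be already extracted as `name -> source text` (a hypothetical input format). It resolves values recursively with a cache, expands `+` concatenation and template literals, and keeps both branches of a conditional whose branches are string literals. Array joining and cycle detection are out of scope here.

```python
import itertools
import re
from functools import lru_cache

# cond ? 'a' : 'b'  (this sketch only handles string-literal branches)
TERNARY = re.compile(r"""^[\w.$]+\s*\?\s*('[^']*'|"[^"]*")\s*:\s*('[^']*'|"[^"]*")$""")
# One operand of a + concatenation: a string, a template literal, or an identifier.
OPERAND = re.compile(r"""'[^']*'|"[^"]*"|`[^`]*`|[A-Za-z_$][\w$.]*""")
PLACEHOLDER = re.compile(r"\$\{([\w$.]+)\}")


def make_resolver(declarations: dict[str, str]):
    """Build resolve(name) -> sorted list of every value the variable can take."""

    @lru_cache(maxsize=None)  # cache resolved values; no cycle detection
    def resolve_name(name: str) -> frozenset:
        expr = declarations.get(name)
        if expr is None:  # unknown variable: keep a readable placeholder
            return frozenset({"{" + name + "}"})
        return frozenset(resolve_expr(expr.strip()))

    def combine(choices) -> set:
        # Cartesian product: every combination of the possible part values.
        return {"".join(parts) for parts in itertools.product(*choices)}

    def resolve_operand(op: str) -> set:
        if op[0] in "'\"":
            return {op[1:-1]}
        if op[0] == "`":  # template literal: split into literal, name, literal, ...
            pieces = PLACEHOLDER.split(op[1:-1])
            return combine({p} if i % 2 == 0 else resolve_name(p) for i, p in enumerate(pieces))
        return set(resolve_name(op))  # identifier: resolve recursively

    def resolve_expr(expr: str) -> set:
        ternary = TERNARY.match(expr)
        if ternary:  # conditional assignment: keep both branches
            return resolve_operand(ternary.group(1)) | resolve_operand(ternary.group(2))
        return combine(resolve_operand(op) for op in OPERAND.findall(expr))

    return lambda name: sorted(resolve_name(name))


# Hypothetical declarations, as if already collected from a bundle.
resolve = make_resolver({
    "API_HOST": "isProd ? 'https://api.example.com' : 'http://localhost:8080'",
    "API_BASE": "API_HOST + '/v1'",
    "USERS_URL": "`${API_BASE}/users/${userId}`",
})
print(resolve("USERS_URL"))
# ['http://localhost:8080/v1/users/{userId}', 'https://api.example.com/v1/users/{userId}']
```

Returning every possible value, rather than picking one branch, keeps dev-only hosts visible so the URL filter can decide what to drop.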
Key Patterns:
- `ws://`: non-secure WebSocket
- `wss://`: secure WebSocket (TLS)
- `ws+unix://`: Unix domain socket (rare)
Protocol Differences:
- WSS endpoints often mirror HTTPS endpoints
- May include different auth mechanisms
- Often discoverable via upgrade headers
Beyond Endpoint URL:
- Extract query/mutation names
- Identify operations (query vs mutation)
- Parse field selections
- Discover fragment definitions
- Identify variable schemas
Additional Patterns:
- Introspection queries: `__schema`, `__type`
- Subscription operations: `subscription { ... }`
- Mutation operations: `mutation { ... }`
Best Tooling Stack:
- AST Pattern Matching: ast_grep_search (JavaScript/TypeScript)
- Regex Extraction: grep, switching to `grep -Pz` or ripgrep's `--multiline` mode for patterns that span lines
- Variable Resolution: AST-based traversal
- Deduplication: URL normalization
- Classification: Machine learning for endpoint types
Processing Pipeline:
1. Scan files (HTML, JS, TS, JSX, TSX)
2. Apply AST patterns (high confidence)
3. Apply regex patterns (medium confidence)
4. Resolve variables and template literals
5. Filter false positives (CDN, placeholders, etc.)
6. Deduplicate URLs
7. Classify endpoints (REST, GraphQL, WebSocket)
8. Rank by confidence
9. Export results (JSON, CSV)

Output Format:
```json
{
  "endpoints": [
    {
      "url": "https://api.example.com/v1/users",
      "method": "GET",
      "type": "REST",
      "confidence": 0.95,
      "source": "J3: Axios Methods",
      "line": 42,
      "file": "api.js",
      "auth": {
        "type": "Bearer",
        "header": "Authorization"
      }
    }
  ],
  "summary": {
    "total": 150,
    "high_confidence": 120,
    "medium_confidence": 25,
    "low_confidence": 5,
    "rest": 130,
    "graphql": 15,
    "websocket": 5
  }
}
```

ast-grep Commands (ast-grep CLI; single quotes stop the shell from expanding `$URL` and `$$`):
```bash
# Find all fetch() calls
ast-grep run --pattern 'fetch($URL, $$$OPTIONS)' --lang js .

# Find all axios.get() calls
ast-grep run --pattern 'axios.get($URL, $$$OPTIONS)' --lang js .

# Find WebSocket connections
ast-grep run --pattern 'new WebSocket($URL, $$$PROTOCOLS)' --lang js .

# Find Authorization headers
ast-grep run --pattern '{ headers: { Authorization: $VALUE } }' --lang js .
```

grep Commands (GNU grep; `-h` drops file names so `sort -u` deduplicates the matches themselves):
```bash
# Find all fetch() URLs
grep -rhoE "fetch\s*\(\s*['\"][^'\"]+['\"]" \
  --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" . | sort -u

# Find all axios method calls
grep -rhoE "axios\.(get|post|put|patch|delete)\s*\(\s*['\"][^'\"]+['\"]" \
  --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" . | sort -u

# Find WebSocket URLs on lines that construct a WebSocket
grep -rh "new WebSocket" \
  --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" . \
  | grep -oE "wss?://[^'\"\`]+" | sort -u

# Find JWT-shaped tokens
grep -rhoE "eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+" \
  --include="*.js" --include="*.jsx" --include="*.ts" --include="*.tsx" . | sort -u
```

Python Pipeline:
```python
import json
import re
import subprocess
from pathlib import Path
from urllib.parse import urlparse

# Phase 1: AST patterns, run through the ast-grep CLI (assumed to be on PATH).
AST_PATTERNS = [
    ("fetch", "fetch($URL, $$$OPTIONS)", 0.95),
    ("axios.get", "axios.get($URL, $$$OPTIONS)", 0.95),
    ("axios.post", "axios.post($URL, $$$ARGS)", 0.95),
    ("WebSocket", "new WebSocket($URL, $$$PROTOCOLS)", 0.90),
]
AST_METHODS = {"axios.get": "GET", "axios.post": "POST"}

# Phase 2: regex patterns; the last capture group is the URL. Auth patterns
# (A1-A8) yield tokens rather than URLs, so they belong in a separate report.
REGEX_PATTERNS = [
    ("R2: Axios Method Regex",
     re.compile(r"axios\.(get|post|put|patch|delete)\s*\(\s*['\"]([^'\"]+)['\"]"), 0.90),
    ("R1: Fetch Call Regex", re.compile(r"\bfetch\s*\(\s*['\"]([^'\"]+)['\"]"), 0.85),
    ("R4: WebSocket URL Regex", re.compile(r"new WebSocket\s*\(\s*['\"]([^'\"]+)['\"]"), 0.90),
]
SOURCE_SUFFIXES = {".js", ".jsx", ".ts", ".tsx"}
BLOCKED_HOSTS = ("cdn.jsdelivr.net", "cdnjs.cloudflare.com", "unpkg.com", "google-analytics.com",
                 "localhost", "127.0.0.1", "example.com", "your-api.com")


def classify_endpoint(url):
    if url.startswith(("ws://", "wss://")):
        return "WebSocket"
    return "GraphQL" if "graphql" in url.lower() else "REST"


def make_endpoint(url, method, confidence, source, line, file):
    return {"url": url, "method": method, "type": classify_endpoint(url),
            "confidence": confidence, "source": source, "line": line, "file": file}


def run_ast_grep(pattern, directory):
    """Run ast-grep and return its JSON matches (empty list if it is not installed)."""
    try:
        proc = subprocess.run(
            ["ast-grep", "run", "--pattern", pattern, "--lang", "js", "--json", str(directory)],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return []
    return json.loads(proc.stdout or "[]")


def process_ast_results(results, source, confidence):
    """Turn ast-grep matches into endpoint records."""
    endpoints = []
    for match in results:
        # Assumes ast-grep's JSON layout: metaVariables.single.URL.text
        url_node = match.get("metaVariables", {}).get("single", {}).get("URL")
        if not url_node:
            continue
        url = url_node["text"].strip("'\"`")  # drop the JS string delimiters
        line = match["range"]["start"]["line"] + 1  # assumed 0-based in the JSON
        endpoints.append(make_endpoint(url, AST_METHODS.get(source), confidence,
                                       source, line, match["file"]))
    return endpoints


def regex_scan(directory):
    """Apply the regex patterns to every JS/TS file under directory."""
    endpoints = []
    for path in Path(directory).rglob("*"):
        if path.suffix not in SOURCE_SUFFIXES or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for source, regex, confidence in REGEX_PATTERNS:
            for m in regex.finditer(text):
                method = m.group(1).upper() if m.lastindex == 2 else None
                line = text.count("\n", 0, m.start()) + 1
                endpoints.append(make_endpoint(m.group(m.lastindex), method,
                                               confidence, source, line, str(path)))
    return endpoints


def deduplicate(endpoints):
    """Keep the highest-confidence record per (method, normalized URL)."""
    best = {}
    for ep in endpoints:
        key = (ep["method"], ep["url"].rstrip("/"))
        if key not in best or ep["confidence"] > best[key]["confidence"]:
            best[key] = ep
    return list(best.values())


def filter_false_positives(endpoints):
    """Drop CDN, analytics, and placeholder hosts (subdomains included)."""
    kept = []
    for ep in endpoints:
        try:
            host = (urlparse(ep["url"]).hostname or "").lower()
        except ValueError:  # malformed URL text; keep it for manual review
            host = ""
        if any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS):
            continue
        kept.append(ep)
    return kept


def extract_endpoints(directory):
    endpoints = []
    for source, pattern, confidence in AST_PATTERNS:  # Phase 1: AST
        endpoints.extend(process_ast_results(run_ast_grep(pattern, directory),
                                             source, confidence))
    endpoints.extend(regex_scan(directory))  # Phase 2: regex
    return filter_false_positives(deduplicate(endpoints))  # Phase 3: clean up


if __name__ == "__main__":
    print(json.dumps(extract_endpoints("/path/to/assets"), indent=2))
```

This set of patterns provides broad coverage for API endpoint discovery across:
- HTML: Script tags, links, forms, data attributes
- JavaScript: Fetch, Axios, XHR, jQuery, WebSocket, GraphQL
- Authentication: Bearer tokens, JWTs, API keys, OAuth, cookies
Key Recommendations:
- Prioritize AST patterns for accuracy
- Use regex patterns for broad coverage
- Implement variable tracing for dynamic URLs
- Apply aggressive false positive filtering
- Rank results by confidence for manual review
Treat the confidence scores as heuristics, and validate each pattern against your own targets before relying on it.