20260311 #300 requirements.txt improvements and Dockerfile improvements #302
Conversation
Walkthrough: A hybrid two-phase financial data collection flow using YFinance and the FMP API is introduced, along with changes for tracking FMP upgrade status.
Sequence diagram

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant Phase1 as Phase 1: YFinance
    participant YF as YFinance
    participant DB as Database
    participant Phase2 as Phase 2: FMP
    participant FMP as FMP API
    CLI->>Phase1: update_tickers(tickers)
    Phase1->>DB: query current data / missing tickers
    Phase1->>YF: fetch_yf_metrics(ticker)
    YF-->>Phase1: 4 years of financial data
    Phase1->>DB: upsert rows (ON CONFLICT)
    Phase1->>Phase2: start (after completion)
    Phase2->>DB: get_fmp_targets(limit)
    Phase2->>FMP: fetch_fmp_metrics(ticker)
    FMP-->>Phase2: 10 years of financial data
    Phase2->>DB: upsert rows (ON CONFLICT)
    Phase2->>DB: mark_fmp_completed(ticker)
    Phase2-->>CLI: done
```
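The two-phase flow in the diagram can be sketched roughly as follows. Method names come from the diagram; the in-memory dict stands in for the Postgres upserts, and the year ranges and stubbed metrics are placeholders, not the project's actual data:

```python
# Sketch of the hybrid two-phase collection flow (illustrative only).
class FundamentalsUpdater:
    def __init__(self):
        self.rows = {}             # (ticker, year) -> metrics; stands in for upserted DB rows
        self.fmp_completed = set() # stands in for the fmp_completed flag in stock_info

    def fetch_yf_metrics(self, ticker):
        # Phase 1: YFinance returns ~4 years of annual data (stubbed).
        return [{"year": y, "source": "yf"} for y in range(2022, 2026)]

    def fetch_fmp_metrics(self, ticker):
        # Phase 2: FMP returns up to 10 years (stubbed).
        return [{"year": y, "source": "fmp"} for y in range(2016, 2026)]

    def upsert(self, ticker, rows):
        # Mirrors INSERT ... ON CONFLICT DO UPDATE: later sources overwrite.
        for row in rows:
            self.rows[(ticker, row["year"])] = row

    def update_tickers(self, tickers):
        for t in tickers:                      # Phase 1: YFinance
            self.upsert(t, self.fetch_yf_metrics(t))
        for t in tickers:                      # Phase 2: FMP
            data = self.fetch_fmp_metrics(t)
            if data:
                self.upsert(t, data)
                self.fmp_completed.add(t)      # mark_fmp_completed
```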
Estimated code review effort: 🎯 3 (medium) | ⏱️ ~20 minutes. Possibly related PRs
Suggested reviewers
시
🚥 Pre-merge checks | ✅ 1 | ❌ 2. ❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
schema.sql (1)
18-25: ⚠️ Potential issue | 🟠 Major. A migration for existing databases is also required.
This column is read and updated directly by company_fundamentals_data.py, so changing schema.sql alone will not cover environments that use an existing Postgres volume. If the production or development DB has already been created, the missing fmp_completed column fails immediately at runtime. Example SQL to apply:

```sql
ALTER TABLE public.stock_info ADD COLUMN IF NOT EXISTS fmp_completed boolean DEFAULT false;
UPDATE public.stock_info SET fmp_completed = false WHERE fmp_completed IS NULL;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@schema.sql` around lines 18 - 25, The migration missed adding the new fmp_completed column to existing databases, causing runtime failures in company_fundamentals_data.py when it queries/updates stock_info; add a SQL migration that ALTERs the stock_info table to ADD COLUMN IF NOT EXISTS fmp_completed boolean DEFAULT false and ensure existing rows get a non-null value (e.g., UPDATE stock_info SET fmp_completed = false WHERE fmp_completed IS NULL), then include this migration in your deployment/migrations pipeline so environments with preexisting Postgres volumes get the new column before company_fundamentals_data.py runs.
AI/modules/data_collector/company_fundamentals_data.py (1)
143-169: ⚠️ Potential issue | 🟠 Major. Tickers are marked complete even after save failures or transient API failures, blocking retries.
fetch_fmp_metrics() returns [] on failure, and save_to_db() only logs DB errors without reporting the failure to its caller. Yet line 257 sets fmp_completed = TRUE regardless, so a ticker that merely hit a transient outage is permanently dropped from the retry pool. On top of that, ticker_master_updater.py:113-122 never overwrites this flag, so the incorrectly stamped completed state persists. Possible fix:

```diff
-            data = self.fetch_fmp_metrics(ticker)
-            if data:
-                self.save_to_db(ticker, data)
-
-            # 수집을 시도했으므로 무조건 완료 도장 찍기 (신규 상장주 무한루프 방지)
-            self.mark_fmp_completed(ticker)
-            success_count += 1
+            data = self.fetch_fmp_metrics(ticker)
+            if not data:
+                print(f" [{ticker}] FMP 데이터 없음/일시 실패 - 완료 처리 보류")
+                continue
+
+            saved = self.save_to_db(ticker, data)
+            if not saved:
+                print(f" [{ticker}] DB 저장 실패 - 완료 처리 보류")
+                continue
+
+            self.mark_fmp_completed(ticker)
+            success_count += 1
```

It is safer for save_to_db() to return a bool indicating success, and to keep a separate status value that distinguishes tickers with permanently no data from transient failures. Also applies to: 252-257
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AI/modules/data_collector/company_fundamentals_data.py` around lines 143 - 169, save_to_db currently swallows DB errors and returns nothing, and fetch_fmp_metrics returns [] on failure, but the caller still sets the fmp_completed flag (e.g., fmp_completed = TRUE) which prevents retries; change save_to_db to return a bool (or status enum) indicating success/failure and whether the absence of data is permanent vs transient, propagate exceptions or false on DB errors, and have the caller (where fmp_completed is set) only mark fmp_completed true for true permanent-complete status; update the logic around fetch_fmp_metrics, save_to_db, and the code that sets fmp_completed (and the update routine that overwrites it) to use the returned status so transient failures allow retries and permanent no-data sets the completed flag.
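A minimal sketch of the status-based idea suggested above (FetchStatus and process_ticker are illustrative names, not the project's actual code; fetch/save/mark_completed are injected so the shape of the logic is the point, not the I/O):

```python
from enum import Enum

class FetchStatus(Enum):
    OK = "ok"
    TRANSIENT_FAILURE = "transient"   # retry later, do not mark completed
    NO_DATA = "no_data"               # permanently empty, safe to mark completed

def process_ticker(fetch, save, mark_completed, ticker):
    status, data = fetch(ticker)
    if status is FetchStatus.TRANSIENT_FAILURE:
        return False                   # leave fmp_completed untouched so the ticker is retried
    if status is FetchStatus.OK:
        if not save(ticker, data):     # save now reports success/failure as a bool
            return False               # DB error: also defer completion
    mark_completed(ticker)             # reached on OK-and-saved, or permanently no data
    return True
```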
🧹 Nitpick comments (1)
AI/requirements.txt (1)
2-25: Consider pinning more dependency versions for reproducibility. Most dependencies carry no version constraints, which invites reproducibility problems and unexpected breaking changes.
Pinning versions with pip freeze or pip-tools is recommended. 📌 How to pin versions
Option 1: use pip freeze

```shell
pip install -r AI/requirements.txt
pip freeze > AI/requirements-lock.txt
```

Option 2: use pip-tools (recommended)
Keep the current requirements.txt as requirements.in, then:

```shell
pip install pip-tools
pip-compile AI/requirements.in --output-file AI/requirements.txt
```

This pins every dependency (including transitive dependencies) to an exact version, guaranteeing environment reproducibility.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AI/requirements.txt` around lines 2 - 25, Pin all dependencies in AI/requirements.txt to exact versions and add a lockfile: convert the current unpinned list (packages like python-dotenv, pandas, numpy<2.0, tensorflow-cpu<2.11, psycopg2-binary, requests, beautifulsoup4, groq, fredapi, yfinance, scikit-learn, langchain-community, stable-baselines3, gymnasium, shimmy, matplotlib, tqdm) into a fully versioned requirements file by either (a) creating a requirements.in from the current list and running pip-compile to generate a pinned requirements.txt, or (b) installing and running pip freeze to produce a requirements-lock.txt and replacing the unpinned AI/requirements.txt with the frozen versions; ensure the chosen approach is documented in the repo and include the generated lockfile in source control for reproducible installs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AI/modules/data_collector/company_fundamentals_data.py`:
- Around line 77-79: The current conditional uses truthiness so numeric zeros
become None; update the three computations (roe, debt_ratio, interest_coverage)
to test for None explicitly and check denominators for zero: use "net_income is
not None and equity is not None and equity != 0" for roe, "total_liabilities is
not None and equity is not None and equity != 0" for debt_ratio, and "op_income
is not None and int_expense is not None and int_expense != 0" (or
abs(int_expense) != 0) for interest_coverage; apply the same explicit None
checks to the analogous calculations around lines 130-132.
- Around line 96-110: The three FMP requests (inc_resp, bal_resp, cf_resp) lack
timeouts, status-code checks and proper error handling and the current except is
a bare catch; update the code that builds inc_resp/bal_resp/cf_resp in
company_fundamentals_data.py to: add a reasonable timeout to each requests.get
call, validate each response.status_code (treat non-2xx, 429, 5xx as failures),
attempt to parse JSON only after a successful status and validate bal_data and
cf_data the same way you validate inc_data, avoid a bare except (catch
requests.Timeout, requests.RequestException, JSONDecodeError), log the specific
error and return [] on failure so df_inc/df_bal/df_cf and df_merged are only
built when all three responses are valid.
In `@AI/modules/data_collector/ticker_master_updater.py`:
- Around line 131-142: The current conn.rollback() in the per-item exception
block undoes the entire transaction (invalidating success_count and earlier
upserts); change this to use a savepoint around each upsert: before calling
cursor.execute(query_company_names, (item['ticker'], item['name'])) create a
savepoint (e.g., SAVEPOINT sp), on exception ROLLBACK TO that savepoint and then
RELEASE it (or only RELEASE on success), and only increment success_count after
a successful execute; keep the final conn.commit() for the overall batch. Use
the existing symbols cursor.execute, query_company_names, success_count,
conn.commit and replace conn.rollback() with savepoint-based rollback to avoid
undoing prior successful rows.
In `@AI/requirements.txt`:
- Line 1: Remove the UTF-8 BOM at the beginning of the file that precedes the
header line "--- 핵심 데이터 처리 ---"; open the file and re-save it as UTF-8 without
BOM (or run a BOM-strip operation) so the leading invisible character (U+FEFF)
is deleted and the file starts directly with the header text, ensuring tools and
Python parsing won't be affected.
- Around line 6-7: The requirements pin "tensorflow-cpu<2.11" fixes TensorFlow
to a known vulnerable range; update the constraint in AI/requirements.txt from
"tensorflow-cpu<2.11" to a secure, tested range (for example
"tensorflow-cpu>=2.11,<2.14" or "tensorflow-cpu>=2.13,<3.0") and then run
compatibility tests with numpy and stable-baselines3 (ensure numpy remains
within supported bounds such as numpy<2.0 if required) and adjust other
dependency pins if any conflicts appear; verify CI/builds pass and update
project docs or dependency notes to record the chosen TensorFlow minimum.
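For the BOM item above, the check and strip can be sketched in a few lines of Python (helper names are illustrative; the affected file in the review is AI/requirements.txt):

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF

def has_bom(path):
    # Read the first three raw bytes and compare against the UTF-8 BOM.
    with open(path, "rb") as f:
        return f.read(3) == BOM

def strip_bom(path):
    # Rewrite the file without the leading BOM, if present.
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(BOM):
        with open(path, "wb") as f:
            f.write(data[len(BOM):])
```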
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e489bd71-5bce-4e99-97fc-26f2f406f018
📒 Files selected for processing (6)
- AI/modules/data_collector/company_fundamentals_data.py
- AI/modules/data_collector/macro_data.py
- AI/modules/data_collector/run.py
- AI/modules/data_collector/ticker_master_updater.py
- AI/requirements.txt
- schema.sql
```python
roe = (net_income / equity) if net_income and equity else None
debt_ratio = (total_liabilities / equity) if total_liabilities and equity else None
interest_coverage = (op_income / abs(int_expense)) if op_income and int_expense and abs(int_expense) > 0 else None
```
Zero values are being dropped to None.
The conditionals here treat 0 as false, so roe, debt_ratio, and interest_coverage are stored as None even when the real value is 0. For example, a net income of 0, liabilities of 0, or operating income of 0 are valid values, yet they currently turn into missing data.
Possible fix:

```diff
-roe = (net_income / equity) if net_income and equity else None
-debt_ratio = (total_liabilities / equity) if total_liabilities and equity else None
-interest_coverage = (op_income / abs(int_expense)) if op_income and int_expense and abs(int_expense) > 0 else None
+roe = (net_income / equity) if net_income is not None and equity not in (None, 0) else None
+debt_ratio = (total_liabilities / equity) if total_liabilities is not None and equity not in (None, 0) else None
+interest_coverage = (
+    op_income / abs(int_expense)
+    if op_income is not None and int_expense is not None and int_expense != 0
+    else None
+)
```

Also applies to: 130-132
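A tiny demonstration of why the truthiness check above loses zeros, plus a helper in the spirit of the fix (safe_ratio is an illustrative name, not in the codebase):

```python
def safe_ratio(numerator, denominator):
    # Explicit None checks plus a zero-denominator guard, as the fix suggests:
    # a numerator of 0 is a valid value; only missing inputs or division by
    # zero should yield None.
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator
```

Contrast this with the truthy form: `x if net_income and equity else None` rejects `net_income == 0` because `0 and equity` is falsy, which is exactly the bug flagged above.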
```python
inc_resp = requests.get(f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}")
bal_resp = requests.get(f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}")
cf_resp = requests.get(f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}")

inc_data, bal_data, cf_data = inc_resp.json(), bal_resp.json(), cf_resp.json()

if not inc_data or (isinstance(inc_data, dict) and 'Error Message' in inc_data):
    return []

df_inc, df_bal, df_cf = pd.DataFrame(inc_data), pd.DataFrame(bal_data), pd.DataFrame(cf_data)
if df_inc.empty or df_bal.empty or df_cf.empty: return []

df_merged = pd.merge(df_inc, df_bal, on='date', how='inner', suffixes=('', '_bal'))
df_merged = pd.merge(df_merged, df_cf, on='date', how='inner', suffixes=('', '_cf'))
except:
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n AI/modules/data_collector/company_fundamentals_data.py | sed -n '80,120p'
```

🏁 Script executed:

```shell
rg -n "fetch_fmp_metrics" AI/modules/data_collector/ -A 3 -B 3
```

🏁 Script executed:

```shell
cat -n AI/modules/data_collector/company_fundamentals_data.py | sed -n '245,260p'
```

Repository: SISC-IT/sisc-web
FMP API calls lack timeouts and HTTP status validation.
All three requests run without a timeout, so the collector can hang indefinitely when the external API is slow or unresponsive. HTTP status codes (429, 5xx, etc.) are not checked either, and the error-message validation is applied only to the first response (inc_data), not to bal_data and cf_data. Since a single timed-out request blocks the entire pipeline in the batch loop (line 250), this directly affects batch stability.
Suggested fix:
```diff
     try:
-        inc_resp = requests.get(f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}")
-        bal_resp = requests.get(f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}")
-        cf_resp = requests.get(f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}")
+        inc_resp = requests.get(
+            f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}",
+            timeout=(5, 30),
+        )
+        bal_resp = requests.get(
+            f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}",
+            timeout=(5, 30),
+        )
+        cf_resp = requests.get(
+            f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}",
+            timeout=(5, 30),
+        )
+
+        inc_resp.raise_for_status()
+        bal_resp.raise_for_status()
+        cf_resp.raise_for_status()
         inc_data, bal_data, cf_data = inc_resp.json(), bal_resp.json(), cf_resp.json()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
        inc_resp = requests.get(
            f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}",
            timeout=(5, 30),
        )
        bal_resp = requests.get(
            f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}",
            timeout=(5, 30),
        )
        cf_resp = requests.get(
            f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}",
            timeout=(5, 30),
        )

        inc_resp.raise_for_status()
        bal_resp.raise_for_status()
        cf_resp.raise_for_status()
        inc_data, bal_data, cf_data = inc_resp.json(), bal_resp.json(), cf_resp.json()

        if not inc_data or (isinstance(inc_data, dict) and 'Error Message' in inc_data):
            return []

        df_inc, df_bal, df_cf = pd.DataFrame(inc_data), pd.DataFrame(bal_data), pd.DataFrame(cf_data)
        if df_inc.empty or df_bal.empty or df_cf.empty:
            return []

        df_merged = pd.merge(df_inc, df_bal, on='date', how='inner', suffixes=('', '_bal'))
        df_merged = pd.merge(df_merged, df_cf, on='date', how='inner', suffixes=('', '_cf'))
    except:
```
🧰 Tools
🪛 Ruff (0.15.5)
- [error] 96: Probable use of requests call without timeout (S113)
- [error] 97: Probable use of requests call without timeout (S113)
- [error] 98: Probable use of requests call without timeout (S113)
- [error] 106: Multiple statements on one line (colon) (E701)
- [error] 110: Do not use bare except (E722)
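In the same spirit as the suggestion above, the three endpoint calls could go through one guarded helper. This is a sketch, not the repo's code: `getter` is injected (e.g. `requests.get`) so it is testable without network access, and the broad `except Exception` stands in for the narrower `requests.Timeout`/`requests.RequestException`/`json.JSONDecodeError` you would catch in real code:

```python
def fetch_json(getter, url, timeout=(5, 30)):
    # One timeout, one status check, one guarded JSON parse per endpoint.
    try:
        resp = getter(url, timeout=timeout)
        resp.raise_for_status()            # turns 429/5xx into exceptions
        return resp.json()
    except Exception as exc:               # narrow this to requests/JSON errors in real code
        print(f"FMP request failed for {url}: {exc!r}")
        return None

def fetch_all_or_none(getter, urls):
    # Only proceed when every statement endpoint answered with valid JSON,
    # mirroring "build df_inc/df_bal/df_cf only when all three are valid".
    results = [fetch_json(getter, u) for u in urls]
    return results if all(r is not None for r in results) else None
```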
```python
success_count = 0
for item in ticker_list:
    if not item.get('name'): continue
    try:
        cursor.execute(query_company_names, (item['ticker'], item['name']))
        success_count += 1
    except Exception as e:
        conn.rollback() # 이 건만 취소
        print(f" - [{item['ticker']}] 업데이트 건너뜀")
        continue

conn.commit()
print("[Master] DB 동기화 완료.")
conn.commit() # 최종 저장!
```
conn.rollback() also reverts previously successful rows.
The rollback at line 138 does not cancel "just this item" (이 건만 취소); it reverts every company_names change since the last commit. So if even one item fails midway, earlier successful upserts are lost as well, and success_count and the logs diverge from the actual DB state. To skip failures row by row, use a savepoint or isolate failures some other way.
Possible fix:
```diff
 success_count = 0
 for item in ticker_list:
-    if not item.get('name'): continue
+    if not item.get('name'):
+        continue
     try:
+        cursor.execute("SAVEPOINT company_name_upsert")
         cursor.execute(query_company_names, (item['ticker'], item['name']))
         success_count += 1
     except Exception:
-        conn.rollback() # 이 건만 취소
+        cursor.execute("ROLLBACK TO SAVEPOINT company_name_upsert")
         print(f" - [{item['ticker']}] 업데이트 건너뜀")
         continue
```

🧰 Tools
🪛 Ruff (0.15.5)
- [error] 133: Multiple statements on one line (colon) (E701)
- [warning] 137: Do not catch blind exception: Exception (BLE001)
- [error] 137: Local variable e is assigned to but never used; remove the assignment to the unused variable e (F841)
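The savepoint pattern from the suggested fix can be factored into a reusable guard. A sketch under the assumption that `cursor` exposes a DB-API `execute()` (as a psycopg2 cursor does); the name `savepoint`/`sp` is illustrative:

```python
from contextlib import contextmanager

@contextmanager
def savepoint(cursor, name="sp"):
    # Roll back only the work inside this block on failure; earlier rows
    # committed or executed outside the block are untouched.
    cursor.execute(f"SAVEPOINT {name}")
    try:
        yield
        cursor.execute(f"RELEASE SAVEPOINT {name}")
    except Exception:
        cursor.execute(f"ROLLBACK TO SAVEPOINT {name}")
        raise  # let the caller decide to skip this row and continue
```

Typical use in the loop would be `with savepoint(cursor): cursor.execute(query, params)` inside the per-item try/except, incrementing `success_count` only after the block exits cleanly, and keeping the single `conn.commit()` for the whole batch.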
```text
# [변경] GPU 바이너리가 포함되지 않은 가벼운 CPU 버전의 TensorFlow 사용
tensorflow-cpu<2.11
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check security advisories for tensorflow-cpu
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "tensorflow-cpu") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'
# Check the latest versions on PyPI
curl -s https://pypi.org/pypi/tensorflow-cpu/json | jq -r '.releases | keys | .[]' | sort -V | tail -5
```
🏁 Script executed:

```shell
# First, let's check the full requirements.txt file and understand the context
fd -n "requirements.txt" -t f
```
🏁 Script executed:

```shell
# Get more details about tensorflow-cpu versions and their compatibility
curl -s https://pypi.org/pypi/tensorflow-cpu/2.10.0/json | jq '{version: .info.version, release_date: .releases | keys | .[0], requires_python: .info.requires_python}'
curl -s https://pypi.org/pypi/tensorflow-cpu/2.11.0/json | jq '{version: .info.version, release_date: .releases | keys | .[0], requires_python: .info.requires_python}'
```
🏁 Script executed:

```shell
# Check security advisories and vulnerability data from alternative sources
curl -s 'https://api.osv.dev/v1/query' -d '{
  "package": {"name": "tensorflow-cpu", "ecosystem": "PyPI"},
  "version": "2.10.0"
}' | jq '.vulns[] | {id: .id, summary: .summary, severity: .severity}'
```
🏁 Script executed:

```shell
# Find the requirements.txt file
find . -name "requirements.txt" -type f | head -20
```
🏁 Script executed:

```shell
# Check TensorFlow 2.11 release information
python3 << 'EOF'
import json
import urllib.request

# Get release history for tensorflow-cpu
try:
    with urllib.request.urlopen('https://pypi.org/pypi/tensorflow-cpu/json') as response:
        data = json.loads(response.read())
    releases = data['releases']
    # Get 2.11.0 release date
    if '2.11.0' in releases and releases['2.11.0']:
        print(f"tensorflow-cpu 2.11.0 release info: {releases['2.11.0'][0]['upload_time']}")
    # List versions around 2.11
    versions = sorted([v for v in releases.keys() if v.startswith('2.1')], key=lambda x: tuple(map(int, x.split('.'))))
    print("\nVersions in 2.1x range:")
    for v in versions[-10:]:
        if releases[v]:
            print(f"  {v}: {releases[v][0]['upload_time']}")
except Exception as e:
    print(f"Error: {e}")
EOF
```
🏁 Script executed:

```shell
# Check which vulnerabilities were fixed in tensorflow 2.11+
python3 << 'EOF'
import json
import urllib.request

# Query OSV for version 2.11.0 to see which vulnerabilities are fixed
try:
    request_body = json.dumps({
        "package": {"name": "tensorflow-cpu", "ecosystem": "PyPI"},
        "version": "2.11.0"
    }).encode('utf-8')
    with urllib.request.urlopen('https://api.osv.dev/v1/query', request_body) as response:
        data = json.loads(response.read())
    vulns = data.get('vulns', [])
    print(f"Vulnerabilities in tensorflow-cpu 2.11.0: {len(vulns)}")
    if vulns:
        print("First few vulnerabilities:")
        for vuln in vulns[:5]:
            print(f"  - {vuln.get('id')}: {vuln.get('summary')}")
except Exception as e:
    print(f"Error: {e}")
EOF
```
🏁 Script executed:

```shell
# Check what's in the AI directory for requirements.txt
cat AI/requirements.txt 2>/dev/null || echo "File not found in AI directory"
```
🌐 Web query:
TensorFlow 2.11 release date when was it released
💡 Result:
TensorFlow 2.11.0 was released on November 18, 2022. [1] (GitHub’s release tag for v2.11.0 is dated Nov 18 as well. [2])
🏁 Script executed:

```shell
# Check OSV data for different TensorFlow versions to understand patch history
python3 << 'EOF'
import json
import urllib.request

versions_to_check = ['2.11.0', '2.12.0', '2.13.0', '2.14.0', '2.15.0']
for version in versions_to_check:
    try:
        request_body = json.dumps({
            "package": {"name": "tensorflow-cpu", "ecosystem": "PyPI"},
            "version": version
        }).encode('utf-8')
        req = urllib.request.Request('https://api.osv.dev/v1/query', data=request_body, headers={'Content-Type': 'application/json'})
        with urllib.request.urlopen(req) as response:
            data = json.loads(response.read())
        vulns = data.get('vulns', [])
        print(f"tensorflow-cpu {version}: {len(vulns)} vulnerabilities")
    except Exception as e:
        print(f"tensorflow-cpu {version}: Error - {str(e)[:50]}")
EOF
```
🏁 Script executed:

```shell
# Search for stable-baselines3 compatibility with TensorFlow versions
python3 << 'EOF'
import json
import urllib.request

try:
    # Check stable-baselines3 PyPI metadata for dependencies
    with urllib.request.urlopen('https://pypi.org/pypi/stable-baselines3/json', timeout=5) as response:
        data = json.loads(response.read())
    info = data.get('info', {})
    requires_dist = info.get('requires_dist', [])
    print("stable-baselines3 dependencies:")
    if requires_dist:
        for req in requires_dist:
            if 'tensorflow' in req.lower() or 'numpy' in req.lower():
                print(f"  {req}")
    else:
        print("  No specific TensorFlow/NumPy requirement found")
except Exception as e:
    print(f"Error: {str(e)[:100]}")
EOF
```
The tensorflow-cpu<2.11 constraint pins the project to versions with known security vulnerabilities.
TensorFlow 2.11.0 was released in November 2022, and tensorflow-cpu 2.10.x carries 56+ known security vulnerabilities (heap overflows, null-pointer errors, segmentation faults, etc.). This constraint keeps the code fixed to an old, vulnerable version.
Consider upgrading to 2.11 or later, testing compatibility with numpy<2.0 and stable-baselines3 where needed. Moving to a newer version (e.g., 2.13+) resolves the known vulnerabilities and brings performance improvements.
♻️ Duplicate comments (1)
AI/requirements.txt (1)
6-7: ⚠️ Potential issue | 🟠 Major
Pinning tensorflow-cpu<2.11 cuts off the GPU path and leaves the previously flagged outdated version constraint in place.
AI/modules/signal/models/transformer/train.py (lines 12-27) and AI/tests/optimize_hyperparameter.py (lines 276-284) still perform GPU device detection and memory-growth setup. Switching to the CPU-only package here means training and tuning always run on CPU even in environments that have a GPU, which can degrade performance significantly. And the <2.11 range still fails to resolve the outdated version constraint already flagged in the previous review. Possible fix:

```diff
-# [변경] GPU 바이너리가 포함되지 않은 가벼운 CPU 버전의 TensorFlow 사용
-tensorflow-cpu<2.11
+# Managing separate CPU and GPU images is safer, e.g.:
+# - CPU-only image: tensorflow-cpu>=2.11,<2.14
+# - GPU image: tensorflow>=2.11,<2.14
```

Please run the following script first to confirm whether GPU-dependent code paths actually remain.

```shell
#!/bin/bash
# Check TensorFlow GPU-related code paths
rg -n -C2 --glob 'AI/**/*.py' "list_physical_devices\('GPU'\)|set_memory_growth\(" AI

# Check whether requirements / Dockerfile are split
fd -a 'requirements*.txt' .
fd -i '^dockerfile$' .
```

Expected result: if GPU initialization code remains in the actual training path, separate CPU/GPU requirements files or images are needed rather than a single tensorflow-cpu pin.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a3bfe4f8-1309-4ff7-9777-55e48fa4812b
📒 Files selected for processing (1)
AI/requirements.txt
Summary by CodeRabbit
New Features
Improvements
Chores