
20260311 #300 Improve requirements.txt and Dockerfile #302

Merged
twq110 merged 4 commits into main from
20260311-#300-requirements.txt-개선-및-Dockerfile-개선
Mar 12, 2026


Conversation

@twq110
Contributor

@twq110 twq110 commented Mar 12, 2026

Summary by CodeRabbit

  • New Features

    • Introduced a dual-engine hybrid financial data collection flow and added processing-status tracking
  • Improvements

    • Extended company fundamentals collection from 4 years to up to 10 years
    • Improved bulk-update performance via batch upserts
    • Extended the default macro data lookup window (up to 10 years)
  • Chores

    • Switched the machine learning package to CPU-only and added a collection-completed flag column to the DB

@twq110 twq110 requested a review from Kosw6 as a code owner March 12, 2026 06:27
@coderabbitai

coderabbitai bot commented Mar 12, 2026

Walkthrough

A hybrid two-phase financial data collection flow using YFinance and the FMP API was introduced, along with an fmp_completed field and related methods for tracking FMP upgrade status. The DB upsert logic and CLI execution flow were refactored.

Changes

Cohort / File(s) | Summary
Hybrid fundamentals data collector
AI/modules/data_collector/company_fundamentals_data.py
Added YFinance-based 4-year baseline collection (fetch_yf_metrics) and FMP 10-year collection (fetch_fmp_metrics). Added get_fmp_targets and mark_fmp_completed. Split the update_tickers flow into Phase 1 (YF) and Phase 2 (FMP). Changed the DB upsert to ON CONFLICT (ticker, date) DO UPDATE. Captured the FMP_API_KEY environment variable and cleaned up error handling/logging.
Macro data lookup window adjustment
AI/modules/data_collector/macro_data.py, AI/modules/data_collector/run.py
Changed the default macro data lookup window from 5 years to 10 years (10 years also applies in recovery mode).
Ticker master batch update improvements
AI/modules/data_collector/ticker_master_updater.py
Simplified to two stages: upsert stock_info in a single executemany batch, then insert company_names item by item, skipping failures and committing at the end. Prevents a single failure from aborting the whole run and strengthens logging.
Dependency/environment changes
AI/requirements.txt
Changed tensorflow to tensorflow-cpu<2.11, removed the extras from stable-baselines3[extra], and adjusted the dependency sections.
DB schema extension
schema.sql
Added an fmp_completed BOOLEAN DEFAULT false column to the stock_info table.
Other entry-point changes
AI/modules/data_collector/company_fundamentals_data.py (CLI)
Cleaned up the main CLI parser description/arguments (supports --all) and removed interactive input.
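The batch upsert pattern summarized above (a single executemany with ON CONFLICT) can be sketched as follows. This is a minimal illustration: the SQL text, conflict target, and the build_batch helper are assumptions for the sketch, not the repo's actual code.

```python
# Hypothetical sketch of a single-batch upsert; table/column names
# are illustrative, not the repo's actual schema.
UPSERT_SQL = (
    "INSERT INTO stock_info (ticker, name) VALUES (%s, %s) "
    "ON CONFLICT (ticker) DO UPDATE SET name = EXCLUDED.name"
)

def build_batch(items):
    """Turn {'ticker', 'name'} dicts into executemany parameter tuples,
    skipping entries without a name (mirroring the per-item skip logic)."""
    return [(i["ticker"], i["name"]) for i in items if i.get("name")]

# With a real psycopg2 connection, the whole batch goes out in one call:
#   cursor.executemany(UPSERT_SQL, build_batch(ticker_list))
#   conn.commit()
```

The point of the refactor is that one round trip replaces a per-row loop, and ON CONFLICT makes the call idempotent across reruns.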

Sequence Diagram

sequenceDiagram
    participant CLI as CLI
    participant Phase1 as Phase 1: YFinance
    participant YF as YFinance
    participant DB as Database
    participant Phase2 as Phase 2: FMP
    participant FMP as FMP API

    CLI->>Phase1: update_tickers(tickers)
    Phase1->>DB: query (current data / missing tickers)
    Phase1->>YF: fetch_yf_metrics(ticker)
    YF-->>Phase1: 4 years of financial data
    Phase1->>DB: upsert rows (ON CONFLICT)

    Phase1->>Phase2: start (after Phase 1 completes)
    Phase2->>DB: get_fmp_targets(limit)
    Phase2->>FMP: fetch_fmp_metrics(ticker)
    FMP-->>Phase2: 10 years of financial data
    Phase2->>DB: upsert rows (ON CONFLICT)
    Phase2->>DB: mark_fmp_completed(ticker)
    Phase2-->>CLI: done

Estimated code review effort

🎯 3 (Medium) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • Kosw6

🐰 Four years from YF and ten from FMP meet,
Dancing down the tickers, filling data sweet,
The completed flag brings the loop to rest,
Conflicts melt softly in the DB's nest,
One carrot planted in the hybrid field 🥕📈

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title mentions improving requirements.txt and the Dockerfile, but the actual changes are far broader: data collector refactoring, hybrid engine implementation, DB schema changes, and more. Update the PR title to reflect the main changes, e.g. 'Implement hybrid fundamentals data collector and update dependencies', or add a more specific description.
Docstring Coverage ⚠️ Warning Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
schema.sql (1)

18-25: ⚠️ Potential issue | 🟠 Major

A migration for existing databases is also needed.

This column is queried/updated directly in company_fundamentals_data.py, so changing schema.sql alone will not bring environments with an existing Postgres volume up to date. If a production/development DB has already been created, fmp_completed will be missing and the code will fail immediately at runtime.

Example SQL to apply
ALTER TABLE public.stock_info
    ADD COLUMN IF NOT EXISTS fmp_completed boolean DEFAULT false;

UPDATE public.stock_info
SET fmp_completed = false
WHERE fmp_completed IS NULL;
AI/modules/data_collector/company_fundamentals_data.py (1)

143-169: ⚠️ Potential issue | 🟠 Major

Tickers are marked complete even after save failures or transient API failures, blocking retries.

fetch_fmp_metrics() returns [] on failure, and save_to_db() only logs DB errors without reporting failure to the caller. Yet line 257 sets fmp_completed = TRUE regardless, so a ticker that only hit a transient outage is permanently dropped from the retry set. Moreover, ticker_master_updater.py:113-122 never overwrites this flag, so an incorrectly set completed status persists.

Possible fix
-                data = self.fetch_fmp_metrics(ticker)
-                if data:
-                    self.save_to_db(ticker, data)
-                
-                # 수집을 시도했으므로 무조건 완료 도장 찍기 (신규 상장주 무한루프 방지)
-                self.mark_fmp_completed(ticker)
-                success_count += 1
+                data = self.fetch_fmp_metrics(ticker)
+                if not data:
+                    print(f"   [{ticker}] no FMP data / transient failure - deferring completion")
+                    continue
+
+                saved = self.save_to_db(ticker, data)
+                if not saved:
+                    print(f"   [{ticker}] DB save failed - deferring completion")
+                    continue
+
+                self.mark_fmp_completed(ticker)
+                success_count += 1

It would be safer for save_to_db() to return a bool indicating success, and to keep a status value that separates "tickers that permanently have no data" from "transient failures".

Also applies to: 252-257
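The permanent-vs-transient split suggested above can be sketched as follows. FetchStatus and should_mark_completed are hypothetical names for this sketch, not code from the PR:

```python
from enum import Enum

class FetchStatus(Enum):
    OK = "ok"                # data fetched and saved successfully
    NO_DATA = "no_data"      # source confirms no data exists (permanent)
    TRANSIENT = "transient"  # timeout / 5xx / DB error: retry later

def should_mark_completed(status: FetchStatus) -> bool:
    """Only permanent outcomes may set fmp_completed = TRUE; transient
    failures stay in the retry set returned by get_fmp_targets()."""
    return status in (FetchStatus.OK, FetchStatus.NO_DATA)
```

With this shape, the "mark complete unconditionally to avoid infinite loops on new listings" goal is still met via NO_DATA, while transient failures remain retryable.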

🧹 Nitpick comments (1)
AI/requirements.txt (1)

2-25: Consider pinning more dependency versions for reproducibility.

Most dependencies have no version constraints, which can cause reproducibility problems and unexpected breaking changes. Pinning versions via pip freeze or pip-tools is recommended.

📌 How to pin versions

Option 1: use pip freeze

pip install -r AI/requirements.txt
pip freeze > AI/requirements-lock.txt

Option 2: use pip-tools (recommended)
Keep the current requirements.txt as requirements.in, then:

pip install pip-tools
pip-compile AI/requirements.in --output-file AI/requirements.txt

This pins every dependency (including transitive dependencies) to an exact version, guaranteeing environment reproducibility.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e489bd71-5bce-4e99-97fc-26f2f406f018

📥 Commits

Reviewing files that changed from the base of the PR and between f910327 and 0ddafb3.

📒 Files selected for processing (6)
  • AI/modules/data_collector/company_fundamentals_data.py
  • AI/modules/data_collector/macro_data.py
  • AI/modules/data_collector/run.py
  • AI/modules/data_collector/ticker_master_updater.py
  • AI/requirements.txt
  • schema.sql

Comment on lines +77 to +79
roe = (net_income / equity) if net_income and equity else None
debt_ratio = (total_liabilities / equity) if total_liabilities and equity else None
interest_coverage = (op_income / abs(int_expense)) if op_income and int_expense and abs(int_expense) > 0 else None
⚠️ Potential issue | 🟠 Major

Zero values are being dropped as None.

The conditionals here treat 0 as falsy, so roe, debt_ratio, and interest_coverage are stored as None even when the actual value is 0. For example, net income of 0, liabilities of 0, or operating income of 0 are valid values, but currently they become missing data.

Possible fix
-            roe = (net_income / equity) if net_income and equity else None
-            debt_ratio = (total_liabilities / equity) if total_liabilities and equity else None
-            interest_coverage = (op_income / abs(int_expense)) if op_income and int_expense and abs(int_expense) > 0 else None
+            roe = (net_income / equity) if net_income is not None and equity not in (None, 0) else None
+            debt_ratio = (total_liabilities / equity) if total_liabilities is not None and equity not in (None, 0) else None
+            interest_coverage = (
+                op_income / abs(int_expense)
+                if op_income is not None and int_expense is not None and int_expense != 0
+                else None
+            )

Also applies to: 130-132
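The zero-preserving behavior is easy to factor out and test; a small helper in the spirit of the fix (safe_ratio is a hypothetical name, not from the PR):

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, preserving valid zeros in the
    numerator; only None inputs or a zero denominator yield None."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator
```

Using one helper for roe, debt_ratio, and interest_coverage also removes the three near-duplicate conditionals.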


Comment on lines +96 to +110
            inc_resp = requests.get(f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}")
            bal_resp = requests.get(f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}")
            cf_resp = requests.get(f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}")

            inc_data, bal_data, cf_data = inc_resp.json(), bal_resp.json(), cf_resp.json()

            if not inc_data or (isinstance(inc_data, dict) and 'Error Message' in inc_data):
                return []

            df_inc, df_bal, df_cf = pd.DataFrame(inc_data), pd.DataFrame(bal_data), pd.DataFrame(cf_data)
            if df_inc.empty or df_bal.empty or df_cf.empty: return []

            df_merged = pd.merge(df_inc, df_bal, on='date', how='inner', suffixes=('', '_bal'))
            df_merged = pd.merge(df_merged, df_cf, on='date', how='inner', suffixes=('', '_cf'))
        except:
⚠️ Potential issue | 🟠 Major


FMP API calls lack timeouts and HTTP status validation.

All three requests run without a timeout, so the collector can hang indefinitely if the external API is slow or unresponsive. HTTP status codes (429, 5xx, etc.) are not checked, and the error-message validation applies only to the first response (inc_data), not to bal_data or cf_data. In the batch loop (line 250), a single timed-out request blocks the whole pipeline, so this directly affects batch stability.

Suggested fix
         try:
-            inc_resp = requests.get(f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}")
-            bal_resp = requests.get(f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}")
-            cf_resp = requests.get(f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}")
+            inc_resp = requests.get(
+                f"{base_url}/income-statement/{ticker}?limit={limit}&apikey={self.api_key}",
+                timeout=(5, 30),
+            )
+            bal_resp = requests.get(
+                f"{base_url}/balance-sheet-statement/{ticker}?limit={limit}&apikey={self.api_key}",
+                timeout=(5, 30),
+            )
+            cf_resp = requests.get(
+                f"{base_url}/cash-flow-statement/{ticker}?limit={limit}&apikey={self.api_key}",
+                timeout=(5, 30),
+            )
+
+            inc_resp.raise_for_status()
+            bal_resp.raise_for_status()
+            cf_resp.raise_for_status()
 
             inc_data, bal_data, cf_data = inc_resp.json(), bal_resp.json(), cf_resp.json()
🧰 Tools
🪛 Ruff (0.15.5)

[error] 96-96: Probable use of requests call without timeout

(S113)


[error] 97-97: Probable use of requests call without timeout

(S113)


[error] 98-98: Probable use of requests call without timeout

(S113)


[error] 106-106: Multiple statements on one line (colon)

(E701)


[error] 110-110: Do not use bare except

(E722)

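One way to separate transient from permanent HTTP failures, as the review suggests; classify_status is a hypothetical helper sketched for illustration, not part of the PR:

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a handling decision:
    'ok' for 2xx, 'retry' for transient errors (429 rate limit, 5xx),
    'fail' for other client errors, treated as permanent for this ticker."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 429 or 500 <= status_code < 600:
        return "retry"
    return "fail"
```

Combined with timeout=(5, 30) on each requests.get, a "retry" result would leave the ticker in the Phase 2 queue instead of marking it complete.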

Comment on lines +131 to +142
            success_count = 0
            for item in ticker_list:
                if not item.get('name'): continue
                try:
                    cursor.execute(query_company_names, (item['ticker'], item['name']))
                    success_count += 1
                except Exception as e:
                    conn.rollback() # 이 건만 취소
                    print(f"  - [{item['ticker']}] 업데이트 건너뜀")
                    continue

            conn.commit()
            print("[Master] DB 동기화 완료.")
            conn.commit() # 최종 저장!
⚠️ Potential issue | 🟠 Major

conn.rollback() also reverts previously successful rows.

The rollback on line 138 does not cancel "only this item"; it reverts all company_names changes since the last commit. If even one item fails midway, earlier successful upserts are lost too, and success_count and the logs no longer match the actual DB state. To skip failures per row, use a savepoint, or isolate failures some other way.

Possible fix
             success_count = 0
             for item in ticker_list:
-                if not item.get('name'): continue
+                if not item.get('name'):
+                    continue
                 try:
+                    cursor.execute("SAVEPOINT company_name_upsert")
                     cursor.execute(query_company_names, (item['ticker'], item['name']))
                     success_count += 1
                 except Exception:
-                    conn.rollback() # 이 건만 취소
+                    cursor.execute("ROLLBACK TO SAVEPOINT company_name_upsert")
                     print(f"  - [{item['ticker']}] 업데이트 건너뜀")
                     continue
🧰 Tools
🪛 Ruff (0.15.5)

[error] 133-133: Multiple statements on one line (colon)

(E701)


[warning] 137-137: Do not catch blind exception: Exception

(BLE001)


[error] 137-137: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

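The savepoint pattern above generalizes beyond Postgres. Here is a runnable sketch using the stdlib sqlite3 (table name and rows are made up for the demo) showing that a per-row failure no longer discards earlier successful inserts:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
cur = conn.cursor()
cur.execute("CREATE TABLE company_names (ticker TEXT PRIMARY KEY, name TEXT NOT NULL)")

rows = [("AAPL", "Apple"), ("BAD", None), ("MSFT", "Microsoft")]  # middle row violates NOT NULL
success_count = 0

cur.execute("BEGIN")
for ticker, name in rows:
    cur.execute("SAVEPOINT row_sp")
    try:
        cur.execute("INSERT INTO company_names (ticker, name) VALUES (?, ?)", (ticker, name))
        cur.execute("RELEASE SAVEPOINT row_sp")
        success_count += 1
    except sqlite3.IntegrityError:
        cur.execute("ROLLBACK TO SAVEPOINT row_sp")  # undo only this row
        cur.execute("RELEASE SAVEPOINT row_sp")
cur.execute("COMMIT")

survivors = [r[0] for r in cur.execute("SELECT ticker FROM company_names ORDER BY ticker")]
print(success_count, survivors)  # 2 ['AAPL', 'MSFT']
```

The same SAVEPOINT / ROLLBACK TO / RELEASE statements work verbatim in Postgres via psycopg2, which is what the suggested diff relies on.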

Comment on lines +6 to +7
# [변경] GPU 바이너리가 포함되지 않은 가벼운 CPU 버전의 TensorFlow 사용
tensorflow-cpu<2.11

⚠️ Potential issue | 🟠 Major


The tensorflow-cpu<2.11 constraint restricts installs to versions with known security vulnerabilities.

TensorFlow 2.11.0 was released in November 2022, and tensorflow-cpu 2.10.x contains 56+ known security vulnerabilities (heap overflows, null-pointer errors, segmentation faults, etc.). This constraint pins the code to an old, vulnerable version.

Consider upgrading to 2.11 or later, testing compatibility with numpy<2.0 and stable-baselines3 where needed. Moving to a more recent version (e.g. 2.13+) resolves the security vulnerabilities and brings performance improvements.


@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
AI/requirements.txt (1)

6-7: ⚠️ Potential issue | 🟠 Major

Pinning tensorflow-cpu<2.11 cuts off the GPU path and leaves the previously flagged old-version constraint in place.

AI/modules/signal/models/transformer/train.py lines 12-27 and AI/tests/optimize_hyperparameter.py lines 276-284 still perform GPU device detection and memory-growth setup. Switching to the CPU-only package means training/tuning always runs on CPU even in GPU-equipped environments, which can hurt performance significantly. And the <2.11 range also fails to resolve the outdated version constraint already flagged in an earlier review.

Possible fix
-# [변경] GPU 바이너리가 포함되지 않은 가벼운 CPU 버전의 TensorFlow 사용
-tensorflow-cpu<2.11
+# It is safer to maintain separate CPU and GPU images.
+# e.g.:
+# - CPU-only image: tensorflow-cpu>=2.11,<2.14
+# - GPU image: tensorflow>=2.11,<2.14
Please first run the following script to confirm whether GPU-dependent paths actually remain.

#!/bin/bash
# Check TensorFlow GPU-related code paths
rg -n -C2 --glob 'AI/**/*.py' "list_physical_devices\\('GPU'\\)|set_memory_growth\\(" AI

# Check whether requirements / Dockerfile variants are split
fd -a 'requirements*.txt' .
fd -i '^dockerfile$' .

Expected result: if GPU initialization code remains on the actual training path, separate CPU/GPU requirements files or images are needed rather than a single tensorflow-cpu pin.
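If the two images are kept, the training code's device setup can stay shared. A minimal guard, sketched here without importing TensorFlow (select_device is a hypothetical helper, not code from this repo), only needs the list returned by tf.config.list_physical_devices('GPU'):

```python
def select_device(physical_gpus) -> str:
    """Pick a TF device string: fall back to CPU when the installed build
    (e.g. tensorflow-cpu) reports no GPUs, instead of failing at setup."""
    return "/GPU:0" if physical_gpus else "/CPU:0"

# In train.py this would be driven by the real TF call:
#   gpus = tf.config.list_physical_devices("GPU")
#   device = select_device(gpus)
```

This way the same entry point runs on both images, and only the installed package decides which path is taken.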


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3bfe4f8-1309-4ff7-9777-55e48fa4812b

📥 Commits

Reviewing files that changed from the base of the PR and between 0ddafb3 and 16ecda2.

📒 Files selected for processing (1)
  • AI/requirements.txt

@twq110 twq110 merged commit 69008bc into main Mar 12, 2026
1 check passed