
[AI][FEAT] Implement TCN model skeleton and training/inference adapters #291

Merged
twq110 merged 7 commits into main from feat/SISC-290-TCN-model
Mar 12, 2026

Conversation

@KIMSE0NG1L (Contributor) commented Mar 8, 2026

Summary of work

Implemented the basic TCN model structure for the AI part.

Key changes

  • AI/modules/signal/models/TCN/architecture.py
    • Added the TCN classification model architecture
  • AI/modules/signal/models/TCN/train.py
    • Added a training script built on the shared data loader
    • Configured multi-horizon (1d, 3d, 5d, 7d) training with technical-indicator-centric features
  • AI/modules/signal/models/TCN/wrapper.py
    • Implemented the inference wrapper for the service
    • A predict(df) call handles feature selection, scaling, window creation, inference, and returning a dict
  • .gitignore
    • Excluded AI/.venv/ and AI/data/weights/tcn/
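The predict(df) flow listed above can be sketched roughly as follows. This is a hedged illustration, not the wrapper's actual code: the feature names, sequence length, and the zero-logit stand-in model are assumptions; only the horizon list and the overall select → scale → window → infer → dict shape come from this PR description.

```python
import numpy as np
import pandas as pd

FEATURE_COLUMNS = ["close", "rsi_14", "ma_5"]  # illustrative subset, not the real list
HORIZONS = [1, 3, 5, 7]                        # from the PR description
SEQ_LEN = 4                                    # assumed window length

def predict(df: pd.DataFrame) -> dict:
    # 1) feature selection
    x = df[FEATURE_COLUMNS].to_numpy(dtype=np.float32)
    # 2) scaling (the real wrapper loads a fitted scaler artifact instead)
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    # 3) window creation: keep the latest SEQ_LEN rows
    window = x[-SEQ_LEN:]
    # 4) inference: zeros stand in for model(window) logits
    logits = np.zeros(len(HORIZONS), dtype=np.float32)
    probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    # 5) dict return, one probability per horizon
    return {f"tcn_{h}d": float(p) for h, p in zip(HORIZONS, probs)}

df = pd.DataFrame(np.random.rand(20, 3), columns=FEATURE_COLUMNS)
out = predict(df)
```

With zero logits every probability is exactly 0.5; a real model would produce per-horizon values.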

Verified

  • Syntax validation complete
  • Confirmed the training script runs and weights are saved
  • Confirmed wrapper inference works

Notes

  • This work focused on the implementation inside the TCN model folder.
  • Wiring into the upper pipeline can be done as follow-up work.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a TCN architecture and training/inference pipeline for time-series classification (multi-horizon prediction, a training CLI, and saving of the model/scaler/metadata)
    • Unified TCN wrapper providing build/train/load/save and single/batch prediction APIs
    • Extended technical indicators: RSI (scaled), short-term MA (5-day) and related ratios, market changes (ratios, volume change)
    • More flexible data input: supports date ranges or direct DataFrame input, plus a strict NaN-handling path
  • Chores

    • Updated .gitignore (exclude the virtualenv and weights directories)

@KIMSE0NG1L KIMSE0NG1L requested a review from Kosw6 as a code owner March 8, 2026 13:26

coderabbitai bot commented Mar 8, 2026

Warning

Rate limit exceeded

@twq110 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 39 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d89123f7-9f0d-4cb7-8174-d100f6d29787

📥 Commits

Reviewing files that changed from the base of the PR and between 30a5127 and 6937d90.

📒 Files selected for processing (1)
  • AI/modules/signal/models/TCN/train.py

Walkthrough

Added the TCN architecture, training script, and inference wrapper; introduced strict NaN rules and a feature pipeline in the feature processing/dataset builder; and added the development environment and model weight paths to .gitignore.

Changes

Cohort / File(s) — Summary

- TCN architecture — AI/modules/signal/models/TCN/architecture.py
  Added Chomp1d, TemporalBlock, and TCNClassifier classes: padding removal to guarantee causality, residual blocks, and a classifier built on exponential dilation and global pooling.
- TCN training pipeline — AI/modules/signal/models/TCN/train.py
  Added sequence generation (build_sequences), scaler fitting/transform, a time-series-based train/val split, the TCN training loop (saving best weights, scaler, and metadata), and a CLI.
- TCN inference wrapper — AI/modules/signal/models/TCN/wrapper.py
  Added TCNWrapper(BaseSignalModel): artifact load/save, DataFrame preprocessing, input tensor preparation, predict/predict_batch/train APIs, device and path management, and default feature/horizon constants.
- Feature and pipeline changes — AI/modules/features/market_derived.py, AI/modules/signal/core/dataset_builder.py
  Added/adjusted market-derived features (ratios, vol_change, rsi_14, ma_5, etc.), switched to a FeatureProcessor-based pipeline, introduced strict NaN-handling functions, and extended the get_standard_training_data signature (accepts a string or a DataFrame).
- Version-control exclusions — .gitignore
  Added AI/.venv/ and AI/data/weights/tcn/; cleaned up the .desktop.ini entry in the Windows section.
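The causal-padding trick named in the architecture summary (Chomp1d trimming the right-side padding after a dilated convolution) can be sketched as below. The channel counts and kernel size here are illustrative assumptions, not the values used in architecture.py:

```python
import torch
import torch.nn as nn

class Chomp1d(nn.Module):
    """Drop the rightmost `chomp_size` steps so the convolution stays causal."""
    def __init__(self, chomp_size: int):
        super().__init__()
        self.chomp_size = chomp_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x[:, :, :-self.chomp_size].contiguous()

kernel_size, dilation = 3, 1
pad = (kernel_size - 1) * dilation        # pad both sides, then chomp the right
block = nn.Sequential(
    nn.Conv1d(3, 8, kernel_size, padding=pad, dilation=dilation),
    Chomp1d(pad),
    nn.ReLU(),
)
x = torch.randn(1, 3, 16)                 # [batch, features, seq]
y = block(x)                              # sequence length is preserved
```

Because the chomp removes exactly the padding added on the right, each output step depends only on current and past inputs, and the sequence length is unchanged.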

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DataLoader as DataLoader
    participant SeqBuilder as build_sequences()
    participant Scaler as fit_scaler()
    participant Trainer as train_loop
    participant Storage as ArtifactsStorage

    User->>DataLoader: request raw data (dates/tickers)
    DataLoader-->>SeqBuilder: return DataFrame
    SeqBuilder->>SeqBuilder: compute technical indicators, build sequences/targets
    SeqBuilder-->>Scaler: return training X
    Scaler->>Trainer: provide normalized batches
    Trainer->>Trainer: forward pass, loss, backprop
    Trainer-->>Storage: save best weights, scaler, metadata

sequenceDiagram
    participant Client
    participant Wrapper as TCNWrapper
    participant Loader as _load_artifacts()
    participant PrepDF as _prepare_dataframe()
    participant PrepTensor as _prepare_input_tensor()
    participant Model as TCNClassifier
    participant Result

    Client->>Wrapper: DataFrame or ticker input
    Wrapper->>Loader: load metadata/scaler/model
    Loader-->>Wrapper: model ready
    Wrapper->>PrepDF: add technical indicators, validate columns
    PrepDF-->>PrepTensor: return latest sequence
    PrepTensor->>Model: pass input tensor ([N, Features, Seq])
    Model->>Result: return logits
    Result->>Client: dictionary of sigmoid-applied probabilities

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • Kosw6

Poem

🐇 Hop, hop, sweeping through the sequence woods,
Layer upon layer, guarding time's causality,
Scaler neatly dressed, weights held close,
Whispering predictions, sowing hope on tickers,
At the rabbit's applause the model awakes.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly reflects the main changes: the TCN model architecture, training script, and inference adapter are all covered.


@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
AI/modules/signal/models/TCN/train.py (1)

197-208: Consider relative paths in the metadata instead of absolute paths

model_path and scaler_path are stored as absolute paths, which can cause problems when moving files to another environment or deploying. Since the wrapper's _load_artifacts already supports configuring paths separately, it is better to store only the filenames in the metadata.

♻️ Change to relative paths
     metadata: Dict[str, object] = {
         "feature_columns": FEATURE_COLUMNS,
         "horizons": HORIZONS,
         "seq_len": args.seq_len,
         "kernel_size": args.kernel_size,
         "dropout": args.dropout,
         "channels": args.channels,
-        "model_path": model_path,
-        "scaler_path": scaler_path,
+        "model_path": "model.pt",
+        "scaler_path": "scaler.pkl",
     }
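The save/load round trip the comment relies on can be illustrated as follows; the directory here is a temporary stand-in, not the project's real weights path, and the load side mimics what the comment assumes _load_artifacts does (rejoining filenames with the artifact directory):

```python
import json
import os
import tempfile

artifact_dir = tempfile.mkdtemp()              # stand-in for the weights directory
model_path = os.path.join(artifact_dir, "model.pt")
scaler_path = os.path.join(artifact_dir, "scaler.pkl")

# Save side: keep only filenames in the metadata.
metadata = {
    "model_path": os.path.basename(model_path),
    "scaler_path": os.path.basename(scaler_path),
}
meta_file = os.path.join(artifact_dir, "metadata.json")
with open(meta_file, "w") as f:
    json.dump(metadata, f)

# Load side: rebuild full paths from the artifact directory.
with open(meta_file) as f:
    meta = json.load(f)
resolved_model = os.path.join(artifact_dir, meta["model_path"])
```

Because the metadata carries no absolute paths, the artifact directory can move between environments without invalidating the file.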
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/train.py` around lines 197 - 208, The metadata
currently stores absolute paths for model_path and scaler_path; change this to
store only filenames by replacing model_path and scaler_path in the metadata
dict with their basenames (e.g., os.path.basename(model_path) and
os.path.basename(scaler_path)) before json.dump to metadata_path. Keep metadata
variable name and keys ("model_path", "scaler_path") the same so the wrapper's
_load_artifacts can reconstruct full paths from the saved filenames and the
artifact directory at load time. Ensure imports (os) exist and that any tests or
consumers expecting absolute paths are updated to call _load_artifacts which
performs path joining.
AI/modules/signal/models/TCN/wrapper.py (2)

76-83: Handle the unused parameters X_val, y_val

The validation-data parameters are declared but never actually used. Either implement the validation logic or clearly mark it as not yet implemented.

♻️ Mark as not implemented
     def train(
         self,
         X_train: np.ndarray,
         y_train: np.ndarray,
         X_val: Optional[np.ndarray] = None,
         y_val: Optional[np.ndarray] = None,
         **kwargs,
     ):
         # Training loop for standalone wrapper testing; large-scale training should go through train.py.
+        # TODO: validation logic using X_val, y_val not yet implemented
+        _ = X_val, y_val  # currently unused
         if self.model is None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 76 - 83, The train
method declares X_val and y_val but never uses them; either implement validation
handling or explicitly mark them as unsupported: update the train(self, ...)
implementation (function name: train) to use X_val/y_val for validation (e.g.,
compute val loss, perform early stopping or scheduler steps) when both are
provided, or if you don’t intend to support validation yet, immediately
assert/raise NotImplementedError or log a clear warning and ignore the
parameters so callers aren’t misled; ensure references to X_val and y_val appear
in the method body and document the behavior in the method docstring.

194-197: Recommend adding strict=True to zip()

If the lengths of self.horizons and probs differ, a silently truncated result may be returned. Adding an explicit length check makes debugging easier.

♻️ Suggested fix
         return {
             f"tcn_{horizon}d": float(prob)
-            for horizon, prob in zip(self.horizons, probs)
+            for horizon, prob in zip(self.horizons, probs, strict=True)
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 194 - 197, the returned
dict is silently truncated when self.horizons and probs differ in length: change it;
use zip(self.horizons, probs, strict=True) instead of zip(self.horizons, probs)
(Python 3.10+), or add a length check at the call site and raise an explicit
ValueError on mismatch; identifiers to modify: self.horizons, probs, and the dict
comprehension using the expression f"tcn_{horizon}d" (file:
AI/modules/signal/models/TCN/wrapper.py).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7ec2a31e-052f-4e60-a3e7-0062432d61a7

📥 Commits

Reviewing files that changed from the base of the PR and between aeab564 and bb10575.

📒 Files selected for processing (4)
  • .gitignore
  • AI/modules/signal/models/TCN/architecture.py
  • AI/modules/signal/models/TCN/train.py
  • AI/modules/signal/models/TCN/wrapper.py

Comment on lines +131 to +133
if self.scaler is None and os.path.exists(self.scaler_path):
    with open(self.scaler_path, "rb") as f:
        self.scaler = pickle.load(f)

⚠️ Potential issue | 🟡 Minor

Pickle deserialization security considerations

pickle.load poses a potential security risk with untrusted data. The scaler file here is an internal artifact produced during training, so the risk is low, but if the file path can be manipulated via external input, additional validation is needed.

🧰 Tools
🪛 Ruff (0.15.4)

[error] 133-133: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 131 - 133, The current
use of pickle.load when initializing self.scaler from self.scaler_path is
unsafe; replace it with a safer pattern: wrap the load in a try/except, validate
the filename/path (self.scaler_path) is expected/owned, and after
deserialization assert the object type (e.g., isinstance(obj,
sklearn.preprocessing.StandardScaler) or another expected scaler class) before
assigning to self.scaler; alternatively use joblib.load if the scaler was saved
with joblib and still perform the same post-load type check and error handling;
ensure any failure logs a clear error and does not assign untrusted objects to
self.scaler.
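A sketch of the safer load pattern the prompt describes, assuming the scaler is a scikit-learn StandardScaler; the helper name and the temp path are hypothetical, and the try/except plus post-load isinstance check are the point:

```python
import os
import pickle
import tempfile

from sklearn.preprocessing import StandardScaler

def load_scaler(path: str) -> StandardScaler:
    """Load a pickled scaler, failing loudly on I/O errors or unexpected types."""
    try:
        with open(path, "rb") as f:
            obj = pickle.load(f)
    except (OSError, pickle.UnpicklingError) as exc:
        raise RuntimeError(f"failed to load scaler from {path}") from exc
    if not isinstance(obj, StandardScaler):
        raise TypeError(f"unexpected scaler type: {type(obj).__name__}")
    return obj

tmp = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
with open(tmp, "wb") as f:
    pickle.dump(StandardScaler(), f)
scaler = load_scaler(tmp)                  # passes the type check

with open(tmp, "wb") as f:
    pickle.dump({"not": "a scaler"}, f)    # wrong type on purpose
try:
    load_scaler(tmp)
    rejected = False
except TypeError:
    rejected = True
```

The type check does not make unpickling itself safe (malicious code runs during pickle.load), but it prevents a wrong-but-benign object from being silently assigned.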

Comment on lines +173 to +197
+    def predict(self, X_input) -> Dict[str, float]:
+        # DataFrame input is the primary path; ndarray is also accepted for test convenience.
+        self._load_artifacts()
+
+        if self.model is None:
+            raise ValueError("TCN model is not available.")
+
+        if isinstance(X_input, pd.DataFrame):
+            tensor_x = self._prepare_input_tensor(X_input)
+        else:
+            array_x = np.asarray(X_input, dtype=np.float32)
+            if array_x.ndim == 2:
+                array_x = np.expand_dims(array_x, axis=0)
+            tensor_x = torch.from_numpy(array_x).float().to(self.device)
+
-    def predict(self, X_input: np.ndarray) -> np.ndarray:
-        if self.model is None: raise Exception("Model not built")
         self.model.eval()
         with torch.no_grad():
-            tensor_x = torch.FloatTensor(X_input).to(self.device)
-            out = self.model(tensor_x)
-            return torch.sigmoid(out).cpu().numpy()
+            logits = self.model(tensor_x)
+            probs = torch.sigmoid(logits).cpu().numpy().flatten()
+
+        # Return a per-horizon dict so the portfolio pipeline can consume it directly.
+        return {
+            f"tcn_{horizon}d": float(prob)
+            for horizon, prob in zip(self.horizons, probs)
+        }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# BaseSignalModel의 predict 메서드 시그니처 확인
ast-grep --pattern $'class BaseSignalModel {
  $$$
  def predict($$$) -> $_:
    $$$
  $$$
}'

Repository: SISC-IT/sisc-web

Length of output: 42


🏁 Script executed:

# Find the BaseSignalModel file
find . -type f -name "base_model.py" | head -20

Repository: SISC-IT/sisc-web

Length of output: 98


🏁 Script executed:

# Find portfolio_logic.py
find . -type f -name "portfolio_logic.py" | head -20

Repository: SISC-IT/sisc-web

Length of output: 109


🏁 Script executed:

# List files in the signal modules directory to understand structure
git ls-files | grep -i "signal" | head -30

Repository: SISC-IT/sisc-web

Length of output: 894


🏁 Script executed:

# Check the BaseSignalModel.predict method signature
cat -n AI/modules/signal/core/base_model.py | head -80

Repository: SISC-IT/sisc-web

Length of output: 2012


🏁 Script executed:

# Check the TCN wrapper predict method
cat -n AI/modules/signal/models/TCN/wrapper.py | sed -n '170,200p'

Repository: SISC-IT/sisc-web

Length of output: 1422


🏁 Script executed:

# Check how portfolio_logic.py uses the predictions
cat -n AI/modules/trader/strategies/portfolio_logic.py | grep -A 10 -B 5 "predict\|preds"

Repository: SISC-IT/sisc-web

Length of output: 1788


🏁 Script executed:

# Check other wrapper implementations to see if they have similar issues
for wrapper in AI/modules/signal/models/*/wrapper.py; do
  echo "=== $wrapper ==="
  grep -A 2 "def predict" "$wrapper" | head -5
done

Repository: SISC-IT/sisc-web

Length of output: 809


Return-type contract violation of the BaseSignalModel abstract method

BaseSignalModel.predict is declared to return np.ndarray, but this TCN implementation returns Dict[str, float]. All other wrapper implementations (PatchTST, Transformer, iTrans) follow the contract exactly and return np.ndarray.

The actual consumer, portfolio_logic.py, expects a dict only from TCN, so this works at runtime, but it violates the Liskov substitution principle, causes errors in static analysis tools, and degrades codebase consistency.

Recommended solutions:

  1. Update the return type of BaseSignalModel.predict to Union[np.ndarray, Dict[str, float]], or
  2. Define a separate abstract base for signal models that return dicts
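Option 2 could look roughly like the sketch below. The base class here is a simplified stand-in for the real one in base_model.py, and every class name other than BaseSignalModel is hypothetical:

```python
from abc import ABC, abstractmethod
from typing import Dict

import numpy as np

class BaseSignalModel(ABC):
    """Simplified stand-in: the array-returning contract stays untouched."""
    @abstractmethod
    def predict(self, X: np.ndarray) -> np.ndarray: ...

class DictReturningSignalModel(ABC):
    """Separate contract for models that emit per-horizon probability dicts."""
    @abstractmethod
    def predict(self, X) -> Dict[str, float]: ...

class TCNLikeWrapper(DictReturningSignalModel):
    def predict(self, X) -> Dict[str, float]:
        # Real inference would run here; a fixed dict keeps the sketch runnable.
        return {"tcn_1d": 0.5, "tcn_3d": 0.5}

model = TCNLikeWrapper()
signals = model.predict(None)
```

With two separate bases, static analysis can check each contract independently and the other wrappers keep their np.ndarray signature.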
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 178-178: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 196-196: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 173 - 197, The TCN
wrapper's predict method returns Dict[str, float], violating the
BaseSignalModel.predict return type; update the design to restore contract
consistency by either (A) updating the abstract signature
BaseSignalModel.predict to return Union[np.ndarray, Dict[str, float]] and adjust
type hints/usages (including portfolio_logic.py) to handle both shapes, or (B)
create a new abstract base (e.g., DictReturningSignalModel) and move the TCN
wrapper (wrapper.py predict) to implement that interface so
BaseSignalModel.predict remains np.ndarray for other models; reference
BaseSignalModel.predict, the TCN wrapper predict in
AI/modules/signal/models/TCN/wrapper.py, and portfolio_logic.py when making the
changes.

discipline24 previously approved these changes Mar 9, 2026
Contributor

@discipline24 left a comment

Great work! Please check the working conventions below and follow them in future work:
#216
Additionally, once you have applied everything the rabbit reviewed above, click "Resolve conversation";
after the code review is approved again, the PR can be merged.

@twq110 twq110 changed the title [AI][FEAT]: TCN 모델 뼈대 및 학습/추론 어댑터 구현 [AI][FEAT] TCN 모델 뼈대 및 학습/추론 어댑터 구현 Mar 11, 2026
@twq110 twq110 closed this Mar 12, 2026
@twq110 twq110 deleted the feat/SISC-290-TCN-model branch March 12, 2026 06:30
@twq110 twq110 restored the feat/SISC-290-TCN-model branch March 12, 2026 06:31
@twq110 twq110 reopened this Mar 12, 2026
@coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (2)
AI/modules/signal/models/TCN/wrapper.py (2)

77-84: Unused validation-data parameters

The X_val and y_val parameters are declared but not used. Callers may mistakenly assume validation logic is performed.

♻️ Suggestion: remove or use the parameters

Since the current comment already marks this as "for standalone wrapper testing", either remove the unused parameters or, if validation logic is planned, leave a TODO comment.

 def train(
     self,
     X_train: np.ndarray,
     y_train: np.ndarray,
-    X_val: Optional[np.ndarray] = None,
-    y_val: Optional[np.ndarray] = None,
     **kwargs,
 ):
+    # NOTE: early stopping with validation data is handled in train.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 77 - 84, The train
method declares unused validation parameters X_val and y_val in function train
of the TCN wrapper; either remove these parameters from the signature to avoid
misleading callers or implement validation handling (e.g., run validation inside
train when X_val/y_val provided) and add a TODO comment if validation will be
added later; update any callers or tests that pass X_val/y_val to match the new
signature if you remove them, and ensure function docstring/comments reflect the
change.

200-203: Add strict=True to zip() to detect length mismatches

If self.horizons and probs differ in length, data is silently lost. Adding strict=True raises an explicit error on a length mismatch.

♻️ Suggestion
 return {
     f"tcn_{horizon}d": float(prob)
-    for horizon, prob in zip(self.horizons, probs)
+    for horizon, prob in zip(self.horizons, probs, strict=True)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 200 - 203, The dict
comprehension using zip(self.horizons, probs) can silently drop items when
lengths differ; update the comprehension in the method that returns
{f"tcn_{horizon}d": float(prob) for horizon, prob in zip(self.horizons, probs)}
to use zip(..., strict=True) so a length mismatch raises an error (keeps
references to self.horizons and probs as-is).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AI/modules/signal/models/TCN/wrapper.py`:
- Around line 179-203: The wrapper is missing a get_signals method required by
the pipeline; add a get_signals(self, df: pd.DataFrame, ticker_id, sector_id)
method that accepts a DataFrame, prepares inputs (reuse existing
_prepare_input_tensor or call predict), calls the model prediction path and
returns the same horizon-keyed Dict[str,float] currently produced by predict (or
refactor predict to return raw np.ndarray per BaseSignalModel and have
get_signals call predict then map horizons->floats); ensure get_signals uses
self.horizons and the existing tensor preparation (e.g., _prepare_input_tensor)
and preserves device/model eval/no_grad behavior consistent with predict.
- Around line 142-145: The torch.load call is missing the weights_only=True
parameter; fix the load code in AI/modules/signal/models/TCN/wrapper.py so that
loading self.model_path uses torch.load(self.model_path,
map_location=self.device, weights_only=True) (references: torch.load,
self.model, self.model_path, map_location), and also check that the project's
PyTorch version is 2.6.0 or higher, leaving a note or comment to upgrade if needed.

---

Nitpick comments:
In `@AI/modules/signal/models/TCN/wrapper.py`:
- Around line 77-84: The train method declares unused validation parameters
X_val and y_val in function train of the TCN wrapper; either remove these
parameters from the signature to avoid misleading callers or implement
validation handling (e.g., run validation inside train when X_val/y_val
provided) and add a TODO comment if validation will be added later; update any
callers or tests that pass X_val/y_val to match the new signature if you remove
them, and ensure function docstring/comments reflect the change.
- Around line 200-203: The dict comprehension using zip(self.horizons, probs)
can silently drop items when lengths differ; update the comprehension in the
method that returns {f"tcn_{horizon}d": float(prob) for horizon, prob in
zip(self.horizons, probs)} to use zip(..., strict=True) so a length mismatch
raises an error (keeps references to self.horizons and probs as-is).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d200b819-3d4a-4f08-8a4c-410e21256515

📥 Commits

Reviewing files that changed from the base of the PR and between bb10575 and 3d5359c.

📒 Files selected for processing (1)
  • AI/modules/signal/models/TCN/wrapper.py

Comment on lines +142 to +145
if self.model is not None and os.path.exists(self.model_path):
    state_dict = torch.load(self.model_path, map_location=self.device)
    self.model.load_state_dict(state_dict)
    self.model.eval()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

PyTorch torch.load weights_only parameter security

💡 Result:

torch.load() deserializes via Python pickle, so loading a checkpoint can execute code; PyTorch’s docs explicitly warn to never load from an untrusted source. [1]

What weights_only does

  • weights_only=True tells torch.load to use a restricted unpickler that only allows tensors, primitive types, dictionaries, plus any types you explicitly allow via torch.serialization.add_safe_globals(). [1]

Security reality (important)

  • A critical vuln (CVE-2025-32434 / GHSA-53q9-r3pm-6pq6) showed that PyTorch < 2.6.0 could still hit RCE even when weights_only=True. The advisory lists affected: < 2.6.0, patched: 2.6.0. [4]

Practical guidance

  • Upgrade PyTorch to ≥ 2.6.0 (or newer) before relying on weights_only=True. [4]
  • Even then, follow PyTorch security guidance: treat untrusted models as untrusted code, prefer isolation (containers/VMs), and validate inputs. [3]
  • If you want to enforce safer behavior broadly, PyTorch provides TORCH_FORCE_WEIGHTS_ONLY_LOAD=1 to force torch.load to act as if weights_only=True (even if a callsite passes False). [2]

Sources: PyTorch torch.load docs [1], PyTorch env var docs [2], PyTorch security policy [3], GitHub advisory for CVE-2025-32434 [4].


🏁 Script executed:

# Find PyTorch version requirements
fd -t f "requirements.*|pyproject.toml|setup.py|environment.yml" | head -20

Repository: SISC-IT/sisc-web

Length of output: 79


🏁 Script executed:

# Check the wrapper.py file for more context
cat -n AI/modules/signal/models/TCN/wrapper.py | head -150

Repository: SISC-IT/sisc-web

Length of output: 6608


🏁 Script executed:

# Search for torch imports to understand version dependencies
rg "torch.*version|import.*torch" -A 2 --max-count 20

Repository: SISC-IT/sisc-web

Length of output: 2064


🏁 Script executed:

# Check requirements.txt for PyTorch version
cat -n AI/requirements.txt

Repository: SISC-IT/sisc-web

Length of output: 539


🏁 Script executed:

# Also check for any setup.py or pyproject.toml
rg "torch|pytorch" -i | head -30

Repository: SISC-IT/sisc-web

Length of output: 2908


Recommend adding the weights_only=True parameter to torch.load()

Per PyTorch's security guidance, torch.load() deserializes via pickle, so weights_only=True should be specified explicitly when loading models from untrusted sources. The parameter is currently missing from this code.

Note: on PyTorch < 2.6.0, weights_only=True alone does not provide complete protection due to CVE-2025-32434, so upgrading to PyTorch 2.6.0 or later is recommended where possible. The current code only loads models from internal storage (AI/data/weights/tcn/model.pt), so the immediate threat is low, but since the model source may change later, applying the security best practice up front is advisable.

🛡️ Security improvement suggestion
 if self.model is not None and os.path.exists(self.model_path):
-    state_dict = torch.load(self.model_path, map_location=self.device)
+    state_dict = torch.load(self.model_path, map_location=self.device, weights_only=True)
     self.model.load_state_dict(state_dict)
     self.model.eval()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self.model is not None and os.path.exists(self.model_path):
-    state_dict = torch.load(self.model_path, map_location=self.device)
+    state_dict = torch.load(self.model_path, map_location=self.device, weights_only=True)
    self.model.load_state_dict(state_dict)
    self.model.eval()
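The suggested flag can be exercised end to end with a throwaway module; the model and path below are illustrative. weights_only=True restricts the unpickler to tensors and primitive containers, so a plain state_dict loads fine:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                            # throwaway stand-in model
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# Restricted deserialization: only tensors/primitive containers are allowed.
state = torch.load(path, map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()
```

Objects outside the allowlist (arbitrary pickled classes) would make this load fail instead of executing attacker-controlled code.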
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 142 - 145, the
torch.load call is missing the weights_only=True parameter; fix the load code so
that loading self.model_path uses torch.load(self.model_path,
map_location=self.device, weights_only=True) (references: torch.load,
self.model, self.model_path, map_location), and also check that the project's
PyTorch version is 2.6.0 or higher, leaving a note or comment to upgrade if needed.

Comment on lines +179 to +203
+    def predict(self, X_input: Union[pd.DataFrame, np.ndarray]) -> Dict[str, float]:
+        # DataFrame input is the primary path; ndarray is also accepted for test convenience.
+        self._load_artifacts()
+
+        if self.model is None:
+            raise ValueError("TCN model is not available.")
+
+        if isinstance(X_input, pd.DataFrame):
+            tensor_x = self._prepare_input_tensor(X_input)
+        else:
+            array_x = np.asarray(X_input, dtype=np.float32)
+            if array_x.ndim == 2:
+                array_x = np.expand_dims(array_x, axis=0)
+            tensor_x = torch.from_numpy(array_x).float().to(self.device)
+
-    def predict(self, X_input: np.ndarray) -> np.ndarray:
-        if self.model is None: raise Exception("Model not built")
         self.model.eval()
         with torch.no_grad():
-            tensor_x = torch.FloatTensor(X_input).to(self.device)
-            out = self.model(tensor_x)
-            return torch.sigmoid(out).cpu().numpy()
+            logits = self.model(tensor_x)
+            probs = torch.sigmoid(logits).cpu().numpy().flatten()
+
+        # Return a per-horizon dict so the portfolio pipeline can consume it directly.
+        return {
+            f"tcn_{horizon}d": float(prob)
+            for horizon, prob in zip(self.horizons, probs)
+        }

⚠️ Potential issue | 🟠 Major

Missing get_signals() method causes a runtime error on pipeline integration

portfolio_logic.py calls wrapper.get_signals(df, ticker_id, sector_id), but TCNWrapper does not implement this method. Using it from the pipeline raises an AttributeError.

Other wrappers (TransformerSignalModel, etc.) separate predict() and get_signals():

  • predict(): honors the BaseSignalModel contract (returns np.ndarray)
  • get_signals(): pipeline-facing DataFrame → Dict conversion
🔧 Suggested addition of a get_signals() method
+    def get_signals(self, df: pd.DataFrame, ticker_id: int = 0, sector_id: int = 0) -> Dict[str, float]:
+        """
+        [Pipeline-only method]
+        Takes a DataFrame and returns the TCN inference result as a dictionary.
+        """
+        self._load_artifacts()
+
+        if self.model is None:
+            raise ValueError("TCN model is not available.")
+
+        tensor_x = self._prepare_input_tensor(df)
+
+        self.model.eval()
+        with torch.no_grad():
+            logits = self.model(tensor_x)
+            probs = torch.sigmoid(logits).cpu().numpy().flatten()
+
+        return {
+            f"tcn_{horizon}d": float(prob)
+            for horizon, prob in zip(self.horizons, probs, strict=True)
+        }
+
     def predict(self, X_input: Union[pd.DataFrame, np.ndarray]) -> Dict[str, float]:

Please consider refactoring the existing predict() method to call get_signals(), or changing it to return np.ndarray in line with the BaseSignalModel contract.

🧰 Tools
🪛 Ruff (0.15.5)

[warning] 184-184: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 202-202: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 179 - 203, The wrapper
is missing a get_signals method required by the pipeline; add a
get_signals(self, df: pd.DataFrame, ticker_id, sector_id) method that accepts a
DataFrame, prepares inputs (reuse existing _prepare_input_tensor or call
predict), calls the model prediction path and returns the same horizon-keyed
Dict[str,float] currently produced by predict (or refactor predict to return raw
np.ndarray per BaseSignalModel and have get_signals call predict then map
horizons->floats); ensure get_signals uses self.horizons and the existing tensor
preparation (e.g., _prepare_input_tensor) and preserves device/model
eval/no_grad behavior consistent with predict.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
AI/modules/signal/models/TCN/wrapper.py (1)

33-39: ⚠️ Potential issue | 🔴 Critical

The BaseSignalModel contract is still not satisfied.

Per AI/modules/signal/core/base_model.py, predict() must return an np.ndarray and get_signals() must also be implemented. The current implementation returns Dict[str, float] and has no get_signals(), so any caller relying on the base-class contract breaks immediately.

#!/bin/bash
set -e

echo "[1] BaseSignalModel abstract API"
sed -n '1,120p' AI/modules/signal/core/base_model.py | nl -ba | sed -n '1,120p'

echo
echo "[2] TCNWrapper public methods"
sed -n '1,260p' AI/modules/signal/models/TCN/wrapper.py | nl -ba | sed -n '1,260p'

echo
echo "[3] Search for get_signals implementation in TCN wrapper"
rg -n 'def get_signals\(' AI/modules/signal/models/TCN/wrapper.py || true

Also applies to: 179-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 33 - 39, The TCNWrapper
currently violates BaseSignalModel: change TCNWrapper.predict(...) to return a
numpy.ndarray (np.ndarray) instead of Dict and ensure any internal
post-processing (scaling, windowing, etc.) produces an np.ndarray output;
convert your current dict/series outputs into np.array before returning.
Implement a new TCNWrapper.get_signals(self, model_output: np.ndarray) ->
Dict[str, float] that maps the numeric array output to the signal dictionary
expected by callers (e.g., extracting named values or summary metrics), and
update any other methods around the same area (see existing methods near lines
~179-203) that currently return Dict to instead return/accept np.ndarray so they
conform to BaseSignalModel's contract. Ensure type hints match (np.ndarray) and
add tests or simple conversions where callers still expect dicts by calling
get_signals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4120e140-964b-4d32-8a9c-58e5aa8e49e5

📥 Commits

Reviewing files that changed from the base of the PR and between 3d5359c and 762b34a.

📒 Files selected for processing (3)
  • AI/modules/features/market_derived.py
  • AI/modules/signal/core/dataset_builder.py
  • AI/modules/signal/models/TCN/wrapper.py

Comment on lines +33 to +48
    # 1. Branch on input type (DB load vs. direct injection)
    if isinstance(start_date_or_df, pd.DataFrame):
        df = start_date_or_df.copy()
    else:
        loader = DataLoader()
        df = loader.load_data_from_db(start_date_or_df, end_date)
        if df is None or df.empty:
            raise ValueError(f"Failed to load data for the requested period ({start_date_or_df} ~ {end_date}).")

    # 2. Derived feature layer (pass 1: basic change-rate calculations)
    df = add_market_changes(df)
    df = add_macro_changes(df)

    # 3. Derived feature layer (pass 2: batch computation of advanced indicators via FeatureProcessor)
    processor = FeatureProcessor(df)
    df = processor.execute_pipeline()

⚠️ Potential issue | 🔴 Critical

Computing change rates before sorting and splitting by ticker corrupts the features.

add_market_changes() relies on shift()/pct_change(), but it currently runs before FeatureProcessor sorts by date. On top of that, load_data_from_db(start_date_or_df, end_date) is called without tickers, so it may return every ticker, and in multi-ticker data the previous row of a different ticker can bleed into prev_close. At minimum, sort by date first, and for multi-ticker input split by ticker before computing change-rate and rolling indicators.

🧰 Tools
🪛 Ruff (0.15.5)

[warning] 40-40: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/core/dataset_builder.py` around lines 33 - 48, The current
pipeline calls add_market_changes/add_macro_changes before sorting and splitting
by ticker, which lets pct_change/shift cross different tickers and corrupt
features; modify dataset_builder.py so after obtaining df (from
DataLoader.load_data_from_db or when passed a DataFrame) you first sort by date
(and any necessary keys) and then, if multiple tickers exist, group/split by the
ticker column and apply add_market_changes/add_macro_changes (or compute
pct_change/shift/rolling) per-group before re-concatenating; alternatively
ensure DataLoader.load_data_from_db is called with tickers to return
single-ticker data and document that FeatureProcessor.execute_pipeline expects
pre-sorted, per-ticker input.

Comment on lines +139 to +147
        if self.model is None:
            self.build((self.seq_len, len(self.feature_columns)))

        if self.model is not None and os.path.exists(self.model_path):
            state_dict = torch.load(self.model_path, map_location=self.device)
            self.model.load_state_dict(state_dict)
            self.model.eval()

        self.is_loaded = True

⚠️ Potential issue | 🔴 Critical

Without weights, inference runs on a randomly initialized model.

In the current branch, even when model.pt is missing, build() creates a fresh model and the method still ends with is_loaded=True. predict() then returns random-weight outputs without raising any exception, so it is safer to fail immediately when the model file is missing or load_state_dict() fails.

🔧 Suggested fix
-        if self.model is None:
-            self.build((self.seq_len, len(self.feature_columns)))
-
-        if self.model is not None and os.path.exists(self.model_path):
-            state_dict = torch.load(self.model_path, map_location=self.device)
-            self.model.load_state_dict(state_dict)
-            self.model.eval()
-
-        self.is_loaded = True
+        if not os.path.exists(self.model_path):
+            raise FileNotFoundError(f"TCN weights not found: {self.model_path}")
+
+        if self.model is None:
+            self.build((self.seq_len, len(self.feature_columns)))
+
+        state_dict = torch.load(self.model_path, map_location=self.device)
+        self.model.load_state_dict(state_dict)
+        self.model.eval()
+        self.is_loaded = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 139 - 147, The current
flow sets is_loaded=True even when no weights are loaded, allowing predict() to
run with random weights; change the logic in the initializer/loader so that
after ensuring self.model exists (via build(self.seq_len,
len(self.feature_columns))) you require a valid weights file at self.model_path
and successful load_state_dict; if os.path.exists(self.model_path) is False
raise FileNotFoundError (or RuntimeError) and do not set self.is_loaded, and
wrap self.model.load_state_dict(...) in a try/except to raise a clear error on
failure instead of silently continuing; ensure references to self.model,
build(...), self.model_path, load_state_dict(...), predict() and self.is_loaded
are used to locate and update the code paths.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
AI/modules/signal/models/TCN/wrapper.py (1)

33-283: ⚠️ Potential issue | 🔴 Critical

The BaseSignalModel contract is still not satisfied.

BaseSignalModel requires get_signals() as an abstract method, but this class does not implement it. As it stands, instantiating TCNWrapper(config) fails outright, and predict() keeps diverging from the interface by returning a dict instead of the base signature's np.ndarray. As with the other wrappers, it is safer to keep predict() returning raw predictions and move the pipeline-facing dict mapping into get_signals().

#!/bin/bash
# BaseSignalModel 추상 API와 TCNWrapper 구현 비교
rg -n -C2 'class BaseSignalModel|@abstractmethod|def predict|def get_signals' AI/modules/signal/core/base_model.py
rg -n -C2 'class TCNWrapper|def predict|def get_signals|def save|def load' AI/modules/signal/models/TCN/wrapper.py

If def get_signals is absent from the second output, the current implementation leaves the abstract contract unfulfilled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 33 - 283, The class
violates BaseSignalModel by missing get_signals and by having predict return a
dict; change predict to return raw prediction array/probs (e.g., numpy array or
list of floats) matching BaseSignalModel's predict signature and move the
pipeline-facing dict mapping into a new get_signals method that calls predict
(or predict_batch) and returns the {"tcn_Xd": float} mapping; update
predict_batch similarly to return raw 2D probs (batch x horizons) and have
get_signals call predict/predict_batch as appropriate for single-ticker vs batch
inputs, referencing methods TCNWrapper.predict, TCNWrapper.predict_batch and the
new TCNWrapper.get_signals to locate changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f9ba3990-e9f7-47fa-a925-ef2628276857

📥 Commits

Reviewing files that changed from the base of the PR and between 762b34a and 30a5127.

📒 Files selected for processing (1)
  • AI/modules/signal/models/TCN/wrapper.py

Comment on lines +77 to +118
    def train(
        self,
        X_train: np.ndarray,
        y_train: np.ndarray,
        X_val: Optional[np.ndarray] = None,
        y_val: Optional[np.ndarray] = None,
        **kwargs,
    ):
        # Standalone training loop for testing the wrapper; large-scale training should go through train.py.
        if self.model is None:
            self.build(X_train.shape[1:])

        criterion = nn.BCEWithLogitsLoss()
        optimizer = torch.optim.Adam(
            self.model.parameters(),
            lr=float(kwargs.get("learning_rate", self.config.get("learning_rate", 1e-3))),
        )
        epochs = int(kwargs.get("epochs", self.config.get("epochs", 20)))
        batch_size = int(kwargs.get("batch_size", self.config.get("batch_size", 64)))

        X_tensor = torch.from_numpy(X_train).float().to(self.device)
        y_tensor = torch.from_numpy(y_train).float().to(self.device)

        for epoch in range(epochs):
            self.model.train()
            permutation = torch.randperm(X_tensor.size(0), device=self.device)
            epoch_loss = 0.0

            for i in range(0, X_tensor.size(0), batch_size):
                indices = permutation[i : i + batch_size]
                batch_x = X_tensor[indices]
                batch_y = y_tensor[indices]

                optimizer.zero_grad()
                logits = self.model(batch_x)
                loss = criterion(logits, batch_y)
                loss.backward()
                optimizer.step()
                epoch_loss += loss.item()

            if (epoch + 1) % 5 == 0:
                print(f"Epoch [{epoch + 1}/{epochs}] Loss: {epoch_loss:.4f}")

⚠️ Potential issue | 🟠 Major

On a freshly trained instance, the first predict() can fall back to the on-disk checkpoint.

After train() finishes, self.is_loaded is still False, so the next predict() always goes through _load_artifacts(). If an older model.pt remains at the default model_path, inference uses the disk weights instead of the just-trained in-memory weights.

🔧 Suggested fix
             if (epoch + 1) % 5 == 0:
                 print(f"Epoch [{epoch + 1}/{epochs}] Loss: {epoch_loss:.4f}")
+
+        self.model.eval()
+        self.is_loaded = True

Also applies to: 179-181

🧰 Tools
🪛 Ruff (0.15.5)

[warning] 81-81: Unused method argument: X_val

(ARG002)


[warning] 82-82: Unused method argument: y_val

(ARG002)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 77 - 118, train() does
not mark the wrapper as having in-memory weights, so after training predict()
still calls _load_artifacts() and may load disk checkpoint overwriting the
freshly trained model; after finishing training in the train() method (e.g., at
end of the epoch loop) set self.is_loaded = True and ensure the in-memory model
state is the authoritative artifact (either call the existing
save/_save_artifacts routine or otherwise update whatever artifact state
_load_artifacts() expects, so predict() will use the in-memory weights instead
of reloading disk model.pt); reference: train(), predict(), _load_artifacts(),
and self.is_loaded.

Comment on lines +120 to +145
    def _load_artifacts(self):
        # Restore the state needed for inference, in order: metadata -> scaler -> model.

        if self.is_loaded:
            return  # Avoid redundant loading when already loaded.
        if self.metadata_path and os.path.exists(self.metadata_path):
            with open(self.metadata_path, "r", encoding="utf-8") as f:
                self.metadata = json.load(f)
            self.feature_columns = self.metadata.get("feature_columns", self.feature_columns)
            self.horizons = self.metadata.get("horizons", self.horizons)
            self.seq_len = int(self.metadata.get("seq_len", self.seq_len))
            self.channels = self.metadata.get("channels", self.channels)
            self.kernel_size = int(self.metadata.get("kernel_size", self.kernel_size))
            self.dropout = float(self.metadata.get("dropout", self.dropout))

        if self.scaler is None and os.path.exists(self.scaler_path):
            with open(self.scaler_path, "rb") as f:
                self.scaler = pickle.load(f)

        if self.model is None:
            self.build((self.seq_len, len(self.feature_columns)))

        if self.model is not None and os.path.exists(self.model_path):
            state_dict = torch.load(self.model_path, map_location=self.device)
            self.model.load_state_dict(state_dict)
            self.model.eval()

⚠️ Potential issue | 🟠 Major

Calling load() again with a different checkpoint leaves the previous scaler/model cached.

load() only swaps the paths and is_loaded, while _load_artifacts() reads new artifacts only when self.scaler is None / self.model is None. Opening a different checkpoint on the same instance can therefore mix the new weights with the old scaler or the old architecture.

🔧 Suggested fix
     def load(self, filepath: str):
         """
Load weights from an external path.
Sync the paths so the scaler and metadata are read from the same folder as the weights.
         """
         self.model_path = filepath
         target_dir = os.path.dirname(filepath)
         self.scaler_path = os.path.join(target_dir, "scaler.pkl")
         self.metadata_path = os.path.join(target_dir, "metadata.json")
-        
-        self.is_loaded = False # Reset the duplicate-loading guard when loading from a new path.
+
+        self.model = None
+        self.scaler = None
+        self.metadata = {}
+        self.is_loaded = False  # Force re-reading the artifacts at the new path.
         self._load_artifacts()

Also applies to: 272-283

🧰 Tools
🪛 Ruff (0.15.5)

[error] 137-137: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)
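The S301 warning above flags pickle because unpickling can execute arbitrary code. One pickle-free alternative for a standard scaler is to persist only its fitted arrays as JSON and rebuild on load. The class below is a minimal numpy-only stand-in, not the project's actual scaler:

```python
import json

import numpy as np


class JsonScaler:
    """Minimal StandardScaler stand-in whose state round-trips through JSON."""

    def fit(self, X: np.ndarray) -> "JsonScaler":
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) / self.scale_

    def to_json(self) -> str:
        # Plain numbers only: nothing executable is serialized.
        return json.dumps({"mean": self.mean_.tolist(), "scale": self.scale_.tolist()})

    @classmethod
    def from_json(cls, payload: str) -> "JsonScaler":
        state = json.loads(payload)
        obj = cls()
        obj.mean_ = np.array(state["mean"])
        obj.scale_ = np.array(state["scale"])
        return obj


fitted = JsonScaler().fit(np.array([[1.0, 10.0], [3.0, 30.0]]))
restored = JsonScaler.from_json(fitted.to_json())
```

For scikit-learn's own StandardScaler the same idea applies by persisting mean_ and scale_; the trade-off is that JSON cannot capture arbitrary sklearn objects, only numeric state.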

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 120 - 145, The instance
caches previous artifacts so calling load() with a different checkpoint can mix
old scaler/model with new weights; update _load_artifacts() (or load()) to evict
prior artifacts before loading: explicitly reset self.scaler = None, self.model
= None (and optionally self.is_loaded = False) when a new metadata/scaler/model
path is provided, or change the conditional checks in _load_artifacts() to
always load from scaler_path/model_path (instead of checking "is None"); also
ensure you rebuild the model using build((self.seq_len,
len(self.feature_columns))) after metadata is read (not gated by previous model
presence) so architecture updates from metadata are applied before loading
state_dict from model_path.

Comment on lines +219 to +240
        # 1. Iterate over the per-ticker data map, preprocessing and extracting windows
        for ticker, df in ticker_data_map.items():
            try:
                # Preprocess via dataset_builder's polymorphism (skip tickers that error out)
                prepared = get_standard_training_data(df.copy())
                feature_frame = prepared[self.feature_columns]

                if len(feature_frame) < self.seq_len:
                    continue  # Skip e.g. newly listed tickers with too few rows for a full sequence.

                latest_window = feature_frame.tail(self.seq_len).to_numpy(dtype=np.float32)

                # [Note] Assumes a global scaler and applies it uniformly.
                if self.scaler is not None:
                    latest_window = self.scaler.transform(latest_window).astype(np.float32)

                tensor_list.append(latest_window)
                valid_tickers.append(ticker)

            except Exception as e:
                # Exception handling so one bad ticker doesn't halt the whole pipeline
                print(f"⚠️ [{ticker}] excluded from batch inference due to preprocessing failure: {e}")

⚠️ Potential issue | 🟠 Major

predict_batch()predict()와 다른 정제 규칙을 사용합니다.

predict()_prepare_dataframe()에서 empty/missing-feature/inf/nan 처리까지 맞춰두는데, predict_batch()get_standard_training_data()를 직접 호출하고 예외를 통째로 삼킵니다. 같은 DF라도 단건은 성공하고 배치는 스킵되거나, NaN이 그대로 모델로 들어갈 수 있으니 여기서도 공통 전처리 helper를 재사용하는 쪽이 안전합니다.

🔧 Suggested fix
         for ticker, df in ticker_data_map.items():
             try:
-                # Preprocess via dataset_builder's polymorphism (skip tickers that error out)
-                prepared = get_standard_training_data(df.copy())
+                prepared = self._prepare_dataframe(df)
                 feature_frame = prepared[self.feature_columns]
@@
-            except Exception as e:
+            except ValueError as e:
                # Exception handling so one bad ticker doesn't halt the whole pipeline
                print(f"⚠️ [{ticker}] excluded from batch inference due to preprocessing failure: {e}")
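Whichever helper ends up shared, a cheap final guard before appending a window to the batch is to require it to be fully finite. A sketch with np.isfinite (window shape and names are illustrative):

```python
import numpy as np


def window_is_clean(window: np.ndarray) -> bool:
    """Reject windows containing NaN or +/-inf before they reach the model."""
    return bool(np.isfinite(window).all())


good = np.zeros((60, 8), dtype=np.float32)

bad = good.copy()
bad[10, 3] = np.nan  # a single NaN poisons the whole window

worse = good.copy()
worse[0, 0] = np.inf  # inf from a division-by-zero feature

print(window_is_clean(good), window_is_clean(bad), window_is_clean(worse))
```

Filtering on this predicate instead of a blanket except keeps the skip reason explicit and loggable per ticker.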
🧰 Tools
🪛 Ruff (0.15.5)

[warning] 238-238: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/wrapper.py` around lines 219 - 240,
predict_batch() currently calls get_standard_training_data() directly and
swallows exceptions, which can let NaNs/inf or differing cleaning rules slip
into batch inputs; update predict_batch() to reuse the same preprocessing helper
used by predict() (e.g., call self._prepare_dataframe(df) or a shared clean
function instead of get_standard_training_data()), then select
self.feature_columns, check length vs self.seq_len, apply self.scaler.transform,
and only append to tensor_list/valid_tickers when the cleaned feature_frame has
no NaN/inf; also preserve/bubble relevant error info instead of silently
swallowing it so problematic tickers are logged clearly (reference: methods
predict_batch(), predict(), _prepare_dataframe(), get_standard_training_data(),
and attributes feature_columns, seq_len, scaler, tensor_list, valid_tickers).

@twq110 twq110 merged commit 41abbe4 into main Mar 12, 2026
1 check passed