73 changes: 73 additions & 0 deletions README.md
@@ -201,6 +201,79 @@ Heavy training is kept out of CI because it needs data from Git LFS,
more time and memory, and produces large models. For the same reason AutoML is not
run on every pull request and remains a manual training option.

## Model serving

Code shared between the serving interfaces (Streamlit and FastAPI) lives in
`src/credit_scoring/serving/`:

```text
src/credit_scoring/serving/schema.py              Feature definitions and encoding dictionaries
src/credit_scoring/serving/inference.py           Model loading, feature-vector construction, prediction
src/credit_scoring/serving/prediction_logger.py   Prediction logging + verification every 10th prediction
src/credit_scoring/serving/api.py                 FastAPI application
```
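
As a rough illustration of the "feature-vector construction" step, here is a minimal sketch modelled on the `build_feature_row` logic that this PR moves out of `app.py`. The feature lists are deliberately shortened and the function name `build_feature_row_sketch` is hypothetical; the real definitions live in `schema.py` and `inference.py`, and the real function returns a pandas `DataFrame` rather than a list.

```python
# Shortened, illustrative lists; the full ones are defined in serving/schema.py.
LOAN_TYPES = ["Auto Loan", "Personal Loan", "Student Loan"]
OCCUPATIONS = ["Doctor", "Engineer", "Teacher"]
CREDIT_MIX_MAP = {"Bad": 0, "Standard": 1, "Good": 2}

# Column order must match the order used at training time.
MODEL_FEATURES = [
    "Age",
    "Credit_Mix",
    *[f"LoanType_{lt}" for lt in LOAN_TYPES],
    *[f"Occupation_{oc}" for oc in OCCUPATIONS],
]


def build_feature_row_sketch(inputs: dict) -> list:
    """Return one row of feature values in MODEL_FEATURES order."""
    row = {feature: 0 for feature in MODEL_FEATURES}
    row["Age"] = inputs["age"]
    row["Credit_Mix"] = CREDIT_MIX_MAP[inputs["credit_mix"]]  # ordinal encoding
    for loan_type in inputs["loan_types"]:  # multi-select -> one-hot
        row[f"LoanType_{loan_type}"] = 1
    if inputs["occupation"] in OCCUPATIONS:  # unknown occupation stays all zeros
        row[f"Occupation_{inputs['occupation']}"] = 1
    return [row[feature] for feature in MODEL_FEATURES]


example = build_feature_row_sketch(
    {
        "age": 35,
        "credit_mix": "Standard",
        "loan_types": ["Personal Loan"],
        "occupation": "Engineer",
    }
)
print(example)  # [35, 1, 0, 1, 0, 0, 1, 0]
```

Starting from an all-zero row and then filling in known values keeps the column order fixed regardless of which optional categories a client has.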

### Streamlit app

```powershell
streamlit run app.py
```

### FastAPI API

```powershell
uvicorn credit_scoring.serving.api:app --reload --app-dir src
```

Interactive documentation (Swagger UI): http://127.0.0.1:8000/docs

Key endpoints:

| Method | Endpoint               | Description                                       |
|--------|------------------------|---------------------------------------------------|
| GET    | `/health`              | API status and whether the model is loaded        |
| GET    | `/model/metrics`       | Contents of `data/08_reporting/metrics.json`      |
| POST   | `/predict`             | Prediction for a single client                    |
| POST   | `/predict/batch`       | Prediction for a list of clients                  |
| GET    | `/predictions/stats`   | Prediction counter and last verification result   |

Example `curl` request:

```bash
curl -X POST http://127.0.0.1:8000/predict \
-H "Content-Type: application/json" \
-d '{
"age": 35, "occupation": "Engineer", "annual_income": 50000,
"monthly_salary": 4000, "monthly_balance": 300, "amount_invested": 100,
"num_bank_accounts": 4, "num_credit_card": 4, "num_of_loan": 2,
"interest_rate": 12, "outstanding_debt": 1200, "credit_utilization": 32,
"total_emi": 100, "credit_mix": "Standard", "credit_history_age_months": 120,
"delay_from_due_date": 10, "num_delayed_payment": 5, "changed_credit_limit": 8,
"num_credit_inquiries": 4, "payment_min": "No", "loan_types": ["Personal Loan"],
"payment_behaviour": "Low_spent_Small_value_payments"
}'
```
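
The same request can be built from Python using only the standard library. This is a sketch, not part of the PR: the helper name `build_predict_request` is hypothetical, the payload copies the `curl` example above, and the response format is not shown here because the README does not document it.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict"  # local uvicorn address from the README

payload = {
    "age": 35, "occupation": "Engineer", "annual_income": 50000,
    "monthly_salary": 4000, "monthly_balance": 300, "amount_invested": 100,
    "num_bank_accounts": 4, "num_credit_card": 4, "num_of_loan": 2,
    "interest_rate": 12, "outstanding_debt": 1200, "credit_utilization": 32,
    "total_emi": 100, "credit_mix": "Standard", "credit_history_age_months": 120,
    "delay_from_due_date": 10, "num_delayed_payment": 5, "changed_credit_limit": 8,
    "num_credit_inquiries": 4, "payment_min": "No", "loan_types": ["Personal Loan"],
    "payment_behaviour": "Low_spent_Small_value_payments",
}


def build_predict_request(data: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(payload)

# Sending it requires the uvicorn server to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```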

### Prediction logging and verification every 10th prediction

Every prediction, whether made from Streamlit or from FastAPI,
is logged by `PredictionLogger` to:

```text
data/09_predictions/predictions_log.jsonl    one JSON line per prediction
data/09_predictions/verification_log.jsonl   one JSON line every 10th prediction
```

Every 10th prediction (`verify_every=10`) automatically triggers a verification
of the latest batch. It checks that the feature vector matches the model schema,
that there are no NaN/Inf values, and that the probabilities sum to 1.0, and it
reports the model's average confidence and the distribution of predicted
classes. The result (`OK`/`WARNING`) is written to `verification_log.jsonl` and
emitted through the standard `logging` module (logger
`credit_scoring.predictions`). The prediction counter is restored from the
number of lines already written to the log file, so it survives an application
restart. The `*.jsonl` files in `data/09_predictions/` are not committed to the
repository.
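
The mechanism described above can be sketched as follows. This is a simplified, hypothetical stand-in (`PredictionLoggerSketch`), not the project's `PredictionLogger`: it covers only the NaN/Inf check, the probability-sum check, the average confidence, and the counter restored from the log file, and it omits the schema check, the class distribution, and `logging` output. The result keys mirror the ones `app.py` reads (`status`, `n_checked`, `batch_end_index`, `avg_confidence`, `issues`).

```python
from __future__ import annotations

import json
import math
import tempfile
from pathlib import Path


class PredictionLoggerSketch:
    """Illustrative JSONL logger that verifies every N-th prediction."""

    def __init__(self, log_dir: Path, verify_every: int = 10) -> None:
        log_dir.mkdir(parents=True, exist_ok=True)
        self.predictions_path = log_dir / "predictions_log.jsonl"
        self.verification_path = log_dir / "verification_log.jsonl"
        self.verify_every = verify_every
        # Restore the counter from lines already on disk so it survives restarts.
        self.count = 0
        if self.predictions_path.exists():
            with open(self.predictions_path, encoding="utf-8") as handle:
                self.count = sum(1 for line in handle if line.strip())

    def log_prediction(
        self, features: dict, predicted_class: int, probabilities: dict | None
    ) -> dict:
        self.count += 1
        record = {
            "index": self.count,
            "features": features,
            "predicted_class": predicted_class,
            "probabilities": probabilities,
        }
        with open(self.predictions_path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
        if self.count % self.verify_every == 0:
            record["verification"] = self._verify()
        return record

    def _verify(self) -> dict:
        # Re-read the last N records from disk so the check also works after a restart.
        with open(self.predictions_path, encoding="utf-8") as handle:
            records = [json.loads(line) for line in handle if line.strip()]
        batch = records[-self.verify_every:]
        issues, confidences = [], []
        for rec in batch:
            values = rec["features"].values()
            if any(isinstance(v, float) and not math.isfinite(v) for v in values):
                issues.append(f"non-finite feature value in #{rec['index']}")
            proba = rec["probabilities"]
            if proba:
                if abs(sum(proba.values()) - 1.0) > 1e-6:
                    issues.append(f"probabilities do not sum to 1.0 in #{rec['index']}")
                confidences.append(max(proba.values()))
        result = {
            "batch_end_index": self.count,
            "n_checked": len(batch),
            "status": "OK" if not issues else "WARNING",
            "issues": issues,
            "avg_confidence": sum(confidences) / len(confidences) if confidences else None,
        }
        with open(self.verification_path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(result) + "\n")
        return result


with tempfile.TemporaryDirectory() as tmp:
    logger = PredictionLoggerSketch(Path(tmp), verify_every=10)
    for i in range(10):
        last = logger.log_prediction(
            {"Age": 30 + i}, 1, {"Poor": 0.2, "Standard": 0.7, "Good": 0.1}
        )
    status = last["verification"]["status"]
    # A fresh instance pointed at the same directory picks up the old count.
    restored_count = PredictionLoggerSketch(Path(tmp)).count

print(status, restored_count)  # OK 10
```

Counting lines on startup avoids a separate state file, at the cost of reading the whole log once per start.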

## To do

Next project stages:
188 changes: 65 additions & 123 deletions app.py
@@ -1,8 +1,8 @@
from __future__ import annotations

import json
import pickle
import sys
import uuid
from pathlib import Path

import numpy as np
@@ -11,105 +11,40 @@

# Project paths
PROJECT_ROOT = Path(__file__).resolve().parent
MODEL_PATH = PROJECT_ROOT / "data" / "06_models" / "baseline_random_forest.pkl"
METRICS_PATH = PROJECT_ROOT / "data" / "08_reporting" / "metrics.json"


sys.path.insert(0, str(PROJECT_ROOT / "src"))

# Model features, model loading, and prediction logging,
# shared by Streamlit and the FastAPI API
from credit_scoring.serving import inference # noqa: E402
from credit_scoring.serving.prediction_logger import prediction_logger # noqa: E402
from credit_scoring.serving.schema import ( # noqa: E402
CREDIT_MIX_MAP,
LOAN_TYPES,
MODEL_FEATURES,
OCCUPATIONS,
PAYMENT_BEHAVIOURS,
PAYMENT_MIN_MAP,
TARGET_LABELS,
TARGET_PL,
)

# Feature definitions
TARGET_LABELS = {0: "Poor", 1: "Standard", 2: "Good"}
TARGET_PL = {0: "Niska (Poor)", 1: "Średnia (Standard)", 2: "Dobra (Good)"}
MODEL_PATH = inference.MODEL_PATH
DEFAULT_COLOR = {0: "#35A4E5", 1: "#8378FF"}
TARGET_COLOR = {0: "#e53935", 1: "#fb8c00", 2: "#43a047"}

LOAN_TYPES = [
"Auto Loan",
"Credit-Builder Loan",
"Debt Consolidation Loan",
"Home Equity Loan",
"Mortgage Loan",
"Not Specified",
"Payday Loan",
"Personal Loan",
"Student Loan",
]

PAYMENT_BEHAVIOURS = [
"High_spent_Large_value_payments",
"High_spent_Medium_value_payments",
"High_spent_Small_value_payments",
"Low_spent_Large_value_payments",
"Low_spent_Medium_value_payments",
"Low_spent_Small_value_payments",
]

OCCUPATIONS = [
"Architect",
"Developer",
"Doctor",
"Engineer",
"Entrepreneur",
"Journalist",
"Lawyer",
"Manager",
"Mechanic",
"Media_Manager",
"Musician",
"Scientist",
"Teacher",
"Writer",
]

CREDIT_MIX_MAP = {"Bad": 0, "Standard": 1, "Good": 2}
PAYMENT_MIN_MAP = {"No": 0, "Yes": 1}

# Feature order required by the model; identical to the order used in training.
MODEL_FEATURES = [
"Age",
"Annual_Income",
"Monthly_Inhand_Salary",
"Num_Bank_Accounts",
"Num_Credit_Card",
"Interest_Rate",
"Num_of_Loan",
"Delay_from_due_date",
"Num_of_Delayed_Payment",
"Changed_Credit_Limit",
"Num_Credit_Inquiries",
"Credit_Mix",
"Outstanding_Debt",
"Credit_Utilization_Ratio",
"Credit_History_Age",
"Payment_of_Min_Amount",
"Total_EMI_per_month",
"Amount_invested_monthly",
"Monthly_Balance",
*[f"LoanType_{lt}" for lt in LOAN_TYPES],
*[f"PayBeh_{pb}" for pb in PAYMENT_BEHAVIOURS],
*[f"Occupation_{oc}" for oc in OCCUPATIONS],
]


# Loading the model and metrics
def _is_lfs_pointer(path: Path) -> bool:
"""Detect whether the file is a Git LFS pointer rather than the actual model."""
try:
if path.stat().st_size > 5000: # size cutoff: a real .pkl is hundreds of MB
return False
with open(path, "rb") as handle:
head = handle.read(200)
return b"git-lfs" in head
except OSError:
return False
return inference.is_lfs_pointer(path)


@st.cache_resource(show_spinner="Wczytywanie modelu...")
def load_model(path_str: str):
"""Load the trained model from a pickle file (cached for the session)."""
with open(path_str, "rb") as handle:
return pickle.load(handle)
return inference.load_model(path_str)


@st.cache_data(show_spinner=False)
@@ -121,51 +56,56 @@ def load_metrics(path_str: str) -> dict:
return {}


# Building the feature vector from the form
def build_feature_row(inputs: dict) -> pd.DataFrame:
"""Build a single feature row in the order required by the model."""
row = {feature: 0 for feature in MODEL_FEATURES}

# Numeric and ordinal features
row["Age"] = inputs["age"]
row["Annual_Income"] = inputs["annual_income"]
row["Monthly_Inhand_Salary"] = inputs["monthly_salary"]
row["Num_Bank_Accounts"] = inputs["num_bank_accounts"]
row["Num_Credit_Card"] = inputs["num_credit_card"]
row["Interest_Rate"] = inputs["interest_rate"]
row["Num_of_Loan"] = inputs["num_of_loan"]
row["Delay_from_due_date"] = inputs["delay_from_due_date"]
row["Num_of_Delayed_Payment"] = inputs["num_delayed_payment"]
row["Changed_Credit_Limit"] = inputs["changed_credit_limit"]
row["Num_Credit_Inquiries"] = inputs["num_credit_inquiries"]
row["Credit_Mix"] = CREDIT_MIX_MAP[inputs["credit_mix"]]
row["Outstanding_Debt"] = inputs["outstanding_debt"]
row["Credit_Utilization_Ratio"] = inputs["credit_utilization"]
row["Credit_History_Age"] = inputs["credit_history_age_months"]
row["Payment_of_Min_Amount"] = PAYMENT_MIN_MAP[inputs["payment_min"]]
row["Total_EMI_per_month"] = inputs["total_emi"]
row["Amount_invested_monthly"] = inputs["amount_invested"]
row["Monthly_Balance"] = inputs["monthly_balance"]

# Loan types (multiselect -> one-hot)
for loan_type in inputs["loan_types"]:
row[f"LoanType_{loan_type}"] = 1

# Payment behaviour (one-hot)
row[f"PayBeh_{inputs['payment_behaviour']}"] = 1

# Occupation (one-hot)
if inputs["occupation"] in OCCUPATIONS:
row[f"Occupation_{inputs['occupation']}"] = 1

return pd.DataFrame([row])[MODEL_FEATURES]
"""Build a single feature row in the order required by the model.
Thin wrapper around `credit_scoring.serving.inference.build_feature_row`.
"""
return inference.build_feature_row(inputs)


# Prediction
def predict(model, features: pd.DataFrame) -> tuple[np.ndarray, np.ndarray | None]:
preds = model.predict(features)
proba = model.predict_proba(features) if hasattr(model, "predict_proba") else None
return preds, proba
return inference.predict(model, features)


def log_predictions_and_show_verification(
model,
features_df: pd.DataFrame,
preds: np.ndarray,
proba: np.ndarray | None,
source: str,
) -> None:
"""Log predictions and show the verification run on every 10th prediction."""
classes = getattr(model, "classes_", [0, 1, 2])
last_record = None

for row_idx in range(len(features_df)):
probabilities = None
if proba is not None:
probabilities = {
TARGET_LABELS.get(int(c), str(c)): float(proba[row_idx][idx])
for idx, c in enumerate(classes)
}
last_record = prediction_logger.log_prediction(
features=features_df.iloc[row_idx].to_dict(),
predicted_class=int(preds[row_idx]),
probabilities=probabilities,
source=source,
request_id=str(uuid.uuid4()),
)

if last_record and "verification" in last_record:
verification = last_record["verification"]
avg_conf = verification["avg_confidence"]
summary = (
f"Zweryfikowano ostatnie {verification['n_checked']} predykcji "
f"(do #{verification['batch_end_index']}) — status: **{verification['status']}**"
+ (f", śr. ufność modelu: {avg_conf:.2f}." if avg_conf is not None else ".")
)
if verification["status"] == "OK":
st.success(f"🔍 {summary}")
else:
st.warning(f"🔍 {summary} Uwagi: " + "; ".join(verification["issues"]))


# UI
@@ -461,6 +401,7 @@ def render_result(pred: int, proba: np.ndarray | None) -> None:
}
features = build_feature_row(inputs)
preds, proba = predict(model, features)
log_predictions_and_show_verification(model, features, preds, proba, source="streamlit_single")
st.divider()
render_result(int(preds[0]), proba)

@@ -529,6 +470,7 @@ def render_result(pred: int, proba: np.ndarray | None) -> None:

if st.button("Przewiduj dla całego pliku", type="primary"):
preds, proba = predict(model, model_input)
log_predictions_and_show_verification(model, model_input, preds, proba, source="streamlit_batch")
result = raw_df.copy()
result["Predykcja"] = [TARGET_LABELS.get(int(p), str(p)) for p in preds]
if proba is not None:
Empty file added data/09_predictions/.gitkeep
Empty file.
3 changes: 3 additions & 0 deletions requirements.txt
@@ -14,4 +14,7 @@ missingno>=0.5
streamlit>=1.38
mlflow>=2.14

fastapi>=0.115
uvicorn[standard]>=0.32

autogluon.tabular[lightgbm,catboost,xgboost,fastai,ray]==1.5.0
11 changes: 11 additions & 0 deletions src/credit_scoring/serving/__init__.py
@@ -0,0 +1,11 @@
"""Model serving layer.

This package contains code shared between the Streamlit interface (``app.py``)
and the REST API (``credit_scoring.serving.api``):

- :mod:`credit_scoring.serving.schema`: feature definitions and encoding dictionaries,
- :mod:`credit_scoring.serving.inference`: model loading and prediction,
- :mod:`credit_scoring.serving.prediction_logger`: prediction logging
and verification every N-th prediction,
- :mod:`credit_scoring.serving.api`: the FastAPI application.
"""