We'll build a classification model to predict whether a student passes or fails based on study hours, previous scores, and attendance.
1. Load & explore data
2. Clean & preprocess
3. Split into train/test sets
4. Choose & train a model
5. Evaluate performance
6. Improve (tune, feature engineer)
7. Deploy
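
The walkthrough reads `students.csv`, which is not included here. To run the code end to end, one option is to generate a synthetic stand-in. Everything in this sketch is an assumption: the helper names `make_synthetic_students` and `write_csv`, the value ranges, and the rule that ties the features to `passed` are invented for illustration and do not describe any real dataset. Only the column names match the walkthrough.

```python
import csv
import math
import random


def make_synthetic_students(n=500, seed=42):
    """Return n made-up student rows (hypothetical data, for illustration only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        study_hours = round(rng.uniform(0, 20), 1)       # hours per week (assumed range)
        prev_score = round(rng.uniform(30, 100), 1)      # previous score (assumed range)
        attendance_pct = round(rng.uniform(50, 100), 1)  # attendance in percent (assumed range)
        # Invented rule: a logistic function of the centered features
        # gives the probability of passing.
        z = (
            0.25 * (study_hours - 10)
            + 0.08 * (prev_score - 65)
            + 0.05 * (attendance_pct - 75)
        )
        p_pass = 1 / (1 + math.exp(-z))
        rows.append({
            "study_hours": study_hours,
            "prev_score": prev_score,
            "attendance_pct": attendance_pct,
            "passed": 1 if rng.random() < p_pass else 0,
        })
    return rows


def write_csv(rows, path="students.csv"):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(make_synthetic_students())` once produces a file that `pd.read_csv("students.csv")` can load. Because the labels come from an invented rule, any scores obtained on this data say nothing about real students.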

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# 1. Load data
df = pd.read_csv("students.csv")

# 2. Features & target
X = df[["study_hours", "prev_score", "attendance_pct"]]
y = df["passed"]  # 1 = pass, 0 = fail

# 3. Train/Test split (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Scale features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)  # transform only; don't fit on test!

# 5. Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# 6. Evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

| Metric | What it means |
|---|---|
| Accuracy | Overall % correct predictions |
| Precision | Of predicted positives, how many are actually positive |
| Recall | Of actual positives, how many did we predict correctly |
| F1 Score | Harmonic mean of precision and recall |
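
To make the definitions in the table concrete, the sketch below computes the four metrics from the cells of a binary confusion matrix. The counts in the usage example are made up for illustration, and `binary_metrics` is a hypothetical helper, not part of scikit-learn. For real work, `classification_report` already reports these values.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    The argument order matches confusion_matrix(y_test, y_pred).ravel()
    for the labels 0 and 1.
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 30 true negatives, 10 false positives,
# 5 false negatives, 55 true positives.
metrics = binary_metrics(tn=30, fp=10, fn=5, tp=55)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 85/100 = 0.850, precision is 55/65 ≈ 0.846, recall is 55/60 ≈ 0.917, and F1 is 0.880. Accuracy alone can hide a weak minority class, which is why the table lists precision and recall separately.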
Next step: Try `RandomForestClassifier` and `XGBClassifier` on the same dataset and compare the metrics.
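
One way to set up that comparison is sketched below. It rests on several assumptions: `compare_models` is a hypothetical helper, the hyperparameters are arbitrary starting points, `stratify=y` is added so every model sees the same class balance, and the demo data comes from scikit-learn's `make_classification` rather than `students.csv`. `XGBClassifier` lives in the separate `xgboost` package, so the sketch skips it when that package is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

try:  # xgboost is an optional, separately installed package
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None


def compare_models(X, y, random_state=42):
    """Fit each candidate on the same split and return its test-set metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    candidates = {
        # Scaling sits inside the pipeline, so it is fit on the training data only.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
        # Tree-based models do not need scaled features.
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
    }
    if XGBClassifier is not None:
        candidates["xgboost"] = XGBClassifier(random_state=random_state)

    results = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    return results


# Stand-in data so the sketch runs on its own; swap in the real X and y.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
for name, scores in compare_models(X_demo, y_demo).items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, f1={scores['f1']:.3f}")
```

A single 80/20 split gives a noisy estimate on a small dataset, so small differences between models should not be over-interpreted. Cross-validation, for example with `cross_val_score`, would give a steadier comparison.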