# Building & Evaluating Your First ML Model

We'll build a classification model that predicts whether a student passes or fails based on study hours, previous scores, and attendance.

## The ML Workflow

1. Load & explore data
2. Clean & preprocess
3. Split into train/test sets
4. Choose & train a model
5. Evaluate performance
6. Improve (tune, feature engineer)
7. Deploy
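
Steps 1 and 2 are often only a few lines of pandas. The following is a minimal sketch, assuming a table with the columns `study_hours`, `prev_score`, `attendance_pct`, and `passed`. The small inline frame is made-up illustration data, not a real dataset, and dropping incomplete rows is just one simple cleaning choice among several.

```python
import numpy as np
import pandas as pd

# Made-up illustration data; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({
    "study_hours":    [2.0, 5.5, np.nan, 8.0, 1.0, 5.5],
    "prev_score":     [55, 70, 62, 88, 40, 70],
    "attendance_pct": [60, 85, 75, 95, 50, 85],
    "passed":         [0, 1, 1, 1, 0, 1],
})

# 1. Explore: size, missing values per column, and class balance.
print(df.shape)
print(df.isna().sum())
print(df["passed"].value_counts(normalize=True))

# 2. Clean: drop exact duplicate rows, then drop rows with missing values.
clean = df.drop_duplicates().dropna()
print(clean.shape)
```

Checking the class balance this early is worthwhile because a heavily skewed target changes which metrics are meaningful later on.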

## Full Example: Student Pass/Fail Classifier

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# 1. Load data
df = pd.read_csv("students.csv")

# 2. Features & target
X = df[["study_hours", "prev_score", "attendance_pct"]]
y = df["passed"]   # 1 = pass, 0 = fail

# 3. Train/test split (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Scale features (important for logistic regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test = scaler.transform(X_test)        # transform only; never fit on test data

# 5. Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# 6. Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # rows = actual, columns = predicted
```
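
The example reads a `students.csv` file that is not provided here. To try the code without real data, one option is to generate a synthetic file with the same four columns. The sketch below does that; the sample size, value ranges, and the rule that decides pass or fail are all invented for illustration and say nothing about real students.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # arbitrary sample size

study_hours = rng.uniform(0, 10, n)
prev_score = rng.uniform(30, 100, n)
attendance_pct = rng.uniform(40, 100, n)

# Hypothetical rule: a noisy weighted sum of the features decides pass/fail.
signal = 0.5 * study_hours + 0.05 * prev_score + 0.03 * attendance_pct
passed = (signal + rng.normal(0, 1, n) > 7.85).astype(int)

df = pd.DataFrame({
    "study_hours": study_hours,
    "prev_score": prev_score,
    "attendance_pct": attendance_pct,
    "passed": passed,
})

# Write the file that the classifier example expects.
df.to_csv("students.csv", index=False)
print(df["passed"].value_counts())
```

Because the labels come from a noisy linear rule, logistic regression should do reasonably well on this file; real data will rarely be this tidy.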

## Understanding the Metrics

| Metric | What it means |
| --- | --- |
| Accuracy | Overall % of correct predictions |
| Precision | Of predicted positives, how many are actually positive |
| Recall | Of actual positives, how many did we predict correctly |
| F1 Score | Harmonic mean of precision and recall |
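
To make these definitions concrete, the sketch below computes each metric by hand from the confusion-matrix counts and prints scikit-learn's value next to it. The ten labels are made up purely for the arithmetic.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = pass (the positive class), 0 = fail.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 7 / 10
precision = tp / (tp + fp)                            # 4 / 5
recall = tp / (tp + fn)                               # 4 / 6
f1 = 2 * precision * recall / (precision + recall)    # 8 / 11

print("accuracy ", accuracy, accuracy_score(y_true, y_pred))
print("precision", precision, precision_score(y_true, y_pred))
print("recall   ", recall, recall_score(y_true, y_pred))
print("f1       ", f1, f1_score(y_true, y_pred))
```

Here the model finds 4 of the 6 actual passes (recall about 0.67) and 4 of its 5 "pass" predictions are right (precision 0.8), which shows how the two metrics can diverge on the same predictions.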

## Common Pitfalls

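- Data leakage: fitting the scaler on the full dataset instead of only the training data, so test-set statistics influence training.
- Class imbalance: with a 95% pass rate, a model that predicts "pass" every time reaches 95% accuracy without learning anything.
- Overfitting: great training accuracy, poor test accuracy.
- Underfitting: the model is too simple to capture the pattern, so both training and test accuracy are poor.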
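
Some of these pitfalls can be checked for in code. The sketch below is one possible set of checks, not a complete diagnostic: it uses synthetic stand-in data from `make_classification` (an assumption, chosen so that roughly 95% of samples fall in class 1), a majority-class baseline to expose the imbalance problem, a pipeline so the scaler is refit inside each cross-validation fold, and a train-versus-test comparison as a rough overfitting check.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: roughly 95% of samples belong to class 1 ("pass").
X, y = make_classification(
    n_samples=2000, n_features=3, n_informative=3, n_redundant=0,
    weights=[0.05, 0.95], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Class imbalance: a baseline that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
base_pred = baseline.predict(X_test)
base_acc = accuracy_score(y_test, base_pred)
base_bal_acc = balanced_accuracy_score(y_test, base_pred)
print("baseline accuracy:         ", base_acc)
print("baseline balanced accuracy:", base_bal_acc)

# Data leakage: with the scaler inside a pipeline, each cross-validation
# fold fits it on that fold's training portion only.
model = make_pipeline(StandardScaler(), LogisticRegression())
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("cross-validated accuracy:  ", cv_scores.mean())

# Overfitting / underfitting: compare training and test accuracy.
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

The baseline scores high on plain accuracy but only 0.5 on balanced accuracy, which is why any real model should be compared against it before its accuracy is taken at face value.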

Next step: Try RandomForestClassifier (from scikit-learn) and XGBClassifier (from the separate xgboost package) on the same dataset and compare the metrics.
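
One way to set up that comparison is a loop over candidate models that share the same split and metric. This is a sketch under assumptions: it uses synthetic data from `make_classification` rather than the student dataset, the hyperparameters are arbitrary, F1 is just one reasonable metric to compare on, and `XGBClassifier` is included only if the xgboost package happens to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student data (three numeric features).
X, y = make_classification(
    n_samples=1000, n_features=3, n_informative=3, n_redundant=0,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression()),
    "RandomForestClassifier": RandomForestClassifier(
        n_estimators=200, random_state=42
    ),
}

# XGBClassifier lives in the separate xgboost package; add it only if available.
try:
    from xgboost import XGBClassifier
    models["XGBClassifier"] = XGBClassifier(n_estimators=200, random_state=42)
except ImportError:
    pass

# Fit every model on the same training split and score it on the same test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {results[name]:.3f}")
```

Tree-based models do not need feature scaling, so only the logistic regression is wrapped in a pipeline with `StandardScaler`.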
