
The User Interface & Ground-Truth Testing
Part 5 of 5 | ← Part 4 | Complete Series
Streamlit Overview
Streamlit lets you build data apps in pure Python—no HTML, CSS, or JavaScript needed.
```python
import streamlit as st
from datetime import datetime
import pandas as pd
import requests

# Page config
st.set_page_config(
    page_title="IPL AI Assistant",
    page_icon="🏏",
    layout="wide",
)

# Title
st.title("🏏 IPL AI Assistant")
st.caption("Predictions + Q&A Powered by ML")  # st.subtitle() doesn't exist; st.caption() does

# Tabs
tab1, tab2, tab3 = st.tabs(["💬 Chat", "🎯 Predict", "📊 Metrics"])
```
Tab 1: Chat Interface
```python
with tab1:
    st.header("Ask Anything")

    # Initialize session state
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat history
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # User input
    if user_input := st.chat_input("Ask about IPL..."):
        st.session_state.messages.append({"role": "user", "content": user_input})
        with st.chat_message("user"):
            st.markdown(user_input)

        # Call backend
        try:
            response = requests.post(
                "http://localhost:8000/chat",
                json={"message": user_input},
                timeout=5,
            )
            response.raise_for_status()
            result = response.json()
            assistant_message = result.get("message", "I couldn't understand that.")
            st.session_state.messages.append({
                "role": "assistant",
                "content": assistant_message,
            })
            with st.chat_message("assistant"):
                st.markdown(assistant_message)
        except requests.RequestException as e:
            st.error(f"❌ Backend error: {e}")
```
Key Concepts:

- **Session State:** `st.session_state.messages` persists across reruns
  - When the user submits a message, Streamlit reruns the entire script
  - Session state preserves the conversation history
  - Without it, chat history disappears on each input
- **Chat Message:** `st.chat_message()` renders messages with role-based styling
  - "user" = right-aligned, blue background
  - "assistant" = left-aligned, gray background
- **Chat Input:** `st.chat_input()` provides a textbox with submission handling
  - Returns `None` until the user submits
  - Automatically clears after submission
Tab 2: Prediction Interface
```python
with tab2:
    st.header("Match Prediction Simulator")

    teams = [
        "Mumbai Indians",
        "Chennai Super Kings",
        "Royal Challengers Bangalore",
        "Kolkata Knight Riders",
        # ... all 10 teams
    ]

    col1, col2 = st.columns(2)
    with col1:
        st.subheader("Teams")
        batting_team = st.selectbox("Batting Team:", teams)
        bowling_team = st.selectbox(
            "Bowling Team:",
            options=[t for t in teams if t != batting_team],  # all teams except batting team
            index=0,
        )
    with col2:
        st.subheader("Venue")
        venue = st.text_input("Ground name:", "Wankhede")

    st.subheader("Pre-Match Form")
    col1, col2, col3, col4 = st.columns(4)
    with col1:
        h2h_rate = st.slider("H2H Win Rate (Batting Team)", 0.0, 1.0, 0.5, 0.05)
    with col2:
        overall_rate = st.slider("Overall Win Rate", 0.0, 1.0, 0.5, 0.05)
    with col3:
        venue_rate = st.slider("Venue Win Rate", 0.0, 1.0, 0.5, 0.05)
    with col4:
        rolling_rate = st.slider("Last 5 Matches Win Rate", 0.0, 1.0, 0.5, 0.05)

    st.subheader("Toss")
    col1, col2 = st.columns(2)
    with col1:
        toss_win = st.radio("Who won toss?", ["Batting Team", "Bowling Team"])
        toss_win = 1 if toss_win == "Batting Team" else 0
    with col2:
        toss_choice = st.radio("Toss choice?", ["Bat", "Field"])
        toss_choice = toss_choice.lower()

    # Predict button
    if st.button("🎯 Predict Match Outcome", use_container_width=True):
        try:
            response = requests.post(
                "http://localhost:8000/predict",
                json={
                    "batting_team": batting_team,
                    "bowling_team": bowling_team,
                    "venue": venue,
                    "h2h_rate": h2h_rate,
                    "overall_rate": overall_rate,
                    "venue_rate": venue_rate,
                    "rolling_rate": rolling_rate,
                    "toss_win": toss_win,
                    "toss_choice": toss_choice,
                },
                timeout=5,
            )
            response.raise_for_status()
            result = response.json()
            winner = result["winner"]
            confidence = result["confidence"]
            st.success(
                f"### 🏆 {winner} wins!\n"
                f"**Confidence:** {confidence:.1%}"
            )
            # Show prediction breakdown
            st.info(
                f"**Model:** {result['model']}\n\n"
                f"**Reasoning:**\n"
                f"- H2H: {h2h_rate:.0%}\n"
                f"- Form: {rolling_rate:.0%}\n"
                f"- Venue: {venue_rate:.0%}\n"
            )
        except requests.RequestException as e:
            st.error(f"❌ Prediction failed: {e}")
```
Key UI Patterns:
- st.selectbox() — Dropdown selector
- st.slider() — Range input (0.0-1.0)
- st.radio() — Single-choice radio buttons
- st.columns() — Grid layout (col1, col2, etc.)
- st.button() — Form submission
- st.success/error/info() — Colored alerts
Tab 3: Metrics & Transparency
```python
with tab3:
    st.header("Model Performance")

    # Load metrics
    import json
    with open("models/metrics.json") as f:
        metrics = json.load(f)

    col1, col2, col3 = st.columns(3)
    col1.metric("Test Accuracy", f"{metrics['accuracy']:.1%}")
    col2.metric("Precision", f"{metrics['precision']:.1%}")
    col3.metric("Recall", f"{metrics['recall']:.1%}")

    st.subheader("Confusion Matrix")
    st.image("models/confusion_matrix.png", use_column_width=True)

    st.subheader("Feature Importance")
    importance_df = pd.DataFrame({
        "Feature": ["h2h_rate", "rolling_rate", "venue_rate"],  # ... remaining features
        "Importance": [0.32, 0.28, 0.18],  # ... remaining values
    }).sort_values("Importance", ascending=False)
    st.bar_chart(importance_df.set_index("Feature"))

    st.subheader("Q&A Engine")
    st.info(
        "**Total Q&A Pairs:** 42,523\n\n"
        "**Vocabulary Size:** 18,394\n\n"
        "**Match Strategy:** TF-IDF + Cosine Similarity (threshold: 0.15)\n\n"
        f"**Coverage:** {42523 / 50000 * 100:.1f}% of expected cricket topics"
    )
```
Testing: The Foundation of Trust
Good tests = confidence in deployment.
Test Structure
```python
# tests/test_qa.py
import pytest
import pandas as pd
from joblib import load

from src.build_qa_model import answer_question

# Load test data
test_df = pd.read_csv("datasets/ipl_2008_2024_complete.csv")
qa_model = load("models/qa_model.joblib")

# Extract Q&A components
tfidf = qa_model["tfidf"]
Q_matrix = qa_model["Q_matrix"]
answers = qa_model["answers"]
```
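All of the tests rely on `answer_question` from Part 3. As a reminder of what it does — and so you can run the tests' logic without the real model files — here is a minimal self-contained sketch of a TF-IDF + cosine-similarity lookup with a threshold. The tiny corpus and answer strings are illustrative, not from the real 42K-pair dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer_question(question, tfidf, Q_matrix, answers, threshold=0.15):
    """Return (answer, score); answer is None when the best match is below threshold."""
    q_vec = tfidf.transform([question])
    sims = cosine_similarity(q_vec, Q_matrix)[0]
    best = int(sims.argmax())
    score = float(sims[best])
    if score < threshold:
        return None, score
    return answers[best], score

# Tiny illustrative corpus (stand-in for the real Q&A pairs)
known_questions = ["who won the 2024 final", "which team hit the most sixes"]
known_answers = ["KKR won the 2024 final.", "CSK hit the most sixes."]
tfidf = TfidfVectorizer().fit(known_questions)
Q_matrix = tfidf.transform(known_questions)

ans, score = answer_question("who won the final", tfidf, Q_matrix, known_answers)
miss, low = answer_question("zzz qqq", tfidf, Q_matrix, known_answers)
```

A near-match scores well above the threshold and returns its stored answer; gibberish shares no vocabulary, scores 0, and returns `None` — exactly the behavior Test 4 below asserts.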
Test 1: Specific Match Facts
```python
def test_match_lookup():
    """Can we answer specific match questions?"""
    questions = [
        "Who won the match on 2024-04-01 between MI and RR?",
        "How many runs were scored by MI in 2024-04-01?",
        "What was the result of MI vs RR on 2024-04-01?",
    ]
    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )
        assert answer is not None, f"Failed on: {question}"
        assert len(answer) > 10, f"Answer too short: {answer}"
        assert score > 0.15, f"Confidence too low: {score}"
```
**Why data-driven?**

- Not hardcoded: no `assert answer == "Mumbai Indians wins"` literals
- CSV-based: pulls real facts from the dataset
- Robust: works even if answer phrasing changes
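The data-driven idea can be sketched concretely — derive the expected value from the CSV at test time, then assert against that. The column names (`date`, `team1`, `team2`, `winner`) here are hypothetical stand-ins for whatever the real dataset uses:

```python
import pandas as pd
from io import StringIO

# Tiny in-memory stand-in for datasets/ipl_2008_2024_complete.csv
csv = StringIO(
    "date,team1,team2,winner\n"
    "2024-04-01,MI,RR,RR\n"
    "2024-04-02,CSK,GT,CSK\n"
)
df = pd.read_csv(csv)

# Derive the expected fact from the data instead of hardcoding a string
row = df[(df["date"] == "2024-04-01") & (df["team1"] == "MI")].iloc[0]
expected_winner = row["winner"]

# A Q&A answer is then checked against the derived value, so the test
# survives both data updates and rewording of the answer text
assert expected_winner in ("MI", "RR")
```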
Test 2: Aggregate Statistics
```python
def test_most_wins():
    """Can we retrieve aggregate stats?"""
    questions = [
        "Which team has won most matches?",
        "Who has most IPL titles?",
        "Team with highest win percentage?",
    ]
    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )
        # Verify answer is a valid team name
        assert answer is not None
        valid_teams = ["Mumbai Indians", "CSK", "RCB"]  # ... all team names
        assert any(team in answer for team in valid_teams)
```
Test 3: Head-to-Head
```python
def test_head_to_head():
    """Can we answer H2H questions?"""
    questions = [
        "Head-to-head record between MI and CSK?",
        "Does MI have winning record vs KKR?",
        "Who dominates MI vs RR?",
    ]
    for question in questions:
        answer, score = answer_question(
            question, tfidf, Q_matrix, answers, threshold=0.15
        )
        assert answer is not None
        # H2H answers contain numbers (win counts)
        assert any(char.isdigit() for char in answer)
```
Test 4: Threshold Behavior
```python
def test_threshold_protects_low_confidence():
    """Low-confidence matches are rejected."""
    nonsense = "xyzabc qwerty asdfgh"  # Gibberish
    answer, score = answer_question(
        nonsense, tfidf, Q_matrix, answers, threshold=0.15
    )
    # Model shouldn't hallucinate
    assert answer is None, f"Got answer for nonsense: {answer}"
    assert score < 0.15
```
Test 5: ML Model Predictions
```python
def test_model_predictions():
    """Can we predict match winners?"""
    from src.train import normalize_teams, engineer_features
    from joblib import load

    ml_model = load("models/model.joblib")
    pipeline = ml_model["pipeline"]

    # Create a test case
    test_row = test_df.iloc[0].copy()  # Use a real match
    test_row["date"] = "2023-05-01"  # Test on recent data

    # Engineer features
    features_df = engineer_features(
        test_df[test_df["date"] < "2023-01-01"],  # Historical data only
        test_row,
    )

    # Predict
    prediction = pipeline.predict(features_df)
    prob = pipeline.predict_proba(features_df)

    assert prediction[0] in [0, 1]  # Binary classification
    assert 0 <= max(prob[0]) <= 1  # Valid probability
    assert abs(sum(prob[0]) - 1.0) < 0.01  # Probabilities sum to 1
```
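The probability invariants asserted in Test 5 hold for any scikit-learn classifier, so you can verify them without the real model files. A toy stand-in on synthetic data (not the real pipeline or features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data standing in for the real features
X = np.array([[0.2, 0.3], [0.8, 0.7], [0.1, 0.4], [0.9, 0.6]])
y = np.array([0, 1, 0, 1])
clf = LogisticRegression().fit(X, y)

pred = clf.predict([[0.5, 0.5]])
prob = clf.predict_proba([[0.5, 0.5]])

assert pred[0] in (0, 1)                # binary label
assert 0.0 <= prob[0].max() <= 1.0      # valid probability
assert abs(prob[0].sum() - 1.0) < 1e-9  # each row of predict_proba sums to 1
```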
Test 6: Feature Sanity
```python
def test_feature_ranges():
    """Are features in valid ranges?"""
    from src.features import engineer_features

    # Get one match
    test_match = test_df.iloc[0]

    # Engineer
    features = engineer_features(test_df, test_match)

    # Check rates (should be 0-1)
    assert 0 <= features["h2h_rate"].values[0] <= 1
    assert 0 <= features["overall_rate"].values[0] <= 1
    assert 0 <= features["venue_rate"].values[0] <= 1
    assert 0 <= features["rolling_rate"].values[0] <= 1

    # Check binary fields
    assert features["toss_win"].values[0] in [0, 1]
```
Test 7: No Data Leakage
```python
def test_future_data_not_used():
    """Ensure no future data affects past predictions."""
    from src.features import engineer_features

    # Engineer a 2023 match using only pre-2023 history
    before_date = "2023-01-01"
    hist_data = test_df[test_df["date"] < before_date]
    test_match = test_df[
        (test_df["date"] >= before_date) &
        (test_df["date"] < "2023-02-01")
    ].iloc[0]

    features = engineer_features(hist_data, test_match)

    # Features should only use hist_data, not test_match
    # (This is enforced in engineer_features with before_date guards)
    assert features is not None

    # Verify: no 2023 data leaked into the historical rates
    assert all(hist_data["date"] < before_date)
```
Running Tests
```bash
# Install test dependencies
pip install pytest pytest-cov

# Run all tests
pytest tests/

# Run with verbose output
pytest tests/ -v

# Run specific test
pytest tests/test_qa.py::test_match_lookup -v

# Coverage report (requires pytest-cov)
pytest tests/ --cov=src --cov-report=html
```
Deployment
Option 1: Streamlit Cloud
```bash
# Push to GitHub
git push origin main

# Link in Streamlit Cloud (streamlit.io/cloud)
# - Select repository
# - Select app.py
# - Auto-deploys on push
```
Live in 2 minutes, updates automatically.
Option 2: Docker
```dockerfile
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501"]
```

```bash
docker build -t ipl-app .
docker run -p 8501:8501 ipl-app
```
Visit http://localhost:8501
Option 3: Heroku / Railway
```bash
# Deploy with one command
heroku create ipl-ai-assistant
git push heroku main
```
App runs at ipl-ai-assistant.herokuapp.com
Performance Monitoring
```python
# Log predictions so the dashboard can read them back
import logging
from datetime import datetime

# Write raw CSV rows (columns here are one possible layout:
# timestamp,hour,confidence) so pd.read_csv can parse the log file directly
logging.basicConfig(
    filename="predictions.log",
    level=logging.INFO,
    format="%(message)s",
)

@st.cache_data
def get_prediction_logs():
    return pd.read_csv(
        "predictions.log",
        names=["timestamp", "hour", "confidence"],
    )

# Dashboard
st.line_chart(
    get_prediction_logs()
    .groupby("hour")["confidence"]
    .mean()
)
```
Track:
- Average confidence over time
- Common queries
- Backend latency
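The aggregations behind those three metrics are plain pandas; a sketch with hypothetical log columns (`hour`, `query`, `confidence`, `latency_ms` are illustrative names, not the real schema):

```python
import pandas as pd

# Hypothetical logged predictions
logs = pd.DataFrame({
    "hour": ["10:00", "10:00", "11:00"],
    "query": ["predict MI vs CSK", "who won 2024?", "predict MI vs CSK"],
    "confidence": [0.6, 0.7, 0.9],
    "latency_ms": [12, 18, 15],
})

conf_by_hour = logs.groupby("hour")["confidence"].mean()  # confidence over time
top_queries = logs["query"].value_counts()                # common queries
p95_latency = logs["latency_ms"].quantile(0.95)           # backend latency tail
```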
Common Issues & Solutions
| Issue | Solution |
| --- | --- |
| "Connection refused" | Backend not running on localhost:8000 |
| Chat history disappears | Use `st.session_state`, not regular variables |
| Predictions slow | Enable model caching, use lazy loading |
| Tests fail on new data | Read expected values from the CSV, not hardcoded strings |
| Threshold too strict | Lower from 0.15 to 0.10 for more results |
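The "model caching" fix boils down to loading the model once and reusing the object; inside the app Streamlit's `st.cache_resource` plays this role. A framework-free sketch of the same idea with `functools.lru_cache` (the loader name and return value are hypothetical):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    """Load the model once; every later call returns the cached object."""
    # Stand-in for the slow load("models/model.joblib")
    return {"pipeline": object()}

first = get_model()   # triggers the (slow) load
second = get_model()  # served from cache, no reload
assert first is second
```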
Summary: The Complete System
| Component | Purpose | Technology |
| --- | --- | --- |
| Feature Engineering | Calculate pre-match metrics | pandas, numpy |
| ML Model | Predict match winners | scikit-learn |
| Q&A Engine | Answer cricket questions | TF-IDF, cosine similarity |
| FastAPI Backend | Intelligent routing, lazy loading | FastAPI, uvicorn |
| Streamlit Frontend | Chat, prediction, metrics | Streamlit |
| Testing | Verify correctness | pytest, CSV-based assertions |
You now have:
✅ Production-ready predictions (61.8% accuracy)
✅ Intelligent Q&A with 42K learning pairs
✅ Low-latency API (<20ms per request)
✅ Smooth UI with session persistence
✅ 22 tests ensuring reliability
✅ Multiple deployment options
What's Next?
Deploy this and message me your results! 🚀
Repository: https://github.com/jayakrishnayadav24/ipl-ai-assistant
Series Complete 🎉
← Part 4: Backend Routing | ← Return to Series
Built with 💚 for cricket fans. Questions? DM me!
Source: Dev.to


