MSBA-Capstone-MasterControl-GroupProject

📊 MSBA Capstone - Group Project Dashboard

MSBA IS 6813 | Spring 2026



🛠️ Functional Hub & Assignments

Core Tools 🛠️  | 📋 Specs | 📊 Data Room | 📁 Shared Google Drive | 🌐 Group Dashboard | 💻 GitHub Repo
Assignments 📂 | 📄 01 Business Problem Statement | 📂 02 EDA | 📂 03 Modeling | 📂 04 Presentation | 🚫

📅 Deadlines

Phase | Milestone                  | Hard Deadline
🟢    | Business Problem Statement | Jan 28
🟡    | EDA Group Notebook         | Feb 18
⚪    | Modeling Notebook          | Mar 18
⚪    | Practice Presentation      | Apr 05
⚪    | Final Sponsor Delivery     | Apr 08/15
⚪    | Portfolio & Peer Eval      | Apr 19

⚙️ Notebook Tips

1. Notebook Standards & The “Golden” YAML

Primary Directive: Copy this block exactly into the top of every .qmd file, then fill in the title and subtitle fields.

---
title: 
subtitle: 
date: "Spring 2026"
format:
  html:
    theme: journal
    toc: true
    toc-depth: 3
    toc-location: right  # Quarto has no toc-float; its HTML TOC already floats in the sidebar
    number-sections: false
    code-fold: true
    code-tools: true
    df-print: paged
    highlight-style: github
  pdf:
    documentclass: article
    geometry:
      - margin=1in
    toc: true
    number-sections: false
    colorlinks: true
    mainfont: "Arial"
    sansfont: "Arial"
    monofont: "Courier New"
editor: visual
---
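Because the block is copied by hand, a notebook can drift from it over time. A small check can read the front matter of each .qmd file and confirm the golden top-level keys are present. This is a minimal sketch using only the Python standard library; the function names (`read_front_matter`, `missing_keys`) and the choice of which keys count as required are assumptions, and it compares top-level key names only, not their values.

```python
from __future__ import annotations

from pathlib import Path

# Top-level keys from the golden YAML block (assumption: these are the ones to enforce)
REQUIRED_KEYS = {"title", "subtitle", "date", "format", "editor"}


def read_front_matter(text: str) -> list[str]:
    """Return the lines between the opening and closing '---' fences, or [] if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return lines[1:i]
    return []  # opening fence without a closing one


def missing_keys(text: str) -> set[str]:
    """Report required top-level keys that the front matter does not define."""
    top_level = {
        line.split(":", 1)[0].strip()
        for line in read_front_matter(text)
        if line and not line[0].isspace() and ":" in line  # skip nested (indented) keys
    }
    return REQUIRED_KEYS - top_level


if __name__ == "__main__":
    # Scan every .qmd under notebooks/ (run from the project root)
    for qmd in sorted(Path("notebooks").rglob("*.qmd")):
        gaps = missing_keys(qmd.read_text(encoding="utf-8"))
        if gaps:
            print(f"{qmd}: missing {sorted(gaps)}")
```

Empty values such as `title: ` still count as present, so the check flags deleted keys rather than unfilled ones.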

2. Standard Setup & Parallel Processing (Copy-Paste)

Rule: Use these blocks to initialize your environment. They use dynamic core selection: each block detects how many cores the machine has and uses all but one (N-1 logic), so heavy jobs run in parallel while the operating system stays responsive.

🟢 R Setup (Tidyverse + Parallel)

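# Load Core Packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, here, parallel, doParallel)

# Dynamic Parallel Processing (Detects your hardware)
# Leaves 1 core free for the OS to prevent freezing, but never drops below 1 worker
# (detectCores() can return NA, and makeCluster(0) fails on a single-core machine)
num_cores <- parallel::detectCores(logical = FALSE)
n_workers <- max(1L, num_cores - 1L, na.rm = TRUE)
cl <- makeCluster(n_workers)
registerDoParallel(cl)

print(paste("Cluster active with", n_workers, "cores."))

# At the end of the session, release the workers:
# stopCluster(cl)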

🔵 Python Setup (Pandas + Multiprocessing)

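import pandas as pd
import numpy as np
import multiprocessing
from pyprojroot import here

# Dynamic Core Selector: N-1 logic, but never fewer than 1 job
# (n_jobs=0 is rejected by scikit-learn/joblib on a single-core machine).
# Note: cpu_count() counts logical cores, whereas the R block asks for physical cores.
# Use 'n_jobs' in Scikit-Learn models (e.g., n_jobs=n_jobs)
n_jobs = max(1, multiprocessing.cpu_count() - 1)

print(f"Parallel processing enabled: {n_jobs} cores available.")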
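`multiprocessing.cpu_count()` reports every logical core on the machine, which can overstate what a notebook may actually use when the process is restricted to a subset of CPUs (for example, via CPU affinity on a shared server). The sketch below is one hedged alternative: it prefers `os.process_cpu_count()` (Python 3.13+), falls back to `os.sched_getaffinity(0)` where the OS provides it, and otherwise to `os.cpu_count()`, then applies the same N-1 rule with a floor of one. The helper names `usable_cpus` and `pick_n_jobs` are hypothetical, not part of any library.

```python
import os


def usable_cpus() -> int:
    """Count the CPUs this process may run on, falling back to the machine total."""
    if hasattr(os, "process_cpu_count"):  # Python 3.13+: honours CPU affinity
        count = os.process_cpu_count()
    elif hasattr(os, "sched_getaffinity"):  # Linux and some other Unix systems
        count = len(os.sched_getaffinity(0))
    else:  # e.g., macOS/Windows on older Python versions
        count = os.cpu_count()
    return count or 1  # these functions may return None if the count is undetermined


def pick_n_jobs(reserve: int = 1) -> int:
    """N-1 logic: keep `reserve` cores free for the OS, but always return at least 1."""
    return max(1, usable_cpus() - reserve)


n_jobs = pick_n_jobs()
print(f"Parallel processing enabled: {n_jobs} cores available.")
```

CPU quotas imposed by some container runtimes are not visible through affinity, so on such systems it may still be worth setting `n_jobs` explicitly.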

⚡ Performance Note: Why Parallel Processing?

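By default, most R and Python code runs sequentially on a single CPU core, leaving the rest of the machine idle (on an 8-core laptop, 7 of 8 cores, or roughly 88% of capacity). Enabling parallel processing as shown above lets suitable work, such as cross-validation folds, bootstrap resamples, or tree ensembles, be split across several cores at once. Only code that actually uses the registered cluster (e.g., foreach with %dopar%) or the n_jobs argument benefits; the setup alone does not speed up ordinary sequential code.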
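The idle share depends on the core count: a sequential job keeps 1 of N cores busy, so (N - 1)/N of the CPU sits unused. The small calculation below makes that concrete for a few hypothetical machine sizes. It is arithmetic only and ignores the overheads (process start-up, copying data to workers) that keep real speed-ups below the worker count.

```python
def idle_share(n_cores: int) -> float:
    """Fraction of total CPU capacity left unused when only one core does the work."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return (n_cores - 1) / n_cores


def best_case_speedup(n_cores: int, reserve: int = 1) -> int:
    """Upper bound on speed-up with N-1 workers, assuming perfectly parallel work."""
    return max(1, n_cores - reserve)


for cores in (4, 8, 12, 16):
    print(f"{cores:>2} cores: {idle_share(cores):.1%} idle when sequential, "
          f"at most {best_case_speedup(cores)}x faster with N-1 workers")
```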

3. Foolproof Data Loading (Polyglot Paths)

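Avoid absolute paths (e.g., C:/Users/Thomas/...): they exist only on one person's machine and break for everyone else. Build paths relative to the project root instead.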

For R (using here):

library(here)
# Automatically finds the project root (e.g., the folder containing .git or an .Rproj file)
df <- read.csv(here::here("data", "application_train.csv"))

For Python (using pyprojroot):

import pandas as pd
from pyprojroot import here

# Automatically finds the project root (e.g., the folder containing .git)
path = here("data/application_train.csv")
df = pd.read_csv(path)
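Both `here` (R) and `pyprojroot` (Python) work by searching upward from the current working directory for a marker that identifies the project root. The function below is a simplified sketch of that idea, not either package's actual implementation; the marker list (`.git`, `.here`, `*.Rproj`) and the helper names `find_project_root` and `project_path` are assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Assumed markers: a directory counts as the root if it contains any of these
ROOT_FILES = (".git", ".here")
ROOT_GLOBS = ("*.Rproj",)


def find_project_root(start: Path | str = ".") -> Path:
    """Walk up from `start` until a directory containing a root marker is found."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        if any((directory / name).exists() for name in ROOT_FILES):
            return directory
        if any(next(directory.glob(pattern), None) for pattern in ROOT_GLOBS):
            return directory
    raise FileNotFoundError(f"No project root marker found above {current}")


def project_path(*parts: str, start: Path | str = ".") -> Path:
    """Build a path relative to the project root, similar in spirit to here('data', 'x.csv')."""
    return find_project_root(start).joinpath(*parts)
```

Because the search starts from wherever the notebook runs, the same call resolves correctly from notebooks/02_EDA/ or from the repository root.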

🧠 Repository Architecture & Usage Flow

Visual map of how files, data, and code interact within this repository.

┌─────────────────┐       ┌────────────────────┐      ┌──────────────────┐
│  📂 data/       │       │  📂 notebooks/     │      │  📂 output/      │
│  (Local Only)   │──────▶│  (Code Execution)  │─────▶│  (Deliverables)  │
│  Raw .csv Files │       │  .qmd Analysis     │      │  .csv / .png     │
└─────────────────┘       └────────────────────┘      └──────────────────┘
        │                           ▲
        │                           │
        └────── (Load via 'here') ──┘

📂 Physical Directory Structure

├── data/               # RAW data (Local only - Git ignored)
├── notebooks/
│   ├── 01_Business_Problem/
│   ├── 02_EDA/
│   ├── 03_Modeling/
│   ├── 04_Presentation/
│   └── individual/     # Individual "Sandboxes" for portfolio
├── output/             # Exported .csv results and .png plots
├── docs/               # Meeting notes and sponsor requirements
└── README.md           # This Hub

📞 Contact Information

Team Member  | Email (Personal)          | Email (University) | Phone
Thomas Beck  | thomasscottbeck@gmail.com | u0399590@utah.edu  | +1 (801) 631-2080
Max Ridgeway | [TBD]                     | u1230181@utah.edu  | +1 (801) 597-3824
Astha KC     | asthakc.us@gmail.com      | u1561947@utah.edu  | +1 (971) 500-6757

Note: Before starting any work session, run git pull to sync the latest model changes from the team.