Deep Dive Project: Predicting UIUC Course GPA Distribution
Team Members
- Xiaotian Zhao (xzhao87)
- Qi Cui (qicui3)
- Yuhui Lai (yuhuil3)
- Jiadong Gui (jgui3)
Deliverable
The repo is hosted on GitLab and Google Drive.
Project Overview
This project uses historical GPA data for University of Illinois at Urbana-Champaign (UIUC) courses to predict the GPA or grade distribution of future course offerings. Such predictions could give faculty, administrators, and students a view of likely performance outcomes based on historical patterns.
Problem Statement
The primary goal is to analyze historical data on course GPAs to develop a model capable of predicting the GPA distribution of a course or class section in the future. This prediction could help stakeholders, such as:
- Faculty interested in course structure adjustments,
- Administrators making informed faculty hiring decisions,
- Students aiming to register for courses with an understanding of their expected performance.
Given the availability of past course grades, class size, term information, and faculty details, this project will explore relationships among these factors to model grade distributions.
Dataset
We use the UIUC GPA dataset, available at the GPA Dataset Repository. This dataset covers a range of UIUC courses and includes:
- Overall GPA per class or section,
- Grade distribution per letter grade (A+ to F),
- Course term, year, and size,
- Department, subject, course number, and title,
- Instructor information.
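To make the link between the per-letter grade counts and the overall GPA concrete, here is a minimal sketch of how a section's mean GPA could be derived from its grade distribution. The `GRADE_POINTS` scale (A+ and A both worth 4.0, with steps of roughly one third of a point) and the helper names `mean_gpa` and `grade_shares` are our assumptions for illustration; the dataset's own GPA column may be computed differently.

```python
# Assumed 4.0 grade-point scale; not taken from the dataset itself.
GRADE_POINTS = {
    "A+": 4.00, "A": 4.00, "A-": 3.67,
    "B+": 3.33, "B": 3.00, "B-": 2.67,
    "C+": 2.33, "C": 2.00, "C-": 1.67,
    "D+": 1.33, "D": 1.00, "D-": 0.67,
    "F": 0.00,
}


def _total_graded(counts: dict) -> int:
    """Number of students holding one of the letter grades in GRADE_POINTS."""
    total = sum(counts.get(grade, 0) for grade in GRADE_POINTS)
    if total == 0:
        raise ValueError("no letter grades recorded for this section")
    return total


def mean_gpa(counts: dict) -> float:
    """Weighted average of grade points, weighted by students per letter grade."""
    total = _total_graded(counts)
    weighted = sum(GRADE_POINTS[g] * counts.get(g, 0) for g in GRADE_POINTS)
    return weighted / total


def grade_shares(counts: dict) -> dict:
    """Fraction of graded students at each letter grade (a distribution summing to 1)."""
    total = _total_graded(counts)
    return {g: counts.get(g, 0) / total for g in GRADE_POINTS}
```

For example, `mean_gpa({"A": 10, "B": 10})` averages ten 4.0 grades with ten 3.0 grades. Grades outside the assumed scale (such as withdrawals) are ignored by both helpers.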
License
This project follows the data license of the original dataset as published by its maintainers on GitHub.
Milestone 1 Objectives
- Data Extraction:
  - A notebook will be created to load and preprocess data from the GPA dataset.
  - We will prepare a smaller debugging dataset for initial code testing to ensure a quick runtime (under 2 minutes).
  - The full working dataset will be created for model training (running within approximately 40 minutes).
- Data Conversion:
  - The dataset will be converted to pandas DataFrames, enabling efficient manipulation and data processing.
  - We will convert datetime columns to pandas timestamps, allowing time-based operations.
  - The processed data will be saved as a binary .pkl file for faster loading in future steps.
- Folder Structure:
  - A Google Drive folder has been established to share project files, datasets, and notebooks.
  - All project members, TAs, and graders have access.
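The extraction and conversion steps listed above could look roughly like the following sketch. It is not the notebook's actual code: the column names `Year` and `Term`, the `TERM_START_MONTH` mapping, and the helper names are hypothetical, chosen only to show a debugging subset, a timestamp conversion, and a pickle round trip with pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from term name to an approximate starting month.
TERM_START_MONTH = {"Spring": 1, "Summer": 6, "Fall": 8}


def add_term_start(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'TermStart' timestamp built from 'Year' and 'Term'."""
    out = df.copy()
    parts = pd.DataFrame({
        "year": out["Year"],
        "month": out["Term"].map(TERM_START_MONTH),
        "day": 1,
    })
    # pd.to_datetime assembles timestamps from year/month/day columns.
    out["TermStart"] = pd.to_datetime(parts)
    return out


def make_debug_subset(df: pd.DataFrame, n_rows: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Draw a small reproducible sample for quick test runs."""
    n = min(n_rows, len(df))
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


def save_and_reload(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Write the frame to a binary pickle file and read it back."""
    df.to_pickle(path)
    return pd.read_pickle(path)


# Tiny in-memory stand-in for the real dataset (values are made up).
raw = pd.DataFrame({
    "Year": [2022, 2022, 2023, 2023],
    "Term": ["Spring", "Fall", "Spring", "Fall"],
    "Subject": ["CS", "CS", "STAT", "STAT"],
    "Number": [101, 225, 400, 410],
})

full = add_term_start(raw)
debug = make_debug_subset(full, n_rows=2)
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(full, Path(tmp) / "gpa_full.pkl")
```

Fixing the random seed keeps the debugging subset identical across runs, which makes test failures easier to reproduce. Pickle files should only be loaded from trusted sources, such as the shared project folder.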
This README will be updated with additional details and observations as we progress through subsequent milestones.