
Deep Dive Project: Predicting UIUC Course GPA Distribution

Team Members

  • Xiaotian Zhao (xzhao87)
  • Qi Cui (qicui3)
  • Yuhui Lai (yuhuil3)
  • Jiadong Gui (jgui3)

Deliverable

The repository is hosted on GitLab and Google Drive.

Project Overview

This project leverages historical GPA data from University of Illinois at Urbana-Champaign (UIUC) courses to predict the GPA or grade distribution of future course offerings. Such predictions could give faculty, administrators, and students insight into likely performance outcomes based on historical patterns.

Problem Statement

The primary goal is to analyze historical course GPA data and develop a model capable of predicting the GPA distribution of a future course or class section. Such predictions could help stakeholders such as:

  • Faculty considering adjustments to course structure,
  • Administrators making informed faculty hiring decisions,
  • Students choosing courses with an understanding of their expected performance.

Given the availability of past course grades, class size, term information, and faculty details, this project will explore relationships among these factors to model grade distributions.

Dataset

We are using the UIUC GPA dataset, available from the GPA Dataset Repository. It covers a wide range of UIUC courses and includes:

  • Overall GPA per class or section,
  • Grade distribution per letter grade (A+ to F),
  • Course term, year, and size,
  • Department, subject, course number, and title,
  • Instructor information.
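Because the dataset records per-grade counts, a section's overall GPA can be recomputed as a weighted mean of grade points. The sketch below assumes A+ through F column names and a standard 4.0 grade-point scale; the dataset's actual column names and point values may differ.

```python
import pandas as pd

# Assumed grade-point scale (A+ capped at 4.0); verify against the dataset.
GRADE_POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 3.67,
    "B+": 3.33, "B": 3.0, "B-": 2.67,
    "C+": 2.33, "C": 2.0, "C-": 1.67,
    "D+": 1.33, "D": 1.0, "D-": 0.67,
    "F": 0.0,
}

def section_gpa(row: pd.Series) -> float:
    """Weighted mean GPA for one section from its letter-grade counts."""
    counts = row[list(GRADE_POINTS)]
    return sum(counts[g] * pts for g, pts in GRADE_POINTS.items()) / counts.sum()

# Tiny illustrative section: 10 A's and 10 C's -> (10*4.0 + 10*2.0) / 20 = 3.0
demo = pd.DataFrame([{**{g: 0 for g in GRADE_POINTS}, "A": 10, "C": 10}])
print(demo.apply(section_gpa, axis=1).iloc[0])  # 3.0
```

This derived GPA also serves as a sanity check against the dataset's reported per-section GPA during preprocessing.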

License

This project follows the data license of the original dataset as published by its maintainers on GitHub.

Milestone 1 Objectives

  1. Data Extraction:

    • A notebook will be created to load and preprocess data from the GPA dataset.
    • We will prepare a smaller, debugging dataset for initial code testing to ensure a quick runtime (under 2 minutes).
    • The full working dataset will be created for model training (running within approximately 40 minutes).
  2. Data Conversion:

    • The dataset will be converted to pandas DataFrames, enabling efficient manipulation and data processing.
    • We will convert datetime columns to pandas timestamps, allowing time-based operations.
    • The processed data will be saved as a binary .pkl file for faster loading in future steps.
  3. Folder Structure:

    • A Google Drive folder has been established to share project files, datasets, and notebooks.
    • All project members, TAs, and graders have access.
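The conversion step above can be sketched as follows; the column names, term-to-month mapping, and output file name here are illustrative assumptions, not the notebook's actual values.

```python
import pandas as pd

# Toy stand-in for the preprocessed GPA DataFrame (step 2 of Milestone 1).
df = pd.DataFrame({
    "Year": [2023, 2024],
    "Term": ["Fall", "Spring"],
    "Subject": ["CS", "STAT"],
    "Number": [225, 107],
})

# Build a pandas timestamp column so time-based slicing and grouping work.
# The term-start months below are an assumption for illustration.
term_start = {"Spring": "01", "Summer": "06", "Fall": "08"}
df["TermStart"] = pd.to_datetime(
    df["Year"].astype(str) + "-" + df["Term"].map(term_start)
)

# Persist as a binary pickle for fast reloads in later milestones.
df.to_pickle("gpa_clean.pkl")
reloaded = pd.read_pickle("gpa_clean.pkl")
print(reloaded["TermStart"].dt.year.tolist())  # [2023, 2024]
```

Pickle keeps dtypes (including the timestamps) intact across loads, which is why it is preferred here over re-parsing the CSV each run.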

This README will be updated with additional details and observations as we progress through subsequent milestones.