Commit 9097d6d7 authored by Guy Jacob

NCF: Add missing script to get dataset and update README

parent f312e092
@@ -25,8 +25,6 @@ The model trains on binary information about whether or not a user interacted wi
 ## Setup
-### Steps to configure machine
 * Install `unzip` and `curl`
 ```bash
@@ -41,14 +39,21 @@ The model trains on binary information about whether or not a user interacted wi
 pip install -e .
 ```
-* Download and verify data
+* Obtain the ml-20m dataset
 ```bash
 cd <distiller-repo-root>/examples/ncf
 # Creates ml-20m.zip
-source ../download_dataset.sh
+source download_dataset.sh
 # Confirms the MD5 checksum of ml-20m.zip
-source ../verify_dataset.sh
+source verify_dataset.sh
+# Extracts the dataset into a sub-directory named 'ml-20m'
+# During the last step the script might appear to hang.
+# This is normal; it finishes after a few minutes.
+source extract_dataset.sh
 ```
 ## Running the Sample
......
#!/bin/bash
# Unpack ml-20m.zip (-u: extract only new or updated files),
# then preprocess ratings.csv with convert.py.
echo "unzip ml-20m.zip"
if unzip -u ml-20m.zip
then
    echo "Start processing ml-20m/ratings.csv"
    python convert.py ml-20m/ratings.csv ml-20m --negatives 999
else
    echo "Problem unzipping ml-20m.zip"
    echo "Please run 'download_dataset.sh && verify_dataset.sh' first"
fi
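
The README step above says that `verify_dataset.sh` confirms the MD5 checksum of the archive, but that script is not part of this diff. As a rough illustration of what such a check can look like, here is a minimal sketch. The function name `verify_md5` and its arguments are hypothetical and are not taken from the repository. It assumes GNU `md5sum` is available, and the expected digest must be supplied by the caller (for example, the value published alongside the dataset).

```shell
#!/bin/bash
# Hypothetical helper, not part of this commit: compare a file's MD5 digest
# against an expected value passed in by the caller.
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_md5() {
    local file="$1"
    local expected="$2"
    local actual

    if [ ! -f "$file" ]; then
        echo "Missing file: $file" >&2
        return 2
    fi

    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual="$(md5sum "$file" | awk '{print $1}')"

    if [ "$actual" = "$expected" ]; then
        echo "MD5 OK: $file"
        return 0
    fi

    echo "MD5 mismatch for $file: expected $expected, got $actual" >&2
    return 1
}
```

A call might look like `verify_md5 ml-20m.zip "$EXPECTED_MD5"`, where `EXPECTED_MD5` is a placeholder for the published digest. The function uses `return` rather than `exit`, so a failed check does not close the interactive shell when the file is loaded with `source`, as the README does for the other scripts.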