In this tutorial, we'll explain how to install and configure MLflow on an Ubuntu 24.04 server. We'll use PostgreSQL as the backend store, add basic authentication, and put the tracking server behind Nginx with SSL.
MLflow is an open-source platform designed to manage the complete machine learning lifecycle, including experimentation, reproducibility, and deployment. It allows you to track experiments, package code into reproducible runs, and deploy models. In this guide, we will walk through how to install and configure MLflow on a VPS or dedicated server running Ubuntu 24.04.
Prerequisites
Before getting started, ensure you have the following:
- A KVM VPS or dedicated server with Ubuntu 24.04.
- A non-root user with sudo privileges.
- Python 3.8 or higher (Ubuntu 24.04 ships Python 3.12).
- Basic knowledge of Python, virtual environments, and machine learning.
Install and Configure MLflow on Ubuntu 24.04
Step 1: Update and Upgrade Your System
It's always a good practice to start by updating your package lists and upgrading installed packages to their latest versions:
sudo apt update
sudo apt upgrade -y
Step 2: Install Python and Pip
MLflow is a Python-based tool, so you need to install Python and Pip (Python's package installer). On Ubuntu 24.04, you can install these with:
sudo apt install python3 python3-pip -y
Verify the installation:
python3 --version
pip3 --version
Step 3: Set Up a Python Virtual Environment
To avoid conflicts between dependencies for different Python projects, it’s a good idea to use a virtual environment. Install venv:
sudo apt install python3-venv -y
Create a project directory:
mkdir mlflow-project && cd mlflow-project
Create and activate a virtual environment for MLflow:
python3 -m venv mlflow-env
source mlflow-env/bin/activate
Step 4: Install MLflow
With the virtual environment activated, use Pip to install MLflow:
pip install mlflow
Once the installation is complete, you can verify it by checking the MLflow version:
mlflow --version
Step 5: Set Up a Backend Database for MLflow Tracking
MLflow can use different backends to store experiment data. By default, MLflow uses a local SQLite database. However, for production, it’s recommended to use a more robust database such as PostgreSQL or MySQL. To use PostgreSQL as the backend for MLflow:
Install PostgreSQL:
sudo apt install postgresql postgresql-contrib -y
Log in to the PostgreSQL prompt as the default postgres user:
sudo -u postgres psql
Create a new database and user for MLflow:
CREATE DATABASE mlflow_db;
CREATE USER mlflow_user WITH PASSWORD 'mlflow_password';
ALTER ROLE mlflow_user SET client_encoding TO 'utf8';
ALTER ROLE mlflow_user SET default_transaction_isolation TO 'read committed';
ALTER ROLE mlflow_user SET timezone TO 'UTC';
GRANT ALL PRIVILEGES ON DATABASE mlflow_db TO mlflow_user;
Connect to the new database so the schema-level grants apply to mlflow_db rather than the default postgres database:
\c mlflow_db
GRANT ALL ON SCHEMA public TO mlflow_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO mlflow_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO mlflow_user;
\q
Next, modify PostgreSQL's authentication method. Open pg_hba.conf, PostgreSQL's client authentication configuration file. You can find it in /etc/postgresql/16/main/ (the version might be different on your system):
sudo nano /etc/postgresql/16/main/pg_hba.conf
Look for the line that defines the authentication method for local connections, which might look like this:
local all all peer
Change the authentication method from peer to scram-sha-256 (the default password hashing method on PostgreSQL 16; md5 also works) so password logins are accepted:
local all all scram-sha-256
Restart PostgreSQL so the change takes effect, then test the connection by logging in with the newly created user:
sudo systemctl restart postgresql
psql -U mlflow_user -d mlflow_db -W
Step 6: Configure MLflow to Use the Database
To configure MLflow to use PostgreSQL as the backend, set up environment variables for MLflow. MLflow talks to PostgreSQL through a Python driver, so install one in the virtual environment first (psycopg2-binary is the common choice):
pip install psycopg2-binary
Then create a file called .env in the directory where you'll run MLflow and add the following:
export MLFLOW_TRACKING_URI="postgresql://mlflow_user:mlflow_password@localhost/mlflow_db"
Note: Replace the database username and password with your own credentials.
Load the environment variables into your shell session:
source .env
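To make the structure of the connection URI explicit, here is a short sketch that decomposes it with the Python standard library. The credentials and database name are the placeholder values from this guide; substitute your own:

```python
from urllib.parse import urlsplit

# Placeholder URI from this guide; substitute your own credentials.
uri = "postgresql://mlflow_user:mlflow_password@localhost/mlflow_db"

parts = urlsplit(uri)
print("scheme:  ", parts.scheme)              # database dialect MLflow will use
print("user:    ", parts.username)            # database login role
print("host:    ", parts.hostname)            # database server
print("database:", parts.path.lstrip("/"))    # database name
```

If your PostgreSQL instance listens on a non-default port, it goes after the host, e.g. localhost:5433.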
Step 7: Set Up an Artifact Store for MLflow
In addition to the backend database, MLflow requires an artifact store to save models and other artifacts. You can use local storage, cloud storage (e.g., AWS S3), or a network file system (NFS).
To use local storage as an artifact store, specify a directory:
mkdir ~/mlflow-artifacts
Set this directory as the artifact store by adding the following line to the .env file:
export MLFLOW_ARTIFACT_ROOT=~/mlflow-artifacts
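One thing to watch: the ~ in the .env file is expanded by the shell at export time, but if you ever set the artifact path from Python instead, you must expand it yourself. A minimal stdlib-only sketch:

```python
from pathlib import Path

# "~" is not expanded automatically outside the shell, so do it explicitly.
artifact_root = Path("~/mlflow-artifacts").expanduser()
artifact_root.mkdir(parents=True, exist_ok=True)
print(artifact_root)  # absolute path such as /home/youruser/mlflow-artifacts
```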
Step 8: Configure the Firewall
If you're running this on a VPS with UFW enabled, allow traffic through port 5000 so you can reach MLflow remotely, plus ports 80 and 443 for Nginx later:
sudo ufw allow 5000/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw reload
Step 9: Run the MLflow Tracking Server
You can now start the MLflow server to track experiments and manage your ML projects. Use the following command to start the server, specifying the tracking URI and artifact root:
mlflow server \
--backend-store-uri $MLFLOW_TRACKING_URI \
--default-artifact-root $MLFLOW_ARTIFACT_ROOT \
--host 0.0.0.0 \
--port 5000
This will start the MLflow tracking server on port 5000. You can now access the MLflow UI by navigating to http://your_server_ip:5000.
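As a quick sanity check, the tracking server exposes a /health endpoint. The following stdlib-only sketch assumes the server from the previous command is listening on 127.0.0.1:5000 and simply reports if it isn't:

```python
import urllib.request
import urllib.error

def health_url(host: str, port: int = 5000) -> str:
    """Build the URL of MLflow's health endpoint."""
    return f"http://{host}:{port}/health"

url = health_url("127.0.0.1")
print("checking", url)
try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        print("server is up, HTTP status:", resp.status)
except (urllib.error.URLError, OSError) as exc:
    print("server not reachable:", exc)
```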
Step 10: Testing MLflow
To ensure everything is working properly, create a simple test MLflow experiment. With the virtual environment still activated, create a test directory:
mkdir mlflow-test && cd mlflow-test
Create the test_mlflow.py script:
nano test_mlflow.py
Add the following content:
import mlflow

# Start a new MLflow experiment
mlflow.set_experiment("Test Experiment")

# Log a parameter and a metric
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)

    # Save the run ID for later reference
    run_id = mlflow.active_run().info.run_id
    print(f"Run ID: {run_id}")
Save and exit the file.
Run the script:
python test_mlflow.py
This script logs a parameter (learning_rate) and a metric (accuracy) to MLflow, which you can track via the MLflow UI.
Step 11: Configuring MLflow for Production
If you're using MLflow in a production environment, you should run it as a service for better reliability.
Create a systemd service file for MLflow:
sudo nano /etc/systemd/system/mlflow.service
Add the following content to the file, adjusting paths as necessary:
[Unit]
Description=MLflow Tracking Server
After=network.target postgresql.service

[Service]
User=your_user
WorkingDirectory=/path/to/your/mlflow-project
Environment="MLFLOW_TRACKING_URI=postgresql://mlflow_user:mlflow_password@localhost/mlflow_db"
Environment="MLFLOW_ARTIFACT_ROOT=/home/your_user/mlflow-artifacts"
ExecStart=/path/to/mlflow-env/bin/mlflow server \
    --backend-store-uri ${MLFLOW_TRACKING_URI} \
    --default-artifact-root ${MLFLOW_ARTIFACT_ROOT} \
    --host 0.0.0.0 --port 5000
Restart=always

[Install]
WantedBy=multi-user.target

Note that systemd does not read your shell's .env file, so the variables are set here with Environment= lines; the ${VAR} references in ExecStart are expanded by systemd, not by a shell.
Save and exit the file.
Reload systemd and start the MLflow service:
sudo systemctl daemon-reload
sudo systemctl start mlflow
sudo systemctl enable mlflow
Check the status of the service:
sudo systemctl status mlflow
If something goes wrong, inspect the logs with journalctl -u mlflow -f.
MLflow will now start automatically at boot and run in the background as a system service.
Step 12: Install Nginx
Install Nginx to act as a reverse proxy for MLflow:
sudo apt install nginx -y
Step 13: Configure Basic Authentication for MLflow
We'll use Nginx as a reverse proxy, but first, let's set up basic HTTP authentication to secure access to the MLflow UI.
Install htpasswd for Creating Passwords
To generate the password file, you need the apache2-utils package (for Debian-based systems like Ubuntu):
sudo apt install apache2-utils
Create a password file (for example, in /etc/nginx/.htpasswd):
sudo htpasswd -c /etc/nginx/.htpasswd mlflow_user
You'll be prompted to create a password for the mlflow_user. This file will store the username and hashed password for basic authentication.
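If you'd rather not install apache2-utils, an htpasswd entry can also be generated in plain Python. This is a sketch of the legacy {SHA} scheme, which Nginx accepts; for new deployments, htpasswd's default hashing schemes are stronger, so treat this as a fallback:

```python
import base64
import hashlib

def htpasswd_sha_line(user: str, password: str) -> str:
    """Return an htpasswd entry in the {SHA} scheme: base64 of the raw SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"

# Placeholder credentials; append the printed line to /etc/nginx/.htpasswd
print(htpasswd_sha_line("mlflow_user", "change_me"))
```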
Step 14: Configure Nginx as a Reverse Proxy for MLflow
Now, we need to configure Nginx to forward traffic from port 80 (or 443 for SSL) to MLflow's port 5000.
Create a new Nginx configuration file for MLflow:
sudo nano /etc/nginx/sites-available/mlflow
Add the following configuration, replacing your_domain_or_ip with your server's domain or IP address:
server {
    listen 80;
    server_name your_domain_or_ip;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Enable Basic Auth
        auth_basic "Restricted Content";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }
}
Save and exit the file.
Enable the configuration:
sudo ln -s /etc/nginx/sites-available/mlflow /etc/nginx/sites-enabled/
Test the Nginx configuration for any syntax errors:
sudo nginx -t
Restart Nginx to apply the changes:
sudo systemctl restart nginx
At this point, Nginx should be forwarding requests to the MLflow server, and it will require HTTP basic authentication.
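For reference, HTTP basic authentication just sends a base64-encoded user:password pair in the Authorization header of every request. A small stdlib-only sketch of what a client (for example, your own scripts calling the MLflow REST API through Nginx) would send, using placeholder credentials:

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a client sends for HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

print(basic_auth_header("mlflow_user", "change_me"))
```

With curl you would normally pass credentials directly (curl -u mlflow_user:change_me http://your_domain_or_ip/), which produces exactly this header.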
Step 15: Install Certbot and Obtain SSL Certificates
To secure the server with SSL, you can use Certbot, the official Let’s Encrypt client, which simplifies the process of obtaining and renewing SSL certificates.
Install Certbot and the Nginx plugin:
sudo apt install certbot python3-certbot-nginx -y
Obtain an SSL certificate. Note that Let's Encrypt issues certificates for domain names only, not bare IP addresses, so this step requires a domain pointing at your server. Run the following Certbot command to obtain and install the certificate automatically:
sudo certbot --nginx -d your_domain
Certbot will prompt you for an email address and ask you to agree to the terms of service. Afterward, it will automatically obtain the certificate and configure Nginx to use SSL. You can verify that automatic renewal works with sudo certbot renew --dry-run.
Conclusion
You have successfully installed and configured MLflow on your server. You can now use MLflow to track experiments, log parameters and metrics, and manage the entire lifecycle of your machine learning projects. With a robust backend like PostgreSQL and proper artifact storage, MLflow becomes a powerful tool for managing machine learning workflows in a production environment.