Build a Machine Learning API with FastAPI on a VPS

By Anurag Singh

Updated on Oct 21, 2024


In this tutorial, we'll learn how to build a machine learning API with FastAPI on a VPS.

We'll walk through the steps to build, deploy, and scale a machine learning API using FastAPI on a Virtual Private Server (VPS) for real-time predictions. FastAPI is a modern, high-performance web framework for building APIs with Python. By the end, you'll have a production-ready API to serve your machine learning models.

Prerequisites:

  • KVM VPS or Dedicated Server: You should have a VPS or dedicated server running Linux (e.g., Ubuntu 24.04).
  • Python: Version 3.8+ installed on your server.
  • FastAPI: To build the API.
  • Machine Learning Model: Trained model (e.g., a saved model using Scikit-learn, TensorFlow, or PyTorch).
  • Docker (Optional): To containerize and ease deployment.
  • NGINX: To serve the API.
  • Gunicorn: A Python HTTP server that, combined with Uvicorn workers, runs FastAPI in production.

Build a Machine Learning API with FastAPI on a VPS

Step 1: Set Up Your VPS

Update and upgrade packages. Make sure your VPS is up to date.

sudo apt update && sudo apt upgrade -y

Step 2: Install Python and Required Libraries

Install Python. FastAPI requires Python 3.8+.

sudo apt install python3 python3-pip -y

Create a virtual environment

It's a good practice to use virtual environments to manage your Python packages.

sudo apt install python3-venv -y
python3 -m venv fastapi-env
source fastapi-env/bin/activate

Install FastAPI and Uvicorn. Uvicorn is an ASGI server to run FastAPI.

pip install fastapi uvicorn

Install Machine Learning Libraries

Install the libraries you need (e.g., Scikit-learn, TensorFlow, PyTorch).

pip install scikit-learn

Step 3: Build the FastAPI Application

Create the FastAPI App

Create a new directory for your project.

mkdir ml-api && cd ml-api
nano main.py

Write the FastAPI Application

Here’s a basic example of a FastAPI app that loads a pre-trained Scikit-learn model and makes predictions.

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Define the input data model (4 features for the Iris dataset)
class PredictionInput(BaseModel):
    feature1: float
    feature2: float
    feature3: float
    feature4: float

# Load the pre-trained machine learning model
model = joblib.load("simple_model.pkl")

# Create the FastAPI instance
app = FastAPI()

@app.post("/predict/")
async def predict(input_data: PredictionInput):
    # Convert input data into the format expected by the model
    data = [[input_data.feature1, input_data.feature2, input_data.feature3, input_data.feature4]]
    
    # Make a prediction using the ML model
    prediction = model.predict(data)
    
    # Return the prediction result
    return {"prediction": int(prediction[0])}

Save and exit the file.
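
Optionally, you can add a second endpoint to main.py that maps the numeric label to a human-readable class name. This assumes the model is trained on the Iris dataset, whose classes in order are setosa, versicolor, and virginica:

# Optional: map the numeric Iris label to its class name (assumes the Iris dataset)
IRIS_CLASSES = ["setosa", "versicolor", "virginica"]

@app.post("/predict/label/")
async def predict_label(input_data: PredictionInput):
    data = [[input_data.feature1, input_data.feature2, input_data.feature3, input_data.feature4]]
    prediction = int(model.predict(data)[0])
    return {"prediction": prediction, "label": IRIS_CLASSES[prediction]}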

Save Your Model. Ensure your trained model is saved using libraries such as joblib or pickle. For example:

import joblib
joblib.dump(trained_model, "simple_model.pkl")

Create a Sample Machine Learning Model for Testing

Let's create a small, simple machine learning model:

Create a file:

nano test-ml.py

Add the following code:

# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import joblib

# Load a sample dataset (Iris dataset)
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a simple DecisionTreeClassifier
clf = DecisionTreeClassifier()

# Train the model
clf.fit(X_train, y_train)

# Test the model accuracy
accuracy = clf.score(X_test, y_test)
print(f"Model accuracy: {accuracy * 100:.2f}%")

# Save the model to a .pkl file
joblib.dump(clf, 'simple_model.pkl')
print("Model saved as 'simple_model.pkl'")

Save and exit the file.

Now, run the model:

python test-ml.py

Our model is ready to use.
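
As a quick sanity check, you can load the saved file and run a single prediction before wiring it into the API. The sample values below are typical Iris-setosa measurements:

# check_model.py (optional helper): verify the saved model loads and predicts
import joblib

model = joblib.load("simple_model.pkl")
sample = [[5.1, 3.5, 1.4, 0.2]]   # sepal length/width, petal length/width
print(model.predict(sample))      # should print [0] (class 0 = setosa)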

Step 4: Run the FastAPI Application Locally

Before deploying, test the API locally.

uvicorn main:app --reload

Visit http://localhost:8000/docs to access FastAPI's interactive documentation and test the prediction endpoint. If you are testing from your own machine against a remote VPS, start Uvicorn with --host 0.0.0.0 and browse to http://your_server_ip:8000/docs instead.
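
You can also exercise the endpoint from a script. Here is a minimal sketch using the requests library (install it first with pip install requests):

# test_request.py: send a sample prediction request to the locally running API
import requests

payload = {"feature1": 5.1, "feature2": 3.5, "feature3": 1.4, "feature4": 0.2}
response = requests.post("http://localhost:8000/predict/", json=payload)
print(response.status_code, response.json())   # e.g. 200 {'prediction': 0}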

Step 5: Deploy FastAPI on the VPS

To deploy FastAPI for production, use Gunicorn along with Uvicorn workers.

Install Gunicorn

pip install gunicorn

Run FastAPI with Gunicorn

To start FastAPI with Gunicorn and Uvicorn workers:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app
  • -w 4: Number of workers (adjust based on your CPU cores).
  • -k uvicorn.workers.UvicornWorker: Use Uvicorn workers for handling FastAPI.

Step 6: Set Up NGINX as a Reverse Proxy

To make your API accessible over the internet, use NGINX to forward requests to your FastAPI app.

Install NGINX

sudo apt install nginx

Configure NGINX

Create a new configuration file for your FastAPI app.

sudo nano /etc/nginx/sites-available/fastapi

Add the following configuration:

server {
    listen 80;
    server_name your_domain_or_ip;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Enable the NGINX Configuration

Create a symbolic link to enable the site.

sudo ln -s /etc/nginx/sites-available/fastapi /etc/nginx/sites-enabled/
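
Before restarting NGINX, it's a good idea to validate the configuration:

sudo nginx -t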

Restart NGINX

sudo systemctl restart nginx

Your FastAPI application should now be accessible via your server's IP address or domain.

Step 7: Scale FastAPI for High Traffic

To handle higher traffic, you can scale the API by adding more Gunicorn workers on the same server, or by containerizing the application with Docker (and orchestrating it with Kubernetes across multiple servers).

Option 1: Increase Gunicorn Workers

Modify the number of Gunicorn workers based on your server's CPU cores:

gunicorn -w 8 -k uvicorn.workers.UvicornWorker main:app
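
A common rule of thumb from the Gunicorn documentation is (2 × CPU cores) + 1 workers. If you prefer to compute this on the server instead of hard-coding it, here is a small sketch:

# workers.py: suggest a Gunicorn worker count based on available CPU cores
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
print(f"Suggested Gunicorn workers: {workers}")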

Option 2: Deploy with Docker

Note: In our testing, the Docker image built successfully but the API was not reachable from outside the container. This usually happens when Gunicorn keeps its default bind address of 127.0.0.1:8000 inside the container; the Dockerfile below binds it to 0.0.0.0:80 so that the published port is reachable.

First, create a requirements.txt file:

pip freeze > requirements.txt

Create a Dockerfile

To containerize your FastAPI app, create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . /app

EXPOSE 80
CMD ["gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:80", "main:app"]

Build and Run the Docker Container

docker build -t fastapi-ml-api .
docker run -d -p 80:80 fastapi-ml-api
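
If NGINX from Step 6 is already listening on port 80, either stop it or publish a different host port (for example, -p 8000:80). If the container starts but the API is not reachable, the container logs are the first place to look:

docker ps
docker logs <container_id>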

Step 8: Testing the API

Once deployed, you can test your API by sending a POST request with the input data using tools like curl or Postman. The request below goes through NGINX on port 80:

curl -X 'POST' \
  'http://your_domain_or_ip/predict/' \
  -H 'Content-Type: application/json' \
  -d '{"feature1": 5.1, "feature2": 3.5, "feature3": 1.4, "feature4": 0.2}'

You should receive a JSON response with the prediction.
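
For the sample input above (typical Iris-setosa measurements), the response should look something like:

{"prediction": 0}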

Conclusion

You've now deployed a scalable machine learning API using FastAPI on a VPS! You can serve real-time predictions using your model and scale it based on demand. To ensure stability and performance, consider using additional tools like Docker and Kubernetes for orchestration, as well as adding monitoring tools like Prometheus and Grafana to monitor the health of your API.