In this tutorial, we'll walk through the steps to build, deploy, and scale a machine learning API using FastAPI on a Virtual Private Server (VPS) for real-time predictions. FastAPI is a modern, high-performance web framework for building APIs with Python. By the end, you'll have a production-ready API to serve your machine learning models.
Prerequisites:
- KVM VPS or Dedicated Server: You should have a VPS or dedicated server running Linux (e.g., Ubuntu 24.04).
- Python: Version 3.8+ installed on your server.
- FastAPI: To build the API.
- Machine Learning Model: Trained model (e.g., a saved model using Scikit-learn, TensorFlow, or PyTorch).
- Docker (Optional): To containerize and ease deployment.
- NGINX: To serve the API.
- Gunicorn: A Python HTTP server that, paired with Uvicorn workers, runs FastAPI in production.
Build a Machine Learning API with FastAPI on a VPS
Step 1: Set Up Your VPS
Update and upgrade packages. Make sure your VPS is up to date.
sudo apt update && sudo apt upgrade -y
Step 2: Install Python and Required Libraries
Install Python. FastAPI requires Python 3.8+.
sudo apt install python3 python3-pip -y
Create a virtual environment
It's a good practice to use virtual environments to manage your Python packages.
sudo apt install python3-venv -y
python3 -m venv fastapi-env
source fastapi-env/bin/activate
Install FastAPI and Uvicorn. Uvicorn is an ASGI server to run FastAPI.
pip install fastapi uvicorn
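To confirm the installation, you can print the installed FastAPI version as a quick sanity check:
python -c "import fastapi; print(fastapi.__version__)"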
Install Machine Learning Libraries
Install the libraries you need (e.g., Scikit-learn, TensorFlow, PyTorch).
pip install scikit-learn
Step 3: Build the FastAPI Application
Create the FastAPI App
Create a new directory for your project.
mkdir ml-api && cd ml-api
nano main.py
Write the FastAPI Application
Here’s a basic example of a FastAPI app that loads a pre-trained Scikit-learn model and makes predictions.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

# Define the input data model (4 features for the Iris dataset)
class PredictionInput(BaseModel):
    feature1: float
    feature2: float
    feature3: float
    feature4: float

# Load the pre-trained machine learning model
model = joblib.load("simple_model.pkl")

# Create the FastAPI instance
app = FastAPI()

@app.post("/predict/")
async def predict(input_data: PredictionInput):
    # Convert input data into the format expected by the model
    data = [[input_data.feature1, input_data.feature2,
             input_data.feature3, input_data.feature4]]
    # Make a prediction using the ML model
    prediction = model.predict(data)
    # Return the prediction result
    return {"prediction": int(prediction[0])}
Save and exit the file.
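Optionally, you can tighten the input model so the API rejects impossible values before they reach the model. Here's a small sketch using Pydantic's Field constraints, assuming the four features map to the Iris measurements (non-negative lengths in centimetres):
from pydantic import BaseModel, Field

class PredictionInput(BaseModel):
    # Iris measurements are lengths in centimetres, so they can't be negative
    feature1: float = Field(..., ge=0, description="sepal length (cm)")
    feature2: float = Field(..., ge=0, description="sepal width (cm)")
    feature3: float = Field(..., ge=0, description="petal length (cm)")
    feature4: float = Field(..., ge=0, description="petal width (cm)")
With these constraints in place, FastAPI automatically returns a 422 response for out-of-range input.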
Save Your Model. Ensure your trained model is saved using libraries such as joblib or pickle. For example:
import joblib
joblib.dump(trained_model, "simple_model.pkl")
Sample Machine Learning Model
Let's create a small, simple machine learning model to test the API with:
Create a file:
nano test-ml.py
Add the following code:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import joblib
# Load a sample dataset (Iris dataset)
iris = load_iris()
X = iris.data # Features
y = iris.target # Labels
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize a simple DecisionTreeClassifier
clf = DecisionTreeClassifier()
# Train the model
clf.fit(X_train, y_train)
# Test the model accuracy
accuracy = clf.score(X_test, y_test)
print(f"Model accuracy: {accuracy * 100:.2f}%")
# Save the model to a .pkl file
joblib.dump(clf, 'simple_model.pkl')
print("Model saved as 'simple_model.pkl'")
Save and exit the file.
Now, run the script to train and save the model:
python test-ml.py
Our model is ready to use.
Step 4: Run the FastAPI Application Locally
Before deploying, test the API locally.
uvicorn main:app --reload
Visit http://localhost:8000/docs to access the interactive documentation provided by FastAPI and test the prediction endpoint.
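You can also exercise the prediction endpoint from the command line while the development server is running:
curl -X 'POST' 'http://localhost:8000/predict/' \
  -H 'Content-Type: application/json' \
  -d '{"feature1": 5.1, "feature2": 3.5, "feature3": 1.4, "feature4": 0.2}'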
Step 5: Deploy FastAPI on the VPS
To deploy FastAPI for production, use Gunicorn along with Uvicorn workers.
Install Gunicorn
pip install gunicorn
Run FastAPI with Gunicorn
To start FastAPI with Gunicorn and Uvicorn workers:
gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app
- -w 4: Number of workers (adjust based on your CPU cores).
- -k uvicorn.workers.UvicornWorker: Use Uvicorn workers for handling FastAPI.
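To keep the API running after you log out (and restart it if it crashes), you can wrap this command in a systemd service. A minimal sketch, assuming your project lives in /home/youruser/ml-api and the virtual environment from Step 2 is at /home/youruser/fastapi-env (adjust the paths and user to match your setup):
[Unit]
Description=Gunicorn service for the FastAPI ML API
After=network.target

[Service]
User=youruser
WorkingDirectory=/home/youruser/ml-api
ExecStart=/home/youruser/fastapi-env/bin/gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app
Restart=always

[Install]
WantedBy=multi-user.target
Save it as /etc/systemd/system/fastapi.service, then enable and start it:
sudo systemctl enable --now fastapi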
Step 6: Set Up NGINX as a Reverse Proxy
To make your API accessible over the internet, use NGINX to forward requests to your FastAPI app.
Install NGINX
sudo apt install nginx
Configure NGINX
Create a new configuration file for your FastAPI app.
sudo nano /etc/nginx/sites-available/fastapi
Add the following configuration:
server {
    listen 80;
    server_name your_domain_or_ip;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Enable the NGINX Configuration
Create a symbolic link to enable the site.
sudo ln -s /etc/nginx/sites-available/fastapi /etc/nginx/sites-enabled/
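Before restarting, check the configuration for syntax errors:
sudo nginx -t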
Restart NGINX
sudo systemctl restart nginx
Your FastAPI application should now be accessible via your server's IP address or domain.
Step 7: Scale FastAPI for High Traffic
To handle high traffic, you can scale the API vertically by adding more Gunicorn workers on the same server, or horizontally by deploying the application across multiple hosts with Docker and Kubernetes.
Option 1: Increase Gunicorn Workers
Modify the number of Gunicorn workers based on your server's CPU cores:
gunicorn -w 8 -k uvicorn.workers.UvicornWorker main:app
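A common rule of thumb from the Gunicorn documentation is (2 x CPU cores) + 1 workers. You can compute that on the fly with nproc:
gunicorn -w $((2 * $(nproc) + 1)) -k uvicorn.workers.UvicornWorker main:app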
Option 2: Deploy with Docker
Note: We initially hit an issue where the image built but the API could not be reached. The usual cause is that Gunicorn binds to 127.0.0.1:8000 by default, which is unreachable from outside the container; the Dockerfile below therefore binds it to 0.0.0.0:80 explicitly.
First, create a requirements.txt file:
pip freeze > requirements.txt
Create a Dockerfile
To containerize your FastAPI app, create a Dockerfile:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD ["gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:80", "main:app"]
Build and Run the Docker Container
docker build -t fastapi-ml-api .
docker run -d -p 80:80 fastapi-ml-api
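The -p 80:80 flag maps port 80 on the host to port 80 inside the container, where Gunicorn is listening. If the API doesn't respond, check that the container is running and inspect its logs:
docker ps
docker logs <container_id>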
Step 8: Testing the API
Once deployed, you can test your API by sending a POST request with the input data using tools like curl or Postman. NGINX (or the Docker container) listens on port 80, so no port number is needed:
curl -X 'POST' \
  'http://your_domain_or_ip/predict/' \
  -H 'Content-Type: application/json' \
  -d '{"feature1": 5.1, "feature2": 3.5, "feature3": 1.4, "feature4": 0.2}'
You should receive a JSON response with the prediction.
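If you prefer Python, here's an equivalent test using the requests library (a small sketch; install it first with pip install requests and replace your_domain_or_ip with your server's address):
import requests

# Send one Iris sample to the deployed API and print the prediction
response = requests.post(
    "http://your_domain_or_ip/predict/",
    json={"feature1": 5.1, "feature2": 3.5, "feature3": 1.4, "feature4": 0.2},
)
print(response.json())  # e.g. {"prediction": 0}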
Conclusion
You've now deployed a scalable machine learning API using FastAPI on a VPS! You can serve real-time predictions from your model and scale it based on demand. To keep the deployment stable and performant, consider tools like Docker and Kubernetes for orchestration, and Prometheus with Grafana for monitoring the health of your API.