FastMLX Python Client #23

Open

Blaizzy opened this issue Aug 2, 2024 · 3 comments

@Blaizzy (Collaborator) commented Aug 2, 2024

Feature Description

Implement a FastMLX client that allows users to specify custom server settings, including base URL, port, and number of workers. This feature will provide greater flexibility for users who want to run the FastMLX server with specific configurations.

Proposed Implementation

  1. Modify the FastMLX class constructor to accept additional parameters such as base_url and workers (see the sketch after this list).

  2. Update the FastMLXClient class to:

    • Parse the base_url to extract host and port
    • Store the workers parameter
    • Use these values when starting the server
  3. Modify the start_fastmlx_server function to accept host, port, and workers as parameters.

  4. Update the ensure_server_running method in FastMLXClient to use the custom settings when starting the server.
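
A minimal sketch of what steps 1–2 could look like (the default port and attribute names here are assumptions for illustration, not the actual FastMLX internals):

from urllib.parse import urlparse

class FastMLX:
    def __init__(self, api_key, base_url="http://localhost:8000", workers=1):
        self.api_key = api_key
        self.base_url = base_url
        # Parse the base_url to extract host and port for the server process
        parsed = urlparse(base_url)
        self.host = parsed.hostname or "localhost"
        self.port = parsed.port or 8000
        # Store the worker count so ensure_server_running can pass it
        # through to start_fastmlx_server later
        self.workers = workers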

Example Usage

from fastmlx import FastMLX

client = FastMLX(
    api_key="your-api-key",
    base_url="http://localhost:8080",  # Custom port
    workers=4  # Custom number of workers
)

# Use the client...

client.close()

# Or use as a context manager
with FastMLX(api_key="your-api-key", base_url="http://localhost:8080", workers=4) as client:
    # Your code here
    pass

Benefits

  • Allows users to run the FastMLX server on a custom port
  • Enables configuration of the number of worker processes for the server
  • Provides flexibility for different deployment scenarios

Potential Challenges

  • Ensuring backward compatibility with existing usage
  • Proper error handling for invalid base URLs or worker counts (see the validation sketch after this list)
  • Documenting the new functionality clearly for users
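
One possible shape for that error handling (a sketch only; the exact exception types and limits are open for discussion):

from urllib.parse import urlparse

def validate_settings(base_url: str, workers: int) -> None:
    # Reject URLs that are not plain http(s) with a usable host part
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Invalid base_url: {base_url!r}")
    # Require at least one worker process
    if workers < 1:
        raise ValueError(f"workers must be >= 1, got {workers}")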

Tasks

  • Update FastMLX class constructor
  • Modify FastMLXClient to handle custom settings
  • Update start_fastmlx_server function
  • Modify ensure_server_running method
  • Add error handling for invalid inputs
  • Update documentation and README
  • Add tests for new functionality
  • Update examples in the codebase

Questions

  • Should we provide a way to update these settings after client initialization?
  • Do we need to add any validation for the number of workers (e.g., min/max values)?
  • Should we consider adding more server configuration options in the future?

Please provide any feedback or suggestions on this proposed implementation.

@stewartugelow commented

Q: Why do you need a standalone client? Couldn't you set all of these variables by API?

@Blaizzy (Collaborator, Author) commented Aug 21, 2024

Yes, you can set the variables.

But this would help if you want to programmatically start and stop the server.

Think of it like the OpenAI/Anthropic Python clients.

@stewartugelow commented

I ask out of complete ignorance, but would one of the following approaches from ChatGPT work?


Using Python's subprocess module to start and stop a FastAPI server programmatically can work, but there are a few considerations, trade-offs, and potentially better alternatives. Here's an overview of what to keep in mind:

Using subprocess to Start/Stop a FastAPI Server

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. Using subprocess to start a FastAPI server typically involves launching the server in a separate process, and you can control it (e.g., stop it) by managing the process.

Example of starting a FastAPI server with subprocess:

import subprocess

# Start the FastAPI server
process = subprocess.Popen(["uvicorn", "app:app", "--host", "127.0.0.1", "--port", "8000"])

# Stop the FastAPI server
process.terminate()   # Ask the server to exit gracefully
# process.kill()      # Force kill only if terminate() is not enough

Considerations:

  1. Process Management: When using subprocess, you're working with a separate process. Managing the lifecycle of this process requires careful handling, especially around shutdown, cleanup, and ensuring that the process is terminated properly.

  2. Error Handling: You need to capture and handle any potential errors that arise from starting the subprocess. For example, the FastAPI server might fail to start due to port conflicts, missing dependencies, or invalid configurations.

  3. Blocking Behavior: Popen itself returns immediately, but calls like process.wait() will block your main thread. If your main program needs to continue running, avoid blocking calls or manage the subprocess asynchronously or in a separate thread.

  4. Cross-Platform Compatibility: If you plan to run your FastAPI server on different platforms (e.g., Windows, Linux), ensure that your subprocess code accounts for platform-specific behavior, such as differences in process termination or command-line syntax.

  5. Graceful Shutdown: Simply terminating the process with .terminate() or .kill() may not allow FastAPI to shut down gracefully, which can lead to problems like unsaved data, incomplete responses, or locked resources. A more careful sequence is sketched below.
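
A minimal sketch of that more careful shutdown sequence, assuming the uvicorn command from the example above: terminate, wait with a timeout, and only then kill.

import subprocess

process = subprocess.Popen(
    ["uvicorn", "app:app", "--host", "127.0.0.1", "--port", "8000"]
)

# Ask the server to exit gracefully, then force-kill only if it hangs
process.terminate()
try:
    process.wait(timeout=10)
except subprocess.TimeoutExpired:
    process.kill()
    process.wait()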

Alternatives to subprocess

There are other methods to start and stop a FastAPI server programmatically that may offer better control or integration with your Python application:

1. Direct Integration via uvicorn.run()

Instead of starting a new process, you can run the FastAPI app directly in your Python code using uvicorn.run(). This keeps everything in the same process and allows for tighter control and better integration with your existing application logic.

Example:

import uvicorn
from threading import Thread

def start_fastapi():
    uvicorn.run("app:app", host="127.0.0.1", port=8000)

# Start the FastAPI server in a separate daemon thread so the main
# program keeps running and can exit without joining the thread
thread = Thread(target=start_fastapi, daemon=True)
thread.start()

# The daemon thread is killed when the main program exits (not a graceful
# shutdown); for graceful control, see the asyncio approach below

Advantages:

  • Single Process: Everything runs in a single process, so there’s no need to manage multiple processes.
  • Better Integration: Since you control the server in your main program, it’s easier to handle lifecycle events, logging, and error handling.
  • Asynchronous Handling: Since FastAPI and Uvicorn are asynchronous by nature, this approach can fit well if your application already uses async features.

Disadvantages:

  • Blocking: If you run uvicorn.run() directly in the main thread, it will block execution until the server is stopped. Using a thread or asyncio loop can mitigate this.

2. Using an Asyncio Event Loop

If your application uses asyncio, you can start and stop the FastAPI server within an asyncio event loop. This allows for non-blocking execution and more flexible server management.

Example:

import asyncio
import uvicorn

async def main():
    config = uvicorn.Config("app:app", host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)

    # Run the server as a background task so this coroutine keeps control
    serve_task = asyncio.create_task(server.serve())

    # ... do work while the server runs ...

    # Ask Uvicorn to exit gracefully, then wait for it to finish
    server.should_exit = True
    await serve_task

asyncio.run(main())

Advantages:

  • Asynchronous Handling: Works well in applications that are built on asyncio, providing non-blocking control over the server lifecycle.
  • Graceful Shutdown: You can shut down the server asynchronously and allow for cleanup operations.

3. Using Gunicorn with Uvicorn Workers

If you need to run the FastAPI app in production, you might consider using gunicorn with Uvicorn workers. You can control Gunicorn programmatically as well, although it adds complexity to manage the Gunicorn process.

Example:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker app:app

Gunicorn handles process management, and you can programmatically start and stop it by managing the Gunicorn process.
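
As a sketch of that, assuming the gunicorn command above, the master process can be launched and stopped from Python much like the plain subprocess example (SIGTERM handling is POSIX-specific):

import signal
import subprocess

# Launch the Gunicorn master; it forks and supervises the Uvicorn workers
gunicorn = subprocess.Popen([
    "gunicorn", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "app:app",
])

# SIGTERM tells the Gunicorn master to shut the workers down gracefully
gunicorn.send_signal(signal.SIGTERM)
gunicorn.wait()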

Advantages:

  • Production-Ready: Gunicorn provides features like worker management, logging, and load balancing out of the box.
  • Scalability: Ideal for scaling FastAPI applications across multiple CPU cores.

Disadvantages:

  • Complexity: More complex to set up and manage programmatically compared to using uvicorn.run() directly.

Conclusion

Using subprocess to start and stop a FastAPI server is a valid option, but it requires careful process management and error handling. Depending on your use case, integrating FastAPI directly using uvicorn.run() or within an asyncio event loop might be more efficient and offer tighter control.

  • Use subprocess: If you need to manage FastAPI as a separate process from your main application.
  • Use uvicorn.run(): If you want tighter integration and control over the FastAPI server from within your Python application.
  • Use asyncio: If you are already working with asynchronous code and need non-blocking server control.
  • Use Gunicorn: For production environments where you need robust process management and scalability.

Each method has its trade-offs, so choose the one that best fits your application's architecture and requirements.
