Local LLM Deployment Suite

Project Overview

An academic project, built by a team of four students, to deploy large language models privately and offline. The objective was to design a system that automates downloading models and running them locally with GPU hardware acceleration.

Key Implementation Details

Containerized Orchestration: Developed an integrated Docker Compose environment that automates installing and running llama.cpp.
Hardware Pass-through: Integrated the NVIDIA Container Toolkit so that containers can use CUDA on the host GPUs, accelerating inference.
Quantization Pipeline: Wrote scripts that automate model quantization and output GGUF-format models, reducing local RAM usage while maintaining response quality.
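
To illustrate why the quantization step reduces RAM usage, here is a minimal Python sketch. It is not the project's actual script. The helper names (`estimate_weight_gib`, `quantize_command`) and the file paths are hypothetical, and it assumes the quantizer is llama.cpp's `llama-quantize` tool called as `llama-quantize <input> <output> <type>`. The bits-per-weight figures follow from the Q8_0 and Q4_0 block layouts (32 weights plus one 16-bit scale per block). Real files will differ somewhat, since some tensors may be kept at higher precision and metadata is not counted.

```python
# Bits per weight implied by the block layouts: Q8_0 and Q4_0 each store
# 32 weights per block plus one 16-bit scale factor.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def estimate_weight_gib(n_params: float, quant_type: str) -> float:
    """Approximate size of the weights alone, in GiB.

    Ignores the KV cache, activations, and file metadata, so treat the
    result as a lower bound on memory use rather than an exact figure.
    """
    bits = BITS_PER_WEIGHT[quant_type]
    return n_params * bits / 8 / 2**30


def quantize_command(src: str, dst: str, quant_type: str,
                     binary: str = "llama-quantize") -> list[str]:
    """Build the argument list for a quantization run (hypothetical helper).

    The binary name and argument order are assumptions about the local
    llama.cpp build; adjust them to match the installed version.
    """
    if quant_type not in BITS_PER_WEIGHT:
        raise ValueError(f"unsupported quantization type: {quant_type}")
    return [binary, src, dst, quant_type]


if __name__ == "__main__":
    n_params = 7e9  # example: a model with 7 billion parameters
    for qtype in BITS_PER_WEIGHT:
        print(f"{qtype}: ~{estimate_weight_gib(n_params, qtype):.2f} GiB")
    # Placeholder paths; pass the list to subprocess.run to execute it.
    print(quantize_command("model-f16.gguf", "model-q4_0.gguf", "Q4_0"))
```

Under these assumptions, going from F16 to Q4_0 shrinks the weights by a factor of 16 / 4.5, or roughly 3.6 times, which is what makes larger models fit in local memory.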

🔗 GitHub Repository
