Code-Trainer V6 — Documentation

A 6-phase pipeline for building and deploying a multimodal model that generates source code from VS Code screenshot images.

Quick Start

git clone https://github.com/cmndcntrlcyber/code-trainer
cd code-trainer
uv sync
playwright install chromium

Pipeline Overview

The pipeline is divided into 6 phases, each building on the previous:

  1. Phase 1: Data Collection — Scrape high-quality GitHub repositories, filter code files, and capture Monaco Editor screenshots with 8 VS Code themes across 8 programming languages.
  2. Phase 2: Preprocessing — Convert screenshot captures to HuggingFace datasets in Qwen chat format, apply quality filtering, compute statistics, and upload to HF Hub.
  3. Phase 3: Vision Model — Train a Swin-B vision encoder + MLP projector + Qwen2.5-Coder-1.5B decoder with LoRA adapters locally on RTX 5060 Ti. Establishes multimodal baseline.
  4. Phase 4: Qwen Fine-tuning — Fine-tune Qwen2.5-Coder-14B with LoRA on HuggingFace Skills A100 GPU. Runs 3 parallel sweep configs (conservative / standard / aggressive) then full training on top-2.
  5. Phase 5: GGUF Deployment — Merge LoRA weights into base model, quantize to GGUF Q4_K_M via llama.cpp, and upload to HF Hub for local serving via llama.cpp or Ollama.
  6. Phase 6: Inference Agent — vLLM + Qwen-Agent + MCP tool integration for production inference. Planned architecture includes screenshot ingestion endpoint, code generation API, and IDE plugin.
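Phase 2's conversion to Qwen chat format can be pictured as follows. This is a sketch, not the project's actual converter: the field names and prompt wording are assumptions, though the shape (an image placeholder interleaved with a text prompt, followed by the assistant's code) mirrors the common Qwen-VL chat-template convention used for SFT with TRL.

```python
# Hypothetical Phase 2 record builder (field names are assumptions).
# Pairs one Monaco Editor screenshot with its source code as a
# two-turn chat sample in a Qwen-style message format.

def build_chat_sample(image_path: str, code: str, language: str) -> dict:
    """Wrap one screenshot/code pair as a user->assistant chat sample."""
    return {
        "images": [image_path],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},  # placeholder resolved by the processor
                    {"type": "text",
                     "text": f"Transcribe the {language} code shown in this "
                             f"VS Code screenshot."},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": code}],
            },
        ],
    }

sample = build_chat_sample("shots/0001.png", "print('hello')", "Python")
```

Quality filtering and statistics would then run over a HuggingFace dataset of such records before the Hub upload.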
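The Phase 4 sweep can be sketched as three LoRA configurations of increasing capacity plus a selection step that promotes the top two to full training. The rank, alpha, and learning-rate values below are illustrative placeholders, not the project's real hyperparameters; each entry would feed a PEFT `LoraConfig` on the A100 run.

```python
# Illustrative Phase 4 sweep grid -- values are placeholders, not the
# project's actual settings.
SWEEP_CONFIGS = {
    "conservative": {"lora_r": 8,  "lora_alpha": 16, "lr": 5e-5},
    "standard":     {"lora_r": 16, "lora_alpha": 32, "lr": 1e-4},
    "aggressive":   {"lora_r": 32, "lora_alpha": 64, "lr": 2e-4},
}

def rank_by_eval_loss(results: dict, top_k: int = 2) -> list:
    """Pick the top-k sweep configs by lowest eval loss for full training."""
    return sorted(results, key=results.get)[:top_k]
```

After the three parallel runs report eval losses, `rank_by_eval_loss` selects the two configs that proceed to full training.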

Configuration

All pipeline behavior is controlled by src/config/v6_config.yaml. Copy .env.example to .env and set:

GITHUB_TOKEN=ghp_...
HF_USERNAME=cmndcntrlcyber
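A minimal sketch of how these variables might be read, assuming a plain `KEY=VALUE` .env file; the project may well use python-dotenv or another loader instead, so treat this as illustrative only.

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Export KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and # comments; variables already set in the
    environment take precedence over the file.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

`GITHUB_TOKEN` authenticates the Phase 1 repository scraping; `HF_USERNAME` identifies the account used for the HF Hub uploads in Phases 2 and 5.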

Stack

| Component    | Version | Role                     |
|--------------|---------|--------------------------|
| Python       | 3.12+   | Core language            |
| PyTorch      | 2.1+    | ML framework             |
| Transformers | 4.40+   | Model loading & training |
| PEFT         | 0.10+   | LoRA adapters            |
| TRL          | 0.8+    | SFT training             |
| Playwright   | 1.42+   | Screenshot capture       |
| W&B          | 0.16+   | Experiment tracking      |
| uv           | latest  | Dependency management    |