Code-Trainer V6 — Documentation
A six-phase pipeline for building and deploying a multimodal model that generates source code from VS Code screenshot images.
Quick Start
```shell
git clone https://github.com/cmndcntrlcyber/code-trainer
cd code-trainer
uv sync
playwright install chromium
```

Pipeline Overview
The pipeline is divided into 6 phases, each building on the previous:
- Phase 1: Data Collection — Scrape high-quality GitHub repositories, filter code files, and capture Monaco Editor screenshots with 8 VS Code themes across 8 programming languages.
- Phase 2: Preprocessing — Convert screenshot captures to HuggingFace datasets in Qwen chat format, apply quality filtering, compute statistics, and upload to HF Hub.
- Phase 3: Vision Model — Train a Swin-B vision encoder + MLP projector + Qwen2.5-Coder-1.5B decoder with LoRA adapters locally on RTX 5060 Ti. Establishes multimodal baseline.
- Phase 4: Qwen Fine-tuning — Fine-tune Qwen2.5-Coder-14B with LoRA on a HuggingFace Skills A100 GPU. Runs 3 parallel sweep configs (conservative / standard / aggressive), then full training on the top two.
- Phase 5: GGUF Deployment — Merge LoRA weights into base model, quantize to GGUF Q4_K_M via llama.cpp, and upload to HF Hub for local serving via llama.cpp or Ollama.
- Phase 6: Inference Agent — vLLM + Qwen-Agent + MCP tool integration for production inference. Planned architecture includes screenshot ingestion endpoint, code generation API, and IDE plugin.
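To make Phase 2 concrete, the sketch below shows what one screenshot-to-code training sample might look like in Qwen chat format. The field names (`image`, `messages`) and the prompt wording are illustrative assumptions, not the pipeline's actual schema.

```python
# Hedged sketch: one Phase-2 training sample pairing a VS Code
# screenshot with its source code in Qwen chat format.
# Field names and prompt text are assumptions for illustration.
def make_sample(image_path: str, source_code: str, language: str) -> dict:
    """Build a single chat-format example from a capture."""
    return {
        "image": image_path,
        "messages": [
            {
                "role": "user",
                "content": f"Transcribe the {language} code shown in "
                           "this VS Code screenshot.",
            },
            {"role": "assistant", "content": source_code},
        ],
    }

sample = make_sample("captures/monokai/hello_0001.png", "print('hi')", "Python")
```

Phase 2 then collects such samples into a HuggingFace dataset before quality filtering and upload.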
Configuration
All pipeline behavior is controlled by src/config/v6_config.yaml.
Copy .env.example to .env and set:

```shell
GITHUB_TOKEN=ghp_...
HF_USERNAME=cmndcntrlcyber
```

Stack
| Component | Version | Role |
|---|---|---|
| Python | 3.12+ | Core language |
| PyTorch | 2.1+ | ML framework |
| Transformers | 4.40+ | Model loading & training |
| PEFT | 0.10+ | LoRA adapters |
| TRL | 0.8+ | SFT training |
| Playwright | 1.42+ | Screenshot capture |
| W&B | 0.16+ | Experiment tracking |
| uv | latest | Dependency management |