Code-Trainer V6 — Documentation

A 6-phase pipeline for building and deploying a multimodal model that generates source code from VS Code screenshot images.

Quick Start

git clone https://github.com/cmndcntrlcyber/code-trainer
cd code-trainer
uv sync
playwright install chromium

Pipeline Overview

The pipeline is divided into 6 phases, each building on the previous:

  1. Phase 1: Data Collection — Scrape high-quality GitHub repositories, filter code files, and capture Monaco Editor screenshots with 8 VS Code themes across 8 programming languages.
  2. Phase 2: Preprocessing — Convert screenshot captures to HuggingFace datasets in Qwen chat format, apply quality filtering, compute statistics, and upload to HF Hub.
  3. Phase 3: Vision Model — Train a Swin-B vision encoder + MLP projector + Qwen2.5-Coder-1.5B decoder with LoRA adapters locally on RTX 5060 Ti. Establishes multimodal baseline.
  4. Phase 4: Qwen Fine-tuning — Fine-tune Qwen2.5-Coder-14B with LoRA on HuggingFace Skills A100 GPU. Runs 3 parallel sweep configs (conservative / standard / aggressive) then full training on top-2.
  5. Phase 5: GGUF Deployment — Merge LoRA weights into base model, quantize to GGUF Q4_K_M via llama.cpp, and upload to HF Hub for local serving via llama.cpp or Ollama.
  6. Phase 6: Inference Agent — vLLM + Qwen-Agent + MCP tool integration for production inference. Planned architecture includes screenshot ingestion endpoint, code generation API, and IDE plugin.
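Phase 2's conversion to Qwen chat format can be pictured as follows. This is a sketch, not the project's actual converter: the field names and prompt wording are assumptions, though the shape (an image placeholder interleaved with a text prompt, followed by the assistant's code) mirrors the common Qwen-VL chat-template convention used for SFT with TRL.

```python
# Hypothetical Phase 2 record builder (field names are assumptions).
# Pairs one Monaco Editor screenshot with its source code as a
# two-turn chat sample in a Qwen-style message format.

def build_chat_sample(image_path: str, code: str, language: str) -> dict:
    """Wrap one screenshot/code pair as a user->assistant chat sample."""
    return {
        "images": [image_path],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},  # placeholder resolved by the processor
                    {"type": "text",
                     "text": f"Transcribe the {language} code shown in this "
                             f"VS Code screenshot."},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": code}],
            },
        ],
    }

sample = build_chat_sample("shots/0001.png", "print('hello')", "Python")
```

Quality filtering and statistics would then run over a HuggingFace dataset of such records before the Hub upload.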
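The Phase 4 sweep can be sketched as three LoRA configurations of increasing capacity plus a selection step that promotes the top two to full training. The rank, alpha, and learning-rate values below are illustrative placeholders, not the project's real hyperparameters; each entry would feed a PEFT `LoraConfig` on the A100 run.

```python
# Illustrative Phase 4 sweep grid -- values are placeholders, not the
# project's actual settings.
SWEEP_CONFIGS = {
    "conservative": {"lora_r": 8,  "lora_alpha": 16, "lr": 5e-5},
    "standard":     {"lora_r": 16, "lora_alpha": 32, "lr": 1e-4},
    "aggressive":   {"lora_r": 32, "lora_alpha": 64, "lr": 2e-4},
}

def rank_by_eval_loss(results: dict, top_k: int = 2) -> list:
    """Pick the top-k sweep configs by lowest eval loss for full training."""
    return sorted(results, key=results.get)[:top_k]
```

After the three parallel runs report eval losses, `rank_by_eval_loss` selects the two configs that proceed to full training.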

Configuration

All pipeline behavior is controlled by src/config/v6_config.yaml. Copy .env.example to .env and set:

GITHUB_TOKEN=ghp_...
HF_USERNAME=cmndcntrlcyber
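A minimal sketch of how these variables might be read, assuming a plain `KEY=VALUE` .env file; the project may well use python-dotenv or another loader instead, so treat this as illustrative only.

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Export KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and # comments; variables already set in the
    environment take precedence over the file.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

`GITHUB_TOKEN` authenticates the Phase 1 repository scraping; `HF_USERNAME` identifies the account used for the HF Hub uploads in Phases 2 and 5.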

Stack

| Component    | Version | Role                     |
|--------------|---------|--------------------------|
| Python       | 3.12+   | Core language            |
| PyTorch      | 2.1+    | ML framework             |
| Transformers | 4.40+   | Model loading & training |
| PEFT         | 0.10+   | LoRA adapters            |
| TRL          | 0.8+    | SFT training             |
| Playwright   | 1.42+   | Screenshot capture       |
| W&B          | 0.16+   | Experiment tracking      |
| uv           | latest  | Dependency management    |