
ConvoScope

A Privacy-Preserving Platform for LLM Conversation Research

ConvoScope enables researchers to collect and study LLM conversations while protecting participant privacy. All conversation processing happens client-side in the browser—raw data never leaves the user's device.


Features

  • Alignment-Based Classifier Training: Define target conversations through natural language descriptions. No ML expertise required.
  • Client-Side Privacy: All filtering, classification, and PII detection runs entirely in the browser using ONNX models and Transformers.js.
  • Flexible Conversation Generation: Choose between OpenAI API or local vLLM (UserLM-8b + Qwen3-8B) for synthetic conversation generation.
  • Custom Binary Classifiers: Train study-specific classifiers that export to ONNX for browser inference.
  • Demo Mode: Try the platform with pre-generated sample conversations.
  • Complete Research Workflow: From study creation to data collection, designed for non-technical researchers.

Quick Start

  1. Clone the repository:

    git clone https://repo.paperbackwriters.club/code/convoscope.git
    cd convoscope
    
  2. Install dependencies:

    npm install
    cd frontend && npm install && cd ..
    cd backend && npm install && cd ..
    
  3. Enable demo mode:

    # Frontend
    echo "REACT_APP_DEMO_MODE=true" > frontend/.env
    
    # Backend
    echo "DEMO_MODE=true" >> backend/.env
    
  4. Start the development servers:

    npm run dev
    
  5. Open http://localhost:3000 in your browser.


Full Installation

Prerequisites

  • Node.js >= 18.0.0
  • npm >= 8.0.0
  • Python 3.9+ (for classifier training and local vLLM)
  • 32GB RAM (for local model inference)

Backend Setup

  1. Navigate to the backend directory:

    cd backend
    
  2. Install Node.js dependencies:

    npm install
    
  3. Install Python dependencies (for classifier training):

    pip install -r services/classifier/requirements.txt
    
  4. Configure environment variables:

    cp .env.example .env
    

    Edit .env:

    NODE_ENV=development
    PORT=3001
    DEMO_MODE=false
    
    # Optional: OpenAI API key for conversation generation
    # OPENAI_API_KEY=sk-your-key-here
    
    # Optional: SendGrid for email notifications (production only)
    # SENDGRID_API_KEY=your-sendgrid-key
    
  5. Start the server:

    npm run dev
    

Frontend Setup

  1. Navigate to the frontend directory:

    cd frontend
    
  2. Install dependencies:

    npm install
    
  3. Configure environment variables:

    echo "REACT_APP_DEMO_MODE=false" > .env
    
  4. Start the development server:

    npm start
    

Conversation Generation Options

Option 1: OpenAI API

Use the OpenAI API for generating synthetic conversations during the alignment phase.

  1. Obtain an API key from OpenAI Platform
  2. Enter the key in the study creation form when prompted

Option 2: Local vLLM (Privacy-Preserving)

Run conversation generation entirely locally using open-source models.

Required Models

  • User Simulator: microsoft/UserLM-8b - Trained on WildChat for realistic user behavior
  • Assistant Simulator: Qwen/Qwen3-8B - Natural conversational responses

Setup

  1. Navigate to the vLLM directory:

    cd vllm_test
    
  2. Install Python dependencies:

    pip install -r requirements.txt
    
  3. Download the models (the first run downloads them automatically):

    python test_mps_setup.py
    
  4. The backend will automatically detect available models when you select "Local vLLM" in the study creation form.

Hardware Requirements

Backend       Device         Memory       Speed
------------  -------------  -----------  ------
vLLM          CUDA GPU       ~32GB VRAM   Fast
Transformers  Apple Silicon  ~32GB RAM    Medium
Transformers  CUDA GPU       ~32GB VRAM   Medium
OpenAI API    N/A            Minimal      Fast

Classifier Training Service

ConvoScope includes a Python-based classifier training service that:

  1. Generates synthetic training data using few-shot learning from alignment samples
  2. Trains a binary classifier (relevant vs not relevant) using sentence-transformers
  3. Exports to ONNX for client-side browser inference
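
The classification head in step 2 can be sketched as a logistic layer over precomputed sentence embeddings. The real service embeds text with MiniLM-L6-v2 via sentence-transformers; the two-dimensional toy vectors and the `train_head`/`predict` helpers below are illustrative stand-ins, not the actual implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(embeddings, labels, dim, epochs=200, lr=0.5):
    """Fit a logistic classification head on precomputed embeddings via SGD."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the binary cross-entropy loss
            for i in range(dim):
                w[i] -= lr * g * x[i]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Probability that an embedded conversation is relevant."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy "embeddings": relevant samples cluster near (1, 1), irrelevant near (-1, -1).
X = [[1.0, 0.9], [0.8, 1.1], [-1.0, -0.9], [-1.2, -0.8]]
y = [1, 1, 0, 0]
w, b = train_head(X, y, dim=2)
print(predict(w, b, [1.0, 1.0]))    # high probability: relevant
print(predict(w, b, [-1.0, -1.0]))  # low probability: not relevant
```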

Architecture

backend/services/classifier/
├── config.py               # Configuration classes
├── synthetic_generator.py  # Few-shot data generation (threaded)
├── model.py                # Binary classifier (MiniLM-L6-v2 + classification head)
├── trainer.py              # Training loop with early stopping
├── export_onnx.py          # ONNX export with quantization
└── train_classifier.py     # Main CLI entry point

Training Pipeline

Accepted Samples (positive few-shot) ──┐
                                       ├─→ Synthetic Generator (8 threads)
Rejected Samples (negative few-shot) ──┘           │
                                                   ↓
                                         1000 synthetic samples
                                         (500 positive + 500 negative)
                                                   │
                                                   ↓
                                         Binary Classifier Training
                                         (5 epochs, early stopping)
                                                   │
                                                   ↓
                                         ONNX Export (quantized)
                                                   │
                                                   ↓
                                         Browser-ready model
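
The fan-in at the top of the diagram can be mimicked with a thread pool. In this sketch, `generate_sample` is a stub standing in for what is really an LLM call with few-shot prompts:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_sample(task):
    """Stub for one synthetic-sample request; the real generator prompts
    an LLM with accepted/rejected few-shot examples."""
    label, idx = task
    return (f"synthetic conversation {idx}", label)

def generate_dataset(n_per_class=500, workers=8):
    # 500 positive + 500 negative tasks, fanned out across 8 threads,
    # mirroring the pipeline diagram above.
    tasks = [(1, i) for i in range(n_per_class)] + [(0, i) for i in range(n_per_class)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_sample, tasks))

data = generate_dataset()
print(len(data))                        # 1000
print(sum(label for _, label in data))  # 500 positives
```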

Demo Mode vs Real Mode

Mode                    Behavior
----------------------  --------------------------------------------------------------
Demo (DEMO_MODE=true)   Training completes instantly with mock metrics; the frontend uses the built-in category classifier.
Real (DEMO_MODE=false)  Full pipeline: generates 1000 synthetic samples, trains the classifier, and exports the ONNX model.

Manual Training

You can also run the training service directly:

cd backend/services/classifier
python train_classifier.py \
  --input training_data.json \
  --output ./output \
  --study-id my-study

Input JSON format:

{
  "study_id": "abc123",
  "prompt_spec": "Conversations about mental health...",
  "accepted_samples": [...],
  "rejected_samples": [...],
  "generation_method": "openai",
  "openai_api_key": "sk-...",
  "demo_mode": false
}
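
Assembling that file programmatically is straightforward. `build_training_input` below is a hypothetical helper that mirrors the fields shown above; only the key names come from the format, the helper itself is not part of the codebase:

```python
import json

REQUIRED_KEYS = {"study_id", "prompt_spec", "accepted_samples",
                 "rejected_samples", "generation_method", "demo_mode"}

def build_training_input(study_id, prompt_spec, accepted, rejected,
                         method="openai", api_key=None, demo=False):
    payload = {
        "study_id": study_id,
        "prompt_spec": prompt_spec,
        "accepted_samples": accepted,
        "rejected_samples": rejected,
        "generation_method": method,
        "demo_mode": demo,
    }
    if api_key:  # only needed when generation_method is "openai"
        payload["openai_api_key"] = api_key
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return payload

payload = build_training_input("abc123", "Conversations about mental health...",
                               ["sample A"], ["sample B"], demo=True)
print(json.dumps(payload, indent=2))
```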

Project Structure

convoscope/
├── backend/                    # Express.js API server
│   ├── routes/                 # API routes (studies, forms)
│   ├── services/
│   │   ├── classifier/         # Python classifier training service
│   │   │   ├── config.py
│   │   │   ├── synthetic_generator.py
│   │   │   ├── model.py
│   │   │   ├── trainer.py
│   │   │   ├── export_onnx.py
│   │   │   └── train_classifier.py
│   │   ├── classifierService.js  # Node.js wrapper
│   │   ├── modelTrainer.js       # Training orchestration
│   │   ├── storageService.js     # Data persistence
│   │   └── syntheticGenerator.js # Alignment sample generation
│   └── server.js               # Main server entry
├── frontend/                   # React application
│   ├── src/
│   │   ├── components/
│   │   │   ├── Form/           # Data donation form (participant)
│   │   │   └── ResearcherDashboard/  # Study management
│   │   └── config.js           # App configuration
│   └── public/                 # Static assets
├── vllm_test/                  # Local LLM generation scripts
├── train_classifier/           # Legacy classifier utilities
├── paper/                      # CHI 2026 paper source
├── README.md
├── LICENSE
└── .gitignore

Usage Guide

For Researchers

  1. Create a Study: Navigate to the Researcher Dashboard and define your study parameters:

    • Study name and description
    • Target conversation topics (prompt specification)
    • Generation method (OpenAI API or Local vLLM)
    • Conversation filters (turn count, message length, etc.)
  2. Alignment Phase: Review generated sample conversations:

    • Accept conversations that match your research criteria
    • Reject conversations that don't match
    • At least 5 accepted samples are required to proceed
  3. Training: The system trains a custom binary classifier:

    • Generates 1000 synthetic samples using your accepted/rejected examples
    • Trains a sentence-transformer classifier
    • Exports to ONNX for browser inference
  4. Publish: Receive a shareable link for participants.

  5. Collect Data: Participants upload their conversations, which are filtered and anonymized client-side before submission.

For Participants

  1. Export Conversations: Download your ChatGPT conversations from chat.openai.com → Settings → Data Controls → Export.

  2. Upload: Visit the study link and upload your exported ZIP file.

  3. Review: The system filters relevant conversations and highlights detected PII for your review.

  4. Anonymize: Add custom anonymization terms and review auto-detected sensitive information.

  5. Submit: Only reviewed, anonymized content is sent to the researcher.


API Endpoints

Health Check

GET /api/health

Studies (Researcher)

GET    /api/studies              # List all studies
POST   /api/studies              # Create new study
GET    /api/studies/:id          # Get study details
PUT    /api/studies/:id          # Update study
DELETE /api/studies/:id          # Delete study
POST   /api/studies/:id/generate # Generate alignment samples
POST   /api/studies/:id/accept   # Accept a sample
POST   /api/studies/:id/reject   # Reject a sample
POST   /api/studies/:id/start-training  # Start classifier training
GET    /api/studies/:id/training-status # Get training status
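
A minimal client only needs to compose these paths against the backend base URL (http://localhost:3001/api by default). `study_url` below is a hypothetical helper shown without the actual HTTP calls:

```python
# Default backend base URL; override via REACT_APP_API_URL / PORT in practice.
BASE = "http://localhost:3001/api"

def study_url(study_id=None, action=None, base=BASE):
    """Build a /api/studies endpoint URL from the route table above."""
    parts = [base, "studies"]
    if study_id is not None:
        parts.append(study_id)
    if action is not None:
        parts.append(action)
    return "/".join(parts)

print(study_url())                            # list or create studies
print(study_url("abc123"))                    # study details
print(study_url("abc123", "start-training"))  # kick off classifier training
```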

Forms (Participant)

GET    /api/form/:id          # Get form configuration
POST   /api/form/:id/submit   # Submit anonymized data

System

GET    /api/vllm/status       # Check local model availability

Environment Variables

Backend

Variable          Description                           Default
----------------  ------------------------------------  -----------
NODE_ENV          Environment mode                      development
PORT              Server port                           3001
DEMO_MODE         Enable demo mode                      false
OPENAI_API_KEY    OpenAI API key (optional)             -
SENDGRID_API_KEY  SendGrid key for emails (production)  -

Frontend

Variable             Description       Default
-------------------  ----------------  -------------------------
REACT_APP_DEMO_MODE  Enable demo mode  false
REACT_APP_API_URL    Backend API URL   http://localhost:3001/api

Deployment

Heroku

  1. Create a Heroku app:

    heroku create your-app-name
    
  2. Set environment variables:

    heroku config:set NODE_ENV=production
    heroku config:set SENDGRID_API_KEY=your-key
    
  3. Add Python buildpack (for classifier training):

    heroku buildpacks:add --index 1 heroku/python
    heroku buildpacks:add --index 2 heroku/nodejs
    
  4. Deploy:

    git push heroku main
    

Privacy Architecture

ConvoScope implements a client-side privacy architecture:

  1. Conversation Parsing: ZIP file extraction and JSON parsing happen in the browser.
  2. Classification: ONNX models run via Transformers.js to filter relevant conversations.
  3. PII Detection: Named entity recognition identifies sensitive information client-side.
  4. User Review: Participants review and approve all data before submission.
  5. Anonymization: PII is replaced with placeholders before leaving the browser.

No raw conversation data ever reaches the server.
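
As an illustration of steps 3–5, here is a regex-based sketch of placeholder substitution. The production pipeline detects PII with a NER model via Transformers.js in the browser; the patterns and the `anonymize` helper below are simplified assumptions, not the shipped code:

```python
import re

# Illustrative patterns only; real detection uses named entity recognition.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def anonymize(text, custom_terms=()):
    """Replace detected PII and user-supplied terms with placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    for term in custom_terms:  # custom anonymization terms (step 4 of the participant flow)
        text = text.replace(term, "[REDACTED]")
    return text

msg = "Email me at jane@example.com or call +1 555 123 4567, says Jane."
print(anonymize(msg, custom_terms=["Jane"]))
```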


Citation

If you use ConvoScope in your research, please cite:

@inproceedings{convoscope2026,
  title={ConvoScope: A Privacy-Preserving Platform for LLM Conversation Research},
  author={[Authors]},
  booktitle={CHI '26 Extended Abstracts},
  year={2026},
  publisher={ACM}
}

License

MIT License - see LICENSE for details.


Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Support

  • Issues and support: please open an issue in the repository's issue tracker.