# ConvoScope

**A Privacy-Preserving Platform for LLM Conversation Research**

ConvoScope enables researchers to collect and study LLM conversations while protecting participant privacy. All conversation processing happens client-side in the browser: raw data never leaves the user's device.

## Features
- Alignment-Based Classifier Training: Define target conversations through natural language descriptions. No ML expertise required.
- Client-Side Privacy: All filtering, classification, and PII detection runs entirely in the browser using ONNX models and Transformers.js.
- Flexible Conversation Generation: Choose between OpenAI API or local vLLM (UserLM-8b + Qwen3-8B) for synthetic conversation generation.
- Custom Binary Classifiers: Train study-specific classifiers that export to ONNX for browser inference.
- Demo Mode: Try the platform with pre-generated sample conversations.
- Complete Research Workflow: From study creation to data collection, designed for non-technical researchers.
## Quick Start

### Demo Mode (Recommended for First-Time Users)

1. Clone the repository:

   ```bash
   git clone https://repo.paperbackwriters.club/code/convoscope.git
   cd convoscope
   ```

2. Install dependencies:

   ```bash
   npm install
   cd frontend && npm install && cd ..
   cd backend && npm install && cd ..
   ```

3. Enable demo mode:

   ```bash
   # Frontend
   echo "REACT_APP_DEMO_MODE=true" > frontend/.env
   # Backend
   echo "DEMO_MODE=true" >> backend/.env
   ```

4. Start the development servers:

   ```bash
   npm run dev
   ```

5. Open http://localhost:3000 in your browser.
## Full Installation

### Prerequisites
- Node.js >= 18.0.0
- npm >= 8.0.0
- Python 3.9+ (for classifier training and local vLLM)
- 32GB RAM (for local model inference)
### Backend Setup

1. Navigate to the backend directory:

   ```bash
   cd backend
   ```

2. Install Node.js dependencies:

   ```bash
   npm install
   ```

3. Install Python dependencies (for classifier training):

   ```bash
   pip install -r services/classifier/requirements.txt
   ```

4. Configure environment variables:

   ```bash
   cp .env.example .env
   ```

   Edit `.env`:

   ```bash
   NODE_ENV=development
   PORT=3001
   DEMO_MODE=false

   # Optional: OpenAI API key for conversation generation
   # OPENAI_API_KEY=sk-your-key-here

   # Optional: SendGrid for email notifications (production only)
   # SENDGRID_API_KEY=your-sendgrid-key
   ```

5. Start the server:

   ```bash
   npm run dev
   ```
### Frontend Setup

1. Navigate to the frontend directory:

   ```bash
   cd frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Configure environment variables:

   ```bash
   echo "REACT_APP_DEMO_MODE=false" > .env
   ```

4. Start the development server:

   ```bash
   npm start
   ```
## Conversation Generation Options

### Option 1: OpenAI API

Use the OpenAI API to generate synthetic conversations during the alignment phase.

- Obtain an API key from the OpenAI Platform
- Enter the key in the study creation form when prompted

### Option 2: Local vLLM (Privacy-Preserving)

Run conversation generation entirely locally using open-source models.

#### Required Models

- **User Simulator:** `microsoft/UserLM-8b`, trained on WildChat for realistic user behavior
- **Assistant Simulator:** `Qwen/Qwen3-8B`, for natural conversational responses
#### Setup

1. Navigate to the vLLM directory:

   ```bash
   cd vllm_test
   ```

2. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the models (the first run downloads them automatically):

   ```bash
   python test_mps_setup.py
   ```

4. The backend automatically detects available models when you select "Local vLLM" in the study creation form.
### Hardware Requirements
| Backend | Device | Memory | Speed |
|---|---|---|---|
| vLLM | CUDA GPU | ~32GB VRAM | Fast |
| Transformers | Apple Silicon | ~32GB RAM | Medium |
| Transformers | CUDA GPU | ~32GB VRAM | Medium |
| OpenAI API | N/A | Minimal | Fast |
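The backend choice summarized in the table above can be approximated with a small detection helper. This is an illustrative sketch, not code from the repository; the function name and the returned labels are hypothetical:

```python
import platform
import shutil

def pick_generation_backend() -> str:
    """Illustrative backend selection mirroring the hardware table:
    prefer vLLM on CUDA machines, fall back to Transformers on Apple
    Silicon, and otherwise suggest the OpenAI API."""
    if shutil.which("nvidia-smi"):  # a CUDA GPU driver is installed
        return "vllm"
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "transformers-mps"   # Apple Silicon (MPS backend)
    return "openai-api"             # no local accelerator available

backend = pick_generation_backend()
```

The real backend performs its own detection when "Local vLLM" is selected; this sketch only shows the decision order implied by the table.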
## Classifier Training Service

ConvoScope includes a Python-based classifier training service that:

- Generates synthetic training data using few-shot learning from alignment samples
- Trains a binary classifier (relevant vs. not relevant) using sentence-transformers
- Exports to ONNX for client-side browser inference

### Architecture

```
backend/services/classifier/
├── config.py               # Configuration classes
├── synthetic_generator.py  # Few-shot data generation (threaded)
├── model.py                # Binary classifier (MiniLM-L6-v2 + classification head)
├── trainer.py              # Training loop with early stopping
├── export_onnx.py          # ONNX export with quantization
└── train_classifier.py     # Main CLI entry point
```
### Training Pipeline

```
Accepted Samples (positive few-shot) ──┐
                                       ├─→ Synthetic Generator (8 threads)
Rejected Samples (negative few-shot) ──┘                │
                                                        ↓
                                           1000 synthetic samples
                                        (500 positive + 500 negative)
                                                        │
                                                        ↓
                                         Binary Classifier Training
                                          (5 epochs, early stopping)
                                                        │
                                                        ↓
                                           ONNX Export (quantized)
                                                        │
                                                        ↓
                                             Browser-ready model
```
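The threaded generation stage can be sketched with the standard library. This is illustrative only: the real generator (`synthetic_generator.py`) calls an LLM for each sample, whereas `generate_sample` here is a hypothetical stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_sample(job):
    """Stand-in for one LLM call; returns a labeled synthetic sample."""
    index, label = job
    return {"text": f"synthetic conversation {index}", "label": label}

def generate_dataset(n_per_class: int = 500, threads: int = 8):
    """Produce n_per_class positive and negative samples across a thread
    pool, mirroring the 8-thread, 500 + 500 layout in the pipeline above."""
    jobs = [(i, 1) for i in range(n_per_class)] + [(i, 0) for i in range(n_per_class)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(generate_sample, jobs))

samples = generate_dataset()  # 1000 samples total
```

Threading helps here because each real sample is an I/O-bound API call, so eight workers can overlap request latency.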
### Demo Mode vs. Real Mode

| Mode | Behavior |
|---|---|
| Demo (`DEMO_MODE=true`) | Training completes instantly with mock metrics. Frontend uses the built-in category classifier. |
| Real (`DEMO_MODE=false`) | Full pipeline: generates 1000 samples, trains the classifier, exports the ONNX model. |
### Manual Training

You can also run the training service directly:

```bash
cd backend/services/classifier
python train_classifier.py \
    --input training_data.json \
    --output ./output \
    --study-id my-study
```
Input JSON format:

```json
{
  "study_id": "abc123",
  "prompt_spec": "Conversations about mental health...",
  "accepted_samples": [...],
  "rejected_samples": [...],
  "generation_method": "openai",
  "openai_api_key": "sk-...",
  "demo_mode": false
}
```
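Before invoking `train_classifier.py`, the input file can be sanity-checked. The validation helper below is hypothetical (not part of the service); the field names come from the example above, and the minimum-samples rule comes from the alignment phase described in the Usage Guide:

```python
import json

# Fields taken from the input JSON example above.
REQUIRED_FIELDS = {"study_id", "prompt_spec", "accepted_samples", "rejected_samples"}

def validate_training_input(raw: str) -> dict:
    """Parse training-input JSON and check the required fields."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # The alignment phase requires at least 5 accepted samples.
    if len(data["accepted_samples"]) < 5:
        raise ValueError("need at least 5 accepted samples")
    return data
```

A check like this fails fast with a readable message instead of a mid-training crash.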
## Project Structure

```
convoscope/
├── backend/                         # Express.js API server
│   ├── routes/                      # API routes (studies, forms)
│   ├── services/
│   │   ├── classifier/              # Python classifier training service
│   │   │   ├── config.py
│   │   │   ├── synthetic_generator.py
│   │   │   ├── model.py
│   │   │   ├── trainer.py
│   │   │   ├── export_onnx.py
│   │   │   └── train_classifier.py
│   │   ├── classifierService.js     # Node.js wrapper
│   │   ├── modelTrainer.js          # Training orchestration
│   │   ├── storageService.js        # Data persistence
│   │   └── syntheticGenerator.js    # Alignment sample generation
│   └── server.js                    # Main server entry
├── frontend/                        # React application
│   ├── src/
│   │   ├── components/
│   │   │   ├── Form/                # Data donation form (participant)
│   │   │   └── ResearcherDashboard/ # Study management
│   │   └── config.js                # App configuration
│   └── public/                      # Static assets
├── vllm_test/                       # Local LLM generation scripts
├── train_classifier/                # Legacy classifier utilities
├── paper/                           # CHI 2026 paper source
├── README.md
├── LICENSE
└── .gitignore
```
## Usage Guide

### For Researchers

1. **Create a Study**: Navigate to the Researcher Dashboard and define your study parameters:
   - Study name and description
   - Target conversation topics (prompt specification)
   - Generation method (OpenAI API or Local vLLM)
   - Conversation filters (turn count, message length, etc.)

2. **Alignment Phase**: Review generated sample conversations:
   - Accept conversations that match your research criteria
   - Reject conversations that don't
   - At least 5 accepted samples are required to proceed

3. **Training**: The system trains a custom binary classifier:
   - Generates 1000 synthetic samples from your accepted/rejected examples
   - Trains a sentence-transformer classifier
   - Exports it to ONNX for browser inference

4. **Publish**: Receive a shareable link for participants.

5. **Collect Data**: Participants upload their conversations, which are filtered and anonymized client-side before submission.
### For Participants

1. **Export Conversations**: Download your ChatGPT conversations from chat.openai.com → Settings → Data Controls → Export.

2. **Upload**: Visit the study link and upload your exported ZIP file.

3. **Review**: The system filters relevant conversations and highlights detected PII for your review.

4. **Anonymize**: Add custom anonymization terms and review auto-detected sensitive information.

5. **Submit**: Only reviewed, anonymized content is sent to the researcher.
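The anonymization step above amounts to substituting placeholders for detected or user-supplied terms. The sketch below illustrates the idea in Python; the actual implementation runs in the browser on NER-detected spans, and this helper is hypothetical:

```python
import re

def anonymize(text: str, terms: dict) -> str:
    """Replace each sensitive term with its placeholder, matching
    case-insensitively on whole words only."""
    for term, placeholder in terms.items():
        text = re.sub(rf"\b{re.escape(term)}\b", placeholder, text,
                      flags=re.IGNORECASE)
    return text

result = anonymize(
    "Alice met Bob in Berlin.",
    {"Alice": "[PERSON_1]", "Bob": "[PERSON_2]", "Berlin": "[LOCATION_1]"},
)
# result == "[PERSON_1] met [PERSON_2] in [LOCATION_1]."
```

Whole-word matching (`\b`) avoids mangling substrings, e.g. "Bob" inside "Bobsled".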
## API Endpoints

### Health Check

```
GET  /api/health
```

### Studies (Researcher)

```
GET    /api/studies                      # List all studies
POST   /api/studies                      # Create a new study
GET    /api/studies/:id                  # Get study details
PUT    /api/studies/:id                  # Update a study
DELETE /api/studies/:id                  # Delete a study
POST   /api/studies/:id/generate         # Generate alignment samples
POST   /api/studies/:id/accept           # Accept a sample
POST   /api/studies/:id/reject           # Reject a sample
POST   /api/studies/:id/start-training   # Start classifier training
GET    /api/studies/:id/training-status  # Get training status
```

### Forms (Participant)

```
GET  /api/form/:id         # Get form configuration
POST /api/form/:id/submit  # Submit anonymized data
```

### System

```
GET /api/vllm/status  # Check local model availability
```
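As a sketch of calling these endpoints from a script, the following builds a `POST /api/studies` request with Python's standard library. The endpoint path and default port come from this README; the payload field names (`name`, `prompt_spec`) are illustrative assumptions, not a documented request schema:

```python
import json
import urllib.request

API_BASE = "http://localhost:3001/api"  # matches the default PORT (3001)

def build_create_study_request(name: str, prompt_spec: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/studies request."""
    body = json.dumps({"name": name, "prompt_spec": prompt_spec}).encode()
    return urllib.request.Request(
        f"{API_BASE}/studies",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_create_study_request("Mental health study",
                                 "Conversations about mental health")
# Send with urllib.request.urlopen(req) once the backend is running.
```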
## Environment Variables

### Backend

| Variable | Description | Default |
|---|---|---|
| `NODE_ENV` | Environment mode | `development` |
| `PORT` | Server port | `3001` |
| `DEMO_MODE` | Enable demo mode | `false` |
| `OPENAI_API_KEY` | OpenAI API key (optional) | - |
| `SENDGRID_API_KEY` | SendGrid key for emails (production) | - |

### Frontend

| Variable | Description | Default |
|---|---|---|
| `REACT_APP_DEMO_MODE` | Enable demo mode | `false` |
| `REACT_APP_API_URL` | Backend API URL | `http://localhost:3001/api` |
## Deployment

### Heroku

1. Create a Heroku app:

   ```bash
   heroku create your-app-name
   ```

2. Set environment variables:

   ```bash
   heroku config:set NODE_ENV=production
   heroku config:set SENDGRID_API_KEY=your-key
   ```

3. Add buildpacks (Python first, for classifier training):

   ```bash
   heroku buildpacks:add --index 1 heroku/python
   heroku buildpacks:add --index 2 heroku/nodejs
   ```

4. Deploy:

   ```bash
   git push heroku main
   ```
## Privacy Architecture

ConvoScope implements a fully client-side privacy architecture:

1. **Conversation Parsing**: ZIP file extraction and JSON parsing happen in the browser.
2. **Classification**: ONNX models run via Transformers.js to filter relevant conversations.
3. **PII Detection**: Named entity recognition identifies sensitive information client-side.
4. **User Review**: Participants review and approve all data before submission.
5. **Anonymization**: PII is replaced with placeholders before leaving the browser.

No raw conversation data ever reaches the server.
## Citation

If you use ConvoScope in your research, please cite:

```bibtex
@inproceedings{convoscope2026,
  title     = {ConvoScope: A Privacy-Preserving Platform for LLM Conversation Research},
  author    = {[Authors]},
  booktitle = {CHI '26 Extended Abstracts},
  year      = {2026},
  publisher = {ACM}
}
```
## License

MIT License. See `LICENSE` for details.
## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Support

- **Issues**: Repository Issues
- **Contact**: For support, please open an issue in the repository.