1. AI Interview System (Real-time Audio/Video)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant NestJS
    participant PythonBackend
    participant OpenAI
    participant CacheSystem
    User->>Frontend: Initiate Interview Request
    Frontend->>NestJS: WebRTC Audio/Video Connection
    Frontend->>NestJS: WebSocket Connection
    NestJS->>PythonBackend: Establish WebSocket Connection
    loop Interview Process
        User->>Frontend: Audio Input
        Frontend->>NestJS: Audio Stream
        NestJS->>NestJS: STT Conversion
        NestJS->>PythonBackend: Text Question
        PythonBackend->>CacheSystem: Check Question Cache
        alt Cache Hit
            CacheSystem-->>PythonBackend: Return Cached Result
        else Cache Miss
            PythonBackend->>OpenAI: LLM Inference Request
            OpenAI-->>PythonBackend: Return Answer
            PythonBackend->>CacheSystem: Update Cache
        end
        PythonBackend->>NestJS: Return AI Answer
        NestJS->>NestJS: TTS Conversion
        NestJS->>Frontend: Audio Answer
        Frontend->>User: Play AI Answer
    end
    User->>Frontend: End Interview
    Frontend->>NestJS: Close Connection
    NestJS->>PythonBackend: Save Interview Record
```
Independently developed the complete pipeline: WebRTC audio/video capture → STT → LLM inference → TTS audio return, supporting seamless switching between AI and human interviewers
Built real-time communication system based on Socket.IO + Redis Pub/Sub + Redis Adapter, supporting message synchronization and state broadcasting in multi-client, multi-instance deployments
Supports role switching (system, AI, human), question bank management, resume import, and position-specific Q&A, building structured and unstructured question combinations
System Architecture:
NestJS Backend: handles audio/video calls and conversation records, makes HTTP calls to translation and TTS APIs, runs streaming STT (Baidu, Google, iFlytek), and maintains WebSocket connections with the frontend and the Python backend
Python Backend: handles AI requests, the RAG system, the email analysis system, user login management, and rate limiting; connects to the NestJS backend via WebSocket for the AI interview system and serves the frontend via REST API for login and email analysis (see the gateway sketch below)
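For illustration, a minimal sketch of the Python side of this WebSocket bridge, using python-socketio with a Redis-backed manager for multi-instance fan-out; the event names, payload shape, and LLM helper are hypothetical stand-ins, not the production code.

```python
# Python backend accepting the NestJS WebSocket bridge. The Redis manager
# relays emits through Redis Pub/Sub so all backend instances see every event.
import socketio
from aiohttp import web

mgr = socketio.AsyncRedisManager("redis://localhost:6379/0")
sio = socketio.AsyncServer(async_mode="aiohttp", client_manager=mgr)
app = web.Application()
sio.attach(app)

async def run_llm_inference(question: str) -> str:
    # Stand-in for the cache check + OpenAI call in the sequence diagram.
    return f"(answer to: {question})"

@sio.event
async def connect(sid, environ):
    print(f"NestJS bridge connected: {sid}")

@sio.event
async def text_question(sid, data):
    # data = {"session_id": ..., "question": ...} -- payload shape is assumed
    answer = await run_llm_inference(data["question"])
    await sio.emit("ai_answer", {"answer": answer}, room=sid)

if __name__ == "__main__":
    web.run_app(app, port=8001)
```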
2. Intelligent Email Analysis System (AI Information Extraction + Automated Closed Loop)
Supports user login via Gmail OAuth2.0, uses Gmail API to fetch application emails, filters historical emails by labels/time
Uses parallel async LLM calls to analyze email content, extracting application status (read, replied, interview invitation), position name, and company info, then aggregating results into a Timeline (see the analysis sketch after this list)
Integrates company position information to push job recommendations to users
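A minimal sketch of the parallel analysis step, assuming the official openai Python SDK; the prompt, model name, and extract_status helper are illustrative.

```python
# Classify fetched application emails concurrently with async LLM calls.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def extract_status(email_body: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "Extract application status (read / replied / "
                        "interview invitation), company, and position as JSON."},
            {"role": "user", "content": email_body},
        ],
    )
    return resp.choices[0].message.content

async def analyze_inbox(email_bodies: list[str]) -> list[str]:
    # gather() runs all calls concurrently and preserves input order,
    # so results can be zipped back onto the Timeline directly.
    return await asyncio.gather(*(extract_status(b) for b in email_bodies))
```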
3. Todo Management & Smart Reminder System
Automatically generates todos based on email analysis and user behavior, with "TODO" → "DONE" status transitions, building a complete application-tracking path
Uses Resend to implement time-triggered deadline reminders, real-time notifications, and status change pushes, ensuring no key information is missed (see the reminder sketch after this list)
Supports status synchronization and cross-device access; user operations are instantly synced to the database
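A sketch of the time-triggered reminder using the Resend Python SDK; the scheduling trigger (e.g., a cron job or Celery beat) is omitted, and all field values are illustrative.

```python
# Send a deadline reminder email via Resend.
import resend

resend.api_key = "re_..."  # placeholder key

def send_deadline_reminder(to: str, todo_title: str, due: str) -> None:
    resend.Emails.send({
        "from": "reminders@example.com",  # sender domain is an assumption
        "to": [to],
        "subject": f"Reminder: '{todo_title}' is due {due}",
        "html": f"<p>Your todo <b>{todo_title}</b> is due on {due}.</p>",
    })
```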
4. High-Performance RAG Architecture
Vector search engine based on LangChain + Qdrant, supporting Query-Answer vector similarity search
Efficient retrieval process: user questions are vectorized and simultaneously compared with pre-processed Query and Answer vectors for similarity calculation, with weighted merging of results to obtain the most relevant content
Optimized document processing: interview materials are pre-processed into Query-Answer pairs, with separate embedding calculations, enabling retrieval to consider both question similarity and answer relevance
Simplified index structure: direct vector similarity sorting for retrieval, avoiding complex index algorithm tuning while meeting performance requirements at the current data scale (a retrieval sketch follows this list)
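A minimal sketch of the weighted Query-Answer retrieval, assuming a single Qdrant collection with named vectors "query" and "answer"; the collection layout and the 0.6/0.4 weights are assumptions.

```python
# Search the pre-computed question and answer vectors separately, then
# merge the two similarity scores with a tunable weight.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

def retrieve(question_vec: list[float], top_k: int = 5, w_q: float = 0.6):
    q_hits = client.search(
        collection_name="interview_qa",        # collection name is assumed
        query_vector=("query", question_vec),  # similarity to stored questions
        limit=top_k * 2,
    )
    a_hits = client.search(
        collection_name="interview_qa",
        query_vector=("answer", question_vec), # similarity to stored answers
        limit=top_k * 2,
    )
    merged = {}  # point id -> weighted score; a point can score on both sides
    for hit in q_hits:
        merged[hit.id] = merged.get(hit.id, 0.0) + w_q * hit.score
    for hit in a_hits:
        merged[hit.id] = merged.get(hit.id, 0.0) + (1 - w_q) * hit.score
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```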
Multi-turn Q&A uses ConversationSummaryBufferMemory for summary caching, combined with a token-threshold controller for automatic context truncation, reducing overall token cost by 45% (a memory sketch follows)
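A sketch of the summary-buffer memory: once the history exceeds max_token_limit, older turns are compressed into a running summary rather than resent verbatim. The model choice and the 1200-token threshold are assumptions.

```python
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
memory = ConversationSummaryBufferMemory(
    llm=llm,               # this LLM writes the rolling summary
    max_token_limit=1200,  # truncation threshold: tune for cost vs. recall
    return_messages=True,
)

memory.save_context({"input": "Tell me about your last project."},
                    {"output": "I built a real-time AI interview system..."})
# Returns the rolling summary plus the most recent raw turns.
print(memory.load_memory_variables({}))
```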
Intent classification using a LangChain LLMChain drives the MemoryRouter for dynamic context switching between multiple agents (a routing sketch follows the agent list below)
The specialized Agent system includes:
BQ Interview Agent: handling behavioral questions and soft skills assessment
Coding Interview Agent: evaluating programming ability and algorithm analysis
Basic Knowledge Agent: testing professional domain knowledge
Interview Analysis Agent: providing interview performance evaluation and feedback
Question Agent: generating in-depth questions based on candidate background
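A minimal sketch of the intent-classification routing, with a plain dict standing in for the project's MemoryRouter; the labels, prompt, and fallback agent are assumptions.

```python
# Classify the question with an LLMChain, then dispatch to the matching agent.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

intent_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "Classify this interview question into exactly one label: "
        "bq, coding, knowledge, analysis, question_gen.\n"
        "Question: {q}\nLabel:"
    ),
)

AGENTS = {  # each value would be a full agent chain with its own memory
    "bq": "BQ Interview Agent",
    "coding": "Coding Interview Agent",
    "knowledge": "Basic Knowledge Agent",
    "analysis": "Interview Analysis Agent",
    "question_gen": "Question Agent",
}

def route(question: str) -> str:
    label = intent_chain.run(q=question).strip().lower()
    return AGENTS.get(label, AGENTS["bq"])  # default fallback is an assumption
```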
5. Asynchronous Task System (Performance Guarantee)
Multi-machine, multi-queue task scheduling system based on Celery + Redis, with worker pools deployed independently for CPU-intensive (PDF/email parsing) and IO-intensive (LLM calls, embedding generation) tasks (a configuration sketch follows the Flower notes below)
Priority task scheduling (e.g., letting VIP user requests jump the queue), reducing average latency for urgent tasks by 60%
Real-time monitoring with Flower, combined with Prometheus/Grafana for automatic alerts and visual tracking of task failure rates and queue backlog metrics
Flower Monitoring Tool Details
Flower is Celery's officially recommended task monitoring tool, suited to real-time monitoring of Celery task execution, worker status, task queue backlog, and more. It can be understood as Celery's visual dashboard plus API server.
Flower Features:
👨‍🔧 Worker status (active, stopped, heartbeat lost)
📦 Queue length (backlog status)
✅ Task execution status (success, failure, retry, duration)
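A configuration sketch for the CPU/IO queue split and priority scheduling; task names, queue names, priority steps, and worker commands are illustrative.

```python
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/1")

# Route CPU-bound and IO-bound tasks to separate queues so each worker
# pool can be sized for its workload.
app.conf.task_routes = {
    "tasks.parse_pdf": {"queue": "cpu"},
    "tasks.parse_email": {"queue": "cpu"},
    "tasks.call_llm": {"queue": "io"},
    "tasks.embed_text": {"queue": "io"},
}
# Enable Redis priority queues so VIP tasks can jump ahead.
app.conf.broker_transport_options = {
    "priority_steps": list(range(10)),
    "queue_order_strategy": "priority",
}

# Enqueue a VIP request at high priority (exact priority semantics are
# broker-dependent; verify for your Redis/Celery versions):
# app.send_task("tasks.call_llm", args=[...], queue="io", priority=0)

# Separate worker pools per queue, plus the Flower dashboard:
#   celery -A tasks worker -Q cpu --concurrency=4
#   celery -A tasks worker -Q io --pool=gevent --concurrency=200
#   celery -A tasks flower --port=5555
```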
6. Security Protection & Rate Limiting
Rate limiting system built on a Redis sliding-window algorithm with user-level (10/min), IP-level (50/min), and global OpenAI-call (500/min) tiers, intercepting 98% of high-frequency abuse requests
Combined with a Redis Bloom filter that caches processed request fingerprints to prevent duplicate submissions and replay attacks, with a false positive rate < 0.1% (a limiter sketch follows)
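A sketch of the sliding-window limiter (Redis sorted sets) plus the Bloom-filter replay check; key names are illustrative, and the Bloom calls assume the RedisBloom module is loaded.

```python
import time
import uuid
import redis

r = redis.Redis()

def allow(key: str, limit: int, window_s: int = 60) -> bool:
    """Sliding-window limiter: keep a sorted set of request timestamps."""
    now = time.time()
    member = f"{now}-{uuid.uuid4().hex}"  # unique member per request
    pipe = r.pipeline()
    pipe.zremrangebyscore(key, 0, now - window_s)  # drop expired entries
    pipe.zadd(key, {member: now})                  # record this request
    pipe.zcard(key)                                # count requests in window
    pipe.expire(key, window_s)
    _, _, count, _ = pipe.execute()
    return count <= limit

def is_replay(fingerprint: str) -> bool:
    # BF.ADD returns 0 when the item was (probably) already present.
    return r.bf().add("req_fingerprints", fingerprint) == 0

# Usage mirroring the tiers above (user 10/min, IP 50/min, OpenAI 500/min):
# ok = allow(f"rl:user:{uid}", 10) and allow(f"rl:ip:{ip}", 50) \
#      and allow("rl:openai", 500)
```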
7. Multi-level Cache Design
High-frequency data (e.g., a user's last 3 interview summaries) is cached in Redis with JSON serialization, reducing query latency from MongoDB's 15ms to 1ms
Pre-generated and cached high-frequency interview question embeddings, reducing OpenAI API call volume by 35%
Lightweight MongoDB query cache layer with an LRU + TTL strategy for automatic eviction of stale data, reducing high-frequency query latency (e.g., user/position information) from 15ms to 1ms and increasing overall QPS by 4x (a cache-aside sketch follows this list)
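A cache-aside sketch for the hot-data path: try Redis first, fall back to MongoDB, and write back with a TTL. Cold-key eviction relies on Redis running with maxmemory-policy allkeys-lru; the names and TTL are illustrative.

```python
import json
import redis
from pymongo import MongoClient

r = redis.Redis()
db = MongoClient()["interview"]  # database/collection names are assumed

def get_recent_summaries(user_id: str, ttl_s: int = 300) -> list:
    key = f"cache:summaries:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # ~1ms Redis hit path
    docs = list(                   # ~15ms MongoDB fallback
        db.summaries.find({"user_id": user_id}, {"_id": 0})
        .sort("created_at", -1)
        .limit(3)
    )
    r.set(key, json.dumps(docs, default=str), ex=ttl_s)  # TTL write-back
    return docs
```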
Technical Highlights
Real-time Interview Pipeline
Low-latency interview system built with WebRTC + Socket.IO + Redis Pub/Sub, supporting seamless switching between AI and human interviewers
High-performance RAG Architecture
Vector search engine based on LangChain + Qdrant, supporting Query-Answer vector similarity search
Multi-level Cache Design
Multi-level architecture with Redis + MongoDB query cache, reducing key query latency from 15ms to 1ms, increasing system QPS by 4x
Asynchronous Task Optimization
CPU/IO separation deployment strategy based on Celery, with priority scheduling, reducing urgent task latency by 60%
Efficient Token Management
Reducing overall token cost by 45% through summary caching and context truncation
Comprehensive Security Protection
Multi-dimensional rate limiting + Bloom filter + end-to-end encryption, blocking 98% of abuse requests with <0.1% false positive rate
Multi-Agent Routing System
Context-aware dynamic routing based on intent classification, supporting switching between specialized domain agents
Intelligent Email Analysis
Gmail API integration + parallel async LLM processing + semantic extraction, building an intelligent tracking system for the entire application lifecycle