Implementing Email Analysis with Google OAuth Integration
Building a secure and efficient email analysis system requires careful attention to authentication, data processing, and performance. This article walks through the implementation of an email analysis feature that uses Google OAuth for authentication and GPT-4o mini for content analysis, with a focus on system architecture and best practices.
Email Analysis with Google OAuth Integration
Overview
This document outlines the implementation of the email analysis feature, which uses Google OAuth for authentication and GPT-4o mini for email content analysis. Users connect their Gmail account, the system processes their job-related emails, and the results are presented in a structured format.
Key Components
Frontend Layer
EmailAnalysis Page: React-based UI for email analysis
Google OAuth Integration: Handles user authentication flow
Local Storage: Stores JWT session token
Material-UI Components: Modern and responsive design
Backend Services
Token Service
Manages Gmail OAuth tokens
Generates and validates JWT session tokens
Handles token refresh and expiration
Implements token caching strategy
Email Service
Fetches emails using Gmail API
Implements batched email retrieval
Handles rate limiting and quotas
Manages email metadata caching
Processing Service
Analyzes email content
Classifies job application status
Extracts relevant information
Implements result caching
Caching Layer
Token Cache
TTLCache for Gmail access tokens (1 hour TTL)
No MongoDB storage for tokens
Analysis Cache
Memory Cache: Fast access to recent results
MongoDB Cache: Persistent storage for long-term data retention
cached_at field for tracking purposes only
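The two-tier analysis cache described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `TwoTierAnalysisCache` is hypothetical, a plain dict stands in for the MongoDB collection, and a small hand-rolled expiry check stands in for TTLCache so the example needs only the standard library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TwoTierAnalysisCache:
    """Memory-first lookup with a persistent fallback (hypothetical helper)."""

    def __init__(
        self,
        persistent: Dict[str, dict],
        memory_ttl: float = 604800,  # 7 days in seconds
        clock: Callable[[], float] = time.time,
    ) -> None:
        # message_id -> (expires_at, result)
        self._memory: Dict[str, Tuple[float, Any]] = {}
        # Stand-in for a MongoDB collection keyed by message ID (assumption).
        self._persistent = persistent
        self._memory_ttl = memory_ttl
        self._clock = clock

    def get(self, message_id: str) -> Optional[Any]:
        now = self._clock()
        entry = self._memory.get(message_id)
        if entry is not None:
            expires_at, result = entry
            if now < expires_at:
                return result
            # The in-memory copy has expired; drop it and fall through.
            del self._memory[message_id]
        doc = self._persistent.get(message_id)
        if doc is None:
            return None
        # Promote the persistent hit so the next lookup is served from memory.
        self._memory[message_id] = (now + self._memory_ttl, doc["result"])
        return doc["result"]

    def put(self, message_id: str, result: Any) -> None:
        now = self._clock()
        self._memory[message_id] = (now + self._memory_ttl, result)
        # cached_at is recorded for tracking only; it is not used for expiry here.
        self._persistent[message_id] = {"result": result, "cached_at": now}
```

A lookup that misses in memory but hits the persistent tier repopulates the memory tier, so repeated requests for the same message stay fast after a restart.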
Security Layer
JWT-based session management
Secure token storage and transmission
Rate limiting and request validation
Error handling and logging
Components
Frontend Components
EmailAnalysis Page
Path: src/pages/EmailAnalysis/
Handles user interaction and displays email analysis results
Manages Google OAuth flow initiation
Header Integration
Path: src/components/Layout/Header.tsx
Provides navigation access to email analysis feature
Uses MailOutlined icon for visual representation
Backend Services
Token Service (server-python/src/services/token_service.py)
Manages OAuth token lifecycle
Handles token exchange and validation
Implements token caching for performance
Email Service (server-python/src/services/email_service.py)
sequenceDiagram
    participant Client
    participant Backend
    participant Google
    Client->>Backend: Request with expired token
    Backend->>Backend: Detect token expiration
    Backend->>Client: Return 401 with X-Token-Expired header
    Client->>Backend: POST /api/auth/refresh
    Note over Backend: Get refresh token from cookie
    Backend->>Google: Request new access token
    Google->>Backend: Return new tokens
    Backend->>Backend: Store access token in TTLCache
    Backend->>Client: Return new session token
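The expiration check at the start of this flow might look something like the sketch below. The function name `check_token_expiry` and the header value `"true"` are assumptions for illustration; the source only states that the backend returns a 401 with an X-Token-Expired header.

```python
import time
from typing import Dict, Optional, Tuple


def check_token_expiry(
    exp: float, now: Optional[float] = None
) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for a token that expires at `exp`.

    `exp` is a Unix timestamp in seconds. `now` can be injected for testing.
    """
    current = time.time() if now is None else now
    if current >= exp:
        # Tell the client to call POST /api/auth/refresh and retry.
        return 401, {"X-Token-Expired": "true"}
    return 200, {}
```

On the client side, a 401 carrying this header would trigger the refresh call rather than an immediate logout.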
Token Security
Access Token: Never exposed to frontend
Refresh Token: HttpOnly cookie, HTTPS only
Session Token: JWT with user info, short-lived
Token Cleanup
Access Token: Auto-removed from TTLCache after 1 hour
Refresh Token: Auto-expired from cookie after 30 days
Session Token: Auto-expired after 24 hours
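As a rough illustration of the refresh token rules above (HttpOnly, HTTPS only, 30-day lifetime), the Set-Cookie header could be built with Python's standard `http.cookies` module. The cookie name `refresh_token` and the helper name are assumptions; a web framework's own cookie API would normally be used instead.

```python
from http.cookies import SimpleCookie

REFRESH_COOKIE_NAME = "refresh_token"  # hypothetical cookie name
REFRESH_COOKIE_MAX_AGE = 30 * 24 * 3600  # 30 days in seconds


def build_refresh_cookie_header(refresh_token: str) -> str:
    """Return the value of a Set-Cookie header for the refresh token."""
    cookie = SimpleCookie()
    cookie[REFRESH_COOKIE_NAME] = refresh_token
    morsel = cookie[REFRESH_COOKIE_NAME]
    morsel["httponly"] = True  # not readable from JavaScript
    morsel["secure"] = True  # only sent over HTTPS
    morsel["max-age"] = REFRESH_COOKIE_MAX_AGE  # browser drops it after 30 days
    return morsel.OutputString()
```

Because the expiry is enforced by the browser through Max-Age, the backend does not need its own cleanup job for refresh tokens in this sketch.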
Authentication Flow
sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant Google
    participant Cache
    participant Cookie
    User->>Frontend: Click Login
    Frontend->>Google: Request Authorization
    Google->>User: Show Consent Screen
    User->>Google: Grant Permission
    Google->>Frontend: Return Authorization Code
    Frontend->>Backend: Send Authorization Code
    Backend->>Google: Exchange Code for Tokens
    Google->>Backend: Return Access & Refresh Tokens
    Backend->>Cache: Store Access Token (1h TTL)
    Backend->>Cookie: Store Refresh Token (30d)
    Backend->>Backend: Generate JWT Session Token
    Backend-->>Frontend: Return JWT Token & User Info
    Frontend->>Frontend: Store JWT Token (24h)
User Session Management
Session Token (JWT)
interface SessionToken {
  sub: string      // User email
  name: string     // User name
  picture: string  // User avatar URL
  exp: number      // Expiration timestamp
}
Every API request includes the JWT in the Authorization header
The backend validates the token and confirms that the user exists
Tokens are refreshed automatically when they expire
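To make the session token concrete, here is a minimal sketch of issuing and validating an HS256-signed JWT with the claims listed above, using only the standard library. The function names, the choice of HS256, and the shared `secret` are assumptions; the source does not say which algorithm or library the backend uses, and a maintained JWT library is the safer choice in production.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SESSION_TTL_SECONDS = 24 * 3600  # 24-hour session lifetime


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def create_session_token(
    email: str, name: str, picture: str, secret: bytes, now: Optional[float] = None
) -> str:
    """Build a JWT carrying the sub, name, picture, and exp claims."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": email,
        "name": name,
        "picture": picture,
        "exp": issued + SESSION_TTL_SECONDS,
    }
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_session_token(
    token: str, secret: bytes, now: Optional[float] = None
) -> Optional[dict]:
    """Return the claims if the signature is valid and exp is in the future."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(header_b64 + "." + payload_b64, secret)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Covers malformed structure, bad base64, bad JSON, and non-ASCII input.
        return None
    current = time.time() if now is None else now
    if current >= payload.get("exp", 0):
        return None
    return payload
```

The backend would still need to confirm that the user named in `sub` exists after the token itself has been verified.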
Session Cleanup
Automatic logout on token expiration
Clear local storage on logout
Revoke refresh token cookie
Configuration
Environment Variables
Required environment variables for setup:
GMAIL_CLIENT_ID: Google OAuth client ID
GMAIL_CLIENT_SECRET: Google OAuth client secret
GMAIL_REDIRECT_URIS: OAuth callback URL
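A small loader along these lines can fail fast when a variable is missing. The dataclass and function names are hypothetical, and treating GMAIL_REDIRECT_URIS as a comma-separated list is an assumption based on the plural name.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

REQUIRED_VARS = ("GMAIL_CLIENT_ID", "GMAIL_CLIENT_SECRET", "GMAIL_REDIRECT_URIS")


@dataclass(frozen=True)
class GmailOAuthConfig:
    client_id: str
    client_secret: str
    redirect_uris: List[str]


def load_gmail_oauth_config(env: Mapping[str, str] = os.environ) -> GmailOAuthConfig:
    """Read the OAuth settings from the environment, or raise if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Assumption: multiple redirect URIs are separated by commas.
    uris = [u.strip() for u in env["GMAIL_REDIRECT_URIS"].split(",") if u.strip()]
    return GmailOAuthConfig(
        client_id=env["GMAIL_CLIENT_ID"],
        client_secret=env["GMAIL_CLIENT_SECRET"],
        redirect_uris=uris,
    )
```

Calling the loader once at startup surfaces configuration mistakes before the first login attempt.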
OAuth Scopes
The application requests the following Google OAuth scopes:
https://www.googleapis.com/auth/gmail.readonly
https://www.googleapis.com/auth/userinfo.email
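For illustration, the scopes can be combined into an authorization URL for Google's OAuth 2.0 authorization endpoint as shown below. The function name is hypothetical, and the `access_type=offline` and `prompt=consent` parameters are assumptions about how this application obtains its refresh token.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/userinfo.email",
]


def build_authorization_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL the frontend sends the user to for consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",  # authorization code flow
        "scope": " ".join(SCOPES),  # scopes are space-delimited
        "access_type": "offline",  # assumption: needed to receive a refresh token
        "prompt": "consent",  # assumption: always show the consent screen
        "state": state,  # opaque value echoed back to guard against CSRF
    }
    return GOOGLE_AUTH_ENDPOINT + "?" + urlencode(params)
```

The `redirect_uri` passed here must match one of the callback URLs registered for the OAuth client.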
Internationalization
The feature supports multiple languages including:
English (en)
Chinese (zh)
Traditional Chinese (zh-tw)
Japanese (ja)
Spanish (es)
Korean (ko)
Security Considerations
Token Storage
Access tokens stored in memory only (TTLCache)
Refresh tokens stored in HttpOnly cookies
No persistent token storage in MongoDB
Automatic token refresh handling
Authentication Flow
Implements standard OAuth 2.0 flow
Uses secure token exchange
Implements token expiration handling
Error Handling
The system implements comprehensive error handling for:
OAuth authentication failures
Email fetching errors
Processing service errors
Rate limiting and throttling
Performance Optimization
Caching Strategy
Two-tier caching system for email analysis results:
TTLCache (in-memory): Fast access, 7 days TTL
MongoDB (persistent): Long-term storage, 360 days TTL
Optimized email fetching process:
First fetch only message IDs
Check cache status for each message ID
For cached messages:
Only fetch minimal metadata (subject, from, to, date)
Skip body content fetching and parsing
For uncached messages:
Fetch full message content
Perform analysis
Cache results
Token caching for reduced API calls
Batch processing for multiple emails
Smart sync time handling:
Stores last sync time in UTC format
Uses the later of the last_sync time and the days_back cutoff
Prevents processing of already analyzed emails
Ensures no emails beyond days_back limit are processed
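The optimized fetching process above (IDs first, then a cheap path for cached messages and a full path for uncached ones) can be sketched as follows. The function name and the three injected callables are hypothetical placeholders for the real Gmail and analysis calls, and the shape of the returned dictionaries is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, List


def process_messages(
    message_ids: Iterable[str],
    cache: Dict[str, Any],
    fetch_metadata: Callable[[str], dict],
    fetch_full: Callable[[str], dict],
    analyze: Callable[[str], Any],
) -> List[dict]:
    """Build one result per message, doing full work only on cache misses.

    fetch_metadata(id) is assumed to return minimal headers such as subject,
    from, to, and date. fetch_full(id) is assumed to return a dict with
    "metadata" and "body" keys. analyze(body) returns the analysis result.
    """
    results = []
    for message_id in message_ids:
        cached = cache.get(message_id)
        if cached is not None:
            # Cached: fetch only minimal metadata and skip the body entirely.
            metadata = fetch_metadata(message_id)
            results.append({**metadata, "analysis": cached})
            continue
        # Uncached: fetch the full message, analyze it, and cache the result.
        message = fetch_full(message_id)
        analysis = analyze(message["body"])
        cache[message_id] = analysis
        results.append({**message["metadata"], "analysis": analysis})
    return results
```

Keeping the fetch and analysis functions injectable makes the cache-hit and cache-miss paths easy to exercise without calling the Gmail API.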
Rate Limiting
Implements Gmail API rate limit handling
Batch processing to optimize API usage
Minimizes API calls through efficient caching
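The source does not say how rate limit errors are handled. One common approach is to retry with exponential backoff and jitter, sketched here. `RateLimitError` is a hypothetical exception that a Gmail wrapper would raise on a rate limit response, and the default attempt count and delays are illustrative values only.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's Gmail wrapper when throttled."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimitError with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the caller handle it
            # Delay doubles each attempt; jitter spreads out concurrent clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Wrapping each batched Gmail call in a helper like this keeps the retry policy in one place.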
Cache Settings
# Memory Cache (TTLCache)
MAX_TOKEN_CACHE_SIZE = 1000  # Maximum tokens to cache
TOKEN_CACHE_TTL = 3600  # 1 hour in seconds
MAX_ANALYSIS_CACHE_SIZE = 1000  # Maximum analysis results to cache
ANALYSIS_CACHE_TTL = 604800  # 7 days in seconds

# MongoDB Cache
db_cache_ttl_days: int = 360  # 360 days for analysis results

# Local Storage Keys
LOCAL_STORAGE_LAST_SYNC_KEY = 'email_analysis_last_sync'  # Stores UTC timestamp
Email Fetching Process with Sync Time
sequenceDiagram
    participant Client
    participant Frontend
    participant Backend
    participant GmailAPI
    Client->>Frontend: Request Emails
    Frontend->>Frontend: Get last_sync from localStorage
    Frontend->>Backend: fetch_emails(last_sync)
    Backend->>Backend: Calculate days_back limit
    Backend->>Backend: Compare last_sync with days_back
    Note over Backend: Use later of last_sync<br/>and days_back limit
    Backend->>GmailAPI: Query with date filter
    GmailAPI-->>Backend: Return filtered emails
    Backend->>Backend: Process & analyze emails
    Backend-->>Frontend: Return results
    Frontend->>Frontend: Update last_sync time
    Frontend->>Frontend: Store new last_sync in localStorage
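The cutoff calculation in this flow reduces to taking the later of two timestamps. The sketch below assumes timezone-aware UTC datetimes, and the function names are hypothetical. The query string uses Gmail's `after:` search operator with a Unix timestamp in seconds; whether the real backend builds its filter this way is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_fetch_cutoff(
    last_sync: Optional[datetime], days_back: int, now: Optional[datetime] = None
) -> datetime:
    """Return the earliest timestamp from which emails should be fetched.

    All datetimes must be timezone-aware. With no previous sync, the
    days_back limit applies on its own.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    days_back_limit = current - timedelta(days=days_back)
    if last_sync is None:
        return days_back_limit
    # The later of the two avoids re-processing analyzed emails while never
    # reaching further back than days_back.
    return max(last_sync, days_back_limit)


def gmail_after_query(cutoff: datetime) -> str:
    """Format the cutoff as a Gmail search filter (epoch seconds)."""
    return f"after:{int(cutoff.timestamp())}"
```

For example, with `days_back=30` and a `last_sync` from five days ago, only the last five days are queried; with a `last_sync` from six months ago, the query is still capped at 30 days.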