MyDataPilot: AI-Powered Data Science Assistant

Abstract

Challenge: The complexity of data science workflows presents significant barriers for non-technical professionals seeking data-driven insights.

Solution: We introduce a web portal that democratizes access to multi-agent AI systems, enabling users without machine learning expertise to perform sophisticated data science tasks through natural language interaction.

Key Features:

Five specialized AI agents collaborating to decompose complex requests
Interactive task review interface with user modification capabilities
Transparent AI dialogue visualization with editable reasoning process
Real-time streaming execution with step-by-step control

Technology: Built on a microservices architecture (Node.js + Python) and powered by OpenAI's GPT-4, with a React-based frontend featuring session persistence and secure API key management.

Backend Architecture

Overview

Our backend employs a microservices architecture with two core services:

Node.js Service: Handles authentication, session management, and real-time communication
Python Service: Executes AI agent workflows and data processing tasks

Multi-Agent AI System

Task Splitter

Decomposes complex queries into 5-7 atomic subtasks

Task Optimizer

Enriches tasks with dependencies and expected outputs

Code Writer

Generates Python code for data science tasks

Code Reviewer

Validates execution and provides corrections

Answer Extractor

Synthesizes outputs into user-friendly responses

Workflow Process
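The agent hand-offs described above can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the project's code: each agent is modeled as a plain Python callable (the parameter names `split`, `optimize`, `write_code`, `execute`, `review` and `extract` are hypothetical), and the Code Reviewer is assumed to return corrected code after a failed run, up to a fixed retry limit.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SubtaskResult:
    task: Any
    code: str
    output: str
    attempts: int
    succeeded: bool


def run_workflow(
    request: str,
    split: Callable,       # Task Splitter: request -> list of subtasks
    optimize: Callable,    # Task Optimizer: subtasks -> enriched subtasks
    write_code: Callable,  # Code Writer: (subtask, prior results) -> code
    execute: Callable,     # hypothetical sandboxed runner: code -> (ok, output)
    review: Callable,      # Code Reviewer: (subtask, code, error output) -> corrected code
    extract: Callable,     # Answer Extractor: (request, results) -> answer
    max_attempts: int = 3, # assumed to be >= 1
):
    subtasks = optimize(split(request))
    results = []
    for task in subtasks:
        code = write_code(task, results)
        for attempt in range(1, max_attempts + 1):
            ok, output = execute(code)
            if ok or attempt == max_attempts:
                break
            code = review(task, code, output)  # ask the reviewer for a fix, then re-run
        results.append(SubtaskResult(task, code, output, attempt, ok))
    return extract(request, results)


# Usage with stub agents standing in for GPT-4 calls.
answer = run_workflow(
    "What is the mean of 1, 2, 3?",
    split=lambda req: ["compute the mean"],
    optimize=lambda tasks: tasks,
    write_code=lambda task, prior: "print(sum([1, 2, 3]) / 3)",
    execute=lambda code: (True, "2.0"),
    review=lambda task, code, err: code,
    extract=lambda req, results: results[-1].output,
)
print(answer)  # prints 2.0
```

Passing the earlier results to the Code Writer lets later subtasks build on earlier outputs; whether the real Code Writer sees prior results in this form is an assumption.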

Key Technical Features

Robust Error Handling

Token management with intelligent truncation
Exponential backoff for rate limits
Comprehensive error recovery mechanisms
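The first two error-handling mechanisms can be illustrated with a short sketch. It assumes a whitespace word count as a rough stand-in for a real tokenizer, a hypothetical `RateLimitError` in place of the provider's rate-limit exception, and a policy of dropping the oldest non-system messages first; the project's actual truncation rules may differ.

```python
import random
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the provider's rate-limit (HTTP 429) error."""


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` on rate limits, doubling the wait each time (capped, with jitter)."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up and surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out concurrent retries


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. A real service would use the model's tokenizer.
    return len(text.split())


def truncate_messages(messages: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Keep system messages; drop the oldest other messages until the total fits."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total() -> int:
        return sum(count_tokens(m["content"]) for m in system + rest)

    while rest and total() > max_tokens:
        rest.pop(0)
    return system + rest
```

The `sleep` parameter is injectable so the retry schedule can be tested without actually waiting.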

Transparency & Control

Editable AI dialogue sequences
Real-time execution monitoring
Session persistence and recovery

Technology Stack

Node.js + Express
Python + Flask
MongoDB
OpenAI GPT-4
Server-Sent Events
JWT Authentication

In Summary: Our backend turns natural-language requests into executable data science workflows through collaborating AI agents, exposing every step for inspection and control while remaining accessible to non-technical users.

Frontend Architecture

Design Philosophy

Built with React 18, our interface follows progressive disclosure principles to guide non-technical users through complex data science workflows seamlessly.

Three-Phase User Journey

1. Task Input

Natural language request with intelligent suggestions

2. Plan Review

Visual task breakdown with editing capabilities

3. Execution

Step-by-step monitoring with full control

Key Innovations

Transparent AI Dialogue System

Our unique AIDialogueSequence component reveals the complete reasoning chain of each AI agent:

View prompt generation, code writing, and error handling processes
Edit any dialogue node to observe downstream effects
Understand how AI makes decisions at each step
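The edit-and-regenerate behavior can be modeled as rewriting one node and replaying everything after it. In this sketch, which is an assumption about the mechanics rather than the AIDialogueSequence implementation, prompts after the edited node are kept as written, while every later assistant reply is regenerated from the updated history by a hypothetical `generate` callable standing in for the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DialogueNode:
    role: str     # "system", "user" or "assistant"
    content: str


def edit_and_regenerate(
    nodes: List[DialogueNode],
    index: int,
    new_content: str,
    generate: Callable[[List[DialogueNode]], str],
) -> List[DialogueNode]:
    """Replace node `index`, keep later prompts, and regenerate every later assistant reply."""
    history = list(nodes[:index]) + [DialogueNode(nodes[index].role, new_content)]
    for node in nodes[index + 1:]:
        if node.role == "assistant":
            # Downstream replies are recomputed from the edited history.
            history.append(DialogueNode("assistant", generate(list(history))))
        else:
            history.append(node)
    return history
```

The original list is left untouched, so the previous dialogue can still be shown next to the regenerated one for comparison.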

Granular Execution Control

Every subtask provides real-time control options:

Retry
Edit
Continue
View AI Dialogue

Enhanced Features

Code Intelligence

Syntax highlighting with Prism.js
One-click code download
Direct server file saving

Rich Content Support

LaTeX rendering with KaTeX
Interactive data visualizations
Multi-format export options

Smart Persistence

Auto-save with a 5-second delay
Session history management
Work continuity across sessions
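A 5-second auto-save is commonly implemented as a debounce: each change restarts the timer, and only the latest state is written once edits pause. The portal's auto-save runs in the React frontend; this Python sketch only illustrates the timing logic, with a hypothetical `save` callback in place of the real persistence call.

```python
import threading
from typing import Any, Callable, Optional

_MISSING = object()  # marks "nothing pending", so that None can still be saved


class DebouncedSaver:
    """Persist only the latest state after `delay` seconds without further changes."""

    def __init__(self, save: Callable[[Any], None], delay: float = 5.0):
        self._save = save          # hypothetical persistence callback
        self._delay = delay
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._pending: Any = _MISSING
        self._generation = 0       # identifies the most recent change

    def update(self, state: Any) -> None:
        with self._lock:
            self._pending = state
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()  # restart the countdown on every change
            self._timer = threading.Timer(self._delay, self._flush, args=(self._generation,))
            self._timer.daemon = True
            self._timer.start()

    def _flush(self, generation: int) -> None:
        with self._lock:
            # Ignore timers superseded by a later change.
            if generation != self._generation or self._pending is _MISSING:
                return
            state, self._pending = self._pending, _MISSING
        self._save(state)
```

The generation counter guards against a timer that had already fired just before being cancelled, so a stale timer can never save early.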

Real-time Updates

Server-Sent Events streaming
Live progress indicators
Instant error feedback
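Server-Sent Events is a plain-text protocol: each event is a block of `field: value` lines terminated by a blank line. The helper below frames JSON payloads that way; the event names `progress` and `complete` and the payload fields are illustrative assumptions, not the portal's actual message schema.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def format_sse(data: Any, event: Optional[str] = None, event_id: Optional[int] = None) -> str:
    """Frame one Server-Sent Event carrying a JSON payload."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps without indent emits a single line, so one data field suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the event


def stream_progress(subtasks: Iterable[str]) -> Iterator[str]:
    """Yield hypothetical progress events, one per subtask, then a completion event."""
    subtasks = list(subtasks)
    for step, name in enumerate(subtasks, start=1):
        yield format_sse({"step": step, "total": len(subtasks), "subtask": name},
                         event="progress", event_id=step)
    yield format_sse({"status": "done"}, event="complete")
```

A server would write these strings to a response sent with the `text/event-stream` content type, and a browser `EventSource` would dispatch them to listeners registered for `progress` and `complete`.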

Frontend Technology Stack

React 18
TypeScript
Tailwind CSS
Server-Sent Events
Prism.js
KaTeX
React Router

Responsive Design

Desktop

Tablet

Mobile

User Experience First: Through carefully crafted visual feedback, loading states, and micro-interactions, our interface ensures that complex AI capabilities remain accessible and intuitive for all users, regardless of their technical background.

Key Features

Secure Authentication

Email-verified accounts with flexible login options

AI Task Planning

Automatic decomposition into manageable subtasks

Session Persistence

Auto-save with complete conversation history

User Authentication & Setup

Our secure authentication system protects data privacy while providing flexible access options. Users register with email verification and can log in with either their username or their email address.
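JWT Authentication is listed in the backend stack. As a hedged illustration of what the token step involves, the sketch below issues and verifies an HS256 JWT (RFC 7519) with only the Python standard library, and resolves a login identifier as an email if it contains `@` and as a username otherwise. The function names, the one-hour lifetime, and the user-record shape are assumptions; the real service is Node.js and would normally rely on an established JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))  # restore padding


def _json_segment(obj: dict) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    header = _json_segment({"alg": "HS256", "typ": "JWT"})
    claims = _json_segment({"sub": user_id, "iat": issued, "exp": issued + ttl_seconds})
    signing_input = f"{header}.{claims}"
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, claims_b64, signature_b64 = parts
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):  # constant-time compare
        return None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        return None  # reject "none" and any other algorithm
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    return claims if claims.get("exp", 0) > current else None


def find_user(users: list, identifier: str) -> Optional[dict]:
    """Treat identifiers containing '@' as emails, anything else as usernames."""
    key = "email" if "@" in identifier else "username"
    return next((u for u in users if u.get(key) == identifier), None)
```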

Personal API Key Management

Users maintain control over their AI usage by providing their own OpenAI API key. This ensures data privacy, usage transparency, and cost control.

Encrypted Storage
User-Controlled
Usage Tracking

Intelligent Task Decomposition

Our AI automatically breaks down complex requests into 5 manageable subtasks, each with clear objectives and expected outputs. Users can review and modify the plan before execution.
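A plan of this kind maps naturally onto a small dependency graph: each subtask carries an objective, an expected output, and (after the Task Optimizer's enrichment) the subtasks it depends on. The sketch below, whose `Subtask` fields are assumptions, validates a possibly user-edited plan and derives an execution order with the standard library's `graphlib` (Python 3.9+); a cyclic plan raises `graphlib.CycleError`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import List


@dataclass
class Subtask:
    id: str
    objective: str
    expected_output: str
    depends_on: List[str] = field(default_factory=list)  # assumed to be filled in by the Task Optimizer


def execution_order(plan: List[Subtask]) -> List[str]:
    """Validate a (possibly user-edited) plan and return subtask ids in dependency order."""
    ids = [t.id for t in plan]
    if len(ids) != len(set(ids)):
        raise ValueError("subtask ids must be unique")
    for task in plan:
        unknown = set(task.depends_on) - set(ids)
        if unknown:
            raise ValueError(f"{task.id} depends on unknown subtasks: {sorted(unknown)}")
    graph = {task.id: set(task.depends_on) for task in plan}
    return list(TopologicalSorter(graph).static_order())


plan = [
    Subtask("plot", "Plot price over time", "line chart", ["clean"]),
    Subtask("load", "Load sales.csv", "DataFrame"),
    Subtask("clean", "Drop rows with missing prices", "cleaned DataFrame", ["load"]),
]
print(execution_order(plan))  # ['load', 'clean', 'plot']
```

Because `Subtask` is a plain dataclass, an edit made during plan review can be applied with `dataclasses.replace` before the order is recomputed.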

File Integration: Simply provide local file paths in your requests, and our system will automatically access and process your data files as part of the task execution.
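One plausible way to pick file paths out of a free-text request is shown below. It is a heuristic under stated assumptions (whitespace-separated or quoted paths, a hypothetical whitelist of data-file extensions, and an existence check on the local file system), not the portal's actual parsing logic.

```python
import os
import re
from typing import List

# Assumed whitelist of data-file extensions; the portal's actual rules are not documented.
DATA_EXTENSIONS = {".csv", ".tsv", ".json", ".xlsx", ".parquet", ".txt"}

# A token is a double-quoted string, a single-quoted string, or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]+)"' r"|'([^']+)'" r"|(\S+)")


def find_data_files(request: str) -> List[str]:
    """Return existing local data files mentioned in a natural-language request."""
    found: List[str] = []
    for match in _TOKEN.finditer(request):
        quoted = match.group(1) or match.group(2)
        # Unquoted tokens may carry trailing punctuation from the sentence.
        token = quoted if quoted else match.group(3).rstrip(".,;:!?)")
        path = os.path.expanduser(token)
        if (os.path.splitext(path)[1].lower() in DATA_EXTENSIONS
                and os.path.isfile(path) and path not in found):
            found.append(path)
    return found
```

Quoting lets a request reference paths that contain spaces, for example `Plot "~/My Data/sales.csv" by month`.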

Transparent AI Reasoning

Revolutionary Transparency: View and edit the complete AI reasoning process. Every prompt, code generation step, and decision point is visible and modifiable.

Unique Feature: Edit & Regenerate

Modify any part of the AI conversation and watch as subsequent responses adapt to your changes.

Granular Execution Control

Full control over task execution with real-time monitoring and intervention capabilities:

Retry: Re-execute tasks that encounter errors
Edit: Add context or corrections before re-running
Continue: Proceed to the next subtask
View AI Dialogue: Inspect and modify reasoning process
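The controls listed above can be read as operations on a per-subtask state. The sketch below is a hypothetical model, not the portal's code: `run` stands in for the write, execute and review cycle, Edit appends user context before re-running, and Continue advances to the next subtask without requiring success, leaving that decision to the user. View AI Dialogue is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SubtaskRun:
    objective: str
    context: List[str] = field(default_factory=list)  # user notes added via Edit
    status: str = "pending"                          # "pending", "succeeded" or "failed"
    output: str = ""


class ExecutionController:
    """Hypothetical model of the Retry / Edit / Continue controls."""

    def __init__(self, objectives: List[str], run: Callable[[str, List[str]], Tuple[bool, str]]):
        self.runs = [SubtaskRun(o) for o in objectives]
        self.current = 0
        self._run = run  # stands in for: write code -> execute -> review

    def _execute(self) -> SubtaskRun:
        sub = self.runs[self.current]
        ok, sub.output = self._run(sub.objective, sub.context)
        sub.status = "succeeded" if ok else "failed"
        return sub

    def start(self) -> SubtaskRun:
        return self._execute()

    def retry(self) -> SubtaskRun:
        # Re-run the current subtask unchanged (e.g. after a transient error).
        return self._execute()

    def edit(self, note: str) -> SubtaskRun:
        # Attach user-supplied context or a correction, then re-run.
        self.runs[self.current].context.append(note)
        return self._execute()

    def continue_next(self) -> Optional[SubtaskRun]:
        # Advance to the next subtask; returns None once the plan is finished.
        self.current += 1
        if self.current >= len(self.runs):
            return None
        return self._execute()
```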

Intelligent Conversation Interface

The main interface combines all features into a seamless experience: natural language input, real-time task processing, session history, and comprehensive result display. You can reference local file paths in your requests, and the system will automatically access and process them.

Roadmap & Community

Development Roadmap

Q1 2025 - Completed

✅ Multi-agent system architecture
✅ Interactive task editing interface
✅ AI dialogue transparency

Q2 2025 - In Progress

🔄 Integration with the IMPROVE model for enhanced accuracy
🔄 Advanced visualization capabilities
🔄 Performance optimization

Q3-Q4 2025 - Planned

📋 Support for multiple LLM providers (Claude, Gemini)
📋 Collaborative workspace features
📋 Custom agent creation toolkit
📋 Enterprise features and API

Coming Soon

IMPROVE Model

Enhanced reasoning with iterative refinement

Advanced Analytics

Interactive visualizations and dashboards

Team Collaboration

Share and collaborate on projects

Get in Touch

We value your feedback and suggestions! Help us improve the platform:

Email Feedback

Contact: xy63@illinois.edu

Contribute

Join our open-source community and help democratize AI:

Star our repository
Report bugs and issues
Suggest new features
Submit pull requests
Improve documentation

View the GitHub repository

Acknowledgments

This project wouldn't be possible without the support and contributions from:

Professor Haohan Wang
Project guidance and mentorship

Chen Ke
Prompter System Model

Eric Xue
IMPROVE Model

Join Us in Democratizing AI

Together, we can make advanced data science accessible to everyone



We use the prompter system model to build a web portal that lets people without ML experience use multi-agent systems for data science tasks.
