MyDataPilot: AI-Powered Data Science Assistant

Xucheng Yu1, Haohan Wang1,
1University of Illinois Urbana-Champaign
Prompter System
Prompter System Model
Distillation Process
Overview

We use the prompter system model to design a web portal for people without ML experiences to use multi-agent systems for data science tasks.

Abstract

Challenge: The complexity of data science workflows presents significant barriers for non-technical professionals seeking data-driven insights.

Solution: We introduce a web portal that democratizes access to multi-agent AI systems, enabling users without machine learning expertise to perform sophisticated data science tasks through natural language interaction.

Key Features:

  • Five specialized AI agents collaborating to decompose complex requests
  • Interactive task review interface with user modification capabilities
  • Transparent AI dialogue visualization with editable reasoning process
  • Real-time streaming execution with step-by-step control

Technology: Built on microservices architecture (Node.js + Python), powered by OpenAI's GPT-4, with React-based frontend featuring session persistence and secure API management.

Backend Architecture

Overview

Our backend employs a microservices architecture with two core services:

  • Node.js Service: Handles authentication, session management, and real-time communication
  • Python Service: Executes AI agent workflows and data processing tasks

Multi-Agent AI System

Task Splitter

Decomposes complex queries into 5-7 atomic subtasks

Task Optimizer

Enriches tasks with dependencies and expected outputs

Code Writer

Generates Python code for data science tasks

Code Reviewer

Validates execution and provides corrections

Answer Extractor

Synthesizes outputs into user-friendly responses

Workflow Process

Backend Workflow
  1. Input Processing: Natural language request → JSON structured format
  2. Task Decomposition: Complex query → 5 atomic subtasks
  3. Task Execution:
    • Coding tasks → Code generation → Execution → Review loop
    • Analysis tasks → Direct domain-specific processing
  4. Output Synthesis: Individual results → Coherent final response

Key Technical Features

Robust Error Handling
  • Token management with intelligent truncation
  • Exponential backoff for rate limits
  • Comprehensive error recovery mechanisms
Transparency & Control
  • Editable AI dialogue sequences
  • Real-time execution monitoring
  • Session persistence and recovery

Technology Stack

Node.js + Express Python + Flask MongoDB OpenAI GPT-4 Server-Sent Events JWT Authentication

In Summary: Our backend transforms natural language into executable data science workflows through collaborative AI agents, providing unprecedented transparency and control while maintaining accessibility for non-technical users.

Frontend Architecture

Design Philosophy

Built with React 18, our interface follows progressive disclosure principles to guide non-technical users through complex data science workflows seamlessly.

Three-Phase User Journey

1. Task Input

Natural language request with intelligent suggestions

2. Plan Review

Visual task breakdown with editing capabilities

3. Execution

Step-by-step monitoring with full control

Key Innovations

Transparent AI Dialogue System

Our unique AIDialogueSequence component reveals the complete reasoning chain of each AI agent:

  • View prompt generation, code writing, and error handling processes
  • Edit any dialogue node to observe downstream effects
  • Understand how AI makes decisions at each step
Granular Execution Control

Every subtask provides real-time control options:

Retry Edit Continue View AI Dialogue

Enhanced Features

Code Intelligence

  • Syntax highlighting with Prism.js
  • One-click code download
  • Direct server file saving

Rich Content Support

  • LaTeX rendering with KaTeX
  • Interactive data visualizations
  • Multi-format export options

Smart Persistence

  • Auto-save with 5-second delay
  • Session history management
  • Work continuity across sessions

Real-time Updates

  • Server-Sent Events streaming
  • Live progress indicators
  • Instant error feedback
AI Reasoning Process

Transparent AI reasoning process

Edit and Regenerate

Edit dialogues and regenerate results

Frontend Technology Stack

React 18 TypeScript Tailwind CSS Server-Sent Events Prism.js KaTeX React Router
Responsive Design

Desktop

Tablet

Mobile

User Experience First: Through carefully crafted visual feedback, loading states, and micro-interactions, our interface ensures that complex AI capabilities remain accessible and intuitive for all users, regardless of their technical background.

Key Features

Secure Authentication

Email-verified accounts with flexible login options

AI Task Planning

Automatic decomposition into manageable subtasks

Session Persistence

Auto-save with complete conversation history

User Authentication & Setup

Our secure authentication system ensures data privacy while providing flexible access options. Users can register with email verification and choose to login with either username or email.

User Login

Secure login interface

User Registration

Email-verified registration

Personal API Key Management

Users maintain control over their AI usage by providing their own OpenAI API key. This ensures data privacy, usage transparency, and cost control.

Encrypted Storage User-Controlled Usage Tracking
API Key Setup

Intelligent Task Decomposition

Our AI automatically breaks down complex requests into 5 manageable subtasks, each with clear objectives and expected outputs. Users can review and modify the plan before execution.

File Integration: Simply provide local file paths in your requests, and our system will automatically access and process your data files as part of the task execution.

Task Plan Review

Interactive task plan review

Edit Task Details

Modify task details before execution

Transparent AI Reasoning

Revolutionary Transparency: View and edit the complete AI reasoning process. Every prompt, code generation step, and decision point is visible and modifiable.

AI Communication Records

Complete AI dialogue history

Unique Feature: Edit & Regenerate

Modify any part of the AI conversation and watch as subsequent responses adapt to your changes.

Edit AI Conversation

Edit AI dialogue content

Regenerated Result

Regenerated results

Granular Execution Control

Full control over task execution with real-time monitoring and intervention capabilities:

  • Retry: Re-execute tasks that encounter errors
  • Edit: Add context or corrections before re-running
  • Continue: Proceed to the next subtask
  • View AI Dialogue: Inspect and modify reasoning process
Subtask Results

Interactive execution controls

Intelligent Conversation Interface

Main Chat Interface

The main interface combines all features into a seamless experience: natural language input, real-time task processing, session history, and comprehensive result display. You can reference local file paths in your requests, and the system will automatically access and process them.

Roadmap & Community

Development Roadmap

Q1 2025 - Completed

  • ✅ Multi-agent system architecture
  • ✅ Interactive task editing interface
  • ✅ AI dialogue transparency

Q2 2025 - In Progress

  • 🔄 Integration with IMPROVE model for enhanced accuracy
  • 🔄 Advanced visualization capabilities
  • 🔄 Performance optimization

Q3-Q4 2025 - Planned

  • 📋 Support for multiple LLM providers (Claude, Gemini)
  • 📋 Collaborative workspace features
  • 📋 Custom agent creation toolkit
  • 📋 Enterprise features and API

Coming Soon

IMPROVE Model

Enhanced reasoning with iterative refinement

Advanced Analytics

Interactive visualizations and dashboards

Team Collaboration

Share and collaborate on projects

Future Optimization Directions

System architecture evolution

Get in Touch

We value your feedback and suggestions! Help us improve the platform:

Contribute

Join our open-source community and help democratize AI:

  • Star our repository
  • Report bugs and issues
  • Suggest new features
  • Submit pull requests
  • Improve documentation
View the Github

Acknowledgments

This project wouldn't be possible without the support and contributions from:

Professor Haohan Wang
Project guidance and mentorship

Chen Ke
Prompter System Model

Eric Xue
IMPROVE Model

Join Us in Democratizing AI

Together, we can make advanced data science accessible to everyone