# Unlock the Power of Your Data: Fine-Tune LLMs with Easy Dataset
Tired of wrestling with complex tools to prepare your data for Large Language Models (LLMs)? Easy Dataset simplifies the entire process, empowering you to transform your domain expertise into high-quality training data. Stop wasting time on tedious data preparation and start fine-tuning your LLMs for optimal performance with this intuitive and powerful tool.
## Key Benefits: Why Choose Easy Dataset?
- Streamlined Workflow: Transform raw documents into structured datasets ready for LLM fine-tuning.
- Enhanced Model Performance: Improve accuracy and relevance by training on domain-specific data.
- Increased Efficiency: Save time and resources with automated data processing and generation.
## Features That Make a Difference
Easy Dataset boasts a comprehensive suite of features designed to accelerate your LLM fine-tuning workflow:
- Intelligent Document Processing: Automatically split Markdown files into logical segments for focused training.
- Smart Question Generation: Extract relevant questions from text segments, ensuring comprehensive coverage of your data.
- Automated Answer Generation: Leverage LLM APIs to generate detailed answers, reducing manual effort.
- Flexible Editing: Refine questions, answers, and datasets at any stage to ensure data quality.
- Versatile Export Options: Export datasets in Alpaca, ShareGPT, JSON, and JSONL formats for seamless compatibility.
- Broad Model Compatibility: Works with any LLM API following the OpenAI format, giving you flexibility.
- User-Friendly Interface: An intuitive UI makes Easy Dataset accessible to both technical and non-technical users.
- Customizable System Prompts: Guide model responses with custom prompts tailored to your specific needs.
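"Broad Model Compatibility" means any provider that exposes an OpenAI-style `/v1/chat/completions` endpoint should work. As a rough sketch, a request body in that format looks like the following (the model name and message contents are illustrative, not taken from Easy Dataset's configuration):

```json
{
  "model": "your-model-name",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Generate questions for this text segment." }
  ],
  "temperature": 0.7
}
```

If your provider accepts a payload shaped like this, it can plug into Easy Dataset's answer-generation step.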
## Getting Started: Your Path to Fine-Tuned LLMs
Ready to experience the power of Easy Dataset? Here's how to get started:
1. **Download the client** or run from source. Running from source requires Node.js 18.x or higher and either pnpm (recommended) or npm.
2. **Clone the repository.**
3. **Install dependencies.**
4. **Start the development server.**
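The source-install steps above map to commands like the following. The repository URL and the `dev` script name are assumptions based on common conventions for this project; substitute your own fork or script names if they differ:

```shell
# Clone the repository (URL is an assumption; adjust if needed)
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset

# Install dependencies (pnpm recommended; "npm install" also works)
pnpm install

# Start the development server
pnpm dev
```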
### Build with a Local Dockerfile (Optional)

For isolated environments, you can run Easy Dataset with Docker:
1. **Clone the repository.**
2. **Build the Docker image.**
3. **Run the container.** Important: replace `{YOUR_LOCAL_DB_PATH}` with the desired location for your local database.
4. **Open in a browser:** access the application at `http://localhost:1717`.
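Concretely, the Docker steps look roughly like this. The repository URL, image tag, and in-container database path (`/app/local-db`) are assumptions; check the project's Dockerfile for the actual mount point:

```shell
# Clone the repository (URL is an assumption; adjust if needed)
git clone https://github.com/ConardLi/easy-dataset.git
cd easy-dataset

# Build the image (tag is illustrative)
docker build -t easy-dataset .

# Run the container, publishing port 1717 and persisting the
# local database on the host; replace {YOUR_LOCAL_DB_PATH}
# with a directory of your choosing
docker run -d -p 1717:1717 \
  -v {YOUR_LOCAL_DB_PATH}:/app/local-db \
  easy-dataset
```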
## Usage: A Step-by-Step Guide to Dataset Creation

Easy Dataset guides you through dataset creation in five steps:
1. **Create a project:** click "Create Project," enter a name and description, and configure your LLM API settings.
2. **Process documents:** upload Markdown files in the "Text Split" section, review the segmented text, and adjust as needed.
3. **Generate questions:** navigate to "Questions," select text segments, and review or edit the generated questions. Organize them using the tag tree.
4. **Create datasets:** go to "Datasets," select the questions to include, generate answers with your configured LLM, and review or edit the results.
5. **Export datasets:** click "Export," choose a dataset style (Alpaca or ShareGPT), a file format (JSON or JSONL), optionally add a custom system prompt, and export.
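For reference, the two export styles structure each record differently. In Alpaca format, a single JSONL line looks roughly like this (field contents are illustrative, not actual Easy Dataset output):

```json
{"instruction": "What is text splitting?", "input": "", "output": "Text splitting divides a document into logical segments so each training example stays focused."}
```

A ShareGPT-style export instead wraps each question–answer pair in a `conversations` array whose turns carry `from` (`human`/`gpt`) and `value` fields.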
## Project Structure: A Glimpse Under the Hood

The project's organized structure makes it easy to navigate and contribute:

- `app/`: Next.js application directory with API routes and front-end pages.
- `components/`: React components for the datasets, home, projects, questions, and text-splitting sections.
- `lib/`: Core libraries and utilities for database operations, internationalization, LLM integration, and text splitting.
- `locales/`: Internationalization resources for English and Chinese.
- `public/`: Static assets, including images.
- `local-db/`: Local file-based database for storing project data.
## Contribute and Shape the Future of LLM Training
Easy Dataset thrives on community contributions! Fork the repository, create a feature branch, make your changes, and submit a pull request.
## License: Open and Accessible
This project is licensed under the Apache License 2.0, ensuring open access and collaboration.