
```python
import json
import os
import time

import requests

# --- Configuration ---
# NOTE: adjust these two values to match your ServBay/Ollama setup.
MODEL_NAME = "llama3"  # Must exactly match the model you downloaded in ServBay
OLLAMA_API_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's generate endpoint

# The source language of the documents (ISO 639-1 code)
SOURCE_LANGUAGE = "en"
# The target language for translation (ISO 639-1 code)
TARGET_LANGUAGE = "zh"  # Chinese as an example
# Root directory containing the documents to be translated
SOURCE_DIRECTORY = "docs"
# Output directory to store the translated documents
OUTPUT_DIRECTORY = "translated_docs"
# Translation prompt prefix for guiding the language model
TRANSLATION_PROMPT = f"Please translate the following {SOURCE_LANGUAGE} text to {TARGET_LANGUAGE}: "


# --- Translation Function ---
def translate_text(text: str, model: str = MODEL_NAME, prompt: str = TRANSLATION_PROMPT) -> str:
    """
    Translates the given text using the specified local language model.

    Args:
        text (str): The text to be translated.
        model (str, optional): The name of the language model to use. Defaults to MODEL_NAME.
        prompt (str, optional): The translation prompt to guide the model. Defaults to TRANSLATION_PROMPT.

    Returns:
        str: The translated text, or None if the translation failed.
    """
    try:
        # Construct structured data for the request
        data = {
            "prompt": f"{prompt}{text}",
            "model": model,
            "stream": False  # Streamed responses are harder to handle in this simple example
        }
        # Send a POST request to the Ollama API
        response = requests.post(OLLAMA_API_URL, data=json.dumps(data), stream=False)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        response_data = response.json()
        # Extract the translated text from the response
        translated_text = response_data.get("response", "")
        return translated_text.strip()
    except requests.exceptions.RequestException as e:
        print(f"Network error occurred: {e}")
        return None
    except json.JSONDecodeError as e:
        print(f"JSON decoding error: {e} Response text: {response.text}")
        return None
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None


def process_markdown_file(input_filepath: str, output_filepath: str):
    """
    Reads a Markdown file, translates its content, and writes the translated
    content to a new file.

    Args:
        input_filepath (str): The full path to the input Markdown file.
        output_filepath (str): The full path to the output translated Markdown file.
    """
    try:
        with open(input_filepath, "r", encoding="utf-8") as infile:
            markdown_content = infile.read()
        print(f"Translating content from: {input_filepath}")
        translated_content = translate_text(markdown_content)
        if translated_content:
            with open(output_filepath, "w", encoding="utf-8") as outfile:
                outfile.write(translated_content)
            print(f"Successfully translated and saved to: {output_filepath}")
        else:
            print(f"Translation failed for {input_filepath}. Check for errors.")
    except FileNotFoundError:
        print(f"Error: Input file not found: {input_filepath}")
    except IOError as e:
        print(f"IO error occurred: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


def traverse_directory_and_translate(source_dir: str, output_dir: str):
    """
    Traverses the source directory, finds Markdown files, and translates them.

    Args:
        source_dir (str): The root directory containing the files to translate.
        output_dir (str): The root directory where translated files will be saved.
    """
    # Ensure the output directory exists; create it if it doesn't
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # Walk through the source directory and its subdirectories
    for root, _, files in os.walk(source_dir):
        for filename in files:
            if filename.endswith(".md"):
                # Create the full input file path
                input_filepath = os.path.join(root, filename)
                # Create corresponding subdirectories in the output directory
                relative_path = os.path.relpath(root, source_dir)
                output_subdir = os.path.join(output_dir, relative_path)
                # Ensure the output subdirectory exists
                os.makedirs(output_subdir, exist_ok=True)
                output_filepath = os.path.join(output_subdir, filename)
                # Process this Markdown file
                process_markdown_file(input_filepath, output_filepath)


def main():
    """Main function to start the batch translation process."""
    start_time = time.time()
    print("Starting batch translation process...")
    # Initiate the traversal and translation
    traverse_directory_and_translate(SOURCE_DIRECTORY, OUTPUT_DIRECTORY)
    end_time = time.time()
    total_time = end_time - start_time
    print(f"Batch translation completed in {total_time:.2f} seconds.")


if __name__ == "__main__":
    main()
```
How the Script Works:
- Configuration: The script starts by defining several configuration variables. Change them to match your setup, especially the model name.
- Translation Function: The translate_text function sends the Markdown content to the locally running Ollama model and retrieves the translation. It uses a prompt to instruct the LLM to translate, parses the JSON response, and returns the result. (A quick way to sanity-check this function is sketched after this list.)
- Markdown Processing: The process_markdown_file function reads a file, calls translate_text, and writes the translated output to a corresponding output file.
- Directory Traversal: The traverse_directory_and_translate function recursively scans your source directory, processes each .md file it finds, and saves the translated files to a parallel directory structure in the destination folder, creating the destination directory automatically if it doesn't exist.
- Error Handling: The script catches the common failure modes, such as file access problems, network errors, JSON parsing failures, and unexpected exceptions, so it keeps running and prints an informative message when something goes wrong.
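Before running the full batch, you can sanity-check the translation function on a single string. This is a minimal sketch; it assumes the script is saved as translate_script.py and that its MODEL_NAME and OLLAMA_API_URL match your local setup:

```python
# Quick sanity check: translate one sentence before processing whole files.
from translate_script import translate_text

result = translate_text("ServBay makes local AI development easy.")
print(result if result else "Translation failed -- is Ollama running?")
```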
Preparing and Running the Script
Save the above code in a file named translate_script.py inside the servbay_translator directory.
Open your terminal, navigate to the servbay_translator directory, and run the script:
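```bash
python3 translate_script.py
# or, depending on how Python is invoked on your system:
python translate_script.py
```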
Examine Results: After the script completes successfully, a translated_docs directory will be created containing the translated Markdown files. The directory structure mirrors that of the original docs directory.
Troubleshooting
- Ollama Not Running: Make sure the Ollama service is running in ServBay. If it's not, start it from the ServBay interface and check the Ollama configuration for any error messages.
- Model Name: Double-check that the MODEL_NAME in your script exactly matches the name of the model you downloaded in ServBay. Model names are case-sensitive.
- Network Issues: If you get network errors, make sure nothing is blocking connections to http://127.0.0.1:11434.
- Permission Issues: Make sure your script has read/write permissions for the source and destination directories.
Part 4: Optimizing Translation Quality and Performance
Choosing the Right Model:
- Model Capabilities Vary: Not all models are created equal. Some are better at creative writing, while others excel at technical translation. Experiment with different models to find the one that yields the best results for your specific content.
- Specialized vs. General-Purpose: While general-purpose models like Llama 3 or Mistral often perform admirably, models specifically trained for translation (where available) may offer superior accuracy and fluency.
- Model Size Matters: Larger models (those with more parameters) generally exhibit greater language understanding and generation capabilities, but they also demand more memory and processing power. Strike a balance between translation quality and your hardware constraints.
Prompt Engineering for Better Results:
- Be Specific: The more precise your instructions, the better the model can understand your intent. Instead of a generic "translate this," try "Translate the following English text to Chinese, maintaining the original Markdown formatting."
- Context is Key: Provide context where it helps. If translating a technical document, specify the subject matter so the model chooses appropriate terminology.
- Control Tone: If you want a specific tone (e.g., formal, informal, humorous), instruct the model accordingly in the prompt.
- Iterative Refinement: Experiment with different prompts, evaluate the results, and refine based on the model's output until you reach the desired quality. An example of a tightened prompt follows this list.
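Putting those ideas together, here is one way the script's TRANSLATION_PROMPT could be rewritten. The exact wording is only a starting point to iterate on, not a recommended final prompt:

```python
# A more specific prompt: names the languages, pins down formatting,
# supplies subject-matter context, and sets the tone.
SOURCE_LANGUAGE, TARGET_LANGUAGE = "en", "zh"  # as in the script's configuration

TRANSLATION_PROMPT = (
    f"You are a professional technical translator. "
    f"Translate the following {SOURCE_LANGUAGE} text to {TARGET_LANGUAGE}. "
    f"Preserve all Markdown formatting (headings, links, code blocks) exactly. "
    f"The text is developer documentation, so keep technical terms precise. "
    f"Use a formal, concise tone. Output only the translation:\n\n"
)
```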
Leveraging GPU Acceleration:
- GPU Benefits: If you have a compatible NVIDIA or Apple Silicon GPU, Ollama can leverage it to significantly accelerate inference, especially for larger models.
- ServBay Configuration: By default, ServBay usually detects and uses the GPU automatically, but confirm it is enabled in the settings (see the Ollama configuration options described earlier) and that the necessary drivers are installed for your GPU. You may need to tune additional parameters (e.g., GPU Overhead) based on ServBay's and Ollama's documentation and your specific GPU model.
- Monitoring Performance: Observe GPU utilization during translation. If the GPU is not fully utilized, there may be configuration issues or bottlenecks elsewhere in the system.
Optimizing Batch Processing:
- Parallel Processing: For very large translation jobs, consider multi-threading or parallel processing to issue several requests at once; the Parallel Num. option in ServBay's Ollama configuration can be tuned to match. This can dramatically reduce the overall translation time (a sketch follows this list).
- Batch Size: Experiment with different batch sizes (the amount of text sent to the model in each request). Larger batches can sometimes improve throughput, but excessively large batches may exceed memory limits or increase latency.
- Caching: Storing translations of frequently repeated phrases or sentences avoids redundant work. The example script does not do this, and Ollama itself does not provide an API for custom caching, so you would have to build it on the client side (e.g., keyed by a hash of the source text).
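As a sketch of the parallel-processing idea (not part of the original script), the per-file work can be fanned out with Python's standard concurrent.futures. Threads are appropriate here because each worker spends most of its time waiting on the Ollama API; keep max_workers in line with the Parallel Num. setting so you don't overload the model:

```python
# Sketch: parallelize per-file translation with a small thread pool.
import os
from concurrent.futures import ThreadPoolExecutor

from translate_script import (OUTPUT_DIRECTORY, SOURCE_DIRECTORY,
                              process_markdown_file)

def collect_jobs(source_dir: str, output_dir: str):
    """Yield (input_path, output_path) pairs for every Markdown file."""
    for root, _, files in os.walk(source_dir):
        for filename in files:
            if filename.endswith(".md"):
                relative = os.path.relpath(root, source_dir)
                out_dir = os.path.join(output_dir, relative)
                os.makedirs(out_dir, exist_ok=True)
                yield os.path.join(root, filename), os.path.join(out_dir, filename)

# Match max_workers to the Parallel Num. value configured in ServBay.
with ThreadPoolExecutor(max_workers=4) as pool:
    for input_path, output_path in collect_jobs(SOURCE_DIRECTORY, OUTPUT_DIRECTORY):
        pool.submit(process_markdown_file, input_path, output_path)
```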
Handling Specific File Types:
- Markdown Preservation: The example script preserves Markdown formatting, but certain complex elements may require special handling. Test the translation against a variety of Markdown features to confirm compatibility.
- Other Formats: Adapt the script to other file types (e.g., HTML, plain text, Word documents) by modifying the file reading and writing functions accordingly. Consider libraries such as Beautiful Soup for HTML parsing or python-docx for Word documents; a minimal HTML sketch follows this list.
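For HTML, one approach is to translate only the text nodes and leave the markup untouched. This is a minimal sketch, assuming Beautiful Soup is installed (pip install beautifulsoup4) and translate_script.py is importable:

```python
# Sketch: translate the text nodes of an HTML file, leaving tags intact.
from bs4 import BeautifulSoup

from translate_script import translate_text

def translate_html_file(input_path: str, output_path: str):
    with open(input_path, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    # Walk every text node; skip script/style contents and pure whitespace.
    for node in soup.find_all(string=True):
        if node.parent.name in ("script", "style") or not node.strip():
            continue
        translated = translate_text(str(node))
        if translated:
            node.replace_with(translated)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(str(soup))
```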
Post-Processing:
- Manual Review: For critical documents, always review the translated content manually to ensure accuracy and catch any subtle errors or nuances the model may have missed.
- Terminology Consistency: Use glossaries or terminology management tools to keep key terms consistent throughout the translated documents; a lightweight prompt-based version is sketched below.
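Short of a full terminology-management tool, one lightweight option is to inject a small glossary directly into the translation prompt. A sketch, with hypothetical example terms:

```python
# Sketch: steer the model toward consistent terminology via the prompt.
GLOSSARY = {
    "web server": "Web 服务器",
    "database": "数据库",
}

glossary_lines = "\n".join(
    f'- Translate "{src}" as "{dst}"' for src, dst in GLOSSARY.items()
)
TRANSLATION_PROMPT = (
    "Translate the following English text to Chinese, "
    "strictly following this glossary:\n"
    f"{glossary_lines}\n\nText:\n"
)
```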
Conclusion: Your Private, Efficient Translation Powerhouse
By combining ServBay's user-friendly local AI integration with a customized Python script, you've created a powerful, private, and cost-effective solution for batch document translation. This approach not only protects your sensitive data but also offers the flexibility to fine-tune the translation process to meet your specific needs. As local LLM technology continues to evolve, expect even greater capabilities and efficiencies in the future!
Get started with ServBay today and experience the future of local AI-powered development!
Effortlessly Translate Documents Locally: A ServBay & Local AI Guide
Why Translate Documents Locally? Data Privacy and Cost Savings
In today’s global landscape, translating documents into multiple languages expands reach and impact. However, online translation services raise data privacy concerns and can be costly. Discover how to leverage ServBay and local AI for secure, budget-friendly, and efficient document translation.
ServBay: Your All-in-One Web Development and Local AI Environment
ServBay simplifies web development with easy management of servers, databases, and language environments. Its latest version integrates Ollama, enabling you to run powerful AI models locally.
- Simplified Management: One-stop management of web servers, databases, and multiple PHP, Python, Java, .Net, and Node.js versions.
- Easy Switching: Effortlessly switch between Nginx, Apache, and Caddy servers.
- Database Support: Supports MySQL, PostgreSQL, and MongoDB databases.
- Integrated AI/LLM: Built-in support for AI with Ollama integration.
Ollama: Your Local LLM Swiss Army Knife for Seamless AI Integration
Ollama simplifies running large language models (LLMs) like Llama 3 and Mistral locally. ServBay’s integration offers:
- One-Click Installation: Simplified setup without complex manual configuration.
- Visual Configuration: Fine-tune settings like model download threads and GPU optimization via a user-friendly interface.
- Unified Ecosystem: Seamless integration of web development and AI inference environments.
Streamline AI Model Management with ServBay: Find, Download, and Deploy
ServBay simplifies AI model selection and management:
- Model Listing and Search: Easily search the official Ollama model library.
- One-Click Download and Deletion: Download and manage models with a single click.
- Local Model Overview: Clear overview of locally available AI models.
Why Choose Local AI for Translation? Unlocking the Core Value
Local AI via Ollama in ServBay offers critical advantages over online services:
- Data Privacy: All processing happens on your computer, keeping sensitive information secure.
- Cost-Effectiveness: No API call fees, reducing long-term expenses.
- Offline Availability: Translate documents without an internet connection.
- Customization and Control: Choose the best model for your needs and experiment with settings.
- Low Latency: Faster response times compared to remote APIs.
Step-by-Step Guide: Installing and Configuring Ollama in ServBay
Let's configure Ollama in ServBay for translation tasks:
- Navigate to AI Configuration: Launch ServBay and click "AI," then "Ollama."
- Adjust Configuration (Optional): Review and adjust settings like Bind IP, Bind Port, and Model folder.
- Navigate to Model Management: Under "AI," click "Models (Ollama)".
- Download a Translation Model: Search for models like "Llama3" or "Mistral" and download your preferred version.
- Verify Installation: Confirm the model appears in the installed models list.
- Confirm Ollama Service Running: Ensure the Ollama service is active; a quick way to verify this from Python is sketched below.
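A minimal reachability check, assuming the default address mentioned in the troubleshooting notes above:

```python
# Check that the local Ollama service is reachable before translating.
import requests

try:
    r = requests.get("http://127.0.0.1:11434", timeout=5)
    print("Ollama responded:", r.status_code, r.text.strip())
except requests.exceptions.RequestException as e:
    print("Could not reach Ollama:", e)
```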
Python Script: Automate Batch Translation of Markdown Files
Now, let's create a Python script to automate translating all Markdown files within a folder.
Prerequisites
- Python installed (ServBay includes Python 3).
- The requests library, installed via pip install requests (or pip3 install requests)
Project Structure
```
servbay_translator/
├── docs/
│   ├── introduction.md
│   ├── chapter1/
│   │   └── setup.md
│   └── chapter2/
│       └── usage.md
└── translate_script.py
```