
```python
import json
import os
import time

import requests

# --- Configuration Constants ---
# Ollama API endpoint (matches ServBay's default Ollama configuration)
OLLAMA_API_URL = "http://127.0.0.1:11434/api/generate"
# Exact name of the model installed in Ollama
MODEL_NAME = "llama3:8b"
# Location of source Markdown files
SOURCE_DIR = "docs"
# Location to save the translated Markdown files
TARGET_DIR = "translated_docs"
# Target language for translation
TARGET_LANGUAGE = "German"
# Ollama API request timeout, in seconds
REQUEST_TIMEOUT = 300

# --- Translation Function ---
def translate_text(text, target_language):
    try:
        payload = {
            "prompt": f"Translate the following text to {target_language}: {text}",
            "model": MODEL_NAME,
            "stream": False  # Set to False to get the complete translation at once
        }
        response = requests.post(
            OLLAMA_API_URL,
            data=json.dumps(payload),
            stream=False,
            timeout=REQUEST_TIMEOUT,
        )
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        json_data = response.json()
        return json_data["response"].strip()
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None
    except json.JSONDecodeError:
        print("Failed to decode JSON response")
        return None
    except KeyError:
        print("Key 'response' not found in JSON")
        return None

# --- File System Functions ---
def process_markdown_file(input_file, output_file, target_language):
    try:
        with open(input_file, "r", encoding="utf-8") as infile:
            markdown_content = infile.read()
        translated_content = translate_text(markdown_content, target_language)
        if translated_content:
            with open(output_file, "w", encoding="utf-8") as outfile:
                outfile.write(translated_content)
            print(f"Translated: {input_file} -> {output_file}")
        else:
            print(f"Translation failed for: {input_file}")
    except FileNotFoundError:
        print(f"Error: Input file not found: {input_file}")
    except Exception as e:
        print(f"An error occurred processing {input_file}: {e}")

def process_directory(source_dir, target_dir, target_language):
    for root, _, files in os.walk(source_dir):
        for filename in files:
            if filename.endswith(".md"):
                input_file = os.path.join(root, filename)
                # Create corresponding directory structure in the translated_docs folder
                relative_path = os.path.relpath(input_file, source_dir)
                output_file = os.path.join(target_dir, relative_path)
                output_dir = os.path.dirname(output_file)
                os.makedirs(output_dir, exist_ok=True)  # Ensure directories exist
                process_markdown_file(input_file, output_file, target_language)

# --- Main Script Execution ---
if __name__ == "__main__":
    start_time = time.time()
    # Ensure the target directory exists
    os.makedirs(TARGET_DIR, exist_ok=True)
    process_directory(SOURCE_DIR, TARGET_DIR, TARGET_LANGUAGE)
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Translation process completed in {elapsed_time:.2f} seconds.")
```
Code Explanation:
Configuration Constants:
- `OLLAMA_API_URL`: Defines the endpoint for the Ollama API. The default is `http://127.0.0.1:11434/api/generate`, which aligns with ServBay's default Ollama configuration.
- `MODEL_NAME`: Specifies the exact model name used for translation. This name must precisely match the model name installed in Ollama within ServBay (e.g., `"llama3:8b"`). Ensure this is correct.
- `SOURCE_DIR`: Sets the directory containing the source Markdown files to be translated (default is `"docs"`).
- `TARGET_DIR`: Sets the directory where translated files will be saved (default is `"translated_docs"`).
- `TARGET_LANGUAGE`: Sets the target language for the translation (default is `"German"`).
- `REQUEST_TIMEOUT`: Sets the timeout for API requests, since local LLMs can take time to respond.
Translation Function:
- `translate_text(text, target_language)`: Sends the text to be translated to the local Ollama API and returns the translated text. It constructs a JSON payload with the prompt, model name, and other necessary parameters, sends a POST request to the API, and retrieves the translated content from the JSON response.
- Error Handling: The function includes robust error handling, capturing potential issues such as network errors (`requests.exceptions.RequestException`), JSON decoding failures (`json.JSONDecodeError`), and missing keys in the JSON response (`KeyError`).
File System Functions:
- `process_markdown_file(input_file, output_file, target_language)`: Reads the content of a single Markdown file, translates it using the `translate_text` function, and saves the translated content to the specified output file. It also includes error handling for file operations.
- `process_directory(source_dir, target_dir, target_language)`: Recursively traverses the source directory, finds all Markdown files, and calls `process_markdown_file` to translate each one. It preserves the original directory structure in the translated output.
Main Script Execution:
- The `if __name__ == "__main__":` block ensures that the translation process starts only when the script is executed directly.
- It creates the target directory if it does not exist, calls `process_directory` to recursively translate all Markdown files in the source directory, and measures the total execution time.
Usage:
1. Place your Markdown files in the `docs` folder.
2. Run the script: `python translate_script.py`.
3. Find the translated Markdown files in the `translated_docs` folder, with the same directory structure as the original `docs` folder.
Customization:
- `MODEL_NAME`: Change this to the name of the translation model installed in your ServBay Ollama setup (e.g., `"mistral:7b"`).
- `SOURCE_DIR`: Modify this to point to the directory containing the Markdown files you want to translate.
- `TARGET_DIR`: Change this to specify where you want the translated files to be saved.
- `TARGET_LANGUAGE`: Set this to the desired output language (e.g., `"Spanish"`, `"French"`, `"Japanese"`).
- Timeout Values: Adjust `REQUEST_TIMEOUT` if the translations are timing out.
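For example, a customized configuration for translating a hypothetical `manuals` folder into Spanish with a Mistral model might look like this (the folder names here are illustrative, not part of the original script):

```python
MODEL_NAME = "mistral:7b"
SOURCE_DIR = "manuals"        # illustrative source folder
TARGET_DIR = "manuals_es"     # illustrative output folder
TARGET_LANGUAGE = "Spanish"
REQUEST_TIMEOUT = 600         # raise this if translations time out
```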
Important Considerations:
- Resource Usage: Ensure your machine has enough RAM to load the LLM model (`MODEL_NAME`). If you encounter memory issues, try using a smaller model (e.g., a 7B-parameter model instead of a 13B or larger model).
- Model Choice: Experiment with different LLM models to find the one that provides the best translation quality for your specific needs. General-purpose models like Llama 3 and Mistral often perform well, but specialized translation models (if available) might offer better results.
- Error Handling: The script includes basic error handling, but you might want to add more sophisticated error logging and reporting for production use.
- Performance: Local LLM inference can be slow, especially on CPUs. Consider using a GPU to accelerate the translation process if possible. You can also adjust parameters like `stream` in the `translate_text` function to optimize performance: setting `stream` to `True` provides incremental translation results, which can be useful for long documents.
- Batch Processing: For very large batches of files, consider implementing a more robust batch-processing mechanism with error recovery and parallel processing (using libraries like `concurrent.futures`) to improve throughput and resilience.
By following these steps and customizing the script appropriately, you can efficiently translate large amounts of Markdown content locally using ServBay's AI capabilities.
Part 4: Advanced Usage and Tips
Beyond the basic translation script, here are some advanced tips and techniques to enhance your document translation workflow with ServBay and local LLMs.
- Optimizing Model Selection
Experiment with different LLMs to find the best fit for your content type and target language. Consider these factors:
General vs. Specialized Models: While general-purpose models like Llama 3 and Mistral perform well, models specifically trained for translation might offer superior accuracy and fluency in certain language pairs. Check the Ollama library for specialized translation models.
Parameter Size: Larger models (e.g., 70B parameters) generally provide higher quality translations, but they require more resources (RAM, VRAM) and are slower. Balance quality with performance by choosing a model size appropriate for your hardware. For batch processing, a 7B or 8B model can be a good starting point.
Instruction Following: Models tagged as `instruct` or `chat` are typically better at following translation instructions.
- Enhancing Translation Prompts
The quality of the translation greatly depends on the prompt you provide to the LLM. Refine your prompts for better results:
Specific Instructions: Instead of a generic "Translate this text," provide more specific instructions:
"Translate this technical document into French, maintaining a formal tone."
"Translate this blog post into Spanish, adapting it for a Latin American audience."
Contextual Information: Include relevant context to help the LLM understand the text better. For example:
"Translate this sentence about quantum physics into German: '...' Make sure to use the correct technical terminology."
Control Output Style: Specify the desired tone, style, and level of formality to align the translation with the intended audience.
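These refinements can be factored into a small prompt-building helper. The following is a sketch; the `tone` and `audience` parameters are illustrative additions, not part of the original script:

```python
def build_prompt(text, target_language, tone=None, audience=None):
    """Construct a translation prompt with optional style instructions."""
    instruction = f"Translate the following text to {target_language}"
    if tone:
        instruction += f", maintaining a {tone} tone"
    if audience:
        instruction += f", adapting it for {audience}"
    return f"{instruction}: {text}"
```

Passing the result of `build_prompt(...)` as the `prompt` field of the payload keeps the request code unchanged while making the style instructions configurable.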
- Handling Complex Markdown Structures
The basic script translates the entire Markdown file as a single text block. For more complex documents, you might need to handle Markdown structures (headings, lists, code blocks) differently:
Split by Paragraphs: Split the Markdown content into paragraphs and translate each paragraph separately to preserve formatting and context within each paragraph.
Ignore Code Blocks: Identify and exclude code blocks from translation to avoid unintended modifications.
Preserve Headings: Ensure headings are correctly translated and retain their Markdown formatting.
Use Regular Expressions: Employ regular expressions to identify and manipulate Markdown elements for more precise control over the translation process.
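One way to sketch this pre-processing step is to stash fenced code blocks behind placeholders before splitting into paragraphs, then restore them after translation. The regex and placeholder scheme below are illustrative choices, not part of the original script:

```python
import re

# Matches fenced code blocks, including their contents, across lines.
CODE_BLOCK_RE = re.compile(r"```.*?```", re.DOTALL)

def split_for_translation(markdown):
    """Split Markdown into paragraphs, replacing fenced code blocks
    with placeholders so they are never sent to the model."""
    code_blocks = []

    def stash(match):
        code_blocks.append(match.group(0))
        return f"__CODE_BLOCK_{len(code_blocks) - 1}__"

    text = CODE_BLOCK_RE.sub(stash, markdown)
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return paragraphs, code_blocks

def restore_code_blocks(text, code_blocks):
    """Put the original code blocks back after translation."""
    for i, block in enumerate(code_blocks):
        text = text.replace(f"__CODE_BLOCK_{i}__", block)
    return text
```

Each paragraph can then be passed to `translate_text` individually, and the placeholders guarantee the code blocks come back byte-for-byte identical.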
- Leveraging Streaming for Large Documents
For very large documents, the `stream: False` setting in the Ollama API request might lead to long processing times or timeouts. Consider using `stream: True` to receive the translation in chunks:
Modify the `translate_text` function to handle streaming responses:
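The original streaming listing is not reproduced here, so the following is a sketch of one way to do it, based on Ollama's newline-delimited JSON streaming format; `collect_stream_chunks` is a helper introduced for illustration:

```python
import json
import requests

OLLAMA_API_URL = "http://127.0.0.1:11434/api/generate"  # ServBay default
MODEL_NAME = "llama3:8b"  # must match your installed model
REQUEST_TIMEOUT = 300

def collect_stream_chunks(lines):
    """Join the 'response' fields of newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if line:
            chunk = json.loads(line)
            parts.append(chunk.get("response", ""))
            if chunk.get("done"):
                break
    return "".join(parts).strip()

def translate_text(text, target_language):
    payload = {
        "prompt": f"Translate the following text to {target_language}: {text}",
        "model": MODEL_NAME,
        "stream": True,  # Receive the translation incrementally
    }
    try:
        response = requests.post(OLLAMA_API_URL, data=json.dumps(payload),
                                 stream=True, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        return collect_stream_chunks(response.iter_lines())
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None
```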
This code iterates through the streaming response, extracts the translated text from each chunk, and concatenates them to form the complete translation.
- Implementing Error Handling and Logging
Enhance the script's error handling for production use:
Log Errors: Record detailed error messages, including timestamps, file names, and exception details, to a log file for debugging and analysis.
Retry Mechanism: Implement a retry mechanism to automatically retry failed translations due to temporary network issues or API errors.
Skip Corrupted Files: Add logic to detect and skip corrupted or unreadable Markdown files to prevent the script from crashing.
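A minimal sketch combining logging with a retry wrapper follows; the retry count, delay, and log file name are arbitrary choices, and `with_retries` is a helper introduced for illustration:

```python
import logging
import time

logging.basicConfig(
    filename="translation.log",  # illustrative log file name
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def with_retries(func, *args, max_attempts=3, delay=2, **kwargs):
    """Call func, retrying on None results or exceptions."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = func(*args, **kwargs)
            if result is not None:
                return result
            logging.warning("Attempt %d returned no result", attempt)
        except Exception:
            logging.exception("Attempt %d raised an exception", attempt)
        if attempt < max_attempts:
            time.sleep(delay)
    return None
```

Wrapping the API call as `with_retries(translate_text, content, TARGET_LANGUAGE)` keeps the rest of the script unchanged.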
- Parallel Processing for Speed
To accelerate the translation of large batches of files, use parallel processing with the `concurrent.futures` library: