What does this do?
It translates a specified region of your screen from a source language into another language, while also providing the relevant romanisation (including pinyin and furigana) as a pronunciation guide. The main goal is to help people with a low/basic understanding of a language develop it further by giving them a tool to immerse themselves in native content. Main uses include, but are not limited to, playing games and watching videos with subtitles in another language (although it might be better to obtain an audio transcription, translate it and replace the subtitles where possible -- this is not always feasible if you are watching many episodes and/or watching videos spontaneously).
Limitations
If learn mode is enabled, the added translations and romanisation naturally make the text take up three times the space, so the app is less suitable for tightly packed text. You can optionally change the config to insert smaller text, or reduce your screen's overall font size so less text is captured. A pure translation mode also exists, although for web browsing Google itself provides a more reliable method of translation that does not rely on computationally heavy optical character recognition (OCR).
Usage (draft)
- Clone the repository, navigate into it, and install all required packages with `pip install -r requirements.txt` in a new Python environment (the OCR packages are very finicky).
- If using external APIs, you will need to obtain API keys for the currently supported sites [Google (Gemini models), Groq (an assortment of open-source LLMs)] and add the associated keys to the environment variables file. If using another API, you will need to define a new class with a `_request` function in `helpers/batching.py`, inheriting from the `ApiModels` class. A template is provided in that file under the `Gemini` and `Groq` classes. All exception handling is already taken care of; a minimal sketch follows.
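  As an illustration only -- the provider name, endpoint, payload shape and `_request` signature below are all assumptions, so follow the `Gemini`/`Groq` templates in `helpers/batching.py` for the real interface:

  ```python
  import os

  import requests

  from helpers.batching import ApiModels  # base class provided by the repo


  class OpenRouter(ApiModels):
      """Hypothetical extra provider, mirroring the Gemini/Groq templates."""

      def _request(self, prompt: str, model: str) -> str:
          # Assumed: ApiModels handles retries/exceptions, so this method
          # only performs the raw HTTP call and extracts the reply text.
          resp = requests.post(
              "https://openrouter.ai/api/v1/chat/completions",
              headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
              json={"model": model, "messages": [{"role": "user", "content": prompt}]},
              timeout=30,
          )
          resp.raise_for_status()
          return resp.json()["choices"][0]["message"]["content"]
  ```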
- Edit the `api_models.json` file for the models you want added. The first level of the JSON file is the respective class name defined in `helpers/batching.py`. The second level defines the `model` names from their corresponding API endpoints. The third level specifies the rates of each model: `rpmin`, `rph`, `rpd`, `rpw`, `rpmth`, `rpy` are respectively the rates per minute, hour, day, week, month and year. A sketch of the layout is given below.
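  A sketch of that three-level layout (the model names and rate values here are illustrative, and whether all six rate keys must be present is not documented here):

  ```json
  {
    "Gemini": {
      "gemini-1.5-flash": { "rpmin": 15, "rpd": 1500 },
      "gemini-1.5-pro": { "rpmin": 2, "rpd": 50 }
    },
    "Groq": {
      "llama-3.1-8b-instant": { "rpmin": 30, "rpd": 14400 }
    }
  }
  ```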
- Create and edit the `.env` config file. For information about all the variables to edit, check the section under "EDIT THESE ENVIRONMENTAL VARIABLES" in the `config.py` file. If CUDA is not detected, the app will default to CPU mode for all local LLMs and OCRs. In that case it is recommended to set the `OCR_MODEL` variable to `rapid`, which is optimised for CPUs; currently this is only supported with `SOURCE_LANG=ch_tra`, `ch_sim` or `en`. Refer to [notes][1]. A sketch of a `.env` is given below.
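  A minimal `.env` sketch for a CPU-only setup. Only `OCR_MODEL` and `SOURCE_LANG` are named in this README; the API key variable names are assumptions, so check `config.py` for the authoritative list:

  ```bash
  # CPU-friendly OCR backend (see the note above)
  OCR_MODEL=rapid
  # language to OCR off the screen; rapid supports ch_tra, ch_sim, en
  SOURCE_LANG=ch_sim
  # keys for external APIs, if used (variable names are assumptions)
  GEMINI_API_KEY=your-key-here
  GROQ_API_KEY=your-key-here
  ```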
- If you are using the `wayland` display protocol (only available for Linux -- check with `echo $WAYLAND_DISPLAY`), install the `grim` package locally with your package manager:
  - Arch Linux-based: `sudo pacman -S grim`
  - Debian-based: `sudo apt install grim`
  - Fedora: `dnf install grim`
  - openSUSE: `zypper install grim`
  - NixOS: `nix-shell -p grim`

  Screenshotting is limited in Wayland, and grim is one of the more lightweight options out there.
- The RapidOCR, PaddleOCR and (possibly -- I can't remember) the easyOCR models need to be downloaded before any of this can be used. They should download when you execute a function that initialises the model with the desired OCR language, but this appears to not work well when running the app directly (I'll add more to this later...). The same obviously holds for the local translation and LLM models. One way to trigger the downloads manually is sketched below.
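  A generic way to pre-fetch the OCR models from a Python shell, assuming the stock `easyocr`, `paddleocr` and `rapidocr_onnxruntime` packages (this is not a repo helper, just each library's own constructor, which fetches its weights on first use):

  ```python
  from easyocr import Reader
  from paddleocr import PaddleOCR
  from rapidocr_onnxruntime import RapidOCR

  Reader(["ch_sim", "en"])  # easyOCR: downloads detection + recognition models
  PaddleOCR(lang="ch")      # PaddleOCR: downloads det/rec/cls models
  RapidOCR()                # RapidOCR: loads the bundled ONNX models
  ```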
- Run the `main.py` file and a Qt6 app should appear. Alternatively, if that doesn't work, go to the last line of the `main.py` file and edit the argument to `web`, which will run the translations locally on `0.0.0.0:5000` or on any other port you specify.
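  For reference, both frontends are launched the same way; the frontend switch lives on the last line of `main.py` as described above:

  ```bash
  python main.py   # Qt6 desktop app by default
  # after editing the last line of main.py to "web":
  python main.py   # web app served on http://0.0.0.0:5000
  ```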
Notes and optimisations
- Accuracy is limited with RapidOCR, especially if the graphics have a high dynamic range. [1]
- Consider lowering the quality of the screen capture for faster OCR processing and lower screen capture time -> OCR accuracy and the subsequent translations can be affected, but the entire translation process should come in under 2 seconds without sacrificing too much OCR quality. Edit the `printsc` functions in `helpers/utils.py` (a config option for this is planned); a generic illustration of the idea follows.
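  As a generic illustration (not the repo's `printsc` code), capturing a monitor and halving its resolution before OCR, assuming the `mss` and `Pillow` packages:

  ```python
  from PIL import Image
  import mss

  # Grab the primary monitor and downscale it 2x before handing it to OCR.
  with mss.mss() as sct:
      shot = sct.grab(sct.monitors[1])
      img = Image.frombytes("RGB", shot.size, shot.rgb)
      img = img.resize((img.width // 2, img.height // 2), Image.BILINEAR)
  ```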
- Not much of the database aspect has been worked on yet. Right now it stores all the texts and translations in Unicode/ASCII in the `database/translations.db` file. Use it however you want; it is stored locally, only for you. A quick way to poke at it is shown below.
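  Assuming the `.db` file is SQLite (the extension suggests so, but this is an assumption), you can inspect it with the `sqlite3` CLI; the table and column names are not documented here, so list them first:

  ```bash
  sqlite3 database/translations.db ".tables"   # list the tables
  sqlite3 database/translations.db ".schema"   # show the column layout
  ```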
- Downloading all the models may take up a few GBs of space.
- About 3.5GB of VRAM is used by easyOCR, and up to 1.5GB of VRAM by PaddleOCR and RapidOCR.
Debugging Issues
- CUDNN version mismatch when using PaddleOCR: check that `LD_LIBRARY_PATH` is correctly set to the directory containing the `cudnn.so` file. If using a local installation, it could help to just remove `nvidia-cudnn-cu12` from your Python environment. A way to locate the library is sketched below.
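  One way to find the pip-installed cuDNN library and point `LD_LIBRARY_PATH` at it (the export path is an example -- substitute the directory that `find` prints):

  ```bash
  # locate the libcudnn.so shipped by the nvidia-cudnn-cu12 wheel
  find "$(python -c 'import site; print(site.getsitepackages()[0])')" -name 'libcudnn.so*'
  # then export the containing directory, e.g.:
  export LD_LIBRARY_PATH="/path/to/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH"
  ```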
- Segmentation fault when using PaddleOCR, EasyOCR or RapidOCR: ensure the only `cv2` library installed is `opencv-contrib-python`. Check out https://pypi.org/project/opencv-python-headless/ for more info. A quick check is shown below.
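  A quick way to check for and remove conflicting OpenCV wheels:

  ```bash
  pip list | grep -i opencv   # should show only opencv-contrib-python
  pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python-headless
  pip install opencv-contrib-python
  ```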
Demo
Demo of a Korean to Chinese (simplified) translation with the learn-cover mode (a mode intended for people learning the language to see the romanisation/pinyin/furigana etc. with the translation above).
TODO:
- Create an overlay window that works in Wayland.
- Make use of the translation data -> maybe make a personalised game that uses it.
Terms of Use
By using this application, you agree to the following terms and conditions.
Data Collection and External API Use
1.1 Onscreen Data Transmission: The application is designed to send data displayed on your screen, including potentially sensitive or personal information, to an external API if local processing is not set up.
1.2 Third-Party API Integration: When local methods cannot fulfill certain functions, the App will transmit data to external third-party APIs. These APIs are not under our control, and we do not guarantee the security, confidentiality, or purpose of use of the data once transmitted.
Acknowledgment
By using the app, you acknowledge that you have read, understood, and agree to these Terms of Use, including the potential risks associated with transmitting data to external APIs.