What does this do?
It translates a specified region of your screen from a source language into another language, while also providing the relevant romanisation (including pinyin and furigana) as a pronunciation guide. The main goal is to help people with a low/basic understanding of a language develop it further by giving them a tool to immerse themselves in native content. Main uses include, but are not limited to, playing games and watching videos with subtitles in another language (although it might be better to obtain an audio transcription, translate it and replace the subtitles where possible -- this is not always feasible if you are watching many episodes and/or watching videos spontaneously).
Limitations
If learn mode is enabled, the added translations and romanisation naturally make the text take up three times the space, so the app is less suitable for tightly packed text. You can optionally change the config to insert smaller text, or reduce your screen's overall font size so less text is captured. A pure translation mode also exists, although for web browsing Google itself provides a more reliable method of translation that does not rely on computationally heavy optical character recognition (OCR).
Usage (draft)
- Clone the repository, navigate into it, and install all required packages with `pip install -r requirements.txt` in a new Python environment (the OCR packages are very finicky).
- If using external APIs, you will need to obtain API keys for the currently supported sites [Google (Gemini models), Groq (an assortment of open-source LLMs)] and add the associated keys to the environment variables file. If using another API, you will need to define a new class with a `_request` function in `helpers/batching.py`, inheriting from the `ApiModels` class. A template is provided in that file under the `Gemini` and `Groq` classes. All exception handling is already taken care of; a minimal sketch follows.
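  As an illustration only -- the provider name, endpoint, payload shape and `_request` signature below are all assumptions, so follow the `Gemini`/`Groq` templates in `helpers/batching.py` for the real interface:

  ```python
  import os

  import requests

  from helpers.batching import ApiModels  # base class provided by the repo


  class OpenRouter(ApiModels):
      """Hypothetical extra provider, mirroring the Gemini/Groq templates."""

      def _request(self, prompt: str, model: str) -> str:
          # Assumed: ApiModels handles retries/exceptions, so this method
          # only performs the raw HTTP call and extracts the reply text.
          resp = requests.post(
              "https://openrouter.ai/api/v1/chat/completions",
              headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
              json={"model": model, "messages": [{"role": "user", "content": prompt}]},
              timeout=30,
          )
          resp.raise_for_status()
          return resp.json()["choices"][0]["message"]["content"]
  ```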
- Edit the `api_models.json` file for the models you want added. The first level of the JSON file is the respective class name defined in `helpers/batching.py`. The second level defines the `model` names from their corresponding API endpoints. The third level specifies the rates of each model: `rpmin`, `rph`, `rpd`, `rpw`, `rpmth`, `rpy` are respectively the rates per minute, hour, day, week, month and year. A sketch of the layout is given below.
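  A sketch of that three-level layout (the model names and rate values here are illustrative, and whether all six rate keys must be present is not documented here):

  ```json
  {
    "Gemini": {
      "gemini-1.5-flash": { "rpmin": 15, "rpd": 1500 },
      "gemini-1.5-pro": { "rpmin": 2, "rpd": 50 }
    },
    "Groq": {
      "llama-3.1-8b-instant": { "rpmin": 30, "rpd": 14400 }
    }
  }
  ```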
- Create and edit the `.env` config file. For information about all the variables to edit, check the section under "EDIT THESE ENVIRONMENTAL VARIABLES" in the `config.py` file. If CUDA is not detected, the app will default to CPU mode for all local LLMs and OCRs. In that case it is recommended to set the `OCR_MODEL` variable to `rapid`, which is optimised for CPUs; currently this is only supported with `SOURCE_LANG=ch_tra`, `ch_sim` or `en`. Refer to [notes][1]. A sketch of a `.env` is given below.
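  A minimal `.env` sketch for a CPU-only setup. Only `OCR_MODEL` and `SOURCE_LANG` are named in this README; the API key variable names are assumptions, so check `config.py` for the authoritative list:

  ```bash
  # CPU-friendly OCR backend (see the note above)
  OCR_MODEL=rapid
  # language to OCR off the screen; rapid supports ch_tra, ch_sim, en
  SOURCE_LANG=ch_sim
  # keys for external APIs, if used (variable names are assumptions)
  GEMINI_API_KEY=your-key-here
  GROQ_API_KEY=your-key-here
  ```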
- If you are using the `wayland` display protocol (only available for Linux -- check with `echo $WAYLAND_DISPLAY`), install the `grim` package locally with your package manager:
  - Arch Linux-based: `sudo pacman -S grim`
  - Debian-based: `sudo apt install grim`
  - Fedora: `dnf install grim`
  - openSUSE: `zypper install grim`
  - NixOS: `nix-shell -p grim`

  Screenshotting is limited in Wayland, and grim is one of the more lightweight options out there.
- The RapidOCR, PaddleOCR and (possibly -- I can't remember) the easyOCR models need to be downloaded before any of this can be used. They should download when you execute a function that initialises the model with the desired OCR language, but this appears to not work well when running the app directly (I'll add more to this later...). The same obviously holds for the local translation and LLM models. One way to trigger the downloads manually is sketched below.
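  A generic way to pre-fetch the OCR models from a Python shell, assuming the stock `easyocr`, `paddleocr` and `rapidocr_onnxruntime` packages (this is not a repo helper, just each library's own constructor, which fetches its weights on first use):

  ```python
  from easyocr import Reader
  from paddleocr import PaddleOCR
  from rapidocr_onnxruntime import RapidOCR

  Reader(["ch_sim", "en"])  # easyOCR: downloads detection + recognition models
  PaddleOCR(lang="ch")      # PaddleOCR: downloads det/rec/cls models
  RapidOCR()                # RapidOCR: loads the bundled ONNX models
  ```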
- Run the `main.py` file and a Qt6 app should appear. Alternatively, if that doesn't work, go to the last line of the `main.py` file and edit the argument to `web`, which will run the translations locally on `0.0.0.0:5000` or on any other port you specify.
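  For reference, both frontends are launched the same way; the frontend switch lives on the last line of `main.py` as described above:

  ```bash
  python main.py   # Qt6 desktop app by default
  # after editing the last line of main.py to "web":
  python main.py   # web app served on http://0.0.0.0:5000
  ```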
Notes and optimisations
- Accuracy is limited with RapidOCR, especially if the graphics have a high dynamic range. [1]
- Consider lowering the quality of the screen capture for faster OCR processing and lower screen capture time -> OCR accuracy and the subsequent translations can be affected, but the entire translation process should come in under 2 seconds without sacrificing too much OCR quality. Edit the `printsc` functions in `helpers/utils.py` (a config option for this is planned); a generic illustration of the idea follows.
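  As a generic illustration (not the repo's `printsc` code), capturing a monitor and halving its resolution before OCR, assuming the `mss` and `Pillow` packages:

  ```python
  from PIL import Image
  import mss

  # Grab the primary monitor and downscale it 2x before handing it to OCR.
  with mss.mss() as sct:
      shot = sct.grab(sct.monitors[1])
      img = Image.frombytes("RGB", shot.size, shot.rgb)
      img = img.resize((img.width // 2, img.height // 2), Image.BILINEAR)
  ```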
- Not much of the database aspect has been worked on yet. Right now it stores all the texts and translations in Unicode/ASCII in the `database/translations.db` file. Use it however you want; it is stored locally, only for you. A quick way to poke at it is shown below.
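  Assuming the `.db` file is SQLite (the extension suggests so, but this is an assumption), you can inspect it with the `sqlite3` CLI; the table and column names are not documented here, so list them first:

  ```bash
  sqlite3 database/translations.db ".tables"   # list the tables
  sqlite3 database/translations.db ".schema"   # show the column layout
  ```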
- Downloading all the models may take up a few GBs of space.
- About 3.5GB of VRAM is used by easyOCR, and up to 1.5GB of VRAM by PaddleOCR and RapidOCR.
Debugging Issues
- CUDNN version mismatch when using PaddleOCR: check that `LD_LIBRARY_PATH` is correctly set to the directory containing the `cudnn.so` file. If using a local installation, it could help to just remove `nvidia-cudnn-cu12` from your Python environment. A way to locate the library is sketched below.
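  One way to find the pip-installed cuDNN library and point `LD_LIBRARY_PATH` at it (the export path is an example -- substitute the directory that `find` prints):

  ```bash
  # locate the libcudnn.so shipped by the nvidia-cudnn-cu12 wheel
  find "$(python -c 'import site; print(site.getsitepackages()[0])')" -name 'libcudnn.so*'
  # then export the containing directory, e.g.:
  export LD_LIBRARY_PATH="/path/to/site-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH"
  ```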
- Segmentation fault when using PaddleOCR, EasyOCR or RapidOCR: ensure the only `cv2` library installed is `opencv-contrib-python`. Check out https://pypi.org/project/opencv-python-headless/ for more info. A quick check is shown below.
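  A quick way to check for and remove conflicting OpenCV wheels:

  ```bash
  pip list | grep -i opencv   # should show only opencv-contrib-python
  pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python-headless
  pip install opencv-contrib-python
  ```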
Demo
Demo of a Korean to Chinese (simplified) translation with the learn-cover mode (a mode intended for people learning the language to see the romanisation/pinyin/furigana etc. with the translation above).
TODO:
- Create an overlay window that works in Wayland.
- Make use of the translation data -> maybe make a personalised game that uses it.
Terms of Use
By using this application, you agree to the following terms and conditions.
Data Collection and External API Use
1.1 Onscreen Data Transmission: The application is designed to send data displayed on your screen, including potentially sensitive or personal information, to an external API if local processing is not set up.
1.2 Third-Party API Integration: When local methods cannot fulfill certain functions, the App will transmit data to external third-party APIs. These APIs are not under our control, and we do not guarantee the security, confidentiality, or purpose of use of the data once transmitted.
Acknowledgment
By using the app, you acknowledge that you have read, understood, and agree to these Terms of Use, including the potential risks associated with transmitting data to external APIs.