## Usage (draft)

- Clone the repository, navigate into it, and install all required packages with `pip install -r requirements.txt` in a new Python environment (the OCR packages are very finicky).
- If using external APIs, you will need to obtain API keys for the currently supported sites (Google for the Gemini models, Groq for an assortment of open-source LLMs) and add the associated keys to the environment variables file. If using another API, define a new class with a `_request` function in `helpers/batching.py`, inheriting from the `ApiModels` class; a template is provided in that file under the `Gemini` and `Groq` classes, and all exception handling is already taken care of. A sketch of a new provider class is shown after this list.
- Edit the `api_models.json` file to add the models you want. The first level of the JSON file is the corresponding class name defined in `helpers/batching.py`; the second level defines the `model` names from their corresponding API endpoints; the third level specifies the rates of each model, where `rpmin`, `rph`, `rpd`, `rpw`, `rpmth`, `rpy` are respectively the rates per minute, hour, day, week, month and year. An illustrative layout is shown after this list.
- Edit the `.env` config file. For information about all the variables to edit, check the section under "EDIT THESE ENVIRONMENTAL VARIABLES". If CUDA is not detected, the app defaults to CPU mode for all local LLMs and OCRs; in that case it is recommended to set the `OCR_MODEL` variable to `rapid`, which is optimised for CPUs. Currently this is only supported with `SOURCE_LANG` set to `ch_tra`, `ch_sim` or `en`. Refer to the first point under [Notes and optimisations](#notes-and-optimisations) below. A minimal CPU-only `.env` example is shown after this list.
- If you are using the `wayland` display protocol (Linux only -- check with `echo $WAYLAND_DISPLAY`), install the `grim` package with your package manager. Screenshotting is limited on Wayland, and `grim` is one of the more lightweight options available.
  - Arch-based: `sudo pacman -S grim`
  - Debian-based: `sudo apt install grim`
  - Fedora: `sudo dnf install grim`
  - openSUSE: `sudo zypper install grim`
  - NixOS: `nix-shell -p grim`
- The RapidOCR, PaddleOCR and (possibly -- I can't remember) easyOCR models need to be downloaded before any of this can be used. They should download automatically when you execute a function that initialises the model with the desired OCR language, but this appears not to work well when running the app directly (I'll add more to this later). The same obviously holds for the local translation and LLM models. A snippet to trigger the OCR model downloads is shown after this list.
- Run the `main.py` file and a Qt6 app should appear. Alternatively, if that doesn't work, go to the last line of `main.py` and change the argument to `web`, which will run the translations locally on `0.0.0.0:5000` or on any other port you specify.
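To make the custom-API step above more concrete, here is a minimal sketch of what an extra provider class might look like. It assumes subclasses of `ApiModels` only need to supply `_request`; the class name, `_request` signature, endpoint, environment variable and response field are all placeholders, so follow the actual template under the `Gemini` and `Groq` classes in `helpers/batching.py`.

```python
# Sketch of an extra provider class to add inside helpers/batching.py,
# next to the existing Gemini and Groq classes. Endpoint, payload shape,
# response field and the _request signature are assumptions, not the real template.
import os

import requests


class MyProvider(ApiModels):  # ApiModels is already defined in helpers/batching.py
    """Hypothetical third-party API wrapper."""

    def _request(self, model, prompt):
        # Per the README, exception handling is already taken care of elsewhere,
        # so this only needs to perform the raw HTTP call.
        resp = requests.post(
            "https://api.example.com/v1/chat",  # placeholder endpoint
            headers={"Authorization": f"Bearer {os.environ['MYPROVIDER_API_KEY']}"},
            json={"model": model, "prompt": prompt},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["text"]  # placeholder response field
```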
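An illustrative `api_models.json` layout following the three levels described above. The top-level keys mirror the class names in `helpers/batching.py`; the model names and rate numbers below are placeholders, and only some of the rate keys are shown.

```json
{
  "Gemini": {
    "gemini-model-name": { "rpmin": 15, "rpd": 1500 }
  },
  "Groq": {
    "groq-model-name": { "rpmin": 30, "rph": 500, "rpd": 14400 }
  }
}
```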
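A minimal `.env` fragment for the CPU-only case described above. Only the two variables named in this README are shown; the API keys and other variables from the "EDIT THESE ENVIRONMENTAL VARIABLES" section go in the same file.

```
# CPU-friendly defaults (variable names taken from this README)
OCR_MODEL=rapid
SOURCE_LANG=en
```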
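To pre-download the OCR models outside the app, initialising each engine once from a Python shell is usually enough. This assumes the usual PyPI packages (`easyocr`, `paddleocr`, `rapidocr_onnxruntime`) from `requirements.txt` and uses English as the example language; the local translation and LLM models are not covered here.

```python
# Run once in the same Python environment to trigger the model downloads.
import easyocr                       # downloads detection/recognition weights on first use
from paddleocr import PaddleOCR      # downloads PaddleOCR models on first init
from rapidocr_onnxruntime import RapidOCR

easyocr.Reader(["en"])               # use your SOURCE_LANG here
PaddleOCR(lang="en")
RapidOCR()                           # RapidOCR ships its ONNX models with the wheel
```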
## Notes and optimisations
- Accuracy is limited with RapidOCR, especially if there is a high dynamic range in the graphics.
- Consider lowering the quality of the screen capture for faster OCR processing and a shorter capture time. OCR accuracy and the subsequent translations can be affected, but the entire translation process should then take under 2 seconds without too much sacrifice in OCR quality. Edit the `printsc` functions in `helpers/utils.py` (I will work on setting a config for this); a downscaling sketch is shown after this list.
- Not much of the database aspect has been worked on at the moment. Right now all texts and translations are stored as unicode/ASCII in the `database/translations.db` file. Use it however you want; it is stored locally, only for you. A small snippet for inspecting it is shown after this list.
- Downloading all the models may take up a few GBs of space.
- About 3.5GB of VRAM is used by easyOCR; up to 1.5GB of VRAM for PaddleOCR and RapidOCR.
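For the screen-capture note above, a rough illustration of the kind of downscaling the `printsc` functions could apply before OCR. The real implementation in `helpers/utils.py` is not shown in this README, so treat this as a Pillow-based sketch only.

```python
# Hypothetical downscaling step before OCR (the real code lives in the
# printsc functions in helpers/utils.py; this is only an illustration).
from PIL import Image

def downscale(path: str, factor: float = 0.5) -> Image.Image:
    img = Image.open(path)
    new_size = (int(img.width * factor), int(img.height * factor))
    # BILINEAR is fast and usually keeps text legible enough for OCR.
    return img.resize(new_size, Image.BILINEAR)
```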
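The translation database can be inspected with Python's built-in `sqlite3`. The table layout is not documented here, so this just lists whatever tables exist.

```python
import sqlite3

con = sqlite3.connect("database/translations.db")
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(tables)  # then inspect further with SELECT * FROM <table> LIMIT 5
con.close()
```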
## Debugging Issues
- CUDNN version mismatch when using PaddleOCR: check that `LD_LIBRARY_PATH` is correctly set to the directory containing the `cudnn.so` file. If using a local installation, it may help to simply remove `nvidia-cudnn-cu12` from your Python environment; see the commands after this list.
- Segmentation fault when using PaddleOCR, EasyOCR or RapidOCR: ensure the only `cv2` library installed is the `opencv-contrib-python` package. Check out https://pypi.org/project/opencv-python-headless/ for more info, and see the commands after this list.
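For the cuDNN mismatch above, something along these lines usually helps; the library path is only an example and will differ per system.

```bash
# Point LD_LIBRARY_PATH at the directory that actually contains libcudnn.so
# (the path below is an example).
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

# Or remove the pip-installed cuDNN so it cannot clash with the system copy.
pip uninstall nvidia-cudnn-cu12
```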
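For the segmentation fault above, checking for and removing the conflicting OpenCV wheels follows directly from the note:

```bash
pip list | grep opencv                                  # should only show opencv-contrib-python
pip uninstall -y opencv-python opencv-python-headless
pip install opencv-contrib-python
```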
## TODO
- Create an overlay window that works in Wayland.
- Make use of the translation data -> maybe make a personalised game that uses it.
## Terms of Use
By using this application, you agree to the following terms and conditions.
### Data Collection and External API Use
1.1 Onscreen Data Transmission: The application is designed to send data displayed on your screen, including potentially sensitive or personal information, to an external API if local processing is not set up.
1.2 Third-Party API Integration: When local methods cannot fulfill certain functions, the App will transmit data to external third-party APIs. These APIs are not under our control, and we do not guarantee the security, confidentiality, or purpose of use of the data once transmitted.
### Acknowledgment
By using the app, you acknowledge that you have read, understood, and agree to these Terms of Use, including the potential risks associated with transmitting data to external APIs.