Updated README + reduced translation mismatch rates + LICENSE
This commit is contained in:
parent 56d8c18871
commit 7e80713191

.gitignore (vendored): 3 changed lines
@@ -5,4 +5,5 @@ __pycache__/
 .*
 test.py
 notebooks/
-qttest.py
+qttest.py
+*.db
LICENSE: 21 lines (new file)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 chickenflyshigh

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md: 57 changed lines
@@ -1,4 +1,59 @@
## Usage (draft)

1. Clone the repository, navigate into it and install all required packages with `pip install -r requirements.txt` in a new Python environment (the OCR packages are very finicky).

2. If using external APIs, you will need to obtain API keys for the currently supported sites [Google (Gemini models), Groq (an assortment of open-source LLMs)] and add the associated keys to the environment variables (`.env`) file. If using another API, you will need to define a new class with a `_request` function in `helpers/batching.py`, inheriting from the `ApiModel` class. A template is provided in that file alongside the `Gemini` and `Groq` classes. All exception handling is already taken care of.

3. Edit the `api_models.json` file for the models you want added. The first level of the JSON file is the respective class name defined in `helpers/batching.py`. The second level defines the `model` names from their corresponding API endpoints. The third level specifies the rate limits of each model: `rpmin`, `rph`, `rpd`, `rpw`, `rpmth`, `rpy` are respectively the allowed requests per minute, hour, day, week, month and year. A minimal example is shown below.
|
||||
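   For instance, a minimal `api_models.json` might look like the following (a sketch: the Groq entry mirrors the rates in the diff further down this commit, while the Gemini model name and rates are illustrative placeholders; check your providers' current limits):

   ```json
   {
       "Groq": {
           "llama3-8b-8192": { "rpmin": 30, "rpd": 14400 }
       },
       "Gemini": {
           "gemini-1.5-flash": { "rpmin": 15, "rpd": 1500 }
       }
   }
   ```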
4. Edit the `.env` config file. For information about all the variables to edit, check the section under "EDIT THESE VARIABLES" in `config.py`. If CUDA is not detected, all local LLMs and OCRs default to CPU mode. In that case it is recommended to set the `OCR_MODEL` variable to `rapid`, which is optimised for CPUs; currently this is only supported with `SOURCE_LANG=ch_tra`, `ch_sim` or `en`. Refer to [notes][1]. A sample `.env` is sketched below.
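   A hypothetical starting point (the variable names match `config.py`; the values are examples only and the keys are placeholders):

   ```
   INTERVAL=1.5
   OCR_MODEL=rapid
   OCR_USE_GPU=False
   SOURCE_LANG=ch_tra
   TARGET_LANG=en
   TRANSLATION_MODEL=opus
   GEMINI_API_KEY=<your-gemini-key>
   GROQ_API_KEY=<your-groq-key>
   ```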
5. If you are using the Wayland display protocol (only available on Linux -- check with `echo $WAYLAND_DISPLAY`), install the `grim` package with your package manager.

- Arch-based: `sudo pacman -S grim`
- Debian-based: `sudo apt install grim`
- Fedora: `dnf install grim`
- OpenSUSE: `zypper install grim`
- NixOS: `nix-shell -p grim`

Screenshotting is limited in Wayland, and `grim` is one of the more lightweight options out there.
6. The RapidOCR, PaddleOCR and (possibly) EasyOCR models need to be downloaded before any of this can be used. They should download when you execute a function that initialises the model with the desired OCR language, but this appears not to work well when running the app directly (more on this later...). The same obviously holds for the local translation and LLM models.

7. Run the `main.py` file and a Qt6 app should appear. Alternatively, if that doesn't work, go to the last line of `main.py` and change the argument to `web`, which will serve the translations locally on `0.0.0.0:5000` or on any other port you specify (see the snippet below).
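   For reference, the dispatch at the bottom of `main.py` in this commit looks like:

   ```python
   if __name__ == '__main__':
       main('qt')   # change 'qt' to 'web' to run the Flask server on 0.0.0.0:5000 instead
   ```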
## Notes and optimisations

- Accuracy is limited with RapidOCR, especially if there is a high dynamic range in the graphics. [1]

- Consider lowering the quality of the screen capture for faster OCR processing and a shorter capture time. OCR accuracy and the subsequent translations can be affected, but the entire translation process should take under 2 seconds without sacrificing too much OCR quality. Edit the `printsc` function in `helpers/utils.py` (a config option for this is planned).

- Not much of the database side has been worked on yet. Right now all texts and translations are stored in Unicode/ASCII in the `database/translations.db` file. Use it however you want; it is stored locally, only for you.

- Downloading all the models may take up a few GB of space.

- About 3.5 GB of VRAM is used by EasyOCR, and up to 1.5 GB of VRAM by PaddleOCR and RapidOCR.
## Debugging Issues

1. cuDNN version mismatch when using PaddleOCR. Check that `LD_LIBRARY_PATH` is correctly set to the directory containing the cudnn `.so` file (see the snippet after this list). If using a local installation, it can help to simply remove the `nvidia-cudnn-cu12` package from your Python environment.

2. Segmentation fault when using PaddleOCR, EasyOCR or RapidOCR. Ensure the only installed `cv2` library is `opencv-contrib-python`. Check out https://pypi.org/project/opencv-python-headless/ for more info.
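A quick way to find that directory when cuDNN came in through pip (assuming the `nvidia-cudnn-cu12` wheel from `requirements.txt`, which ships its shared libraries under `nvidia/cudnn/lib`):

```python
# print the directory containing libcudnn*.so from the pip-installed wheel,
# then point LD_LIBRARY_PATH at it before launching the app
import os
import nvidia.cudnn

print(os.path.join(os.path.dirname(nvidia.cudnn.__file__), "lib"))
```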
## TODO:

- Create an overlay window that works in Wayland.
- Make use of the translation data -> maybe make a personalised game that uses it.

## Terms of Use

By using this application, you agree to the following terms and conditions.

### Data Collection and External API Use

1.1 Onscreen Data Transmission: The application is designed to send data displayed on your screen, including potentially sensitive or personal information, to an external API if local processing is not set up.

1.2 Third-Party API Integration: When local methods cannot fulfil certain functions, the App will transmit data to external third-party APIs. These APIs are not under our control, and we do not guarantee the security, confidentiality, or purpose of use of the data once transmitted.

## Acknowledgment

By using the app, you acknowledge that you have read, understood, and agree to these Terms of Use, including the potential risks associated with transmitting data to external APIs.
api_models.json
@@ -15,6 +15,8 @@
         "llama3-groq-70b-8192-tool-use-preview": { "rpmin": 30, "rpd": 14400 },
         "llama-3.2-90b-vision-preview": { "rpmin": 15, "rpd": 3500 },
         "llama-3.2-11b-text-preview": { "rpmin": 30, "rpd": 7000 },
-        "llama-3.2-11b-vision-preview": { "rpmin": 30, "rpd": 7000 }
+        "llama-3.2-11b-vision-preview": { "rpmin": 30, "rpd": 7000 },
+        "gemma-7b-it": { "rpmin": 30, "rpd": 14400 },
+        "llama3-8b-8192": { "rpmin": 30, "rpd": 14400 }
     }
 }
config.py: 86 changed lines
@@ -2,63 +2,82 @@ import os, ast, torch, platform
 from dotenv import load_dotenv
 load_dotenv(override=True)

-###################################################################################################
-### EDIT THESE VARIABLES ###
-
-### available languages: 'ch_sim', 'ch_tra', 'ja', 'ko', 'en'
-
-INTERVAL = float(os.getenv('INTERVAL'))
-
-### OCR
-IMAGE_CHANGE_THRESHOLD = float(os.getenv('IMAGE_CHANGE_THRESHOLD', 0.75)) # higher values mean more sensitivity to changes in the screen, too high and the screen will constantly refresh
-OCR_MODEL = os.getenv('OCR_MODEL', 'easy') # 'easy', 'paddle', 'rapid' ### easy is the most accurate, paddle is the fastest with CUDA and rapid is the fastest with CPU. Rapid has only between Chinese and English unless you add more languages
-OCR_USE_GPU = ast.literal_eval(os.getenv('OCR_USE_GPU', 'True'))
-
-if platform.system() == 'Windows':
-    default_tmp_dir = "C:\\Users\\AppData\\Local\\Temp"
-elif platform.system() in ['Linux', 'Darwin']:
-    default_tmp_dir = "/tmp"
-
-TEMP_IMG_DIR = os.getenv('TEMP_IMG_PATH', default_tmp_dir) # where the temporary images are stored

+###################################################################################################
+### EDIT THESE VARIABLES ###
+# Create a .env file in the same directory as this file and add the variables there. You can of course edit this file directly, but if you then pull from the repository again all the config will be gone unless it is saved or stashed.
+# The default values should be fine for most cases. The only ones you need to change are the API keys, and the variables under Translation and API Translation if you choose to use an external API.
+# available languages: 'ch_sim', 'ch_tra', 'ja', 'ko', 'en'
+
+INTERVAL = float(os.getenv('INTERVAL', 1.5)) # Interval in seconds between translations. If your system is slow, a lower value is probably fine with regards to API rates.
+
+### OCR
+IMAGE_CHANGE_THRESHOLD = float(os.getenv('IMAGE_CHANGE_THRESHOLD', 0.75)) # higher values mean more sensitivity to changes in the screen; too high and the screen will constantly refresh
+OCR_MODEL = os.getenv('OCR_MODEL', 'easy') # 'easy', 'paddle', 'rapid' ### easy is the most accurate, paddle is the fastest with CUDA and rapid is the fastest with CPU. Rapid only covers Chinese and English unless you add more languages
+OCR_USE_GPU = ast.literal_eval(os.getenv('OCR_USE_GPU', 'True')) # True or False to use CUDA for OCR. Defaults to CPU if no CUDA GPU is available

 ### Drawing/Overlay Config
-FILL_COLOUR = os.getenv('FILL_COLOUR', 'white')
-FONT_FILE = os.getenv('FONT_FILE')
-FONT_SIZE_MAX = int(os.getenv('FONT_SIZE_MAX', 20))
-FONT_SIZE_MIN = int(os.getenv('FONT_SIZE_MIN', 8))
-LINE_SPACING = int(os.getenv('LINE_SPACING', 3))
-REGION = ast.literal_eval(os.getenv('REGION','(0,0,2560,1440)'))
-DRAW_TRANSLATIONS_MODE = os.getenv('DRAW_TRANSLATIONS_MODE', 'add')
+FILL_COLOUR = os.getenv('FILL_COLOUR', 'white') # colour of the textboxes
+FONT_COLOUR = os.getenv('FONT_COLOUR', "#ff0000") # colour of the font
+FONT_FILE = os.getenv('FONT_FILE', os.path.join(os.path.dirname(__file__), "fonts", "Arial-Unicode-Bold.ttf")) # path to the font file. Ensure it is a unicode .ttf file if you want to be able to see most languages.
+FONT_SIZE_MAX = int(os.getenv('FONT_SIZE_MAX', 20)) # Maximum font size you want to be able to see onscreen
+FONT_SIZE_MIN = int(os.getenv('FONT_SIZE_MIN', 8)) # Minimum font size you want to be able to see onscreen
+LINE_SPACING = int(os.getenv('LINE_SPACING', 3)) # spacing between lines of text with the learn modes in DRAW_TRANSLATIONS_MODE
+REGION = ast.literal_eval(os.getenv('REGION','(0,0,2560,1440)')) # (x1, y1, x2, y2) - the region of the screen to capture
+DRAW_TRANSLATIONS_MODE = os.getenv('DRAW_TRANSLATIONS_MODE', 'learn_cover')
 """
 DRAW_TRANSLATIONS_MODE possible options:
 'learn': adds translated text, original text (kept so that when texts get moved around it is clear which translation each refers to) and (optionally, with the TO_ROMANIZE option) romanized text above the original text. Texts can overlap if squished into a corner. Works well for games where text is sparser
 'learn_cover': same as above but covers the original text with the translated text. Can help with readability and is less cluttered, but with sufficiently dense text the texts can still overlap
-'translation_only_cover': cover the original text with the translated text - will not show the original text at all but not affected by overlapping texts
+'translation_only_cover': cover the original text with the translated text - will not show the original text at all but also will not be affected by overlapping texts
 """

-FONT_COLOUR = os.getenv('FONT_COLOUR', "#ff0000")
-TO_ROMANIZE = ast.literal_eval(os.getenv('TO_ROMANIZE', 'True'))
-
-# API KEYS https://github.com/cheahjs/free-llm-api-resources?tab=readme-ov-file
-GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
-GROQ_API_KEY = os.getenv('GROQ_API_KEY') #
-# MISTRAL_API_KEY = os.getenv('MISTRAL_API_KEY') # https://console.mistral.ai/api-keys/ slow asf

 ### Translation
-MAX_TRANSLATE = int(os.getenv('MAX_TRANSLATION', 200))
-SOURCE_LANG = os.getenv('SOURCE_LANG', 'ja')
-TARGET_LANG = os.getenv('TARGET_LANG', 'en')
+MAX_TRANSLATE = int(os.getenv('MAX_TRANSLATION', 200)) # Maximum number of phrases to send to the translation model to translate
+SOURCE_LANG = os.getenv('SOURCE_LANG', 'ch_sim') # Translate from 'ch_sim', 'ch_tra', 'ja', 'ko', 'en'
+TARGET_LANG = os.getenv('TARGET_LANG', 'en') # Translate to 'ch_sim', 'ch_tra', 'ja', 'ko', 'en'
+TO_ROMANIZE = ast.literal_eval(os.getenv('TO_ROMANIZE', 'True')) # romanize the text or not. Only available for one of the learn modes in DRAW_TRANSLATIONS_MODE. It is added above the original text
+
+### API Translation (could be external or a local API)
+# API KEYS
+GEMINI_API_KEY = os.getenv('GEMINI_API_KEY') # https://ai.google.dev/
+GROQ_API_KEY = os.getenv('GROQ_API_KEY') # https://console.groq.com/keys
+# MISTRAL_API_KEY = os.getenv('MISTRAL_API_KEY') # https://console.mistral.ai/api-keys/ slow asf

-### Local Translation
+### Local Translation Models
 TRANSLATION_MODEL = os.environ['TRANSLATION_MODEL'] # 'opus' or 'm2m' # opus is a lot more lightweight
 TRANSLATION_USE_GPU = ast.literal_eval(os.getenv('TRANSLATION_USE_GPU', 'True'))
 MAX_INPUT_TOKENS = int(os.getenv('MAX_INPUT_TOKENS', 512))
 MAX_OUTPUT_TOKENS = int(os.getenv('MAX_OUTPUT_TOKENS', 512))
 BATCH_SIZE = int(os.getenv('BATCH_SIZE', 6))
-LOCAL_FILES_ONLY = ast.literal_eval(os.getenv('LOCAL_FILES_ONLY', 'False'))
+LOCAL_FILES_ONLY = ast.literal_eval(os.getenv('LOCAL_FILES_ONLY', 'False')) # will not attempt pinging Huggingface for the models and just use the cached local models

 ###################################################################################################

 ###################################################################################################
 ### DO NOT EDIT THESE VARIABLES ###
 ## Filepaths
 API_MODELS_FILEPATH = os.path.join(os.path.dirname(__file__), 'api_models.json')
@@ -71,6 +90,7 @@ if TRANSLATION_USE_GPU is False:
 else:
     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

+TEMP_IMG_DIR = os.getenv('TEMP_IMG_PATH', default_tmp_dir) # where the temporary images are stored
 TEMP_IMG_PATH = os.path.join(TEMP_IMG_DIR, 'tempP_img91258102.png')
 ### Just for info
@@ -78,4 +98,4 @@ available_langs = ['ch_sim', 'ch_tra', 'ja', 'ko', 'en'] # there are limitations
 seq_llm_models = ['opus', 'm2m']
 api_llm_models = ['gemini']
 causal_llm_models = []
-curr_models = seq_llm_models + api_llm_models + causal_llm_models
+curr_models = seq_llm_models + api_llm_models + causal_llm_models

Binary file not shown.
draw.py: 8 changed lines
@@ -26,7 +26,7 @@ def modify_image(input: io.BytesIO | str, ocr_output, translation: list) -> byte
         raise TypeError('Incorrect filetype input')
     # Save the modified image back to bytes
     with io.BytesIO() as byte_stream:
-        image.save(byte_stream, format=image.format) # Save in original format
+        image.save(byte_stream, format='PNG') # Always save as PNG
         modified_image_bytes = byte_stream.getvalue()
     return modified_image_bytes
@@ -71,7 +71,8 @@ def draw_one_phrase_learn(draw: ImageDraw,
     adjust_if_intersects(x_onscreen, y_onscreen, bounding_box, bounding_boxes, untranslated_phrase, max_width, total_height)

     adjusted_x, adjusted_y, adjusted_max_x, adjusted_max_y, _ = bounding_boxes[-1]
-    draw.rectangle([(adjusted_x,adjusted_y), (adjusted_max_x, adjusted_max_y)], outline="black", width=1)
+    draw.rounded_rectangle([(adjusted_x,adjusted_y), (adjusted_max_x, adjusted_max_y)], fill=FILL_COLOUR, outline="purple", width=1, radius=5)
+    # draw.rectangle([(adjusted_x,adjusted_y), (adjusted_max_x, adjusted_max_y)], outline="black", width=1)
     position = (adjusted_x,adjusted_y)
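The rounded textbox style used above, in isolation (a minimal sketch; `ImageDraw.rounded_rectangle` is available in Pillow >= 8.2, and the colours mirror the FILL_COLOUR/FONT_COLOUR defaults in config.py):

```python
from PIL import Image, ImageDraw

# draw a rounded, purple-outlined textbox and red text on a blank canvas
img = Image.new("RGB", (220, 80), "grey")
d = ImageDraw.Draw(img)
d.rounded_rectangle([(10, 10), (210, 70)], fill="white", outline="purple", width=1, radius=5)
d.text((20, 30), "translated text", fill="#ff0000")
img.save("textbox_demo.png")
```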
@@ -107,6 +108,7 @@ def draw_one_phrase_translation_only_cover(draw: ImageDraw,
         if FONT_COLOUR == 'rainbow':
             rainbow_text(draw, translated_phrase, *top_left, font)
         else:
+            draw.rounded_rectangle([top_left, bottom_right], fill=FILL_COLOUR, outline="purple", width=1, radius=5)
             draw.text(top_left, translated_phrase, fill=FONT_COLOUR, font=font)

         break

@@ -137,7 +139,7 @@ def draw_one_phrase_learn_cover(draw: ImageDraw,
     adjust_if_intersects(x_onscreen, y_onscreen, bounding_box, bounding_boxes, untranslated_phrase, max_width, total_height)

     adjusted_x, adjusted_y, adjusted_max_x, adjusted_max_y, _ = bounding_boxes[-1]
-    draw.rounded_rectangle([(adjusted_x,adjusted_y), (adjusted_max_x, adjusted_max_y)], fill=FILL_COLOUR, outline="black", width=2, radius=5)
+    draw.rounded_rectangle([(adjusted_x,adjusted_y), (adjusted_max_x, adjusted_max_y)], fill=FILL_COLOUR, outline="purple", width=1, radius=5)
     position = (adjusted_x,adjusted_y)
fonts/Arial-Unicode-Bold.ttf (BIN, new file)
Binary file not shown.

helpers/batching.py
@@ -168,27 +168,15 @@ class ApiModel():
         return False

-    # async def request_func(request):
-    #     @wraps(request)
-    #     async def wrapper(self, text, *args, **kwargs):
-    #         if await self._are_rates_good():
-    #             try:
-    #                 self.session_calls += 1
-    #                 response = await request(self, text, *args, **kwargs)
-    #                 return response
-    #             except Exception as e:
-    #                 logger.error(f"Error with model {self.model} from {self.site}. Error: {e}")
-    #         else:
-    #             logger.error(f"Rate limit reached for this model.")
-    #             raise TooManyRequests('Rate limit reached for this model.')
-    #     return wrapper

-    # @request_func
-    async def translate(self, texts_to_translate, store = False):
+    async def translate(self, texts_to_translate, store = False) -> tuple[int,       # exit code: 0 for success, 1 for incorrect response type, 2 for incorrect translation count
+                                                                          list[str],
+                                                                          int]:      # number of translations that do not match the number of texts to translate
         """Main Translation Function. All API models will need to define a new class and also define a _request function as shown below in the Gemini and Groq class models."""
         if isinstance(texts_to_translate, str):
             texts_to_translate = [texts_to_translate]
         if len(texts_to_translate) == 0:
-            return []
+            return (0, [], 0)
         #prompt = f"Without any additional remarks, and without any code, translate the following items of the Python list from {self.from_lang} into {self.target_lang} and output as a Python list ensuring proper escaping of characters and ensuring the length of the list given is exactly equal to the length of the list you provide. Do not output in any other language other than the specified target language: {texts_to_translate}"
         prompt = f"""INSTRUCTIONS:
 - Provide ONE and ONLY ONE translation to each text provided in the JSON array given.
@@ -208,19 +196,27 @@ Expected format:

 Translation:"""
         response = await self._request(prompt)
-        response_list = ast.literal_eval(response.strip())
+        try:
+            response_list = ast.literal_eval(response.strip())
+        except Exception as e:
+            logger.error(f"Failed to evaluate response from {self.model} from {self.site}. Error: {e}. Response: {response}")
+            return (1, [], 99999)
         logger.debug(repr(self))
         logger.info(f'{self.model} translated texts from: {texts_to_translate} to {response_list}.')
         if not isinstance(response_list, list):
-            raise TypeError(f"Incorrect response type. Expected list, got {type(response_list)}")
+            # raise TypeError(f"Incorrect response type. Expected list, got {type(response_list)}")
+            logger.error(f"Incorrect response type. Expected list, got {type(response_list)}")
+            return (1, [], 99999)
         if len(response_list) != len(texts_to_translate) and len(texts_to_translate) <= MAX_TRANSLATE:
-            logger.error(f"{self.model} model failed to translate all the texts. Number of translations to make: {len(texts_to_translate)}; Number of translated texts: {len(response_list)}.")
+            logger.error(f"Number of translations does not match number of texts to translate. Sent: {len(texts_to_translate)}. Received: {len(response_list)}.")
             if store:
                 self._db_add_translation(texts_to_translate, response_list, mismatch=True)
+            # raise ValueError(f"Number of translations does not match number of texts to translate. Sent: {len(texts_to_translate)}. Received: {len(response_list)}.")
+            return (2, response_list, abs(len(texts_to_translate) - len(response_list)))
         else:
             if store:
                 self._db_add_translation(texts_to_translate, response_list)
-            return response_list
+            return (0, response_list, 0)

 class Groq(ApiModel):
     def __init__(self, # model name as defined by the API
@@ -248,21 +244,7 @@ class Groq(ApiModel):
                 response_json = await response.json()
                 return response_json["choices"][0]["message"]["content"]
     # https://console.groq.com/settings/limits for limits
-    # def request(self, content):
-    #     chat_completion = self.client.chat.completions.create(
-    #         messages=[
-    #             {
-    #                 "role": "user",
-    #                 "content": content,
-    #             }
-    #         ],
-    #         model=self.model
-    #     )
-    #     return chat_completion.choices[0].message.content
-
-    # async def translate(self, texts_to_translate):
-    #     return super().translate(self.request, texts_to_translate)

 class Gemini(ApiModel):
     def __init__(self, # model name as defined by the API
                  model,
@@ -273,20 +255,6 @@ class Gemini(ApiModel):
                          site = 'Google',
                          **kwargs)

-    # def request(self, content):
-    #     genai.configure(api_key=self.api_key)
-    #     safety_settings = {
-    #         "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
-    #         "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE",
-    #         "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_NONE",
-    #         "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"}
-    #     try:
-    #         response = genai.GenerativeModel(self.model).generate_content(content, safety_settings=safety_settings)
-    #     except ResourceExhausted as e:
-    #         logger.error(f"Rate limited with {self.model}. Error: {e}")
-    #         raise ResourceExhausted("Rate limited.")
-    #     return response.text.strip()

     async def _request(self, content):
         async with aiohttp.ClientSession() as session:
             async with session.post(
@@ -307,14 +275,35 @@
             ) as response:
                 response_json = await response.json()
                 return response_json['candidates'][0]['content']['parts'][0]['text']

-    # async def translate(self, texts_to_translate):
-    #     return super().translate(self.request, texts_to_translate)

+"""
+DEFINE YOUR OWN API MODELS BELOW WITH THE SAME TEMPLATE AS BELOW. All fields required are indicated by <required field>.
+
+class <NameOfWebsite>(ApiModel):
+    def __init__(self, # model name as defined by the API
+                 model,
+                 api_key = <API_KEY>, # api key for the model wrt the site
+                 **kwargs):
+        super().__init__(model,
+                         api_key = api_key,
+                         site = <name_of_website>,
+                         **kwargs)
+
+    async def _request(self, content):
+        async with aiohttp.ClientSession() as session:
+            async with session.post(
+                <API ENDPOINT e.g. https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key={self.api_key}>,
+                headers={
+                    "Content-Type": "application/json"
+                    <ANY OTHER HEADERS REQUIRED BY THE API separated by commas>
+                },
+                json={
+                    "contents": [{"parts": [{"text": content}]}]
+                    <ANY OTHER JSON PAIRS REQUIRED separated by commas>
+                }
+            ) as response:
+                response_json = await response.json()
+                return <Anything needed to extract the message response from `response_json`>
+"""

 ###################################################################################################

helpers/translation.py
@@ -29,6 +29,7 @@ def translate(translation_func):

 ###############################
 def init_API_LLM(from_lang, target_lang):
+    """Initialise the API models. The models are stored in a json file. The models are instantiated, added to the database, the database API rates are updated and the languages are set."""
     from_lang = standardize_lang(from_lang)['translation_model_lang']
     target_lang = standardize_lang(target_lang)['translation_model_lang']
     with open(API_MODELS_FILEPATH, 'r') as f:
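The body of init_API_LLM is truncated in this diff. Conceptually (a hypothetical sketch, not the actual implementation) it maps the two JSON levels of api_models.json onto the classes in helpers/batching.py:

```python
import json
from config import API_MODELS_FILEPATH
from batching import Gemini, Groq  # assumption: the classes defined in helpers/batching.py

# sketch: the top-level JSON keys name classes, the nested keys name models,
# and each rate dict (rpmin, rpd, ...) is passed through as keyword arguments
with open(API_MODELS_FILEPATH, 'r') as f:
    spec = json.load(f)
models = [
    cls(model_name, **rates)   # e.g. Groq('llama3-8b-8192', rpmin=30, rpd=14400)
    for cls in (Gemini, Groq)
    for model_name, rates in spec.get(cls.__name__, {}).items()
]
```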
@@ -45,19 +46,19 @@

 async def translate_API_LLM(texts_to_translate: List[str],
                             models: List[ApiModel],
-                            call_size: int = 2,
-                            stagger_delay: int = 2) -> List[str]:
+                            call_size: int = 2) -> List[str]:
     """Translate the texts using the models `call_size` at a time. If the models fail to translate the text, it will try the next model in the list."""
     async def try_translate(model: ApiModel) -> Optional[List[str]]:
-        try:
-            result = await model.translate(texts_to_translate, store=True)
-            logger.debug(f'Try_translate result: {result}')
-            return result
-        except Exception as e:
-            logger.error(f"Translation failed for {model.model} from {model.site}: {e}")
-            return None
+        result = await model.translate(texts_to_translate, store=True)
+        logger.debug(f'Try_translate result: {result}')
+        return result
     random.shuffle(models)
     groups = [models[i:i+call_size] for i in range(0, len(models), call_size)]

+    no_of_models = len(models)
+    translation_attempts = 0
+
+    best_translation = None # (translations, translation_mismatches)
+
     for group in groups:
         tasks = set(asyncio.create_task(try_translate(model)) for model in group)
         while tasks:
@@ -70,28 +71,30 @@ async def translate_API_LLM(texts_to_translate: List[str],
                 result = await task
                 logger.debug(f'Result: {result}')
-                if result is not None:
-                    # Cancel remaining tasks
-                    for t in pending:
-                        t.cancel()
-                    return result
-        logger.error("All models have failed to translate the text.")
-        raise TypeError("Models have likely all outputted garbage translations or rate limited.")
-    # def translate_API_LLM(text, models):
-    #     random.shuffle(models)
-    #     logger.debug(f"All Models Available: {models}")
-    #     for model in models:
-    #         logger.info(f"Attempting translation with model {model}.")
-    #         try:
-    #             translation = model.translate(text)
-    #             logger.debug(f"Translation obtained: {translation}")
-    #             if translation or translation == []:
-    #                 return translation
-    #         except Exception as e:
-    #             logger.error(f"Error with model {repr(model)}. Error: {e}")
-    #             continue
-    #     logger.error("All models have failed to translate the text.")
-    #     raise TypeError("Models have likely all outputted garbage translations or rate limited.")
+                translation_attempts += 1
+                status_code, translations, translation_mismatches = result
+                if status_code == 0:
+                    # Cancel remaining tasks
+                    for t in pending:
+                        t.cancel()
+                    return translations
+                else:
+                    logger.error("Model has failed to translate the text.")
+                if translation_attempts == no_of_models:
+                    if best_translation is not None:
+                        return best_translation[0]
+                    else:
+                        logger.error("All models have failed to translate the text.")
+                        raise TypeError("Models have likely all outputted garbage translations or rate limited.")
+                elif status_code == 2:
+                    # keep the translation with the fewest mismatches seen so far
+                    if best_translation is None:
+                        best_translation = (translations, translation_mismatches)
+                    else:
+                        best_translation = (translations, translation_mismatches) if translation_mismatches < best_translation[1] else best_translation
+                else:
+                    continue
 ###############################
 # Best model by far: Aya-23-8B. Gemma is relatively good. If I get the time to quantize either Gemma or Aya, those will be good to use. llama3.2 is really good as well.
 def init_AYA():

helpers/utils.py
@@ -176,8 +176,18 @@ def similar_tfidf(list1,list2,threshold) -> float:

     return float(cosine_similarity(vec1, vec2)[0, 0]) > threshold

 def similar_jacard(list1, list2) -> float:
+    if not list1 or not list2:
+        return 0.0
     return len(set(list1).intersection(set(list2))) / len(set(list1).union(set(list2)))

+
+def check_similarity(list1, list2, threshold, method = 'tfidf'):
+    if method == 'tfidf':
+        return similar_tfidf(list1, list2, threshold) # similar_tfidf applies the threshold itself
+    elif method == 'jacard':
+        return similar_jacard(list1, list2) >= threshold
+    else:
+        raise ValueError("Invalid method. Please use one of 'tfidf' or 'jacard'.")

 if __name__ == "__main__":
     # Example usage
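A quick usage sketch of the new helper (the 'jacard' spelling matches the identifiers above; the values are illustrative):

```python
from utils import check_similarity  # assuming helpers/ is on sys.path

# Jaccard similarity of {'hello','world'} and {'hello','there'} is 1/3,
# so this returns False against a 0.75 threshold
check_similarity(['hello', 'world'], ['hello', 'there'], threshold=0.75, method='jacard')
```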
main.py: 107 changed lines
@@ -1,96 +1,33 @@
 ###################################################################################
 ##### IMPORT LIBRARIES #####
-import os, time, sys, threading, subprocess
-
+import os, time, sys, threading, subprocess, asyncio
+import config
+import web_app, qt_app
+from logging_config import logger
+from data import create_tables
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'helpers'))

-from translation import translate_Seq_LLM, translate_API_LLM, init_API_LLM, init_Seq_LLM
-from utils import printsc, convert_image_to_bytes, bytes_to_image, similar_tfidf, is_wayland
-from ocr import get_words, init_OCR, id_keep_source_lang
-from data import Base, engine, create_tables
-from draw import modify_image
-import config, asyncio
-from config import SOURCE_LANG, TARGET_LANG, OCR_MODEL, OCR_USE_GPU, LOCAL_FILES_ONLY, REGION, INTERVAL, MAX_TRANSLATE, TRANSLATION_MODEL, IMAGE_CHANGE_THRESHOLD, TEMP_IMG_PATH
-from logging_config import logger
-import web_app
-import view_buffer_app
-
 ###################################################################################
+create_tables()

-async def main():
-    ###################################################################################
-
-    # Initialisation
-    ##### Create the database if not present #####
-    create_tables()
-
-    ##### Initialize the OCR #####
-    OCR_LANGUAGES = [SOURCE_LANG, TARGET_LANG, 'en']
-    ocr = init_OCR(model=OCR_MODEL, paddle_lang= SOURCE_LANG, easy_languages = OCR_LANGUAGES, use_GPU=OCR_USE_GPU)
-
-    ##### Initialize the translation #####
-    # model, tokenizer = init_Seq_LLM(TRANSLATION_MODEL, from_lang =SOURCE_LANG , target_lang = TARGET_LANG)
-    models = init_API_LLM(SOURCE_LANG, TARGET_LANG)
-    ###################################################################################
-
-    runs = 0
-
-    # label, app = view_buffer_app.create_viewer()
-
-    # try:
-    while True:
-        logger.debug("Capturing screen")
-        printsc(REGION, TEMP_IMG_PATH)
-        logger.debug(f"Screen Captured. Proceeding to perform OCR.")
-        ocr_output = id_keep_source_lang(ocr, TEMP_IMG_PATH, SOURCE_LANG) # keep only phrases containing the source language
-        logger.debug(f"OCR completed. Detected {len(ocr_output)} phrases.")
-        if runs == 0:
-            logger.info('Initial run')
-            prev_words = set()
-        else:
-            logger.debug(f'Run number: {runs}.')
-        runs += 1
-
-        curr_words = set(get_words(ocr_output))
-        logger.debug(f'Current words: {curr_words} Previous words: {prev_words}')
-        ### If the OCR detects different words, translate screen -> to ensure that the screen is not refreshing constantly and to save GPU power
-        if not similar_tfidf(list(curr_words), list(prev_words), threshold = IMAGE_CHANGE_THRESHOLD) and prev_words != curr_words:
-            logger.info('Beginning Translation')
-
-            to_translate = [entry[1] for entry in ocr_output][:MAX_TRANSLATE]
-            # translation = translate_Seq_LLM(to_translate, model_type = TRANSLATION_MODEL, model = model, tokenizer = tokenizer, from_lang = SOURCE_LANG, target_lang = TARGET_LANG)
-            try:
-                translation = await translate_API_LLM(to_translate, models, call_size = 3)
-            except TypeError as e:
-                logger.error(f"Failed to translate using API models. Error: {e}. Sleeping for 30 seconds.")
-                time.sleep(30)
-                continue
-            logger.debug('Translation complete. Modifying image.')
-            translated_image = modify_image(TEMP_IMG_PATH, ocr_output, translation)
-            # view_buffer_app.show_buffer_image(translated_image, label)
-            web_app.latest_image = bytes_to_image(translated_image)
-            logger.debug("Image modified. Saving image.")
-            prev_words = curr_words
-        else:
-            logger.info(f"Skipping translation. No significant change in the screen detected. Total translation attempts so far: {runs}.")
-        logger.debug("Continuing to next iteration.")
-        time.sleep(INTERVAL)
-    # finally:
-    #     label.close()
-    #     app.quit()
-################### TODO ##################
-# 3. Quantising/finetuning larger LLMs. Consider using Aya-23-8B, Gemma, llama3.2 models.
-# 5. Maybe refreshing issue of flask app. Also get webpage to update only if the image changes.
-
-if __name__ == "__main__":
-    # subprocess.Popen(['feh','--auto-reload', '/home/James/Pictures/translated.png'])
-    # asyncio.run(main())
-    # Start the image updating thread
+def main(app):
+    logger.info('Configuration:')
+    for i in dir(config):
+        if not callable(getattr(config, i)) and not i.startswith("__"):
+            logger.info(f'{i}: {getattr(config, i)}')
-    threading.Thread(target=asyncio.run, args=(main(),), daemon=True).start()
-
-    # Start the Flask web server
-    web_app.app.run(host='0.0.0.0', port=5000, debug=False)
+    if app == 'qt':
+        # Start the Qt app
+        qt_app.qt_app_main()
+    elif app == 'web':
+        threading.Thread(target=asyncio.run, args=(web_app.web_app_main(),), daemon=True).start()
+
+        web_app.app.run(host='0.0.0.0', port=5000, debug=False)

+################### TODO ##################
+# 3. Quantising/finetuning larger LLMs. Consider using Aya-23-8B, Gemma, llama3.2 models.

+if __name__ == '__main__':
+    main('qt')
qt_app.py: 147 lines (new file)
@@ -0,0 +1,147 @@
import config, asyncio, sys, os, time, numpy as np, qt_app, web_app
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'helpers'))

from translation import translate_Seq_LLM, translate_API_LLM, init_API_LLM, init_Seq_LLM
from utils import printsc, convert_image_to_bytes, bytes_to_image, check_similarity, is_wayland
from ocr import get_words, init_OCR, id_keep_source_lang
from data import Base, engine, create_tables
from draw import modify_image

from config import (SOURCE_LANG, TARGET_LANG, OCR_MODEL, OCR_USE_GPU, LOCAL_FILES_ONLY,
                    REGION, INTERVAL, MAX_TRANSLATE, TRANSLATION_MODEL,
                    IMAGE_CHANGE_THRESHOLD, TEMP_IMG_PATH)
from logging_config import logger
from PySide6.QtWidgets import QMainWindow, QLabel, QVBoxLayout, QWidget, QApplication
from PySide6.QtCore import Qt, QThread, Signal
from PySide6.QtGui import QPixmap, QImage


class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Translator")

        # Create main widget and layout
        main_widget = QWidget()
        self.setCentralWidget(main_widget)
        layout = QVBoxLayout(main_widget)

        # Create image label
        self.image_label = QLabel()
        layout.addWidget(self.image_label)

        # Set up image generator thread
        self.generator = qt_app.ImageGenerator()
        self.generator.image_ready.connect(self.update_image)
        self.generator.start()

        # Set initial window size
        window_width, window_height = REGION[2] - REGION[0], REGION[3] - REGION[1]

        self.resize(window_width, window_height)

    def update_image(self, image_buffer):
        """Update the displayed image directly from buffer bytes"""
        if image_buffer is None:
            return

        # Convert buffer to QImage
        q_image = QImage.fromData(image_buffer)

        if q_image.isNull():
            logger.error("Failed to create QImage from buffer")
            return

        # Convert QImage to QPixmap and display it
        pixmap = QPixmap.fromImage(q_image)

        # Scale the pixmap to fit the label while maintaining aspect ratio
        scaled_pixmap = pixmap.scaled(
            self.image_label.size(),
            Qt.KeepAspectRatio,
            Qt.SmoothTransformation
        )

        self.image_label.setPixmap(scaled_pixmap)

    def closeEvent(self, event):
        """Clean up when closing the window"""
        self.generator.stop()
        event.accept()


class ImageGenerator(QThread):
    """Thread for generating images continuously"""
    image_ready = Signal(bytes)  # emits the PNG-encoded bytes produced by modify_image

    def __init__(self):
        super().__init__()
        self.running = True
        self.OCR_LANGUAGES = [SOURCE_LANG, TARGET_LANG, 'en']
        self.ocr = init_OCR(model=OCR_MODEL, paddle_lang= SOURCE_LANG, easy_languages = self.OCR_LANGUAGES, use_GPU=OCR_USE_GPU)
        self.ocr_output = id_keep_source_lang(self.ocr, TEMP_IMG_PATH, SOURCE_LANG)
        self.models = init_API_LLM(SOURCE_LANG, TARGET_LANG)
        self.runs = 0
        self.prev_words = set()
        self.curr_words = set(get_words(self.ocr_output))
        self.translated_image = None

    def run(self):
        asyncio.run(self.async_run())

    async def async_run(self):

        while self.running:
            logger.debug("Capturing screen")
            printsc(REGION, TEMP_IMG_PATH)
            logger.debug(f"Screen Captured. Proceeding to perform OCR.")
            self.ocr_output = id_keep_source_lang(self.ocr, TEMP_IMG_PATH, SOURCE_LANG) # keep only phrases containing the source language
            logger.debug(f"OCR completed. Detected {len(self.ocr_output)} phrases.")
            if self.runs == 0:
                logger.info('Initial run')
                self.prev_words = set()
            else:
                logger.debug(f'Run number: {self.runs}.')
            self.runs += 1

            self.curr_words = set(get_words(self.ocr_output))
            logger.debug(f'Current words: {self.curr_words} Previous words: {self.prev_words}')
            ### If the OCR detects different words, translate screen -> to ensure that the screen is not refreshing constantly and to save GPU power
            if self.prev_words != self.curr_words and not check_similarity(self.curr_words, self.prev_words, threshold = IMAGE_CHANGE_THRESHOLD, method="jacard"):
                logger.info('Beginning Translation')

                to_translate = [entry[1] for entry in self.ocr_output][:MAX_TRANSLATE]
                # translation = translate_Seq_LLM(to_translate, model_type = TRANSLATION_MODEL, model = model, tokenizer = tokenizer, from_lang = SOURCE_LANG, target_lang = TARGET_LANG)
                try:
                    translation = await translate_API_LLM(to_translate, self.models, call_size = 3)
                except TypeError as e:
                    logger.error(f"Failed to translate using API models. Error: {e}. Sleeping for 30 seconds.")
                    time.sleep(30)
                    continue
                logger.debug('Translation complete. Modifying image.')
                self.translated_image = modify_image(TEMP_IMG_PATH, self.ocr_output, translation)
                # view_buffer_app.show_buffer_image(translated_image, label)
                logger.debug("Image modified. Saving image.")
                self.prev_words = self.curr_words
            else:
                logger.info(f"Skipping translation. No significant change in the screen detected. Total translation attempts so far: {self.runs}.")
            logger.debug("Continuing to next iteration.")
            time.sleep(INTERVAL)
            self.image_ready.emit(self.translated_image)

    def stop(self):
        self.running = False
        self.wait()


def qt_app_main():
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec())


if __name__ == "__main__":
    qt_app_main()
requirements.txt: 262 lines (new file)
@@ -0,0 +1,262 @@
absl-py==2.1.0
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
albucore==0.0.13
albumentations==1.4.10
annotated-types==0.7.0
anyio==4.6.2.post1
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
astor==0.8.1
asttokens==2.4.1
astunparse==1.6.3
async-lru==2.0.4
attrs==24.2.0
babel==2.16.0
beautifulsoup4==4.12.3
bleach==6.2.0
blinker==1.8.2
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
coloredlogs==15.0.1
comm==0.2.2
contourpy==1.3.0
ctranslate2==4.5.0
cycler==0.12.1
Cython==3.0.11
datasets==3.1.0
debugpy==1.8.7
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
dill==0.3.8
distro==1.9.0
easyocr==1.7.2
EasyProcess==1.1
entrypoint2==1.1
eval_type_backport==0.2.0
executing==2.1.0
fastjsonschema==2.20.0
filelock==3.16.1
fire==0.7.0
Flask==3.0.3
Flask-SSE==1.0.0
flatbuffers==24.3.25
fonttools==4.54.1
fqdn==1.5.1
frozenlist==1.5.0
fsspec==2024.10.0
gast==0.6.0
google==3.0.0
google-ai-generativelanguage==0.6.10
google-api-core==2.22.0
google-api-python-client==2.151.0
google-auth==2.35.0
google-auth-httplib2==0.2.0
google-generativeai==0.8.3
google-pasta==0.2.0
googleapis-common-protos==1.65.0
greenlet==3.1.1
groq==0.11.0
grpcio==1.67.1
grpcio-status==1.67.1
h11==0.14.0
h5py==3.12.1
httpcore==1.0.6
httplib2==0.22.0
httpx==0.27.2
huggingface-hub==0.26.2
humanfriendly==10.0
idna==3.10
imageio==2.36.0
imgaug==0.4.0
ipykernel==6.29.5
ipython==8.29.0
ipywidgets==8.1.5
isoduration==20.11.0
itsdangerous==2.2.0
jaconv==0.4.0
jedi==0.19.1
jeepney==0.8.0
Jinja2==3.1.4
jiter==0.7.0
joblib==1.4.2
json5==0.9.25
jsonpath-python==1.0.6
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter==1.1.1
jupyter-console==6.6.3
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_client==8.6.3
jupyter_core==5.7.2
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.5
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.13
keras==3.6.0
kiwisolver==1.4.7
langid==1.1.6
lazy_loader==0.4
libclang==18.1.1
lmdb==1.5.1
lxml==5.3.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
matplotlib-inline==0.1.7
mdurl==0.1.2
mecab-python3==1.0.10
mistralai==1.1.0
mistune==3.0.2
ml-dtypes==0.4.1
mpmath==1.3.0
mss==9.0.2
multidict==6.1.0
multiprocess==0.70.16
mypy-extensions==1.0.0
namex==0.0.8
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.4.2
ninja==1.11.1.1
notebook==7.2.2
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
onnxruntime==1.19.2
openai==1.53.0
opencv-contrib-python==4.10.0.84
opt-einsum==3.3.0
optimum==1.23.3
optree==0.13.0
overrides==7.7.0
packaging==24.1
paddleocr==2.9.1
paddlepaddle-gpu==2.6.2
pandas==2.2.3
pandocfilters==1.5.1
parso==0.8.4
pexpect==4.9.0
pillow==11.0.0
pinyin==0.4.0
plac==1.4.3
platformdirs==4.3.6
prometheus_client==0.21.0
prompt_toolkit==3.0.48
propcache==0.2.0
proto-plus==1.25.0
protobuf==5.28.3
psutil==6.1.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==18.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pyclipper==1.3.0.post6
pycparser==2.22
pydantic==2.9.2
pydantic_core==2.23.4
pydotenv==0.0.7
Pygments==2.18.0
pykakasi==2.3.0
pyparsing==3.2.0
pypinyin==0.53.0
pyscreenshot==3.1
PySide6==6.8.0.2
PySide6_Addons==6.8.0.2
PySide6_Essentials==6.8.0.2
python-bidi==0.6.3
python-dateutil==2.8.2
python-docx==1.1.2
python-dotenv==1.0.1
python-json-logger==2.0.7
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
RapidFuzz==3.10.1
rapidocr-onnxruntime==1.3.25
redis==5.2.0
referencing==0.35.1
regex==2024.9.11
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.9.3
rpds-py==0.20.0
rsa==4.9
sacremoses==0.1.1
safetensors==0.4.5
scikit-image==0.24.0
scikit-learn==1.5.2
scipy==1.14.1
Send2Trash==1.8.3
sentencepiece==0.2.0
setuptools==75.3.0
shapely==2.0.6
shiboken6==6.8.0.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.36
stack-data==0.6.3
sympy==1.13.1
tensorboard==2.18.0
tensorboard-data-server==0.7.2
termcolor==2.5.0
terminado==0.18.1
threadpoolctl==3.5.0
tifffile==2024.9.20
tinycss2==1.4.0
tokenizers==0.20.1
tomli==2.0.2
torch==2.5.1
torchvision==0.20.1
tornado==6.4.1
tqdm==4.66.6
traitlets==5.14.3
transformers==4.46.1
triton==3.1.0
types-python-dateutil==2.9.0.20241003
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.2
unidic==1.1.0
uri-template==1.3.0
uritemplate==4.1.1
urllib3==2.2.3
uroman==1.3.1.1
wasabi==0.10.1
wcwidth==0.2.13
webcolors==24.8.0
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==3.0.6
wheel==0.44.0
widgetsnbextension==4.0.13
wrapt==1.16.0
xxhash==3.5.0
yarl==1.17.1
@ -1,52 +0,0 @@
|
||||
|
||||
#### Same thread as main.py so it will be relatively unresponsive. Just for use locally for a faster image display from buffer.
|
||||
|
||||
|
||||
from PySide6.QtWidgets import QApplication, QLabel
|
||||
from PySide6.QtCore import Qt
|
||||
from PySide6.QtGui import QImage, QPixmap
|
||||
import sys
|
||||
def create_viewer():
|
||||
"""Create and return a QLabel widget for displaying images"""
|
||||
app = QApplication.instance()
|
||||
if app is None:
|
||||
app = QApplication(sys.argv)
|
||||
|
||||
label = QLabel()
|
||||
label.setWindowTitle("Image Viewer")
|
||||
label.setMinimumSize(640, 480)
|
||||
# Enable mouse tracking for potential future interactivity
|
||||
label.setMouseTracking(True)
|
||||
# Better scaling quality
|
||||
label.setScaledContents(True)
|
||||
label.show()
|
||||
|
||||
return label, app
|
||||
|
||||
def show_buffer_image(buffer, label):
|
||||
"""
|
||||
Display an image from buffer using PySide6
|
||||
|
||||
Parameters:
|
||||
buffer: bytes
|
||||
Raw image data in memory
|
||||
label: QLabel
|
||||
Qt label widget to display the image
|
||||
"""
|
||||
# Convert buffer to QImage
|
||||
qimg = QImage.fromData(buffer)
|
||||
|
||||
# Convert to QPixmap and set to label
|
||||
pixmap = QPixmap.fromImage(qimg)
|
||||
|
||||
# Scale with better quality
|
||||
scaled_pixmap = pixmap.scaled(
|
||||
label.size(),
|
||||
Qt.KeepAspectRatio,
|
||||
Qt.SmoothTransformation
|
||||
)
|
||||
|
||||
label.setPixmap(scaled_pixmap)
|
||||
|
||||
# Process Qt events to update the display
|
||||
QApplication.processEvents()
|
||||
web_app.py: 83 changed lines
@@ -1,9 +1,86 @@
 from flask import Flask, Response, render_template
-import threading
-import io
+import os, time, sys, threading, subprocess
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'helpers'))
+
+from translation import translate_Seq_LLM, translate_API_LLM, init_API_LLM, init_Seq_LLM
+from utils import printsc, convert_image_to_bytes, bytes_to_image, check_similarity, is_wayland
+from ocr import get_words, init_OCR, id_keep_source_lang
+from data import Base, engine, create_tables
+from draw import modify_image
+import asyncio
+from config import SOURCE_LANG, TARGET_LANG, OCR_MODEL, OCR_USE_GPU, LOCAL_FILES_ONLY, REGION, INTERVAL, MAX_TRANSLATE, TRANSLATION_MODEL, IMAGE_CHANGE_THRESHOLD, TEMP_IMG_PATH
+from logging_config import logger


 app = Flask(__name__)

 latest_image = None

+async def web_app_main():
+    ###################################################################################
+    global latest_image
+    # Initialisation
+    ##### Create the database if not present #####
+    create_tables()
+
+    ##### Initialize the OCR #####
+    OCR_LANGUAGES = [SOURCE_LANG, TARGET_LANG, 'en']
+    ocr = init_OCR(model=OCR_MODEL, paddle_lang= SOURCE_LANG, easy_languages = OCR_LANGUAGES, use_GPU=OCR_USE_GPU)
+
+    ##### Initialize the translation #####
+    # model, tokenizer = init_Seq_LLM(TRANSLATION_MODEL, from_lang =SOURCE_LANG , target_lang = TARGET_LANG)
+    models = init_API_LLM(SOURCE_LANG, TARGET_LANG)
+    ###################################################################################
+
+    runs = 0
+
+    # label, app = view_buffer_app.create_viewer()
+
+    # try:
+    while True:
+        logger.debug("Capturing screen")
+        printsc(REGION, TEMP_IMG_PATH)
+        logger.debug(f"Screen Captured. Proceeding to perform OCR.")
+        ocr_output = id_keep_source_lang(ocr, TEMP_IMG_PATH, SOURCE_LANG) # keep only phrases containing the source language
+        logger.debug(f"OCR completed. Detected {len(ocr_output)} phrases.")
+        if runs == 0:
+            logger.info('Initial run')
+            prev_words = set()
+        else:
+            logger.debug(f'Run number: {runs}.')
+        runs += 1
+
+        curr_words = set(get_words(ocr_output))
+        logger.debug(f'Current words: {curr_words} Previous words: {prev_words}')
+
+        ### If the OCR detects different words, translate screen -> to ensure that the screen is not refreshing constantly and to save GPU power
+        if prev_words != curr_words and not check_similarity(curr_words, prev_words, threshold = IMAGE_CHANGE_THRESHOLD, method="jacard"):
+            logger.info('Beginning Translation')
+
+            to_translate = [entry[1] for entry in ocr_output][:MAX_TRANSLATE]
+            # translation = translate_Seq_LLM(to_translate, model_type = TRANSLATION_MODEL, model = model, tokenizer = tokenizer, from_lang = SOURCE_LANG, target_lang = TARGET_LANG)
+            try:
+                if len(to_translate) == 0:
+                    logger.info("No text detected. Skipping translation. Continuing to next iteration.")
+                    continue
+                translation = await translate_API_LLM(to_translate, models, call_size = 3)
+            except TypeError as e:
+                logger.error(f"Failed to translate using API models. Error: {e}. Sleeping before retrying.")
+                time.sleep(2*INTERVAL)
+                continue
+            logger.debug('Translation complete. Modifying image.')
+            translated_image = modify_image(TEMP_IMG_PATH, ocr_output, translation)
+            latest_image = bytes_to_image(translated_image)
+            logger.debug("Image modified. Saving image.")
+            prev_words = curr_words
+        else:
+            logger.info(f"Skipping translation. No significant change in the screen detected. Total translation attempts so far: {runs}.")
+        logger.debug("Continuing to next iteration.")
+        time.sleep(INTERVAL)

 # Global variable to hold the current image
 def curr_image():
     return latest_image
@@ -29,8 +106,8 @@ def stream_image():

 if __name__ == '__main__':
     # Start the image updating thread
-    import main, asyncio
-    threading.Thread(target=asyncio.run, args=(main(),), daemon=True).start()
+    threading.Thread(target=asyncio.run, args=(web_app_main(),), daemon=True).start()

     # Start the Flask web server
-    app.run(host='0.0.0.0', port=5000, debug=True)
+    app.run(host='0.0.0.0', port=5000, debug=False)