# Voice Assistant Chatbot
## Usage
1. Clone the repository and install the required packages with `pip install -r requirements.txt`.
2. Either use an already deployed Ollama server or deploy your own Ollama server from [here](https://ollama.com/). Download any LLM you want. A capable model with relatively few parameters and a small size is `llama3.2:latest`, which is what this Voice Assistant uses by default. If you use another LLM, ensure that it is reflected in the script in `main.py` by setting the `LLM_MODEL` environment variable (see Step 6).
3. If you are able to install the `piper-tts` package from `pip`, great; otherwise, follow the steps on [GitHub](https://github.com/rhasspy/piper) to install the package. You can also download other voices in different languages from [here](https://github.com/rhasspy/piper/blob/master/VOICES.md). Ensure that you download both the `.onnx` file and the corresponding `.onnx.json` config file.
4. Go to [Picovoice](https://picovoice.ai) and create an account. Upon creating an account, you receive an `ACCESS_KEY`, which you will need for the app. Then follow the instructions at this [link](https://picovoice.ai/docs/quick-start/porcupine-python/) to create two Wake Words for the voice assistant to react to: one wakes the voice assistant and continues the conversation, while the other ends the current conversation. Note that Picovoice runs fully offline even though you need an `ACCESS_KEY`.
5. The default model used for audio transcription is `openai/whisper-large-v3-turbo`, which can be particularly demanding on certain hardware if CUDA is unavailable or there is not enough VRAM. Consider using smaller Whisper models by editing the `MODEL_ID` environment variable; find the other models [here](https://huggingface.co/openai). You can also consider [Leopard](https://picovoice.ai/docs/leopard/), a small and relatively accurate audio transcription model; you will need to modify the `whisper.py` file to use Leopard instead of WhisperX. For basic audio transcription, you can also use [VOSK](https://alphacephei.com/vosk/), which is relatively inaccurate but uses minimal resources and runs offline. For direct microphone transcription (without separately capturing audio and then transcribing it) you could use the `SpeechRecognition` Python library, which can connect to various APIs for direct Speech to Text; this would require significant changes to the code. The Speech to Text computation would in that case run on remote servers, reducing the load on local hardware. Note that a commonly used API, Google Speech, is approximately 50% less accurate than `openai/whisper-large-v3` and is subject to rate limiting/charges.
6. In the `main.py` file there is a section named `Environmental Variables`. You may want to run it once after modifying the `ROOT_URL` variable to create directories rooted at `ROOT_URL` (or leave `ROOT_URL` empty to create all the files in the working directory). Create a new `.env` file and assign environment variables in it as needed. The only required environment variables are `ACCESS_KEY`, `WAKE_WORD_1` and `WAKE_WORD_2` (the sketch after this list shows one way these variables might be used).
7. Run `python main.py lang=en`. Set the language based on the ISO 639 language codes. For the default English voice, GLaDOS from Portal 2 is currently used. If you would like to change the voice, edit the `language.py` file so that the values of `config_file` and `onnx_file` under the key `en` point to your chosen files. Note that the files need to be in the `piper-tts` directory.
8. Enjoy your local chatbot.
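
As a rough illustration of how the steps above fit together, here is a minimal sketch of the wake-word → transcribe → LLM → speak loop, assuming the `pvporcupine`, `pvrecorder`, `transformers`, `ollama` and `python-dotenv` packages plus the `piper` CLI and `ffmpeg` are installed. It is not the actual contents of `main.py`: the helper names (`record_utterance`, `speak`), the fixed-length recording window, the `glados.onnx` voice path, the `aplay` playback call, and the assumption that `WAKE_WORD_1`/`WAKE_WORD_2` hold paths to `.ppn` keyword files are all illustrative choices.

```python
# sketch.py - illustrative only; the real wiring lives in main.py.
# .env is expected to define at least ACCESS_KEY, WAKE_WORD_1 and WAKE_WORD_2.
import os
import struct
import subprocess
import wave

import ollama
import pvporcupine
from dotenv import load_dotenv
from pvrecorder import PvRecorder
from transformers import pipeline

load_dotenv()

porcupine = pvporcupine.create(
    access_key=os.environ["ACCESS_KEY"],
    keyword_paths=[os.environ["WAKE_WORD_1"], os.environ["WAKE_WORD_2"]],
)
recorder = PvRecorder(frame_length=porcupine.frame_length)
transcriber = pipeline(
    "automatic-speech-recognition",
    model=os.getenv("MODEL_ID", "openai/whisper-large-v3-turbo"),
)

def record_utterance(seconds=5, path="utterance.wav"):
    """Capture a fixed-length clip after the wake word (real code would use VAD)."""
    frames = []
    for _ in range(int(seconds * porcupine.sample_rate / porcupine.frame_length)):
        frames.extend(recorder.read())
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit PCM
        f.setframerate(porcupine.sample_rate)
        f.writeframes(struct.pack("<" + "h" * len(frames), *frames))
    return path

def speak(text, voice="glados.onnx", out="reply.wav"):
    """Synthesise a reply with the piper CLI and play it back."""
    subprocess.run(["piper", "--model", voice, "--output_file", out],
                   input=text.encode(), check=True)
    subprocess.run(["aplay", out], check=True)  # swap for your platform's audio player

recorder.start()
try:
    while True:
        keyword = porcupine.process(recorder.read())
        if keyword == 0:  # first wake word: listen, transcribe, answer
            text = transcriber(record_utterance())["text"]
            reply = ollama.chat(
                model=os.getenv("LLM_MODEL", "llama3.2:latest"),
                messages=[{"role": "user", "content": text}],
            )["message"]["content"]
            speak(reply)
        elif keyword == 1:  # second wake word: end the conversation
            break
finally:
    recorder.stop()
    porcupine.delete()
    recorder.delete()
```

The actual `main.py` additionally creates the directories rooted at `ROOT_URL` and selects the Piper voice for the chosen language via `language.py`, which the sketch leaves out.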
## License
This project is licensed under the MIT License. See the [License](https://gitea.frogguingang.com/chickenflyshigh/voice-assistant-chatbot/src/branch/main/LICENSE) file for the full license text.