Voice Assistant Chatbot

A chatbot that is purely activated by voice.

Usage

  1. Clone the repository and install the required packages with pip install -r requirements.txt.
  2. Either use an already deployed Ollama server or deploy your own Ollama server from here. Download any LLM you want. A capable model with relatively few parameters and a small download size is llama3.2:latest, which is what this voice assistant uses by default. If you use another LLM, make sure this is reflected in main.py by setting the LLM_MODEL environment variable (see Step 6). A minimal request sketch is shown after this list.
  3. If you can install the piper-tts package from pip, great; otherwise follow the steps on GitHub to install it. You can also download voices for other languages from here. Ensure that you download both the .onnx model file and the corresponding .onnx.json config file. A short synthesis sketch is shown after this list.
  4. Go to Picovoice and create an account. Upon creating an account you receive an ACCESS_KEY, which the app needs. Then follow the instructions at the link and create two Wake Words for the voice assistant to react to: one wakes the voice assistant and continues the conversation, while the other ends the current conversation. Note that Picovoice runs fully offline even though you need an ACCESS_KEY. A wake-word detection sketch is shown after this list.
  5. The default audio transcription model is openai/whisper-large-v3-turbo, which can be demanding on hardware without CUDA or without enough VRAM. Consider using a smaller Whisper model by editing the MODEL_ID environment variable; the other models can be found here, and a transcription sketch is shown after this list. You can also consider Leopard, a small and relatively accurate transcription model; you will need to modify the whisper.py file to use Leopard instead of WhisperX. For basic audio transcription you can also use VOSK, which is relatively inaccurate but uses minimal resources and runs offline. For direct microphone transcription (without separately capturing audio and then transcribing it) you could use the SpeechRecognition Python library, which connects to various APIs for direct speech-to-text; this would require significant changes to the code. The speech-to-text computation would then run on remote servers, reducing the load on local hardware. Note that a commonly used backend, Google Speech, is roughly 50% less accurate than openai/whisper-large-v3 and is subject to rate limiting and charges.
  6. In the main.py file there is a section named Environmental Variables. You may want to run the script once after modifying the ROOT_URL variable (or leaving it empty to create everything in the working directory) so that the directories rooted at ROOT_URL are created. Then create a new .env file and assign environment variables in it as needed. The only required variables are ACCESS_KEY, WAKE_WORD_1 and WAKE_WORD_2; a sketch of loading these from .env is shown after this list.
  7. Run python main.py lang=en. Set the language using the ISO 639 language codes. The default English voice is currently GLaDOS from Portal 2. If you would like to change the voice, edit language.py so that the config_file and onnx_file values under the en key point to your chosen files; note that the files need to be in the piper-tts directory. A sketch of this mapping is shown after this list.
  8. Enjoy your local chatbot.
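
The sketches below illustrate individual steps. They are assumptions about how the pieces could fit together, not excerpts from this repository, so treat them as starting points only.

For Step 2, a minimal sketch of querying an Ollama server over its documented REST API; the ask() helper, the OLLAMA_URL variable and its default value are made up for illustration, and ollama.py may work differently.

```python
# Minimal sketch of asking an Ollama server for a reply (Step 2).
# OLLAMA_URL and the ask() helper are illustrative names, not code from ollama.py.
import os

import requests

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
LLM_MODEL = os.getenv("LLM_MODEL", "llama3.2:latest")


def ask(prompt: str) -> str:
    response = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": LLM_MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]


if __name__ == "__main__":
    print(ask("Say hello in one sentence."))
```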
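
For Step 3, a sketch of synthesizing speech by calling the piper CLI from Python. It assumes piper is on your PATH and uses placeholder voice file names; piper.py may drive piper through its Python API instead.

```python
# Sketch of turning text into a WAV file with the piper CLI (Step 3).
# The voice model/config paths below are placeholders for whatever voice you downloaded.
import subprocess

VOICE_ONNX = "piper-tts/en_US-example.onnx"         # placeholder .onnx voice model
VOICE_CONFIG = "piper-tts/en_US-example.onnx.json"  # matching .onnx.json config


def speak_to_wav(text: str, wav_path: str = "reply.wav") -> None:
    # piper reads the text to synthesize from stdin and writes a WAV file.
    subprocess.run(
        ["piper", "--model", VOICE_ONNX, "--config", VOICE_CONFIG, "--output_file", wav_path],
        input=text.encode("utf-8"),
        check=True,
    )


speak_to_wav("The cake is a lie.")
```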
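
For Step 4, a sketch of listening for the two wake words with the pvporcupine and pvrecorder packages. It assumes WAKE_WORD_1 and WAKE_WORD_2 hold paths to the downloaded .ppn keyword files; main.py may wire this up differently.

```python
# Sketch of wake-word detection with Porcupine (Step 4).
# Assumes the two environment variables point at .ppn files from the Picovoice console.
import os

import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key=os.environ["ACCESS_KEY"],
    keyword_paths=[os.environ["WAKE_WORD_1"], os.environ["WAKE_WORD_2"]],
)
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)
recorder.start()

try:
    while True:
        keyword_index = porcupine.process(recorder.read())
        if keyword_index == 0:
            print("Wake word 1: wake up / continue the conversation")
        elif keyword_index == 1:
            print("Wake word 2: end the conversation")
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```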
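
For Step 5, a sketch of switching to a smaller Whisper checkpoint through the MODEL_ID environment variable. It uses the Hugging Face transformers pipeline purely to show the idea; whisper.py itself is built around WhisperX and will differ in the details.

```python
# Sketch of transcribing a recorded file with a configurable Whisper model (Step 5).
# "openai/whisper-small" is just an example of a smaller checkpoint.
import os

from transformers import pipeline

MODEL_ID = os.getenv("MODEL_ID", "openai/whisper-small")

asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
result = asr("recording.wav")  # placeholder path to the captured audio
print(result["text"])
```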
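
For Step 6, a sketch of reading the required environment variables from .env. It assumes python-dotenv; whether main.py actually uses that package is an assumption.

```python
# Sketch of loading the required settings from a .env file (Step 6).
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

ACCESS_KEY = os.environ["ACCESS_KEY"]    # Picovoice access key (required)
WAKE_WORD_1 = os.environ["WAKE_WORD_1"]  # wake word that wakes/continues the conversation (required)
WAKE_WORD_2 = os.environ["WAKE_WORD_2"]  # wake word that ends the conversation (required)
LLM_MODEL = os.getenv("LLM_MODEL", "llama3.2:latest")  # optional override from Step 2
```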
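
For Step 7, a hypothetical shape of the per-language voice table in language.py, based only on the key names mentioned above (en, onnx_file, config_file); the actual structure and the GLaDOS file names in the repository may differ.

```python
# Hypothetical sketch of the voice mapping edited in Step 7. VOICES and the file
# names are placeholders; only the en/onnx_file/config_file keys come from the README.
VOICES = {
    "en": {
        "onnx_file": "piper-tts/glados.onnx",         # placeholder voice model in piper-tts/
        "config_file": "piper-tts/glados.onnx.json",  # matching config file
    },
}
```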

License

This project is licensed under the MIT License. See the LICENSE file for the full license text.