# Voice Assistant Chatbot

A chatbot that is purely activated by voice.
## Usage
1. Clone the repository and install the required packages with `pip install -r requirements.txt`.
2. Either use an already deployed Ollama server or deploy your own Ollama server from here. Download any LLM you want. A capable model with comparatively few parameters and a small size is `llama3.2:latest`, which is what this Voice Assistant uses by default. If you use another LLM, ensure that it is reflected in `main.py` by setting the `LLM_MODEL` environment variable (see Step 6). A minimal query sketch is shown after this list.
3. If you are able to install the `piper-tts` package from `pip`, great; otherwise follow the steps from GitHub to install the package. You can also download other voices in different languages from here. Ensure that you download both the `.onnx` file and the corresponding `.onnx.json` config file. A synthesis sketch is shown after this list.
4. Go to Picovoice and create an account. Upon creating an account you receive an `ACCESS_KEY`, which you will need for the app. Then follow the instructions at the link and create two Wake Words for the voice assistant to react to: one wakes the voice assistant up and continues the conversation, while the other ends the current conversation. Note that Picovoice runs fully offline even though you need an `ACCESS_KEY`. A wake-word sketch is shown after this list.
5. The default model for audio transcription is `openai/whisper-large-v3-turbo`, which can be particularly demanding on hardware without CUDA or without enough VRAM. Consider using a smaller Whisper model by editing the `MODEL_ID` environment variable; find the other models here. You can also consider Leopard, a small and relatively accurate audio transcription model; you will need to modify `whisper.py` to use Leopard instead of WhisperX. For basic audio transcription you can also use VOSK, which is relatively inaccurate but uses minimal resources and runs offline. For direct mic transcription (without separately capturing and then transcribing audio) you could use the `SpeechRecognition` Python library, which connects to various APIs for direct speech-to-text; this would require significant changes to the code. The speech-to-text computation would then run on remote servers, reducing the load on local hardware. Note that a common API to call, Google Speech, is approximately 50% less accurate than `openai/whisper-large-v3` and subject to rate limiting and charges. A transcription sketch is shown after this list.
6. In `main.py` there is a section named `Environmental Variables`. You may want to run it once after modifying the `ROOT_URL` variable so that it creates the directories rooted at `ROOT_URL` (or leave `ROOT_URL` empty to create all the files in the working directory). Create a new `.env` file and assign environment variables in it as needed; the only required variables are `ACCESS_KEY`, `WAKE_WORD_1`, and `WAKE_WORD_2`. An example `.env` is shown after this list.
7. Run `python main.py lang=en`, setting the language based on the ISO 639 language codes. For the default English voice, GLaDOS from Portal 2 is currently used. To change the voice, edit `language.py` so that the `config_file` and `onnx_file` values under the key `en` point to your files. Note that the files need to be in the `piper-tts` directory. A sketch of this mapping is shown after this list.
8. Enjoy your local chatbot.
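For Step 2, here is a minimal sketch of querying an Ollama server over its documented `/api/generate` REST endpoint. The URL assumes a default local deployment on port 11434; point it at your own server if it lives elsewhere, and note that the repo's `ollama.py` may do this differently.

```python
# Minimal sketch of a single-turn query against an Ollama server.
# OLLAMA_URL assumes a default local deployment.
import os

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_llm(prompt: str, model: str | None = None) -> str:
    """Send one prompt to Ollama and return the full response text."""
    model = model or os.environ.get("LLM_MODEL", "llama3.2:latest")
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    print(ask_llm("Say hello in one short sentence."))
```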
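For Step 3, a minimal synthesis sketch that shells out to the `piper` CLI installed with the `piper-tts` package. The voice file name is a placeholder; use the `.onnx`/`.onnx.json` pair you downloaded (piper looks for the config next to the model), and note that the repo's `piper.py` may integrate piper differently.

```python
# Minimal sketch: synthesize speech to a WAV file via the piper CLI.
# The model path is a placeholder for the voice you downloaded.
import subprocess


def speak(text: str, model: str = "piper-tts/en_US-glados.onnx",
          out_wav: str = "reply.wav") -> None:
    """Write synthesized speech for `text` to `out_wav`."""
    # piper reads the text to synthesize from stdin
    subprocess.run(
        ["piper", "--model", model, "--output_file", out_wav],
        input=text.encode("utf-8"),
        check=True,
    )


speak("Hello. Oh, it's you.")
```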
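For Step 4, a minimal wake-word loop with `pvporcupine` and `pvrecorder`. It assumes `WAKE_WORD_1` and `WAKE_WORD_2` hold paths to the two `.ppn` keyword files created in that step; the repo's `main.py` may wire this up differently.

```python
# Minimal sketch of wake-word detection with Picovoice Porcupine.
# Assumes WAKE_WORD_1 / WAKE_WORD_2 are paths to the custom .ppn files.
import os

import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key=os.environ["ACCESS_KEY"],
    keyword_paths=[os.environ["WAKE_WORD_1"], os.environ["WAKE_WORD_2"]],
)
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()
try:
    while True:
        # process() returns the index of the detected keyword, or -1
        keyword_index = porcupine.process(recorder.read())
        if keyword_index == 0:
            print("Wake word: start/continue the conversation")
        elif keyword_index == 1:
            print("Stop word: end the conversation")
finally:
    recorder.stop()
    porcupine.delete()
```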
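For Step 5, a transcription sketch using the Hugging Face `transformers` pipeline with the default model named above. The repo's `whisper.py` is described as using WhisperX, so treat this as an illustration rather than the project's actual code.

```python
# Minimal sketch: transcribe a recorded audio file with a Whisper model.
# MODEL_ID is read from the environment, mirroring the README's variable.
import os

import torch
from transformers import pipeline

model_id = os.environ.get("MODEL_ID", "openai/whisper-large-v3-turbo")
device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline("automatic-speech-recognition", model=model_id, device=device)
result = asr("recording.wav")  # path to the captured audio file
print(result["text"])
```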
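For Step 6, an example `.env`. Every value is a placeholder, and the assumption that the wake-word variables take paths to `.ppn` files is mine, not stated by this README.

```
# Example .env -- all values below are placeholders.
# Required:
ACCESS_KEY=your-picovoice-access-key
WAKE_WORD_1=/path/to/wake-word.ppn
WAKE_WORD_2=/path/to/stop-word.ppn
# Optional overrides:
LLM_MODEL=llama3.2:latest
MODEL_ID=openai/whisper-large-v3-turbo
ROOT_URL=/path/to/project/root
```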
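For Step 7, a hypothetical sketch of the per-language voice mapping that `language.py` is described as containing. The actual structure in the repo may differ, and the GLaDOS file names are placeholders.

```python
# Hypothetical sketch of the language-to-voice mapping in language.py.
VOICES = {
    "en": {
        "config_file": "piper-tts/glados.onnx.json",  # placeholder name
        "onnx_file": "piper-tts/glados.onnx",         # placeholder name
    },
    # Add further ISO 639 codes ("de", "fr", ...) pointing at voices
    # downloaded into the piper-tts directory.
}
```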
## License
This project is licensed under the MIT License. See the `LICENSE` file for the full license text.