This project utilizes the DeepGram Transcribe API and Google Cloud Platform (App Engine and Cloud Storage) for deployment.
The application is deployed as a RESTful API service using FastAPI and can be interacted with through the Swagger UI. The application processes an audio file containing the conversation of at least two people and returns relevant information about the individuals and the discussion topics.
Click here to use the application.
API documentation page
Click on try it out
Upload file
Execute
To upload a file:
- Click on the drop-down arrow next to "POST".
- Select "Try it out".
- Click on "Choose file" to upload an audio file of a conversation between at least two people from your local computer.
- Finally, click "Execute".
When the user uploads the file, it is stored in a Google Cloud bucket, which is subsequently accessed by DeepGram to retrieve the audio via its remote file transcribe API. The result is then processed to match a dialogue format such as:
speaker_0: Hi. Is this the Crystal Heights Hotel in Singapore?
speaker_1: Yes, it is. Good afternoon. How may I assist you today?
Using the OpenAI API and prompt engineering, relevant insights from the conversation are obtained using the GPT-4 model. These insights are then displayed to the user.
This API is scalable and flexible, making it usable on other platforms such as Postman or the requests library in application development.