Description: Add support for audio-to-text multimodal benchmarking in GuideLLM, with Whisper as the initial target.

**Acceptance Criteria:**