Sandboxed (MacOS) Batch transcription fails due to permission error accessing '/private/etc/apache2/mime.types' #81

Open
petiatil opened this issue Dec 13, 2023 · 2 comments
Labels
bug Something isn't working

Comments

petiatil commented Dec 13, 2023

Batch transcription works when the same code is run from the Python console.

When the app is sandboxed, batch transcription fails with a permission error because an underlying library tries to read "/private/etc/apache2/mime.types".

To avoid the permission error, the file would have to be readable from inside the app's environment, or the library would have to be kept from reading it at all, if that is possible.

Real-time transcription works in the sandboxed context.

My first thought was to bundle a local copy of mime.types and track down where Speechmatics reads the file, so that I could point the library at the local copy. I haven't been able to find where the access happens, though, and I suspect there is a better solution.

If there isn't a straightforward fix using the Speechmatics Python client, I plan to test a lower-level approach in Python.

Batch transcription test:

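import ssl

import certifi
import speechmatics
from speechmatics.batch_client import BatchClient

# LANGUAGE, englishLocale, operatingPoint, speechmaticsAPIkey and audio_file
# are defined elsewhere in the app.

# Verify TLS against certifi's CA bundle rather than the system store.
ssl_context = ssl.create_default_context()
ssl_context.load_verify_locations(certifi.where())

conf = speechmatics.models.BatchTranscriptionConfig(
    language=LANGUAGE,
    output_locale=englishLocale if LANGUAGE == "en" else None,
    operating_point=operatingPoint,
)

settings = speechmatics.models.ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=speechmaticsAPIkey,
    ssl_context=ssl_context,
)

try:
    with BatchClient(settings) as client:
        job_id = client.submit_job(audio=audio_file, transcription_config=conf)
        transcript = client.wait_for_completion(job_id, transcription_format="json-v2")
except Exception as exc:
    # The original snippet ends inside the try block; this handler is a placeholder.
    print(f"Batch transcription failed: {exc}")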
@nickgerig

Hi @petiatil

We did some digging and it seems like the httpx lib imports mimetypes here:

https://github.com/encode/httpx/blob/2318fd822cdb16435ccb5cabcba16c0b7969c1e4/httpx/_utils.py#L4

So maybe this is the issue you're seeing:

https://github.com/python/cpython/blob/3.12/Lib/mimetypes.py#L48
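
If the read does come from the standard library, one possible workaround is to change which files mimetypes looks at before anything asks it to guess a type. The module keeps its search paths in the list mimetypes.knownfiles, which includes /etc/apache2/mime.types, and reads them the first time it is initialised. The sketch below is untested in a sandboxed app; use_bundled_mime_types and bundled_path are hypothetical names, and it assumes the call happens at startup, before the Speechmatics client is used.

```python
import mimetypes


def use_bundled_mime_types(bundled_path=None):
    """Re-initialise mimetypes without the system-wide mime.types files.

    bundled_path is a hypothetical path to a mime.types copy shipped inside
    the app bundle. With no path, only the built-in defaults are used.
    """
    # Replace the list contents in place so every reference to it sees the change.
    mimetypes.knownfiles[:] = [bundled_path] if bundled_path else []
    # Rebuild the type database from the built-in table plus knownfiles.
    mimetypes.init()


# Call once at startup, before any HTTP client guesses a content type.
use_bundled_mime_types()
```

The built-in table still covers common types, so lookups such as mimetypes.guess_type("job.json") keep working without any file on disk.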

We do have an open issue to replace httpx but it is unlikely to be done soon.

Hopefully that helps a little.

@petiatil
Author

Fortunately, using requests directly resolved the sandbox issue.
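
For anyone landing here later, this is roughly what a requests-based batch call can look like. It is a sketch and not the exact code used in the app: build_job_config and transcribe are hypothetical helper names, and the endpoint paths, form field names and status values are assumptions based on the public batch REST API, so check them against the current API reference.

```python
import json
import time

API_URL = "https://asr.api.speechmatics.com/v2"  # same base URL as in the report


def build_job_config(language, operating_point=None, output_locale=None):
    """Build the JSON string sent as the `config` form field (assumed shape)."""
    transcription_config = {"language": language}
    if operating_point is not None:
        transcription_config["operating_point"] = operating_point
    if output_locale is not None:
        transcription_config["output_locale"] = output_locale
    return json.dumps(
        {"type": "transcription", "transcription_config": transcription_config}
    )


def transcribe(audio_path, api_key, config_json, poll_seconds=5.0):
    """Submit a job, poll until it finishes, and return the json-v2 transcript."""
    import requests  # third-party; imported here so build_job_config works without it

    headers = {"Authorization": f"Bearer {api_key}"}
    with open(audio_path, "rb") as audio:
        response = requests.post(
            f"{API_URL}/jobs/",
            headers=headers,
            files={"data_file": audio},
            data={"config": config_json},
            timeout=60,
        )
    response.raise_for_status()
    job_id = response.json()["id"]

    # Poll the job until it leaves the "running" state.
    while True:
        status = requests.get(f"{API_URL}/jobs/{job_id}", headers=headers, timeout=30)
        status.raise_for_status()
        state = status.json()["job"]["status"]
        if state == "done":
            break
        if state != "running":
            raise RuntimeError(f"job {job_id} ended with status {state!r}")
        time.sleep(poll_seconds)

    transcript = requests.get(
        f"{API_URL}/jobs/{job_id}/transcript",
        headers=headers,
        params={"format": "json-v2"},
        timeout=30,
    )
    transcript.raise_for_status()
    return transcript.json()
```

Usage would be along the lines of transcribe(audio_file, speechmaticsAPIkey, build_job_config(LANGUAGE, operating_point=operatingPoint)), reusing the variable names from the original snippet.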

Thank you
