Error when trying to create TSV file #103

Open
Gurey opened this issue Mar 14, 2018 · 4 comments

Comments

@Gurey

Gurey commented Mar 14, 2018

Trying to create a TSV gives me an error.

endpoint: localhost:9292/ocr

with payload:

{
  "img_url": "http://bit.ly/ocrimage",
  "engine": "tesseract",
  "engine_args": {
    "config_vars": {
      "tessedit_create_tsv": "1",
      "tessedit_pageseg_mode": "1"
    }
  }
}

Gives me:
Error processing image url: . Error: Could not find outfile. Basename: /tmp/785ad93e-2721-4fd1-7892-b78ffc442ae0 Extensions: [txt hocr]

tesseract official TSV config:
https://github.com/tesseract-ocr/tesseract/blob/master/tessdata/configs/tsv

@tleyden
Owner

tleyden commented Mar 16, 2018

Does it work if you remove the "config_vars"?

The relevant code is here:

func findOutfile(outfileBaseName string, fileExtensions []string) (string, error) {
	for _, fileExtension := range fileExtensions {
		outFile := fmt.Sprintf("%v.%v", outfileBaseName, fileExtension)
		logg.LogTo("OCR_TESSERACT", "checking if exists: %v", outFile)
		if _, err := os.Stat(outFile); err == nil {
			return outFile, nil
		}
	}
	return "", fmt.Errorf("Could not find outfile. Basename: %v Extensions: %v", outfileBaseName, fileExtensions)
}

I wonder if tesseract is writing to a file extension that OpenOCR isn't expecting. It looks like OpenOCR only looks for .txt or .hocr.
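
To make the suspected mismatch concrete, here is a minimal, self-contained sketch. It is not OpenOCR's actual code path: the logging call is dropped, and simulateTsvRun is a hypothetical helper that stands in for a tesseract run by writing only a .tsv file to a temp directory. Searching that directory for txt and hocr alone should fail with the same kind of "Could not find outfile" error, while including tsv in the list should succeed.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findOutfile mirrors the function quoted above, minus the logging call.
func findOutfile(outfileBaseName string, fileExtensions []string) (string, error) {
	for _, fileExtension := range fileExtensions {
		outFile := fmt.Sprintf("%v.%v", outfileBaseName, fileExtension)
		if _, err := os.Stat(outFile); err == nil {
			return outFile, nil
		}
	}
	return "", fmt.Errorf("Could not find outfile. Basename: %v Extensions: %v", outfileBaseName, fileExtensions)
}

// simulateTsvRun is a hypothetical stand-in for a tesseract run with
// tessedit_create_tsv set: it creates a temp dir holding only "out.tsv"
// and returns the base name (without extension) plus a cleanup function.
func simulateTsvRun() (string, func(), error) {
	dir, err := os.MkdirTemp("", "ocr-demo")
	if err != nil {
		return "", nil, err
	}
	base := filepath.Join(dir, "out")
	if err := os.WriteFile(base+".tsv", []byte("level\tpage_num\n"), 0600); err != nil {
		os.RemoveAll(dir)
		return "", nil, err
	}
	return base, func() { os.RemoveAll(dir) }, nil
}

func main() {
	base, cleanup, err := simulateTsvRun()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	// Only txt and hocr are checked, so the .tsv file is never found.
	_, err = findOutfile(base, []string{"txt", "hocr"})
	fmt.Println("txt/hocr only:", err)

	// With tsv in the list, the same lookup succeeds.
	found, err := findOutfile(base, []string{"txt", "hocr", "tsv"})
	fmt.Println("with tsv:", found, err)
}
```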

@Gurey
Author

Gurey commented Mar 16, 2018

Hi!
Thank you for getting back to me.

These two requests work fine:

{
  "img_url": "http://bit.ly/ocrimage",
  "engine": "tesseract",
  "engine_args": {
    "config_vars": {
      "tessedit_create_hocr": "1",
      "tessedit_pageseg_mode": "1"
    }
  },
  "psm": 3
}

{
  "img_url": "http://bit.ly/ocrimage",
  "engine": "tesseract",
  "engine_args": {
    "config_vars": {}
  },
  "psm": 3
}

I just tried running tesseract in my terminal with the command tesseract pic.jpg out tsv, and that gives me a file named out.tsv.

@tleyden
Owner

tleyden commented Mar 16, 2018

OK, I think the fix is pretty easy: it just needs to look for files with a .tsv extension. Are you able to submit a PR?
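
One possible shape for that change, as a rough sketch only: build the list of extensions to search from the config vars in the request. The helper name candidateExtensions and the rule that tessedit_create_tsv set to "1" implies a .tsv output are assumptions for illustration, not OpenOCR's actual code.

```go
package main

import "fmt"

// candidateExtensions is a hypothetical helper that returns the output file
// extensions to search for, given the tesseract config vars of a request.
func candidateExtensions(configVars map[string]string) []string {
	// txt and hocr are the extensions listed in the reported error message.
	extensions := []string{"txt", "hocr"}

	// Assumption: a TSV run writes <basename>.tsv, matching the out.tsv
	// file produced by the command-line run described in this thread.
	if configVars["tessedit_create_tsv"] == "1" {
		extensions = append(extensions, "tsv")
	}
	return extensions
}

func main() {
	withTsv := map[string]string{
		"tessedit_create_tsv":   "1",
		"tessedit_pageseg_mode": "1",
	}
	fmt.Println(candidateExtensions(withTsv)) // [txt hocr tsv]
	fmt.Println(candidateExtensions(nil))     // [txt hocr]
}
```

A simpler alternative would be to always append tsv to the fixed list; the conditional version just avoids an extra file check on requests that never ask for TSV output.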

@luzanikita

#126
