Returning currency without price #12

rennerocha · 2019-07-03T21:20:51Z

Does it make sense to return a Price instance with a currency set but without a price value?

Currently, if a string contains a substring that matches an existing currency, that currency is returned even when the string does not actually mention it:

In [1]: Price.fromstring("STRING WITH NO PRICE BUT HAS A WORD THAT CONTAINS PART OF A CURRENCY NAME: EUROPE")
Out[1]: Price(amount=None, currency='EUR')

This happens because we use regexes to match currencies inside a string. This approach will become a bigger problem once single-letter currencies are handled better (#3).
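To illustrate the cause, here is a minimal sketch using only the standard `re` module. The pattern and the function name `find_currency_unanchored` are hypothetical stand-ins, not price-parser's actual regexes; the point is only that an unanchored search happily matches inside a longer word.

```python
import re

# Hypothetical stand-in for a currency pattern; NOT price-parser's real regex.
UNANCHORED = re.compile(r"EUR")


def find_currency_unanchored(text):
    """Return the first currency-like substring found anywhere in text, or None."""
    match = UNANCHORED.search(text)
    return match.group(0) if match else None


# "EUROPE" contains "EUR", so the search succeeds even though no currency is meant.
print(find_currency_unanchored("A WORD THAT CONTAINS PART OF A CURRENCY NAME: EUROPE"))
```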

I can see two options to handle this scenario:

1. Consider it a currency only if the substring is surrounded by whitespace:

In [2]: Price.fromstring("SOMETHING EUROPE SOMETHING")
Out[2]: Price(amount=None, currency=None)

In [3]: Price.fromstring("SOMETHING EUR SOMETHING")
Out[3]: Price(amount=None, currency='EUR')

2. Consider it a currency only if the entire string exactly matches the currency name:

In [4]: Price.fromstring("EUR")
Out[4]: Price(amount=None, currency='EUR')

In [5]: Price.fromstring("SOMETHING EUR SOMETHING")
Out[5]: Price(amount=None, currency=None)
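A rough sketch of how the two options could be expressed with the standard `re` module. `CURRENCIES`, `currency_option_1` and `currency_option_2` are names invented for this illustration and are not part of price-parser; the currency list is a tiny stand-in for the real one, and string edges are assumed to count as whitespace for option 1.

```python
import re

# Illustrative subset only; the real list of currencies is much longer.
CURRENCIES = ["EUR", "USD"]
_ALTERNATION = "|".join(re.escape(c) for c in CURRENCIES)

# Option 1: the currency must be delimited by whitespace (or the string edges).
OPTION_1 = re.compile(rf"(?<!\S)(?:{_ALTERNATION})(?!\S)")
# Option 2: the whole string must be exactly the currency.
OPTION_2 = re.compile(rf"(?:{_ALTERNATION})")


def currency_option_1(text):
    """Return a currency only when it stands alone between whitespace."""
    match = OPTION_1.search(text)
    return match.group(0) if match else None


def currency_option_2(text):
    """Return a currency only when the entire input is that currency."""
    match = OPTION_2.fullmatch(text)
    return match.group(0) if match else None
```

Option 1 keeps "SOMETHING EUR SOMETHING" working while rejecting "EUROPE"; option 2 rejects both and only accepts the bare string "EUR".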


Gallaecio · 2019-07-04T07:13:56Z

I think option 1 is the way to go. We just need to improve the regular expressions that we use to match currencies so that "Europa" does not count as "EUR".

rennerocha · 2019-07-04T12:00:55Z

What about single-letter currencies? Should we do the same?

In [1]: Price.fromstring("SOMETHING R SOMETHING")
Out[1]: Price(amount=None, currency='R')

In [1]: Price.fromstring("SOMETHING WORD SOMETHING")
Out[1]: Price(amount=None, currency=None)

Gallaecio · 2019-07-04T12:57:42Z

I think so.

When a real-life scenario comes up where this is troublesome, we can look into improvements to solve or mitigate such issues. For example, we could ignore specific currency expressions in specific locales (I assume we are getting locale support à la dateparser eventually), such as ignoring "R" in languages where "R" is a common word and a different text is always used to refer to the "R" currency.
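A speculative sketch of that idea, assuming a per-locale table of expressions to skip. Everything here is hypothetical: price-parser has no `locale` argument, `IGNORED_BY_LOCALE` is invented data, and `"xx"` is a placeholder locale code rather than a real language where "R" is a common word.

```python
import re

# Hypothetical data: currency expressions to skip for a given locale.
IGNORED_BY_LOCALE = {"xx": {"R"}}

# Illustrative subset of currency expressions, including a single-letter one.
SYMBOLS = ["EUR", "R"]
PATTERN = re.compile(
    r"(?<!\S)(?:" + "|".join(re.escape(s) for s in SYMBOLS) + r")(?!\S)"
)


def find_currency(text, locale=None):
    """Return the first whitespace-delimited currency not ignored for locale."""
    ignored = IGNORED_BY_LOCALE.get(locale, set())
    for match in PATTERN.finditer(text):
        if match.group(0) not in ignored:
            return match.group(0)
    return None
```

Without a locale, "SOMETHING R SOMETHING" yields "R" and "SOMETHING WORD SOMETHING" yields nothing; with the placeholder locale, the standalone "R" is skipped as well.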

ejulio · 2019-10-07T16:11:49Z

I think a price is a combination of currency and value.
@rennerocha, can you share a specific case where this behavior is required?
That is, a case where we have the currency or symbol, but not the price?

My concern is that, since this is a special case, it might belong in a function dedicated to parsing a currency from a string, rather than in one that returns a price without a value.

kmike · 2019-10-18T21:20:07Z

This is not very constructive, I know. Just an observation: converting HTML to text is a tricky problem, which may require a full browser to do properly; e.g. depending on the website, whitespace should or shouldn't be inserted around tags. This means we can't rely 100% on whitespace in these strings being handled properly. Option 1 can still be a good tradeoff though.

Manish-210 · 2021-03-16T09:47:53Z

I think that, instead of relying on whitespace completely, it would be better to check that the currency string (e.g. EUR) is not surrounded by letters.

In [2]: Price.fromstring("SOMETHING EUROPE SOMETHING")
Out[2]: Price(amount=None, currency=None)

because there is an "O" right after EUR.

In [2]: Price.fromstring("EUR SOMETHING")
Out[2]: Price(amount=None, currency='EUR')

I also agree with @kmike: HTML tags will often surround these strings.
Consider:

In [2]: Price.fromstring("<span>EUR something </span>")
Out[2]: Price(amount=None, currency='EUR')

In the above example, EUR is not adjacent to any letter; it is preceded by ">". So it might be a good idea to only check that no letter is directly adjacent to the currency string.
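A minimal sketch of that check with the standard `re` module, assuming "letter" means any Unicode letter. The function name `find_currency_no_adjacent_letters` and the two-currency list are made up for illustration and are not price-parser code.

```python
import re

# [^\W\d_] matches one letter: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"

# Illustrative subset of currency codes.
NOT_NEXT_TO_LETTERS = re.compile(rf"(?<!{LETTER})(?:EUR|USD)(?!{LETTER})")


def find_currency_no_adjacent_letters(text):
    """Return a currency code that has no letter directly before or after it."""
    match = NOT_NEXT_TO_LETTERS.search(text)
    return match.group(0) if match else None
```

Unlike the whitespace rule, this accepts punctuation or digits next to the code, so "<span>EUR something </span>" still matches while "EUROPE" does not.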

Gallaecio · 2021-03-17T08:45:58Z

I think the input to price-parser should not include HTML tags; the user is meant to clean that up before passing the input to price-parser. I think @kmike’s point is that, when a user does that cleanup, the result may not have those spaces.

I’m thinking for example of 10 EUR*, which could (and probably should) result in 10 EUR* when HTML is stripped, and we should aim for extraction to work nonetheless. This example is rather straightforward, but things can get messier. Think, for example, of 10 EURno taxes resulting in 10 EURno taxes; to support something like that, we would need a more complex solution, but it would still be doable (e.g. support all-uppercase currency codes if followed by a lowercase letter). So, using word boundaries is a good start, but things could get more complicated.
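One possible reading of the "uppercase code followed by a lowercase letter" idea, as a sketch only. `find_code` and the two-code list are hypothetical, and the rule used here (reject a code only when it runs straight into another uppercase letter) is an assumption about how such a heuristic might look, not a description of price-parser.

```python
import re

# No letter may come directly before the code; what follows is checked in Python.
CODE = re.compile(r"(?<![^\W\d_])(?:EUR|USD)")


def find_code(text):
    """Return an uppercase currency code unless an uppercase letter follows it."""
    for match in CODE.finditer(text):
        following = text[match.end():match.end() + 1]
        # An empty string, punctuation, digits, and lowercase letters all pass.
        if not following.isupper():
            return match.group(0)
    return None
```

With this rule, "10 EUR*" and "10 EURno taxes" both yield "EUR", while "EUROPE" is rejected. A mixed-case word such as "EURope" would still be accepted, which is one of the ways things could get more complicated.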

rennerocha mentioned this issue Oct 2, 2019

Avoid incorrect currency when no price available #20

Open
