|
| 1 | +========================== |
| 2 | +Introduction to dateparser |
| 3 | +========================== |
| 4 | + |
| 5 | + |
| 6 | +Features |
| 7 | +======== |
| 8 | + |
| 9 | +* Generic parsing of dates in over 200 language locales, plus numerous formats, in a language-agnostic fashion. |
| 10 | +* Generic parsing of relative dates like: ``'1 min ago'``, ``'2 weeks ago'``, ``'3 months, 1 week and 1 day ago'``, ``'in 2 days'``, ``'tomorrow'``. |
| 11 | +* Generic parsing of dates with time zone abbreviations or UTC offsets like: ``'August 14, 2015 EST'``, ``'July 4, 2013 PST'``, ``'21 July 2013 10:15 pm +0500'``. |
| 12 | +* Date lookup in longer texts. |
| 13 | +* Support for non-Gregorian calendar systems. See `Supported Calendars`_. |
| 14 | +* Extensive test coverage. |
| 15 | + |
| 16 | + |
| 17 | +Basic Usage |
| 18 | +=========== |
| 19 | + |
| 20 | +The most straightforward way is to use the `dateparser.parse <#dateparser.parse>`_ function, |
| 21 | +which wraps most of the functionality in the module. |
| 22 | + |
| 23 | +.. automodule:: dateparser |
| 24 | + :members: parse |
| 25 | + |
| 26 | + |
| 27 | +Popular Formats |
| 28 | +--------------- |
| 29 | + |
| 30 | + >>> import dateparser |
| 31 | + >>> dateparser.parse('12/12/12') |
| 32 | + datetime.datetime(2012, 12, 12, 0, 0) |
| 33 | + >>> dateparser.parse('Fri, 12 Dec 2014 10:55:50') |
| 34 | + datetime.datetime(2014, 12, 12, 10, 55, 50) |
| 35 | + >>> dateparser.parse('Martes 21 de Octubre de 2014') # Spanish (Tuesday 21 October 2014) |
| 36 | + datetime.datetime(2014, 10, 21, 0, 0) |
| 37 | + >>> dateparser.parse('Le 11 Décembre 2014 à 09:00') # French (11 December 2014 at 09:00) |
| 38 | + datetime.datetime(2014, 12, 11, 9, 0) |
| 39 | + >>> dateparser.parse('13 января 2015 г. в 13:34') # Russian (13 January 2015 at 13:34) |
| 40 | + datetime.datetime(2015, 1, 13, 13, 34) |
| 41 | + >>> dateparser.parse('1 เดือนตุลาคม 2005, 1:00 AM') # Thai (1 October 2005, 1:00 AM) |
| 42 | + datetime.datetime(2005, 10, 1, 1, 0) |
| 43 | + |
| 44 | +This will try to parse a date from the given string, attempting to |
| 45 | +detect the language each time. |
| 46 | + |
| 47 | +You can specify the language(s), if known, using the ``languages`` argument. In this case, only the given languages are used and language detection is skipped: |
| 48 | + |
| 49 | + >>> dateparser.parse('2015, Ago 15, 1:08 pm', languages=['pt', 'es']) |
| 50 | + datetime.datetime(2015, 8, 15, 13, 8) |
| 51 | + |
| 52 | +If you know the possible formats of the dates, you can |
| 53 | +use the ``date_formats`` argument: |
| 54 | + |
| 55 | + >>> dateparser.parse('22 Décembre 2010', date_formats=['%d %B %Y']) |
| 56 | + datetime.datetime(2010, 12, 22, 0, 0) |
| 57 | + |
| 58 | + |
| 59 | +Relative Dates |
| 60 | +-------------- |
| 61 | + |
| 62 | + >>> parse('1 hour ago') |
| 63 | + datetime.datetime(2015, 5, 31, 23, 0) |
| 64 | + >>> parse('Il ya 2 heures') # French (2 hours ago) |
| 65 | + datetime.datetime(2015, 5, 31, 22, 0) |
| 66 | + >>> parse('1 anno 2 mesi') # Italian (1 year 2 months) |
| 67 | + datetime.datetime(2014, 4, 1, 0, 0) |
| 68 | + >>> parse('yaklaşık 23 saat önce') # Turkish (23 hours ago) |
| 69 | + datetime.datetime(2015, 5, 31, 1, 0) |
| 70 | + >>> parse('Hace una semana') # Spanish (a week ago) |
| 71 | + datetime.datetime(2015, 5, 25, 0, 0) |
| 72 | + >>> parse('2小时前') # Chinese (2 hours ago) |
| 73 | + datetime.datetime(2015, 5, 31, 22, 0) |
| 74 | + |
| 75 | +.. note:: Running the code above may return different values, depending on your environment's current date and time. |
| 76 | + |
| 77 | +.. note:: Support for relative dates in the future still needs a lot of improvement. Community contributions in this area are welcome. See ":ref:`contributing`". |
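The outputs above depend on the moment the call is made. As a rough illustration of why, here is a minimal standard-library sketch that resolves a few English "N units ago" phrases against an explicit base time. The helper ``relative_to`` and the ``_UNITS`` table are hypothetical names invented for this example; this is not how dateparser itself is implemented.

```python
import re
from datetime import datetime, timedelta

# Hypothetical lookup table: singular English unit -> timedelta keyword.
_UNITS = {
    "min": "minutes",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}


def relative_to(text, base):
    """Resolve '<n> <unit>(s) ago' against `base`; return None if unsupported."""
    match = re.fullmatch(r"(\d+)\s+([a-z]+?)s?\s+ago", text.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        return None
    amount = int(match.group(1))
    keyword = _UNITS[match.group(2)]
    return base - timedelta(**{keyword: amount})


# The same phrase gives a different result for every base time.
base = datetime(2015, 6, 1, 0, 0)
one_hour_ago = relative_to("1 hour ago", base)
two_weeks_ago = relative_to("2 weeks ago", base)
```

With the base fixed at midnight on 1 June 2015, ``one_hour_ago`` lands on 31 May 2015 at 23:00, which is consistent with the first example in this section.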
| 78 | + |
| 79 | + |
| 80 | +OOTB Language Based Date Order Preference |
| 81 | +----------------------------------------- |
| 82 | + |
| 83 | + >>> # parsing ambiguous date |
| 84 | + >>> parse('02-03-2016') # assumes english language, uses MDY date order |
| 85 | + datetime.datetime(2016, 2, 3, 0, 0) |
| 86 | + >>> parse('le 02-03-2016') # detects french, uses DMY date order |
| 87 | + datetime.datetime(2016, 3, 2, 0, 0) |
| 88 | + |
| 89 | +.. note:: Ordering is not locale-based, so do not expect `DMY` order for UK or Australian English. In that case, you can specify the date order using `settings`: |
| 90 | + |
| 91 | + >>> parse('18-12-15 06:00', settings={'DATE_ORDER': 'DMY'}) |
| 92 | + datetime.datetime(2015, 12, 18, 6, 0) |
| 93 | + |
| 94 | +For more on date order, please look at Settings. |
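To see why an explicit date order matters, the following standard-library sketch reads the same ambiguous string under two orders. The helper ``parse_with_order`` and the ``_ORDER_FORMATS`` mapping are hypothetical and only cover the ``NN-NN-YYYY`` shape; they illustrate the ambiguity, not dateparser's internals.

```python
from datetime import datetime

# Hypothetical mapping from an order name to a strptime format.
_ORDER_FORMATS = {
    "MDY": "%m-%d-%Y",
    "DMY": "%d-%m-%Y",
}


def parse_with_order(text, date_order):
    """Parse an 'NN-NN-YYYY' string using the given field order."""
    return datetime.strptime(text, _ORDER_FORMATS[date_order])


mdy = parse_with_order("02-03-2016", "MDY")  # month first: 3 February 2016
dmy = parse_with_order("02-03-2016", "DMY")  # day first: 2 March 2016
```

Both readings are valid dates, so nothing in the string itself can settle the question. The order has to come from the detected language or from the caller.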
| 95 | + |
| 96 | + |
| 97 | + |
| 98 | +Timezone and UTC Offset |
| 99 | +----------------------- |
| 100 | + |
| 101 | +By default, `dateparser` returns a tzaware `datetime` if a timezone is present in the date string. Otherwise, it returns a naive `datetime` object. |
| 102 | + |
| 103 | + >>> parse('January 12, 2012 10:00 PM EST') |
| 104 | + datetime.datetime(2012, 1, 12, 22, 0, tzinfo=<StaticTzInfo 'EST'>) |
| 105 | + |
| 106 | + >>> parse('January 12, 2012 10:00 PM -0500') |
| 107 | + datetime.datetime(2012, 1, 12, 22, 0, tzinfo=<StaticTzInfo 'UTC\-05:00'>) |
| 108 | + |
| 109 | + >>> parse('2 hours ago EST') |
| 110 | + datetime.datetime(2017, 3, 10, 15, 55, 39, 579667, tzinfo=<StaticTzInfo 'EST'>) |
| 111 | + |
| 112 | + >>> parse('2 hours ago -0500') |
| 113 | + datetime.datetime(2017, 3, 10, 15, 59, 30, 193431, tzinfo=<StaticTzInfo 'UTC\-05:00'>) |
| 114 | + |
| 115 | +If the date has no timezone name, abbreviation, or offset, you can specify one using the `TIMEZONE` setting. |
| 116 | + |
| 117 | + >>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': 'US/Eastern'}) |
| 118 | + datetime.datetime(2012, 1, 12, 22, 0) |
| 119 | + |
| 120 | + >>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': '+0500'}) |
| 121 | + datetime.datetime(2012, 1, 12, 22, 0) |
| 122 | + |
| 123 | +The `TIMEZONE` option may not be useful on its own, as it only associates the given timezone with the |
| 124 | +resulting `datetime` object. It is useful, however, when you want to convert between |
| 125 | +timezones or simply want a tzaware date with the given timezone info attached. |
| 126 | + |
| 127 | + >>> parse('January 12, 2012 10:00 PM', settings={'TIMEZONE': 'US/Eastern', 'RETURN_AS_TIMEZONE_AWARE': True}) |
| 128 | + datetime.datetime(2012, 1, 12, 22, 0, tzinfo=<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>) |
| 129 | + |
| 130 | + |
| 131 | + >>> parse('10:00 am', settings={'TIMEZONE': 'EST', 'TO_TIMEZONE': 'EDT'}) |
| 132 | + datetime.datetime(2016, 9, 25, 11, 0) |
| 133 | + |
| 134 | +Some more use cases for timezone conversion: |
| 135 | + |
| 136 | + >>> parse('10:00 am EST', settings={'TO_TIMEZONE': 'EDT'}) # date string has timezone info |
| 137 | + datetime.datetime(2017, 3, 12, 11, 0, tzinfo=<StaticTzInfo 'EDT'>) |
| 138 | + |
| 139 | + >>> parse('now EST', settings={'TO_TIMEZONE': 'UTC'}) # relative dates |
| 140 | + datetime.datetime(2017, 3, 10, 23, 24, 47, 371823, tzinfo=<StaticTzInfo 'UTC'>) |
| 141 | + |
| 142 | +If no timezone is present in the date string or defined in `settings`, you can still |
| 143 | +return a tzaware `datetime`. This is especially useful for relative dates, when it is unclear |
| 144 | +which timezone the relative base is in. |
| 145 | + |
| 146 | + >>> parse('2 minutes ago', settings={'RETURN_AS_TIMEZONE_AWARE': True}) |
| 147 | + datetime.datetime(2017, 3, 11, 4, 25, 24, 152670, tzinfo=<DstTzInfo 'Asia/Karachi' PKT+5:00:00 STD>) |
| 148 | + |
| 149 | +If you want to compute relative dates in UTC instead of the system's local timezone, you can use the `TIMEZONE` setting. |
| 150 | + |
| 151 | + >>> parse('4 minutes ago', settings={'TIMEZONE': 'UTC'}) |
| 152 | + datetime.datetime(2017, 3, 10, 23, 27, 59, 647248, tzinfo=<StaticTzInfo 'UTC'>) |
| 153 | + |
| 154 | +.. note:: When a timezone is both present in the string and specified using `settings`, the string is parsed into a tzaware representation and then converted to the timezone specified in `settings`. |
| 155 | + |
| 156 | + >>> parse('10:40 pm PKT', settings={'TIMEZONE': 'UTC'}) |
| 157 | + datetime.datetime(2017, 3, 12, 17, 40, tzinfo=<StaticTzInfo 'UTC'>) |
| 158 | + |
| 159 | + >>> parse('20 mins ago EST', settings={'TIMEZONE': 'UTC'}) |
| 160 | + datetime.datetime(2017, 3, 12, 21, 16, 0, 885091, tzinfo=<StaticTzInfo 'UTC'>) |
| 161 | + |
| 162 | +For more on timezones, please look at Settings. |
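The difference between attaching a timezone and converting to one can be shown with the standard library alone. The sketch below assumes a fixed UTC+5 offset for PKT, chosen purely for illustration; it mirrors the ``'10:40 pm PKT'`` example above without using dateparser.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: PKT modelled as a fixed UTC+5 offset.
PKT = timezone(timedelta(hours=5), "PKT")

naive = datetime(2017, 3, 12, 22, 40)

# Attaching: the wall-clock time is unchanged, the object becomes tz-aware.
attached = naive.replace(tzinfo=PKT)

# Converting: the instant is unchanged, the wall-clock time shifts to UTC.
converted = attached.astimezone(timezone.utc)
```

Here ``attached`` still reads 22:40, while ``converted`` reads 17:40 in UTC. Both objects refer to the same instant, so they compare equal.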
| 163 | + |
| 164 | + |
| 165 | +Incomplete Dates |
| 166 | +---------------- |
| 167 | + |
| 168 | + >>> from dateparser import parse |
| 169 | + >>> parse('December 2015') # default behavior |
| 170 | + datetime.datetime(2015, 12, 16, 0, 0) |
| 171 | + >>> parse('December 2015', settings={'PREFER_DAY_OF_MONTH': 'last'}) |
| 172 | + datetime.datetime(2015, 12, 31, 0, 0) |
| 173 | + >>> parse('December 2015', settings={'PREFER_DAY_OF_MONTH': 'first'}) |
| 174 | + datetime.datetime(2015, 12, 1, 0, 0) |
| 175 | + |
| 176 | + >>> parse('March') |
| 177 | + datetime.datetime(2015, 3, 16, 0, 0) |
| 178 | + >>> parse('March', settings={'PREFER_DATES_FROM': 'future'}) |
| 179 | + datetime.datetime(2016, 3, 16, 0, 0) |
| 180 | + >>> # parsing with preference set for 'past' |
| 181 | + >>> parse('August', settings={'PREFER_DATES_FROM': 'past'}) |
| 182 | + datetime.datetime(2015, 8, 15, 0, 0) |
| 183 | + |
| 184 | +You can also skip parsing incomplete dates altogether by setting the `STRICT_PARSING` flag as follows: |
| 185 | + |
| 186 | + >>> parse('December 2015', settings={'STRICT_PARSING': True}) |
| 187 | + None |
| 188 | + |
| 189 | +For more on handling incomplete dates, please look at Settings. |
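As a rough sketch of what a day-of-month preference means, the hypothetical helper ``fill_day`` below completes a year and month with the first, last, or current day. Clamping the current day to the length of the month is an assumption made for this example, and the code is not taken from dateparser.

```python
import calendar
from datetime import datetime


def fill_day(year, month, prefer, today):
    """Complete a year/month pair with a day chosen by `prefer`."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if prefer == "first":
        day = 1
    elif prefer == "last":
        day = last_day
    else:
        # "current": reuse today's day, clamped so short months stay valid.
        day = min(today.day, last_day)
    return datetime(year, month, day)


today = datetime(2015, 6, 16)
first = fill_day(2015, 12, "first", today)
last = fill_day(2015, 12, "last", today)
current = fill_day(2015, 12, "current", today)
```

With ``today`` set to the 16th, the three results fall on 1, 31, and 16 December 2015.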
| 190 | + |
| 191 | + |
| 192 | +Search for Dates in Longer Chunks of Text |
| 193 | +----------------------------------------- |
| 194 | + |
| 195 | +You can extract dates from longer strings of text. They are returned as a list of tuples, each containing the text chunk that holds the date and the parsed `datetime` object. |
| 196 | + |
| 197 | +.. automodule:: dateparser.search |
| 198 | + :members: search_dates |
| 199 | + |
| 200 | +Dependencies |
| 201 | +============ |
| 202 | + |
| 203 | +`dateparser` relies on the following libraries: |
| 204 | + |
| 205 | + * dateutil_'s module ``relativedelta`` for its freshness parser. |
| 206 | + * convertdate_ to convert *Jalali* dates to *Gregorian*. |
| 207 | + * hijri-converter_ to convert *Hijri* dates to *Gregorian*. |
| 208 | + * tzlocal_ to reliably get local timezone. |
| 209 | + * ruamel.yaml_ (optional) for operations on language files. |
| 210 | + |
| 211 | +.. _dateutil: https://pypi.python.org/pypi/python-dateutil |
| 212 | +.. _convertdate: https://pypi.python.org/pypi/convertdate |
| 213 | +.. _hijri-converter: https://pypi.python.org/pypi/hijri-converter |
| 214 | +.. _tzlocal: https://pypi.python.org/pypi/tzlocal |
| 215 | +.. _ruamel.yaml: https://pypi.python.org/pypi/ruamel.yaml |
| 216 | + |
| 217 | +Supported languages and locales |
| 218 | +=============================== |
| 219 | +You can check the supported locales by visiting the ":ref:`supported-locales`" section. |
| 220 | + |
| 221 | + |
| 222 | +Supported Calendars |
| 223 | +=================== |
| 224 | +* Gregorian calendar. |
| 225 | + |
| 226 | +* Persian Jalali calendar. For more information, refer to `Persian Jalali Calendar <https://en.wikipedia.org/wiki/Iranian_calendars#Zoroastrian_calendar>`_. |
| 227 | + |
| 228 | + >>> from dateparser.calendars.jalali import JalaliCalendar |
| 229 | + >>> JalaliCalendar('جمعه سی ام اسفند ۱۳۸۷').get_date() |
| 230 | + {'date_obj': datetime.datetime(2009, 3, 20, 0, 0), 'period': 'day'} |
| 231 | + |
| 232 | + |
| 233 | +* Hijri/Islamic Calendar. For more information, refer to `Hijri Calendar <https://en.wikipedia.org/wiki/Islamic_calendar>`_. |
| 234 | + |
| 235 | + >>> from dateparser.calendars.hijri import HijriCalendar |
| 236 | + >>> HijriCalendar('17-01-1437 هـ 08:30 مساءً').get_date() |
| 237 | + {'date_obj': datetime.datetime(2015, 10, 30, 20, 30), 'period': 'day'} |
| 238 | + |
| 239 | +.. note:: `HijriCalendar` only works with Python ≥ 3.6. |
| 240 | +.. note:: For `Finnish` language, please specify `settings={'SKIP_TOKENS': []}` to correctly parse freshness dates. |
| 241 | + |
| 242 | + |
| 243 | +To use calendars, install with the following command: |
| 244 | + |
| 245 | +.. tip:: |
| 246 | + pip install dateparser[calendars] |