[WIP] text extraction in Selector and SelectorList #127

Open
wants to merge 27 commits into
base: master
3c471b8
[tmp] Selector.text and SelectorList.text methods
kmike Nov 2, 2018
8dea4ce
[wip] move converting to text to .get method, add getall support, .cl…
kmike Nov 17, 2018
da7bb80
bump html-text required version number
kmike May 30, 2019
859044c
Merge branch 'master' into selector-text
kmike Feb 9, 2022
7bae279
selector text unit tests
shahidkarimi Mar 11, 2022
e4733ee
code formtting
shahidkarimi Mar 11, 2022
857ca72
code formatting improvements
shahidkarimi Mar 11, 2022
7941093
removed unwated tests
shahidkarimi Apr 4, 2022
102f2e3
Merge pull request #236 from shahidkarimi/selector-text-tests
kmike May 20, 2022
1f917bb
Merge branch 'master' into selector-text
kmike Jun 28, 2022
d87982d
apply black
kmike Jun 28, 2022
14dadbd
fixed failing test
kmike Jun 28, 2022
af0d28a
Make new arguments keyword-only
kmike Jun 28, 2022
1737f83
documentation for selector .get() text
shahidkarimi Aug 12, 2022
17ae5e0
suggested changes in the PR fixed
shahidkarimi Aug 26, 2022
f8f1c66
Merge branch 'master' into selector-text
kmike Nov 10, 2022
c6580cc
Update docs/usage.rst
kmike Nov 13, 2022
419af4b
Merge pull request #248 from shahidkarimi/selector-text-doc
kmike Nov 13, 2022
b8d0352
Merge branch 'master' into selector-text
kmike Apr 24, 2024
ee3e734
fixed typing
kmike May 1, 2024
69456c1
fixed a refactoring issue
kmike May 1, 2024
a492278
document O(N^2) gotcha
kmike May 8, 2024
8b4ae25
make flake8 config compatible with black
kmike May 8, 2024
ccaaa5b
refactor text and cleaning tests; add more of them
kmike May 8, 2024
4eea4fa
fixed default .cleaned cleaner value
kmike May 8, 2024
27c9919
fixed black formatting went wrong
kmike May 8, 2024
852bbef
fix docs references
kmike May 8, 2024
documentation for selector .get() text
shahidkarimi committed Aug 12, 2022
commit 1737f8365793d0a050102ce3672a1490eeed3d85
9 changes: 9 additions & 0 deletions docs/usage.rst
@@ -120,6 +120,15 @@ pseudo-elements::
>>> selector.css('title::text').get()
'Example website'

Extract text without ::text
===========================
You can extract an element's inner text without specifying ``::text`` in your
selector. Instead, pass the optional parameter ``text=True`` to the ``get()``
or ``getall()`` methods.

>>> selector.css('title').get(text=True)

You can also pass the additional parameters ``guess_punct_space`` and ``guess_layout``.
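
To illustrate what "inner text" means here, the following standard-library sketch collects the text of an element together with the text of all its descendants and collapses runs of whitespace. It does not use parsel and is not how ``text=True`` is implemented in this PR (the commit history indicates the PR relies on the html-text package). ``inner_text`` and ``_TextCollector`` are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects every text node, including those nested in child elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of character data between tags.
        self.chunks.append(data)


def inner_text(html):
    """Hypothetical helper: join all text nodes and collapse whitespace."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return " ".join("".join(collector.chunks).split())


fragment = "<p>Hello <b>brave</b>\n   new world</p>"
print(inner_text(fragment))  # prints: Hello brave new world
```

By contrast, a selector such as ``p::text`` matches only the text nodes that are direct children of ``<p>``, so the text inside ``<b>`` would not be included.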

As you can see, ``.xpath()`` and ``.css()`` methods return a
:class:`~parsel.selector.SelectorList` instance, which is a list of new
selectors. This API can be used for quickly selecting nested data::