
Piculet

Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies other than the standard library. If available, it will make use of the lxml package for improved performance and better XPath support.
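To make the idea of XPath-based extraction concrete, here is a minimal sketch that uses only the standard library, with lxml as an optional drop-in when it is installed. It does not use Piculet's own API or specification format: the `extract` function, the `queries` mapping, and the sample document are hypothetical names and data made up for illustration.

```python
# Prefer lxml when it is installed, otherwise fall back to the standard library.
# This mirrors the optional-dependency idea described above; it is an
# illustrative pattern, not Piculet's actual import logic.
try:
    from lxml import etree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
DOCUMENT = """
<movie>
  <title>The Shining</title>
  <year>1980</year>
  <cast>
    <actor>Jack Nicholson</actor>
    <actor>Shelley Duvall</actor>
  </cast>
</movie>
"""


def extract(document, queries):
    """Map each key to the text of all elements matched by its path."""
    root = ET.fromstring(document.strip())
    return {
        key: [element.text for element in root.findall(path)]
        for key, path in queries.items()
    }


# Hypothetical query mapping: output key -> path expression.
QUERIES = {
    "title": "./title",
    "year": "./year",
    "actors": "./cast/actor",
}

data = extract(DOCUMENT, QUERIES)
print(data)
```

The paths here stay within the limited XPath subset that `xml.etree.ElementTree` supports, so the sketch should behave the same with either parser.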

Piculet is used for the parsers of the Cinemagoer project.

Getting started

Piculet works with Python 3.8 and later versions. You can install it using pip:

pip install piculet

Installing Piculet also installs a script named piculet that invokes the command-line interface:

$ piculet -h
usage: piculet [-h] [--version] [--html] -s SPEC [document]

For example, say you want to extract some data from the file shining.html. An example specification is given in movie.json. Download both of these files and run the command:

$ piculet -s movie.json shining.html
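If you want to run the same command from a Python program, one possible approach is to call the script through `subprocess`. The sketch below builds the argument list from the usage line shown earlier. The helper names `build_command` and `run_piculet` are hypothetical, and it assumes that the extracted data is written to standard output.

```python
import subprocess


def build_command(spec, document=None, html=False):
    """Build the argument list following the usage line: piculet [--html] -s SPEC [document]."""
    command = ["piculet"]
    if html:
        command.append("--html")
    command += ["-s", spec]
    if document is not None:
        command.append(document)
    return command


def run_piculet(spec, document=None, html=False):
    """Run the piculet script and return what it prints.

    Assumption: the result is written to standard output.
    """
    completed = subprocess.run(
        build_command(spec, document, html),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


# Equivalent to: piculet -s movie.json shining.html
command = build_command("movie.json", "shining.html")
print(command)
```

Calling `run_piculet("movie.json", "shining.html")` requires that Piculet is installed and that both files are in the current directory.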

Getting help

The documentation is available at: https://piculet.readthedocs.io/

The source code can be obtained from: https://github.com/uyar/piculet

License

Copyright (C) 2014-2023 H. Turgut Uyar <[email protected]>

Piculet is released under the LGPL, version 3 or later. Read the included LICENSE.txt file for details.