Open-ESEF is a Python-based, open-source project designed to handle XBRL (eXtensible Business Reporting Language) filings, specifically those adhering to the ESEF (European Single Electronic Format) standard.
ESEF is the mandated digital reporting format for annual financial reports of listed companies in the European Union, established by the European Securities and Markets Authority (ESMA). Open-ESEF provides a robust toolkit for parsing, validating, and analyzing these ESEF XBRL filings.
Funding Acknowledgment (DFG): Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Collaborative Research Center (SFB/TRR) Project-ID 403041268 – TRR 266 Accounting for Transparency.
Open-ESEF is under active development. Stay tuned for updates and new features as the project progresses!
To install the latest stable version:
pip install openesef
Clone the Repository:
git clone https://github.com/reeyarn/openesef.git
cd openesef
Install Dependencies and Build Package:
# Install Cython first
pip install cython

# Install the package in development mode with Cython compilation
pip install -e .
Note: The package will automatically compile the Cython extensions during installation. If you modify any .pyx files, you'll need to reinstall the package using pip install -e . again.
Verify Installation:
python -c "from openesef import base, taxonomy, instance; import openesef.engines.tax_pres as oetp; print('Open-ESEF installed successfully!')"
Explore the example and its output in the notebook: examples/apple_2020.ipynb
- Load XBRL filing using ticker and year

from openesef.edgar.loader import load_xbrl_filing
from openesef.engines.tax_pres import TaxonomyPresentation

# Load XBRL filing using ticker and year
xid, tax = load_xbrl_filing(ticker="AAPL", year=2020)

# OR load using filing URL:
# xid, tax = load_xbrl_filing(filing_url="/Archives/edgar/data/320193/0000320193-20-000096.txt")
- Create presentation object to analyze statements and concepts

t_pres = TaxonomyPresentation(tax)

# Print statement names
print("\nFinancial Statements:")
for statement in t_pres.statement_dimensions.keys():
    print(f"- {statement}")
- Get concepts from Statement of Operations

print("\nConcepts in Statement of Operations:")
statement_concepts = t_pres.statement_concepts.get('CONSOLIDATEDSTATEMENTSOFOPERATIONS', [])
concepts_statement_of_operations = []
for concept in statement_concepts:
    concepts_statement_of_operations.append(concept['concept_qname'])
    print(f"Statement: {concept['statement_name']}")
    print(f"Concept: {concept['concept_qname']}")
    print(f"Label: {concept['label']}")
- Print fact values for Statement of Operations concepts

print("\nFact Values:")
for key, fact in xid.xbrl.facts.items():
    concept_qname = str(fact.qname)
    context = xid.xbrl.contexts[fact.context_ref]
    if concept_qname in concepts_statement_of_operations:
        print(f"{concept_qname:<90} Value: {fact.value:<15} ")
In this forked repository, I began by adapting the code from the fractalexperience/xbrl/ package to make it compatible with ESEF.
The issue in that repository was that, unlike US SEC EDGAR filings, ESEF filings follow a fixed folder structure. Consequently, the schema references in ESEF files are relative to the instance file rather than to the taxonomy folder, and the fractalexperience/xbrl/ package did not handle this out of the box. Using the SAP SE 2022 ESEF filing as an example, the ESEF filing root folder contains the following folders and files:
📦 sap-2022-12-31-DE
├── 📦 META-INF
│   ├── 📜 catalog.xml
│   └── 📜 taxonomyPackage.xml
├── 📦 reports
│   └── 📜 sap-2022-12-31-DE.xhtml
└── 📦 www.sap.com
    ├── 📜 sap-2022-12-31.xsd
    ├── 📜 sap-2022-12-31_cal.xml
    ├── 📜 sap-2022-12-31_def.xml
    ├── 📜 sap-2022-12-31_lab-de.xml
    ├── 📜 sap-2022-12-31_lab-en.xml
    └── 📜 sap-2022-12-31_pre.xml
I have tried to modify the code to handle ESEF by adding the esef_filing_root parameter and passing it around.
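The relative-reference issue can be illustrated with a minimal sketch. This is not the project's actual implementation: the helper name resolve_schema_ref is hypothetical, the href in the usage lines is an assumed value, and only the idea of passing a filing root (named esef_filing_root here to mirror the parameter mentioned above) comes from the text. The sketch resolves an href against the folder that holds the instance file and checks that the result stays inside the filing root.

```python
import posixpath
from urllib.parse import urlparse


def resolve_schema_ref(esef_filing_root: str, instance_path: str, href: str) -> str:
    """Resolve a schema reference found in an instance document.

    Hypothetical helper for illustration only; not part of openesef.
    Absolute URLs are returned unchanged. Relative hrefs are resolved
    against the folder containing the instance file, not against a
    taxonomy folder.
    """
    if urlparse(href).scheme in ("http", "https"):
        return href
    root = posixpath.normpath(esef_filing_root)
    instance_dir = posixpath.dirname(posixpath.join(root, instance_path))
    resolved = posixpath.normpath(posixpath.join(instance_dir, href))
    # Guard against hrefs that climb out of the filing root.
    if resolved != root and not resolved.startswith(root + "/"):
        raise ValueError(f"{href!r} resolves outside {root!r}")
    return resolved


root = "sap-2022-12-31-DE"
instance = "reports/sap-2022-12-31-DE.xhtml"
# Assumed href, chosen to match the folder layout shown above.
print(resolve_schema_ref(root, instance, "../www.sap.com/sap-2022-12-31.xsd"))
# sap-2022-12-31-DE/www.sap.com/sap-2022-12-31.xsd
```

Using posixpath rather than os.path keeps the behavior identical on every platform, which matters because paths inside a filing package always use forward slashes.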
Explore the example with code: examples/try_vw2020.py
This project supports the European Single Electronic Format (ESEF), established by the European Securities and Markets Authority (ESMA) as the mandated digital reporting standard for annual financial reports of listed companies in the European Union. The ESEF specifications and guidelines are sourced from ESMA's official publications and are adhered to in this implementation. For more information, visit esma.europa.eu.
This project supports the processing of filings based on the International Financial Reporting Standards (IFRS) and the European Single Electronic Format (ESEF).
IFRS Taxonomy: The IFRS Taxonomy is developed and maintained by the IFRS Foundation. The taxonomy files included or referenced in this project are sourced from the IFRS Foundation's official repository.
- Copyright: The IFRS Taxonomy is Copyright © IFRS Foundation. All rights reserved.
- Disclaimer: This project is an open-source tool and is not affiliated with, endorsed by, or commercially licensed by the IFRS Foundation. The files are used solely to facilitate the technical validation and creation of XBRL/iXBRL documents. For official standards, please visit ifrs.org.
ESEF Guidelines: The ESEF reporting standard is established by the European Securities and Markets Authority (ESMA) for listed companies in the European Union.
- Source: ESEF specifications are sourced from ESMA's official publications.
- Attribution: Adherence to ESEF guidelines in this project is based on public technical standards available at esma.europa.eu.
This project includes copies of the US GAAP Financial Reporting Taxonomy (e.g., us-gaap-YYYY-MM-DD.xsd), sourced from official locations (e.g., fasb.org and xbrl.us). These files are Copyright © Financial Accounting Foundation (FAF) and, for certain prior versions, XBRL US, Inc.
The taxonomy files are redistributed within this project as a "Permitted Work" pursuant to the FAF's Copyright Notice and policies. They are provided for public use to assist in the implementation and processing of XBRL data.
Compliance Conditions:
- Non-Modification: All original copyright notices, XML comments, disclaimers, and license statements embedded in the taxonomy files have been preserved unchanged.
- No Ownership Claim: This project does not claim ownership of the taxonomy; rights remain exclusively with the FAF and XBRL US.
- Authorized Use: Use of these files is subject to the Notice of Authorized Uses maintained by the FAF.
For full license terms, please see the Official Terms and Conditions.
The use of the standards, taxonomies, and schemas listed above is intended to support educational and research purposes in alignment with the open-source goals of this project.
Rights Infringement Contact: If any use herein is found to infringe upon the rights of the FASB, XBRL US, ESMA, or the IFRS Foundation, please contact the author immediately:
Contact: [email protected]
Upon receipt of a valid notice, the author will promptly remove or adjust the offending content to address any concerns.
Open-ESEF builds upon and extends the excellent work of these open-source projects:
- XBRL-Model (fractalexperience/xbrl/): Provides the foundation for XBRL parsing, taxonomy handling, and data modeling. Open-ESEF adapts and extends this library to handle ESEF-specific requirements.
- SEC EDGAR Financial Reports (farhadab/sec-edgar-financials): Provides code for interacting with the SEC EDGAR system (modules are currently under review and being streamlined).
- pyXBRL (ifanchu/pyXBRL): Source of the code for the DEI part (the document and entity information, such as the current fiscal period and fiscal year end).
- ESEF.jl (Julia): Source of the hint to use the filings.xbrl.org API to get the ESEF filings.
- BrelLibrary/brel: A Python library for reading and analyzing financial reports, developed by Robin Schmidiger (a master's student in the Systems Group of the ETH Zurich Department of Computer Science, supervised by Prof. Gustavo Alonso and Dr. Ghislain Fourny); see his master's thesis.
- gepsio (.Net): .Net library for XBRL and ESEF.
- parse-xbrl (JavaScript): JavaScript XBRL parser.
- altova/sec-xbrl/tree/master (Python, Altova): Altova's Python SEC XBRL tools.
- secdatabase/SEC-XBRL-Financial-Statement-Dataset (https://www.secdatabase.com/): SEC XBRL financial statement dataset.
- altova/sec-xbrl/ (Python): Another Altova Python XBRL repo.
- DataQualityCommittee/dqc_us_rules/ (xbrl.us/dqc): XBRL US Data Quality Committee Rules.
- steffen-zou/Extract-financial-data-from-XBRL/: Python XBRL data extraction.
- ESEF Compliance: Specifically designed to handle XBRL filings in the ESEF format, addressing the unique folder structure and referencing conventions of ESEF reports.
- XBRL Taxonomy Management:
  - Resolves XBRL concepts, labels, and relationships.
  - Processes XBRL linkbases (presentation, definition, calculation, label, reference).
  - Supports taxonomy packages and efficient in-memory storage for large taxonomies.
  - Handles references to external taxonomies like US-GAAP, IFRS, etc.
- XBRL Instance Document Processing:
  - Parses XBRL facts and their associated contexts (entity, period, units, decimals, dimensions).
  - Supports dimensional data (explicit and typed dimensions, segments, scenarios).
  - Extracts Document and Entity Information (DEI).
  - Identifies key reporting contexts (Current/Prior, Instant/Duration).
- Data Modeling & Storage:
  - Utilizes a Cube class for semantic indexing of facts in a multidimensional space (dimensions: metric, entity, period, unit, custom dimensions).
  - Optimized storage in partitioned JSON datasets within ZIP archives using SHA-1 hashing for efficient content addressing.
- Inline XBRL (iXBRL) Support: Processes iXBRL documents, extracting embedded XBRL data from XHTML reports.
- SEC EDGAR Integration:
  - Direct access to SEC EDGAR filings using company tickers.
  - Real-time ticker-to-CIK mapping using the SEC's company tickers API (https://www.sec.gov/files/company_tickers.json); added edgar.stock.update_symbols_data() to update the symbols data file.
  - Automatic handling of filing downloads and XBRL extraction.
- Modular Architecture: Well-structured codebase with clear separation of concerns (base components, taxonomy logic, instance processing, engines).
- Logging & Debugging: Detailed logging for taxonomy resolution and instance processing.
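As an illustration of the ticker-to-CIK mapping mentioned in the feature list, here is a minimal sketch. It does not show how edgar.stock.update_symbols_data() works internally; the function name build_ticker_to_cik is hypothetical, and the embedded sample only imitates the shape of the JSON served at https://www.sec.gov/files/company_tickers.json so that the snippet runs without network access.

```python
import json

# Two-entry sample in the shape of the SEC company tickers file
# (an assumption for illustration; the real file has thousands of rows).
SAMPLE = json.loads("""
{
  "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
  "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"}
}
""")


def build_ticker_to_cik(payload: dict) -> dict:
    """Map upper-case ticker to a 10-digit, zero-padded CIK string.

    Hypothetical helper; not part of openesef.
    """
    return {
        row["ticker"].upper(): str(row["cik_str"]).zfill(10)
        for row in payload.values()
    }


ticker_to_cik = build_ticker_to_cik(SAMPLE)
print(ticker_to_cik["AAPL"])  # 0000320193
```

A real download would also need to send a descriptive User-Agent header, which the SEC asks of automated clients.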
[Detailed Architecture Overview (Coming Soon)] - This section will be expanded to provide a more in-depth look at the Open-ESEF architecture.
Key Components:
- base: Core modules providing fundamental classes and utilities (e.g., pool, resolver, ebase, fbase).
- taxonomy: Modules for handling XBRL taxonomies (taxonomy, schema, linkbase, tpack).
- instance: Modules for processing XBRL instance documents (instance, fact, context, unit, dei, filing_loader).
- engines: Modules for reporting and data analysis (functionality to be documented).
- edgar: Modules for SEC EDGAR filing retrieval (currently being streamlined).
- filings_xbrl_org: Interacts with https://filings.xbrl.org/ to get the ESEF filings.
- util: Utility functions such as util_mylogger.setup_logger().
Data Flow (Simplified):
- Input: XBRL/ESEF instance documents and taxonomy files.
- Resolution: Taxonomies and schemas are resolved and cached.
- Parsing: Instance documents are parsed, facts and contexts extracted.
- Modeling: Data is modeled using Taxonomy, Instance, and Cube classes.
- Output: Processed data can be accessed programmatically or serialized for storage/analysis.
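To make the modeling step more concrete, the following toy sketch shows one way facts could be indexed by dimension so that they can be looked up by any combination of metric, entity, period, and unit. It is an assumption-laden illustration, not the project's Cube class: the class name FactIndex, its methods, and the fact values are all made up for this example.

```python
from collections import defaultdict


class FactIndex:
    """Toy multidimensional fact index (hypothetical, not openesef's Cube)."""

    def __init__(self):
        self._facts = {}
        # dimension name -> member -> set of fact ids
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, fact_id, value, **dims):
        """Store a fact and register it under each of its dimension members."""
        self._facts[fact_id] = value
        for dim, member in dims.items():
            self._index[dim][member].add(fact_id)

    def query(self, **dims):
        """Return the facts matching every given dimension member."""
        ids = set(self._facts)
        for dim, member in dims.items():
            ids &= self._index[dim].get(member, set())
        return {fact_id: self._facts[fact_id] for fact_id in sorted(ids)}


idx = FactIndex()
# Made-up values, for illustration only.
idx.add("f1", 1000, metric="ifrs-full:Revenue", entity="SAP", period="2022", unit="EUR")
idx.add("f2", 900, metric="ifrs-full:Revenue", entity="SAP", period="2021", unit="EUR")
idx.add("f3", 300, metric="ifrs-full:ProfitLoss", entity="SAP", period="2022", unit="EUR")
print(idx.query(metric="ifrs-full:Revenue", period="2022"))  # {'f1': 1000}
```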
Technical Highlights:
- LXML for XML Processing: Efficient XML parsing and XLink resolution.
- SHA-1 Hashing: Content addressing for optimized data storage.
- Memory File System: Uses fs.memory for in-memory file handling and caching.
- Modular Design: Encapsulated components for maintainability and extensibility.
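The content-addressing idea behind the SHA-1 highlight can be sketched with the standard library alone. The layout below (a two-character prefix folder per digest) and the function name store are assumptions for illustration; the project's actual partitioning scheme may differ.

```python
import hashlib
import io
import json
import zipfile


def store(archive: zipfile.ZipFile, record: dict) -> str:
    """Write a record as JSON under a name derived from its SHA-1 digest.

    Hypothetical helper; not part of openesef.
    """
    # sort_keys makes the serialization, and therefore the digest, stable.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(payload).hexdigest()
    # Partition entries by the first two hex characters of the digest.
    name = f"{digest[:2]}/{digest}.json"
    if name not in archive.namelist():
        archive.writestr(name, payload)
    return name


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    first = store(archive, {"concept": "us-gaap:Revenues", "value": "1"})
    second = store(archive, {"value": "1", "concept": "us-gaap:Revenues"})
    names = archive.namelist()

print(first == second, len(names))  # True 1
```

Because the entry name is derived from the content, storing the same record twice produces a single archive entry, which is what makes the scheme useful for deduplication.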
Standards Compliance:
- XBRL 2.1
- XBRL Dimensions 1.0
- ESEF Reporting Manual
- 0.3.8
  - engines/tax_pres.py
    - Compiled this file with Cython
    - Enhanced taxonomy presentation processing
    - Fixed calculation linkbase processing errors
    - Improved memory management for large filings
    - Optimized fact extraction for disclosures
    - Added better error handling for label links
    - Enhanced logging and memory usage tracking
  - engines/ins_facts.py
    - Moved fact_df = ins_facts(xid, tax) to
  - edgar/loader.py: added get_xbrl_df() to replace get_fact_df()
- 0.3.7
  - Taxonomy now processes calculation networks
  - Added engines.tax_pres.tax_calc_df() to get the calculation network dataframe
- 0.3.5
  - Improved engines.tax_pres by avoiding a double for loop for disclosure-only facts
- 0.3.1
  - Added util.ram_usage.check_memory_usage() to check the memory usage
- 0.3.0
  - Enhanced taxonomy presentation processing with new TaxonomyPresentation class:
    - Intelligent statement detection and concept organization
    - Automated extraction of financial statement structures
    - Improved dimension and segment validation
    - Support for both US-GAAP and IFRS taxonomies
  - Integrated SEC EDGAR functionality with memfs for efficient XBRL extraction
  - Added statement-specific concept mapping and validation
  - Improved fact extraction with dimensional context support
- Author: Reeyarn Zhiyang Li
- Email: [email protected]
- Website: https://reeyarn.li