🧙‍♂️ MagicXML is a FastAPI-based service designed to fetch, process, and convert XML data into structured CSV files. It is optimized for handling large XML files by processing them in chunks asynchronously, making it suitable for heavy data processing tasks.

Solrikk/MagicXML

MagicXML 🧙‍♂️

Advanced XML to CSV Conversion Tool

License: MIT | Python 3.8+ | FastAPI


🚀 Overview

MagicXML is a high-performance web application built with FastAPI that transforms XML data into structured CSV format. Designed for data analysts, developers, and e-commerce professionals, MagicXML handles complex XML structures with advanced parsing capabilities, asyncio-powered processing, and intelligent data classification.

🔗 Live Demo: https://magic-xml.replit.app

✨ Key Features

  • High-Performance Processing: Asynchronous architecture for efficient handling of large XML files
  • Intelligent Data Extraction: Contextual parsing of complex nested XML structures
  • Data Cleaning & Sanitization: Automatic cleaning of HTML tags and special characters
  • Multilingual Support: Interface available in English, Russian, and other languages
  • RESTful API: Programmatic access for seamless integration with your systems
  • Callback Support: Optional webhook notifications when processing is complete
  • Robust Error Handling: Comprehensive error management with detailed reporting

🛠️ Technical Architecture

MagicXML is built on the following components:

  • FastAPI Backend: High-performance asynchronous API framework
  • asyncio & aiohttp: Non-blocking I/O operations for concurrent processing
  • ElementTree (xml.etree.ElementTree): Efficient XML parsing and traversal
  • BeautifulSoup: Cleaning of HTML embedded in text fields
  • Modern Frontend: Responsive design with custom CSS and JavaScript
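For large feeds, one standard-library way to keep memory bounded while using ElementTree is xml.etree.ElementTree.iterparse, which reports each element as soon as its closing tag is read. The sketch below is an illustration, not MagicXML's code: it assumes records are <offer> elements (matching the offer_elem naming in the implementation snippet further down), and iter_offer_chunks and chunk_size are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List


def iter_offer_chunks(source: IO[bytes], chunk_size: int = 500,
                      tag: str = "offer") -> Iterator[List[dict]]:
    """Stream <offer> elements and yield them as lists of dicts.

    Hypothetical helper: each offer becomes a flat dict of its attributes
    plus the text of its direct children, then the element is cleared.
    """
    chunk: List[dict] = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue  # children end before their parent; skip them here
        record = dict(elem.attrib)  # copy attributes before clear() resets them
        for child in elem:
            record[child.tag] = (child.text or "").strip()
        chunk.append(record)
        elem.clear()  # drop the offer's children and text to save memory
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly shorter chunk


if __name__ == "__main__":
    feed = b"<shop><offer id='1'><name>Mug</name></offer></shop>"
    for part in iter_offer_chunks(io.BytesIO(feed), chunk_size=2):
        print(part)
```

Nested structures deeper than one level are flattened away here; a real feed parser would need its own mapping rules.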

📊 Use Cases

  • E-commerce Data Processing: Convert product feeds from XML to CSV
  • Data Analysis: Transform XML datasets into analysis-ready CSV format
  • System Integration: Bridge XML-based systems with CSV-compatible tools
  • Catalog Management: Process large product catalogs efficiently
  • Automated Workflows: Integrate with data pipelines via API
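As an illustration of the first use case, the sketch below writes product records to CSV with the standard csv module. It assumes each record is already a flat dict of field name to text; offers_to_csv is a hypothetical helper, not part of MagicXML. Because product feeds often give different items different fields, the header is built from every key seen, and missing cells are left empty.

```python
import csv
import io
from typing import IO, Iterable, List


def offers_to_csv(offers: Iterable[dict], out: IO[str]) -> List[str]:
    """Write flat offer dicts to `out` as CSV and return the header used.

    Columns appear in first-seen order across all records; a field that
    a record lacks is written as an empty cell.
    """
    rows = list(offers)  # materialise so every field name can be collected
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    buf = io.StringIO()
    offers_to_csv([{"id": "1", "name": "Mug"}, {"id": "2", "price": "9.99"}], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, so line endings are not doubled on Windows.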

🔧 Installation & Setup

Prerequisites

  • Python 3.8+
  • Git

Quick Start

# Clone the repository
git clone https://github.com/Solrikk/MagicXML.git
cd MagicXML

# Install dependencies
pip install -r requirements.txt

# Run the application
python -m uvicorn main:app --host 0.0.0.0 --port 8080 --reload

🔌 API Reference

Convert XML to CSV

curl -X 'POST' \
  'https://magic-xml.replit.app/process_link' \
  -H 'Content-Type: application/json' \
  -d '{
    "link_url": "https://example.com/data.xml",
    "preset_id": "optional-tracking-id",
    "return_url": "https://your-callback-url.com/webhook"
  }'
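The same call can be made from Python with only the standard library. This is a sketch mirroring the curl example: the endpoint and JSON fields come from this README, while build_process_request, submit, API_BASE, and the timeout value are hypothetical choices. It also assumes the server accepts the request when the optional preset_id and return_url fields are omitted entirely.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://magic-xml.replit.app"  # demo host listed in this README


def build_process_request(link_url: str, preset_id: Optional[str] = None,
                          return_url: Optional[str] = None,
                          base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST /process_link request."""
    payload = {"link_url": link_url}
    # Assumption: optional fields may simply be left out when unused.
    if preset_id is not None:
        payload["preset_id"] = preset_id
    if return_url is not None:
        payload["return_url"] = return_url
    return urllib.request.Request(
        f"{base_url}/process_link",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON response body."""
    # A generous timeout, since the example response arrives once processing is done.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: submit(build_process_request("https://example.com/data.xml", preset_id="optional-tracking-id")).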

Response

{
  "file_url": "https://magic-xml.replit.app/download/data_files/example_com.csv",
  "preset_id": "optional-tracking-id",
  "status": "completed"
}
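If a client needs the bare file name (for example to call the download endpoint below itself), it can be taken from file_url. This hypothetical helper assumes the response shape shown above; only the "completed" status appears in this README, so any other value is treated as not ready.

```python
from typing import Optional
from urllib.parse import unquote, urlparse


def csv_filename_from_result(result: dict) -> Optional[str]:
    """Return the CSV file name from a /process_link response, or None.

    Assumes file_url ends in /download/data_files/<filename>.
    """
    if result.get("status") != "completed" or not result.get("file_url"):
        return None
    path = urlparse(result["file_url"]).path  # ignore any query string
    return unquote(path.rsplit("/", 1)[-1]) or None
```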

Check Processing Status

curl -X 'GET' 'https://magic-xml.replit.app/status/{preset_id}'
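This README does not document the status response body. The polling sketch below assumes it is JSON with a "status" field that eventually reads "completed" (the value seen in the conversion response). The HTTP call is injected as fetch_status so any client can be used; wait_for_status and its parameters are hypothetical. A status that never reaches the target value ends in a TimeoutError rather than an endless loop.

```python
import time
from typing import Callable


def wait_for_status(fetch_status: Callable[[str], dict], preset_id: str,
                    done: str = "completed", interval: float = 2.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_status(preset_id) until its "status" equals `done`.

    fetch_status should return the decoded JSON body of
    GET /status/{preset_id}; sleep is injectable for testing.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(preset_id)
        if body.get("status") == done:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{preset_id!r} not {done!r} after {timeout}s")
        sleep(interval)
```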

Download Generated CSV

curl -X 'GET' 'https://magic-xml.replit.app/download/data_files/{filename}'
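Generated CSVs can be large, so a client may prefer to stream the download to disk instead of reading it into memory. A minimal standard-library sketch follows; download_csv is a hypothetical name, and the self-check uses a local file:// URL so it runs offline.

```python
import pathlib
import shutil
import tempfile
import urllib.request


def download_csv(url: str, dest_path: str, timeout: float = 60.0) -> str:
    """Stream the file at `url` to `dest_path` and return the path."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in fixed-size blocks
    return dest_path


if __name__ == "__main__":
    # Offline self-check against a local file:// URL.
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "src.csv")
        src.write_text("id,name\n1,Mug\n", encoding="utf-8")
        dest = download_csv(src.as_uri(), str(pathlib.Path(tmp, "out.csv")))
        print(pathlib.Path(dest).read_text(encoding="utf-8"))
```

Against the live service, pass a URL such as the file_url from the conversion response.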

📝 Implementation Details

Asynchronous Processing

MagicXML processes XML files asynchronously using Python's asyncio and aiohttp:

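async def process_offers_chunk(offers_chunk, build_category_path, format_type):
    # Convert each XML offer element in this chunk with process_offer()
    # (defined elsewhere in MagicXML). Offers are awaited one at a time,
    # in document order, so any concurrency comes from running several chunks.
    offers = []
    for offer_elem in offers_chunk:
        offer_data = await process_offer(offer_elem, build_category_path, format_type)
        offers.append(offer_data)
    return {"offers": offers}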

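Because each chunk is handled by its own coroutine, several chunks can be scheduled concurrently on the event loop. This pays off mainly for I/O-bound work such as fetching feeds with aiohttp; within a single chunk, as written above, offers are awaited one after another.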
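One way to run several chunks at once is asyncio.gather with a semaphore capping how many are in flight. This is a sketch, not MagicXML's scheduler: process_in_chunks, chunk_size, and max_concurrency are hypothetical. A chunk handler like process_offers_chunk above could be plugged in with functools.partial(process_offers_chunk, build_category_path=..., format_type=...). Note that asyncio interleaves coroutines on one thread, so CPU-bound parsing does not run in parallel.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def process_in_chunks(items: Sequence[Any], chunk_size: int,
                            handle_chunk: Callable[[Sequence[Any]], Awaitable[Any]],
                            max_concurrency: int = 4) -> List[Any]:
    """Split `items` into chunks and await `handle_chunk` on them concurrently.

    At most `max_concurrency` chunks run at once; results come back in
    chunk order because asyncio.gather preserves argument order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)  # created inside the running loop

    async def run(chunk: Sequence[Any]) -> Any:
        async with semaphore:
            return await handle_chunk(chunk)

    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    return list(await asyncio.gather(*(run(c) for c in chunks)))
```

Items must support slicing, so an ElementTree result would be materialised first, for example with list(root.iter("offer")).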

Text Processing & Data Cleaning

The application cleans HTML embedded in description fields, keeping only <p> and <br> tags and unwrapping all others:

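from bs4 import BeautifulSoup  # requires the beautifulsoup4 and html5lib packages

def clean_description(description):
    if not description:
        return ''
    # html5lib parses the fragment the way a browser would
    soup = BeautifulSoup(description, 'html5lib')
    allowed_tags = ['p', 'br']
    for tag in soup.find_all(True):  # True matches every tag
        if tag.name not in allowed_tags:
            tag.unwrap()  # remove the tag itself but keep its contents
    # Additional cleaning logic...
    return str(soup)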
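Where the bs4 and html5lib dependencies are unwanted, a similar allow-list can be approximated with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not MagicXML's code: clean_description_stdlib and _AllowListCleaner are hypothetical names, attributes on kept tags are dropped, output formatting differs (<br> rather than <br/>), and the elided additional cleaning steps are not reproduced.

```python
from html import escape
from html.parser import HTMLParser
from typing import List


class _AllowListCleaner(HTMLParser):
    """Re-emit only <p> and <br> tags; text inside other tags is kept."""

    ALLOWED = {"p", "br"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &lt; etc. in text
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags like <br/> via the default
        # handle_startendtag. Attributes are intentionally discarded.
        if tag in self.ALLOWED:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.ALLOWED and tag != "br":  # <br> is a void element
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))  # re-escape <, >, &


def clean_description_stdlib(description: str) -> str:
    if not description:
        return ""
    parser = _AllowListCleaner()
    parser.feed(description)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```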

© 2025 MagicXML - Advanced XML to CSV Converter
