Overview
Rebuilt from the ground up with LLMs, TextIn xParse goes beyond traditional OCR to process all types of complex documents. It breaks arbitrary layouts into semantically complete paragraphs and restores reading order so that the output is ready for large language models. With industry-leading table recognition, it resolves merged cells, multi-page tables, and borderless tables, and it integrates with image processing to handle watermarked and curved documents. As an intelligent ETL solution, it supports zero-shot key information extraction, cross-document retrieval, and intelligent document classification, addressing large-model pain points such as unstable output and length truncation. TextIn xParse generates high-quality chunks labeled with semantic relationships, coordinates, and chapter information, improving RAG Q&A accuracy and search efficiency, and it supports one-click import into mainstream RAG frameworks including RagFlow, Dify, and Coze. It provides document infrastructure for enterprise scenarios such as knowledge Q&A, agent enablement, data entry, and data cleaning, automating unstructured data processing, reducing manual workload, and maximizing data asset value. Trusted by leading global enterprises, it delivers efficient, accurate document processing for mission-critical business scenarios.
Highlights
- LLM-Powered Document Parsing: Goes beyond OCR to convert complex unstructured documents into accurate structured output suited to large-model applications
- High-Quality RAG Empowerment: Generates semantically optimized chunks to improve Q&A accuracy and search efficiency, with one-click import into mainstream RAG frameworks
- Intelligent ETL Pipeline: Provides zero-shot extraction, cross-document retrieval, and intelligent classification to maximize the value of enterprise unstructured data
Details
Pricing
| Dimension | Cost/hour |
|---|---|
| t3a.large (Recommended) | $10.00 |
Vendor refund policy
Nonrefundable
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
TextIn Document Parser AMI - Initial release. Supports PDF, DOCX, TXT parsing and outputs clean, structured JSON data for AI systems.
Additional details
Usage instructions
TextIn Document Parser AMI Usage Instructions
Overview
This AMI provides a pre-installed, native document parsing service for Ubuntu 20.04 LTS. It converts PDF, DOCX, and TXT files into structured, AI-ready JSON data optimized for LLMs, Agents, and RAG systems.
Base OS: Ubuntu 20.04 LTS
Default username: ubuntu
SSH port: 22
Service port: 30006
- Connect to Your EC2 Instance
  From the Amazon EC2 Console, obtain the public IP or DNS of your instance. Connect using SSH with your AWS key pair:
      ssh -i "your-key-pair.pem" ubuntu@<instance-public-ip>
  Type yes to confirm the host key on first connection.
- Verify the Service Status
  The TextIn service starts automatically on boot. Check service status:
      sudo systemctl status textin-parser.service
  If inactive, start and enable it:
      sudo systemctl start textin-parser.service
      sudo systemctl enable textin-parser.service
  Verify health:
      curl http://localhost:30006/health
  A healthy response returns:
      {"status":"healthy"}
- Use the Document Parsing API
  Submit documents for parsing:
      curl -X POST -F "file=@/path/to/your/document.pdf" http://<instance-public-ip>:30006/parse
  Supported formats: PDF, DOCX, TXT
  Output: Clean structured JSON
- License Activation (Optional)
  For enterprise features:
  Run the license setup script:
      ./1-install_licserver.sh
  Send the generated machine fingerprint (seed.txt) to simon_liu@intsig.net to obtain a license.
  Place your license file in /home/ubuntu/licFile/.
  Apply the license:
      ./2-apply_license.sh
- Service Management
  Stop service:
      sudo systemctl stop textin-parser.service
  View real-time logs:
      sudo journalctl -u textin-parser.service -f
- Security Best Practices
  Open only ports 22 (SSH) and 30006 (API) in your security group.
  Restrict SSH access to trusted IP ranges.
  Do not expose the service to 0.0.0.0/0 in production.
- Troubleshooting
  Service not running: Check logs with journalctl.
  API unreachable: Verify port 30006 is open in the security group.
  License issues: Confirm seed and license file match.
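The health check and the parse request from the steps above can also be issued from a script. The following is a minimal Python sketch using only the standard library, not an official TextIn client. It assumes the `/health` and `/parse` endpoints and the `file` form field shown in the curl commands; the function names, the timeouts, and the `application/octet-stream` part type are illustrative assumptions, and the structure of the JSON returned by `/parse` is not documented here, so it is returned as-is.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumption: run on the instance itself; replace with the public IP otherwise.
BASE_URL = "http://localhost:30006"


def is_healthy(body: bytes) -> bool:
    """Return True if a /health response body reports status 'healthy'."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file as a multipart/form-data body, similar to `curl -F`."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """GET /health; raises urllib.error.URLError if the service is unreachable."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return is_healthy(resp.read())


def parse_document(path: str, base_url: str = BASE_URL, timeout: float = 120.0):
    """POST a file to /parse in the 'file' form field and return the decoded JSON."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the file
    body = build_multipart("file", file_path.name, file_path.read_bytes(), boundary)
    request = urllib.request.Request(
        f"{base_url}/parse",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical flow would call `check_health()` first and, if it returns `True`, call `parse_document("/path/to/your/document.pdf")`. If the service rejects the generic part type, the curl command from the instructions remains the reference request.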
Last updated: March 2026
Support: For license and service support: simon_liu@intsig.net
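The security recommendations in the usage instructions (only ports 22 and 30006 open, no exposure to 0.0.0.0/0) can be checked mechanically. The sketch below is a hypothetical helper, not part of the AMI or of any AWS SDK. It assumes inbound rules have already been exported as simple `(port, cidr)` pairs, which is an illustrative format rather than the shape returned by the AWS API, and it applies the no-open-range check to both allowed ports, which is stricter than the wording of the instructions.

```python
import ipaddress

# Ports named in the usage instructions: SSH and the parsing API.
ALLOWED_PORTS = {22, 30006}


def audit_rules(rules):
    """Return findings for a list of (port, cidr) inbound rules.

    The (port, cidr) tuple format is an assumption made for this sketch.
    """
    findings = []
    for port, cidr in rules:
        network = ipaddress.ip_network(cidr, strict=False)
        if port not in ALLOWED_PORTS:
            findings.append(f"port {port} should not be open")
        elif network.prefixlen == 0:
            # A /0 prefix (0.0.0.0/0 or ::/0) matches every address.
            findings.append(f"port {port} is open to all addresses ({cidr})")
    return findings
```

An empty result means the supplied rules satisfy the checks above; it does not replace a review of the security group itself.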
Support
Vendor support
Contact email: sheng_song@intsig.net
URL: https://www.textin.ai/contact
Support time: 8 hours × 5 working days
Buyers receive professional, comprehensive technical support and after-sales service for TextIn products.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.