Supported Formats
SAR Portal supports 28+ file types across 6 categories for upload, AI analysis, and redaction.
Fully Supported Formats
These formats have full support for upload, AI analysis, and visual redaction:
PDF Documents
| Aspect | Support |
|---|---|
| Extension | .pdf |
| Text extraction | Full |
| OCR (scanned PDFs) | Yes |
| Visual redaction | Yes |
| Searchable output | Yes |
| Max size | 50 MB |
| Max pages | ~500 |
Best for: Reports, contracts, correspondence, exported records
Microsoft Word
| Aspect | Support |
|---|---|
| Formats | .docx, .dotx, .docm, .dotm |
| Text extraction | Full |
| Visual redaction | Yes |
| Formatting preserved | Yes |
| Headers/footers | Redacted |
| Tables & comments | Redacted |
| Track changes | Accepted and redacted |
| Metadata | Stripped on redaction |
| Max size | 50 MB |
Best for: Letters, policies, contracts, memos, templates
Microsoft Excel
| Aspect | Support |
|---|---|
| Formats | .xlsx, .xlsm, .xltx, .xltm |
| Text extraction | Full |
| Cell-level redaction | Yes |
| Formulas preserved | Yes (in unredacted cells) |
| Multiple worksheets | Yes |
| Cell comments | Redacted |
| Metadata | Stripped on redaction |
| Max size | 50 MB |
Best for: Data exports, spreadsheets, reports, logs, templates
Images
| Format | Extension | OCR Support | Redaction |
|---|---|---|---|
| PNG | .png | Yes | Yes |
| JPEG | .jpg, .jpeg | Yes | Yes |
| GIF | .gif | Yes | Yes |
| BMP | .bmp | Yes | Yes |
| TIFF | .tiff, .tif | Yes | Yes |
| WebP | .webp | Yes | Yes |
Best for: Scanned documents, screenshots, photographs, ID verification
Email Files
| Aspect | Support |
|---|---|
| Formats | .eml, .msg |
| Text extraction | Full (headers + body) |
| Header redaction | Yes (To, From, CC, Subject) |
| Body redaction | Yes |
| Attachment handling | Extracted and processed separately |
| Output format | Redacted PDF |
| Max size | 50 MB |
Best for: Correspondence, DSAR communications, email trails, Outlook exports
Text-Based Formats
These formats support text extraction and text-based redaction. The redacted output is generated as a formatted PDF:
| Format | Extension | Structure Preserved | Redaction |
|---|---|---|---|
| Plain text | .txt | N/A | Text replacement |
| CSV | .csv | Row/column structure | Value replacement |
| Log files | .log | N/A | Text replacement |
| Markdown | .md | N/A | Text replacement |
| JSON | .json | JSON structure | Value replacement |
| XML | .xml | XML structure | Value replacement |
| HTML | .html | N/A | Text replacement |
| CSS | .css | N/A | Text replacement |
| JavaScript | .js | N/A | Text replacement |
Best for: System exports, configuration files, data dumps, log files, API responses
Complete Format Reference
| Category | Extensions | Count | AI Analysis | Visual Redaction |
|---|---|---|---|---|
| PDF | .pdf | 1 | Yes | Yes |
| Word | .docx, .dotx, .docm, .dotm | 4 | Yes | Yes |
| Excel | .xlsx, .xlsm, .xltx, .xltm | 4 | Yes | Yes |
| Images | .png, .jpg, .jpeg, .gif, .bmp, .tiff, .tif, .webp | 8 | Yes (OCR) | Yes |
| Email | .eml, .msg | 2 | Yes | Yes (PDF output) |
| Text | .txt, .csv, .log, .md, .json, .xml, .html, .css, .js | 9 | Yes | Text replacement |
| Total | | 28+ | | |
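The format categories listed in this guide can be expressed as a simple lookup, which is handy for sorting files before upload. This is a minimal sketch, not part of SAR Portal: the names `SUPPORTED_EXTENSIONS` and `category_for` are hypothetical, and the extension lists are transcribed from the format sections of this page.

```python
from pathlib import Path
from typing import Optional

# Extension lists transcribed from this guide; the dictionary itself is illustrative.
SUPPORTED_EXTENSIONS = {
    "PDF": [".pdf"],
    "Word": [".docx", ".dotx", ".docm", ".dotm"],
    "Excel": [".xlsx", ".xlsm", ".xltx", ".xltm"],
    "Images": [".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".tif", ".webp"],
    "Email": [".eml", ".msg"],
    "Text": [".txt", ".csv", ".log", ".md", ".json", ".xml", ".html", ".css", ".js"],
}

# Invert the map so a single extension lookup returns its category.
EXTENSION_TO_CATEGORY = {
    ext: category
    for category, extensions in SUPPORTED_EXTENSIONS.items()
    for ext in extensions
}


def category_for(filename: str) -> Optional[str]:
    """Return the format category for a filename, or None if the extension is not listed."""
    return EXTENSION_TO_CATEGORY.get(Path(filename).suffix.lower())
```

For example, `category_for("Report.PDF")` returns `"PDF"`, while `category_for("slides.pptx")` returns `None` because PowerPoint is not in the supported list.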
Format-Specific Considerations
PDF Files
Searchable PDFs (text-based)
- Fastest processing
- Best accuracy
- Text directly extractable
Scanned PDFs (image-based)
- Requires OCR processing
- Accuracy depends on scan quality
- Higher processing time
Tips for best results:
- Use high-resolution scans (300 DPI+)
- Ensure text is legible
- Avoid skewed or rotated pages
- Ensure strong contrast between text and background
Word Documents
Compatibility
- Best support for .docx (modern format)
- Template formats (.dotx, .dotm) fully supported
- Macro-enabled formats (.docm, .dotm) supported
- Complex formatting generally preserved
What’s Extracted and Redacted
- Body text
- Headers and footers
- Tables
- Comments and track changes
- Metadata (stripped in redacted output)
Excel Files
Processing
- All worksheets processed
- Cell values extracted
- Formulas preserved in unredacted cells
- Hidden rows/columns included
- Cell comments included
Redaction
- Cell-level redaction
- Original formulas preserved in unredacted cells
- Formatting maintained
- Metadata stripped from output
Images
OCR Processing
- Advanced OCR text extraction
- Multiple languages supported
- Handwriting has limited support
- Quality significantly affects accuracy
Best Practices
- Use high-resolution images
- Ensure good lighting and contrast
- Avoid compression artifacts
- Straighten skewed images
Email Files
EML Processing
- Parsed using MIME standard
- Headers (From, To, CC, Subject, Date) extracted
- Plain text and HTML body extracted
- Attachments listed and can be processed separately
- Redacted output rendered as PDF
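To preview what MIME parsing of an .eml file yields, the steps above can be approximated with Python's standard `email` package. This is only a sketch under assumptions: SAR Portal's actual parser is not documented here, and `summarize_eml` and `SAMPLE_EML` are hypothetical names used for illustration.

```python
from email import message_from_bytes, policy

# A tiny hand-written message used only for demonstration.
SAMPLE_EML = (
    b"From: alice@example.com\r\n"
    b"To: dpo@example.org\r\n"
    b"Subject: Data access request\r\n"
    b"Date: Mon, 06 Jan 2025 09:30:00 +0000\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Please send me a copy of my personal data.\r\n"
)


def summarize_eml(raw: bytes) -> dict:
    """Extract headers, the preferred body, and attachment names from raw EML bytes."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Missing headers come back as None rather than raising.
    headers = {}
    for name in ("From", "To", "Cc", "Subject", "Date"):
        value = msg[name]
        headers[name] = str(value) if value is not None else None

    # Prefer the plain-text body, falling back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # List attachment filenames only; their content would be handled separately.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {"headers": headers, "body": body, "attachments": attachments}
```

Calling `summarize_eml(SAMPLE_EML)` returns a dictionary whose `headers["Subject"]` is `"Data access request"` and whose `attachments` list is empty.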
MSG Processing
- Automatically converted to EML at upload
- All Outlook metadata preserved during conversion
- Processed identically to EML after conversion
Text-Based Formats
Format-Aware Redaction
- JSON: Values redacted while preserving keys and structure
- XML: Element content redacted while preserving tags
- CSV: Cell values redacted while preserving structure
- Other text: Standard find-and-replace redaction
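The idea of redacting JSON values while leaving keys and structure intact can be sketched as a recursive walk. This is an illustration of the technique, not SAR Portal's implementation: the function `redact_values` and the `[REDACTED]` marker are assumptions made for this example.

```python
REDACTION_MARK = "[REDACTED]"  # hypothetical placeholder text


def redact_values(node, terms):
    """Return a copy of parsed JSON with each term replaced inside string values only."""
    if isinstance(node, dict):
        # Keys are copied untouched; only the values are visited.
        return {key: redact_values(value, terms) for key, value in node.items()}
    if isinstance(node, list):
        return [redact_values(item, terms) for item in node]
    if isinstance(node, str):
        for term in terms:
            node = node.replace(term, REDACTION_MARK)
        return node
    # Numbers, booleans, and null pass through unchanged.
    return node
```

Given `{"name": "Jane Doe", "orders": [{"id": 7, "note": "Call Jane Doe"}]}` and the term `"Jane Doe"`, the result keeps every key and the nested list, with both occurrences of the name replaced by the marker.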
Encoding
- UTF-8 with BOM detection
- Encoding preserved in output
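If you want to check a text file's encoding before upload, BOM detection can be sketched with the constants in Python's `codecs` module. This assumes a UTF-8 default when no BOM is present; the function name `detect_encoding` is hypothetical, and the set of BOMs SAR Portal actually recognizes is not stated in this guide.

```python
import codecs

# UTF-32 BOMs are checked before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, or the default if none is found."""
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            return codec_name
    return default
```

The returned codec names strip the BOM on decode, so `data.decode(detect_encoding(data))` yields clean text for BOM-prefixed input.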
Unsupported Formats
The following formats are not currently supported for redaction:
| Format | Reason | Workaround |
|---|---|---|
| Password-protected files | Cannot access content | Remove password, then upload |
| Encrypted documents | Cannot decrypt | Decrypt first, then upload |
| .pst (Outlook archives) | Archive format | Export individual emails as .msg or .eml |
| Database files (.mdb, .sqlite) | Binary format | Export to CSV or Excel |
| Compressed archives (.zip, .rar) | Container format | Extract contents, upload individual files |
| Video/audio files | Not applicable | Extract transcripts as text |
| PowerPoint (.pptx) | Not yet supported | Export to PDF |
File Size Limits
| Plan | Per-File Limit | Total Storage |
|---|---|---|
| Basic | 50 MB | 5 GB |
| Starter | 50 MB | 50 GB |
| Pro | 50 MB | 200 GB |
Handling Large Files
If a file exceeds limits:
- Split into smaller sections
- Compress embedded images while keeping text legible
- Remove unnecessary content
- Contact support for special cases
Upload Validation
All uploads are validated:
Security Checks
- Deep file type verification (actual content, not just extension)
- Content type validation
- Size compliance check
Rejection Reasons
“Invalid file type”
- Extension doesn’t match content
- File is corrupted
- Unsupported format
“File too large”
- Exceeds 50 MB limit
- Compress or split the file
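A local pre-check can catch both rejection reasons before you upload. The sketch below compares a file's leading bytes against well-known signatures and applies the 50 MB limit. It is an assumption-laden illustration, not the portal's validator: `find_rejection_reason` and `MAGIC_SIGNATURES` are hypothetical names, only a handful of formats are covered, and 1 MB is taken to mean 1024 × 1024 bytes.

```python
import os
from typing import Optional

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes

# Well-known leading bytes for a subset of the supported formats.
MAGIC_SIGNATURES = {
    ".pdf": [b"%PDF-"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".docx": [b"PK\x03\x04"],  # Office Open XML files are ZIP containers
    ".xlsx": [b"PK\x03\x04"],
}


def find_rejection_reason(
    filename: str, data: bytes, max_bytes: int = MAX_UPLOAD_BYTES
) -> Optional[str]:
    """Return a likely rejection message, or None if this sketch finds no problem."""
    if len(data) > max_bytes:
        return "File too large"
    extension = os.path.splitext(filename)[1].lower()
    signatures = MAGIC_SIGNATURES.get(extension)
    # Extensions without a signature in this sketch (for example .txt) are not checked.
    if signatures is not None and not any(data.startswith(sig) for sig in signatures):
        return "Invalid file type"
    return None
```

For instance, a PNG image renamed to `scan.pdf` returns `"Invalid file type"` because its content does not start with `%PDF-`.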
Recommendations by Use Case
Access Requests (Article 15)
- PDF for formatted documents
- Excel for data exports
- Images for ID verification
- Email (.eml/.msg) for correspondence
Erasure Requests (Article 17)
- Excel/CSV for structured data identification
- JSON/XML for system data exports
- Email for communication records
Redaction Priority
- PDF for best visual redaction quality
- Word for editable documents
- Excel for tabular data
- Email for correspondence chains
- Text formats for system-generated data