
Supported Formats

SAR Portal supports a wide range of document formats for upload, AI analysis, and redaction: 28 file extensions across 6 categories.

Fully Supported Formats

These formats have full support for upload, AI analysis, and visual redaction:

PDF Documents

Aspect | Support
Extension | .pdf
Text extraction | Full
OCR (scanned PDFs) | Yes
Visual redaction | Yes
Searchable output | Yes
Max size | 50 MB
Max pages | ~500

Best for: Reports, contracts, correspondence, exported records

Microsoft Word

Aspect | Support
Formats | .docx, .dotx, .docm, .dotm
Text extraction | Full
Visual redaction | Yes
Formatting preserved | Yes
Headers/footers | Redacted
Tables & comments | Redacted
Track changes | Accepted and redacted
Metadata | Stripped on redaction
Max size | 50 MB

Best for: Letters, policies, contracts, memos, templates

Legacy .doc format
Legacy .doc files can be uploaded and stored but require conversion to .docx for full redaction support. We recommend converting to .docx before uploading.

Microsoft Excel

Aspect | Support
Formats | .xlsx, .xlsm, .xltx, .xltm
Text extraction | Full
Cell-level redaction | Yes
Formulas preserved | Yes (in unredacted cells)
Multiple worksheets | Yes
Cell comments | Redacted
Metadata | Stripped on redaction
Max size | 50 MB

Best for: Data exports, spreadsheets, reports, logs, templates

Legacy .xls format
Legacy .xls files can be uploaded and stored but require conversion to .xlsx for full redaction support. We recommend converting to .xlsx before uploading.
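One way to batch-convert legacy .doc and .xls files is LibreOffice's headless mode (`soffice --headless --convert-to`). SAR Portal does not prescribe a conversion tool, so this is a sketch assuming LibreOffice is installed and `soffice` is on your PATH; for a handful of files, re-saving as .docx or .xlsx from Microsoft Office works just as well. The helper names `conversion_command` and `convert_legacy` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Legacy extension -> modern format to convert to, per the notes above.
LEGACY_TARGETS = {".doc": "docx", ".xls": "xlsx"}


def conversion_command(source: Path, out_dir: Path) -> List[str]:
    """Build a LibreOffice headless conversion command for a legacy file."""
    target = LEGACY_TARGETS.get(source.suffix.lower())
    if target is None:
        raise ValueError(f"{source.name} is not a legacy .doc or .xls file")
    return [
        "soffice", "--headless",
        "--convert-to", target,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_legacy(source: Path, out_dir: Path) -> Path:
    """Run the conversion and return the path of the expected output file."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice 'soffice' was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(conversion_command(source, out_dir), check=True)
    # LibreOffice keeps the original file stem and swaps the extension.
    return out_dir / f"{source.stem}.{LEGACY_TARGETS[source.suffix.lower()]}"
```

For example, `convert_legacy(Path("minutes.doc"), Path("converted"))` would produce `converted/minutes.docx`, ready to upload with full redaction support.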

Images

Format | Extension | OCR Support | Redaction
PNG | .png | Yes | Yes
JPEG | .jpg, .jpeg | Yes | Yes
GIF | .gif | Yes | Yes
BMP | .bmp | Yes | Yes
TIFF | .tiff, .tif | Yes | Yes
WebP | .webp | Yes | Yes

Best for: Scanned documents, screenshots, photographs, ID verification

Email Files

Aspect | Support
Formats | .eml, .msg
Text extraction | Full (headers + body)
Header redaction | Yes (To, From, CC, Subject)
Body redaction | Yes
Attachment handling | Extracted and processed separately
Output format | Redacted PDF
Max size | 50 MB

Best for: Correspondence, DSAR communications, email trails, Outlook exports

MSG auto-conversion
Outlook .msg files are automatically converted to .eml format during upload for processing. No manual conversion needed.
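To make the header and body redaction described above concrete, here is a minimal sketch using Python's standard `email` package on an .eml file. It is an illustration, not SAR Portal's pipeline, which produces a redacted PDF and uses AI analysis to locate personal data. The placeholder address `redacted@redacted.invalid`, the `[REDACTED]` marker, and the regular expression that masks email addresses in the body are all hypothetical choices. Address headers get a syntactically valid placeholder because the modern email policy parses To, From, and Cc as address lists.

```python
import re
from email import message_from_string, policy
from email.message import EmailMessage

# Hypothetical placeholders; the .invalid TLD is reserved and never resolves.
ADDRESS_PLACEHOLDER = "redacted@redacted.invalid"
TEXT_PLACEHOLDER = "[REDACTED]"
ADDRESS_HEADERS = ("To", "From", "Cc")
# Simple pattern for email addresses in body text (illustrative only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_eml(raw: str) -> EmailMessage:
    """Redact To/From/Cc/Subject headers and email addresses in the body."""
    msg = message_from_string(raw, policy=policy.default)
    # Header redaction: header lookups are case-insensitive.
    for name in ADDRESS_HEADERS:
        if name in msg:
            msg.replace_header(name, ADDRESS_PLACEHOLDER)
    if "Subject" in msg:
        msg.replace_header("Subject", TEXT_PLACEHOLDER)
    # Body redaction: rewrite the best plain-text body part, if any.
    body = msg.get_body(preferencelist=("plain",))
    if body is not None:
        body.set_content(EMAIL_RE.sub(TEXT_PLACEHOLDER, body.get_content()))
    return msg
```

Calling `str()` on the returned message serializes it back to .eml text. A .msg file would first need converting to .eml, which SAR Portal does automatically on upload.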

Text-Based Formats

These formats support text extraction and text-based redaction. The redacted output is generated as a formatted PDF:

Format | Extension | Structure Preserved | Redaction
Plain text | .txt | N/A | Text replacement
CSV | .csv | Row/column structure | Value replacement
Log files | .log | N/A | Text replacement
Markdown | .md | N/A | Text replacement
JSON | .json | JSON structure | Value replacement
XML | .xml | XML structure | Value replacement
HTML | .html | N/A | Text replacement
CSS | .css | N/A | Text replacement
JavaScript | .js | N/A | Text replacement

Best for: System exports, configuration files, data dumps, log files, API responses

Structured format handling
JSON, XML, and CSV files receive format-aware redaction that preserves document structure while replacing PII values. Other text formats use standard text replacement.
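A minimal sketch of what format-aware value replacement means for JSON and CSV, assuming a hypothetical fixed set of PII field names (`PII_FIELDS`). SAR Portal identifies personal data through AI analysis, so the key list here only stands in for that step. The point is that values are replaced while keys, nesting, column order, and the header row stay exactly where they were.

```python
import csv
import io
import json
from typing import Any

# Hypothetical field/column names treated as PII, for illustration only.
PII_FIELDS = {"name", "email", "phone", "address", "date_of_birth"}
PLACEHOLDER = "[REDACTED]"


def redact_node(node: Any) -> Any:
    """Recursively replace values stored under PII keys, keeping every
    object, array, and key in its original position."""
    if isinstance(node, dict):
        return {
            key: PLACEHOLDER if key.lower() in PII_FIELDS else redact_node(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact_node(item) for item in node]
    return node  # numbers, booleans, null, and non-PII strings are kept


def redact_json(text: str) -> str:
    return json.dumps(redact_node(json.loads(text)), indent=2)


def redact_csv(text: str) -> str:
    """Replace values in PII columns; header and row/column layout are kept.
    Assumes a well-formed CSV whose first row is a header."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(
            {col: PLACEHOLDER if col.lower() in PII_FIELDS else val
             for col, val in row.items()}
        )
    return out.getvalue()
```

One design choice worth noting: a PII key that holds a nested object or array is replaced as a whole. Walking into it instead would redact only its leaf values and keep more of the structure.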

Complete Format Reference

Category | Extensions | Count | AI Analysis | Visual Redaction
PDF | .pdf | 1 | Yes | Yes
Word | .docx, .dotx, .docm, .dotm | 4 | Yes | Yes
Excel | .xlsx, .xlsm, .xltx, .xltm | 4 | Yes | Yes
Images | .png, .jpg, .jpeg, .gif, .bmp, .tiff, .tif, .webp | 8 | Yes (OCR) | Yes
Email | .eml, .msg | 2 | Yes | Yes (PDF output)
Text | .txt, .csv, .log, .md, .json, .xml, .html, .css, .js | 9 | Yes | Text replacement
Total | | 28 | |
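If you script uploads, the reference table can be mirrored as a lookup so files are classified before they are sent. This is a hypothetical client-side helper transcribed from the table above, not an official SAR Portal API. Legacy .doc and .xls files are deliberately absent because the table does not list them as fully supported.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Dict, Optional


@dataclass(frozen=True)
class FormatInfo:
    category: str
    ai_analysis: str
    redaction: str


# category -> (extensions, AI analysis, redaction), from the reference table
_TABLE = {
    "PDF": ([".pdf"], "Yes", "Visual"),
    "Word": ([".docx", ".dotx", ".docm", ".dotm"], "Yes", "Visual"),
    "Excel": ([".xlsx", ".xlsm", ".xltx", ".xltm"], "Yes", "Visual"),
    "Images": ([".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".tif", ".webp"],
               "Yes (OCR)", "Visual"),
    "Email": ([".eml", ".msg"], "Yes", "Visual (PDF output)"),
    "Text": ([".txt", ".csv", ".log", ".md", ".json", ".xml", ".html", ".css", ".js"],
             "Yes", "Text replacement"),
}

# Flatten to extension -> FormatInfo for constant-time lookups.
SUPPORTED: Dict[str, FormatInfo] = {
    ext: FormatInfo(category, ai, redaction)
    for category, (exts, ai, redaction) in _TABLE.items()
    for ext in exts
}


def lookup(filename: str) -> Optional[FormatInfo]:
    """Return format info for a filename, or None if it is not listed."""
    return SUPPORTED.get(PurePath(filename).suffix.lower())
```

For example, `lookup("Scan.TIF")` returns the Images entry because the extension is lowercased first, while `lookup("minutes.doc")` returns `None`.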

Format-Specific Considerations

PDF Files

Searchable PDFs (text-based)

Scanned PDFs (image-based)

Tips for best results:

Word Documents

Compatibility

What’s Extracted and Redacted

Excel Files

Processing

Redaction

Images

OCR Processing

Best Practices

Email Files

EML Processing

MSG Processing

Text-Based Formats

Format-Aware Redaction

Encoding

Unsupported Formats

The following formats are not currently supported for redaction:

Format | Reason | Workaround
Password-protected files | Cannot access content | Remove the password, then upload
Encrypted documents | Cannot decrypt | Decrypt first, then upload
.pst (Outlook archives) | Archive format | Export individual emails as .msg or .eml
Database files (.mdb, .sqlite) | Binary format | Export to CSV or Excel
Compressed archives (.zip, .rar) | Container format | Extract contents, upload individual files
Video/audio files | Not a document format | Extract transcripts as text
PowerPoint (.pptx) | Not yet supported | Export to PDF
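For compressed archives, the workaround above can be scripted with Python's standard `zipfile` module. This hypothetical helper extracts only entries whose extensions appear in the Complete Format Reference, so executables, nested archives, and other unsupported files are left behind. It handles .zip only (the standard library cannot read .rar), and password-protected entries raise `RuntimeError`, in line with the password-protected row above.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import List

# The 28 extensions from the Complete Format Reference table.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".dotx", ".docm", ".dotm",
    ".xlsx", ".xlsm", ".xltx", ".xltm",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".tif", ".webp",
    ".eml", ".msg",
    ".txt", ".csv", ".log", ".md", ".json", ".xml", ".html", ".css", ".js",
}


def extract_supported(archive: Path, dest: Path) -> List[Path]:
    """Extract only supported files from a .zip and return their paths."""
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Zip member names always use forward slashes.
            if PurePosixPath(info.filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
                continue  # skip unsupported entries, including nested archives
            # ZipFile.extract strips absolute paths and ".." components.
            extracted.append(Path(zf.extract(info, dest)))
    return extracted
```

Each returned path can then be uploaded individually.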

File Size Limits

Plan | Per-File Limit | Total Storage
Basic | 50 MB | 5 GB
Starter | 50 MB | 50 GB
Pro | 50 MB | 200 GB
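A minimal sketch of checking a file against these limits before uploading. It assumes, since this page does not say, that MB and GB are binary units (1 MB = 1,048,576 bytes). `check_upload` and `PlanLimits` are hypothetical names, not part of SAR Portal.

```python
from dataclasses import dataclass
from typing import Tuple

MB = 1024 ** 2  # assumption: limits are binary megabytes/gigabytes
GB = 1024 ** 3


@dataclass(frozen=True)
class PlanLimits:
    per_file_bytes: int
    storage_bytes: int


# Values transcribed from the table above.
PLANS = {
    "basic": PlanLimits(50 * MB, 5 * GB),
    "starter": PlanLimits(50 * MB, 50 * GB),
    "pro": PlanLimits(50 * MB, 200 * GB),
}


def check_upload(plan: str, file_bytes: int, used_bytes: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for a file given the plan and storage used."""
    limits = PLANS[plan.lower()]
    if file_bytes > limits.per_file_bytes:
        return False, f"file exceeds the {limits.per_file_bytes // MB} MB per-file limit"
    if used_bytes + file_bytes > limits.storage_bytes:
        return False, "upload would exceed the plan's total storage"
    return True, "ok"
```

The per-file check runs first, so an oversized file is reported as such even when storage is also nearly full.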

Handling Large Files

If a file exceeds the size limits:

  1. Split it into smaller files
  2. Compress images while maintaining quality
  3. Remove unnecessary content
  4. Contact support for special cases
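For CSV exports, the first step can be automated. This hypothetical helper splits CSV text into pieces no larger than a given number of UTF-8 bytes and repeats the header row in each piece, so every part remains a valid, self-describing file. It assumes the first row is a header; a single row larger than the budget is kept whole in its own piece rather than cut. PDFs would instead be split by page range with a PDF tool.

```python
import csv
import io
from typing import List


def _render(row: List[str]) -> str:
    """Serialize one row back to CSV text with a trailing newline."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def split_csv(text: str, max_bytes: int) -> List[str]:
    """Split CSV text into chunks of at most max_bytes (UTF-8), each
    starting with the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header_line = _render(rows[0])
    header_size = len(header_line.encode("utf-8"))
    chunks: List[str] = []
    current, size = [header_line], header_size
    for row in rows[1:]:
        line = _render(row)
        n = len(line.encode("utf-8"))
        # Start a new chunk only if the current one already holds data rows.
        if len(current) > 1 and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [header_line], header_size
        current.append(line)
        size += n
    chunks.append("".join(current))
    return chunks
```

For the 50 MB limit you might call `split_csv(text, max_bytes=50 * 1024 ** 2)`, again assuming binary megabytes. The `csv` module re-quotes fields, so quoting may differ slightly from the source file while the values stay the same.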

Upload Validation

All uploads are validated:

Security Checks

Rejection Reasons

“Invalid file type”

“File too large”
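The two rejection messages above can be anticipated before uploading. The sketch below is a hypothetical client-side pre-check, not SAR Portal's actual validation, whose security checks are not described on this page. It returns “Invalid file type” for extensions outside the 28 listed formats, “File too large” for files over 50 MB (assumed to mean 50 × 1024² bytes), and, as an extra assumption, “Invalid file type” when a binary file's leading bytes do not match the signature expected for its extension.

```python
from pathlib import PurePath
from typing import Optional

MAX_FILE_BYTES = 50 * 1024 ** 2  # assumption: 50 MB as binary megabytes

OOXML = [b"PK\x03\x04"]           # .docx/.xlsx families are ZIP containers
TIFF = [b"II*\x00", b"MM\x00*"]   # little- and big-endian TIFF
JPEG = [b"\xff\xd8\xff"]

# Leading-byte signatures for the supported binary formats.
SIGNATURES = {
    ".pdf": [b"%PDF-"],
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": JPEG, ".jpeg": JPEG,
    ".gif": [b"GIF87a", b"GIF89a"],
    ".bmp": [b"BM"],
    ".tif": TIFF, ".tiff": TIFF,
    ".msg": [b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"],  # OLE compound file
    **{ext: OOXML for ext in (".docx", ".dotx", ".docm", ".dotm",
                              ".xlsx", ".xlsm", ".xltx", ".xltm")},
}
# Text-based formats and .eml have no fixed signature to check.
NO_SIGNATURE = {".txt", ".csv", ".log", ".md", ".json", ".xml",
                ".html", ".css", ".js", ".eml"}
SUPPORTED = set(SIGNATURES) | NO_SIGNATURE | {".webp"}


def precheck(filename: str, head: bytes, size: int) -> Optional[str]:
    """Return a rejection message, or None if the file passes.
    `head` should hold at least the first 12 bytes of the file."""
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED:
        return "Invalid file type"
    if size > MAX_FILE_BYTES:
        return "File too large"
    if ext == ".webp":  # RIFF container with "WEBP" at offset 8
        ok = head[:4] == b"RIFF" and head[8:12] == b"WEBP"
    elif ext in SIGNATURES:
        ok = any(head.startswith(sig) for sig in SIGNATURES[ext])
    else:
        ok = True
    return None if ok else "Invalid file type"
```

Passing only the first few bytes plus the file size avoids reading a large file into memory just to validate it.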

Recommendations by Use Case

Access Requests (Article 15)

Erasure Requests (Article 17)

Redaction Priority