ConverterToMarkdown converts files to Markdown directly in your browser using specialized JavaScript libraries, with no installation, no server and no upload required. It supports 15 formats, including DOCX, PDF, XLSX, HTML, CSV, JSON, XML and images, which are read with automatic OCR via Tesseract.js.
DOCX
Converts the document to intermediate HTML using mammoth.js, preserving headings (h1–h6), bold, italic, tables and lists. The HTML is then cleaned and converted to Markdown with Turndown. Images are skipped; only text content is converted.
PDF
Extracts text from each page using pdf.js. If the PDF contains no extractable text (a scanned PDF), it automatically falls back to OCR with Tesseract.js, page by page, as with image files. Headers and footers may merge with body text depending on the PDF structure.
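The fallback decision described above can be sketched as a small check on the text already extracted from each page. This is an illustration only: the function name `needsOcrFallback` and the `minChars` threshold are assumptions, not the tool's actual code, and the per-page strings are assumed to come from joining the `str` fields of the items returned by pdf.js `page.getTextContent()`.

```typescript
// Hypothetical helper: decide whether a PDF should go through the OCR fallback.
// `pageTexts` holds the text extracted from each page, one string per page.
// `minChars` is an assumed threshold; the real tool may use a different rule.
function needsOcrFallback(pageTexts: string[], minChars: number = 1): boolean {
  // Count non-whitespace characters across all pages.
  const visibleChars = pageTexts
    .map((text) => text.replace(/\s+/g, "").length)
    .reduce((sum, n) => sum + n, 0);
  // Below the threshold, treat the file as a scanned PDF.
  return visibleChars < minChars;
}

console.log(needsOcrFallback(["", "  \n"]));         // true
console.log(needsOcrFallback(["Invoice 2024", ""])); // false
```

A whole-document check keeps the sketch short. A per-page check would be the natural variation for PDFs that mix scanned and text pages.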
XLSX / XLS
Reads the workbook with SheetJS and converts each sheet into a separate Markdown table with pipe-delimited columns. Multi-sheet files produce multiple tables, each labeled with the sheet name. Formulas are resolved to their current values.
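As a rough sketch of the sheet-to-table step, the code below turns rows that have already been read from a workbook into one labeled pipe table per sheet. The `Sheet` shape, the function names and the `##` heading used as the sheet label are assumptions made for illustration. The rows are assumed to be arrays of arrays, which is the shape SheetJS produces with `XLSX.utils.sheet_to_json(ws, { header: 1 })`.

```typescript
type Cell = string | number | boolean | null | undefined;

// Hypothetical input shape: a sheet name plus its rows as arrays of cells.
interface Sheet {
  name: string;
  rows: Cell[][];
}

// Escape pipes and flatten line breaks so a cell cannot break the table.
function escapeCell(value: Cell): string {
  if (value === null || value === undefined) return "";
  return String(value).replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function sheetToMarkdown(sheet: Sheet): string {
  // Pad every row to the widest row so all lines have the same column count.
  const width = Math.max(1, ...sheet.rows.map((r) => r.length));
  const line = (row: Cell[]): string => {
    const cells = Array.from({ length: width }, (_, i) => escapeCell(row[i]));
    return `| ${cells.join(" | ")} |`;
  };
  // Assumption: the first row is the header row.
  const [header = [], ...body] = sheet.rows;
  const separator = `| ${Array(width).fill("---").join(" | ")} |`;
  // Assumption: the sheet name is emitted as a level-2 heading above the table.
  return [`## ${sheet.name}`, "", line(header), separator, ...body.map(line)].join("\n");
}

function workbookToMarkdown(sheets: Sheet[]): string {
  return sheets.map(sheetToMarkdown).join("\n\n");
}

console.log(workbookToMarkdown([{ name: "Q1", rows: [["Item", "Qty"], ["Pens", 3]] }]));
```

Padding short rows matters because Markdown tables expect the same number of columns on every line.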
HTML
Strips inline styles, scripts, navigation elements and visual noise with DOMParser before passing the cleaned HTML to Turndown. Preserves semantic structure: headings, paragraphs, links, emphasis, blockquotes and code blocks.
CSV
Parses CSV files with PapaParse, auto-detecting the delimiter (comma, semicolon or tab). Outputs a Markdown table, detecting the header row. Handles files with hundreds of rows.
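To give a feel for what delimiter auto-detection involves, here is a deliberately simple heuristic that counts each candidate delimiter on the first line, ignoring characters inside double quotes. It is not PapaParse's algorithm, which performs its own detection internally. The names `guessDelimiter` and `countOutsideQuotes` are hypothetical.

```typescript
const CANDIDATES = [",", ";", "\t"] as const;
type Delimiter = (typeof CANDIDATES)[number];

// Count occurrences of `delimiter` that are not inside a double-quoted field.
function countOutsideQuotes(line: string, delimiter: string): number {
  let inQuotes = false;
  let count = 0;
  for (const ch of line) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === delimiter && !inQuotes) count++;
  }
  return count;
}

// Pick the candidate that appears most often on the first line.
// Assumption: falls back to a comma when no candidate appears at all.
function guessDelimiter(csv: string): Delimiter {
  const firstLine = csv.split(/\r?\n/, 1)[0] ?? "";
  let best: Delimiter = ",";
  let bestCount = 0;
  for (const candidate of CANDIDATES) {
    const count = countOutsideQuotes(firstLine, candidate);
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  }
  return best;
}

console.log(guessDelimiter("name;age\nAna;31")); // ;
```

Looking only at the first line is a simplification. A more robust check would compare counts across several lines.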
TXT / MD
Returns the text without transformation. Line breaks are preserved as-is.
JSON
Validates the JSON structure and wraps the formatted output in a fenced code block tagged json for syntax highlighting. Handles nested objects, arrays, minified JSON and malformed input.
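A minimal sketch of the validate, format and wrap steps follows. The function name `jsonToMarkdown` is hypothetical. The handling of malformed input shown here (keeping the raw text in an untagged fence) is an assumption, since the description does not say what the tool does in that case.

```typescript
// Build the fence marker programmatically so this example nests cleanly.
const FENCE = "`".repeat(3);

function jsonToMarkdown(input: string): string {
  try {
    // Parsing validates the input; stringify with an indent of 2 pretty-prints it,
    // which also expands minified JSON.
    const pretty = JSON.stringify(JSON.parse(input), null, 2);
    return `${FENCE}json\n${pretty}\n${FENCE}`;
  } catch {
    // Assumption: malformed input is kept verbatim in a fence without a language tag.
    return `${FENCE}\n${input.trim()}\n${FENCE}`;
  }
}

console.log(jsonToMarkdown('{"a":[1,2]}'));
```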
XML
Wraps the raw XML content in a fenced code block, preserving indentation and structure. Useful for inspection and documentation purposes.
JPG / PNG / WEBP / BMP / GIF
Runs OCR (optical character recognition) in the browser using Tesseract.js. Automatically detects the language from browser settings and loads the matching language model. Supports JPG, PNG, WEBP, BMP and GIF. The language model (~4 MB) is downloaded once and cached. Works well with printed text; handwritten content may have lower accuracy.
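Choosing a language model from browser settings could look roughly like the sketch below, which maps a locale tag such as `pt-BR` to a Tesseract language code. The function name `tesseractLangFor`, the short mapping table and the English fallback are assumptions for illustration. The codes themselves (`eng`, `por`, `spa`, `fra`, `deu`, `ita`) are standard Tesseract language identifiers.

```typescript
// Partial, illustrative mapping from primary language subtag to Tesseract code.
const TESSERACT_LANGS: Record<string, string> = {
  en: "eng",
  pt: "por",
  es: "spa",
  fr: "fra",
  de: "deu",
  it: "ita",
};

// Map a locale tag (for example the value of navigator.language) to a model code.
// Assumption: unknown languages fall back to English.
function tesseractLangFor(locale: string, fallback: string = "eng"): string {
  const primary = locale.toLowerCase().split(/[-_]/)[0] ?? "";
  return TESSERACT_LANGS[primary] ?? fallback;
}

console.log(tesseractLangFor("pt-BR")); // por
console.log(tesseractLangFor("ja-JP")); // eng
```

In a browser, the result could then be passed as the language argument to `Tesseract.recognize(image, lang)`, though how the tool wires this together is not stated in the description.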