Context · PreToolUse · Read

Binary file to Markdown converter

Read any PDF, DOCX or PPTX as clean Markdown and cut token usage on binary files

Intercepts Read tool calls on binary and markup documents (PDF, DOCX, PPTX, ODT, RTF, DOC, PPT, XLSX, EPUB, HTML) and converts them to Markdown before Claude processes them, using pdftotext for PDFs when it is installed and pandoc otherwise. The hook blocks the original Read and returns the converted text, truncated to 50,000 characters, as the block reason, so Claude never has to parse raw binary data and token consumption drops sharply. If no conversion tool is available, or the conversion fails, the hook stays silent and the Read proceeds unchanged.

What does the Binary file to Markdown converter hook do?

Binary file to Markdown converter is a Claude Code PreToolUse hook matching Read. It fires automatically on every Read tool call, outside the model, so it cannot be skipped or forgotten. It lets Claude read PDF, DOCX or PPTX files as clean Markdown, which cuts token usage on binary files.

As a PreToolUse hook, it runs before the tool call executes, so it can block or adjust what Claude is about to do. Because it is a deterministic Node.js script, it runs on every matching event without relying on the model to remember, which is what makes it dependable in automated agentic workflows.
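
To make the lifecycle concrete, the sketch below shows the kind of JSON payload the hook receives on stdin for a Read call, together with the filter that decides whether it acts. The sample path and the helper name `wantsConversion` are hypothetical, and the extension parsing is a simplified stand-in for `path.extname`. Only the `tool_name` and `tool_input.file_path` fields are taken from the script on this page.

```javascript
// Hypothetical sample payload; only the two fields the hook script reads are shown.
const sampleInput = {
	tool_name: "Read",
	tool_input: { file_path: "/tmp/specs/report.PDF" },
};

// Extensions the hook script on this page treats as convertible.
const SUPPORTED = new Set([
	"pdf", "docx", "pptx", "odt", "rtf", "doc", "ppt", "xlsx", "epub", "html", "htm",
]);

// Hypothetical helper: true when the hook would attempt a conversion.
// Simplified extension parsing (text after the last dot of the file name).
function wantsConversion(input) {
	if (input.tool_name !== "Read") return false;
	const filePath = input.tool_input?.file_path ?? "";
	const fileName = filePath.split("/").pop();
	const dot = fileName.lastIndexOf(".");
	if (dot <= 0) return false; // no extension, or a dotfile such as ".env"
	return SUPPORTED.has(fileName.slice(dot + 1).toLowerCase());
}

console.log(wantsConversion(sampleInput)); // true
console.log(wantsConversion({ tool_name: "Read", tool_input: { file_path: "src/index.ts" } })); // false
console.log(wantsConversion({ tool_name: "Bash", tool_input: { command: "ls" } })); // false
```

Anything that fails this filter is left alone, so ordinary source files are read exactly as before.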

Use cases

Reading PDF documentation without wasting tokens on binary encoding
Analysing Word or PowerPoint files in agentic workflows
Reducing context window usage when processing office documents

settings.json fragment

{
  "hooks": {
    "PreToolUse": [
      {
        "hooks": [
          {
            "command": "node \"$CLAUDE_PROJECT_DIR/.claude/hooks/file-to-markdown.mjs\"",
            "type": "command"
          }
        ],
        "matcher": "Read"
      }
    ]
  }
}

Script · .claude/hooks/file-to-markdown.mjs

#!/usr/bin/env node
// @hookstack pre-read-file-to-markdown
// Converts PDF/DOCX/PPTX and other binary files to Markdown before they are read (PreToolUse Read)
import { execSync } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";
import { basename, extname } from "node:path";
import { fileURLToPath } from "node:url";

const MAX_CHARS = 50_000;

const SUPPORTED = new Set([
	"pdf",
	"docx",
	"pptx",
	"odt",
	"rtf",
	"doc",
	"ppt",
	"xlsx",
	"epub",
	"html",
	"htm",
]);

function defaultExec(cmd) {
	return execSync(cmd, { encoding: "utf8", timeout: 30_000 }).trim();
}

function hasBinary(name, exec) {
	try {
		exec(`which ${name}`);
		return true;
	} catch {
		return false;
	}
}

export function run(input, { exec = defaultExec, exists = existsSync } = {}) {
	if (input.tool_name !== "Read") return null;

	const filePath = input.tool_input?.file_path ?? "";
	if (!filePath) return null;

	const ext = extname(filePath).toLowerCase().replace(".", "");
	if (!SUPPORTED.has(ext)) return null;

	if (!exists(filePath)) return null;

	const hasPdftotext = ext === "pdf" && hasBinary("pdftotext", exec);
	const hasPandoc = hasBinary("pandoc", exec);

	if (!hasPdftotext && !hasPandoc) return null;

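	// Single-quote the path for the shell so spaces, quotes and `$(...)` in file names are passed literally.
	const quotedPath = `'${filePath.replace(/'/g, "'\\''")}'`;

	let markdown;
	try {
		if (ext === "pdf" && hasPdftotext) {
			markdown = exec(`pdftotext ${quotedPath} -`);
		} else if (hasPandoc) {
			markdown = exec(`pandoc --to markdown --wrap=none ${quotedPath}`);
		} else {
			return null;
		}
	} catch {
		// Conversion failed: stay silent so the original Read proceeds.
		return null;
	}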

	if (!markdown?.trim()) return null;

	let content = markdown.trim();
	let truncated = false;
	if (content.length > MAX_CHARS) {
		content = content.slice(0, MAX_CHARS);
		truncated = true;
	}

	const name = basename(filePath);
	const suffix = truncated ? ` (truncated to ${MAX_CHARS} chars)` : "";
	const header = `[file-to-markdown] \`${name}\` converted to Markdown${suffix}:\n\n`;

	return { decision: "block", reason: header + content };
}

/* v8 ignore next 5 */
if (process.argv[1] === fileURLToPath(import.meta.url)) {
	const input = JSON.parse(readFileSync(0, "utf8"));
	const result = run(input);
	if (result) process.stdout.write(JSON.stringify(result));
}
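
The value returned by `run` is what the script writes to stdout. As a rough illustration of its shape, the sketch below rebuilds the header and truncation steps in isolation. `buildResponse` is a hypothetical helper that is not part of the script, and the tiny `maxChars` value is chosen only to make the truncation visible.

```javascript
// Hypothetical helper mirroring the tail of run(): trim, truncate, prepend a header.
function buildResponse(name, markdown, maxChars = 50_000) {
	let content = markdown.trim();
	const truncated = content.length > maxChars;
	if (truncated) content = content.slice(0, maxChars);

	const suffix = truncated ? ` (truncated to ${maxChars} chars)` : "";
	const header = `[file-to-markdown] \`${name}\` converted to Markdown${suffix}:\n\n`;

	// "block" stops the original Read; the converted text travels in "reason".
	return { decision: "block", reason: header + content };
}

const response = buildResponse("notes.docx", "# Title\n\nBody text", 7);
console.log(JSON.stringify(response, null, 2));
```

Note that truncation counts characters of the converted text only, so the header line is always added on top of the `maxChars` budget.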
