Background
The reach of your content is a key consideration when running a technical blog. In particular, content written only in Korean has a limited audience, which restricts knowledge sharing with developers around the world. In fact, developer demographics show that South Korea sits near the bottom of the top 15 countries by number of professional developers, which further underlines the need for multilingual support.
(See: Top 15 countries by number of professional developers)
The need for automation
Modern AI translation and traditional translation tools both produce fairly accurate translations. However, manually translating and uploading every new article is inefficient. It becomes impractical once you want to offer not only English but also Chinese and Japanese versions, two languages with large developer populations.
To solve this problem, we implemented an automatic translation script that uses the DeepL API.
Development Environment Configuration
We installed the necessary packages to run Node.js scripts in the Next.js environment.
pnpm install deepl-node dotenv tsx
Initially, I tried to use ts-node, but it had configuration conflicts with the Next.js environment. Instead, we set up a standalone execution environment using the tsx library.
Project Structure
First of all, my project has roughly the following structure.
.
├── src/
│   └── app/
│       └── posts/
│           └── [slug]/
│               └── page.tsx
├── posts/
│   └── post1.mdx
└── package.json
Then I created a script like this:
import fs from "fs/promises";
import path from "path";
import * as deepl from "deepl-node";
import matter from "gray-matter";
import dotenv from "dotenv";
dotenv.config();
const DEEPL_API_KEY = process.env.DEEPL_API_KEY!;
const translator = new deepl.Translator(DEEPL_API_KEY);
const SOURCE_DIR = "src/posts";
const TARGET_DIR = "src/posts/en";
// Shape of a post: the MDX body plus its front matter
interface PostContent {
  content: string;
  data: {
    title: string;
    description: string;
    [key: string]: string;
  };
}
// Translate the title, description, and body from Korean to English
async function translatePost(content: {
  data: { [p: string]: string };
  content: string;
}): Promise<PostContent> {
  const translatedTitle = await translator.translateText(
    content.data.title,
    "ko",
    "en-US",
  );
  const translatedDescription = await translator.translateText(
    content.data.description,
    "ko",
    "en-US",
  );
  const translatedContent = await translator.translateText(
    content.content,
    "ko",
    "en-US",
  );

  return {
    content: translatedContent.text,
    data: {
      ...content.data,
      title: translatedTitle.text,
      description: translatedDescription.text,
      originalLang: "ko", // record the source language in the front matter
    },
  };
}
async function processFile(filename: string) {
  try {
    const sourcePath = path.join(SOURCE_DIR, filename);
    const targetPath = path.join(TARGET_DIR, filename);

    // Check that the source file exists
    try {
      await fs.access(sourcePath);
    } catch (error) {
      throw new Error(`File not found: ${filename}`);
    }

    // Read the MDX file and split the front matter from the body
    const fileContent = await fs.readFile(sourcePath, "utf-8");
    const { data, content } = matter(fileContent);

    // Run the translation
    console.log(`Translating ${filename}...`);
    const translated = await translatePost({ data, content });

    // Write the translated MDX file
    const translatedFileContent = matter.stringify(
      translated.content,
      translated.data,
    );
    await fs.mkdir(TARGET_DIR, { recursive: true });
    await fs.writeFile(targetPath, translatedFileContent);
    console.log(`Finished translating ${filename}!`);
  } catch (error) {
    console.error(`Error:`, error);
    process.exit(1);
  }
}
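// Get the filename from the command-line arguments
const filename = process.argv[2];
if (!filename) {
  console.error("Error: please pass the name of an MDX file to translate.");
  process.exit(1);
}

// Check the file extension
if (!filename.endsWith(".mdx")) {
  console.error("Error: only MDX files are supported.");
  process.exit(1);
}

processFile(filename);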
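The script above reads DEEPL_API_KEY with a non-null assertion (`!`), so a missing key only surfaces later as a DeepL error. As a minimal sketch, assuming you would rather fail early with a clear message, a small guard could look like this. `requireEnv` and `Env` are hypothetical names, not part of the original script or of dotenv.

```typescript
// Hypothetical helper (not in the original script): read a required
// environment variable and throw a descriptive error when it is missing.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  // Treat unset, empty, and whitespace-only values the same way.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Intended usage in the script, after dotenv.config():
//   const translator = new deepl.Translator(
//     requireEnv(process.env, "DEEPL_API_KEY"),
//   );
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.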
Add the script to package.json as well.
{
  "scripts": {
    "translate": "tsx scripts/translate-posts.ts"
  }
}
Running Result
Now, if you run the translate script from the terminal with the name of an MDX file, it generates a translated MDX file.
Problems
In our initial implementation, we sent the text of the MDX file directly to the DeepL API, but we ran into the following issues:
- Broken Markdown syntax
- Unnecessary translation of code blocks
- Distorted image tags and link structure
(Figure: the original post alongside its Japanese translation)
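One way to catch this kind of breakage automatically is to compare a few structural counts before and after translation. The following is only a rough sketch under simple assumptions: it uses regular expressions rather than a real Markdown parser, and `countStructure` and `structureChanged` are hypothetical helpers that are not part of the original script.

```typescript
// Build the fence marker programmatically so this example can itself
// live inside a fenced code block.
const FENCE = "`".repeat(3);

interface StructureCounts {
  fenceLines: number; // lines that open or close a fenced code block
  images: number;     // ![alt](url)
  links: number;      // [text](url), excluding images
}

function countStructure(markdown: string): StructureCounts {
  const fenceLines = markdown.match(new RegExp("^\\s*" + FENCE, "gm")) ?? [];
  const images = markdown.match(/!\[[^\]]*\]\([^)]*\)/g) ?? [];
  // The negative lookbehind skips image syntax, which starts with "!".
  const links = markdown.match(/(?<!!)\[[^\]]*\]\([^)]*\)/g) ?? [];
  return {
    fenceLines: fenceLines.length,
    images: images.length,
    links: links.length,
  };
}

// Return the names of the counts that differ between the two documents.
function structureChanged(original: string, translated: string): string[] {
  const before = countStructure(original);
  const after = countStructure(translated);
  return (Object.keys(before) as (keyof StructureCounts)[]).filter(
    (key) => before[key] !== after[key],
  );
}
```

An empty result does not prove the translation is correct, only that these counts match; a non-empty result is a cheap signal that a post needs a manual look.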
Workaround
I wondered how to handle this and came up with the following idea. The HTML handling section of the DeepL API documentation suggested that sending the text as HTML lets DeepL translate it without breaking the structure.
Therefore, we implemented the following improved process to solve the problems mentioned above:
- Convert the MDX to HTML
- Send the HTML to the DeepL API with the option to preserve HTML tags
- Convert the translated HTML back to MDX
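import { unified } from "unified";
import remarkParse from "remark-parse";
import remarkHtml from "remark-html";
import rehypeParse from "rehype-parse";
import rehypeRemark from "rehype-remark";
import remarkStringify from "remark-stringify";

// Markdown => HTML
const convertMDXToHtml = async (markdown: string) => {
  try {
    const html = await unified()
      .use(remarkParse)
      .use(remarkHtml)
      .process(markdown);
    return html.toString();
  } catch (err) {
    console.error("An error occurred during MD => HTML conversion:", err);
    // Rethrow so the literal string "error" never ends up in a translated post
    throw err;
  }
};

// HTML => Markdown
const convertHtmlToMDX = async (html: string) => {
  try {
    const markdown = await unified()
      .use(rehypeParse)
      .use(rehypeRemark)
      .use(remarkStringify)
      .process(html);
    return markdown.toString();
  } catch (err) {
    console.error("An error occurred during HTML => MD conversion:", err);
    throw err;
  }
};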
async function translatePost(
  content: { data: { [p: string]: string }; content: string },
  targetLang: TargetLanguageCode, // type exported by deepl-node
): Promise<PostContent> {
  // (omitted)...

  // md => html
  const html = await convertMDXToHtml(content.content);

  // html => translated html
  const translatedContent = await translator.translateText(
    html,
    "ko",
    targetLang,
    {
      tagHandling: "html",
    },
  );

  // translated html => md
  const mdx = await convertHtmlToMDX(translatedContent.text);

  return {
    content: mdx,
    // ...
  };
}
Now the formatting comes through correctly!
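Since translatePost now takes a target language, the same post can be translated into several languages in one run. The sketch below shows one possible shape for that loop. The language list, the `src/posts/<lang>` directory layout (modeled on the `src/posts/en` directory used earlier), and the names `targetDirFor`, `TranslateFn`, and `translateToAll` are all assumptions for illustration. The translation step is passed in as a function, so the loop can be tried without calling DeepL.

```typescript
// Assumed target languages: the post mentions English, Japanese, and Chinese.
const TARGET_LANGS = ["en-US", "ja", "zh"] as const;
type TargetLang = (typeof TARGET_LANGS)[number];

// Hypothetical layout: "en-US" => "src/posts/en", "ja" => "src/posts/ja".
function targetDirFor(lang: TargetLang, sourceDir = "src/posts"): string {
  const shortCode = (lang.split("-")[0] ?? lang).toLowerCase();
  return `${sourceDir}/${shortCode}`;
}

// Stand-in for the real translation step (for example, a wrapper around
// translatePost), injected so this sketch runs without network access.
type TranslateFn = (markdown: string, lang: TargetLang) => Promise<string>;

interface TranslatedFile {
  path: string;
  content: string;
}

async function translateToAll(
  filename: string,
  markdown: string,
  translate: TranslateFn,
): Promise<TranslatedFile[]> {
  const results: TranslatedFile[] = [];
  // Sequential on purpose: one request at a time, in a stable order.
  for (const lang of TARGET_LANGS) {
    const content = await translate(markdown, lang);
    results.push({ path: `${targetDirFor(lang)}/${filename}`, content });
  }
  return results;
}
```

Each returned entry can then be written out with fs.mkdir and fs.writeFile in the same way as processFile does for the English version.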
Closing thoughts
With this automation in place, we can now deploy multilingual versions of our blog posts without any difficulty. I'll keep verifying that the translated posts read as I intended, but for now, I'm excited to see whether generating static files in multiple languages actually brings in more traffic as part of my SEO efforts!