
A deep(L)er look into DeepL's translation interface
I recently built a web text editor with LLM integration. It is a simple editable div. You can write text with basic markdown support, like * for italic and ** for bold.
I wanted to implement a feature that automatically opens a context menu when hovering the mouse over a sentence. The context menu can then be used to trigger an LLM to rewrite that sentence.
The text is stored as markdown without any deeper structure.
I already knew that DeepL offers such a feature on a word basis (instead of whole sentences) in their translation web interface. Competitors like Grammarly and LanguageTooler probably have similar web interfaces, but I decided to stick with DeepL.
I wanted to adapt this feature. Using it, it felt like they had done a great job implementing it, and I wanted to know what kind of magic they were using… Drum roll…
DeepL’s hover implementation
Simplified, the JavaScript looked like this:
const translatedText = "...";
// Replace every space (global regex, not just the first match)
var renderedHTML = translatedText.replace(/ /g, "</span> <span class='highlight-element'>");
renderedHTML = "<span class='highlight-element'>" + renderedHTML + "</span>";
// getElementById takes a plain id, without the CSS '#' prefix
const translatedTextContainer = document.getElementById("translated-box");
translatedTextContainer.innerHTML = renderedHTML;
and the CSS part:
.highlight-element:hover {
  background-color: rgb(207 231 255);
  /* other stuff... */
}
They made use of the CSS hover state and embedded each word in its own span tag, which then - on hover - triggers the style change. Simple and straightforward. However, it is not scalable. When you insert a lot of text into the translator, you'll notice how laggy the interface gets. To me it looks like DeepL knows that they're mostly serving only a few sentences per translation request and not more. Why would you over-engineer something almost no one uses? And from a user perspective: if I know that the interface gets laggy when I enter too much text, I split the text into parts and translate them one after another. So it seems like no big deal, even if it still feels like they cut corners.
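One caveat of building the HTML via plain string concatenation: if the translated text itself contains characters like < or &, the generated markup breaks. A minimal sketch of a safer word-wrapping helper (my own addition, not DeepL's actual code) could look like this:

```javascript
// Hypothetical helpers, not DeepL's code: escape the text first,
// then wrap every word in a highlight span.
function escapeHTML(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function wrapWords(text) {
  return escapeHTML(text)
    .split(" ")
    .map((word) => `<span class='highlight-element'>${word}</span>`)
    .join(" ");
}

console.log(wrapWords("a <b> c"));
// → <span class='highlight-element'>a</span> <span class='highlight-element'>&lt;b&gt;</span> <span class='highlight-element'>c</span>
```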
Mouse Events
When you click on an (already highlighted) word, you get a context menu showing similar words. DeepL uses a click event listener to
- detect the selected / clicked element
- show the context menu
They could calculate the position of the mouse and compare it with the bounding box of each word span. Alternatively, they can traverse the DOM:
function onClick(event) {
  const allElements = document.querySelectorAll('*');
  const hoveredElements = [];
  allElements.forEach(el => {
    if (el.matches(':hover')) {
      hoveredElements.push(el);
    }
  });
  if (hoveredElements.length === 0) {
    // click event not relevant
    return;
  }
  if (hoveredElements.length > 1) {
    // That's odd - might happen due to programmatically setting the hover state on some elements
    return;
  }
  showContextMenu(hoveredElements[0]);
}
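Scanning every element in the document on each click is O(n) over the whole DOM, though. Since the clicked node is already available as event.target, a simpler sketch (my assumption of an equivalent approach, not DeepL's actual code) could be:

```javascript
// Hypothetical simplification: event.target is the clicked node, and
// closest() walks up to the nearest enclosing word span (or returns null).
function onWordClick(event) {
  const word = event.target.closest('.highlight-element');
  if (!word) {
    return; // click landed outside any word span
  }
  showContextMenu(word);
}
```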
Sentence Detection is hard
Compared to DeepL's interface, my implementation should differ in one important way:
- Translate whole sentences instead of single words, and support simple text highlighting (bold, italic, underlined, etc.)
This requires pre-processing the raw text on every text insert and delete. I built a regex to detect sentences, a problem that turned out to be harder than I initially thought. A sentence might end with a ., !, ?, : or ". It might also end with .., … or a newline character, as seen in this example:
Sibling 1: Mom is dead.
Sibling 2: WHAT THE FUCK...
A sentence also might not end with a . if it is not followed by a newline or space character. If I have a method call in my sentence, like object1.destroy(), I do not want it to be split into two sentences.
This is getting out of hand rather quickly.
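For what it's worth, a hand-rolled regex is not the only option: modern runtimes ship a locale-aware sentence segmenter. A minimal sketch using the built-in Intl.Segmenter (available in modern browsers and Node 16+; my own alternative, not the code I ended up using):

```javascript
// Locale-aware sentence splitting via Intl.Segmenter
const segmenter = new Intl.Segmenter("en", { granularity: "sentence" });

function splitSentences(text) {
  // segment() yields { segment, index, ... } objects; keep the non-empty texts
  return Array.from(segmenter.segment(text), (s) => s.segment.trim())
    .filter((s) => s.length > 0);
}

console.log(splitSentences("Mom is dead. WHAT THE FUCK..."));
```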
And this pre-processing code must run on every text change, so it would hit a laggy state much faster than the DeepL use case. DeepL's hover code only needs to run once, when the user enters text for translation. After that, there's almost no need to type into the translation result box.
My pre-processing function suddenly looks like this:
private parse(text: string) {
  // Earlier attempt at a full sentence regex - kept for reference, unused
  const reg = /(["']?[A-Z][^.?!]+((?![.?!]['"]?\s["']?[A-Z][^.?!]).)+[.?!'"]+)/;
  const r = text
    .replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>')
    .replace(/_(.+?)_/g, '<em>$1</em>')
    .replace(/\b[^.!?\n]+[.!?]*|\n+/g, (match) => {
      if (match.includes('\n')) {
        // Replace newlines with <br>, closing and reopening the sentence span
        return '</span><br><span>';
      } else {
        return `<span class="p-0.5 hover:bg-slate-300">${match}</span>`;
      }
    });
  return r;
}
Using this was sufficient for me, even though it's far from perfect. It's interesting to see that large companies don't mind laggy UIs. I do understand that there's no need to provide a perfect solution for a problem that is no real problem. Still, I had hoped to learn more from looking at the implementation of DeepL's word hover effect. But as always: simple is simply the best.