[HN Gopher] Kagi Translate
___________________________________________________________________
Kagi Translate
Author : lkellar
Score : 173 points
Date : 2024-11-07 19:32 UTC (3 hours ago)
(HTM) web link (blog.kagi.com)
(TXT) w3m dump (blog.kagi.com)
| ziddoap wrote:
| > _Quality ratings based on internal testing and user feedback_
|
| I'd be interested in knowing more about the methodology here.
| People who use Kagi tend to _love_ Kagi, so bias would certainly
| get in the way if not controlled for. How rigorous was the
| quality-rating process? How big of a difference is there between
| "Average", "High" and "Very High"?
|
| I'm also curious about the 1 additional language that Kagi
| supports (Google is listed at 243, Kagi at 244)?
|
| > _Kagi Translate is free for everyone._
|
| That's nice!
| lcnPylGDnU4H9OF wrote:
| > I'm also curious about the 1 additional language that Kagi
| supports (Google is listed at 243, Kagi at 244)?
|
| I just copied all of the values from the select element on the
| page (https://translate.kagi.com/) and there's only 243. Now I
| genuinely wonder if it's Pig Latin.
| https://news.ycombinator.com/item?id=42080562
| banana_giraffe wrote:
| Also, notable, Google claims to support Inuktut and Tshiluba,
| and I don't see those two in Kagi.
| up6w6 wrote:
| I am very suspicious of the results. A few months ago they
| published a LLM benchmark, calling it "perfect" while it
| actually contained like only 50 inputs (academic benchmark
| datasets usually contain tens of thousands of inputs).
| ks2048 wrote:
| A quick scrape of the two sites gives (literally a diff of sets
| of the strings used in language selection):
|
| In Kagi, not Google:
|   Crimean Tatar
|   Santali
|
| In Google, not Kagi:
|   Crimean Tatar (Cyrillic)
|   Crimean Tatar (Latin)
|   French (Canada)
|   Inuktut (Latin)
|   Inuktut (Syllabics)
|   Santali (Latin)
|   Santali (Ol Chiki)
|   Tshiluba
|
| They really must have copied Google, because like I said this
| was diffing exact strings, meaning that slight variations of
| how the languages are presented don't exist.
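|
| A minimal sketch of that kind of exact-string set diff. The two
| language lists here are shortened, hypothetical stand-ins; the
| real ones would be the option strings scraped from each site's
| language selector.

```python
# Hypothetical, shortened language lists (assumption: not the real scraped data).
kagi_languages = {"Afrikaans", "Crimean Tatar", "Santali", "Zulu"}
google_languages = {
    "Afrikaans",
    "Crimean Tatar (Cyrillic)",
    "Crimean Tatar (Latin)",
    "Tshiluba",
    "Zulu",
}


def diff_language_sets(left, right):
    """Return (only_in_left, only_in_right) as sorted lists of exact strings."""
    return sorted(left - right), sorted(right - left)


only_kagi, only_google = diff_language_sets(kagi_languages, google_languages)
print("In Kagi, not Google:", only_kagi)
print("In Google, not Kagi:", only_google)
```

| Because the comparison is on exact strings, "Crimean Tatar" and
| "Crimean Tatar (Latin)" count as different entries.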
| krackers wrote:
| How is this compared to using gpt-4 directly?
| burkaman wrote:
| I don't know how the translation quality compares, but the
| advantages to this would be that it's free and it can translate
| web pages in-place.
| Aachen wrote:
| And presumably the energy efficiency of a dedicated
| translator compared to a generic language system, assuming
| they didn't build this on top of a GPT. The blog post doesn't
| say but I'm assuming (perhaps that's no longer accurate) that
| it's prohibitively expensive for a small team without huge
| funding to build such a model as a side project
| elashri wrote:
| It varies depending on the language, but I find GPT4o to be good
| at understanding the context and sometimes going with the intent,
| not just the grammar and rules of the language. But for most cases
| it is overkill, and you still have the chance of hallucination
| (although it is less likely in these use cases).
|
| This is of course based on my experience using it between
| Arabic, English and French, which are among the 5 most popular
| languages. Things might be dramatically different with other
| languages.
| ilaksh wrote:
| Have you compared gpt-4o to Kagi?
|
| They might actually be the same thing in some cases.
| gen3 wrote:
| Has anyone seen info on how this works? "It's not revolutionary"
| seems like an understatement when you can do better than DeepL
| and support more languages than Google?
| kouteiheika wrote:
| I'm pretty sure it's just a finetuned LLM.
|
| I have some experience experimenting in this space; it's not
| actually that hard to build a model which surpasses DeepL, and
| the wide language support is just a consequence of using an LLM
| trained on the whole Internet, so the model picks up the
| ability to use a bunch of languages.
| ilaksh wrote:
| I'm almost sure they did not fine-tune an LLM. They are using
| existing LLMs because fine tuning to best the SOTA models at
| translation is impractical unless you target very niche
| languages and even then it would be very hard to get a better
| dataset than what is already used for those models.
|
| Probably all they are doing is like switching between some
| Qwen model (for Chinese) and large Llama or maybe OpenAI or
| Gemini.
|
| So they just have a step (maybe also an LLM) to guess which
| model is best or needed for the input. Maybe something really
| short and simple just goes to a smaller simpler less
| expensive model.
| freediver wrote:
| It uses a combination of LLMs, selecting the best output. (from
| the blog post)
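|
| The thread only says that several LLMs are combined and the best
| output is selected; how the selection works is not stated. A
| rough sketch of the general pattern, where the translator
| functions and the scoring function are all hypothetical
| placeholders:

```python
from typing import Callable, Sequence

Translator = Callable[[str], str]
Scorer = Callable[[str, str], float]


def best_translation(text: str, translators: Sequence[Translator], score: Scorer) -> str:
    """Run every translator on the text and keep the highest-scoring candidate."""
    candidates = [translate(text) for translate in translators]
    return max(candidates, key=lambda candidate: score(text, candidate))


# Stub "models" standing in for real LLM calls (assumption, for illustration only).
def literal_model(text: str) -> str:
    return "Die Katze schnurrt ."


def fluent_model(text: str) -> str:
    return "Die Katze schnurrt."


def toy_score(source: str, candidate: str) -> float:
    # Placeholder quality signal: penalize a stray space before the period.
    return -float(candidate.count(" ."))


result = best_translation("The cat purrs.", [literal_model, fluent_model], toy_score)
print(result)
```

| In a real system the scorer would be the hard part; it could
| itself be a model or a learned quality estimator.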
| gen3 wrote:
| Ah, I missed that. Thank you!
| a2128 wrote:
| It just uses LLMs, I've had it output a refusal in the target
| language by entering stuff about nukes in the input
| leipert wrote:
| Kudos on the launch! Looking good!
|
| One benefit of Google Translate is with languages like Hebrew and
| Arabic, you can enter in those languages phonetically or with on-
| screen keyboards.
| ks2048 wrote:
| > Limitations
|
| > We do not translate dynamically created content ...
|
| What does that mean?
| agluszak wrote:
| I would guess it's only able to translate the html content sent
| on page load - so static webpages, but not SPAs etc.
| jsheard wrote:
| I assume it means they only translate what's in the HTML, not
| anything that's added via Javascript later.
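|
| To illustrate the guess above: a translator that only reads the
| HTML as delivered never sees text that a script inserts later.
| A small sketch using Python's standard html.parser; the sample
| page is made up, and this is not Kagi's actual pipeline.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collect visible text from raw HTML, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: "Loaded later" only exists after the script runs in a browser.
page = (
    "<p>Hello</p><div id='app'></div>"
    "<script>document.getElementById('app').textContent = 'Loaded later';</script>"
)

extractor = StaticTextExtractor()
extractor.feed(page)
static_text = extractor.chunks
print(static_text)
```

| Only "Hello" is available for translation here; the text set by
| the script would require actually executing the page.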
| freedomben wrote:
| Indeed, that's what would make most sense to me.
|
| I also strongly suspect the way they're able to make it free
| is by caching the results, so each translation only happens
| one time regardless of how many requests for the page happen.
| If they translated dynamic content, they couldn't (safely)
| cache the results.
| kevincox wrote:
| I don't think JS vs HTML would make any difference to
| caching.
|
| If they are caching by URL you can have dynamic HTML
| generation or a JS generated page that is the same on every
| load.
|
| If you are caching by the text then you can do the same for
| HTML or JS generated (you are just reading the text out of
| the DOM when the JS seems done).
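|
| A sketch of the "cache by text" idea, assuming a hypothetical
| translate(text, target_lang) function. The cache key is a hash
| of the target language plus the extracted text, so it does not
| matter whether the text came from static HTML or from the DOM
| after scripts ran.

```python
import hashlib


class TranslationCache:
    """Memoize translations keyed by (target language, source text)."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical backend call
        self._store = {}
        self.misses = 0

    def get(self, text, target_lang):
        raw_key = f"{target_lang}\0{text}".encode("utf-8")
        key = hashlib.sha256(raw_key).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]


# Stub backend standing in for an expensive LLM call (assumption).
def fake_translate(text, target_lang):
    return f"[{target_lang}] {text}"


cache = TranslationCache(fake_translate)
first = cache.get("Hello", "de")
second = cache.get("Hello", "de")
print(first, second, cache.misses)
```

| The second lookup is served from the cache, so the backend is
| called only once for identical text.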
| ks2048 wrote:
| Ah, that makes sense. In my head it sounded like server-side
| dynamic content OR not wanting to translate LLM outputs,
| neither of which makes sense or is possible.
| _kidlike wrote:
| that's what I think too, which kinda makes sense since it's a
| page, and not a browser plugin. If they implemented a browser
| plugin that would do what Google recently removed from their
| plugin, that would be a killer feature. (assuming they can
| then translate all html as it comes in)
|
| Brave browser does it already though, but sometimes it's
| unusably slow.
| Aachen wrote:
| Is that a relevant username, or is J your initial? I can't
| quite place what "JavaScript heard" would mean. I've wondered
| before but there's no contact in your profile and now it felt
| at least somewhat related to the comment itself, sorry for
| being mostly off-topic
| jsheard wrote:
| It's an initial :p
| Aachen wrote:
| Mystery solved! Thanks for obliging my curiosity :)
| ohmahjong wrote:
| Disclaimer: I am already a Kagi customer.
|
| At least for Afrikaans I'm not impressed here. There are some
| inaccuracies, like "varktone" becoming "pork rinds" instead of
| "pig toes" and also some censorship ("jou ma se poes" does NOT
| mean "play with my cat"!). Comparing directly against Google
| Translate, Google nails everything I threw at it.
|
| I didn't see any option to provide feedback, suggested
| translations, etc, but I'm hopeful that this service improves.
| burkaman wrote:
| This is the link they gave for feedback:
| https://kagifeedback.org/d/5305-kagi-translate-feedback/4
| wongarsu wrote:
| Just tried translating your comment to German. Kagi took a very
| literal approach, keeping sentence structure and word choice
| mostly the same. Google Translate and DeepL both went for more
| idiomatic translations.
|
| However translating some other comments from this thread, there
| are cases where Kagi outperforms others on correctness. For
| example one comment below talks about "encountering multiple
| second page loads". Google Translate misunderstands this as
| "encountering a second page load multiple times" while DeepL
| and Kagi both get it right with "encountering page loads of
| multiple seconds" (with DeepL choosing a slightly more
| idiomatic wording)
| epoxia wrote:
| I asked some inappropriate things and it was "translated" to I
| cannot assist with that request. It definitely needs to be more
| clear when it's refusing to translate. But, then again, I don't
| even use kagi.
| GaggiX wrote:
| Maybe they are using Claude API for the translation, Claude
| models are really good multilingual models.
|
| EDIT: the "Limitations" section reports the use of LLMs
| without specifying the models used.
| FurkanKambay wrote:
| "The game is my poem" when back-translated from the Turkish
| translation, "oyun benim şiirimdir". And there's censorship too
| when doing EN-TR for a few other profanities I tested. When you
| add another particular word to the sentence, it outputs "play
| with my cat, dad".
| dlkmp wrote:
| Just as a quick usability feedback: As long as Deepl translates
| asynchronously as I type, while Kagi requires a full form send &
| page refresh, I am not inclined to switch (translation quality is
| also already too good for my language pairs to consider switching
| for minor improvements, but the usability/ speed is the real
| feature here).
|
| This is coming from a user with an existing Kagi Ultimate
| subscription, so I'm generally very open to adopting another tool
| if it fits my needs.
|
| Slightly off-topic, slightly related: as I already mentioned the
| last time I saw Kagi hit the HN front page, the best improvement
| I could envision for Kagi is improved search performance (page
| speed). I still encounter multiple second page loads far too
| frequently, which I don't notice with other search engines.
| Aachen wrote:
| Interesting, I'm actually annoyed that DeepL sends every
| keystroke and I'm using idk how many resources on their end
| when I'm just interested in the result at the end and for DeepL
| to receive the final version I want to share with them
|
| That it's fast, you don't have to wait much between finishing
| typing and the result being ready, that's great and probably
| better than any form system is likely to be. But if it could be
| a simple enter press and then async loading the result, that
| sounds great to me
| czottmann wrote:
| I uninstalled the DeepL extension because it would load all its
| assets (fonts etc) into every. single. page. No matter the
| host.
|
| Unacceptable.
| burkaman wrote:
| This will be a paid feature apparently:
| https://kagifeedback.org/d/5305-kagi-translate-feedback/9
| freediver wrote:
| > As long as Deepl translates asynchronously as I type, while
| Kagi requires a full form send & page refresh,
|
| This leads to increased cost and we wanted to keep the service
| free. But yes, we will introduce translate-as-you-type (it will
| be limited to paid Kagi members).
| pentacent_hq wrote:
| I recently noticed that Google Translate and Bing have trouble
| translating the German word "Orgel" ("organ", as in "church
| organ", not as in "internal organs") to various languages such as
| Vietnamese or Hebrew. In several attempts, they would translate
| the word to an equivalent of "internal organs" even though the
| German word is, unlike the English "organ", unambiguous.
|
| Kagi Translate seems to do a better job here. It correctly
| translates "Orgel" to "đàn organ" (Vietnamese) and "עוגב"
| (Hebrew).
| ynoxinul wrote:
| Google Translate often translates words through English.
| Aachen wrote:
| DeepL also, for the record (since it's being compared in the
| submission)
|
| It's pretty clear if you use the words out of context and
| they're true friends but it gets you the German translation
| of the English translation of whatever Dutch thing you put
| in. I also heard somewhere, perhaps when interviewing with
| DeepL, that they were working towards / close to not needing
| to do that anymore, but so far no dice that I've noticed and
| it has been a few years
| o11c wrote:
| If you write the input in Pig Latin, Kagi detects it as English
| but translates it correctly.
|
| Bing detects it as English but leaves it unchanged.
|
| Google detects it as Telugu and gives a garbage translation.
|
| ChatGPT detects it as Pig Latin and translates it correctly.
| jabroni_salad wrote:
| Looks like the page translator wants to use an iframe, so of
| course the x-frame-options header of that page will be the
| limiting factor.
|
| > To protect your security, note.com will not allow Firefox to
| display the page if another site has embedded it. To see this
| page, you need to open it in a new window.
|
| This is a super common setting and it's why I use a browser
| extension instead.
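|
| A rough sketch of how one could predict whether a page will
| refuse to load inside an iframe, based on its response headers.
| It checks X-Frame-Options and the CSP frame-ancestors directive.
| As a simplification (an assumption of this sketch), any
| frame-ancestors list without a "*" wildcard is treated as
| blocking, even though a specific allowed host could in principle
| be the translator's own origin.

```python
def blocks_cross_origin_framing(headers):
    """Guess from response headers whether a third-party iframe would be refused."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both stop another site from embedding.
    xfo = lowered.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return True

    # Content-Security-Policy: look for a frame-ancestors directive.
    csp = lowered.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            # Simplification: only a bare wildcard counts as "anyone may embed".
            return "*" not in parts[1:]

    return False


print(blocks_cross_origin_framing({"X-Frame-Options": "SAMEORIGIN"}))
print(blocks_cross_origin_framing({"Content-Type": "text/html"}))
```

| The headers would come from a normal HTTP response for the page
| in question; the dictionaries above are made-up inputs.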
| I_am_tiberius wrote:
| I find it useless without an option to add context to the text I
| want to translate.
| Aachen wrote:
| What do you mean? Does any other translator have such a
| separate field that you could point to, or could you explain
| what you're missing?
|
| When I want to give DeepL context, I just write it in the
| translation field (also, because it's exceptionally bad at
| single word translations, I do it even if the word should be
| unambiguous), so not type in "Katze" but "die Katze schnurrt"
| (the cat purrs). Is that the kind of thing you mean?
| Aachen wrote:
| I can't use it because I'm not classified as "human" by a
| computer. There is no captcha that I could get wrong, just a
| checkbox that probably uses a black box model to classify me
| automatically
|
| Was curious after the post claimed that the quality is better
| than Google and DeepL, but the current top comment showed
| translations from Afrikaans that it got wrong but I could
| understand as a Dutch person who doesn't even speak that language
| (so it's not like seven levels of negation and colloquialisms
| that they broke it on)
|
| What do I do with this "Error Code: 600010"? I've submitted a
| "report" but obviously they're not going to know if those reports
| are from a bot author frustrated with the form or me, a paying
| customer of Kagi's search engine. The feedback page linked in the
| blog post has the same issue: requires you to log in before being
| able to leave feedback, but "We couldn't verify if you're a robot
| or not." The web is becoming more fragmented and unusable every
| day...
| kunwon1 wrote:
| I had tons of issues with these Cloudflare checkboxes. I
| finally figured out it was because I use this extension [1]
| that disables HTML5 autoplay. I assume Cloudflare is doing some
| kind of thing where they verify that the client can playback
| media, as they assume that headless browsers or crawlers won't
| have that capability
|
| [1] https://addons.mozilla.org/en-US/firefox/addon/disable-
| autop...
| freediver wrote:
| > I can't use it because I'm not classified as "human" by a
| computer.
|
| It uses Cloudflare Turnstile captcha.
|
| The service shows no captcha to logged in Kagi users, so you
| can just create a (trial) Kagi account.
| Aachen wrote:
| Thanks, but I am logged in and it still shows that. Clicking
| log in at the top of the page leads me to the login page
| which takes about 10 seconds to (while I'm typing) realise
| that I'm already logged in and then redirects me to the
| homepage (kagi search)
|
| I don't have any site-specific settings and clearly HN works
| fine (as well as other sites) so it's not that cookies are
| disabled or such
|
| Edit: come to think of it, I'm surprised that you find
| translator data to be more sensitive (worth sticking behind a
| gatekeeper) than user logins. Must have been a lot of work to
| develop this intellectual property. There is no Cloudflare
| check on the login page. Not that I'd want to give you ideas,
| though! :-)
| freediver wrote:
| > come to think of it, I'm surprised that you find
| translator data to be more sensitive (worth sticking behind
| a gatekeeper) than user logins. Must have been a lot of
| work to develop this intellectual property. There is no
| Cloudflare check on the login page.
|
| This is just a simple anti-bot measure so we do not get
| hammered by them to death (kagi does not have an infinite
| treasure chest). It is not needed for search, because you
| can not use search for free anyway.
| Aachen wrote:
| I see, that makes sense!
| ziddoap wrote:
| > _What do I do with this "Error Code: 600010"?_
|
| Cloudflare, the gatekeeper of the internet, strikes again.
|
| The usual suspects are VPN or proxy, javascript, cookies, etc.
|
| https://developers.cloudflare.com/turnstile/troubleshooting/...
|
| Unfortunately, even with the error code, I doubt the above page
| will help much.
| baxtr wrote:
| Interesting. Never had such an issue with Google. How do they
| do it?
| spiderfarmer wrote:
| I would love to see an API to compete with DeepL.
| erinnh wrote:
| Kagi develops lots of features, but they seem to often be
| quarter-baked.
|
| Maps for example is basically unusable and has been for a while.
| (at least in Germany)
|
| Trying to search for an address often leads Kagi maps to go to a
| different random address.
|
| Still love the search, but I'd love for Kagi to concentrate on one
| thing at a time.
| Aachen wrote:
| Where do I find the map feature?
|
| I'm curious to see if I can identify what data source and
| search software it is based on, since I've heard similar
| complaints about Nominatim and it is indeed finicky if you made
| a typo or don't know the exact address; it does no context
| search based on the current view afaik. Google really does do
| search _well_ compared to the open source software I'm partial
| to, I gotta give them that
|
| Edit: ah if you horizontally scroll on the homepage there's a
| "search maps" thing. Putting in a street name near me that's
| unique in the world, it comes up with a lookalike name in
| another country. Definitely not any OpenStreetMap-based product
| I know of then, they usually aren't unliteral like that. Since
| the background map is Apple by default, I guess that's what the
| search is as well
| maronato wrote:
| It's in Search. It's one of the types of search you can
| perform. Below the search input is a bar with "Images",
| "Videos", "News", and "Maps".
|
| Can also be found here:
|
| https://kagi.com/maps
| freediver wrote:
| We are focusing most of our resources on search (which I hope you
| can agree, we are doing a pretty good job at). And it turns out
| search is not enough and you need other things - like maps (or
| a browser, because some browsers will not let you change search
| engine and our paid users can not use the service). Both are
| also incredibly hard to do right. If it appears quarter-baked
| (and I am the first to say that we can and will definitely keep
| improving our products), it is not for the lack
| of trying or ambition but the lack of resources. Kagi is 100%
| user-funded. So we need users, and we sometimes work on tools
| that do not bring us money directly, but bring us users (like
| Small Web, Universal Summarizer or Translate). It is all part
| of the plan. And it is a decade-long plan.
| exi1up wrote:
| I could be missing something, but is there some sort of metric
| for these comparisons to other software? Like the BLEU score
| which I've seen in studies relating to comparing LLMs to Google
| Translate. I find it difficult to believe it is better than DeepL
| in a vacuum.
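|
| For reference, the core ingredient of BLEU is a clipped
| ("modified") n-gram precision between a candidate translation and
| a reference. The sketch below computes only that component for a
| single reference, with naive whitespace tokenization; full BLEU
| also combines several n-gram orders and applies a brevity
| penalty, which is left out here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against one reference sentence."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


unigram = modified_precision("the the the the", "the cat is on the mat", 1)
bigram = modified_precision("the cat sat", "the cat sat", 2)
print(unigram, bigram)
```

| The clipping is what stops a degenerate output like "the the the
| the" from scoring a perfect unigram precision.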
| ninalanyon wrote:
| That's odd. Clicking the switch languages icon swaps the
| languages but not the texts.
| eduction wrote:
| Has anyone else noticed that Google Translate trips up a lot on
| GDPR cookie consent dialogs in Europe? I've often had to
| copy/paste the content of a web page because Google, when given
| the URL, couldn't navigate past the dialog to get to the page
| content (or couldn't allow me to dismiss it). Not sure if Kagi
| has solved this.
| unsupp0rted wrote:
| This is good. I wish it handled you-singular vs. you-polite-
| plural though.
|
| It would be nice to say "use a casual tone". Or "the speaker is a
| woman and the recipient is a man".
| gagabity wrote:
| Some bugs to iron out
|
| "Document Too Long Document is too long to process. It contains
| 158 chunks, but the maximum is 256. Please try again later or
| contact support if the problem persists."
| freediver wrote:
| Fixed, thanks for reporting.
| Decoy1008 wrote:
| I doubt it is better than deepl or google. On some tests it
| couldn't recognize the correct language.
| somat wrote:
| Added to my list, very nice.
|
| One thing I like about Google Translate that neither DeepL nor
| this does is tell me how to say the word. I mainly use it to add a
| reading hint to an otherwise opaque Japanese title in a database.
___________________________________________________________________
(page generated 2024-11-07 23:00 UTC)