Change US-specific spellings #590

Open
wants to merge 3 commits into
base: master
Changes from 2 commits

6 changes: 3 additions & 3 deletions applications/seo.md
@@ -5,14 +5,14 @@ title: SEO
description: Machine translation for SEO
---

Machine translation for **search engine optimization** \(**SEO**\) is the translation of [commerce and marketplace](/commerce-and-marketplaces) website content into the languages in which users are searching.
Machine translation for **search engine optimisation** \(**SEO**\) is the translation of [commerce and marketplace](/commerce-and-marketplaces) website content into the languages in which users are searching.

Translations for search engine optimization is challenging for machine translation because the end goal is not just to convey the meaning, but to use the words that the users actually search for in the target language.
Translations for search engine optimisation is challenging for machine translation because the end goal is not just to convey the meaning, but to use the words that the users actually search for in the target language.

Short input, like keywords and tags, and non-sentence input, like lists of keywords, are also a challenge for machine translation.

Content can be purely machine-translated, [hybrid-translated](/hybrid-translation) or human [post-edited](/post-editing).
Search engines can penalize machine-generated content, including purely machine-translated content.
Search engines can penalise machine-generated content, including purely machine-translated content.

### Content types

6 changes: 3 additions & 3 deletions building-and-research/metrics/bertscore.md
@@ -14,11 +14,11 @@ BERTScore was invented as an improvement on [n-gram](/n-gram)-based metrics like
>
> [...] First, such methods often fail to robustly match paraphrases.
>
> [...] Second, n-gram models fail to capture distant dependencies and penalize semantically-critical ordering changes.
> [...] Second, n-gram models fail to capture distant dependencies and penalise semantically-critical ordering changes.
Collaborator

These are direct quotations from the paper. Here and in several other places, this PR changes literal quotations. I'm not sure we should change their spellings.

Collaborator

Right, we should not.

>
> For example, given a small window of size two, BLEU will only mildly penalize swapping of cause and effect clauses (e.g. A because B instead of B because A), especially when the arguments A and B are long phrases.
> For example, given a small window of size two, BLEU will only mildly penalise swapping of cause and effect clauses (e.g. A because B instead of B because A), especially when the arguments A and B are long phrases.
>
> In contrast, contextualized embeddings are trained to effectively capture distant dependencies and ordering.
> In contrast, contextualised embeddings are trained to effectively capture distant dependencies and ordering.
>
> [*BERTScore: Evaluating Text Generation with BERT*](#resources)

4 changes: 2 additions & 2 deletions building-and-research/metrics/chrF.md
@@ -12,9 +12,9 @@ Metrics based on word n-grams are especially problematic for high-morphology lan

chrF was introduced in 2015 by Maja Popović.

The chrF metric compares the machine translation output with reference translations, looking at character sequences. Character sequences matching help in recognizing different forms of a single word.
The chrF metric compares the machine translation output with reference translations, looking at character sequences. Character sequences matching help in recognising different forms of a single word.

> It is language-independent, tokenisation-independent and it shows good correlations with human judgments both on the system- as well as with on the segment-level, especially the CHRF3 variant.
> It is language-independent, tokenisation-independent and it shows good correlations with human judgements both on the system- as well as with on the segment-level, especially the CHRF3 variant.
>
> [*chrF: character n-gram f-score for automatic MT evaluation*](#resources)

2 changes: 1 addition & 1 deletion building-and-research/metrics/comet.md
@@ -6,7 +6,7 @@ title: COMET
description: Evaluation metric using embeddings
---

**COMET** (**Crosslingual Optimized Metric for Evaluation of Translation**) is a metric for automatic evaluation of machine translation that calculates the similarity between a machine translation output and a reference translation using token or sentence embeddings.
**COMET** (**Crosslingual Optimised Metric for Evaluation of Translation**) is a metric for automatic evaluation of machine translation that calculates the similarity between a machine translation output and a reference translation using token or sentence embeddings.

It is based on similarity of vector representations.

2 changes: 1 addition & 1 deletion building-and-research/metrics/ter.md
@@ -8,7 +8,7 @@ description: Translation Error Rate

**Translation Error Rate** (**TER**) is a metric for automatic evaluation of machine translation that calculates the number of edits required to change a machine translation output into one of the references.

> TER is defined as the minimum number of edits needed to change a hypothesis so that it exactly matches one of the references, normalized by the average length of the references.
> TER is defined as the minimum number of edits needed to change a hypothesis so that it exactly matches one of the references, normalised by the average length of the references.
>
> [*A Study of Translation Edit Rate with Targeted Human Annotation*](#resources)

2 changes: 1 addition & 1 deletion concepts/string.md
@@ -9,7 +9,7 @@ A string is a sequence of characters.
In computer programming, strings are used to represent text.

By default, a string is _plain text_.
It does not represent rich formatting or styles, like font size and color.
It does not represent rich formatting or styles, like font size and colour.

- A string with no characters is the _empty string_.
- A string can contain even just a single character: `a`

2 changes: 1 addition & 1 deletion events/ai-opportunities-and-risk.md
@@ -26,7 +26,7 @@ seo:
The **Rise of the Machines: Balancing Language-Related AI Opportunities and Risks** webinar took place online on 25 April 2023.
It was organised by [Omniscien](/companies#omniscien-technologies).

> ChatGPT 3.5 launched in November 2022, revolutionizing AI with sophisticated, accessible capabilities for all. While related to specialized AI like NMT and ASR, its adaptability sets it apart. The media spotlight highlighted opportunities and risks, with enterprises seizing opportunities and governments considering regulation. Italy banned ChatGPT, and many stakeholders raised ethical concerns. As hype subsides, realism and responsibility reemerge, as seen in Samsung's data loss incident. We explore secure, private language AI usage and discuss AI's future in augmenting human processes and secure enterprise applications.
> ChatGPT 3.5 launched in November 2022, revolutionising AI with sophisticated, accessible capabilities for all. While related to specialized AI like NMT and ASR, its adaptability sets it apart. The media spotlight highlighted opportunities and risks, with enterprises seizing opportunities and governments considering regulation. Italy banned ChatGPT, and many stakeholders raised ethical concerns. As hype subsides, realism and responsibility reemerge, as seen in Samsung's data loss incident. We explore secure, private language AI usage and discuss AI's future in augmenting human processes and secure enterprise applications.

### Speakers

2 changes: 1 addition & 1 deletion events/coco4mt-1.md
@@ -27,7 +27,7 @@ seo:

The first workshop on Corpus Generation and Corpus Augmentation for Machine Translation (**CoCo4MT**) was co-located with [AMTA 2022](/amta2022#workshop-on-corpus-generation-and-corpus-augmentation-for-machine-translation) on 16 September 2022.

It was the first workshop centered around research focusing on corpora creation, cleansing, and augmentation techniques specifically for machine translation.
It was the first workshop centred around research focusing on corpora creation, cleansing, and augmentation techniques specifically for machine translation.

Topics (not limited):

4 changes: 2 additions & 2 deletions events/eamt2020.md
@@ -19,15 +19,15 @@ seo:
type: VirtualLocation
url: https://eamt.org

organizer:
eder:
Collaborator

typo

type: Organization
name: European Association of Machine Translation
url: https://eamt.org
---

The 22nd Annual Conference of the **European Association of Machine Translation 2022** (**[EAMT](/eamt) 2020**) was hosted online from 3 November to 5 November, 2020.

The event was organized by Unbabel, INESC-ID and Instituto Superior Técnico.
The event was organised by Unbabel, INESC-ID and Instituto Superior Técnico.

[EAMT2020](https://eamt2020.inesc-id.pt/)

8 changes: 4 additions & 4 deletions events/eamt2024.md
@@ -83,11 +83,11 @@ EAMT 2024 will be jointly organised by EAMT, ZOO Digital and the University of S
- Technologies for machine translation deployment: quality estimation, domain adaptation, etc.
- Resources and evaluation
- Machine translation in special settings: low resources, massive resources, high volume, low computing resources
- Machine translation applications: translation/localization aids, speech translation, multimodal machine translation, machine translation for user generated content (blogs, social networks), machine translation in computer-aided language learning, etc.
- Machine translation applications: translation/localisation aids, speech translation, multimodal machine translation, machine translation for user generated content (blogs, social networks), machine translation in computer-aided language learning, etc.
- Linguistic resources for machine translation: corpora, terminologies, dictionaries, etc.
- Machine translation evaluation techniques, metrics, and evaluation results
- Human factors in machine translation and user interfaces
- Related multilingual technologies: natural language generation, information retrieval, text categorization, text summarization, information extraction, optical character recognition, etc.
- Related multilingual technologies: natural language generation, information retrieval, text categorisation, text summarization, information extraction, optical character recognition, etc.


#### Research: translators & users
@@ -109,13 +109,13 @@ EAMT 2024 will be jointly organised by EAMT, ZOO Digital and the University of S

- Integrating or optimising machine translation and computer-assisted translation in translation production workflows (translation memory/machine translation thresholds, mixing online and offline tools, using interactive machine translation, dealing with machine translation confidence scores)
- Managing change when implementing and using machine translation (e.g. switching between multiple machine translation systems, limiting degradations when updating or upgrading an machine translation system)
- Implementing open-source machine translation (e.g. strategies to get support, reports on taking pilot results into full deployment, examples of advanced customization sought and obtained thanks to the open-source paradigm, collaboration within open-source machine translation projects)
- Implementing open-source machine translation (e.g. strategies to get support, reports on taking pilot results into full deployment, examples of advanced customisation sought and obtained thanks to the open-source paradigm, collaboration within open-source machine translation projects)
- Evaluating machine translation in a real-world setting (e.g. error detection strategies employed, metrics used, productivity or translation quality gains achieved)
- Ethical and confidentiality issues when using machine translation, especially machine translation in the cloud
- Using machine translation in social networking or real-time communication (e.g. enterprise support chat, multilingual content for social media)
- Machine translation and usability
- Implementing machine translation to process multilingual content for assimilation purposes (e.g. cross-lingual information retrieval, machine translation for e-discovery or spam detection, machine translation for highly dynamic content)
- Machine translation in literary, audiovisual, game localization and creative texts
- Machine translation in literary, audiovisual, game localisation and creative texts
- Impact of machine translation and post-editing on translation practices and the profession: processes, effort, compensation,
- Psycho-social aspects of machine translation adoption (ergonomics, motivation, and social impact on the profession)
- Error analysis and post-editing strategies (including automatic post-editing and automation strategies)

4 changes: 2 additions & 2 deletions events/future-language-ai-enterprise.md
@@ -26,8 +26,8 @@ seo:
**The Future of Language Related AI for Enterprises: Local Agents and Fine-Tuned Large Language Models (LLMs)** took place online on 15 June, 2023.
The webinar was organised by [Omniscien](/companies#omniscien-technologies).

> Artificial Intelligence (AI) has fundamentally reshaped our understanding and processing of human language, led by Large Language Models (LLMs) and AI agents. These powerful systems comprehend and generate human-like text using deep learning techniques and vast datasets while agents use natural language processing (NLP) to interact with their environments. However, the one-size-fits-all training approach, as evidenced by OpenAI's ChatGPT and Google's Bard, often hampers task-specific proficiency and organizational knowledge and poses data privacy and compliance risks.
> Addressing these concerns involves refining AI models with affordable fine-tuning tools tailored to unique application requirements, allowing businesses to quickly adapt AI models with their proprietary data, securely processing requests within their corporate networks. Furthermore, using natural language APIs atop tailored AI models can enable secure self-hosted agents to interact with an extensive, curated toolkit, performing tasks like image generation, text summarisation, audio-video transcription and analysis, text-to-speech conversion, document-based Q&A, and generation of organization-specific content.
> Artificial Intelligence (AI) has fundamentally reshaped our understanding and processing of human language, led by Large Language Models (LLMs) and AI agents. These powerful systems comprehend and generate human-like text using deep learning techniques and vast datasets while agents use natural language processing (NLP) to interact with their environments. However, the one-size-fits-all training approach, as evidenced by OpenAI's ChatGPT and Google's Bard, often hampers task-specific proficiency and organisational knowledge and poses data privacy and compliance risks.
> Addressing these concerns involves refining AI models with affordable fine-tuning tools tailored to unique application requirements, allowing businesses to quickly adapt AI models with their proprietary data, securely processing requests within their corporate networks. Furthermore, using natural language APIs atop tailored AI models can enable secure self-hosted agents to interact with an extensive, curated toolkit, performing tasks like image generation, text summarisation, audio-video transcription and analysis, text-to-speech conversion, document-based Q&A, and generation of organisation-specific content.
> Amid the overwhelming noise around AI, this webinar aims to clarify the state of relevant technologies. Omniscien's AI experts will share their vision of a promising enterprise AI future powered by fine-tuning tools and natural Language APIs, reinforced by secure, self-hosted agents, and will showcase the existing and future features of Omniscien's Language Studio platform to illustrate what is already achievable.


8 changes: 4 additions & 4 deletions events/iwslt2023.md
@@ -63,11 +63,11 @@ Day 1
| 08:45 - 09:15 | **Findings of the IWSLT 2023 Evaluation Campaign** |
| 09:15 - 09:30 | **Q&A** |
| 09:30 - 10:30 | **Invited Talk** |
| 10:30 - 11:00 | ☕️ (ACL organized) |
| 10:30 - 11:00 | ☕️ (ACL organised) |
| 11:00 - 12:30 | **Session 1**<br> **System Papers (posters)** |
| 12:30 - 14:00 | 🍴 |
| 14:00 - 15:30 | **Session 2**<br> **System Papers (posters)** |
| 15:30 - 16:00 | ☕️ (ACL organized) |
| 15:30 - 16:00 | ☕️ (ACL organised) |
| 16:00 - 18:00 | **Session 3**<br> **System and Scientic Papers, including findings of ACL (posters)** |
| 18:00 | **End of day 1** |

@@ -76,11 +76,11 @@ Day 2
| | |
| --- | --- |
| 09:00 - 10:30 | **Session 4**<br> **Scientific Papers (oral)** |
| 10:30 - 11:00 | ☕️ (ACL organized) |
| 10:30 - 11:00 | ☕️ (ACL organised) |
| 11:00 - 12:30 | **2024 planning meeting** |
| 12:30 - 14:00 | 🍴 |
| 14:00 - 15:30 | **Panel discussion** |
| 15:30 - 16:00 | ☕️ (ACL organized) |
| 15:30 - 16:00 | ☕️ (ACL organised) |
| 16:00 - 16:15 | **Best paper award** |
| 16:15 - 16:30 | **Closing remarks** |
| 16:30 | **End of day 2** |

2 changes: 1 addition & 1 deletion events/lay-use-and-perceptions-of-machine-translation.md
@@ -27,7 +27,7 @@ seo:

The online symposium was held by the Department of Translation and Interpeting Studies at Bar Ilan University.

> Machine translation (MT) has had an increasing effect on multilingual communication and understanding in a globalized world.
> Machine translation (MT) has had an increasing effect on multilingual communication and understanding in a globalised world.
Collaborator

We shouldn't touch anything in a blockquote

> This symposium presents emerging research on lay use and perceptions of MT.

[translation.biu.ac.il/en/node/848](https://translation.biu.ac.il/en/node/848)

2 changes: 1 addition & 1 deletion events/machine-translation-meetup-1.md
@@ -35,7 +35,7 @@ Silicon Valley Italian Hub
## Program

- ***Overview of recent scientific advances in machine translation*** - Marcello Federico, AWS AI Labs
- ***Customization of NMT on-the-fly using Neural Fuzzy Adaptation*** - Hugh Aitchison, SYSTRAN
- ***Customisation of NMT on-the-fly using Neural Fuzzy Adaptation*** - Hugh Aitchison, SYSTRAN
- **Providers panel** - Arjun Rattan, Google; Hugh Aitchison, SYSTRAN
- **Users panel** - Belinda Mo, Viva Translate

4 changes: 2 additions & 2 deletions events/machine-translation-meetup-2.md
@@ -104,13 +104,13 @@ There was not enough time to answer all of the audience questions during the mee
> >
> > As a reviewer, I push back when people label “ablated” datasets, that is, smaller versions of a larger dataset, as low-resource.
> >
> > Real low-resource languages are noisier, include code-switching, have different scripts, non standardized orthography (that is, same word can be spelled differently in the same dataset).
> > Real low-resource languages are noisier, include code-switching, have different scripts, non standardised orthography (that is, same word can be spelled differently in the same dataset).
Collaborator

We shouldn't touch anything in a blockquote

>
> Idris Abdulmumin:
>
> > This is sadly true.
> >
> > A lot of researchers work on these big datasets and then simulate low resource conditions on the high resource datasets just to generalize their findings.
> > A lot of researchers work on these big datasets and then simulate low resource conditions on the high resource datasets just to generalise their findings.
> >
> > Simulated low resource dataset usually consist of random text and, as a result, lacks the authenticity of document level texts.
> >
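
The review comments in this PR ask that spellings inside blockquotes stay untouched, and one flags a front-matter key that was altered by mistake. The following is a minimal sketch of how a bulk spelling replacement could skip both. It is not the script used for this PR; the `US_TO_UK` mapping and the `convert` function are hypothetical names introduced for illustration. It assumes front matter is delimited by `---` lines starting on the first line, and it treats any line beginning with `>` as quoted text. It does not handle code fences, inline code, or blockquote continuation lines that lack a leading `>`.

```python
import re

# Hypothetical US -> UK mapping; a real run would need a much fuller list.
US_TO_UK = {
    "optimization": "optimisation",
    "penalize": "penalise",
    "organized": "organised",
    "color": "colour",
}

# Whole-word, case-insensitive match for any key in the mapping.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, US_TO_UK)) + r")\b", re.IGNORECASE
)


def _swap(match):
    word = match.group(0)
    replacement = US_TO_UK[word.lower()]
    # Preserve a leading capital (e.g. "Optimization" -> "Optimisation").
    return replacement.capitalize() if word[0].isupper() else replacement


def convert(markdown):
    """Apply the mapping to prose lines only, skipping front matter and blockquotes."""
    out = []
    in_front_matter = False
    for i, line in enumerate(markdown.split("\n")):
        if line.strip() == "---" and (i == 0 or in_front_matter):
            # Opening or closing front-matter delimiter.
            in_front_matter = not in_front_matter
            out.append(line)
        elif in_front_matter or line.lstrip().startswith(">"):
            # Leave YAML keys/values and quoted text untouched.
            out.append(line)
        else:
            out.append(_PATTERN.sub(_swap, line))
    return "\n".join(out)
```

Matching whole words only means a key such as `organizer:` would not be hit by the `organized` entry even outside front matter; skipping front matter entirely is an extra safeguard, since those keys may be read by the site generator.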