Closed on: January 26th 2025

CLEAR Global is hiring a

Terms of Reference: NLP engineer for synthetic voice project

🌎 Remote 📝 FULL-TIME 🎯 MID LEVEL

Terms of reference

Location: Remote (home based)

Travel: None

Reporting to: CLEAR Global Language Technology Specialist

Timeframe: November 2024-January 2025, with possibility of extension (up to 18 FTE days total). The role is envisaged as a regular commitment over this period of approximately 1.5 FTE days per week.

Deadline for applications: November 13th 2024

*Due to the urgency of this project, screening and interviews will commence immediately and the post may be filled before the deadline for applications.

CLEAR Global is an equal opportunity employer, committed to having a diverse team where individuals of all backgrounds collaborate and learn from one another. We believe we can be most effective with diverse experience and expertise in our team. We recruit on merit, actively seek diverse applicant pools and encourage candidates of all backgrounds to apply. We do not discriminate on the basis of disability, age, gender identity and expression, national origin, race and ethnicity, religious beliefs, marital or parental status, or sexual orientation, and welcome all types of diversity.

Background

CLEAR Global is looking for an NLP Engineer to contribute to a project it is co-leading with a technology partner as part of a grant award. The project aims to create and improve automatic speech recognition (ASR) models for African languages. It will do this by applying an experimental methodology that leverages synthetic data generated with text-to-speech (TTS) systems and large language models (LLMs). The synthetic data will be used to fine-tune ASR models, and the effectiveness of this approach will be evaluated.

The project consists of three main steps:

  1. Use of different techniques to synthetically generate text to be recorded, with the generated text rated by human validators, for 10 African languages (to be selected)
  2. Experimentation with training and use of text-to-speech systems and synthetic voice generation, including rating of outputs, for 3 languages
  3. Assessment of whether automatic speech recognition (ASR) models can be improved using voice outputs generated from text produced by LLMs in low-resource languages. We will benchmark ASR models against available datasets and fine-tune them with the synthetically generated data, tracking improvements

The NLP Engineer will play a critical role both in planning and in creating synthetic voice data (step 2) and training and evaluating ASR models (step 3), as well as providing contributions to the overall project design where relevant.

The role

The NLP Engineer will execute and help shape the experimental methodology for generating and using synthetic data, collaborating with CLEAR Global’s internal team and external partners. They will play a key role in the creation of synthetic voice data in up to 3 languages, as well as the assessment, training and evaluation of ASR models under step 3 of the project, in the same 3 languages. The NLP Engineer will also provide ad-hoc advisory support, feeding into decision making, supporting monitoring activities, and contributing to the final report and publications for the project. They will be expected to join occasional calls with CLEAR Global and partners and share regular progress updates within CLEAR Global’s workspaces.

Responsibilities

Key activities include:

  1. Project Design and Planning
    • Engaging with partners and team members to define work steps
    • Researching specific questions as required, such as the role of non-standard transcripts
    • Providing feedback where relevant to inform the selection of languages targeted in this project
    • Tracking compute costs for later reporting
    • Providing inputs, especially quantitative data and key learnings, for the final report and publications for this project
  2. Creating Synthetic Voices for selected languages
    • Reviewing available open TTS/speech synthesis models and recommending 1-2 to use in this experiment
    • Reviewing available data, e.g. on open.bible, for fine-tuning the TTS model
    • Cooperating with other CLEAR Global team members to create 5-10 hours of speech data in one language to fine-tune the TTS model if needed
    • Running the TTS model to create synthetic voice data from the synthetic text created under step 1 and other text sources. We expect this to result in 100+ hours of synthetic voice data per language.
  3. Training ASR Models and Evaluating Performance
    • Preparing evaluation datasets for the given languages
    • Reviewing available ASR models and evaluating them based on available evaluation datasets
    • Training/fine-tuning the ASR model with the synthetic voice data created in step 2
    • Evaluating trained/fine-tuned ASR model
    • Aggregating results, especially quantitative data on changes in model performance and costs for model training, and key learnings
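
One common way to quantify the change in model performance described under activity 3 is word error rate (WER), compared between the baseline and the fine-tuned ASR model. The sketch below is illustrative only: these terms of reference do not prescribe a metric, and the function names `word_error_rate` and `relative_wer_reduction` are hypothetical. It assumes whitespace tokenization, which may not suit every language or writing system, and it applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Fraction by which fine-tuning reduced WER relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - finetuned_wer) / baseline_wer
```

In practice, per-utterance scores would usually be aggregated over the whole evaluation dataset (total edits divided by total reference words), and an established evaluation library may be preferable to a hand-written implementation.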

Deliverables

  • Documented review of open TTS/speech synthesis models with 1-2 recommended for use in this project
  • Documented review of available data, e.g. on open.bible, for fine-tuning the TTS model
  • 100+ hours of synthetic voice data created through running the TTS model to create voice data from the synthetic text
  • Documented review and recommendation of available ASR models
  • Selection of evaluation datasets for the agreed languages and baseline evaluation of ASR model
  • Fine-tuned ASR model and evaluation of fine-tuned ASR model
  • Compute costs tracked and reported. This will be included in reports for cost estimates for partners looking to replicate the process.
  • Summary of results and findings, especially of quantitative data including changes in model performance and costs for model training and key learnings

Qualifications and experience required

The right candidate is an energetic team player, flexible and dynamic in approach, who agrees with CLEAR Global’s basic beliefs and values and who can work remotely with team members based throughout the world.

  • Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Natural Language Processing or related field
  • Proven experience in NLP. This can be demonstrated by involvement in NLP projects, code repositories, paper authorship, other non-technical write-ups, and/or dataset/model/demo publications.
  • Experience working with under-resourced languages and non-Latin writing systems
  • Strong analytical and problem-solving skills
  • Excellent communication and collaboration skills
  • Excellent working knowledge of written and spoken English
  • Passion for making a positive impact in the world

Terms and conditions

Payments will be made as follows:

  • An advance of 20% of the total contract value within 30 days of the signature of the contract
  • An interim payment of 20% of the total contract value based on deliverables
  • A final installment of 60% of the total contract value payable on satisfactory completion of the work and presentation of all deliverables.

The researcher will be expected to provide their own equipment and supplies (laptop etc.). Compute and API credits will be provided by CLEAR Global.

How to apply

CLEAR Global will accept offers from individual consultants and consulting firms.

To apply for this consultancy please send the following documents:

1- A technical and financial offer to include:

  • a brief description (no more than two sides of A4) of how you would tackle this role
  • a proposed work plan (including the number of days required for each task)
  • a financial offer, specifying daily fees
  • examples and descriptions of relevant similar work are welcome.

2- Curriculum vitae highlighting experience from similar projects, as well as the contact details (email and telephone number) of at least three professional references.

Please upload the technical and financial offer as one document under “cover letter” and present the CV(s) of the expert(s) proposed in one document under “CV”.

About CLEAR Global

CLEAR Global helps people get vital information, and be heard, whatever language they speak. We believe that everyone has the right to give and receive information in a language and format they understand. We work with nonprofit partners and a global community of language professionals to build local language translation capacity, and raise awareness of language barriers. Our network of over 100,000 community members translates millions of words of life-saving and life-changing information a year.

Core values

CLEAR Global employees and volunteers passionately believe in the value of this work and take personal responsibility for achieving the mission. CLEAR Global’s mission and organizational spirit embody the core values established in its strategic framework:

  • Excellence: In communicating humanitarian information in the right language, CLEAR Global is a leader in the translation industry and in the non-profit sector.
  • Integrity: In believing that every person, whether it’s the people who we serve, our volunteers or our staff, has value, deserves respect and has inherent dignity.
  • Empowerment: In using language to empower people around the world to control their own development and destiny.
  • Innovation: In recognizing and celebrating the power of innovation to address humanitarian and crisis issues around the world.
  • Sustainability: In recognizing that meeting our mission requires the establishment and maintenance of a solid financial and organizational infrastructure.
  • Tolerance: In that our staff and volunteers value each other, our partners and our end users, create a supportive work environment, and conduct themselves professionally at all times.


Keywords

NLP Engineer, Synthetic Voice, Language Technology, Speech Recognition, ASR Models, Natural Language Processing, Data Synthesis, Evaluation, Project Design, TTS Model, Remote Work

CLEAR Global clearglobal.org

We’re CLEAR Global, a nonprofit helping people get vital information and be heard, whatever language they speak.

CLEAR stands for community, language, engagement, accountability, and reach, the cornerstones of our work around the world.

🏷 Details

Posted on: October 25th 2024

Closing on: January 26th 2025

Department: International Programs

Experience: MID-LEVEL

Type: FULL-TIME

Workplace: REMOTE
