Case study translation
Make sure you have first read the README for the repo. Beyond what is explained below, you will need to add _quarto.yml, index.qmd, and instructions.qmd files for each new language.
GitHub branches
To start a new language translation, create a new branch off main and name it appropriately. Update the relevant files described above, following the instructions below. Each human language translator should then branch off your branch, naming their branch according to the language. Once those branches are all reviewed and merged, the original branch can be reviewed and merged into main.
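If you prefer to manage the branches from R rather than the command line or the GitHub website, a minimal sketch with the {gert} package is shown below; the branch names are just examples, so use whatever naming your team prefers.

library(gert)

# set-up branch for the new language (assumes main is currently checked out)
git_branch_create("add-spanish", checkout = TRUE)

# ... update _quarto.yml, index.qmd and instructions.qmd, then commit and push ...

# each human translator then branches off the set-up branch
git_branch_create("add-spanish-translation", ref = "add-spanish", checkout = TRUE)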
Translating dictionaries
For an overview of which translator to use, see the vignette on
translators:
vignette("translators", package = "aetranslations").
To go from a dataset to a translated dictionary, use the
aetranslations::translate_dict() function. This produces a
list with a dictionary for variable names and a dictionary for values,
along with translation columns for each target language. Importantly, you should
use the “For multiple datasets” code in the drop-down below even
if you only have one dataset, because it also translates the
file name and is used to set up the Google Sheets for human translation.
## DEMO OF HOW TO TRANSLATE ONE DATASET (USE THE CODE FROM THE NEXT CHUNK INSTEAD)
library(aetranslations)

# load dataset from {appliedepidata}
appliedepidata::get_data("mpox_linelist")

# create dictionary with a column for each language
dictionaries <- translate_dict(
  mpox_linelist,
  source_lang = "en",
  target_lang = c("es", "fr", "pt"),
  translator = "wmcloud"
)

For multiple datasets
If you want to loop over multiple datasets, you can do it as below:
library(aetranslations)

# Define languages you want to process
langs <- c("es", "fr", "pt")

# Define the names of the datasets you want to process
dataset_names <- c("mpox_linelist", "mpox_aggregate_table")

# create translations of file names
dataset_names_df <- data.frame(
  dataset_name = dataset_names,
  en = dataset_names
)
dataset_names_df[langs] <- ""

# Create an empty list to store the results
all_dictionaries <- list()

# Loop through each dataset name
for (ds_name in dataset_names) {
  # Load the data and get the object
  appliedepidata::get_data(ds_name)
  dataset <- get(ds_name)

  # Run the translation and store it in the list, named after the dataset
  message(paste("--- Translating:", ds_name, "---"))
  all_dictionaries[[ds_name]] <- translate_dict(
    dataset,
    source_lang = "en",
    target_lang = langs,
    translator = "wmcloud"
  )
}
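Before setting up the Google Sheets, it can be worth a quick check of what was produced. Each element of all_dictionaries holds the dictionaries for one dataset, and the element names within it are reused to name the sheets in the upload step below.

# which datasets were translated
names(all_dictionaries)

# what the translation of one dataset contains
str(all_dictionaries[["mpox_linelist"]], max.level = 1)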
Alternatively for an existing dictionary
If you already have a dictionary, you can use the
aetranslations::translate_df() function instead. This
translates a single column into a single language. (The example below
creates a dictionary from a dataset, but imagine you had instead
imported a pre-existing dictionary.) Once in Google Sheets, the
translation should be reviewed by humans.
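For example, if your dictionary already existed as a spreadsheet, you might read it in with {rio} before passing it to translate_df(); the file path here is just an illustration.

# hypothetical pre-existing dictionary
var_dict <- rio::import("data-raw/mpox_linelist_dictionary.xlsx")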
appliedepidata::get_data("mpox_linelist")
var_dict <- datadict::dict_from_data(mpox_linelist)
val_dict <- datadict::coded_options(var_dict)

# translate the variable-name dictionary
var_dict <- translate_df(
  var_dict,
  column = "variable_name",
  source_lang = "en",
  target_lang = "fr",
  translator = "wmcloud"
)

# translate the values dictionary (the label column holds the value labels)
val_dict <- translate_df(
  val_dict,
  column = "label",
  source_lang = "en",
  target_lang = "fr",
  translator = "wmcloud"
)

You can then either export the dictionaries with {rio} and upload them to Google Drive (but be sure to convert them to Google Sheets), or upload them directly to a Google Sheet using the code below. Note, however, that this puts the spreadsheet in your generic Drive, so you will need to move it to the appropriate shared folder afterwards.
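For the first option, a minimal sketch using {rio} and {googledrive} might look like the below (file and sheet names are just examples); type = "spreadsheet" is what converts the uploaded file into a Google Sheet.

library(googledrive)

# export the translated dictionary locally
rio::export(var_dict, "mpox_linelist_variable_dictionary.xlsx")

# authenticate and upload, converting to a Google Sheet
drive_auth()
drive_upload(
  media = "mpox_linelist_variable_dictionary.xlsx",
  name = "mpox_linelist_variable_dictionary",
  type = "spreadsheet"
)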
# This requires the {googlesheets4} package
# install.packages("googlesheets4")
library(googlesheets4)
# Authenticate with your Google account. This will likely open a browser window
# for you to log in and grant permissions the first time you run it.
gs4_auth()
# Create a new, empty spreadsheet
ss <- gs4_create("mpox_dictionaries", sheets = list(dataset_names = dataset_names_df))
# Loop through the list of dictionaries and write each one to a new sheet
for (ds_name in dataset_names) {
  for (dict_name in names(all_dictionaries[[ds_name]])) {
    # Create a unique sheet name by combining the dataset and dictionary names
    # e.g., "mpox_linelist_dataset_variables"
    unique_sheet_name <- paste(ds_name, dict_name, sep = "_")
    message(paste("Writing to sheet:", unique_sheet_name))

    # Write the data frame to the spreadsheet with the unique name
    sheet_write(
      data = all_dictionaries[[ds_name]][[dict_name]],
      ss = ss,
      sheet = unique_sheet_name
    )
  }
}
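As noted above, gs4_create() puts the new spreadsheet in your own Drive rather than the shared folder. One way to move it there from R is with {googledrive}; the folder path below is just an example.

# move the spreadsheet into the appropriate shared folder
googledrive::drive_mv(
  file = googledrive::as_id(ss),
  path = "case-studies/translations/"
)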
Creating datasets
Once the dictionary is translated, you can use {matchmaker} to create
the new-language datasets (see below). The new datasets should be added
to {appliedepidata} by following the instructions.
For creation, you should adapt the script below and place it in the
data-raw file for the English version of the dataset. This
way we are able to track how the datasets were created.
Ideally, the dictionary should also be added as a dataset to
{appliedepidata}.
# load packages (filter() from {dplyr} is used below)
library(dplyr)

# load data
appliedepidata::get_data("mpox_linelist")
data_raw <- mpox_linelist

# load translation dictionaries
# (you could just copy the "1rf..." ID from the url but the below is easier)
sheet_id <- googlesheets4::as_sheets_id(
  "https://docs.google.com/spreadsheets/d/1YvDvFBvAYH7wzAoRPocxEct3Airsjt73qr76Ik6D0Us/edit?gid=1509937281#gid=1509937281"
)

# read in the names of the translated files
dict_dataset_names <- googlesheets4::read_sheet(
  ss = sheet_id,
  sheet = "dataset_names"
)
# languages and dataset names (as defined in the dictionary translation step)
langs <- c("es", "fr", "pt")
dataset_names <- c("mpox_linelist", "mpox_aggregate_table")

# create datasets
dats <- list()

for (j in dataset_names) {
  dict_vars <- googlesheets4::read_sheet(
    ss = sheet_id,
    sheet = paste0(j, "_dataset_variables")
  )

  dict_vals <- googlesheets4::read_sheet(
    ss = sheet_id,
    sheet = paste0(j, "_dataset_values")
  )

  for (i in langs) {
    # start from a fresh copy of the raw data for each language
    generic_data <- data_raw

    # translate vals first
    # otherwise var names not in dict
    generic_data <- matchmaker::match_df(
      x = generic_data,
      dictionary = dict_vals |>
        # this filter is a leftover from when we added
        # variables created in the script to the dictionary
        # (you could remove it but it doesn't hurt to leave it here,
        # in case we decide to do the same again)
        filter(type != "clean"),
      from = "label",
      to = i,
      by = "variable_name"
    )

    # translate vars
    names(generic_data) <- matchmaker::match_vec(
      names(generic_data),
      dictionary = dict_vars |>
        filter(type != "clean"),
      from = "variable_name",
      to = i
    )

    # select the appropriate filename
    appropriate_filename <- dict_dataset_names[
      dict_dataset_names$dataset_name == j,
      i
    ]

    # chuck in list
    # (so can check in R if needed)
    dats[[as.character(appropriate_filename)]] <- generic_data

    # export to the appropriate {appliedepidata} folder
    rio::export(
      generic_data,
      paste0("inst/extdata/", appropriate_filename, ".xlsx")
    )
  }
}

Chunk naming
Remember to make sure all the code chunks in your project qmd/Rmd files are named. Running the following code will sequentially name the code chunks, prefixing them with the name of the file. Note that it doesn’t rename the setup chunk (see the blog post on the value of naming code chunks).
namer::name_dir_chunks("pages/", unname = TRUE)

Translating documents
Be sure to read the section on GitHub branches above.
To translate the .qmd files themselves you can use the
aetranslations::translate_doc() function; this should be
run while you have the case studies repo R project open. Using “wmcloud”
takes around five minutes to translate a case study, while “deepl” is
faster (see vignette("translators", package = "aetranslations") for a
comparison and for how to set these options up).
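A call might look roughly like the sketch below, though the argument names here are an assumption made by analogy with translate_dict() and translate_df(), so check the translate_doc() documentation for the real signature.

# NOTE: arguments below are assumed, not confirmed; the file path is hypothetical
aetranslations::translate_doc(
  "pages/mpox_outbreak_case_study.qmd",
  source_lang = "en",
  target_lang = "fr",
  translator = "wmcloud"
)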
Rendering the website
You could render the website with
babelquarto::render_website(); however, this requires that
all files are present in all languages. To avoid this, the
aetranslations::render_resource() function finds which
pages are missing in other languages and adds a placeholder file that
just says “under construction”. This then allows you to render all the
case studies without problems. You don’t need to pass any arguments, as the
defaults should simply work.
aetranslations::render_resource()