Case study translation
Make sure you have first read the README for the repo. Beyond what is explained below, you will need to add _quarto.yml, index.qmd, and instructions.qmd files for each new language.
GitHub branches
To start a new language translation, create a new branch off main and name it appropriately. Update the relevant files described above, following the instructions below. Each human language translator should then branch off your branch, naming their branch according to the language. Once those branches are all reviewed and merged, the original branch can be reviewed and merged into main.
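If you prefer to manage the branches from R rather than the command line or the GitHub website, a minimal sketch with the {gert} package is shown below; the branch names are just examples, so use whatever naming your team prefers.

library(gert)

# set-up branch for the new language (assumes main is currently checked out)
git_branch_create("add-spanish", checkout = TRUE)

# ... update _quarto.yml, index.qmd and instructions.qmd, then commit and push ...

# each human translator then branches off the set-up branch
git_branch_create("add-spanish-translation", ref = "add-spanish", checkout = TRUE)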
Translating dictionaries
For an overview of which translator to use, see the vignette on
translators:
vignette("translators", package = "aetranslations").
To go from a dataset to a translated dictionary, use the
aetranslations::translate_dict() function. This produces a
list with a dictionary for variable names and a dictionary for values,
along with translation columns for each target language. Importantly, you should
use the “For multiple datasets” code in the drop-down below even
if you only have one dataset, because it also translates the
file name and is used to set up the Google Sheets for human translation.
## DEMO OF HOW TO TRANSLATE ONE DATASET (USE THE CODE FROM THE NEXT CHUNK INSTEAD)
library(aetranslations)

# load dataset from {appliedepidata}
appliedepidata::get_data("mpox_linelist")

# create dictionary with a column for each language
dictionaries <- translate_dict(
  mpox_linelist,
  source_lang = "en",
  target_lang = c("es", "fr", "pt"),
  translator = "wmcloud"
)

For multiple datasets
If you want to loop over multiple datasets, you can do it as below:
library(aetranslations)

# Define languages you want to process
langs <- c("es", "fr", "pt")

# Define the names of the datasets you want to process
dataset_names <- c("mpox_linelist", "mpox_aggregate_table")

# create translations of file names
dataset_names_df <- data.frame(
  dataset_name = dataset_names,
  en = dataset_names
)
dataset_names_df[langs] <- ""

# Create an empty list to store the results
all_dictionaries <- list()

# Loop through each dataset name
for (ds_name in dataset_names) {
  # Load the data and get the object
  appliedepidata::get_data(ds_name)
  dataset <- get(ds_name)

  # Run the translation and store it in the list, named after the dataset
  message(paste("--- Translating:", ds_name, "---"))
  all_dictionaries[[ds_name]] <- translate_dict(
    dataset,
    source_lang = "en",
    target_lang = langs,
    translator = "wmcloud"
  )
}
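Before setting up the Google Sheets, it can be worth a quick check of what was produced. Each element of all_dictionaries holds the dictionaries for one dataset, and the element names within it are reused to name the sheets in the upload step below.

# which datasets were translated
names(all_dictionaries)

# what the translation of one dataset contains
str(all_dictionaries[["mpox_linelist"]], max.level = 1)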
Alternatively for an existing dictionary
If you already have a dictionary, you can use the
aetranslations::translate_df() function instead. This
translates a single column into a single language. (The example below
creates a dictionary from a dataset, but imagine you had instead
imported a pre-existing dictionary.) Once in Google Sheets, the
translation should be reviewed by humans.
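For example, if your dictionary already existed as a spreadsheet, you might read it in with {rio} before passing it to translate_df(); the file path here is just an illustration.

# hypothetical pre-existing dictionary
var_dict <- rio::import("data-raw/mpox_linelist_dictionary.xlsx")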
appliedepidata::get_data("mpox_linelist")
var_dict <- datadict::dict_from_data(mpox_linelist)
val_dict <- datadict::coded_options(var_dict)

# translate the variable-name dictionary
var_dict <- translate_df(
  var_dict,
  column = "variable_name",
  source_lang = "en",
  target_lang = "fr",
  translator = "wmcloud"
)

# translate the values dictionary (the label column holds the value labels)
val_dict <- translate_df(
  val_dict,
  column = "label",
  source_lang = "en",
  target_lang = "fr",
  translator = "wmcloud"
)

You can then either export the dictionaries with {rio} and upload them to Google Drive (but be sure to convert them to Google Sheets), or upload them directly to a Google Sheet using the code below. Note, however, that this puts the spreadsheet in your generic Drive, so you will need to move it to the appropriate shared folder afterwards.
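For the first option, a minimal sketch using {rio} and {googledrive} might look like the below (file and sheet names are just examples); type = "spreadsheet" is what converts the uploaded file into a Google Sheet.

library(googledrive)

# export the translated dictionary locally
rio::export(var_dict, "mpox_linelist_variable_dictionary.xlsx")

# authenticate and upload, converting to a Google Sheet
drive_auth()
drive_upload(
  media = "mpox_linelist_variable_dictionary.xlsx",
  name = "mpox_linelist_variable_dictionary",
  type = "spreadsheet"
)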
# This requires the {googlesheets4} package
# install.packages("googlesheets4")
library(googlesheets4)
# Authenticate with your Google account. This will likely open a browser window
# for you to log in and grant permissions the first time you run it.
gs4_auth()
# Create a new, empty spreadsheet
ss <- gs4_create("mpox_dictionaries", sheets = list(dataset_names = dataset_names_df))
# Loop through the list of dictionaries and write each one to a new sheet
for (ds_name in dataset_names) {
  for (dict_name in names(all_dictionaries[[ds_name]])) {
    # Create a unique sheet name by combining the dataset and dictionary names
    # e.g., "mpox_linelist_dataset_variables"
    unique_sheet_name <- paste(ds_name, dict_name, sep = "_")
    message(paste("Writing to sheet:", unique_sheet_name))

    # Write the data frame to the spreadsheet with the unique name
    sheet_write(
      data = all_dictionaries[[ds_name]][[dict_name]],
      ss = ss,
      sheet = unique_sheet_name
    )
  }
}
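As noted above, gs4_create() puts the new spreadsheet in your own Drive rather than the shared folder. One way to move it there from R is with {googledrive}; the folder path below is just an example.

# move the spreadsheet into the appropriate shared folder
googledrive::drive_mv(
  file = googledrive::as_id(ss),
  path = "case-studies/translations/"
)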
Creating datasets
Once the dictionary is translated, you can use {matchmaker} to create
the new-language datasets (see below). The new datasets should be added
to {appliedepidata} by following the instructions.
For creation, you should adapt the script below and place it in the
data-raw file for the English version of the dataset. This
way we are able to track how the datasets were created.
Ideally, the dictionary should also be added as a dataset to
{appliedepidata}.
# load packages (filter() from {dplyr} is used below)
library(dplyr)

# load data
appliedepidata::get_data("mpox_linelist")
data_raw <- mpox_linelist

# load translation dictionaries
# (you could just copy the "1rf..." ID from the url but the below is easier)
sheet_id <- googlesheets4::as_sheets_id(
  "https://docs.google.com/spreadsheets/d/1YvDvFBvAYH7wzAoRPocxEct3Airsjt73qr76Ik6D0Us/edit?gid=1509937281#gid=1509937281"
)

# read in the names of the translated files
dict_dataset_names <- googlesheets4::read_sheet(
  ss = sheet_id,
  sheet = "dataset_names"
)
# languages and dataset names (as defined in the dictionary translation step)
langs <- c("es", "fr", "pt")
dataset_names <- c("mpox_linelist", "mpox_aggregate_table")

# create datasets
dats <- list()

for (j in dataset_names) {
  dict_vars <- googlesheets4::read_sheet(
    ss = sheet_id,
    sheet = paste0(j, "_dataset_variables")
  )

  dict_vals <- googlesheets4::read_sheet(
    ss = sheet_id,
    sheet = paste0(j, "_dataset_values")
  )

  for (i in langs) {
    # start from a fresh copy of the raw data for each language
    generic_data <- data_raw

    # translate vals first
    # otherwise var names not in dict
    generic_data <- matchmaker::match_df(
      x = generic_data,
      dictionary = dict_vals |>
        # this filter is a leftover from when we added
        # variables created in the script to the dictionary
        # (you could remove it but it doesn't hurt to leave it here,
        # in case we decide to do the same again)
        filter(type != "clean"),
      from = "label",
      to = i,
      by = "variable_name"
    )

    # translate vars
    names(generic_data) <- matchmaker::match_vec(
      names(generic_data),
      dictionary = dict_vars |>
        filter(type != "clean"),
      from = "variable_name",
      to = i
    )

    # select the appropriate filename
    appropriate_filename <- dict_dataset_names[
      dict_dataset_names$dataset_name == j,
      i
    ]

    # chuck in list
    # (so can check in R if needed)
    dats[[as.character(appropriate_filename)]] <- generic_data

    # export to the appropriate {appliedepidata} folder
    rio::export(
      generic_data,
      paste0("inst/extdata/", appropriate_filename, ".xlsx")
    )
  }
}

Chunk naming
Remember to make sure all the code chunks in your project qmd/Rmd files are named. Running the following code will sequentially name the code chunks, prefixing them with the name of the file. Note that it doesn’t rename the setup chunk (see the blog post on the value of naming code chunks).
namer::name_dir_chunks("pages/", unname = TRUE)

Translating documents
Be sure to read the section on GitHub branches above.
To translate the .qmd files themselves you can use the
aetranslations::translate_doc() function; this should be
run while you have the case studies repo R project open. Using “wmcloud”
takes around five minutes to translate a case study, while “deepl” is
faster (see vignette("translators", package = "aetranslations") for a
comparison and for how to set these options up).
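A call might look roughly like the sketch below, though the argument names here are an assumption made by analogy with translate_dict() and translate_df(), so check the translate_doc() documentation for the real signature.

# NOTE: arguments below are assumed, not confirmed; the file path is hypothetical
aetranslations::translate_doc(
  "pages/mpox_outbreak_case_study.qmd",
  source_lang = "en",
  target_lang = "fr",
  translator = "wmcloud"
)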
Rendering the website
You could render the website with
babelquarto::render_website(); however, this requires that
all files are present in all languages. To avoid this, the
aetranslations::render_resource() function finds which
pages are missing in other languages and adds a placeholder file that
just says “under construction”. This then allows you to render all the
case studies without problems. You don’t need to pass any arguments, as the
defaults should simply work.
aetranslations::render_resource()