FAQ
Frequently Asked Questions
Data extraction
source
You may find the document (PDF, website, dataset) associated with the source under the field fileName in the folder all_sources_to_extract. However, as these files were created following a different convention (when the metadata was created), this field may be empty.
If there is no fileName, find the document associated with the source by checking the sourceIdentifier. If the sourceIdentifier takes you to a broken URL, use the sourceTitle to search for the source online. If you still cannot find it, check with Ivo and Adam, because if the source is there, there should be a digital object related to it.
If you find a more complete file for the source (PDF or any digital format) you are extracting, you should replace the current version in the all_sources_to_extract folder with the one you have.
If you find a new file for the source that did not have one (PDF or any digital format), add it to the folder new_sources using the name of the sourceID.
The content in the table tetrapods_sources_to_extract should never be edited. This is because we don’t want to spend time correcting the information in this file as all the correct information will be stored in the file you are extracting.
You should only add details to the fields: dataExtractor, progress, redlistTypeOfContent, extractionType, numberOfRedlists, dateStarted, fullyExtracted, and extractionRemarks.
If there are any comments you want to make about any of the metadata being wrong (e.g., the year is incorrect), you should make them in the field extractionRemarks.
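The rule about editable fields can be checked mechanically. The following is a minimal sketch, assuming each row of the table is available as a Python dictionary; the function name and the example column sourceYear are hypothetical and only serve as illustration.

```python
# Fields that extractors may fill in, as listed above.
EDITABLE_FIELDS = {
    "dataExtractor", "progress", "redlistTypeOfContent", "extractionType",
    "numberOfRedlists", "dateStarted", "fullyExtracted", "extractionRemarks",
}

def forbidden_changes(before, after):
    """Return the names of non-editable fields whose values differ."""
    changed = []
    for field in sorted(set(before) | set(after)):
        if before.get(field) != after.get(field) and field not in EDITABLE_FIELDS:
            changed.append(field)
    return changed

# Hypothetical row, for illustration only ("sourceYear" is an assumed column name).
original = {"sourceID": "12", "sourceYear": "2010", "progress": "", "extractionRemarks": ""}
edited = {"sourceID": "12", "sourceYear": "2011", "progress": "in progress",
          "extractionRemarks": "Year should be 2011."}
print(forbidden_changes(original, edited))
```

In this example the year was changed directly, which the check reports; the correct approach is to leave the year untouched and note the problem in extractionRemarks only.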
If you find a new source that is not in the table of sources, first check with those involved in the metadata creation (Ivo and Adam) to see why it is not in the list (maybe there is a reason why it was skipped).
If it needs to be added, fill in the details about the new source in the new_sources file and store any associated files in the new_sources folder.
If the source should be included in the table of sources, it will need to be added by creating a new line at the end of the file and creating the sourceID as a consecutive number. Ivo or Flo will be in charge of creating it.
- Check the NRL data and assess whether the source you have is in the NRL database.
- Double-check that the source is actually fully extracted, as NRL sometimes has a source but has extracted only some of its species (e.g., only mammals, not all tetrapods).
If the source has already been fully extracted by NRL:
- Indicate this in the table of sources by filling in extractionType=NRL.
- The NRL Data Harmoniser will have to adapt the source to RegRed’s structure. You are then done with the source and start over from step 1.2, “Pick a source and report progress”.
Else, if the source has not already been fully extracted by NRL:
- Keep working on the following steps.
AI-based extraction using large language models (LLMs) is useful for sources that are difficult to extract manually, e.g., because the information is provided inside text descriptions (and not tables). However, automated extraction depends on scientific species names and threat information being present on the same page.
Use AI-based extraction if the source provides text descriptions and does not contain tables that are easy to copy and paste.
Do not use AI-based extraction if the source contains only a few species (e.g., a reptile red list from a country with ~10 species), or if scientific species names and threat status information are not present on the same page of the text (as the LLM will fail to detect this).
Some sources might include unsorted tables of species (e.g. Fauna, Vertebrates, Amphibians and Reptiles). Separating species into classes by hand is a difficult process. If you encounter a source that is simple to extract all at once, but difficult to separate by class, you can extract the data for all species as a single redlist and use our split_script to split the xlsx file into multiple files.
Instructions to use the file
1. Copy the script into the working directory that contains your xlsx file with the unsorted taxon_assessment sheet.
2. The script is designed to work with files based on our data extraction template. Fill out all other sheets beforehand.
3. Follow the instructions in the script. The output will be multiple files named filename_Class.xlsx.
4. The only sheet modified by the script is taxon_assessment; all other sheets stay the same.
5. Finally, you need to manually fix the taxonomicScope column and fileName.
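The project's split_script is not reproduced here. The grouping idea behind it can be sketched as follows, assuming the taxon_assessment rows have already been read into dictionaries and that a column named class holds the class of each species; the function name, the column names, and the example file name are assumptions, not the script's actual interface.

```python
from collections import defaultdict
from pathlib import Path

def split_rows_by_class(rows, xlsx_name, class_column="class"):
    """Group taxon_assessment rows by class and name one output file per class."""
    stem = Path(xlsx_name).stem
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_column]].append(row)
    # Output names follow the filename_Class.xlsx pattern described above.
    return {f"{stem}_{cls}.xlsx": members for cls, members in groups.items()}

# Illustrative rows only; "scientificName" and "class" are assumed column names.
rows = [
    {"scientificName": "Bufo bufo", "class": "Amphibia"},
    {"scientificName": "Vipera berus", "class": "Reptilia"},
    {"scientificName": "Rana temporaria", "class": "Amphibia"},
]
for name, members in split_rows_by_class(rows, "herpetofauna.xlsx").items():
    print(name, len(members))
```

The sketch only covers the grouping step; writing the xlsx files and copying the unchanged sheets is left to the actual script.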
sourceIdentifier?
The identifier is a resolvable HTTP URI for DOIs and URLs, and a URN for ISBNs and ISSNs.
- DOI: https://doi.org/<doi>, e.g., https://doi.org/10.2909/9a752c28-cb5f-4ead-9922-2a8173e0306b
- ISBN: urn:isbn:<isbn>, e.g., urn:isbn:0-486-27557-4
- ISSN: urn:issn:<issn>, e.g., urn:issn:0953-4563
- URL to catalog: e.g., https://portals.iucn.org/library/node/10315
- URL to source: e.g., https://example.org/resource.pdf
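The formats above can be produced consistently with a small helper. This is a minimal sketch, assuming the raw DOI, ISBN, ISSN, or URL is available as a string; the function name and the kind labels are hypothetical, and the helper does not check that a DOI or URL actually resolves.

```python
def format_source_identifier(kind, value):
    """Build a sourceIdentifier string from a raw DOI, ISBN, ISSN, or URL."""
    kind = kind.lower()
    value = value.strip()
    if kind == "doi":
        # Accept a bare DOI or one already written with a common prefix.
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if value.lower().startswith(prefix):
                value = value[len(prefix):]
                break
        return f"https://doi.org/{value}"
    if kind in ("isbn", "issn"):
        return f"urn:{kind}:{value}"
    if kind == "url":
        if not value.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP(S) URL: {value!r}")
        return value
    raise ValueError(f"unknown identifier kind: {kind!r}")

print(format_source_identifier("doi", "10.2909/9a752c28-cb5f-4ead-9922-2a8173e0306b"))
print(format_source_identifier("isbn", "0-486-27557-4"))
print(format_source_identifier("issn", "0953-4563"))
```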
sourceIdentifier?
You may have more than one potential identifier, e.g., a DOI and also a URL.
- Always follow this hierarchy of preferred identifiers:
  1. DOI
  2. ISSN/ISBN
  3. URL to the electronic catalog entry (e.g., a national library)
  4. URL to the PDF or interactive resource
- URLs should only be used when the source has neither a DOI nor an ISSN/ISBN. If you use a URL, make sure it resolves correctly.
- If there are additional ISBNs or URLs directly linked to the PDF or interactive resource, please include all of them in the Zotero shared library when adding the bibliographic citation (e.g., under ISBN, URL, or in Extra).
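The hierarchy can be expressed as a simple ranking. The sketch below assumes the candidate identifiers are collected in a dictionary keyed by kind; the key names catalog_url and source_url and the function name are hypothetical.

```python
# Preference order from the hierarchy above; a lower rank is preferred.
PREFERENCE = {"doi": 0, "issn": 1, "isbn": 1, "catalog_url": 2, "source_url": 3}

def preferred_identifier(candidates):
    """Return the (kind, value) pair that ranks highest in the hierarchy."""
    available = [(k, v) for k, v in candidates.items() if v and k in PREFERENCE]
    if not available:
        raise ValueError("no usable identifier")
    # ISSN and ISBN share a rank; on a tie the first one listed is kept.
    return min(available, key=lambda kv: PREFERENCE[kv[0]])

candidates = {
    "source_url": "https://example.org/resource.pdf",
    "isbn": "urn:isbn:0-486-27557-4",
    "doi": "",  # empty values are ignored
}
print(preferred_identifier(candidates))
```

Here the ISBN is chosen because no DOI is available and an ISBN ranks above any URL.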
redlist
A red list is uniquely identified by the combination of:
- taxonomicScope: Each red list must have a unique taxonomic scope. As the project focuses on tetrapods, begin by identifying the class of organisms in the source (e.g., Amphibia, Reptilia, Aves, Mammalia). The maximum taxonomicScope is class; the minimum could be any. To determine this, consider the following:
  - Primary division: Split by class (Amphibia, Reptilia, Aves, Mammalia) as the default level of taxonomicScope.
  - Lower-level division: If the source provides distinct assessments within the same class (e.g., separate evaluations for breeding vs. non-breeding birds), you must divide further into order, family, or other relevant groups. However, do not divide into lower taxonomic levels unless the source clearly distinguishes them with separate assessments (e.g., a species has “LC” for a breeding population and “EN” for a non-breeding population).
- redlistLocation: Each red list must have a unique location. This can be a country, stateProvince, county, locality, or a named custom region.
- redlistDate: Each red list must have a unique date. We will use the year the assessment was conducted.
If you are uncertain about how to define the taxonomicScope for a source, consult Flo for guidance.
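Because a red list is identified by the combination of taxonomicScope, redlistLocation, and redlistDate, repeated combinations point to red lists that were not separated properly. A minimal sketch of such a check, assuming the red lists are available as dictionaries with these three fields (the function name and the example rows are invented for illustration):

```python
from collections import Counter

def duplicate_redlist_keys(redlists):
    """Return (taxonomicScope, redlistLocation, redlistDate) keys used more than once."""
    keys = Counter(
        (r["taxonomicScope"], r["redlistLocation"], r["redlistDate"]) for r in redlists
    )
    return [key for key, count in keys.items() if count > 1]

# Invented rows, not real red lists.
redlists = [
    {"taxonomicScope": "Aves", "redlistLocation": "Exampleland", "redlistDate": 2017},
    {"taxonomicScope": "Mammalia", "redlistLocation": "Exampleland", "redlistDate": 2017},
    {"taxonomicScope": "Aves", "redlistLocation": "Exampleland", "redlistDate": 2017},
]
print(duplicate_redlist_keys(redlists))
```

In this example the two Aves entries share the same key, so either they are the same red list or the taxonomicScope needs a lower-level division.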
How do I fill the redlistIdentifier or statusAssessmentTypeIdentifier when there is more than one file or document?
Check the previous question “How do I fill the sourceIdentifier when there is more than one file or document?”. The same applies to these identifiers.
statusAssessmentType and statusMappingID
The statusAssessmentType is the type of assessment system used to define the status codes for the taxa in the red list. Most red lists follow IUCN criteria; however, many countries create their own protocols to assign conservation statuses. This is why we need to translate all statuses to a common standard.
The statusMappingID is the ID for the statusMappingSource that has the mapping (i.e., translation) between statuses.
All the verbatimStatusCodes for the species in our database will be mapped to the latest version of the IUCN regional statusCodes. This is “IUCN. (2012). Guidelines for application of IUCN Red List criteria at regional and national levels: Version 4.0. IUCN. https://portals.iucn.org/library/node/10336”, and has statusMappingSourceID = 1.
If the statusAssessmentType = IUCN but it is not the standard version, or if the statusAssessmentType = Non-IUCN, you will also have to create the status mapping source (see “2.3.1 Dealing with the status mapping source” below).
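What a status mapping does can be illustrated with a small lookup. This is a sketch under stated assumptions: the target codes are the categories of the IUCN regional guidelines, while the non-IUCN scheme (cat-1, cat-2, cat-3) and its translations are invented placeholders, not a real mapping source.

```python
# Categories of the IUCN regional guidelines (Version 4.0).
IUCN_REGIONAL_CODES = {"EX", "EW", "RE", "CR", "EN", "VU", "NT", "LC", "DD", "NA", "NE"}

# Hypothetical non-IUCN scheme, invented for illustration; not a real mapping.
EXAMPLE_MAPPING = {"cat-1": "CR", "cat-2": "EN", "cat-3": "VU"}

def map_status(verbatim_status_code, mapping):
    """Translate a verbatim status code; return None when no mapping exists."""
    key = verbatim_status_code.strip().lower()
    status_code = mapping.get(key)
    if status_code is not None and status_code not in IUCN_REGIONAL_CODES:
        raise ValueError(f"{status_code!r} is not an IUCN regional category")
    return status_code

print(map_status("Cat-2", EXAMPLE_MAPPING))
print(map_status("cat-9", EXAMPLE_MAPPING))
```

A verbatim code without a translation returns None, which signals that the status mapping source is incomplete and needs to be extended.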
statusMappingSourceID used?
You can find the statusMappingSourceID in the mapping_sources_used file.
location
The geographic_entities are based on the following databases:
- geoBoundaries: A global database of political administrative boundaries. See https://www.geoboundaries.org/visualize.html.
- WDPA: The World Database on Protected Areas, the most comprehensive global database on terrestrial and marine protected areas. See https://www.protectedplanet.net/en/search-areas?geo_type=region.
- Global Islands: The Global Island Database is a global shoreline and associated global islands database. See https://doi.org/10.1080/1755876X.2018.1529714.
Always refer to the geographic_entities website: https://regred-project.github.io/geographic_entities/.
Instructions to use the geographic entities file
If you are looking for
- a country (ADM0): geoboundaries_country.
- a stateProvince or second order administrative region (ADM1): geoboundaries_stateProvince.
- a county or the lowest administrative region possible (ADM2): geoboundaries_county.
- a locality that is a protected area: wdpa_sep2025.
- an island that is a country: geoboundaries_country.
- an island that is a stateProvince or second order administrative region: geoboundaries_stateProvince.
- an island that is a county or the lowest administrative region possible: geoboundaries_county.
- an island that does not fit any of the above: global_islands.
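The choice above amounts to a small lookup. A minimal sketch, using the names from the list as sheet names; the function name, the kind labels, and the island_admin_level argument are hypothetical.

```python
# Names as listed above; the function and argument names are hypothetical.
SHEET_BY_ADMIN_LEVEL = {
    "country": "geoboundaries_country",
    "stateProvince": "geoboundaries_stateProvince",
    "county": "geoboundaries_county",
}

def geographic_sheet(kind, island_admin_level=None):
    """Return where to look up a given kind of location.

    kind: "country", "stateProvince", "county", "protected_area", or "island".
    island_admin_level: for islands only, the administrative level the island
    coincides with ("country", "stateProvince", "county"), or None.
    """
    if kind in SHEET_BY_ADMIN_LEVEL:
        return SHEET_BY_ADMIN_LEVEL[kind]
    if kind == "protected_area":
        return "wdpa_sep2025"
    if kind == "island":
        # Islands that match no administrative level fall back to global_islands.
        return SHEET_BY_ADMIN_LEVEL.get(island_admin_level, "global_islands")
    raise ValueError(f"unknown kind of location: {kind!r}")

print(geographic_sheet("stateProvince"))
print(geographic_sheet("island", island_admin_level="country"))
print(geographic_sheet("island"))
```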
You should send Gabriel an email with the name of the locality and a picture (screenshot) of an accompanying map if available.
taxon_assessment
If you have a red list that has multiple classes in one (e.g., Fauna, Vertebrates, Amphibians & Reptiles) you can use the split_script to automatically divide the Excel file into multiple class-specific files.
Citations
- If you are accessing the source online, you can use the Zotero Connector, which works with your web browser (Firefox, Chrome, Edge).
- The Zotero Connector icon is located in the upper right corner of the browser.
- The Connector downloads the available metadata about the source and captures a page snapshot, but you still need to review them.
- Always verify the information in the Zotero dashboard and manually correct any errors.
- Go to the webpage you want to cite in your web browser.
- Click the Zotero connector icon located in your browser’s toolbar.
- Select a collection from the dropdown menu and press Enter.
- Go to the desktop application.
- Find the newly saved web page item.
- Review the information pane on the right side of the Zotero window.
- Correct possible errors and add any missing information.
Controlled vocabulary
Basic instructions can also be found here: https://support.microsoft.com/en-us/office/create-a-drop-down-list-7693307a-59ef-400a-b769-c5402dce407b.
- Unlock the sheet with the list of controlled vocabulary.
- Add a new item to the end of the selected column.
- Sort the list in the selected column from A to Z (for easier work).
- Find the column on the worksheet that needs to be changed.
- Select the first cell in the column (below the header).
- Go to the Data ribbon and then Data Validation.
- In the Settings tab, under Allow, select the option List as the validation criterion.
- Click on Source and then select the whole range of items that will be included in the drop-down list.
- Copy the change to the whole column by double-clicking the bottom-right corner of the first cell.
- After all the changes, lock the sheet with the list of controlled vocabulary again.
- Add the changes you have made to the CHANGE LOG in the README table.
- Let the team know about the new template version and its changes.