FAQ

Frequently Asked Questions

Data extraction

`source`

Where do I find the document associated with the source?

You may find the document (PDF, website, dataset) associated with the source under the field fileName in the folder all_sources_to_extract. However, because these entries followed a different convention when the metadata was created, this column may be empty.

If there is no fileName, find the document associated with the source by checking the sourceIdentifier. If the sourceIdentifier takes you to a broken URL, use the sourceTitle to search for the source online. If you still cannot find it, check with Ivo and Adam, because if the source is listed there should be a digital object related to it.
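
The fallback order above can be sketched as a small helper. This is only an illustration: the function name and the `identifier_resolves` flag (whether you were able to open the sourceIdentifier) are hypothetical and not part of any RegRed tooling.

```python
from typing import Optional


def next_lookup_step(file_name: Optional[str],
                     source_identifier: Optional[str],
                     identifier_resolves: bool,
                     source_title: Optional[str]) -> str:
    """Return the first lookup step that applies to a source record."""
    # 1. A non-empty fileName points to a file in all_sources_to_extract.
    if file_name and file_name.strip():
        return "open fileName in all_sources_to_extract"
    # 2. Otherwise try the sourceIdentifier, if it is present and not broken.
    if source_identifier and source_identifier.strip() and identifier_resolves:
        return "follow sourceIdentifier"
    # 3. Otherwise search online using the sourceTitle.
    if source_title and source_title.strip():
        return "search online for sourceTitle"
    # 4. Nothing usable is left: ask the metadata creators.
    return "check with Ivo and Adam"
```

If the online search in step 3 also fails, the last step still applies: check with Ivo and Adam.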

If you find a more complete file for the source (PDF or any digital format) you are extracting, you should replace the current version in the all_sources_to_extract folder with the one you have.

If you find a new file for the source that did not have one (PDF or any digital format), add it to the folder new_sources using the name of the sourceID.

What should I do if I find errors in the metadata for the source?

The content in the table tetrapods_sources_to_extract should never be edited. This is because we don’t want to spend time correcting the information in this file as all the correct information will be stored in the file you are extracting.

You should only add details to the fields: dataExtractor, progress, redlistTypeOfContent, extractionType, numberOfRedlists, dateStarted, fullyExtracted, and extractionRemarks.

If there are any comments you want to make about any of the metadata being wrong (e.g., the year is incorrect), you should make them in the field extractionRemarks.

What if I find a new source?

If you find a new source that is not in the table of sources, first check with those involved in the metadata creation (Ivo and Adam) to see why it is not in the list (maybe there is a reason why it was skipped).

If it needs to be added, fill in the details about the new source in the new_sources file and store any associated files in the new_sources folder.

If the source should be included in the table of sources, it will need to be added by creating a new line at the end of the file and creating the sourceID as a consecutive number. Ivo or Flo will be in charge of creating it.

How to know if the source has been extracted by the NRL?

Check the NRL data and assess whether the source you have is in the NRL database.
Double-check that the source has actually been fully extracted, as sometimes NRL has a source but covers only some of its species (e.g., only mammals, not all tetrapods).

If the source has already been fully extracted by NRL:

Indicate this in the table of sources by filling in the extractionType = NRL.
The NRL Data Harmoniser will have to adapt the source to RegRed’s structure. You are then done with the source and can start over from step 1.2, “Pick a source and report progress”.

Else, if the source has not already been fully extracted by NRL:

Keep working on the following steps.

Why use automated methods?

AI-based extraction using large language models (LLMs) is useful for sources that are difficult to extract manually, e.g., because the information is provided inside text descriptions (and not tables). However, automated extraction depends on scientific species names and threat information being present on the same page.

When should a source be extracted automatically?

If the source provides text descriptions and does not contain tables that are easy to copy and paste.

When should a source be extracted manually?

If the source contains only a few species (e.g., a reptile red list from a country with ~10 species), or if scientific species names and threat status information are not present on the same page of the text (as the LLM will fail to detect this).

What happens if I have multiple classes in one source?

Some sources might include unsorted tables of species (e.g. Fauna, Vertebrates, Amphibians and Reptiles). Separating species into classes by hand is a difficult process. If you encounter a source that is simple to extract all at once, but difficult to separate by class, you can extract the data for all species as a single redlist and use our split_script to split the xlsx file into multiple files.

Instructions for using the script
1. Copy the script into the working directory where the xlsx file with the unsorted taxon_assessment data sheet is located.
2. The script is designed to work with files based on our data extraction template. Fill out all other sheets beforehand.
3. Follow the instructions in the script. The output will be multiple files named filename_Class.xlsx.
4. The only sheet modified by the script is taxon_assessment; everything else stays the same.
5. Finally, manually fix the taxonomicScope column and the fileName in each output file.
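
To illustrate what the split does conceptually, here is a minimal sketch in plain Python. It is not the project's split_script: the column name `class` and both function names are assumptions made for the example, and rows are represented as simple dictionaries rather than read from an xlsx file.

```python
from collections import defaultdict


def split_by_class(rows, class_field="class"):
    """Group taxon_assessment rows (dicts) by taxonomic class.

    `class_field` is a hypothetical column name used for illustration.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_field]].append(row)
    return dict(groups)


def output_name(filename, class_name):
    """Build a per-class file name following the filename_Class.xlsx pattern."""
    suffix = ".xlsx"
    stem = filename[:-len(suffix)] if filename.endswith(suffix) else filename
    return f"{stem}_{class_name}{suffix}"


rows = [
    {"scientificName": "Rana temporaria", "class": "Amphibia"},
    {"scientificName": "Lacerta agilis", "class": "Reptilia"},
    {"scientificName": "Bufo bufo", "class": "Amphibia"},
]
for class_name, class_rows in split_by_class(rows).items():
    print(output_name("redlist.xlsx", class_name), len(class_rows))
```

Each group would then be written to its own file, with the other sheets copied unchanged.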

How do I fill the sourceIdentifier?

The identifier is a resolvable HTTP URI for DOIs and URLs, and a URN for ISBNs and ISSNs.

DOI: https://doi.org/<doi>, e.g., https://doi.org/10.2909/9a752c28-cb5f-4ead-9922-2a8173e0306b
ISBN: urn:isbn:<isbn>, e.g., urn:isbn:0-486-27557-4
ISSN: urn:issn:<issn>, e.g., urn:issn:0953-4563
URL to catalog: e.g., https://portals.iucn.org/library/node/10315
URL to source: e.g., https://example.org/resource.pdf
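
The formats listed above can be produced with a small helper like the following. This is a sketch only: the function name and the `kind` labels are hypothetical, and the function does not validate the DOI, ISBN, or ISSN itself.

```python
def format_source_identifier(kind: str, value: str) -> str:
    """Format a raw DOI, ISBN, ISSN, or URL as a sourceIdentifier string."""
    kind = kind.lower()
    value = value.strip()
    if kind == "doi":
        # DOIs become resolvable HTTP URIs.
        return f"https://doi.org/{value}"
    if kind in ("isbn", "issn"):
        # ISBNs and ISSNs become URNs.
        return f"urn:{kind}:{value}"
    if kind == "url":
        # URLs are used as they are.
        return value
    raise ValueError(f"unknown identifier kind: {kind!r}")


print(format_source_identifier("doi", "10.2909/9a752c28-cb5f-4ead-9922-2a8173e0306b"))
print(format_source_identifier("isbn", "0-486-27557-4"))
print(format_source_identifier("issn", "0953-4563"))
```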

What happens if there are several possibilities for a sourceIdentifier?

You may have more than one potential identifier, e.g., a DOI and also a URL.

Always follow this hierarchy of preferred identifiers:

DOI
ISSN/ISBN
URL to the electronic catalog entry (e.g., a national library)
URL to the PDF or interactive resource.

URLs should only be used when the source has neither a DOI nor an ISSN/ISBN. If you use a URL, make sure it resolves correctly.
If there are additional ISBNs or URLs directly linked to the PDF or interactive resource, please include all of them in the Zotero shared library when adding the bibliographic citation (e.g., under ISBN, URL, or in Extra).
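
As an illustration of the hierarchy, the sketch below picks the preferred identifier from a list of candidates. The labels (`doi`, `issn`, `isbn`, `catalog_url`, `resource_url`) and the function name are hypothetical. ISSN and ISBN share the same rank, as in the list above; when both are present, the sketch simply keeps whichever comes first in the input, which is an assumption rather than a project rule.

```python
# Lower rank means more preferred; ISSN and ISBN share a rank.
RANK = {"doi": 0, "issn": 1, "isbn": 1, "catalog_url": 2, "resource_url": 3}


def pick_identifier(candidates):
    """Return the preferred (kind, value) pair from a list of candidates."""
    known = [c for c in candidates if c[0] in RANK]
    if not known:
        raise ValueError("no usable identifier among the candidates")
    # min() keeps the first candidate when ranks tie.
    return min(known, key=lambda c: RANK[c[0]])


candidates = [
    ("resource_url", "https://example.org/resource.pdf"),
    ("isbn", "urn:isbn:0-486-27557-4"),
]
print(pick_identifier(candidates))
```

The identifiers that are not chosen still belong in the Zotero entry, as described above.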

`redlist`

What is a red list for RegRed?

A red list is uniquely identified by the combination of:

taxonomicScope: Each red list must have a unique taxonomic scope. As the project focuses on tetrapods, begin by identifying the class of organisms in the source (e.g., Amphibia, Reptilia, Aves, Mammalia). The broadest taxonomicScope allowed is class; the narrowest can be any lower rank. To determine the scope, consider the following:
- Primary division: Split by class (Amphibia, Reptilia, Aves, Mammalia) as the default level of taxonomicScope.
- Lower-level division: If the source provides distinct assessments within the same class (e.g., separate evaluations for breeding vs. non-breeding birds), you must divide further into order, family, or other relevant groups. However, do not divide into lower taxonomic levels unless the source clearly distinguishes them with separate assessments (e.g., a species has “LC” for a breeding population and “EN” for a non-breeding population).
redlistLocation: Each red list must have a unique location. This can be a country, stateProvince, county, locality, or a named custom region.
redlistDate: Each red list must have a unique date. We will use the year the assessment was conducted.
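
One way to check that the combination of the three fields is unique within a source is sketched below. The `RedlistKey` type and the `duplicate_keys` function are hypothetical helpers written for this example; they are not part of the extraction template.

```python
from collections import Counter
from typing import NamedTuple


class RedlistKey(NamedTuple):
    """The three fields that together identify a red list."""
    taxonomicScope: str
    redlistLocation: str
    redlistDate: int  # year the assessment was conducted


def duplicate_keys(keys):
    """Return every key that appears more than once."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


keys = [
    RedlistKey("Aves", "Spain", 2021),
    RedlistKey("Mammalia", "Spain", 2021),
    RedlistKey("Aves", "Spain", 2021),  # same combination as the first entry
]
print(duplicate_keys(keys))
```

An empty result means that every red list in the source has a distinct combination.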

If you are uncertain about how to define the taxonomicScope for a source, consult Flo for guidance.

How to fill the redlistIdentifier or statusAssessmentTypeIdentifier when there is more than one file or document?

Check the previous question “What happens if there are several possibilities for a sourceIdentifier?”. The same applies to these identifiers.

How to deal with the fields statusAssessmentType and statusMappingID?

The statusAssessmentType is the type of assessment system used to define the status codes for the taxa in the red list. Most red lists follow IUCN criteria; however, many countries create their own protocols to assign conservation statuses. This is why we need to translate all statuses to a common standard.
The statusMappingID is the ID for the statusMappingSource that has the mapping (i.e., translation) between statuses.

All the verbatimStatusCodes for the species in our database will be mapped to the latest version of the IUCN regional statusCodes. This is “IUCN. (2012). Guidelines for application of IUCN Red List criteria at regional and national levels: Version 4.0. IUCN. https://portals.iucn.org/library/node/10336”, and has statusMappingSourceID = 1.

If the statusAssessmentType = IUCN but it is not the standard version, or if the statusAssessmentType = Non-IUCN, you will also have to create the status mapping source (see below 2.3.1 Dealing with the status mapping source).
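
The rule above can be summarised as a small decision helper. This is a sketch under stated assumptions: the function name and the `is_standard_version` flag are hypothetical, and only the values `IUCN` and `Non-IUCN` and the ID 1 come from the text.

```python
# ID of the standard mapping source (IUCN regional guidelines, Version 4.0).
STANDARD_MAPPING_SOURCE_ID = 1


def needs_new_mapping_source(status_assessment_type: str,
                             is_standard_version: bool) -> bool:
    """Return True if a status mapping source has to be created."""
    if status_assessment_type == "IUCN":
        # Only non-standard IUCN versions need their own mapping source.
        return not is_standard_version
    # Non-IUCN systems always need a mapping to the common standard.
    return True


print(needs_new_mapping_source("IUCN", True))
print(needs_new_mapping_source("Non-IUCN", False))
```

When the function returns False, the existing statusMappingSourceID of 1 can be used directly.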

Where do I find the statusMappingSourceID used?

You can find the statusMappingSourceID in the mapping_sources_used file.

`location`

What sources does RegRed use for geographic locations?

The geographic_entities are based on the following databases:

geoBoundaries: A global database of political administrative boundaries. See https://www.geoboundaries.org/visualize.html.
WDPA: The World Database on Protected Areas, the most comprehensive global database on terrestrial and marine protected areas. See https://www.protectedplanet.net/en/search-areas?geo_type=region.
Global Islands: The Global Island Database is a global shoreline and associated global islands database. See https://doi.org/10.1080/1755876X.2018.1529714.

Where do I find the standardised information of geographic locations?

Refer always to the website of geographic_entities https://regred-project.github.io/geographic_entities/.

What is a custom region?

A region for RegRed is an official administrative unit. It could be a country (adm0), stateProvince (adm1), county (adm2), or any administrative unit beyond that, which is officially recognised by a country. A custom region is any other area, i.e., one that does not exist as an official administrative unit.

Since geoBoundaries provides spatial geometries only for adm0, adm1, and adm2, regions representing adm3 or beyond may be confused with custom regions. If in doubt, contact Gabriel early in the extraction period to make a decision.

Examples of a custom region
“Spanish territory without the Islas Canarias”. This group of administrative units does not exist as an administrative unit; therefore, it is a custom region.

Examples of locations that may look like custom regions but are not
“Provincia Antártica Chilena” is a county (adm2), with two municipalities (adm3): “Cabo de Hornos” and “Antártica”. Separately, they are official administrative regions absent from geoBoundaries; therefore, they are not custom regions.

What happens if I find an error or if none of these options fits my case?

Contact Gabriel. Include the name of the locality, a description, and a picture (screenshot) of an accompanying map if available. The description should be as precise as possible. For example, “The Spanish territory without the Canary Islands”.

`taxon_assessment`

How to deal with a red list with multiple taxonomic classes?

If you have a red list that has multiple classes in one (e.g., Fauna, Vertebrates, Amphibians & Reptiles) you can use the split_script to automatically divide the Excel file into multiple class-specific files.

Citations

How do I use the Zotero Connector?

If you are accessing the source online, you can use the Zotero Connector, which works with your internet browser (Firefox, Chrome, Edge).
The Zotero Connector icon is located in the upper right corner of the browser.
The Connector will download the available metadata about the source and capture a page snapshot, but you still need to review them.
Always verify the information in the Zotero dashboard and manually correct any errors.

How do I save a website reference?

Go to the webpage you want to cite in your web browser.
Click the Zotero connector icon located in your browser’s toolbar.
Select a collection from the dropdown menu and press Enter.
Go to the desktop application.
Find the newly saved web page item.
Review the information pane on the right side of the Zotero window.
Correct possible errors and add any missing information.

Controlled vocabulary

How to add a new item to the controlled vocabulary (drop-down list) in Excel templates?

Basic instructions can also be found here: https://support.microsoft.com/en-us/office/create-a-drop-down-list-7693307a-59ef-400a-b769-c5402dce407b.

Unlock the sheet with the list of controlled vocabulary.
Add the new item to the end of the selected column.
Sort the list in the selected column from A to Z (for easier work).
Find the column on the worksheet that needs to be changed.
Select the first cell in the column (below the header).
Go to the Data ribbon and then Data Validation.
In the Settings tab, under Allow, select the option List.
Click on Source and then select the whole range of items to be included in the drop-down list.
Copy the change to the whole column by double-clicking the bottom-right corner of the first cell.
After all the changes, lock the sheet with the list of controlled vocabulary again.
Add the changes you have made to the CHANGE LOG in the README table.
Let the team know about the new template version and its changes.