flowchart TD
Start(["source"]) --> IUCN{"Uses IUCN v4.0 (2012)?"}
IUCN -- "Yes" --> Use1(["Use statusMappingSourceID = 1"])
IUCN -- "No" --> Exists{"Existing mapping source in RegRed?"}
Exists -- "Yes" --> Reuse(["Reuse existing statusMappingSourceID"])
Exists -- "No" --> Choose{"Choose creation scenario"}
%% Scenario A
Choose -- "(1) same source provides mapping" --> A0
subgraph A0["Mapping in the same source"]
direction TB
A1(["Create Excel and PDF"])
A2(["statusAssessmentTypeCitation = original source"])
A3(["statusAssessmentTypeIdentifier = DOI or URL or catalog ID"])
A4(["Create new statusMappingSourceID"])
A1 --> A2 --> A3 --> A4
end
%% Scenario B
Choose -- "(2) official mapping published" --> B0
subgraph B0["Official mapping source"]
direction TB
B1(["Create Excel and PDF"])
B2(["statusAssessmentTypeCitation = official document"])
B3(["statusAssessmentTypeIdentifier = DOI or URL"])
B4(["Create new statusMappingSourceID"])
B1 --> B2 --> B3 --> B4
end
%% Scenario C
Choose -- "(3) no official or mixed sources -> create custom" --> C0
subgraph C0["Custom mapping"]
direction TB
C1(["Create Excel and PDF"])
C2(["statusAssessmentTypeCitation = custom mapping"])
C3(["statusAssessmentTypeIdentifier = GitHub URL"])
C4(["Create new statusMappingSourceID"])
C1 --> C2 --> C3 --> C4
end
%% --- CUSTOM COLORS ---
%% Define a class for decision nodes (Exists + Choose)
classDef decision_no fill:#FFE9B3,stroke:#CC9900,stroke-width:2px,color:#000;
classDef decision_yes fill:#DFF5D4,stroke:#4CAF50,stroke-width:2px,color:#000;
classDef boxes fill:#f2f3f5,stroke:#363636,stroke-width:0.5px,color:#000;
%% Apply the class to the nodes
class Exists,Choose decision_no;
class Use1,Reuse decision_yes;
%%class A0,A1,A2,A3,A4,B0,B1,B2,B3,B4,C0,C1,C2,C3,C4 boxes;
Step 2: Extract the data
Everyone is involved in the extraction
The RegRed database has six tables that relate to each other through IDs - they will be filled as follows.
- The data extractors will extract data manually using the data_extraction_template for the tables source, redlist, and location, and will extract data either manually or automatically, depending on the extractionType detected, for the taxon_assessment table. They will also have to create a file for mapping verbatimStatusCode to a standardised statusCode, following the mapping_sources_template.
- The spatial data generator will be in charge of verifying the location table, and generating the spatial polygons for all redlistLocations following the geographic_entities standard.
- The taxonomy harmoniser will be in charge of the taxonomy table, and will have to reconcile the scientific names in the extracted data to match a taxonomic backbone.
Golden rules when using the data extraction template
- Do not modify the data_extraction_template under any circumstances. If you find an error and changes are needed, report it to Flo.
- Always refer to this protocol to fill the template. Do not assume any decisions that are not contemplated in this document. If you have doubts about the protocol, contact Flo.
- Always check the “Definitions” tab and do not assume any definitions on your side.
- Follow the respective controlled vocabulary. If the value you want to add in a column is not there, you have to report this to Káča or Flo. If needed, the option will be added to the template.
- Do not copy and paste values blindly. Always verify what you are copying. Remove double spaces and any spaces at the start or end of values (e.g., “ Mammals” or “Mammals ”).
- Do not use the value “Unknown” for information that is missing. Instead, just leave the field empty. Use Unknown only in those cases where the value is part of the controlled vocabulary.
- Verify all the data you extract. Errors can occur, so a careful review is essential to ensure data integrity.
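The whitespace and “Unknown” rules above can be checked mechanically before pasting a value into the template. The following is a minimal Python sketch, not part of the official tooling; the function name `clean_value` and the `allow_unknown` flag are assumptions made for illustration.

```python
import re


def clean_value(value, allow_unknown=False):
    """Normalise one cell value before it goes into the template.

    allow_unknown is a hypothetical flag: set it to True only for columns
    whose controlled vocabulary actually lists "Unknown".
    """
    if value is None:
        return ""
    # Collapse runs of whitespace into one space and trim both ends.
    cleaned = re.sub(r"\s+", " ", str(value)).strip()
    # Missing information is left empty rather than written as "Unknown".
    if cleaned.lower() == "unknown" and not allow_unknown:
        return ""
    return cleaned


print(clean_value("  Mammals "))  # prints: Mammals
```

A helper like this does not replace manual verification; it only removes the most common copy-and-paste artefacts.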
2.2 Filling the source table
Some sources might include unsorted tables of species (e.g. Fauna, Vertebrates, Amphibians and Reptiles). Separating species into classes by hand is a difficult process. If you encounter a source that is simple to extract all at once, but difficult to separate by class, you can extract the data for all species as a single redlist and use our split_script to split the xlsx file into multiple files.
Instructions to use the file
1. Copy the script into your working directory where your xlsx with unsorted taxon_assessment data sheet is located.
2. The script is designed to work with files based on our data extraction template. Fill out all other sheets beforehand.
3. Follow the instructions in the script. The output will be multiple files named filename_Class.xlsx.
4. The only sheet modified by the script is taxon_assessment; all other sheets stay the same.
5. Finally, you need to manually fix the taxonomicScope column and the fileName.
How to format the sourceIdentifier?
The identifier is a resolvable HTTP URI for DOIs and URLs, and a URN for ISBNs and ISSNs.
- DOI: https://doi.org/<doi>, e.g., https://doi.org/10.2909/9a752c28-cb5f-4ead-9922-2a8173e0306b
- ISBN: urn:isbn:<isbn>, e.g., urn:isbn:0-486-27557-4
- ISSN: urn:issn:<issn>, e.g., urn:issn:0953-4563
- URL to catalog: e.g., https://portals.iucn.org/library/node/10315
- URL to source: e.g., https://example.org/resource.pdf
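The identifier formats listed above can be produced consistently with a small helper. This is a hedged Python sketch; the function `format_identifier` and its `kind` labels ("doi", "isbn", "issn", "url") are illustrative assumptions, not names used by the template.

```python
def format_identifier(kind, value):
    """Return an identifier in the form described above.

    kind is one of "doi", "isbn", "issn" or "url" (hypothetical labels).
    """
    value = value.strip()
    if kind == "doi":
        # Accept a bare DOI, a "doi:" label, or an already resolvable link.
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if value.lower().startswith(prefix):
                value = value[len(prefix):].strip()
                break
        return "https://doi.org/" + value
    if kind in ("isbn", "issn"):
        return f"urn:{kind}:{value}"
    if kind == "url":
        if not value.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP(S) URL: {value!r}")
        return value
    raise ValueError(f"unknown identifier kind: {kind!r}")


print(format_identifier("isbn", "0-486-27557-4"))  # prints: urn:isbn:0-486-27557-4
```

The sketch only builds the string; whether a URL or DOI actually resolves still has to be checked by hand.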
How to fill the sourceIdentifier when there is more than one file or document?
You may have more than one potential identifier, e.g., a DOI and also a URL.
- Always follow this hierarchy of preferred identifiers:
  1. DOI
  2. ISSN/ISBN
  3. URL to the electronic catalog entry (e.g., a national library)
  4. URL to the PDF or interactive resource.
- URLs should only be used when the source has neither a DOI nor an ISSN/ISBN. If you use a URL, make sure it resolves correctly.
- If there are additional ISBN or URLs directly linked to the PDF or interactive resource, please include all of them in the Zotero shared library when adding the bibliographic citation (e.g., under ISBN, URL, or in Extra).
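The hierarchy above amounts to taking the first available identifier in a fixed order. A minimal Python sketch follows; the dictionary keys ("doi", "issn", "isbn", "catalog_url", "source_url") are hypothetical labels, and because the protocol ranks ISSN and ISBN together, the order between those two in the sketch is arbitrary.

```python
# Order of preference taken from the hierarchy above.
# ISSN and ISBN share one rank in the protocol; their relative order here is arbitrary.
PREFERENCE = ("doi", "issn", "isbn", "catalog_url", "source_url")


def choose_identifier(candidates):
    """Pick the preferred identifier from a dict of kind -> value.

    Returns a (kind, value) tuple, or None when no identifier is available.
    """
    for kind in PREFERENCE:
        value = candidates.get(kind)
        if value:
            return kind, value
    return None


found = {
    "source_url": "https://example.org/resource.pdf",
    "isbn": "0-486-27557-4",
}
print(choose_identifier(found))  # prints: ('isbn', '0-486-27557-4')
```

The identifiers that are not chosen are not discarded: they still go into the Zotero entry, as described above.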
Check the definition, examples and recommendations for each field in the Definitions tab of the template for data extraction.
| Field | How to fill it |
| --- | --- |
| sourceID | Find the value in the column sourceID of the table of sources, check it, and add it. This should also be the name of the folder you created in step 1.5 (Create a folder and name it using the sourceID). |
| sourceTitle | Find the value in the column sourceTitle of the table of sources, check it, and add it. |
| sourceIdentifier | Find the value in the column sourceIdentifier of the table of sources, check that it is correct, and add it following the recommendations in “How to fill the sourceIdentifier when there is more than one file or document?”. |
| sourceDate | The table of sources will only have the year as sourceDate. You must check if a full date is available and add the value. Full dates are preferred (in the format YYYY-MM-DD), but you can also add the year (YYYY), or the year and month (YYYY-MM). |
| sourceLanguage | Find the value in the column sourceLanguage of the table of sources, and select values from the list of controlled vocabulary (this could be a list). |
| sourceLicense | Select the value from the list of controlled vocabulary (it could be unknown, but it is very important that you try your best to find it). This value will be used for the citation of the source. |
| sourceType | Select the value from the list of controlled vocabulary. |
| sourceFormat | Select one or more values from the list of controlled vocabulary. |
| extractionType | Select the value from the list of controlled vocabulary. |
| sourceCategory | Select the value from the list of controlled vocabulary. |
| sourcePublisher | Find the publisher or publishers in the source, and add them separated by “\|”. Use the full name of each publisher followed by the acronym in brackets. A publisher can be an institution or a person. |
| sourcePublisherType | Select from the list of controlled vocabulary. If there are multiple publishers, separate the values by “\|”, following the same order as in sourcePublisher. |
| bibliographicCitation | Use Zotero to create this value by entering the source in the shared RegRed Zotero library (Sources Collection). Export the bibliographic citation in APA 7 style (English UK) and use that value (more information in the Citation Rules appendix). |
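The three accepted sourceDate formats (YYYY-MM-DD, YYYY-MM, YYYY) can be checked with a short validator. This is a sketch under the assumption that dates are entered as text; the function name `is_valid_source_date` is illustrative.

```python
import re
from datetime import date

# YYYY, optionally followed by -MM, optionally followed by -DD (zero-padded).
_DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_source_date(text):
    """Return True for YYYY, YYYY-MM or YYYY-MM-DD values that are real dates."""
    match = _DATE_PATTERN.fullmatch(text)
    if match is None:
        return False
    # Missing month or day defaults to 1 so the calendar check still works.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_valid_source_date("2012-06-30"))  # prints: True
print(is_valid_source_date("30/06/2012"))  # prints: False
```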
2.3 Filling the redlist table
How to fill the redlistIdentifier or statusAssessmentTypeIdentifier when there is more than one file or document?
Check the previous question, “How to fill the sourceIdentifier when there is more than one file or document?”. The same applies to these identifiers.
statusAssessmentType and statusMappingID
The statusAssessmentType is the type of assessment system used to define the status codes for the taxa in the redlist. Most red lists follow IUCN criteria; however, many countries create their own protocols to assign conservation statuses. This is why we need to translate all statuses to a common standard.
The statusMappingID is the ID for the statusMappingSource that has the mapping (i.e., translation) between statuses.
All the verbatimStatusCodes for the species in our database will be mapped to the latest version of the IUCN regional statusCodes. This is “IUCN. (2012). Guidelines for application of IUCN Red List criteria at regional and national levels: Version 4.0. IUCN. https://portals.iucn.org/library/node/10336”, and has statusMappingSourceID = 1.
If the statusAssessmentType = IUCN but it is not the standard version, or if the statusAssessmentType = Non-IUCN, you will also have to create the status mapping source (see 2.3.1 Handling Status Mapping Sources below).
Check the definition, examples and recommendations for each field in the Definitions tab of the template for data extraction
| Field | How to fill it |
| --- | --- |
| redlistID | Fill this value according to the source’s sourceID and how many red lists the source has (see numberOfRedlists in the table of sources), e.g., if sourceID = 2, and you are filling the information for the first red list in the source, then the redlistID should be 2_1. |
| redlistTitle | This field will be generated automatically as “Redlist of <taxonomicScope> of <redlistLocation> (<year of (redlistDate)>)”. Therefore, these fields must be filled first to build the redlistTitle. |
| redlistIdentifier | This may be the same value as sourceIdentifier, but if the red list has a different identifier, use it. Check that it is correct and add it following the recommendations suggested for other identifiers. |
| redlistDate | This may be the same value as sourceDate, but if each red list has a different date, add that value. Use only the year here (YYYY). |
| geospatialScope | Select from the list of controlled vocabulary. |
| taxonomicScope | Select from the list of controlled vocabulary. |
| isTaxonomicScopeFullyReported | Select from the list of controlled vocabulary. Fill with Yes if the number of species reported on the redlist is the same as the number of species in the taxonomicScope. |
| redlistLocation | This must be a single value, not a list. Check if the location refers to a country, stateProvince, or county, if it is a location smaller than a county, if the location is a protected area, or if it is an island, and search for the correct name in the table of geographic_entities. |
| statusAssessmentType | Select from the list of controlled vocabulary. If it is Non-IUCN, you must create the status mapping source following the template. |
| statusAssessmentTypeCitation | Use Zotero to create this field by entering the source in the shared RegRed Zotero library (Assessment systems Collection). Export the bibliographic citation in APA style and use that value (see Appendix). |
| statusAssessmentTypeIdentifier | Check if the mapping source has an identifier and add the value. |
| statusMappingSourceID | Check if the mapping source already has a statusMappingSourceID and add the value (use 1 for IUCN v4.0; see 2.3.1). |
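The redlistID and redlistTitle patterns described in the table can be illustrated in a few lines of Python. This is only a sketch of the naming rules; the function names are assumptions, and the actual title generation is done automatically elsewhere.

```python
def make_redlist_id(source_id, position):
    """Combine the sourceID with the position of the red list in the source."""
    return f"{source_id}_{position}"


def make_redlist_title(taxonomic_scope, redlist_location, redlist_date):
    """Build the title from taxonomicScope, redlistLocation and the year of redlistDate."""
    year = str(redlist_date)[:4]  # redlistDate is expected to be YYYY
    return f"Redlist of {taxonomic_scope} of {redlist_location} ({year})"


print(make_redlist_id(2, 1))  # prints: 2_1
# "Mammals", "Belarus" and "1993" are illustrative values only.
print(make_redlist_title("Mammals", "Belarus", "1993"))
```

Because the title is derived from the other three fields, an error in any of them propagates into the redlistTitle, which is why they must be filled first.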
2.3.1 Handling Status Mapping Sources
The following details will help you define when and how to select or create a status mapping source so that all verbatimStatusCodes in RegRed are consistently mapped to the latest IUCN regional status codes.
All verbatimStatusCodes are mapped, by default, to the IUCN regional framework, and this mapping has statusMappingSourceID = 1:
IUCN (2012). Guidelines for application of IUCN Red List criteria at regional and national levels: Version 4.0. IUCN. https://portals.iucn.org/library/node/10336
Decision tree
If the assessment did not use the latest IUCN regional guidelines (v4.0), you must find or create an appropriate mapping source (statusAssessmentTypeCitation) and reference its statusMappingSourceID.
Before creating a new source, always check if the mapping source already exists in RegRed. If it does, reuse its existing statusMappingSourceID.
Instructions to select the statusMappingSourceID
1. If the assessment was done with the latest version of IUCN, then use statusMappingSourceID = 1.
2. If not, you will have to create the status mapping source (statusAssessmentTypeCitation). This applies even if (1) the source you are extracting provides the mapping between non-IUCN and IUCN statuses as part of the PDF/book, or (2) the mapping of statuses is already published online. However, if the mapping source has already been created, you should use that one and its statusMappingSourceID.
   2.1. To create the mapping source, check which case applies:
   - If the same source provides the translation: you will use these criteria. You will create the Excel and PDF files, but you will cite the source in the statusAssessmentTypeCitation, and use its identifier in the statusAssessmentTypeIdentifier.
   - If there are official mapping sources published by the countries or researchers: you will use these criteria. You will create the Excel and PDF files, but you will cite the official document and use its identifier.
   - If there aren’t any mapping sources available or you have a mix of sources: you will have to create your own criteria. You will create the Excel and PDF files, and you will cite the mapping source you created and use the URL from our GitHub repository as the identifier.
   So, for all options you will have to create a separate Excel file with its respective documentation PDF (and statusMappingSourceID), but only in the last case will you use a custom identifier.
   You can create your own criteria based on a previous mapping source (e.g., Belarus statuses are equivalent to Russian ones), but do not copy mapping sources blindly. In all cases, you can check with Flo how to define the statuses. Always try to come up with a proposal first, so that we can finalise it together.
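The selection logic above can be written down as a small function, which may help when checking a decision. This is a Python sketch of the decision tree only; the function name, argument names, and the returned dictionary keys are assumptions made for illustration.

```python
def select_mapping_source(uses_iucn_v4, existing_id=None,
                          same_source_has_mapping=False,
                          official_mapping_published=False):
    """Walk the decision tree for the status mapping source.

    Returns a dict describing what to do; the keys are hypothetical.
    """
    if uses_iucn_v4:
        # Latest IUCN regional guidelines: always mapping source 1.
        return {"action": "use", "statusMappingSourceID": 1}
    if existing_id is not None:
        # A mapping source already exists in RegRed: reuse it.
        return {"action": "reuse", "statusMappingSourceID": existing_id}
    # Otherwise a new mapping source (Excel + PDF) is created in every case;
    # only the citation and the identifier differ.
    if same_source_has_mapping:
        cite, identifier = "original source", "DOI or URL or catalog ID"
    elif official_mapping_published:
        cite, identifier = "official document", "DOI or URL"
    else:
        cite, identifier = "custom mapping", "GitHub URL"
    return {"action": "create", "cite": cite, "identifier": identifier}


print(select_mapping_source(False, official_mapping_published=True))
```

Note that the check for an existing mapping source comes before any of the three creation scenarios, which mirrors the rule to reuse before creating.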
Instructions to create the status mapping source
Creating the mapping source involves preparing two documents:
- The table status_mapping_source_X
  - This Excel file will store the equivalent values of statusCode for verbatimStatusCode, according to a specific source. You will create it using the mapping_sources_template_xlsx.
  - You should name the file as follows: <status_mapping_source_X.xlsx>, with X = statusMappingSourceID.
  - You should store the file in the status_mapping_sources folder, and name the folder after the statusMappingSourceID.
Fields to be filled in:
- statusMappingID: fill this with the ID of the status_mapping_source_X (it should be the same in every row).
- verbatimStatusCode: fill this with the original status code used in the source.
- statusCode: fill with the mapped IUCN status code.
- statusCategory: fill with the full definition of the status (e.g., “Vulnerable”).
- statusMappingBy: select the value from the list of controlled vocabulary.
- statusMappingRemarks: fill with any specifications about your decisions.
- The PDF documentation status_mapping_source_X
  - This PDF document will have a verbal description and a list of the sources used and decisions made to define equivalent values of statusCode for verbatimStatusCode (e.g., to determine that X1 = CR). You will create it using the mapping_sources_template_docx, and then save it as a PDF.
  - You should name the file as follows: <status_mapping_source_X.pdf>, with X = statusMappingID.
  - You should store the file in the status_mapping_sources folder with the correct statusMappingSourceID.
Content of the document:
- Author: Modelling of Biodiversity Lab (MOBI Lab), Faculty of Environmental Sciences (FZP CZU), <your name>, and <the names of anyone else who assisted you>.
- Place and date: e.g. Prague, 30 November 2025
- Title: “Mapping of <place> status codes from <year> to the standard of IUCN (2012) version 3.1 (2nd edn)” (e.g. “Mapping of Belarus status codes from 1993 to the standard of IUCN (2012) version 3.1 (2nd edn)”).
- Summary: A brief description that clarifies the title or adds more information and explains whether you are using any bibliographical material to guide your interpretation of the categories. In some cases, two or more geographical regions for which mappings already exist use the same categories with identical descriptions but assign them different numbers or letters.
- Bibliography: The documents and sources you used to define the criteria.
Final step:
When you are done with the mapping source (Excel and PDF), contact Flo to store it in RegRed’s GitHub repository. This repository will be archived in the future in Zenodo.
If you got this far and you still have doubts about status mapping sources, contact Sofi.
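To make the relationship between the two files and the mapping rows concrete, here is a minimal Python sketch. It assumes the mapping rows have already been read from the Excel file into dictionaries; the function names and the ID 7 are hypothetical, and only the X1 = CR example comes from this protocol.

```python
def mapping_file_names(status_mapping_source_id):
    """Return the expected Excel and PDF file names for one mapping source."""
    stem = f"status_mapping_source_{status_mapping_source_id}"
    return f"{stem}.xlsx", f"{stem}.pdf"


def build_lookup(mapping_rows):
    """Turn mapping rows into a verbatimStatusCode -> statusCode dictionary."""
    return {row["verbatimStatusCode"]: row["statusCode"] for row in mapping_rows}


def map_status(verbatim_code, lookup):
    """Return the mapped statusCode, or None when the code has no mapping yet."""
    return lookup.get(verbatim_code.strip())


# Illustrative row; statusMappingBy and statusMappingRemarks are omitted here.
rows = [
    {"statusMappingID": 7, "verbatimStatusCode": "X1",
     "statusCode": "CR", "statusCategory": "Critically Endangered"},
]
lookup = build_lookup(rows)
print(mapping_file_names(7))     # prints: ('status_mapping_source_7.xlsx', 'status_mapping_source_7.pdf')
print(map_status("X1", lookup))  # prints: CR
```

A code that returns None is one that still needs a documented decision in the mapping table and the accompanying PDF.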
Where to find the statusMappingSourceID used?
You can find the statusMappingSourceID in the mapping_sources_used file.
2.4 Filling the location table
The geographic_entities are based on the following databases:
- geoBoundaries: A global database of political administrative boundaries. See https://www.geoboundaries.org/visualize.html.
- WDPA: The World Database on Protected Areas, the most comprehensive global database on terrestrial and marine protected areas. See https://www.protectedplanet.net/en/search-areas?geo_type=region.
- Global Islands: The Global Island Database is a global shoreline and associated global islands database. See https://doi.org/10.1080/1755876X.2018.1529714.
Always refer to the geographic_entities website: https://regred-project.github.io/geographic_entities/.
Instructions to use the geographic entities file
If you are looking for
- a country (ADM0): geoboundaries_country.
- a stateProvince or second order administrative region (ADM1): geoboundaries_stateProvince.
- a county or the lowest administrative region possible (ADM2): geoboundaries_county.
- a locality that is a protected area: wdpa_sep2025.
- an island that is a country: geoboundaries_country.
- an island that is a stateProvince or second order administrative region: geoboundaries_stateProvince.
- an island that is a county or the lowest administrative region possible: geoboundaries_county.
- an island that does not fit any of the above: global_islands.
You should send Gabriel an email with the name of the locality and a picture (screenshot) of an accompanying map if available.
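The choice of geographic entities file listed above is a simple lookup, sketched here in Python. The level labels ("country", "stateProvince", "county", "protected_area") and the function name are assumptions for illustration; the file names are those given in the list.

```python
# File to search for each kind of location, as listed above.
ENTITY_FILES = {
    "country": "geoboundaries_country",              # ADM0
    "stateProvince": "geoboundaries_stateProvince",  # ADM1
    "county": "geoboundaries_county",                # ADM2
    "protected_area": "wdpa_sep2025",                # locality that is a protected area
}


def entity_file(level=None, is_island=False):
    """Return the geographic entities file to search, or None if none applies."""
    if level in ENTITY_FILES:
        # Islands that are also administrative units use the administrative file.
        return ENTITY_FILES[level]
    if is_island:
        return "global_islands"
    return None


print(entity_file("country", is_island=True))  # prints: geoboundaries_country
print(entity_file(is_island=True))             # prints: global_islands
```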
Check the definition, examples and recommendations for each field in the Definitions tab of the template for data extraction.
| Field | How to fill it |
| --- | --- |
| locationID | Fill this value according to the geographic_entities locationID value. |
| geometry | Do not fill this field. |
| geometrySource | Do not fill this field. |
| verbatimSRS | Do not fill this field. |
| footprintSRS | Do not fill this field. |
| continent | Fill this value according to the geographic_entities continent value. |
| country | Fill this value according to the geographic_entities country value. |
| countryCode | Fill this value according to the geographic_entities countryCode value. |
| stateProvince | Fill this value according to the geographic_entities stateProvince value. |
| county | Fill this value according to the geographic_entities county value. |
| locality | Fill this value according to the geographic_entities locality value. |
| isCustomRegion | Select from the list of controlled vocabulary. This will be Yes if the redlistLocation spans multiple administrative regions and therefore a custom polygon is needed for the geometry, e.g., “The Carpathians”. |
2.4 Filling the taxon_assessment table
Check the definition, examples and recommendations for each field in the Definitions tab of the template for data extraction.
| Field | How to fill it |
| --- | --- |
| statusID | Do not fill this field. |
| verbatimIdentification | Fill this value with the exact name as it appears in the red list. This could be a species or a subspecies name. |
| verbatimStatusCode | Fill this value with the original abbreviated status category. |
| statusCriteria | Fill this value with the criteria used to define the verbatimStatusCode of the verbatimIdentification (if provided in the red list). This value should be the original; you shouldn’t modify it. |
If you have a red list that has multiple classes in one (e.g., Fauna, Vertebrates, Amphibians & Reptiles) you can use the split_script to automatically divide the Excel file into multiple class-specific files.
Instructions to use the split_script
The script is located in our shared OneDrive at code/Splitter/split_script.R.
- Copy the script into the working directory where your .xlsx file with the unsorted taxon_assessment sheet is located.
- The script is designed for files that follow our data extraction template. Make sure ALL fields are completed beforehand; this should be the last step.
- Follow the instructions provided within the script.
- The script will generate multiple files named filename_<class>.xlsx. Only the taxon_assessment sheet will be modified; all other sheets remain unchanged.
- After the split, manually correct the taxonomicScope column and update the filename field as needed.
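To show what the split produces, the following Python sketch groups taxon_assessment rows by class and derives the output file names. It is not the split_script.R itself and does not read or write Excel files; the `class_of` lookup (name to class) is a hypothetical input, since this protocol does not describe how the real script determines the class of each taxon.

```python
from collections import defaultdict
from pathlib import Path


def split_by_class(input_name, rows, class_of):
    """Group rows by class and key them by output file name.

    class_of is a hypothetical dict mapping verbatimIdentification -> class.
    """
    stem = Path(input_name).stem
    groups = defaultdict(list)
    for row in rows:
        taxon_class = class_of[row["verbatimIdentification"]]
        # Output files follow the filename_<class>.xlsx pattern.
        groups[f"{stem}_{taxon_class}.xlsx"].append(row)
    return dict(groups)


# Illustrative rows and classes only.
rows = [
    {"verbatimIdentification": "Bufo bufo", "verbatimStatusCode": "LC"},
    {"verbatimIdentification": "Lacerta agilis", "verbatimStatusCode": "VU"},
]
class_of = {"Bufo bufo": "Amphibia", "Lacerta agilis": "Reptilia"}
for name, group in split_by_class("redlist.xlsx", rows, class_of).items():
    print(name, len(group))
```

Each output file then needs its own taxonomicScope and file name corrected by hand, as stated in the last step above.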
2.4.1 Instructions to extract the verbatimIdentification and verbatimStatusCode automatically
This section needs to be developed, but if you have questions, ask Adam.