OUR PROJECT

Overview

Detailed and verified pedigree information is crucial in plant research, agronomy and commercial breeding programs, both to understand the genetics of simply and quantitatively inherited traits and to plan appropriate crosses for breeding new cultivars. However, many pedigree relationships are unknown or uncertain. In addition, automated and extensive pedigree reconstructions covering the cultivars of greatest historical and commercial relevance are rarely made easily accessible to the public.

Here we present PERSEUS, a user-friendly web-tool for the visualization of pedigrees. The visualizations allow users to explore Parent-Offspring (PO) and close pedigree relationships, as well as to mine their relevant information. They are presented as interactive directed graph networks, laid out using a force-repulsion method. PERSEUS is compatible with almost all desktop and mobile systems, and it requires neither a license nor a database with a complex format.

PERSEUS is a straightforward tool for identifying the genotypes that have contributed most to the genetic variation of a given species, and for tracing back the donors of relevant traits. It can help in the characterization and validation of qualitative and quantitative traits and assist in the preservation of the variability and diversity of breeding materials, among other uses.

So far, PERSEUS presents a handful of reported pedigrees in five of the most economically important woody crops (see the data used). Nonetheless, it can be continually expanded by adding reconstructions from other species and additional features such as genomic marker information.

PERSEUS user guide

Last update: 12th December 2023.

This user guide is intended as a detailed, step-by-step walkthrough of the web-based tool. To illustrate the interface, the almond pedigree is shown as an example. The numbered steps in the video example above correspond to the sections below.

Disclaimer: In the Firefox web browser, the graph networks might not be displayed in their original format.

Discover Pedigrees tab:

Users can search for complete pedigree visualizations of the species of interest.

1. Select species for visualization: Select a species and click on the 'Search' button to render the pedigree.
The complete visualization is displayed at the centre of the webpage as a force-directed graph network. Information about the species and the literature from which the pedigree data was obtained is shown at the bottom of the graph.
Within the graph, each accession and/or cultivar is represented as a blue node, and each Parent-Offspring (PO) relationship as a grey link/edge. In the case of self-compatible species, self-fertilization crosses are shown in light blue. Links are tagged with a letter describing how the relationship was resolved:
· Historical (H): PO relationship determined through historical data.
· Genotypic (G): PO relationship only resolved genotypically (with molecular markers - SSR, SNP, or DArT™ data).
· Absolute (A): PO relationship determined through genotypic data and confirmed with historical data.
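
The three tags can be read as combinations of two kinds of evidence. The following is a minimal Python sketch of that reading, not the PERSEUS implementation; the function name `link_tag` and the boolean flags `historical` and `genotypic` are hypothetical and are used here only for illustration.

```python
def link_tag(historical: bool, genotypic: bool) -> str:
    """Return the letter tag for a Parent-Offspring link.

    The two flags are illustrative assumptions about how the
    evidence behind a link could be recorded.
    """
    if historical and genotypic:
        return "A"  # Absolute: genotypic data confirmed with historical data
    if genotypic:
        return "G"  # Genotypic: resolved only with molecular markers
    if historical:
        return "H"  # Historical: determined through historical data
    raise ValueError("a link needs at least one source of evidence")
```

For example, `link_tag(historical=False, genotypic=True)` gives the tag used for relationships resolved only with SSR, SNP or DArT™ data.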

2. Download information of all nodes: By clicking the 'Download CSV - All nodes' button, the entire pedigree dataset can be downloaded in .csv format, along with phenotypic characterization data (passport data). This information has been extracted from the literature and public fruit crop databases (see the data used).

3. Interactive selection of nodes: Users can move and select nodes individually or by multiple selection. When a node is clicked, the node and its related links are coloured in bright yellow.
Passport data of the selected individuals can be viewed by clicking the 'Check the selected cultivars - Passport data' button that appears in the upper-left corner. In the same section, the 'Download CSV - Selected nodes' button downloads the data of the selected individuals.

4. Search a specific individual: Individual nodes can be searched in the complete pedigree visualization with the 'Search an Individual' selector at the top of the page. When clicked, a sub-window appears containing a dropdown list of all the available accession/cultivar names. As soon as a name is picked, the corresponding node is automatically coloured in the graph. If multiple names are selected, each is given a different colour.

5. Colour the pedigree by a particular trait: The 'Colour by trait' selector at the top of the page allows users to choose and display the set of nodes that share a certain attribute of a property or trait of interest available in their passport data. For example, an almond breeder wants to know which accessions within the pedigree have been associated with a bitter or sweet flavour. On this selector, the breeder looks for the "Flavour" trait in the dropdown button, and the effects available in the dataset for flavour appear. By clicking the "Slightly bitter" or the "Sweet" button, the nodes associated with these effects are automatically coloured and the rest of the nodes disappear. Nodes that have no information for that trait are displayed with the effect "Undefined" and coloured in grey.
*Keep in mind that the effects of the traits available in PERSEUS have been mined from the literature and/or public fruit crop databases. When a trait was not linked to a particular genotype, it was assigned to each accession/cultivar by name. Hence, part of the characterization data might not be true-to-type.
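
The filtering behaviour described in this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `passport` layout (a dictionary of node name to trait values), the function name `colour_by_trait` and the placeholder cultivar names are all hypothetical, and "Undefined" is treated here as one more selectable effect.

```python
UNDEFINED = "Undefined"  # label shown for nodes without data for the trait


def colour_by_trait(passport, trait, selected_effects):
    """Map each visible node name to the effect used to colour it.

    passport: dict of node name -> dict of trait -> effect (assumed layout).
    Nodes whose effect is not among `selected_effects` are hidden,
    i.e. left out of the returned mapping.
    """
    visible = {}
    for name, traits in passport.items():
        effect = traits.get(trait) or UNDEFINED
        if effect in selected_effects:
            visible[name] = effect
    return visible


# Placeholder data, not taken from the PERSEUS dataset.
passport = {
    "Cultivar 1": {"Flavour": "Sweet"},
    "Cultivar 2": {"Flavour": "Slightly bitter"},
    "Cultivar 3": {},
}
sweet_only = colour_by_trait(passport, "Flavour", {"Sweet"})
```

Here `sweet_only` contains only "Cultivar 1"; selecting "Undefined" instead would return "Cultivar 3", which the interface would colour in grey.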

6. Personalize the graph network: Users can personalize the graph visualization with the 'Personalize the visualization' selector. The graph background can be changed to medium grey or light grey. For nodes, the size, the RGB colour and the colour of the label name can be modified, while links and their labels can be changed in size and switched between white and grey. The colour used to highlight selected nodes and links can also be changed under 'Highlight Selection' to yellow, red, purple or green. The new visualization is then kept while using all the functions of the web-tool. Refresh the page to restore the initial visualization.

7. Upload user data: Users can upload their own pedigree data from one or more .csv documents and merge it with the existing datasets by clicking the 'Submit' button. A downloadable template is provided to show the correct format for uploading data.
The section 'See your uploaded data' shows the uploaded data in table format, to make the uploaded documents easier to track.

7.1. On the directed graph network, the new nodes automatically appear in green, and new links are tagged as (U). If the user data contains accessions with the same name as accessions already available in the PERSEUS dataset, only the novel links are uploaded. Whenever new links are inconsistent with the available PO relationships, they appear in red. Such inconsistent PO relationships might arise when, for instance, a third parent is given to a node that already has two incoming links.
At the bottom of the webtool, a new button 'Download CSV - Nodes (Without User Data)' downloads only the data available in the PERSEUS dataset, while the button 'Download CSV - All Nodes' combines the previous data with the user data in a single .csv document. The new directed graph network has all previous functionalities available.
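
The merge rules in step 7.1 can be illustrated with a short Python sketch. It is a simplified reading of the text, not the PERSEUS code: links are assumed to be (parent, child) pairs, the function name `merge_user_links` is hypothetical, and a link is treated as inconsistent only in the case named above, when it would give a node a third parent.

```python
def merge_user_links(existing, uploaded):
    """Classify uploaded (parent, child) links against the existing ones.

    Returns (new_links, inconsistent_links). Links already present are
    skipped, so only novel links are reported.
    """
    parents = {}
    for parent, child in existing:
        parents.setdefault(child, set()).add(parent)

    new_links, inconsistent = [], []
    for parent, child in uploaded:
        known = parents.setdefault(child, set())
        if parent in known:
            continue  # link already in the dataset: nothing novel to add
        if len(known) >= 2:
            inconsistent.append((parent, child))  # third parent: shown in red
        else:
            known.add(parent)
            new_links.append((parent, child))  # novel link: tagged (U)
    return new_links, inconsistent


# Placeholder names, not taken from the PERSEUS dataset.
existing = [("P1", "C"), ("P2", "C")]
uploaded = [("P1", "C"), ("P3", "C"), ("C", "D")]
new_links, inconsistent = merge_user_links(existing, uploaded)
```

With this input, the link from "C" to "D" is accepted as new, the repeated link from "P1" to "C" is ignored, and the link from "P3" to "C" is flagged because "C" already has two parents.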

8. Search for a specific pedigree: A particular individual and its close relationships can be searched with the 'Look for a specific pedigree' selector to obtain a subset of the general pedigree. The name of the accession/cultivar of interest can be looked up in the autocomplete form. Then, the number of generations to render can be specified on the 'Number of distance jumps' tab. After clicking the 'Search' button, a new window appears with the subset pedigree and the searched individual coloured in red. The search just made is shown in the upper-right corner of the website. This visualization contains all the functionalities of the general graph, including the upload of user data (see steps 1 to 7).
8.1. Hierarchical view: By clicking the 'Change to hierarchical view' button, the subset pedigree can also be rendered as a hierarchical tree (showing the oldest generations at the top of the graph and the subsequent offspring below).
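
One way to picture the 'Number of distance jumps' setting is as a breadth-first search that stops after a fixed number of links. The Python sketch below is an assumption about how such a subset could be computed, not the method used by PERSEUS: the function name `pedigree_subset` is hypothetical, links are assumed to be (parent, child) pairs, and they are followed in both directions so that ancestors and descendants are both collected.

```python
from collections import deque


def pedigree_subset(links, start, jumps):
    """Return the set of nodes within `jumps` links of `start`."""
    neighbours = {}
    for parent, child in links:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == jumps:
            continue  # do not expand beyond the requested distance
        for other in neighbours.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return set(distance)


# Placeholder chain of four generations: A -> B -> C -> D.
links = [("A", "B"), ("B", "C"), ("C", "D")]
one_jump = pedigree_subset(links, "B", 1)
```

Starting from "B" with one jump, the subset holds "A", "B" and "C"; raising the value to two also brings in "D".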

Methodology

Webtool environment

The PERSEUS webtool was developed on a NodeJS runtime environment [1] combined with the ExpressJS framework [2] and a MongoDB database [3]. The JavaScript language was selected because it is supported by nearly all internet browsers.

Graph visualization

The interactive directed graph networks were developed as independent SVG elements using version 6 of the D3 (Data-Driven Documents) JavaScript library (d3.js) [4]. The networks and hierarchical graph views were constructed using a force-repulsion method within the d3.js library.

The reported pedigree data used for the graph visualizations was stored in the open-source graph database Neo4j [5, 6]. A Python script [7] was developed using the Pandas [8] and NetworkX [9] libraries to connect to the Neo4j database and store the data in it. Through this connection, the PERSEUS web interface extracts the information and renders the graph networks.

Data collection & Web scraping

The pedigree data presented on PERSEUS was obtained from open-access literature. The data search focused on literature that reported historical and/or genotypic Parent-Offspring (PO) relationships. If two or more reports were selected for the graph construction, the compatible relationships were merged within the same graph network (see the data used). Whenever inconsistent PO relationships were found between reports, the relationship was discarded to avoid introducing erroneous data.

For almost all of the selected literature, the pedigree relationships were based on genotypic data that was partially confirmed with historical records. However, in the case of the graph visualization of Prunus persica, the reconstruction was mostly based on historical data because only a few genotypic peach pedigree reconstructions have been developed to date.

Moreover, a web scraping procedure for the literature and public fruit crop databases (see the data used) was developed to extract, when available, relevant agronomical information and phenotypic characterization of the accessions/cultivars within the pedigrees (such as the country of origin, identification code, harvest time, resistances or quality traits). All this information was gathered and referenced as passport data. The data was merged and curated carefully in order to compile all the publicly available data in a single repository. When more than 15 different values were associated with a particular trait or property, a simplified version of the data was created and added to the database. The original version of the data can be found under the name "trait_original", while the simplified version is labelled "trait_simplified".
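
The 15-value rule can be sketched as follows in Python. This is an illustration only: the text does not say how the simplified values were derived, so the sketch assumes a hand-made `mapping` from original to simplified values, and it reads "trait_original" and "trait_simplified" as suffixes added to the trait name. The function name `simplify_trait` and the sample values are hypothetical.

```python
def simplify_trait(records, trait, mapping, max_values=15):
    """Add "<trait>_original" and "<trait>_simplified" when a trait has
    more than `max_values` distinct values; otherwise return the records
    unchanged.

    records: list of dicts, one per accession/cultivar (assumed layout).
    mapping: dict of original value -> simplified value (assumed input).
    """
    values = {r[trait] for r in records if r.get(trait)}
    if len(values) <= max_values:
        return records  # few enough values: nothing to simplify

    simplified = []
    for record in records:
        new = dict(record)
        if trait in new:
            original = new.pop(trait)
            new[f"{trait}_original"] = original
            # Values missing from the mapping are kept as they are.
            new[f"{trait}_simplified"] = mapping.get(original, original)
        simplified.append(new)
    return simplified


# Placeholder data with 16 distinct values for one trait.
records = [{"name": f"Cultivar {i}", "harvest": f"v{i}"} for i in range(16)]
mapping = {f"v{i}": ("early" if i < 8 else "late") for i in range(16)}
result = simplify_trait(records, "harvest", mapping)
```

In this example each record ends up with both a "harvest_original" and a "harvest_simplified" entry, because the trait has 16 distinct values.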

Note that, to perform the web scraping and to merge the pedigree and passport data, information was mined from the public crop databases by searching the names of the accessions/cultivars. Hence, even though the historical pedigree data from the public databases was reviewed, the data was not associated with particular genotypes and might not be true-to-type. In addition, no maternal DNA information was used in building the graph networks, so the order of the parents is arbitrary.

Data Used

Select one of the species to unfold the data used. Two blocks of data will appear:
1. The literature used for rendering the pedigree graph visualization and/or gathering agronomical and phenotypic data.
2. The public fruit crop databases and the literature from which the agronomical and phenotypic data were extracted.