Installation

Prerequisites

Python 3.11 or later
A UNIX-like environment (e.g. macOS, WSL, Ubuntu)
A recent version of PostgreSQL (ideally version 11 or later)
A modern Java runtime (only if using DynamoDB for the Gene Normalizer database)

Library installation

Install FUSOR from PyPI:

pip install fusor
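
To confirm that the installation succeeded, the installed version can be looked up from the package metadata. This is a minimal sketch using only the standard library; the helper name installed_version is illustrative and is not part of FUSOR.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of ``package``, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if FUSOR is installed in this environment, else None.
print(installed_version("fusor"))
```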

Data setup

Universal Transcript Archive (UTA)

The UTA is a dataset of genome-to-transcript alignments distributed as a PostgreSQL database. FUSOR accesses it through Cool-Seq-Tool; see the Cool-Seq-Tool UTA docs for some opinionated setup instructions.

At runtime, UTA connection information can be relayed to FUSOR (by way of Cool-Seq-Tool) either as an initialization argument or via the environment variable UTA_DB_URL. By default, it is set to postgresql://uta_admin:uta@localhost:5432/uta/uta_20210129b. See the Cool-Seq-Tool configuration docs for more info.
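
When UTA is hosted somewhere other than the default location, the environment variable can also be set from Python. The sketch below assumes the variable is read when Cool-Seq-Tool opens its connection, so it should be set before FUSOR is initialized. The helper set_uta_url is hypothetical and only performs a light sanity check on the URL; it does not contact the database.

```python
import os
from urllib.parse import urlsplit

# Default connection string quoted in the text above.
DEFAULT_UTA_DB_URL = "postgresql://uta_admin:uta@localhost:5432/uta/uta_20210129b"


def set_uta_url(url: str = DEFAULT_UTA_DB_URL) -> str:
    """Sanity-check a UTA connection URL and export it as UTA_DB_URL."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql" or not parts.hostname:
        raise ValueError(f"not a PostgreSQL connection URL: {url!r}")
    os.environ["UTA_DB_URL"] = url
    return url


# With no argument this exports the default; pass another URL to override it.
set_uta_url()
```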

SeqRepo

SeqRepo is a controlled dataset of biological sequences. As with UTA, access in FUSOR is given via Cool-Seq-Tool, which provides documentation on getting it set up.

At runtime, the file location of the SeqRepo instance directory can be defined (by way of Cool-Seq-Tool) either as an initialization argument or via the environment variable SEQREPO_ROOT_DIR. By default, it’s expected to be /usr/local/share/seqrepo/latest. See the Cool-Seq-Tool configuration docs for more info.
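
A missing or misplaced SeqRepo directory is a common setup problem, so it can help to check the path before starting FUSOR. The following sketch mirrors the lookup order described above (environment variable first, then the default path). It does not call Cool-Seq-Tool itself, and the helper name resolve_seqrepo_dir is illustrative.

```python
import os
from pathlib import Path

# Default location quoted in the text above.
DEFAULT_SEQREPO_ROOT_DIR = "/usr/local/share/seqrepo/latest"


def resolve_seqrepo_dir() -> Path:
    """Return the directory named by SEQREPO_ROOT_DIR, or the default path."""
    return Path(os.environ.get("SEQREPO_ROOT_DIR", DEFAULT_SEQREPO_ROOT_DIR))


seqrepo_dir = resolve_seqrepo_dir()
if not seqrepo_dir.is_dir():
    # Only reports the problem; fixing it means following the SeqRepo setup docs.
    print(f"SeqRepo directory not found: {seqrepo_dir}")
```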

Gene Normalizer

Finally, FUSOR uses the Gene Normalizer to ground gene terms. See the Gene Normalizer documentation for setup instructions.

Connection information for the normalizer database can be set using the environment variable GENE_NORM_DB_URL. See the Gene Normalizer docs for more information on connection configuration.
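
Since this section introduces three environment variables, a quick report of which ones are set can make misconfiguration easier to spot. This is a small sketch using only the standard library; describe_data_config is a hypothetical helper. An unset variable is assumed to mean that the corresponding library falls back to its own default. Only set/unset flags are reported, so connection strings containing passwords are never printed.

```python
import os
from typing import Dict, Mapping

# The three variables named in this section.
DATA_ENV_VARS = ("UTA_DB_URL", "SEQREPO_ROOT_DIR", "GENE_NORM_DB_URL")


def describe_data_config(environ: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Map each data-related variable name to whether it is currently set."""
    return {name: name in environ for name in DATA_ENV_VARS}


for name, is_set in describe_data_config().items():
    print(f"{name}: {'set' if is_set else 'unset (library default applies)'}")
```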

Check data availability

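Use the fusor.tools.check_data_resources() function to verify that all data dependencies are available. It is a coroutine, so it must be awaited; the example below assumes a REPL that supports top-level await, such as the one started with python -m asyncio: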

>>> from fusor.tools import check_data_resources
>>> status = await check_data_resources()
>>> assert all(status)  # passes if all resources can be acquired successfully
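
Outside an interactive session with top-level await, the same check needs an event loop. The sketch below wraps the call with asyncio.run so it can live in an ordinary script. It relies only on what the example above shows (the result can be passed to all()); the names data_resources_available and main are illustrative.

```python
import asyncio


async def data_resources_available() -> bool:
    """Return True only if every FUSOR data dependency can be acquired."""
    # Imported here so this module can still be loaded where FUSOR is absent.
    from fusor.tools import check_data_resources

    status = await check_data_resources()
    return all(status)


def main() -> int:
    """Exit-code style wrapper: 0 when all resources are available, else 1."""
    return 0 if asyncio.run(data_resources_available()) else 1
```

Ending a script with raise SystemExit(main()) would make the check usable from a shell or a CI job, where a non-zero exit code signals that at least one resource is unavailable.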