Installation

Prerequisites

  • Python 3.11+

  • A UNIX-like environment (e.g. macOS, WSL, Ubuntu)

  • A recent version of PostgreSQL (ideally version 11 or later)

  • A modern Java runtime (if using DynamoDB for the Gene Normalizer database)

Library installation

Install FUSOR from PyPI:

pip install fusor

Data setup

Universal Transcript Archive (UTA)

The UTA is a dataset of genome-transcript alignments, distributed as a PostgreSQL database. FUSOR accesses it through Cool-Seq-Tool; see the Cool-Seq-Tool UTA docs for opinionated setup instructions.

At runtime, UTA connection information can be relayed to FUSOR (by way of Cool-Seq-Tool) either as an initialization argument or via the environment variable UTA_DB_URL. By default, it is set to postgresql://uta_admin:uta@localhost:5432/uta/uta_20210129b. See the Cool-Seq-Tool configuration docs for more info.
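As a rough illustration of the environment-variable route, the sketch below sets UTA_DB_URL only if it is not already defined and then breaks the URL into its parts. The helper name describe_uta_url is hypothetical, and reading the two path segments as database name and schema is an assumption based on the shape of the default URL, not a documented guarantee.

```python
import os
from urllib.parse import urlsplit

# Default value quoted in the text above; replace with your own connection string.
DEFAULT_UTA_DB_URL = "postgresql://uta_admin:uta@localhost:5432/uta/uta_20210129b"


def describe_uta_url(url: str) -> dict:
    """Split a UTA connection URL into its components (password omitted)."""
    parts = urlsplit(url)
    # Assumption: the path has the form "/<database>/<schema>".
    database, _, schema = parts.path.lstrip("/").partition("/")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": database,
        "schema": schema,
    }


# Set the variable before FUSOR is initialized; an existing value is left untouched.
os.environ.setdefault("UTA_DB_URL", DEFAULT_UTA_DB_URL)
print(describe_uta_url(os.environ["UTA_DB_URL"]))
```

Printing the parsed components before starting an analysis is a cheap way to spot a wrong port or schema name early.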

SeqRepo

FUSOR uses SeqRepo to retrieve sequences at given positions on a transcript. You must download the SeqRepo data yourself.

From the root directory:

pip install seqrepo
sudo mkdir /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i 2024-12-20/  # Replace with latest version using `seqrepo list-remote-instances` if outdated

If you get an error similar to the one below:

PermissionError: [Errno 13] Permission denied: '/usr/local/share/seqrepo/2024-12-20._fkuefgd' -> '/usr/local/share/seqrepo/2024-12-20'

To resolve it, move the temporary directory into place. The random suffix will probably differ from ._fkuefgd, so use the path from your own error message:

sudo mv /usr/local/share/seqrepo/2024-12-20._fkuefgd /usr/local/share/seqrepo/2024-12-20
exit

Use the SEQREPO_ROOT_DIR environment variable to set the path of an already existing SeqRepo directory. The default is /usr/local/share/seqrepo/latest.
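The fallback described above can be mirrored in a few lines to confirm which directory would be used and whether it exists. This is only a sketch: resolve_seqrepo_dir is a hypothetical helper, not part of FUSOR, and it reproduces the documented default rather than FUSOR's actual lookup code.

```python
import os
from pathlib import Path

# Default path quoted in the text above.
DEFAULT_SEQREPO_ROOT_DIR = "/usr/local/share/seqrepo/latest"


def resolve_seqrepo_dir(env=None) -> Path:
    """Return SEQREPO_ROOT_DIR from the environment, or the documented default."""
    env = os.environ if env is None else env
    return Path(env.get("SEQREPO_ROOT_DIR", DEFAULT_SEQREPO_ROOT_DIR))


seqrepo_dir = resolve_seqrepo_dir()
print(f"SeqRepo directory: {seqrepo_dir} (exists: {seqrepo_dir.is_dir()})")
```

If the directory is reported as missing, either point SEQREPO_ROOT_DIR at the instance you pulled or check that the download completed.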

Gene Normalizer

Finally, FUSOR uses the Gene Normalizer to ground gene terms. See the Gene Normalizer documentation for setup instructions.

Connection information for the normalizer database can be set using the environment variable GENE_NORM_DB_URL. By default, it is set to http://localhost:8000. See the Gene Normalizer docs for more information on connection configuration.
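Before starting FUSOR, it can help to confirm that something is actually listening at the configured address. The sketch below does a plain TCP connection test using only the standard library; is_listening is a hypothetical helper, and a successful connection shows only that the port is open, not that a populated Gene Normalizer database is behind it.

```python
import os
import socket
from urllib.parse import urlsplit

# Default value quoted in the text above.
DEFAULT_GENE_NORM_DB_URL = "http://localhost:8000"


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's usual port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


url = os.environ.get("GENE_NORM_DB_URL", DEFAULT_GENE_NORM_DB_URL)
print(f"{url} reachable: {is_listening(url)}")
```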

Docker

FUSOR’s dependencies can be installed using a Docker container.

Important

This section assumes you have a local SeqRepo installed at /usr/local/share/seqrepo/2024-12-20. If it is installed elsewhere, set the SEQREPO_ROOT_DIR environment variable in .env.shared to that location.

If you’re using Docker Desktop, you must go to Settings → Resources → File sharing and add /usr/local/share/seqrepo under the Virtual file shares section. Otherwise, you will get the following error:

OSError: Unable to open SeqRepo directory /usr/local/share/seqrepo/2024-12-20

To build, (re)create, and start containers:

docker volume create uta_vol
docker compose up

Tip

If you want a clean slate, run docker compose down -v to remove containers and volumes, then run docker compose up --build to rebuild and start fresh containers.

In Docker Desktop, you should see the following for a successful setup:

[Screenshot: Docker Desktop Container]

Note

python-dotenv can be used to load environment variables needed for analysis notebooks in the notebooks directory. Environment variables can be found in .env.shared.

Check data availability

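Use the fusor.tools.check_data_resources() coroutine to verify that all data dependencies are available. Because it must be awaited, run the snippet below in an environment that supports top-level await, such as the asyncio REPL (python -m asyncio) or a Jupyter notebook: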

>>> from fusor.tools import check_data_resources
>>> status = await check_data_resources()
>>> assert all(status)  # passes if all resources can be acquired successfully