.. _install:
Installation
============
Prerequisites
-------------
* Python 3.11+
* A UNIX-like environment (e.g. macOS, WSL, Ubuntu)
* A recent version of PostgreSQL (ideally version 11 or later)
* A modern Java runtime (if using DynamoDB for the Gene Normalizer database)
Library installation
--------------------
Install ``FUSOR`` from `PyPI `_:
.. code-block:: shell

    pip install fusor
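
To confirm that the install succeeded, the standard library can report which version is present. The following is a minimal sketch and not part of FUSOR: the helper name ``installed_version`` is made up for illustration, and ``fusor`` is assumed to be the distribution name because that is what the ``pip`` command above installs.

.. code-block:: python

    import sys
    from importlib.metadata import PackageNotFoundError, version


    def installed_version(dist_name: str) -> str | None:
        """Return the installed version of a distribution, or None if absent."""
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None


    if __name__ == "__main__":
        # The prerequisites above call for Python 3.11 or later.
        if sys.version_info < (3, 11):
            print("Python 3.11+ is required")
        print("fusor version:", installed_version("fusor"))
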
Data setup
----------
Universal Transcript Archive (UTA)
++++++++++++++++++++++++++++++++++
The `UTA `_ is a dataset of genome-transcript alignment data distributed as a PostgreSQL database. FUSOR accesses it through ``Cool-Seq-Tool``; see the `Cool-Seq-Tool UTA docs `_ for some opinionated setup instructions.

At runtime, UTA connection information can be passed to FUSOR (by way of ``Cool-Seq-Tool``) either as an initialization argument or via the environment variable ``UTA_DB_URL``. If neither is provided, the connection defaults to ``postgresql://uta_admin:uta@localhost:5432/uta/uta_20210129b``. See the `Cool-Seq-Tool configuration docs `_ for more info.
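
When a connection fails, it can help to see how a given URL breaks down. The sketch below uses only the standard library to split the value of ``UTA_DB_URL`` (or the default quoted above) into its parts. The helper name ``describe_uta_url`` is hypothetical, and reading the final path segment as the schema name is an assumption based on the shape of the default URL.

.. code-block:: python

    import os
    from urllib.parse import urlsplit

    # Default quoted in the text above; used only when UTA_DB_URL is unset.
    DEFAULT_UTA_DB_URL = "postgresql://uta_admin:uta@localhost:5432/uta/uta_20210129b"


    def describe_uta_url(url: str) -> dict[str, str | int | None]:
        """Split a UTA connection URL into its parts, leaving out the password."""
        parts = urlsplit(url)
        # Assumption: the path has the form "/<database>/<schema>".
        database, _, schema = parts.path.lstrip("/").partition("/")
        return {
            "user": parts.username,
            "host": parts.hostname,
            "port": parts.port,
            "database": database,
            "schema": schema or None,
        }


    if __name__ == "__main__":
        print(describe_uta_url(os.environ.get("UTA_DB_URL", DEFAULT_UTA_DB_URL)))
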
SeqRepo
+++++++
`SeqRepo `_ is a controlled dataset of biological sequences. As with UTA, FUSOR accesses it through ``Cool-Seq-Tool``, which provides `documentation `_ on setting it up.
At runtime, the file location of the SeqRepo instance directory can be defined (by way of Cool-Seq-Tool) either as an initialization argument or via the environment variable ``SEQREPO_ROOT_DIR``. By default, it's expected to be ``/usr/local/share/seqrepo/latest``. See the `Cool-Seq-Tool configuration docs `_ for more info.
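
A quick way to see which directory will be used, and whether it exists, is sketched below. It only mirrors the environment-variable lookup described above and does not cover the initialization argument. The helper name ``resolve_seqrepo_dir`` is hypothetical and not part of FUSOR or Cool-Seq-Tool.

.. code-block:: python

    import os
    from collections.abc import Mapping
    from pathlib import Path

    # Default location quoted in the text above.
    DEFAULT_SEQREPO_ROOT_DIR = "/usr/local/share/seqrepo/latest"


    def resolve_seqrepo_dir(env: Mapping[str, str] | None = None) -> Path:
        """Return SEQREPO_ROOT_DIR if it is set, otherwise the documented default."""
        env = os.environ if env is None else env
        return Path(env.get("SEQREPO_ROOT_DIR", DEFAULT_SEQREPO_ROOT_DIR))


    if __name__ == "__main__":
        seqrepo_dir = resolve_seqrepo_dir()
        state = "found" if seqrepo_dir.is_dir() else "missing"
        print(f"SeqRepo directory {seqrepo_dir}: {state}")
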
Gene Normalizer
+++++++++++++++
Finally, ``FUSOR`` uses the `Gene Normalizer `_ to ground gene terms. See the `Gene Normalizer documentation `_ for setup instructions.
Connection information for the normalizer database can be set using the environment variable ``GENE_NORM_DB_URL``. See the `Gene Normalizer docs `_ for more information on connection configuration.
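
Since three separate environment variables are involved, a small script can show which ones are unset before FUSOR starts. This is an illustrative sketch using only the standard library; the helper name ``unset_data_env_vars`` is hypothetical, and an unset variable is not necessarily an error, because an initialization argument or a default may apply instead.

.. code-block:: python

    import os
    from collections.abc import Mapping

    # Environment variables named in the sections above.
    DATA_ENV_VARS = ("UTA_DB_URL", "SEQREPO_ROOT_DIR", "GENE_NORM_DB_URL")


    def unset_data_env_vars(env: Mapping[str, str] | None = None) -> list[str]:
        """List the data-related variables that are missing or empty."""
        env = os.environ if env is None else env
        return [name for name in DATA_ENV_VARS if not env.get(name)]


    if __name__ == "__main__":
        for name in unset_data_env_vars():
            print(f"{name} is not set")
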
Check data availability
+++++++++++++++++++++++
Use the :py:func:`fusor.tools.check_data_resources` function to verify that all data dependencies are available. It is a coroutine, so it must be awaited or run through an event loop:
.. code-block:: pycon

    >>> import asyncio
    >>> from fusor.tools import check_data_resources
    >>> status = asyncio.run(check_data_resources())
    >>> assert all(status)  # passes if all resources can be acquired successfully
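
If the assertion fails, it is useful to know which resource was unavailable. The sketch below assumes the returned status is a named tuple of booleans; the ``all(status)`` call only shows that it is an iterable of truthy or falsy values, so this is a guess. ``ExampleStatus`` and its field names are hypothetical stand-ins, as is the helper ``failed_resources``.

.. code-block:: python

    from typing import NamedTuple


    class ExampleStatus(NamedTuple):
        """Hypothetical stand-in for the value returned by check_data_resources()."""

        uta: bool
        seqrepo: bool
        gene_normalizer: bool


    def failed_resources(status: NamedTuple) -> list[str]:
        """Return the names of the fields whose value is falsy."""
        return [name for name, ok in status._asdict().items() if not ok]


    if __name__ == "__main__":
        example = ExampleStatus(uta=True, seqrepo=False, gene_normalizer=True)
        print("Unavailable:", failed_resources(example))
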