Getting started with Companion

Prepare your sequence

The Companion service requires input data in valid FASTA, GenBank or EMBL format. In files containing both annotations and sequences (such as EMBL and GenBank), only the sequences are used. Any existing annotation is discarded and will not appear in the final result.

Some additional requirements to consider:

the maximum size of an uploaded file is 4000 MB
the maximum number of individual sequences (e.g. scaffolds) in the uploaded file is 3000
consider filtering out very short input sequences, for instance those shorter than 1 kilobase
make sure that the sequence headers do not contain special characters
input sequence headers must be unique up to the first whitespace, because these strings are used as sequence identifiers in the resulting annotation files
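
The requirements above can be checked locally before uploading. The following is a minimal sketch for FASTA input, not part of Companion itself. The function names, the 1 kb cut-off and the pattern used to define "special characters" (anything other than letters, digits, underscore, dot and hyphen) are assumptions made for illustration; the file size limit is not checked here.

```python
import re
from collections import Counter

MAX_SEQUENCES = 3000   # limit on the number of sequences stated above
MIN_LENGTH = 1000      # suggested 1 kb cut-off; adjust as needed
# Assumption: identifiers made of these characters count as "no special characters".
SAFE_ID = re.compile(r"^[A-Za-z0-9_.-]+$")


def read_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    ident, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            # The identifier is the header up to the first whitespace.
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            chunks = []
        elif ident is not None:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)


def check_fasta(text):
    """Return a list of human-readable problems found in the FASTA text."""
    records = list(read_fasta(text))
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}"
        )
    counts = Counter(ident for ident, _ in records)
    for ident, n in counts.items():
        if n > 1:
            problems.append(f"identifier {ident!r} occurs {n} times")
        if not SAFE_ID.match(ident):
            problems.append(f"identifier {ident!r} contains special characters")
    short = sum(1 for _, seq in records if len(seq) < MIN_LENGTH)
    if short:
        problems.append(f"{short} sequences are shorter than {MIN_LENGTH} bp")
    return problems
```

An empty list from check_fasta means none of these checks found a problem. Short sequences are only reported, not removed, since filtering them is a suggestion rather than a requirement.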

Upload and run

Once you have prepared the input sequence on your computer, you can navigate to the Companion job submission page to create a new annotation job. You will need to provide some information:

Assign job name and prefix
The job name is a free-text identifier used by Companion to denote your job. It will not be used in the annotated output and is solely meant to help you distinguish individual runs. In contrast, the species prefix will be used to construct identifiers in the final result, such as pseudochromosome or gene IDs. For example, if you have picked a species prefix of WXYZ, then genes in your annotated genome will be assigned gene IDs prefixed with that string, e.g. WXYZ_00006700, with transcripts called WXYZ_00006700.1, etc. Pseudochromosome IDs constructed by Companion will also be prefixed with this string, e.g. WXYZ_04, WXYZ_IV or WXYZ_00.

If you have already registered a new project with one of the public databases (such as ENA), then the prefix should be the 'locus tag prefix' assigned to you or chosen by you.
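
To make the identifier pattern concrete, here is a small sketch that reproduces the shape of the example IDs for a prefix of WXYZ. It only mirrors the examples given in this section. The eight-digit zero padding is inferred from the single gene ID example, and the helper names are hypothetical; this is not how Companion itself assigns numbers.

```python
def gene_id(prefix, number):
    """Build a gene ID such as WXYZ_00006700 (padding width is an assumption)."""
    return f"{prefix}_{number:08d}"


def transcript_id(prefix, number, isoform=1):
    """Build a transcript ID by appending an isoform suffix to the gene ID."""
    return f"{gene_id(prefix, number)}.{isoform}"


def pseudochromosome_id(prefix, label):
    """Build a pseudochromosome ID from the prefix and a chromosome label."""
    return f"{prefix}_{label}"


print(gene_id("WXYZ", 6700))             # WXYZ_00006700
print(transcript_id("WXYZ", 6700))       # WXYZ_00006700.1
print(pseudochromosome_id("WXYZ", "IV")) # WXYZ_IV
```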
Upload your target sequence
Simply select the sequence to annotate from your local disk. The FASTA, EMBL or GenBank file can be gzip- or bzip2-compressed. If it is a compressed file, it must have a .gz or .bz2 suffix.
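
Before uploading, it can be useful to confirm that the file suffix matches the actual compression and that the content looks like one of the accepted formats. The sketch below uses only the Python standard library. The function names are hypothetical, and the format detection is a rough heuristic based on the first non-empty line (">" for FASTA, "LOCUS" for GenBank, "ID" for EMBL), not the validation Companion performs.

```python
import bz2
import gzip


def open_sequence(path):
    """Open a plain, gzip- or bzip2-compressed file as text, based on its suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def sniff_format(path):
    """Guess the sequence format from the first non-empty line of the file."""
    with open_sequence(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            if line.startswith(">"):
                return "FASTA"
            if line.startswith("LOCUS"):
                return "GenBank"
            if line.startswith("ID "):
                return "EMBL"
            return "unknown"
    return "unknown"
```

If the suffix does not match the real compression (for example, an uncompressed file named with .gz), reading the file raises an OSError, which is a quick signal that the file should be renamed or recompressed.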
Upload transcriptomics evidence (optional)
The accuracy of de novo gene annotation (i.e. of genes with no reference counterpart) can be improved by adding extrinsic evidence to guide the process. Companion can use assembled transcripts in GTF format, for example as produced by Cufflinks, to improve the results. If you have prepared such a file, you can upload it to be used in gene finding (maximum size 4000 MB). Please do not upload raw reads or alignments. The transcripts must be assembled in the coordinate space of your uploaded sequences, which is the case, for example, if you have run Cufflinks against the sequence you are submitting.
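
One simple way to spot a GTF file that was assembled against a different sequence set is to compare the sequence names in its first column with the identifiers in the FASTA file. The following sketch assumes FASTA input and a tab-separated GTF file; the function names are hypothetical. It does not check that coordinates fall within the sequence lengths.

```python
def gtf_seqnames(gtf_text):
    """Collect the sequence names from the first column of a GTF file."""
    names = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0])
    return names


def fasta_ids(fasta_text):
    """Collect FASTA identifiers (header text up to the first whitespace)."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unknown_seqnames(gtf_text, fasta_text):
    """Return GTF sequence names that have no matching FASTA identifier."""
    return sorted(gtf_seqnames(gtf_text) - fasta_ids(fasta_text))
```

An empty result suggests that the transcripts refer to the same sequence names as the upload. Any names returned point to transcripts assembled against other sequences.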
Select a reference
Companion will try to transfer information, such as gene structure or product information, from highly conserved orthologs in the reference genome. In addition, predictor models trained on the genes of the reference will be used for de novo gene finding. We currently offer 434 reference data sets across many parasites and related species, imported from GeneDB and EuPathDB.
Check whether contiguation is needed
It might be helpful to order and orient the input sequences according to the reference chromosomes, if they are known. This makes it possible to quickly check for structural variants, allows gene IDs to be numbered by chromosome, and helps to create useful comparative graphs.

Companion uses ABACAS2, a successor to the original ABACAS tool, to perform this contiguation step. It will create new 'pseudochromosome' sequences as well as layout files describing how the input sequences were assembled into pseudochromosomes. All unassembled input sequences, e.g. scaffolds, will be concatenated into a single 'bin' pseudochromosome. This results in a manageable number of sequences even when faced with hundreds to thousands of input sequences.
Submit!
After reviewing all chosen parameters, just click the 'Submit' button and your files will be uploaded and validated. When all information is confirmed to be OK, your job will be enqueued in the system.

Wait for your job to finish

After a job has been successfully enqueued, the system will assign your job an alphanumeric ID (e.g. 258ea8f6468bb5809c5056ac) and a URL to check your job status (e.g. https://companion.gla.ac.uk/jobs/258ea8f6468bb5809c5056ac). If no other jobs have been enqueued before yours, it will start processing right away, and you can visit the URL regularly to check on its progress. A job can be in one of the following states:

Not started yet: your job is waiting for execution and will start as soon as the previous ones have finished
Working: your job is currently being executed on the server
Failed: there has been a problem while running your annotation job
Completed: the job has finished and results are ready

If you provided a notification email address at the time of submission, you will also receive emails when your job starts and when it has finished, together with links to the results page.
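
If you prefer to check progress from a script rather than by reloading the page, the waiting logic can be sketched as below. This section does not describe a machine-readable status interface, so the sketch takes a caller-supplied get_state function that returns one of the four state names listed above. How that function obtains the state is left open. All names here are hypothetical, and the URL pattern is taken from the example above.

```python
import time

# Job states after which no further change is expected.
TERMINAL_STATES = {"Failed", "Completed"}


def job_url(job_id):
    """Build the status URL for a job ID, following the example URL above."""
    return f"https://companion.gla.ac.uk/jobs/{job_id}"


def wait_for_job(get_state, interval=60.0, max_checks=1000, sleep=time.sleep):
    """Call get_state() repeatedly until the job has failed or completed.

    get_state is a caller-supplied function (an assumption of this sketch)
    that returns the current state name as a string.
    """
    for _ in range(max_checks):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval)
    raise TimeoutError("job did not reach a terminal state")
```

A generous interval is a sensible default, since annotation jobs may wait in the queue behind other jobs before they start.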

Inspect your results

When the job has finished successfully, the URL you were given by the system will point to the results page, which will allow you to download and browse the results of your annotation job. Please take a look at the example results page to get an idea of what the results will contain.
