Compartmentalized Assembler

Compartmentalized Assembler is a novel method for assembling high-quality physical maps from fingerprinted clones. The method exploits genetic markers at the genomic level, which allow us to pre-cluster the clones. A local physical map is first constructed for each cluster of clones, and the individual maps are then carefully merged into the final physical map.

The compartmentalized assembler produces significantly more accurate maps, and it can detect and isolate clones that would induce “chimeric” contigs if used in the final assembly.

How to use Compartmentalized Assembler

Obtain the tool as described in the Download section below.

Here, we use DIR to represent the absolute path where the tool is downloaded.

From DIR, type ‘tar xvfz DIR/compartmentalized_assembler.tar.gz’ to uncompress the package into DIR/compartmentalized_assembler.

Below are the steps/modules to generate a compartmentalized map.

1. Pre-clustering

In this module, clones are clustered based on hybridization fingerprint data. To compile the pre-clustering module:

1. Open a terminal window
2. Go into compartmentalized_assembler/src/pre-clustering by typing ‘cd DIR/compartmentalized_assembler/src/pre-clustering/’
3. Type ‘make’

This will generate a program called ‘cluster_clones’.

The parameters of cluster_clones are as follows:

-c pool_cloneset_file
-p pool_pool_size_file
-t threshold
-o output_file (the cluster_to_clones table)

pool_cloneset_file contains the pool-clone association (i.e., list of clones that are identified in an oligonucleotide pool)

The format of pool_cloneset_file is as follows. Click for an example.

pool name <TAB> #clones <TAB> Clone names separated by SPACE
pool name <TAB> #clones <TAB> Clone names separated by SPACE
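
Because the second field repeats information that is already present in the third, a quick consistency check before clustering can catch hand-editing mistakes. The following is a minimal sketch, not part of the tool: check_pool_cloneset is a hypothetical helper that assumes the format is exactly as described above (three TAB-separated fields, clone names separated by single SPACEs).

```shell
# Hypothetical helper (not shipped with Compartmentalized Assembler).
# Checks that each line has 3 TAB-separated fields and that the declared
# clone count (field 2) equals the number of names listed in field 3.
check_pool_cloneset() {
    awk -F '\t' '
        NF != 3 {
            printf "line %d: expected 3 TAB-separated fields, found %d\n", NR, NF
            bad = 1
            next
        }
        {
            n = split($3, names, " ")   # count SPACE-separated clone names
            if ($2 != n) {
                printf "line %d: pool %s declares %s clones but lists %d\n", NR, $1, $2, n
                bad = 1
            }
        }
        END { exit bad + 0 }            # exit status 1 if any line was inconsistent
    ' "$1"
}
```

Calling ‘check_pool_cloneset my_pools.txt’ (my_pools.txt being a placeholder name) prints one message per inconsistent line and returns a non-zero status if any were found.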

pool_pool_size_file contains the number of probes in each oligonucleotide pool.

The format of pool_pool_size_file is as follows. Click for an example.

pool name <TAB> pool size
pool name <TAB> pool size
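
The two input files are linked by pool name, so it can be worth confirming that they agree before running the clustering. The sketch below rests on an assumption the text implies but does not state: that every pool named in pool_cloneset_file should also have an entry in pool_pool_size_file. pools_missing_size is a hypothetical helper, not part of the tool.

```shell
# Hypothetical helper (not shipped with Compartmentalized Assembler).
# $1: pool_pool_size_file, $2: pool_cloneset_file
# Prints every pool name found in the cloneset file that has no size entry.
pools_missing_size() {
    awk -F '\t' '
        FNR == NR { size[$1] = $2; next }   # first file: remember each pool size
        !($1 in size) { print $1 }          # second file: pool without a size entry
    ' "$1" "$2"
}
```

Empty output means every pool in the cloneset file has a recorded size. Note that the idiom used here assumes the size file is not empty.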

threshold is the clustering threshold (between 0 and 2; 0 is the least stringent, 2 is the most stringent). In our work, a threshold value around 1.5 gave good results.

output_file (the cluster_to_clones table) stores the output of the pre-clustering.
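
Putting the four parameters together, an invocation might be wrapped as in the sketch below. The flags -c, -p, -t, and -o are the ones listed above; the wrapper names threshold_in_range and run_cluster_clones are hypothetical, the wrapper assumes it is run from the directory where ‘make’ produced cluster_clones, and it assumes each flag is followed by its value as a separate argument.

```shell
# Sketch only: the wrapper functions are illustrative, not part of the tool.
threshold_in_range() {
    # Succeeds when $1 is a plain decimal number between 0 and 2 inclusive.
    awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]+(\.[0-9]+)?$/ && t >= 0 && t <= 2) }'
}

run_cluster_clones() {
    # $1 pool_cloneset_file, $2 pool_pool_size_file, $3 threshold, $4 output_file
    threshold_in_range "$3" || {
        echo "threshold must be a number between 0 and 2" >&2
        return 1
    }
    ./cluster_clones -c "$1" -p "$2" -t "$3" -o "$4"
}
```

For example, ‘run_cluster_clones pools.txt pool_sizes.txt 1.5 clusters.txt’ would run the clustering with the threshold that worked well in our experience; all three file names are placeholders.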

The output format is as follows:

cluster ID <TAB> #clones <TAB> Clone names in the cluster separated by SPACE
cluster ID <TAB> #clones <TAB> Clone names in the cluster separated by SPACE
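
To get a feel for how a given threshold partitions the clones, the cluster table can be summarized with a short awk call. This is an illustrative sketch under the format described above; summarize_clusters is a hypothetical helper. It counts clone memberships rather than distinct clones, since a clone may appear in more than one cluster when soft pre-clustering is used.

```shell
# Hypothetical helper (not shipped with Compartmentalized Assembler).
# $1: cluster_to_clones table produced by cluster_clones
summarize_clusters() {
    awk -F '\t' '
        {
            total += $2                          # field 2 is the clone count
            if ($2 > max) { max = $2; biggest = $1 }
        }
        END {
            printf "clusters=%d memberships=%d largest=%s(%d)\n", NR, total, biggest, max
        }
    ' "$1"
}
```

Comparing this summary across a few threshold values is one simple way to see whether a setting produces many tiny clusters or a few very large ones.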

The output of pre-clustering on the sample files above with the threshold of 1.3 can be downloaded here.

NOTE: The format of the output file is identical to that of pool_cloneset_file. If you would like to perform soft pre-clustering on your dataset, you can skip this step and use pool_cloneset_file as the input for the next module (i.e., as if it were the output of the pre-clustering module).

2. Generating SIZE files

You need to generate the restriction fingerprint file (i.e., size file in FPC format) for the clones in each cluster generated in the previous step.

To do this:

1. Open a terminal window
2. Type ‘cd DIR/compartmentalized_assembler/src/generating_size_files’
3. Type ‘perl GenerateSizeFilesPerPool.pl <cluster_to_clones_table> <size_file>’, in which <cluster_to_clones_table> is the cluster file generated in the previous step and <size_file> is the size file that contains the restriction fingerprint data of all clones. This will generate a ‘.sizes’ file for each cluster. Click here for a sample size file, which contains the band sizes for the clones in the sample cluster_to_clones_table mentioned in the previous step.
4. Type ‘mv *.sizes DIR/compartmentalized_assembler/demo/clone_set_size_files’ to move all the size files to the demo project.
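
The move in the last step fails quietly in scripts if the Perl script produced no output, so a scripted version may want to count the files as it moves them. The sketch below is one way to do that; collect_size_files is a hypothetical helper, and it assumes the ‘.sizes’ files were written to the directory given as its first argument.

```shell
# Hypothetical helper (not shipped with Compartmentalized Assembler).
# $1: directory containing the generated .sizes files
# $2: destination, e.g. DIR/compartmentalized_assembler/demo/clone_set_size_files
# Prints the number of files moved; fails if there were none.
collect_size_files() {
    src=$1
    dest=$2
    count=0
    for f in "$src"/*.sizes; do
        [ -e "$f" ] || continue              # the glob matched nothing
        mv "$f" "$dest"/ && count=$((count + 1))
    done
    echo "$count"
    [ "$count" -gt 0 ]
}
```

Since one ‘.sizes’ file is generated per cluster, the printed count can be compared with the number of lines in the cluster_to_clones table as a sanity check.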

3. Updating the configuration file

To change the parameters of the compartmentalized assembler (such as project name, tolerance, cutoff, PATH):

1. Open a terminal window
2. Type ‘cd DIR/compartmentalized_assembler/demo’
3. Open ‘configuration.txt’, modify the parameters accordingly, save, and close it.

You can modify FPC parameters such as tolerance, cutoff, and fromEnd (see the FPC documentation for more information); the merge-similar-contigs parameters similarity_project_threshold[_2,_3]; the project name (project_name); and the path to the binary files of our tool (programs_path).

4. Updating the ‘gellen’ parameter

Depending on the restriction fingerprinting technology (i.e., HICF or agarose), you might need to set a different gellen parameter (see the FPC documentation for more information about the gellen parameter). The default value is 3300. If you need to modify this value, follow the steps below.

1. Open a terminal window
2. Type ‘cd DIR/compartmentalized_assembler/demo/sample_project’
3. Type ‘fpc sample_project.fpc’ (assuming that the path to fpc is in your PATH environment variable; otherwise, open FPC and load sample_project.fpc). sample_project.fpc is a typical physical map with no clones. It is used as the seed map in the physical mapping step described below.
4. Open the configuration panel, modify the gellen value, save, and close the project.

5. Physical mapping

Physical mapping is the second major module of our tool. Steps 2-4 generate some of the parameters and files needed for this step.

To perform physical mapping:

1. Open a terminal window
2. Type ‘cd DIR/compartmentalized_assembler/demo’
3. Type ‘./generateInitialMaps clone_set_size_files’, which will generate an initial physical map for each cluster. (NOTE: If generateInitialMaps is not executable, type ‘chmod 755 generateInitialMaps’ to make it executable.)
4. Type ‘perl Build_Assembly_HYB.pl’ (if clusters are disjoint) or ‘./Build_Assembly_sHYB.pl’ (if clusters are not disjoint, i.e., soft pre-clustering is used)

IMPORTANT NOTE: Please make sure that fpc is in your PATH environment variable. Otherwise this step will not work properly.

In this step, several intermediate FPC files will be generated, and the final assembly will be saved as ‘<Project_Name>_FINAL.FPC’ in ‘DIR/compartmentalized_assembler/demo’, where <Project_Name> is the name of the project as specified in configuration.txt.
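
The physical mapping steps, including the PATH check from the note above, can be collected into a small driver. The following is a sketch rather than a script shipped with the tool: require_on_path and run_physical_mapping are hypothetical names, and the "hard"/"soft" argument is an illustrative convention for choosing between the two Build_Assembly scripts named in step 4.

```shell
# Sketch only: function names and the hard/soft argument are illustrative.
require_on_path() {
    # Succeeds when the command named in $1 can be found via PATH.
    command -v "$1" >/dev/null 2>&1 || {
        echo "$1 not found in PATH" >&2
        return 1
    }
}

# $1: DIR (where the package was unpacked)
# $2: "soft" if soft pre-clustering was used; anything else means disjoint clusters
# The body runs in a subshell so the caller's working directory is unchanged.
run_physical_mapping() (
    require_on_path fpc || exit 1
    cd "$1/compartmentalized_assembler/demo" || exit 1
    [ -x generateInitialMaps ] || chmod 755 generateInitialMaps
    ./generateInitialMaps clone_set_size_files || exit 1
    if [ "$2" = "soft" ]; then
        ./Build_Assembly_sHYB.pl
    else
        perl Build_Assembly_HYB.pl
    fi
)
```

Checking for fpc first means the run stops with a clear message instead of failing partway through the intermediate FPC files.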

Download

To obtain Source & Demo, please contact Serdar Bozdag

System Requirements

Linux/Unix, Mac OSX
BioPerl (http://www.bioperl.org)
FPC (http://www.agcol.arizona.edu/software/fpc/)
Boost C++ Library (http://www.boost.org)

Rice Compartmentalized Maps

Rice-HYB, Rice-sHYB, Rice-RAND, Rice-REST maps can be downloaded here.

Copyright

All rights reserved. Compartmentalized map is free for academic use only. It should not be redistributed or used for any commercial purpose without written permission from the authors.

For any questions please contact Serdar Bozdag

How to cite

If you use our tool in your project, please cite one of the following publications.

S. Bozdag, T.J. Close and S. Lonardi, “A Compartmentalized Approach to the Assembly of Physical Maps”, BMC Bioinformatics 2009, 10:217

S. Bozdag, T.J. Close and S. Lonardi, “A Compartmentalized Approach to the Assembly of Physical Maps”, Proceedings of IEEE International Symposium on Bioinformatics & Bioengineering (BIBE’07), pp. 218-225, Boston, MA, 2007