Customization
This section describes the options available for customizing Clover to suit your needs.
Startup Arguments
-I [input_file] Input file
-O [output_file_name] Output file name
-L [int] Length of read
-P [int] Select the process mode
-T [int] The total number of clusters in the file
-D [int] Depth of the tree. Higher values improve clustering accuracy but greatly increase memory usage.
-V [int] Vertical drift value. Higher values improve accuracy but increase the elapsed time.
-H [int] Horizontal drift value. Values that are too large or too small reduce accuracy.
--no-tag Indicates that the input sequences are unlabeled. Be sure to add this option when you use Clover for sequence clustering.
--no-fast Reduces memory usage at the cost of a longer running time. Add this option if you do not have enough memory.
--low Enables the lowest memory usage mode, intended for clustering very large files (100 million+ sequences). This mode implies --no-tag and --no-fast and runs multiple processes in parallel, each writing its own output file; you can merge these files yourself.
--align Enables the global comparison feature. Turning it on improves the clustering results but seriously slows down clustering. You can customize the global matching algorithm: replace the align.py file and change 'now_align_alg' to true in config.json.
--stat Enables statistical mode, which can be turned on in --no-tag mode. After clustering finishes, statistical mode reports feature statistics for the input file. Instructions for using statistical mode will be provided after the paper is accepted.
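To make the relationship between the flags concrete, here is a minimal sketch of a parser for a few of the arguments listed above, including the documented rule that --low defaults to --no-tag and --no-fast. It is illustrative only: the function names and the destination names (input_file, depth, and so on) are assumptions, and this is not Clover's actual argument handling.

```python
import argparse


def build_parser():
    # Illustrative parser that mirrors a subset of the documented flags.
    # It is NOT Clover's real parser; all dest names are hypothetical.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-I", dest="input_file")
    parser.add_argument("-O", dest="output_file_name")
    parser.add_argument("-L", dest="read_len", type=int)
    parser.add_argument("-D", dest="depth", type=int)
    parser.add_argument("--no-tag", action="store_true")
    parser.add_argument("--no-fast", action="store_true")
    parser.add_argument("--low", action="store_true")
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    # Documented behavior: --low defaults to --no-tag and --no-fast.
    if args.low:
        args.no_tag = True
        args.no_fast = True
    return args


args = parse_args(["-I", "reads.txt", "-O", "clusters", "-L", "152", "--low"])
print(args.no_tag, args.no_fast)
```

With --low on the command line, both no_tag and no_fast end up enabled even though neither flag was passed explicitly.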
Customize Config
You can also modify config_dict in load_config.py for further customization.
read_len
number
Length of read
Default: 152
end_tree_len
number
Depth of the tree at both ends
Default: 15
other_tree_len
number
Depth of other trees
Default: 15
other_tree_nums
number
Number of other trees
Default: 2
thd_tree_loc
number
Location of the third tree
Default: 40
four_tree_loc
number
Location of the fourth tree
Default: 40
Vertical_drift
number
Vertical drift value
Default: 2
Horizontal_drift
number
Horizontal drift value
Default: 3
tree_threshold
number
Number of tree drift retrievals
Default: 10
now_clust_threshold
number
Drift threshold for new clusters
Default: 8
tag_nums
number
Number of clusters
Default: 1
processes_nums
number
Process mode
Default: 0
Cluster_size_threshold
number
Minimum number of elements in a cluster
Default: 1
h_index_nums
number
Number of bases in the leading primer
Default: 0
e_index_nums
number
Number of bases in the trailing primer
Default: 0
read_len_min
number
Minimum processing length of read
Default: 0
align_fuc
boolean
Global Matching Mode
Default: false
mmr_mode
boolean
Minimum memory usage mode
Default: false
Virtual_mode
boolean
Input file with tags or not
Default: true
fast_mode
boolean
High-speed mode (in this mode, the program will read the file into memory first)
Default: true
tag_mode
boolean
Whether to enter a tag
Default: false
Statistical_model
boolean
Feature statistics model
Default: false
same_tree_len
boolean
Whether all trees have the same length. Modify this parameter if you want to customize the trees in the middle.
Default: false
now_align_alg
boolean
Whether to replace the global comparison algorithm. Modify this parameter if you supply your own algorithm.
Default: false
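As a rough illustration of how a few of the options above might be overridden, the sketch below copies some of the documented defaults into a dictionary and applies overrides with basic checks. DEFAULTS and with_overrides are hypothetical names introduced for this example; the real config_dict lives in load_config.py and may be structured differently.

```python
# Subset of the documented defaults, copied from the list above.
# DEFAULTS is a hypothetical stand-in for config_dict in load_config.py.
DEFAULTS = {
    "read_len": 152,
    "end_tree_len": 15,
    "other_tree_len": 15,
    "other_tree_nums": 2,
    "Vertical_drift": 2,
    "Horizontal_drift": 3,
    "fast_mode": True,
    "now_align_alg": False,
}


def with_overrides(defaults, **overrides):
    """Return a new dict with overrides applied; reject unknown keys and type changes."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    for key, value in overrides.items():
        # A number option must stay a number, a boolean must stay a boolean.
        if type(value) is not type(defaults[key]):
            raise TypeError(f"{key} expects {type(defaults[key]).__name__}")
    return {**defaults, **overrides}


config = with_overrides(DEFAULTS, read_len=100, fast_mode=False)
print(config["read_len"], config["fast_mode"])
```

Returning a new dictionary rather than editing the defaults in place keeps the documented values available for comparison.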
Customize Align
You can customize align.py to change the global matching algorithm. We recommend tuning the global matching algorithm for better results before you turn on the global matching feature.
A custom algorithm must take two sequences as input and return a list of tuples. Each tuple contains two elements: the position of a mismatch and the base at that position in read_2.
Note: After modifying the global matching algorithm, set now_align_alg in config_dict in load_config.py to True.
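The input and output format described above can be sketched as follows. This is a deliberately simple position-by-position comparison, not a true global alignment, and it ignores any bases beyond the end of the shorter sequence. The function name align and its parameter names are assumptions; check align.py for the name Clover actually expects.

```python
def align(read_1, read_2):
    """Return a list of (position, base_in_read_2) tuples, one per mismatch.

    Hypothetical example: compares the two reads position by position
    over their common length only.
    """
    return [
        (pos, base_2)
        for pos, (base_1, base_2) in enumerate(zip(read_1, read_2))
        if base_1 != base_2
    ]


print(align("ACGTAC", "ACCTAA"))  # → [(2, 'C'), (5, 'A')]
```

A real replacement would typically handle insertions and deletions as well; the sketch only shows the shape of the return value.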
Customize Tree
You can modify tree.py for further customization. For example, you can modify dna_dict in the trie class so that Clover can handle DNA sequences that contain bases other than A, T, G, and C. Each key in the dictionary must be a base type and each value must be a natural number starting from 0. The order of the entries does not matter.
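As an illustration of the dictionary format, the sketch below adds an ambiguous base N to the four standard bases and checks the mapping. The check assumes that "a natural number starting from 0" means the values are the consecutive integers 0 to n-1 with no repeats; that reading, and the helper name check_dna_dict, are assumptions rather than something stated by Clover.

```python
# Example mapping: base type -> index. The order of the keys is arbitrary.
dna_dict = {"A": 0, "T": 1, "G": 2, "C": 3, "N": 4}


def check_dna_dict(mapping):
    """Return True if the values are exactly 0, 1, ..., len(mapping) - 1.

    Hypothetical helper; assumes the indices must be consecutive and unique.
    """
    return sorted(mapping.values()) == list(range(len(mapping)))


print(check_dna_dict(dna_dict))
```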