selectseq is a command-line utility for selecting biological sequences from a FASTA or FASTQ file. Given a list of identifiers, it returns only the matching subset of sequences, or the complement (i.e., the sequences NOT in the list). It can also return only sequence number N. Compressed sequence files are supported if they are readable by zcat.
Features
- collect only some sequences out of a large FASTA or FASTQ file
- get sequence number N only, regardless of ID
- complement mode: return all sequences that are NOT in the list of IDs
- "matching" mode: choose which part (between | characters) of the ID should match
- sequence names provided one per line in a text file (first word in line used, or whatever is given to the -k option)
- the > and @ symbols are ignored if present at the beginning of IDs in the list (useful when the list was copied from FASTA or FASTQ identifiers)
- if only one sequence is needed, its ID can be given directly to the -l option (no file needed)
- add a suffix to IDs before searching (useful when IDs come from proteins that have _1 in the ID, but genes do not)
- compressed sequence database files (-s) are supported
- quiet mode: output only important warnings and errors
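The selection behaviour listed above can be illustrated with a short Python sketch. This is not selectseq's own code, and it handles FASTA only. The function names (`read_fasta`, `load_ids`, `key_of`, `select`, `nth`), the 0-based `field` index for the part between | characters, and the sample data are all assumptions made for illustration.

```python
import io
from itertools import islice

# Hypothetical sample data, used only for this illustration.
SAMPLE = (
    ">sp|P12345|PROT_A first\nACGT\nAC\n"
    ">sp|Q99999|PROT_B second\nGGCC\n"
    ">plain_1\nTTTT\n"
)

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def load_ids(lines, suffix=""):
    """Use the first word of each line, drop one leading > or @, append suffix."""
    ids = set()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        word = words[0]
        if word[0] in ">@":
            word = word[1:]
        if word:
            ids.add(word + suffix)
    return ids

def key_of(header, field=None):
    """Return the ID to match: the first word, or one |-separated part of it."""
    words = header.split()
    name = words[0] if words else ""
    if field is None:
        return name
    parts = name.split("|")
    return parts[field] if field < len(parts) else ""

def select(records, ids, complement=False, field=None):
    """Yield records whose key is in ids, or not in ids when complement is True."""
    for header, seq in records:
        if (key_of(header, field) in ids) != complement:
            yield header, seq

def nth(records, n):
    """Return record number n (1-based) regardless of its ID, or None."""
    return next(islice(records, n - 1, n), None)

records = list(read_fasta(io.StringIO(SAMPLE)))
wanted = load_ids([">P12345 some note", "@Q00000"])
for header, seq in select(records, wanted, field=1):
    print(header, seq)
```

Keeping the IDs in a set makes each lookup constant-time, so one pass over a large sequence file is enough whatever the length of the ID list.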
Categories
Bio-Informatics
License
GNU General Public License version 3.0 (GPLv3)