https://github.com/seq-lang/seq

seq-lang / seq (Public)

A high-performance, Pythonic language for bioinformatics. seq-lang.org

Apache-2.0 License · 484 stars · 34 forks · 13 issues · 8 pull requests · 11 branches · 25 tags · default branch: develop
Latest commit: arshajii, "Fix deps repos" (322d9ef), Sep 8, 2021. Commit note: "These will be updated in the next release." 2,638 commits in total.

| Name | Latest commit message | Commit time |
| --- | --- | --- |
| .github | Fix exit code on sys.exit() call | Jul 27, 2021 |
| compiler | Fix CFG | Aug 29, 2021 |
| docs | Bump version | Aug 23, 2021 |
| runtime | Fix Tuple.N0 scoring | Jul 29, 2021 |
| scripts | Fix deps repos | Sep 7, 2021 |
| stdlib | Annotate List.extend argument type | Aug 5, 2021 |
| test | Annotate List.extend argument type | Aug 5, 2021 |
| util | clang-format and trim whitespace [ci skip] | Oct 14, 2020 |
| .clang-format | clang-format | Apr 5, 2021 |
| .gitattributes | Use zlib for file IO | Sep 9, 2019 |
| .gitignore | Ignore deps directory | Jan 23, 2021 |
| CMakeLists.txt | Fix -mavx flag usage | Aug 23, 2021 |
| CODEOWNERS | Update CODEOWNERS | Feb 19, 2021 |
| CONTRIBUTING.md | Update CONTRIBUTING.md [ci skip] | Jan 23, 2020 |
| LICENSE | Re-license to Apache | Oct 24, 2019 |
| README.md | Fix README logo | Jul 12, 2021 |

README.md

# Seq -- a language for bioinformatics

## Introduction

> **A strongly-typed and statically-compiled high-performance Pythonic language!**

Seq is a programming language for computational genomics and bioinformatics. With a Python-compatible syntax and a host of domain-specific features and optimizations, Seq makes writing high-performance genomics software as easy as writing Python code, and achieves performance comparable to (and in many cases better than) C/C++.

Think of Seq as a strongly-typed and statically-compiled Python: all the bells and whistles of Python, boosted with a strong type system, without any performance overhead.

Seq is able to outperform Python code by up to 160x.
Seq can further beat equivalent C/C++ code by up to 2x without any manual intervention, and also natively supports parallelism out of the box. Implementation details and benchmarks are discussed in our paper.

Learn more by following the tutorial or from the cookbook.

## Examples

Seq is a Python-compatible language, and the vast majority of Python programs should work without any modifications:

```python
def check_prime(n):
    if n > 1:
        for i in range(2, n):
            if n % i == 0:
                return False
        return True
    else:
        return False

n = 1009
print n, 'is', 'a' if check_prime(n) else 'not a', 'prime'
```

Here is an example showcasing Seq's bioinformatics features:

```python
s = s'ACGTACGT'    # sequence literal
print s[2:5]       # subsequence
print ~s           # reverse complement
kmer = Kmer[8](s)  # convert to k-mer
K2 = Kmer[2]       # type definition

# iterate over length-3 subsequences
# with step 2
for sub in s.split(3, step=2):
    print sub[-1]  # last base

    # iterate over 2-mers with step 1
    for kmer in sub.kmers[K2](step=1):
        print ~kmer  # '~' also works on k-mers
```

Seq provides native sequence and k-mer types, e.g. an 8-mer is represented by `Kmer[8]` as above.

Here is a more complex example that counts occurrences of subsequences from a FASTQ file (`argv[2]`) in sequences obtained from a FASTA file (`argv[1]`) using an FM-index:

```python
from sys import argv
from bio.fmindex import FMIndex
fmi = FMIndex(argv[1])
k, step, n = 20, 20, 0

def add(count: int):
    global n
    n += count

@prefetch
def search(s: seq, fmi: FMIndex):
    intv = fmi.interval(s[-1])
    s = s[:-1]  # trim last base
    while s and intv:
        # backwards-extend intv
        intv = fmi[intv, s[-1]]
        s = s[:-1]  # trim last
    # return count of occurrences
    return len(intv)

FASTQ(argv[2]) |> seqs |> split(k, step) |> search(fmi) |> add
print 'total:', n
```

The `@prefetch` annotation tells the compiler to perform a coroutine-based pipeline transformation to make the FM-index queries faster, by overlapping the cache miss latency from one query with other useful work.
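
The overlapping idea can be illustrated in plain Python. The following is only a rough sketch under stated assumptions, not how the Seq compiler implements `@prefetch`: `TinyFMIndex`, `extend`, and `run_interleaved` are hypothetical names invented for this example, the index is built naively for toy inputs, and Python cannot issue hardware prefetches, so each `yield` merely marks the point where a compiled coroutine would prefetch and suspend while other queries make progress.

```python
from collections import deque


class TinyFMIndex:
    """Naive FM-index for short texts (hypothetical stand-in, not bio.fmindex.FMIndex)."""

    def __init__(self, text):
        text += "$"  # sentinel, sorts before A/C/G/T
        # Suffix array by sorting suffixes directly; fine for toy inputs only.
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        self.bwt = "".join(text[i - 1] for i in sa)
        # first[c] = number of characters in the text strictly smaller than c
        self.first = {}
        for idx, ch in enumerate(sorted(text)):
            self.first.setdefault(ch, idx)

    def interval(self, ch):
        """Half-open suffix-array interval of suffixes starting with ch."""
        if ch not in self.first:
            return (0, 0)
        lo = self.first[ch]
        return (lo, lo + self.bwt.count(ch))

    def extend(self, intv, ch):
        """Backward-extend intv by prepending ch to the matched pattern."""
        if ch not in self.first:
            return (0, 0)
        lo, hi = intv
        base = self.first[ch]
        return (base + self.bwt.count(ch, 0, lo), base + self.bwt.count(ch, 0, hi))


def search(s, fmi):
    """Backward search written as a generator; assumes s is non-empty."""
    intv = fmi.interval(s[-1])
    s = s[:-1]
    while s and intv[0] < intv[1]:
        yield  # suspension point: a compiled version would prefetch here
        intv = fmi.extend(intv, s[-1])
        s = s[:-1]
    return intv[1] - intv[0]  # number of occurrences


def run_interleaved(queries, fmi, width=4):
    """Round-robin over up to `width` in-flight searches, one step at a time."""
    results = {}
    pending = deque(enumerate(queries))
    active = deque()
    while pending or active:
        while pending and len(active) < width:
            i, q = pending.popleft()
            active.append((i, search(q, fmi)))
        i, task = active.popleft()
        try:
            next(task)                 # advance this query by one index access
            active.append((i, task))   # then let the other queries run
        except StopIteration as done:
            results[i] = done.value    # generator finished; keep its count
    return [results[i] for i in range(len(queries))]


fmi = TinyFMIndex("ACGTACGTAC")
print(run_interleaved(["ACG", "GTA", "TTT"], fmi))  # prints [2, 2, 0]
```

In this sketch the scheduler gains nothing in speed; it only shows the control flow in which one query's wait is filled with steps from other queries.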
In practice, the single `@prefetch` line can provide a 2x performance improvement.

## Install

### Pre-built binaries

Pre-built binaries for Linux and macOS on x86_64 are available alongside each release. We also have a script for downloading and installing pre-built versions:

```bash
/bin/bash -c "$(curl -fsSL https://seq-lang.org/install.sh)"
```

### Build from source

See Building from Source.

## Documentation

Please check docs.seq-lang.org for in-depth documentation.

## Citing Seq

If you use Seq in your research, please cite:

> Ariya Shajii, Ibrahim Numanagic, Riyadh Baghdadi, Bonnie Berger, and Saman Amarasinghe. 2019. Seq: a high-performance language for bioinformatics. Proc. ACM Program. Lang. 3, OOPSLA, Article 125 (October 2019), 29 pages. DOI: https://doi.org/10.1145/3360551

BibTeX:

```bibtex
@article{Shajii:2019:SHL:3366395.3360551,
 author = {Shajii, Ariya and Numanagi\'{c}, Ibrahim and Baghdadi, Riyadh and Berger, Bonnie and Amarasinghe, Saman},
 title = {Seq: A High-performance Language for Bioinformatics},
 journal = {Proc. ACM Program. Lang.},
 issue_date = {October 2019},
 volume = {3},
 number = {OOPSLA},
 month = oct,
 year = {2019},
 issn = {2475-1421},
 pages = {125:1--125:29},
 articleno = {125},
 numpages = {29},
 url = {http://doi.acm.org/10.1145/3360551},
 doi = {10.1145/3360551},
 acmid = {3360551},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Python, bioinformatics, computational biology, domain-specific language, optimization, programming language},
}
```

About

- Topics: python, programming-language, bioinformatics, compiler, genomics, computational-biology, domain-specific-language
- Releases: 25 (latest: v0.10.3, Aug 23, 2021)
- Contributors (9): @arshajii, @inumanag, @jordanwatson1, @glram, @traviscibot, @jodiew, @markhend, @ghuls, @matthewha123
- Languages: C++ 95.6%, OCaml 1.4%, Python 1.2%, JavaScript 0.5%, CMake 0.5%, Shell 0.4%, Other 0.4%