https://github.com/seq-lang/seq

seq-lang / seq Public

A high-performance, Pythonic language for bioinformatics

seq-lang.org
Apache-2.0 License
484 stars 34 forks

Latest commit

322d9ef by @arshajii, Sep 8, 2021: Fix deps repos.
These will be updated in the next release.

Git stats

  * 2,638 commits

Files

Name             Latest commit message                       Commit time
.github          Fix exit code on sys.exit() call            Jul 27, 2021
compiler         Fix CFG                                     Aug 29, 2021
docs             Bump version                                Aug 23, 2021
runtime          Fix Tuple.N0 scoring                        Jul 29, 2021
scripts          Fix deps repos                              Sep 7, 2021
stdlib           Annotate List.extend argument type          Aug 5, 2021
test             Annotate List.extend argument type          Aug 5, 2021
util             clang-format and trim whitespace [ci skip]  Oct 14, 2020
.clang-format    clang-format                                Apr 5, 2021
.gitattributes   Use zlib for file IO                        Sep 9, 2019
.gitignore       Ignore deps directory                       Jan 23, 2021
CMakeLists.txt   Fix -mavx flag usage                        Aug 23, 2021
CODEOWNERS       Update CODEOWNERS                           Feb 19, 2021
CONTRIBUTING.md  Update CONTRIBUTING.md [ci skip]            Jan 23, 2020
LICENSE          Re-license to Apache                        Oct 24, 2019
README.md        Fix README logo                             Jul 12, 2021
README.md

                 Seq -- a language for bioinformatics

 Introduction

    A strongly typed, statically compiled, high-performance
    Pythonic language!

Seq is a programming language for computational genomics and
bioinformatics. With a Python-compatible syntax and a host of
domain-specific features and optimizations, Seq makes writing
high-performance genomics software as easy as writing Python code,
and achieves performance comparable to (and in many cases better
than) C/C++.

Think of Seq as a strongly typed and statically compiled Python: all
the bells and whistles of Python, boosted with a strong type system,
without any performance overhead.

Seq can outperform Python code by up to 160x. It can also beat
equivalent C/C++ code by up to 2x without any manual intervention,
and it supports parallelism natively. Implementation details and
benchmarks are discussed in our paper.

Learn more by following the tutorial or browsing the cookbook.

 Examples

Seq is a Python-compatible language, and the vast majority of Python
programs should work without any modifications:

def check_prime(n):
    if n > 1:
        for i in range(2, n):
            if n % i == 0:
                return False
        return True
    else:
        return False

n = 1009
print n, 'is', 'a' if check_prime(n) else 'not a', 'prime'

Here is an example showcasing Seq's bioinformatics features:

s = s'ACGTACGT'    # sequence literal
print s[2:5]       # subsequence
print ~s           # reverse complement
kmer = Kmer[8](s)  # convert to k-mer
K2 = Kmer[2]       # type definition

# iterate over length-3 subsequences
# with step 2
for sub in s.split(3, step=2):
    print sub[-1]  # last base

    # iterate over 2-mers with step 1
    for kmer in sub.kmers[K2](step=1):
        print ~kmer  # '~' also works on k-mers

Seq provides native sequence and k-mer types, e.g. an 8-mer is
represented by Kmer[8] as above.
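
For readers coming from plain Python, the sequence operations above can
be approximated as follows. This is only a sketch: the helper names
(revcomp, windows, encode_kmer) are hypothetical, the windowing helper
assumes that only full-length subsequences are produced, and the 2-bit
packing is one common k-mer encoding rather than a statement about
Seq's internals.

```python
# Plain-Python sketch (not Seq): what the sequence operations above compute.
# All helper names here are hypothetical and exist only for illustration.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s: str) -> str:
    """Reverse complement, the operation written as ~s in the Seq example."""
    return "".join(COMPLEMENT[base] for base in reversed(s))


def windows(s: str, k: int, step: int = 1):
    """Yield length-k substrings every `step` bases.

    Assumption: only full-length windows are produced.
    """
    for i in range(0, len(s) - k + 1, step):
        yield s[i:i + k]


def encode_kmer(s: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base (A=0, C=1, G=2, T=3).

    Assumption: a common encoding, not necessarily the one Seq uses.
    """
    code = 0
    for base in s:
        code = (code << 2) | "ACGT".index(base)
    return code


s = "ACGTACGT"
print(s[2:5])                  # GTA
print(revcomp(s))              # ACGTACGT (equal to its own reverse complement)
print(list(windows(s, 3, 2)))  # ['ACG', 'GTA', 'ACG']
print(encode_kmer("ACGT"))     # 27
```

A fixed-width integer encoding of this kind is what allows a k-mer type
to be parameterized by its length, as in Kmer[8].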

Here is a more complex example that counts occurrences of
subsequences from a FASTQ file (argv[2]) in sequences obtained from a
FASTA file (argv[1]) using an FM-index:

from sys import argv
from bio.fmindex import FMIndex
fmi = FMIndex(argv[1])
k, step, n = 20, 20, 0

def add(count: int):
    global n
    n += count

@prefetch
def search(s: seq, fmi: FMIndex):
    intv = fmi.interval(s[-1])
    s = s[:-1]  # trim last base
    while s and intv:
        # backwards-extend intv
        intv = fmi[intv, s[-1]]
        s = s[:-1]  # trim last
    # return count of occurrences
    return len(intv)

FASTQ(argv[2]) |> seqs |> split(k, step) |> search(fmi) |> add
print 'total:', n
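
For readers unfamiliar with FM-indexes, the backward search that the
search function performs can be sketched in plain Python. This is a
teaching sketch under stated assumptions, not the bio.fmindex
implementation: TinyFMIndex and its count method are hypothetical
names, the index is built naively by sorting all suffixes, and the
occurrence table is stored uncompressed.

```python
# Plain-Python sketch (not Seq, and not the bio.fmindex implementation).
# TinyFMIndex is a hypothetical, deliberately naive FM-index for illustration.


class TinyFMIndex:
    def __init__(self, text: str):
        text += "$"  # sentinel; sorts before A, C, G and T
        n = len(text)
        # Suffix array by naive sorting (fine for a toy, far too slow for genomes).
        sa = sorted(range(n), key=lambda i: text[i:])
        # Burrows-Wheeler transform: the character preceding each sorted suffix.
        bwt = "".join(text[i - 1] for i in sa)
        alphabet = sorted(set(text))
        # first[c]: number of characters in the text strictly smaller than c.
        self.first = {}
        total = 0
        for c in alphabet:
            self.first[c] = total
            total += text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (n + 1) for c in alphabet}
        for i, ch in enumerate(bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)
        self.n = n

    def count(self, pattern: str) -> int:
        """Count occurrences of pattern by extending the match backwards."""
        lo, hi = 0, self.n  # half-open interval of suffix-array rows
        for c in reversed(pattern):  # last base first, as in search above
            if c not in self.first:
                return 0
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:  # interval became empty: no occurrences
                return 0
        return hi - lo


# Toy stand-ins for the FASTA reference and the FASTQ reads.
reference = "ACGTACGT"
reads = ["ACGTAC", "TTTACG"]
k, step = 3, 3

fmi = TinyFMIndex(reference)
total = 0
for read in reads:
    for i in range(0, len(read) - k + 1, step):
        total += fmi.count(read[i:i + k])
print("total:", total)  # total: 5
```

Each loop iteration in count narrows the interval using two lookups
into the occurrence table, at positions that depend on the previous
step; in a real index these lookups are the memory accesses that tend
to miss the cache.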

The @prefetch annotation tells the compiler to perform a
coroutine-based pipeline transformation to make the FM-index queries
faster, by overlapping the cache miss latency from one query with
other useful work. In practice, the single @prefetch line can provide
a 2x performance improvement.
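
The scheduling idea behind such a transformation can be illustrated
with Python generators. This is a conceptual sketch only, not what the
Seq compiler emits: Python cannot issue cache prefetches, so each yield
merely marks the point where a prefetch would be issued, and the names
lookup_chain and run_interleaved, as well as the batch parameter, are
hypothetical.

```python
# Plain-Python sketch (not Seq): interleaving several queries as coroutines.
# Each `yield` only marks where a prefetch would be issued. Names are hypothetical.
from collections import deque


def lookup_chain(table, start, steps, trace):
    """Follow `steps` dependent lookups in `table`, suspending before each one."""
    pos = start
    for _ in range(steps):
        yield                # a real implementation would prefetch table[pos] here
        trace.append(start)  # record which query is doing work at this point
        pos = table[pos]     # the access that could otherwise miss the cache
    return pos


def run_interleaved(queries, batch=2):
    """Round-robin over up to `batch` suspended queries; return results in order."""
    pending = deque(enumerate(queries))
    active = deque()
    results = {}
    while pending or active:
        # Keep up to `batch` queries in flight.
        while pending and len(active) < batch:
            active.append(pending.popleft())
        idx, coro = active.popleft()
        try:
            next(coro)                  # run until the next suspension point
            active.append((idx, coro))  # still unfinished: requeue it
        except StopIteration as done:
            results[idx] = done.value   # finished: keep its return value
    return [results[i] for i in range(len(queries))]


table = [1, 2, 3, 0]
trace = []
queries = [lookup_chain(table, s, 2, trace) for s in (0, 1)]
print(run_interleaved(queries))  # [2, 3]
print(trace)                     # [0, 1, 0, 1]
```

The trace shows the two queries alternating: while one is suspended
waiting for its data, the other makes progress, which is the overlap of
cache miss latency with useful work described above.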

 Install

 Pre-built binaries

Pre-built binaries for Linux and macOS on x86_64 are available
alongside each release. We also have a script for downloading and
installing pre-built versions:

/bin/bash -c "$(curl -fsSL https://seq-lang.org/install.sh)"

 Build from source

See Building from Source.

 Documentation

Please check docs.seq-lang.org for in-depth documentation.

 Citing Seq

If you use Seq in your research, please cite:

    Ariya Shajii, Ibrahim Numanagić, Riyadh Baghdadi, Bonnie Berger,
    and Saman Amarasinghe. 2019. Seq: a high-performance language for
    bioinformatics. Proc. ACM Program. Lang. 3, OOPSLA, Article 125
    (October 2019), 29 pages. DOI: https://doi.org/10.1145/3360551

BibTeX:

@article{Shajii:2019:SHL:3366395.3360551,
 author = {Shajii, Ariya and Numanagi\'{c}, Ibrahim and Baghdadi, Riyadh and Berger, Bonnie and Amarasinghe, Saman},
 title = {Seq: A High-performance Language for Bioinformatics},
 journal = {Proc. ACM Program. Lang.},
 issue_date = {October 2019},
 volume = {3},
 number = {OOPSLA},
 month = oct,
 year = {2019},
 issn = {2475-1421},
 pages = {125:1--125:29},
 articleno = {125},
 numpages = {29},
 url = {http://doi.acm.org/10.1145/3360551},
 doi = {10.1145/3360551},
 acmid = {3360551},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Python, bioinformatics, computational biology, domain-specific language, optimization, programming language},
}

Topics

python programming-language bioinformatics compiler genomics 
computational-biology domain-specific-language

Releases 25

  * v0.10.3 (Latest), Aug 23, 2021
  * + 24 releases

Contributors 9

  * @arshajii
  * @inumanag
  * @jordanwatson1
  * @glram
  * @traviscibot
  * @jodiew
  * @markhend
  * @ghuls
  * @matthewha123

Languages

  * C++ 95.6%
  * OCaml 1.4%
  * Python 1.2%
  * JavaScript 0.5%
  * CMake 0.5%
  * Shell 0.4%
  * Other 0.4%
