https://github.com/Nixtla/nixtla

# Nixtla

**Nixtla is an open-source time series forecasting library.**

We help data scientists and developers gain access to open-source, state-of-the-art forecasting pipelines. For that purpose, we built a complete pipeline that can be deployed in the cloud using AWS and consumed via APIs or as a service.
If you want to set up your own infrastructure, follow the instructions in this repository (Azure coming soon). With our Infrastructure as Code, written in Terraform, you can deploy our solution in minutes without much effort.

You can also use our fully hosted version as a service through our Python SDK (autotimeseries). To consume the APIs on our infrastructure, just request tokens by sending an email to federico@nixtla.io or by opening a GitHub issue. We currently have free resources available for anyone interested.

We built a fully open-source time-series pipeline capable of achieving 1% of the performance in the M5 competition. Our open-source solution is 25% more accurate than Amazon Forecast and 20% more accurate than fbprophet. It also runs 4x faster than Amazon Forecast and is less expensive. To reproduce the results, open the Colab notebook or read the accompanying Medium post.

At Nixtla we strongly believe in open source, so we have released all the necessary code to set up your own time-series processing service in the cloud (using AWS; Azure is WIP). This repository uses continuous integration and deployment to deploy the APIs on our infrastructure.

## Python SDK Basic Usage

### Install

#### PyPI

```
pip install autotimeseries
```

### How to use

Check the following examples for a full pipeline:

* M5 state-of-the-art reproduction.
* M5 state-of-the-art reproduction in Colab.

#### Basic usage

```python
import os

from autotimeseries.core import AutoTS

autotimeseries = AutoTS(bucket_name=os.environ['BUCKET_NAME'],
                        api_id=os.environ['API_ID'],
                        api_key=os.environ['API_KEY'],
                        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
                        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'])
```

#### Upload dataset to S3

```python
train_dir = '../data/m5/parquet/train'
# File with target variables
filename_target = autotimeseries.upload_to_s3(f'{train_dir}/target.parquet')
# File with static variables
filename_static = autotimeseries.upload_to_s3(f'{train_dir}/static.parquet')
# File with temporal variables
filename_temporal = autotimeseries.upload_to_s3(f'{train_dir}/temporal.parquet')
```

Each time series in the uploaded datasets is identified by the column `item_id`, the time column is `timestamp`, and the target column is `demand`. We need to pass these arguments to each call:

```python
columns = dict(unique_id_column='item_id',
               ds_column='timestamp',
               y_column='demand')
```

#### Send the job to make forecasts

```python
response_forecast = autotimeseries.tsforecast(filename_target=filename_target,
                                              freq='D',
                                              horizon=28,
                                              filename_static=filename_static,
                                              filename_temporal=filename_temporal,
                                              objective='tweedie',
                                              metric='rmse',
                                              n_estimators=170,
                                              **columns)
```

#### Download forecasts

```python
autotimeseries.download_from_s3(filename='forecasts_2021-10-12_19-04-32.csv',
                                filename_output='../data/forecasts.csv')
```

## Forecasting Pipeline as a Service

Our forecasting pipeline is modular and built upon simple APIs:

### tspreprocess

Time series usually contain missing values. This is the case for sales data, where only the events that actually happened are recorded. In these cases it is convenient to balance the panel, i.e., to include the missing records so that the value of future sales can be determined correctly. The `tspreprocess` API allows you to do this quickly and easily. In addition, it can automatically one-hot encode static variables (variables specific to each time series, such as the product family in the case of sales).
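The snippet below is a minimal sketch of the panel-balancing idea using plain pandas; it is an illustration, not the tspreprocess implementation. Column names follow the `item_id` / `timestamp` / `demand` convention used above, and the `balance_panel` helper is hypothetical.

```python
import pandas as pd

# Toy panel with a gap: item A sells on Jan 1 and Jan 3, but Jan 2 is missing.
sales = pd.DataFrame({
    'item_id': ['A', 'A', 'B'],
    'timestamp': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-01']),
    'demand': [3, 2, 5],
})

def balance_panel(df, freq='D'):
    """Fill the missing (item_id, timestamp) combinations with zero demand."""
    filled = []
    for item_id, group in df.groupby('item_id'):
        full_index = pd.date_range(group['timestamp'].min(),
                                   group['timestamp'].max(), freq=freq)
        g = (group.set_index('timestamp')
                  .reindex(full_index)         # introduces the missing dates
                  .rename_axis('timestamp')
                  .reset_index())
        g['item_id'] = item_id
        g['demand'] = g['demand'].fillna(0)    # no record means no sale
        filled.append(g)
    return pd.concat(filled, ignore_index=True)

balanced = balance_panel(sales)
print(balanced)
```

One-hot encoding of static variables can be sketched in the same spirit, for example with `pd.get_dummies` on the static file.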
### tsfeatures

It is usually good practice to create features of the target variable so that they can be consumed by machine-learning models. The `tsfeatures` API allows users to create features at the time-series level (static features) as well as at the temporal level. It is based on the tsfeatures library, also developed by the Nixtla team (and inspired by the R package of the same name), and on the tsfresh library.

With this API the user can also generate holiday variables: just enter the country of the special dates, or a file with the specific dates, and the API will return dummy variables for those dates for each observation in the dataset.

### tsforecast

The `tsforecast` API is responsible for generating the time-series forecasts. It receives the target data as input and can also receive static and temporal variables. At the moment the API uses the mlforecast library, developed by the Nixtla team, with LightGBM as the model. In future iterations, the user will be able to choose different deep-learning models based on the nixtlats library, also developed by the Nixtla team.

### tsbenchmarks

The `tsbenchmarks` API is designed to easily compare the performance of models on time-series competition datasets. In particular, it offers the possibility to evaluate forecasts of any frequency from the M4 competition, as well as forecasts from the M5 competition.

These APIs are written in Python and can be consumed through an SDK, also written in Python. The following diagram summarizes the structure of our pipeline:

[sdk pipeline diagram]

## Build your own time-series processing service using AWS

### Why?

We want to contribute to open source and help data scientists and developers achieve great forecasting results without having to implement complex pipelines.

### How?

If you want to use our hosted version, send us an email or open a GitHub issue and ask for API keys.

If you want to deploy Nixtla on your own AWS cloud, you will need:

* API Gateway (to handle API calls).
* Lambda (or some computational unit).
* SageMaker (or some bigger computational unit).
* ECR (to store Docker images).
* S3 (for inputs and outputs).

You will end up with an architecture that looks like the following diagram:

[architecture diagram]

Each call to the API executes a particular Lambda function, depending on the endpoint. That Lambda function instantiates a SageMaker job using a predefined instance type. Finally, SageMaker reads the input data from S3 and writes the processed data back to S3, using a predefined Docker image stored in ECR.
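As an illustration of that flow (not the actual code in `api/`), a Lambda handler for one of the services could look roughly like the sketch below, written against boto3. The environment variables match the ones listed in the Lambda configuration later in this README; the job name, payload fields, and local paths are assumptions.

```python
import json
import os
import time

import boto3

sagemaker = boto3.client('sagemaker')

def handler(event, context):
    """Launch a SageMaker Processing job for one API call (hypothetical sketch)."""
    body = json.loads(event.get('body', '{}'))  # payload forwarded by API Gateway

    job_name = f"tspreprocess-{int(time.time())}"  # hypothetical naming scheme
    sagemaker.create_processing_job(
        ProcessingJobName=job_name,
        RoleArn=os.environ['ROLE'],
        AppSpecification={
            # Docker image for the service, stored in ECR
            'ImageUri': os.environ['PROCESSING_REPOSITORY_URI'],
        },
        ProcessingResources={
            'ClusterConfig': {
                'InstanceCount': int(os.environ['INSTANCE_COUNT']),
                'InstanceType': os.environ['INSTANCE_TYPE'],
                'VolumeSizeInGB': 30,
            }
        },
        ProcessingInputs=[{
            'InputName': 'target',
            'S3Input': {
                'S3Uri': body['s3_uri_target'],      # assumed payload field
                'LocalPath': '/opt/ml/processing/input',
                'S3DataType': 'S3Prefix',
                'S3InputMode': 'File',
            },
        }],
        ProcessingOutputConfig={
            'Outputs': [{
                'OutputName': 'output',
                'S3Output': {
                    'S3Uri': body['s3_uri_output'],  # assumed payload field
                    'LocalPath': '/opt/ml/processing/output',
                    'S3UploadMode': 'EndOfJob',
                },
            }]
        },
    )
    return {'statusCode': 200, 'body': json.dumps({'job_name': job_name})}
```

The handlers in `api/` are the authoritative reference; this sketch only illustrates the Lambda → SageMaker → S3 flow.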
To create that infrastructure you can use our own Terraform code (Infrastructure as Code) or you can create the services from the AWS console.

### 1. Terraform (Infrastructure as Code)

Terraform is an open-source Infrastructure as Code tool that lets you turn all of the manual setup into an automated script. We have written all the steps needed to facilitate the deployment of Nixtla on your own infrastructure. The Terraform code lives in the `iac/terraform/aws` folder of this repository. Just follow these steps:

1. Define your AWS credentials. You can define them using:

        export AWS_ACCESS_KEY_ID="anaccesskey"
        export AWS_SECRET_ACCESS_KEY="asecretkey"

   These credentials require permissions to use the S3, ECR, Lambda and API Gateway services; in addition, you must be able to create IAM users.
2. Install Terraform if you have not already done so (the official installation guide covers this).
3. Position yourself in the `iac/terraform/aws` folder.
4. Run `terraform init`. This command initializes the working directory with the necessary configuration.
5. Finally, run `terraform apply`. The list of services to be built will be displayed first, and you will have to accept it to start the build. Once finished, you will get the API key needed to run the process, as well as the addresses of each of the APIs.

### 2. Create AWS resources using the console

#### Create S3 buckets

For each service:

1. Create an S3 bucket. The code of each Lambda function will be uploaded here.

#### Create ECR repositories

For each service:

1. Create a private repository for each service.

#### Lambda Function

For each service:

1. Create a Lambda function with the Python 3.7 runtime.
2. Modify the runtime settings and enter `main.handler` as the handler.
3. Go to the configuration:
   * Edit the general configuration and set a timeout of 9:59.
   * Add an existing role capable of reading/writing from/to S3 and running SageMaker services.
4. Add the following environment variables:
   * `PROCESSING_REPOSITORY_URI`: ECR URI of the Docker image corresponding to the service.
   * `ROLE`: a role capable of reading/writing from/to S3 and also running SageMaker services.
   * `INSTANCE_COUNT`
   * `INSTANCE_TYPE`

#### API Gateway

1. Create a public REST API (Regional).
2. For each endpoint in `api/main.py`... add a resource.
3. For each created resource add an ANY method:
   * Select Lambda function.
   * Select Use Lambda Proxy integration.
   * Introduce the name of the Lambda function linked to that resource.
   * Once the method is created, select Method Request and set API key required to true.
4. Deploy the API.

#### Usage plan

1. Create a usage plan based on your needs.
2. Add your API stage.

#### API Keys

1. Generate API keys as needed.

## Deployment

### GitHub secrets

1. Set the following secrets in your repo:
   * `AWS_ACCESS_KEY_ID`
   * `AWS_SECRET_ACCESS_KEY`
   * `AWS_DEFAULT_REGION`

## Run the API locally

1. Create the environment using `make init`.
2. Launch the app using `make app`.
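Once the app is running, the endpoints can also be exercised directly over HTTP rather than through the SDK. The sketch below is illustrative only: the port and endpoint path are assumptions (the actual routes are defined in `api/main.py`), the payload field names simply mirror the SDK arguments shown earlier, and when calling through API Gateway the key is sent in the `x-api-key` header.

```python
import os

import requests

# Hypothetical local endpoint; the real paths are defined in api/main.py.
url = 'http://localhost:8000/tsforecast'

payload = {
    # Field names mirror the SDK arguments shown above; the API's actual
    # schema may differ.
    'filename_target': 'target.parquet',
    'freq': 'D',
    'horizon': 28,
    'unique_id_column': 'item_id',
    'ds_column': 'timestamp',
    'y_column': 'demand',
}

# When calling through API Gateway instead of the local app, the API key
# goes in the x-api-key header.
headers = {'x-api-key': os.environ.get('API_KEY', '')}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code, response.json())
```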