https://github.com/dennis-tra/nebula

 A network agnostic DHT crawler, monitor, and measurement tool that
exposes timely information about DHT networks.


Nebula

 


A network agnostic DHT crawler and monitor. The crawler connects to
DHT bootstrappers and then recursively follows all entries in their
k-buckets until all peers have been visited. The crawler supports the
following networks:

  * IPFS - Amino DHT
  * Ethereum - Consensus Layer
  * Ethereum - Testnet Holesky (alpha)
  * Filecoin
  * Polkadot
  * Kusama
  * Rococo
  * Westend
  * Celestia - Mainnet
  * Celestia - Arabica
  * Celestia - Mocha

The crawler was:

  * awarded a prize in the DI2F Workshop hackathon.
  * used for the ACM SIGCOMM'22 paper Design and Evaluation of
    IPFS: A Storage Layer for the Decentralized Web.

ProbeLab publishes weekly reports for the IPFS Amino DHT based on
the crawl results.

You can find a demo on YouTube: Nebula: A Network Agnostic DHT
Crawler.


Table of Contents

 

  * Table of Contents
  * Project Status
  * Usage
  * Install
      + From source
  * How does it work?
      + crawl
      + monitor
      + resolve
  * Development
      + Database
      + Tests
  * Report
  * Related Efforts
  * Demo
  * Maintainers
  * Contributing
  * Support
  * Other Projects
  * License

Project Status

 

The crawler powers critical IPFS Amino DHT KPIs and is used for the
Weekly IPFS Reports as well as for many metrics on probelab.io. The
main branch contains the latest changes and should not be considered
stable. The latest stable, production-ready release is version 2.2.0.

The numbers gathered about the IPFS Amino DHT network are in line
with existing data, such as that from wiberlin/ipfs-crawler, whose
crawler also powers a public dashboard. The numbers for the Ethereum
Consensus Layer do not match those published by other teams, such as
MigaLabs on their dashboard. However, this seems to be due to
different ways of aggregating and grouping peers in the network.

Install

 

Precompiled Binaries

 

Head over to the release section and download binaries from the
latest stable release.

From source

 

Nebula has a hard dependency on Go 1.19 because Nebula requires
go-libp2p <0.30. With version 0.30 go-libp2p dropped support for the
quic transport and only continues to support quic-v1 (release notes).
However, many peers in the IPFS Amino DHT still only listen on quic
addresses (as opposed to quic-v1). Many of them also listen over tcp
but from experiments I saw that they often refuse connections over
tcp. As of 2023-12-02 this results in a significant increase of
undialable peers that Nebula was previously able to connect to and
identify.

Until the error incurred by dropping the quic transport is negligible
or some new go-libp2p feature justifies an update, Nebula will stick
to the old go-libp2p version.

Because go-libp2p depends on quic-go, and specific versions of
quic-go can only be compiled with specific versions of Go, I'm
currently sticking to Go 1.19. It might be possible to update to
Go 1.20, but I haven't had the time to test this yet.

git clone https://github.com/dennis-tra/nebula
cd nebula
make build

Now you should find the nebula executable in the dist subfolder.

Usage

 

Nebula is a command line tool and provides the crawl sub-command.

Dry-Run

 

To simply crawl the IPFS Amino DHT network run:

nebula --dry-run crawl

The crawler can store its results as JSON documents or in a postgres
database; the --dry-run flag prevents it from doing either. Instead,
Nebula just prints a summary of the crawl at the end. A crawl takes
~5-10 min depending on your internet connection. You can also specify
the network you want to crawl by appending, e.g., --network FILECOIN,
and limit the number of peers to crawl by providing the --limit flag
with a value of, e.g., 1000. Example:

nebula --dry-run crawl --network FILECOIN --limit 1000

To find out which other network values are supported, you can run:

nebula networks

JSON Output

 

To store crawl results as JSON files provide the --json-out command
line flag like so:

nebula --json-out ./results/ crawl

After the crawl has finished, you will find the JSON files in the
./results/ subdirectory.

When providing only the --json-out command line flag you will see
that the *_neighbors.json document is empty. This document would
contain the full routing table information of each peer in the
network, which is quite a bit of data (~250MB for the Amino DHT as of
April '23), and is therefore disabled by default.

Track Routing Table Information

 

To populate the document, you'll need to pass the --neighbors flag to
the crawl subcommand.

nebula --json-out ./results/ crawl --neighbors

The routing table information forms a graph and graph visualization
tools often operate with adjacency lists. To convert the
*_neighbors.json document to an adjacency list, you can use jq and
the following command:

jq -r '.NeighborIDs[] as $neighbor | [.PeerID, $neighbor] | @csv' ./results/2023-04-16T14:32_neighbors.json > ./results/2023-04-16T14:32_neighbors.csv

Postgres

 

If you want to store the information in a proper database, you could
run make database or make databased (for running it in the
background) to start a local postgres instance and run Nebula like:

nebula --db-user nebula_test --db-name nebula_test crawl --neighbors

At this point, you can also start Nebula's monitoring process, which
would periodically probe the discovered peers to track their uptime.
Run in another terminal:

nebula --db-user nebula_test --db-name nebula_test monitor

When Nebula is configured to store its results in a postgres
database, then it also tracks session information of remote peers. A
session is one continuous streak of uptime (see below).

---------------------------------------------------------------------

There are a few more command line flags that are documented when you
run nebula --help and nebula crawl --help.

How does it work?

 

crawl

 

The crawl sub-command starts by connecting to a set of bootstrap
nodes and constructing the routing tables (kademlia k-buckets) of
these peers based on their PeerIDs. Then nebula builds random PeerIDs
with common prefix lengths (CPL) that fall into each of the peer's
buckets, and asks each remote peer if they know any peers that are
closer (by XOR distance) to the ones nebula just constructed. This
effectively yields a list of all PeerIDs that a peer has in its
routing table. The process repeats for all found peers until nebula
does not find any new PeerIDs.
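To make the notion of a common prefix length concrete, here is a minimal sketch of how a key that falls into a specific bucket could be built. It is not Nebula's implementation: it works on raw Kademlia keys, whereas libp2p DHTs compare SHA-256 hashes of PeerIDs, so a real crawler has to produce PeerIDs whose hashes have the desired prefix. The function names are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/bits"
)

// commonPrefixLen returns the number of leading bits that a and b share,
// i.e. the number of leading zero bits of a XOR b.
// Both keys are assumed to have the same length.
func commonPrefixLen(a, b []byte) int {
	for i := range a {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(a) * 8
}

// randomKeyWithCPL returns a random key that shares exactly cpl leading
// bits with target, so it falls into bucket number cpl of target.
func randomKeyWithCPL(target []byte, cpl int) ([]byte, error) {
	if cpl < 0 || cpl >= len(target)*8 {
		return nil, fmt.Errorf("cpl %d out of range", cpl)
	}
	key := make([]byte, len(target))
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	full, rem := cpl/8, cpl%8
	copy(key[:full], target[:full]) // whole bytes of the shared prefix

	mask := byte(0xFF) << (8 - rem) // remaining shared bits in the next byte
	flip := byte(0x80) >> rem       // first bit that must differ
	key[full] = (target[full] & mask) | (^target[full] & flip) | (key[full] & (flip - 1))
	return key, nil
}

func main() {
	target := make([]byte, 32) // hypothetical 256-bit key of a remote peer
	if _, err := rand.Read(target); err != nil {
		panic(err)
	}
	for cpl := 0; cpl < 4; cpl++ {
		key, err := randomKeyWithCPL(target, cpl)
		if err != nil {
			panic(err)
		}
		fmt.Printf("bucket %d: %x... (cpl=%d)\n", cpl, key[:4], commonPrefixLen(target, key))
	}
}
```

Asking a peer for the nodes closest to one such key per bucket is what makes it reveal the contents of that bucket.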

This process is heavily inspired by the basic-crawler in libp2p/
go-libp2p-kad-dht from @aschmahmann.
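The overall traversal can be pictured as a breadth-first search over the graph formed by the routing tables. The sketch below is a sequential toy version under stated assumptions: the neighborsFunc callback is a hypothetical stand-in for the network requests, and the peer names are made up. Nebula itself dials peers concurrently (see the --workers flag).

```go
package main

import "fmt"

// neighborsFunc stands in for the network request that returns all
// routing table entries of a peer. It is a hypothetical placeholder.
type neighborsFunc func(peer string) []string

// crawl visits every peer reachable from the bootstrap peers and
// returns them in the order they were visited.
func crawl(bootstrap []string, neighbors neighborsFunc) []string {
	seen := make(map[string]bool)
	var queue, visited []string
	for _, p := range bootstrap {
		if !seen[p] {
			seen[p] = true
			queue = append(queue, p)
		}
	}
	// Stop once the queue is empty, i.e. no new PeerIDs were found.
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		visited = append(visited, p)
		for _, n := range neighbors(p) {
			if !seen[n] {
				seen[n] = true
				queue = append(queue, n)
			}
		}
	}
	return visited
}

func main() {
	// Toy network: each peer maps to the entries of its routing table.
	network := map[string][]string{
		"boot": {"a", "b"},
		"a":    {"boot", "c"},
		"b":    {"a"},
		"c":    {},
	}
	lookup := func(peer string) []string { return network[peer] }
	fmt.Println(crawl([]string{"boot"}, lookup))
}
```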

If Nebula is configured to store its results in a database, every
peer that was visited is written to it. The visit information
includes latency measurements (dial/connect/crawl durations), current
set of multi addresses, current agent version and current set of
supported protocols. If the peer was dialable nebula will also create
a session instance that contains the following information:

CREATE TABLE sessions (
    -- A unique id that identifies this particular session
    id                      INT GENERATED ALWAYS AS IDENTITY,
    -- Reference to the remote peer ID. (database internal ID)
    peer_id                 INT           NOT NULL,
    -- Timestamp of the first time we were able to visit that peer.
    first_successful_visit  TIMESTAMPTZ   NOT NULL,
    -- Timestamp of the last time we were able to visit that peer.
    last_successful_visit   TIMESTAMPTZ   NOT NULL,
    -- Timestamp when we should start visiting this peer again.
    next_visit_due_at       TIMESTAMPTZ,
    -- When did we first notice that this peer is not reachable.
    first_failed_visit      TIMESTAMPTZ,
    -- When did we last notice that this peer is not reachable.
    last_failed_visit       TIMESTAMPTZ,
    -- When did we last visit this peer. For indexing purposes.
    last_visited_at         TIMESTAMPTZ   NOT NULL,
    -- When was this session instance updated the last time
    updated_at              TIMESTAMPTZ   NOT NULL,
    -- When was this session instance created
    created_at              TIMESTAMPTZ   NOT NULL,
    -- Number of successful visits in this session.
    successful_visits_count INTEGER       NOT NULL,
    -- The number of times this session went from pending to open again.
    recovered_count         INTEGER       NOT NULL,
    -- The state this session is in (open, pending, closed)
    -- open: currently considered online
    -- pending: peer missed a dial and is pending to be closed
    -- closed: peer is considered to be offline and session is complete
    state                   session_state NOT NULL,
    -- Number of failed visits before closing this session.
    failed_visits_count     SMALLINT      NOT NULL,
    -- What's the first error before we close this session.
    finish_reason           net_error,
    -- The uptime time range for this session, measured from first_successful_visit to last_successful_visit.
    uptime                  TSTZRANGE     NOT NULL,

    -- The peer ID should always point to an existing peer in the DB
    CONSTRAINT fk_sessions_peer_id FOREIGN KEY (peer_id) REFERENCES peers (id) ON DELETE CASCADE,

    PRIMARY KEY (id, state, last_visited_at)

) PARTITION BY LIST (state);
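The open, pending, and closed states described in the column comments can be read as a small state machine. The following sketch illustrates one possible reading of it. It is not taken from Nebula's code: the threshold maxFailedVisits and the reset of the failure counter on recovery are assumptions made purely for illustration.

```go
package main

import "fmt"

// State mirrors the session_state values named in the table comments.
type State string

const (
	Open    State = "open"    // currently considered online
	Pending State = "pending" // missed a dial, pending to be closed
	Closed  State = "closed"  // considered offline, session complete
)

// maxFailedVisits is a hypothetical threshold; the README does not
// state how many failed visits close a session.
const maxFailedVisits = 3

// Session holds the counters that correspond to successful_visits_count,
// failed_visits_count and recovered_count.
type Session struct {
	State            State
	SuccessfulVisits int
	FailedVisits     int
	Recovered        int
}

// RecordVisit updates the session after a dial attempt.
func (s *Session) RecordVisit(ok bool) {
	if s.State == Closed {
		return // a closed session is complete and never reopened
	}
	if ok {
		if s.State == Pending {
			s.Recovered++      // went from pending back to open
			s.FailedVisits = 0 // assumption: failures reset on recovery
		}
		s.State = Open
		s.SuccessfulVisits++
		return
	}
	s.FailedVisits++
	if s.FailedVisits >= maxFailedVisits {
		s.State = Closed
	} else {
		s.State = Pending
	}
}

func main() {
	s := &Session{State: Open, SuccessfulVisits: 1}
	for _, ok := range []bool{false, true, false, false, false} {
		s.RecordVisit(ok)
		fmt.Printf("dial ok=%-5v -> %s\n", ok, s.State)
	}
}
```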

At the end of each crawl nebula persists general statistics about the
crawl, such as the total duration, the number of dialable peers, the
encountered errors, and the observed agent versions.

    Info: You can use the crawl sub-command with the global --dry-run
    option that skips any database operations.

Command line help page:

NAME:
   nebula crawl - Crawls the entire network starting with a set of bootstrap nodes.

USAGE:
   nebula crawl [command options] [arguments...]

OPTIONS:
   --addr-dial-type value                               Which type of addresses should Nebula try to dial (private, public, any) (default: "public") [$NEBULA_CRAWL_ADDR_DIAL_TYPE]
   --addr-track-type value                              Which type addresses should be stored to the database (private, public, any) (default: "public") [$NEBULA_CRAWL_ADDR_TRACK_TYPE]
   --bootstrap-peers value [ --bootstrap-peers value ]  Comma separated list of multi addresses of bootstrap peers (default: default IPFS) [$NEBULA_CRAWL_BOOTSTRAP_PEERS, $NEBULA_BOOTSTRAP_PEERS]
   --limit value                                        Only crawl the specified amount of peers (0 for unlimited) (default: 0) [$NEBULA_CRAWL_PEER_LIMIT]
   --neighbors                                          Whether to persist all k-bucket entries of a particular peer at the end of a crawl. (default: false) [$NEBULA_CRAWL_NEIGHBORS]
   --network nebula networks                            Which network should be crawled. Presets default bootstrap peers and protocol. Run: nebula networks for more information. (default: "IPFS") [$NEBULA_CRAWL_NETWORK]
   --protocols value [ --protocols value ]              Comma separated list of protocols that this crawler should look for [$NEBULA_CRAWL_PROTOCOLS, $NEBULA_PROTOCOLS]
   --workers value                                      How many concurrent workers should dial and crawl peers. (default: 1000) [$NEBULA_CRAWL_WORKER_COUNT]

   Network Specific Configuration:

   --check-exposed  Whether to check if the Kubo API is exposed. Checking also includes crawling the API. (default: false) [$NEBULA_CRAWL_CHECK_EXPOSED]


monitor

 

Every 10 seconds, the monitor sub-command polls the database (see
above) for all sessions that are due to be dialed in the next 10
seconds (based on the next_visit_due_at timestamp). It attempts to
dial all peers using previously saved multi-addresses and updates
their session instances depending on whether they're dialable or not.

The next_visit_due_at timestamp is calculated based on the uptime
that nebula has observed for that given peer. If the peer has been up
for a long time, nebula assumes that it stays up and thus decreases
the dial frequency, i.e., sets the next_visit_due_at timestamp to a
time further in the future.

Command line help page:

NAME:
   nebula monitor - Monitors the network by periodically dialing previously crawled peers.

USAGE:
   nebula monitor [command options] [arguments...]

OPTIONS:
   --workers value  How many concurrent workers should dial peers. (default: 1000) [$NEBULA_MONITOR_WORKER_COUNT]
   --help, -h       show help

resolve

 

The resolve sub-command goes through all multi addresses that are
present in the database and resolves them to their respective IP
addresses. A single multi address can resolve to multiple IP
addresses due to, e.g., the dnsaddr protocol. Further, it queries the
GeoLite2 database from Maxmind to extract country information about
the IP addresses and UdgerDB to detect datacenters. The command saves
all information alongside the resolved addresses.

Command line help page:

NAME:
   nebula resolve - Resolves all multi addresses to their IP addresses and geo location information

USAGE:
   nebula resolve [command options] [arguments...]

OPTIONS:
   --udger-db value    Location of the Udger database v3 [$NEBULA_RESOLVE_UDGER_DB]
   --batch-size value  How many database entries should be fetched at each iteration (default: 100) [$NEBULA_RESOLVE_BATCH_SIZE]
   --help, -h          show help (default: false)

Development

 

To develop this project, you need Go 1.19 and the following tools:

  * golang-migrate/migrate v4.15.2 to manage the SQL migrations
  * volatiletech/sqlboiler v4.14.2 to generate the Go ORM
  * docker to run a local postgres instance

To install the necessary tools you can run make tools. This will use
the go install command to download and install the tools into your
$GOPATH/bin directory. So make sure you have it in your $PATH
environment variable.

Database

 

You need a running postgres instance to persist and/or read the crawl
results. Run make database or use the following command to start a
local instance of postgres:

docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=password -e POSTGRES_USER=nebula_test -e POSTGRES_DB=nebula_test --name nebula_test_db postgres:14

    Info: You can use the crawl sub-command with the global --dry-run
    option that skips any database operations or store the results as
    JSON files with the --json-out flag.

The default database settings for local development are:

Name     = "nebula_test"
Password = "password"
User     = "nebula_test"
Host     = "localhost"
Port     = 5432
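If you want to inspect the local database with other tools, these settings can be combined into a PostgreSQL connection URI. The sketch below shows one way to do that with the Go standard library. It is not how Nebula itself builds its connection, and the sslmode=disable parameter is an assumption that suits a local development container without TLS.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// dbConfig mirrors the default local development settings listed above.
type dbConfig struct {
	Name     string
	Password string
	User     string
	Host     string
	Port     int
}

// URL renders the settings as a PostgreSQL connection URI.
func (c dbConfig) URL() string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(c.User, c.Password),
		Host:     net.JoinHostPort(c.Host, strconv.Itoa(c.Port)),
		Path:     "/" + c.Name,
		RawQuery: "sslmode=disable", // assumption: no TLS on the local instance
	}
	return u.String()
}

func main() {
	cfg := dbConfig{
		Name:     "nebula_test",
		Password: "password",
		User:     "nebula_test",
		Host:     "localhost",
		Port:     5432,
	}
	fmt.Println(cfg.URL())
}
```

The resulting URI can be passed to clients such as psql that accept connection URIs.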

Migrations are applied automatically when nebula starts and
successfully establishes a database connection.

To run them manually you can run:

# Up migrations
make migrate-up

# Down migrations
make migrate-down

# Generate the ORM with SQLBoiler
make models # runs: sqlboiler
# This will update all files in the `pkg/models` directory.

# Create new migration
migrate create -ext sql -dir pkg/db/migrations -seq some_migration_name

Tests

 

To run the tests you need a running test database instance:

make database
make test

Related Efforts

 

  * wiberlin/ipfs-crawler - A crawler for the IPFS network, code for
    their paper (arXiv).
  * adlrocha/go-libp2p-crawler - Simple tool to crawl libp2p networks
    resources
  * libp2p/go-libp2p-kad-dht - Basic crawler for the Kademlia DHT
    implementation on go-libp2p.
  * migalabs/armiarma - Armiarma is a Libp2p open-network crawler
    with a current focus on Ethereum's CL network
  * migalabs/eth-light-crawler - Ethereum light crawler by @cortze.

Demo

 

The following presentation shows ways to use Nebula by showcasing
crawls of the Amino, Celestia, and Ethereum DHTs:

Nebula: A Network Agnostic DHT Crawler - Dennis Trautwein

Maintainers

 

@dennis-tra.

Contributing

 

Feel free to dive in! Open an issue or submit PRs.

Support

 

It would really make my day if you supported this project through Buy
Me A Coffee.

Other Projects

 

You may be interested in one of my other projects:

  * pcp - Command line peer-to-peer data transfer tool based on
    libp2p.
  * image-stego - A novel approach to image manipulation detection.
    Steganography-based image integrity - Merkle tree nodes embedded
    into image chunks so that each chunk's integrity can be verified
    on its own.

License

 

Apache License Version 2.0 (c) Dennis Trautwein
