https://github.com/liuliu/dflat

liuliu / dflat: Structured Data Store for Mobile (dflat.io). BSD-3-Clause License.

Dflat: SQLite ❤️ FlatBuffers

If you are familiar with Core Data or Realm, Dflat occupies the same space as these two in your application. It helps you persist objects to disk and retrieve them for your application's needs. Unlike these two, Dflat has a different set of features and makes very different trade-offs. These features and trade-offs are grounded in real-world experience writing some of the world's largest apps. Dflat is also built from the ground up in Swift, and hopefully you will find it natural to interact with in the Swift language.

Features

I've been writing different structured data persistence systems on mobile for the past few years. Dflat is an accumulation of lessons learned from building these proprietary systems. On iOS in particular, the go-to choice has long been Core Data. It works, and it is the internal data persistence mechanism for many system apps. But when deploying a structured data persistence system to hundreds of millions of mobile devices, there are certain challenges, both in the intrinsics of how data is persisted and, at a higher level, in how the rest of the app interacts with such a system.

The Dflat codebase is still at a very young stage. However, the underlying principles have proven successful in other proprietary systems.

Dflat implements the following features, in no particular order:

1. The system returns immutable data objects that can be passed down to other systems (such as your view model generators);
2. All queries and objects can be observed. Updates are published through either callbacks or the Combine framework;
3. Mutations can only happen on separate threads that the caller has little control over, and are thus asynchronous;
4. Data fetching can happen concurrently and synchronously on any thread of the caller's choice;
5. Strict serializable multi-writer / multi-reader mode is supported, but users can choose single-writer (thus, trivially strict serializable) / multi-reader mode if they desire;
6. Data queries are expressed in Swift code and are type-checked by the Swift compiler;
7. Schema upgrades require no write access to the underlying database (strict read-only is possible with SQLite 3.22 and above).

Unlike Core Data, Dflat is built from the ground up with Swift. You can express your data model by taking full advantage of the Swift language: native support for struct (product type) and enum (sum type), with type-checked queries and observation through Combine.

30 Seconds Introduction

Dflat consists of two parts:

1. The dflatc compiler, which takes a flatbuffers schema and generates Swift code from it;
2. The Dflat runtime, which has a very minimal API footprint to interact with.

The Dflat runtime uses SQLite as the storage backend. The design itself can support other backends, such as libmdbx, in the future. The only hard dependency is flatbuffers.

To use Dflat, you should first use the dflatc compiler to generate the data model from a flatbuffers schema, include the generated code in your project, and then use the Dflat runtime to interact with the data models.

Installation

Dflat at the moment requires Bazel. To be more precise, the Dflat runtime can be installed with either Swift Package Manager or Bazel, but the dflatc compiler requires Bazel to build the relevant parts.

You can install Bazel on macOS following this guide.

Install with Bazel

If your project is already managed by Bazel, Dflat provides fully integrated tools from code generation to library dependency management.
Simply add Dflat to your WORKSPACE:

    git_repository(
        name = "dflat",
        remote = "https://github.com/liuliu/dflat.git",
        commit = "3dc11274e8c466dd28ee35cdd04e84ddf7d420bc",
        shallow_since = "1604185591 -0400"
    )

    load("@dflat//:deps.bzl", "dflat_deps")

    dflat_deps()

For your swift_library, you can now add a new schema like this:

    load("@dflat//:dflat.bzl", "dflatc")

    dflatc(
        name = "post_schema",
        src = "post.fbs"
    )

    swift_library(
        ...
        srcs = [
            ...
            ":post_schema"
        ],
        deps = [
            ...
            "@dflat//:SQLiteDflat"
        ]
    )

Install with Swift Package Manager

You can use the dflatc compiler to manually generate code from a flatbuffers schema.

    ./dflatc.py --help

You can now add the generated source code to your project and then proceed to add the Dflat runtime with Swift Package Manager:

    .package(name: "Dflat", url: "https://github.com/liuliu/dflat.git", from: "0.3.1")

Example

Assuming you have a post.fbs file somewhere that looks like this:

    enum Color: byte {
      Red = 0,
      Green,
      Blue = 2
    }

    table TextContent {
      text: string;
    }

    table ImageContent {
      images: [string];
    }

    union Content {
      TextContent,
      ImageContent
    }

    table Post {
      title: string (primary); // This is the primary key
      color: Color;
      tag: string;
      priority: int (indexed); // This property is indexed
      content: Content;
    }

    root_type Post; // This is important, it says the Post object will be the one Dflat manages.

You can then either use the dflatc compiler to manually generate code from the schema:

    ./dflatc.py -o ../PostExample ../PostExample/post.fbs

Or use the dflatc rule from Bazel:

    dflatc(
        name = "post_schema",
        src = "post.fbs"
    )

If everything checks out, you should see 4 files generated in the ../PostExample directory: post_generated.swift, post_data_model_generated.swift, post_mutating_generated.swift, post_query_generated.swift. Add them to your project.

Now you can do basic Create-Read-Update-Delete (CRUD) operations on the Post object.

    import Dflat
    import SQLiteDflat

    let dflat = SQLiteWorkspace(filePath: filePath, fileProtectionLevel: .noProtection)

Create:

    var createdPost: Post? = nil
    dflat.performChanges([Post.self], changesHandler: { (txnContext) in
      let creationRequest = PostChangeRequest.creationRequest()
      creationRequest.title = "first post"
      creationRequest.color = .red
      creationRequest.content = .textContent(TextContent(text: "This is my very first post!"))
      guard let inserted = try? txnContext.submit(creationRequest) else { return }
      // Alternatively, you can use txnContext.try(submit: creationRequest),
      // which won't return any result and does "reasonable" error handling.
      if case let .inserted(post) = inserted {
        createdPost = post
      }
    }) { succeed in
      // Transaction Done
    }

Read:

    let posts = dflat.fetch(for: Post.self).where(Post.title == "first post")

Update:

    dflat.performChanges([Post.self], changesHandler: { (txnContext) in
      let post = posts[0]
      let changeRequest = PostChangeRequest.changeRequest(post)
      changeRequest.color = .green
      txnContext.try(submit: changeRequest)
    }) { succeed in
      // Transaction Done
    }

Delete:

    dflat.performChanges([Post.self], changesHandler: { (txnContext) in
      let post = posts[0]
      let deletionRequest = PostChangeRequest.deletionRequest(post)
      txnContext.try(submit: deletionRequest)
    }) { succeed in
      // Transaction Done
    }

You can subscribe to changes on either a query or an object. For an object, the subscription ends when the object is deleted. For queries, the subscription won't complete unless cancelled. There are two sets of APIs for this: one is vanilla callback-based, the other is based on Combine. I will show the Combine one here.

Subscribe to a live query:

    let cancellable = dflat.publisher(for: Post.self)
      .where(Post.color == .red, orderBy: [Post.priority.descending])
      .subscribe(on: DispatchQueue.global())
      .sink { posts in
        print(posts)
      }

Subscribe to an object:

    let cancellable = dflat.publisher(for: posts[0])
      .subscribe(on: DispatchQueue.global())
      .sink { post in
        switch post {
        case .updated(let newPost):
          print(newPost)
        case .deleted:
          print("deleted, this is completed.")
        }
      }

Schema Evolution

Schema evolution in Dflat follows flatbuffers exactly. The only exception is that you cannot add more primary keys or change the primary key to a different property once it is selected. Otherwise, you are free to add or remove indexes and rename properties. Properties to be removed should be marked as deprecated, new properties should be appended to the end of the table, and you should never change the type of a property.

There is no need for versioning as long as you follow the schema evolution path. Because the schema is maintained by flatbuffers, not SQLite, no disk operations are required for a schema upgrade. Schema upgrade failures due to lack of disk space, or prolonged schema upgrade times due to pathological cases, won't be a thing with Dflat.

Namespace

Dflat schemas support namespaces, as flatbuffers schemas do. However, because Swift doesn't really support proper namespaces, the namespace implementation relies on public enums and extensions. Thus, if you have a namespace:

    namespace Evolution.V1;

    table Post {
      title: string (primary);
    }

    root_type Post;

You have to declare the namespace yourself. In your project, you need a Swift file that contains the following:

    public enum Evolution {
      public enum V1 {
      }
    }

And it will work. You can then access the Post object through Evolution.V1.Post or typealias Post = Evolution.V1.Post.

Dflat Runtime API

The Dflat runtime has a very minimal API footprint. There are about 15 APIs in total from 2 objects.

Transactions

    func Workspace.performChanges(_ transactionalObjectTypes: [Any.Type],
                                  changesHandler: @escaping (_ transactionContext: TransactionContext) -> Void,
                                  completionHandler: ((_ success: Bool) -> Void)? = nil)

The API takes a changesHandler closure, where you can perform transactions such as object creations, updates or deletions. These mutations are performed through ChangeRequest objects.

The first parameter specifies the relevant objects you are going to transact with. If you read or update any object that is not specified here, an assertion will be triggered.

When the transaction is done, the completionHandler closure will be triggered, and it will let you know whether the transaction was successful or not.

The transaction will be performed on a background thread; exactly which one shouldn't be your concern. Two different objects can have transactions performed concurrently, and Dflat follows a strict serializable protocol in that case.

    func TransactionContext.submit(_ changeRequest: ChangeRequest) throws -> UpdatedObject
    func TransactionContext.try(submit: ChangeRequest) -> UpdatedObject?
    func TransactionContext.abort() -> Bool

You can interact with Dflat through the above APIs inside a transaction. Data mutations are handled through submit. Note that errors are possible, for example, if you create an object with the same primary key twice (you should use upsertRequest if this is expected). The try(submit:) method simplifies the try? submit dance in case you don't need the returned value. It will trigger a fatal error if there are conflicting primary keys, and otherwise swallows other types of errors (such as disk full). When it encounters any other type of error, Dflat will simply fail the whole transaction. The abort method explicitly aborts a transaction. All submissions before and after this call will have no effect.
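
The commit, duplicate-key and abort behavior described above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not Dflat's implementation: ToyStore, ToyContext and ToyError are hypothetical names, the store is an in-memory dictionary keyed by title, and everything runs synchronously on the calling thread, whereas Dflat runs the changesHandler on a background thread.

```swift
// Toy model of the transaction semantics; none of these types come from Dflat.
enum ToyError: Error {
  case duplicatePrimaryKey(String)
}

final class ToyContext {
  private(set) var pending: [String: String] = [:]  // staged title -> body
  private(set) var aborted = false
  private let existing: [String: String]

  init(existing: [String: String]) {
    self.existing = existing
  }

  // Like submit: creating the same primary key twice throws.
  func submit(title: String, body: String) throws {
    if aborted { return }  // submissions after abort have no effect
    if existing[title] != nil || pending[title] != nil {
      throw ToyError.duplicatePrimaryKey(title)
    }
    pending[title] = body
  }

  // Like abort: everything staged so far is discarded.
  @discardableResult
  func abort() -> Bool {
    aborted = true
    pending.removeAll()
    return true
  }
}

final class ToyStore {
  private(set) var rows: [String: String] = [:]

  // Runs the handler, then commits staged rows unless the transaction was aborted.
  @discardableResult
  func performChanges(_ changesHandler: (ToyContext) -> Void) -> Bool {
    let context = ToyContext(existing: rows)
    changesHandler(context)
    guard !context.aborted else { return false }
    rows.merge(context.pending) { _, new in new }
    return true
  }
}

let store = ToyStore()
let first = store.performChanges { txn in
  _ = try? txn.submit(title: "first post", body: "hello")
}
let second = store.performChanges { txn in
  _ = try? txn.submit(title: "second post", body: "discarded")
  txn.abort()
  _ = try? txn.submit(title: "third post", body: "also discarded")
}
var sawDuplicate = false
store.performChanges { txn in
  do {
    try txn.submit(title: "first post", body: "again")
  } catch {
    sawDuplicate = true
  }
}
print(first, second, sawDuplicate, store.rows.count)  // prints "true false true 1"
```

The aborted transaction leaves no trace of either submission, and the duplicate submission surfaces as a thrown error that the caller can handle, which is the case the text suggests covering with upsertRequest when duplicates are expected.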

Data Fetching

    func Workspace.fetch(for ofType: Element.Type).where(ElementQuery, limit = .noLimit, orderBy = []) -> FetchedResult
    func Workspace.fetch(for ofType: Element.Type).all(limit = .noLimit, orderBy = []) -> FetchedResult
    func Workspace.fetchWithinASnapshot(_: () -> T) -> T

Data fetching happens synchronously. You can specify conditions in the where clause, such as Post.title == "first post" or Post.priority > 100 && Post.color == .red. The returned FetchedResult acts pretty much like an array. The object itself (Element) is immutable, so both the object and the FetchedResult are safe to pass around between threads.

fetchWithinASnapshot provides a consistent view if you are going to fetch multiple objects:

    let result = dflat.fetchWithinASnapshot { () -> (firstPost: FetchedResult<Post>, highPriPosts: FetchedResult<Post>) in
      let firstPost = dflat.fetch(for: Post.self).where(Post.title == "first post")
      let highPriPosts = dflat.fetch(for: Post.self).where(Post.priority > 100 && Post.color == .red)
      return (firstPost, highPriPosts)
    }

This is needed because Dflat can perform a transaction in between the fetches for firstPost and highPriPosts. fetchWithinASnapshot won't stop that transaction, but it makes sure the second fetch only observes the same view of the data as the fetch for firstPost.

Data Subscription

    func Workspace.subscribe(fetchedResult: FetchedResult, changeHandler: @escaping (_: FetchedResult) -> Void) -> Subscription
    func Workspace.subscribe(object: Element, changeHandler: @escaping (_: SubscribedObject) -> Void) -> Subscription

The above are the native subscription APIs. They subscribe to changes on either a fetchedResult or an object. For an object, the subscription ends when the object is deleted. The subscription callback is triggered before the completionHandler of the transaction is triggered.
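
The where conditions used by the fetch APIs above, such as Post.priority > 100 && Post.color == .red, are ordinary Swift expressions, which is why the compiler can type-check them. The following is a minimal sketch of how such expressions can be built with operator overloading. It is an illustration under stated assumptions, not the code dflatc generates: ToyPost, Field, ToyPredicate and the nested Fields namespace are all hypothetical names, and the predicate is evaluated in memory rather than translated to a SQLite query.

```swift
// Hypothetical stand-ins; these are not Dflat or dflatc-generated types.
struct ToyPost {
  enum Color { case red, green, blue }
  var title: String
  var priority: Int
  var color: Color
}

// A typed handle to one property of ToyPost.
struct Field<Value> {
  let keyPath: KeyPath<ToyPost, Value>
}

// A query condition that can be evaluated against one object.
struct ToyPredicate {
  let evaluate: (ToyPost) -> Bool
}

// Comparing a field with a value of the same type yields a predicate, not a Bool.
func == <Value: Equatable>(lhs: Field<Value>, rhs: Value) -> ToyPredicate {
  ToyPredicate { $0[keyPath: lhs.keyPath] == rhs }
}

func > <Value: Comparable>(lhs: Field<Value>, rhs: Value) -> ToyPredicate {
  ToyPredicate { $0[keyPath: lhs.keyPath] > rhs }
}

// Two predicates combine into a new predicate.
func && (lhs: ToyPredicate, rhs: ToyPredicate) -> ToyPredicate {
  ToyPredicate { lhs.evaluate($0) && rhs.evaluate($0) }
}

extension ToyPost {
  // Namespace for the typed field handles used in queries.
  enum Fields {
    static var title: Field<String> { Field(keyPath: \ToyPost.title) }
    static var priority: Field<Int> { Field(keyPath: \ToyPost.priority) }
    static var color: Field<Color> { Field(keyPath: \ToyPost.color) }
  }
}

let samplePosts = [
  ToyPost(title: "first post", priority: 10, color: .green),
  ToyPost(title: "urgent", priority: 200, color: .red),
]
let query = ToyPost.Fields.priority > 100 && ToyPost.Fields.color == .red
let matched = samplePosts.filter(query.evaluate).map { $0.title }
print(matched)  // prints ["urgent"]
```

With this shape, a mistyped condition such as comparing the priority field with a string fails to compile, because no overload accepts a Field<Int> on the left and a String on the right.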

    func Workspace.publisher(for: Element) -> AtomPublisher
    func Workspace.publisher(for: FetchedResult) -> FetchedResultPublisher
    func Workspace.publisher(for: Element.Type).where(ElementQuery, limit = .noLimit, orderBy = []) -> QueryPublisher
    func Workspace.publisher(for: Element.Type).all(limit = .noLimit, orderBy = []) -> QueryPublisher

These are the Combine counterparts. Besides subscribing to objects or a fetchedResult, they can also subscribe to a query directly. Under the hood, the query is made upon subscribe (hence, on whichever queue you provided if you used subscribe(on:)), and the resulting fetchedResult is subscribed to from then on.

Close

    func Workspace.shutdown(completion: (() -> Void)? = nil)

This triggers the Dflat shutdown. All transactions made to Dflat after this call will fail. Transactions initiated before this call will finish normally. Data fetching after this call will return empty results. Any data fetching triggered before this call will finish normally, hence the completion part. The completion closure, if supplied, will be called once all transactions and data fetching initiated before shutdown have finished.

Benchmark

Benchmarking structured data persistence systems is notoriously hard. Dflat won't claim to be the fastest. However, it strives to be predictably performant. What that means is that there shouldn't be any pathological cases in which the performance of Dflat degrades unexpectedly. It also means Dflat won't be surprisingly fast for some optimal cases.

The following data were collected with, and can be reproduced from:

    ./focus.py app:Benchmarks

I compared mainly against Core Data, and listed numbers for FMDB and WCDB from the WCDB Benchmark (v1.0.8.2) to give a better overview of what you would expect from the test device.

The test device is an iPhone 11 Pro with 64GB of storage.

A disclaimer: you should take any benchmark numbers with a grain of salt. The numbers I present here simply demonstrate some pathological cases for the frameworks involved.
They shouldn't be taken out of this context. In practice, structured data persistence systems are rarely the bottleneck. It is more important to understand how you use the system than what the raw numbers on a lightly loaded device look like.

The code for app:Benchmarks was compiled in Release mode (--compilation-mode=opt) with -whole-module-optimization on. The WCDB Benchmark was compiled in Release mode, whatever that means in their project file.

The benchmark itself is not peer-reviewed. In some cases, it represents the best-case scenarios for these frameworks. In other cases, it represents the worst-case scenarios. It is not designed to reflect real-world workloads. Rather, these benchmarks are designed to reflect the frameworks' characteristics under extreme cases.

CRUD

First, we compared Dflat against Core Data on object insertions, fetching, updates and deletions. 10,000 objects are generated, with no index (only title is indexed in Core Data).

Fetching 1,667 Objects evaluated both frameworks on querying by a non-indexed property.

Update 10,000 Objects Individually evaluated updating different objects in separate transactions 10,000 times.

Fetching 10,000 Objects Individually evaluated fetching different objects by title (indexed in Core Data, and the primary key in Dflat) 10,000 times.

These are obviously not the best way of doing things (you should update objects in one big transaction, and fetch them in batches if possible), but these are the interesting pathological cases we discussed earlier.

A proper way of doing multi-thread insertions / deletions in Core Data is considerably more tricky, and I haven't gotten around to doing that. The Multi-thread Insert 40,000 Objects and Multi-thread Delete 40,000 Objects benchmarks are for Dflat only.

[Figure: dflat-vs-c]

Some of these numbers look too good to be true. For example, on insertions, Dflat appears more than twice as fast as Core Data. Some of these numbers don't make intuitive sense: why are multi-thread insertions slower?
Putting them in perspective is important.

[Figure: wcdb-vs-fm]

This chart compares against numbers extracted from the WCDB Benchmark (v1.0.8.2) without any modifications. It compares operations per second rather than the time spent fetching 33,334 objects. Note that in the WCDB Benchmark, Baseline Read did a fetch-all, which is the best-case scenario in SQLite. It also uses a simple table with only two columns, a key and a blob payload (100 bytes).

Multi-threaded writes are indeed slower in our ideal case, because SQLite itself cannot execute writes concurrently. Thus, our multi-writer mode really just means that these transaction closures can be executed concurrently. The writes still happen serially at the SQLite layer. This is still beneficial because in real-world cases we spend significant time in the transaction closure on data transformations, rather than on SQLite writes.

The ceiling for writes is much higher than what Dflat achieved. Again, WCDB represents an ideal case where you have only two columns. Dflat numbers in the real world would also be lower than what we had here, because we will have more indexes and objects with many fields, even arrays of data.

Since Dflat doesn't introduce any optimizations for batch operations, it shouldn't be a surprise that Dflat performance scales linearly with respect to dataset size, as the following chart shows.

[Figure: dflat-scal]

Change Subscription

Every framework has a slightly different design for how change subscription works. Core Data implements this in two ways: NSFetchedResultsController delegate callbacks, and NSManagedObjectContextObjectsDidChange. From a developer's perspective, NSFetchedResultsController can be interpreted as the counterpart of FetchedResult subscription on the Dflat side. Both support making SQL-like queries and sending updates for the result set. You can build the Dflat object subscription mechanism in Core Data based on the NSManagedObjectContextObjectsDidChange notification.
For the purpose of being objective, I will simply observe the latency of the NSManagedObjectContextObjectsDidChange notification when comparing these two, assuming the underlying delivery to individual object subscriptions is a no-op.

There are three parts to the benchmark:

1. Subscribe to changes on 1,000 fetched results, each observing exactly one object (fetched by the primary key). A subsequent transaction updates 10,000 objects, including these 1,000 subscribed objects. The benchmark measures the latency from the time the changes are saved to the time the updates are delivered. For Core Data, a child context of viewContext was set up, and the latency was measured from before saving the child context to the time the update is delivered. This should be before the data is persisted (viewContext.save() was called after the child context was saved). On the Dflat side, this happens after the data is persisted.
2. Subscribe to changes on 1,000 fetched objects. A subsequent transaction updates 10,000 objects, including these 1,000 subscribed objects. The benchmark measures the latency from the time the changes are saved to the time the updates are delivered. For Core Data, NSManagedObjectContextObjectsDidChange was subscribed to for the viewContext object. The latency is measured from before saving the child context to the time the notification is delivered.
3. Subscribe to changes on 1,000 fetched results, each observing around 1,000 objects (fetched by a range query). A subsequent transaction updates 10,000 objects, rotating all objects in each fetched result while maintaining 1,000 objects per result. The measurement setup on Core Data is the same as in 1.

[Figure: dflat-core]

The numbers for both fetched results observations, especially case 1, represent the most pathological case of them all. It is particularly troublesome for Dflat because fetching 1,000 objects from disk individually would take around 20 milliseconds. Thus, if we took the SQLite.swift approach of identifying which table changed and simply refetching every query on that table, we could end up more performant.
For case 3, though, refetching from disk would definitely be slower (close to 6 seconds for 1,000 queries, each with 1,000 objects).

From the benchmark, Core Data suffers from a similar problem, while being worse. Again, this is an extreme case. For mobile apps, you should only have a handful of query subscriptions, with probably at most thousands of objects for each query, and you should unsubscribe from changes as you navigate away to other pages. These extreme cases are hardly realistic: you are not going to see a 35-second stutter from Core Data just because 10,000 objects are updated and you happen to have 1,000 table views that need to be updated. In reality, subscribing to individual queries by primary key seems to be a big no-no. If you want to observe an individual object, you should just subscribe to the individual object, as case 2 shows.

However, it does expose that our message-sorting-and-delivery mechanism is not working as efficiently as we expected. Fundamentally, Dflat's change subscription works best with incremental changes, because we evaluate every changed object against all fetched request subscriptions related to that object. This design avoids a trip to the disk on every transaction, but it also relies on a reasonable implementation that evaluates every changed object efficiently.

A quick test shows that looping over 10,000 objects with 1,000 string equality evaluations in Swift takes about 30 milliseconds. Profiling shows that the majority of the time was spent on object retain / release and function calls in the Swift runtime.

There are two ways to improve:

1. The current evaluation relies on Swift protocols with associated types. It seems certain Swift usage has a higher runtime cost than others. Switching to a better linear scan, either with an interpreted VM or by simply optimizing the evaluation procedure, would probably show 5 to 10x improvements.
2. Algorithmically, it can be improved. The current implementation is naive in that we evaluate each object against each subscribed query. From the study of database implementations, we know accelerated data structures can be helpful. In particular, each FieldExpr in a query can be used to build a sorted set, and Comparable queries can be accelerated with these sorted sets.

Both are quite doable, while each has its own challenges. For 1, we need to wrestle with the Swift runtime, and its behavior can be too erratic at times for obvious gains to materialize. Because I do not intend to delegate parts to C, it makes all of this harder. For 2, while it is not hard to implement, we use three-valued logic internally (to support isNull / isNotNull queries), which means that at every turn we need to sort with UNKNOWN. Having a robust and correct implementation of this means writing many more unit tests to feel comfortable. We also need to balance when to use a linear scan and when to use accelerated data structures, because for a small number of changes a linear scan could be faster, according to previous empirical studies.
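
The second improvement can be sketched for one simple query shape. This is a hypothetical illustration, not Dflat's implementation: ThresholdIndex is an invented name, it only handles subscriptions of the form priority > threshold, and a missing (nil) priority is treated as UNKNOWN, which matches no subscription. Subscriptions are kept sorted by threshold, so one binary search per changed object replaces a scan over every subscription.

```swift
// Hypothetical sorted index over subscriptions of the form `priority > threshold`.
struct ThresholdIndex {
  // Entries are kept sorted ascending by threshold.
  private var entries: [(threshold: Int, id: Int)] = []

  // Index of the first entry whose threshold is not below `value`.
  private func lowerBound(_ value: Int) -> Int {
    var low = 0
    var high = entries.count
    while low < high {
      let mid = (low + high) / 2
      if entries[mid].threshold < value {
        low = mid + 1
      } else {
        high = mid
      }
    }
    return low
  }

  mutating func insert(threshold: Int, id: Int) {
    entries.insert((threshold: threshold, id: id), at: lowerBound(threshold))
  }

  // Ids of all subscriptions for which `priority > threshold` is TRUE.
  // A nil priority evaluates to UNKNOWN under three-valued logic, so nothing matches.
  func matches(priority: Int?) -> [Int] {
    guard let priority = priority else { return [] }
    return entries[..<lowerBound(priority)].map { $0.id }
  }
}

var index = ThresholdIndex()
for (id, threshold) in [100, 10, 50].enumerated() {
  index.insert(threshold: threshold, id: id)
}
let hits = index.matches(priority: 60)      // thresholds 10 and 50 are below 60
let boundary = index.matches(priority: 50)  // 50 > 50 is false, only threshold 10 matches
let unknown = index.matches(priority: nil)  // UNKNOWN matches nothing
print(hits, boundary, unknown)
```

For a handful of subscriptions the plain linear scan may still win, which is the balance the text mentions; a real implementation would also need indexes for the other comparison operators and for negated conditions, where UNKNOWN has to stay UNKNOWN rather than flip to TRUE.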