[HN Gopher] Althttpd: Simple webserver in a single C file
___________________________________________________________________
Althttpd: Simple webserver in a single C file
Author : miles
Score : 651 points
Date : 2021-06-08 08:30 UTC (14 hours ago)
(HTM) web link (sqlite.org)
(TXT) w3m dump (sqlite.org)
| pjc50 wrote:
| /*
| ** Test procedure for ParseRfc822Date
| */
| void TestParseRfc822Date(void){
|   time_t t1, t2;
|   for(t1=0; t1<0x7fffffff; t1 += 127){
|     t2 = ParseRfc822Date(Rfc822Date(t1));
|     assert( t1==t2 );
|   }
| }
|
| There's only two billion integers, guess we can test them all.
| Well, substantially fewer than two billion with that skip. I
| wonder if that completes in a few seconds.
| taspeotis wrote:
| https://randomascii.wordpress.com/2014/01/27/theres-only-fou...
| ithkuil wrote:
| Computers are insanely fast
| tyingq wrote:
| Another comment shows 10 seconds on a relatively decent CPU
| from 2017. So it is a fairly heavyweight task, though I
| suppose it could be rewritten to use more than one core.
| nneonneo wrote:
| This loop doesn't test them all - it tests every 127th
| integer, so it only runs about 17 million times.
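| A quick check of the arithmetic, as a C one-liner:
|
|   #include <stdio.h>
|   int main(void){
|     /* 0x7fffffff / 127 = 16909320, i.e. roughly 17 million */
|     printf("%d\n", 0x7fffffff / 127);
|     return 0;
|   }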
| [deleted]
| Cthulhu_ wrote:
| And a modern CPU runs billions of instructions per second, so
| it shouldn't take too long at all. It depends on the
| implementation of the function as well, though, and on how the
| compiler can optimize or unroll both the function
| under test and the test itself.
| postalrat wrote:
| How would you write a better test?
|
| IMO most tests are fundamentally flawed. The way most testing
| is done, it would be easier and better to just write everything
| twice, have a method to compare results, and hope you got it
| right at least once.
| louisvgchi wrote:
| I do think of tests as a kind of double-entry accounting.
| unwind wrote:
| That was quite easy to cut and paste and compile:
| $ time ./althttpd-time-parse
| Test completed in 10518961 us
| real    0m10,521s
| user    0m10,486s
| sys     0m0,004s
|
| The "Test completed" line is from my main() "driver", I wanted
| to measure time inline too and the measurements seem to agree.
|
| This is on a Dell Latitude featuring a Core i5-7300U at 2.6
| GHz, running Ubuntu 20.10.
| asveikau wrote:
| Looks like test code, rather than something that runs when
| serving in production.
|
| It fails an assert if the parse doesn't work, I guess?
| millerm wrote:
| I really wish more developers took the time to comment code as
| well as the sqlite.org devs do.
| henvic wrote:
| Take a look at Fossil too: https://www.fossil-scm.org/
|
| It's the distributed version control (and more) used by SQLite.
| Most people have no idea how cool the SQLite ecosystem is,
| and how it's used even in avionics!
| pdimitar wrote:
| The lack of ability to squash PRs into single commits is a deal
| breaker for many, myself included.
|
| No, having a commit "fix typo" in the main branch's history is
| not at all useful and won't ever be. It's noise.
|
| In a work setting it's much better to reduce noise.
| SQLite wrote:
| Fossil does have the ability to squash commits.
|
| The difference is that Fossil does not promote the use of
| commit-squashing. While it can be done, it takes a little
| work and knowledge of the system. Consider the premature-
| merge problem in which a feature branch is merged into trunk
| before it is ready, and subsequent typo fixes need to be
| added. To do this in Fossil you first move the errant merge
| onto a new branch (accomplished by adding a tag to the merge
| check-in) then fix the typo on the original feature branch,
| then merge again. So in Fossil it is a multi-step process.
| Fossil does not have a "rebase" command to do all that in one
| convenient step. Also, Fossil preserves the original errant
| check-in on the error branch, rather than just "disappearing"
| the check-in as Git tends to do.
|
| The difference here is a question of priorities. What is more
| important to you, an accurate history or a clean history that
| tells a story? Fossil prioritizes truth over beauty. If you
| prefer a retouched or "photoshopped" history over an
| auditable record of what really happened, Fossil might not be
| the right choice for you.
|
| To put it another way, Fossil can squash commits, but another
| system might work better for you if commit-squashing is your
| go-to method of dealing with configuration management
| problems.
| thayne wrote:
| > Consider the premature-merge problem in which a feature
| branch is merged into trunk before it is ready
|
| That isn't the normal use case for commit squashing though.
| Generally, trunk/master/main isn't ever rewritten.
| Squashing is usually done on feature branches _before_
| merging. What does that look like in fossil?
|
| It seems like part of the problem is that fossil is
| designed for a very different workflow. See
| https://www.fossil-scm.org/home/doc/trunk/www/fossil-v-
| git.w.... The "autosync, don't commit until it is ready to
| be merged" workflow might work well for a small, flat
| organization, but I'm not sure how that scales to large
| organizations that have different privilege levels, formal
| review requirements, and hundreds or thousands of
| contributors.
| pdimitar wrote:
| I appreciate the different approaches and I'm grateful for
| the info (and the opinionated take) right from the source.
| Thank you.
|
| In terms of Git and the usual commercial SCM practices, I'm
| speaking empirically. Everywhere I worked in a team,
| leaders and managers wanted main branch's history to be a
| bird's-eye view, to have every commit fully build in CI/CD,
| and be able to find who introduced a problem (admittedly
| this requires a little more digging compared to Fossil,
| though). Squashed commits help when auditing for production
| breakages, and apparently also help managers do release
| management (as well as billing customers sometimes).
|
| Do I have all sorts of minor commits in my own projects?
| Sure! I actually pondered using fossil for them but alas,
| learning new tools just never gets enough priority due to
| busy life. I'd love to learn and use it one day. I'm sick
| of Git.
|
| But I don't think your analogy with a photoshopped /
| retouched picture is fair. Squashing PRs into a single
| commit is not done for aesthetic reasons or for
| deliberately disappearing information -- a link to the
| original PR with its branch and all commits in it remains
| and can be fully audited after all. No information actually
| disappeared.
|
| I believe a better analogy would be with someone who
| prefers to have one big photo album that contains smaller
| albums which in turn contain actual photos of separate life
| events that are mostly (but not exactly) in chronological
| order -- as opposed to Fossil's approach which can be
| likened to a classic big photo album with all semantically
| unrelated photos put in strict chronological order.
|
| I'll reiterate that my observations and opinions are mostly
| empirical. And let me say that I don't like Git at all. But
| the practice I described does help in a classic team of
| programmers and managers.
|
| I concede that Git and its quirks represent a local maximum
| that absolutely can be improved upon, but at least to me
| the jury is still out on what's the better approach -- and
| I'm not sure a flat history is it.
| xyzzy_plugh wrote:
| We probably use the tools differently. I squash minor
| commits in my local branch, so I never want the review
| system squashing branches. If I put a branch for review,
| it is a series of patches which should be applied as-is.
| See the kernel mailing list and associated patch sets for
| the sort of thing I mean.
|
| I frequently use git rebase in interactive mode to
| rearrange and curate my commits to form whatever
| narrative I'm aiming for. Commits are semi-independent
| stories which can be merged, in order, at any rate and
| still make sense. Each commit makes sense with respect to
| history, but doesn't care about the future.
|
| I squash and rearrange and fixup commits until they look
| the way I want, and would want to see if I were looking
| at a history, and then send them for review.
|
| Whether you merge my branches, or fast-forward and rebase
| the individual patches, makes little difference to me.
| But please don't squash my hard work.
| pdimitar wrote:
| We definitely do use tools differently.
|
| Not looking to pick a fight here, mind you, but the Linux
| kernel is hardly a representative demonstration of how to
| consume Git out there in the wild.
|
| The way you describe your usage, it already seems you
| kinda sorta do your own squashed commits, only you want
| several of them to get merged into the main branch, not
| just one. So you're still rewriting history a bit, no?
| xyzzy_plugh wrote:
| Yes! Absolutely I am rewriting history. I don't care
| about _my_ messy history, and nor should anyone else. I
| routinely force-push to _my_ branches because, again,
| it's my mess. Tools like mailing lists obviously handle
| this fine because every _revision_ of my patches is
| published there, so who cares. Tools like Gerrit handle
| this well because it stores patch sets regardless of
| branch state -- as they are unrelated -- which feels like
| the right model. Tools like Github just suck at this in
| general and fall apart when you start force-pushing PR
| branches, but whatever.
|
| The problem with squashing at review time is that it is
| incompatible with my model. I'd rather teach engineers to
| not push shitty "fix the thing" or "oops" commits that
| are ultimately useless for me, the reviewer or reader.
| Just use git commit --fixup and git rebase --autosquash
| as $DEITY intended and force push away. It's your world,
| you'll be deleting the branch after you land it anyways.
|
| The tool works so well when used as intended: a
| _distributed_ version control system. The centralized
| model adds so much pain and suffering.
| dimitrios1 wrote:
| Consider editing. Yes, a paper goes through revisions, which
| are holistic, and represent one iteration to the next. But in
| between those revisions, you can see markup and editor
| notes as to why a change was performed.
|
| Sometimes those insights are just as useful as the packaged
| whole.
| pdimitar wrote:
| Everybody who ever wrote their own diffing program has
| considered that and that's not the problem. The problem is
| how does this approach scale in a repo with 50+
| contributors? It doesn't, sadly.
| dimitrios1 wrote:
| > how does this approach scale in a repo with 50+
| contributors?
|
| You surely aren't going to have 50 contributors all
| simultaneously working on the same mainline or feature
| (and if you ever do have 50 working together at the same
| time, I'd say that's poor project management). The
| reality is a portion of the developers work on this
| feature in this part of the code base, a few over here,
| they'll be on their own branches or lines, and everything
| will be fine.
|
| This scenario where we have 50+ devs all crashing and
| bumping into each other rarely happens, if ever. I have
| personally only seen one instance of it happen in kernel
| development. And even then it was relatively
| straightforward to sort out.
|
| To go further, in a hypothetical scenario where there is
| one feature and 50+ open source developers are all vying
| to push their patches in, there is still going to be one
| reference point to work off of, and reviewers are going
| to base everything off that. It's a sequential process,
| not concurrent.
| emn13 wrote:
| I never understood the point of squashing: if you just want
| the short version, only read the merge commits; if you want
| the full details to figure out some bug or whatever, then
| yes, I want those fix typo commits most definitely, because
| as often as not, those are at fault.
|
| Squashing buys you next to nothing, and costs you the ability
| to dive into the history in greater detail.
|
| I suppose if your project is truly huge, it becomes worth it
| to reduce load on your VCS, but beyond that...
| pdimitar wrote:
| Having full history in the feature branch is mandatory --
| nobody is disputing that, myself included.
|
| All due diligence is done there, not in the main branch.
|
| The main branch only needs to have one big commit saying
| "merging PR #2169". If you need more details you'll go that
| PR/branch and get your info.
|
| The "fix typo" commit being in the main branch buys you
| nothing. It's only useful in its separate branch.
| emn13 wrote:
| But that's what a merge commit _is_ - the diff along the
| merge commit is what the squashed commit would be; the
| only thing squashing does is "forget" that the second
| parent exists (terminology for non-git VCS's may be
| slightly different).
|
| Why not merge?
| pdimitar wrote:
| Because in big commercial projects several people or
| teams merge inside the main branch on a regular basis.
| It's easier to look at a history only including squashed
| commits each encompassing an entire PR (feature or a
| bug).
|
| It gives you better high-level observability and a good
| bird's-eye view. And again -- if you need more details
| you can go and check all separate commits in the
| PR/branch anyway.
|
| And finally, squashed commits are kind of atomic commits.
| Imagine history of three separate PRs happening at
| roughly the same time. And now all those commits are
| interspersed in the history of the main branch.
|
| How is _that_ useful or informative? It's chaos.
|
| EDIT: my bad, I conflated merging with rebasing. Still, I
| prefer a single squashed commit for most of the reasons
| above, plus those of the other two commenters (useful git
| blame output and buildable history).
| emn13 wrote:
| Oh yeah, rebasing branches as a "merge" policy is
| definitely tricky like that. (I mean, I'm sure some
| people do that, and perhaps with good reason, but it
| makes this kind of stuff clearly worse.)
| skeeter2020 wrote:
| This is fine if it's how your team develops, but not for
| everyone. We don't care about full history in branches;
| maybe it has more detail than main/master but it should
| still be contextually meaningful. I'd never approve a
| commit into main with the message "merging PR #xxx"
| either; it's redundant (merging), has no summary about
| what it actually does and relies on an external system
| (your PR/MR process) for details. I do agree that keeping
| noise out of your main is key, but would go even further
| than you to keep it clean AND self-contained.
| pdimitar wrote:
| Well sure, the title was just an example. It usually is
| more like this:
|
| "Encrypt project's sensitive fields (#1234)"
|
| With the number being a PR or an issue # (which does
| contain a link to the PR).
|
| I do care about history in branches though. And many
| others do. I agree that it varies from team to team.
| throwaway525142 wrote:
| Squashing for example allows me to have a history where
| each commit builds. This has been very useful for bisecting
| for me. I wouldn't call it "next to nothing".
|
| The "greater detail" part can cost me a lot of time.
| jiofih wrote:
| Having commits that do not build represents the history
| more accurately. It could very well be a "fix" to some
| build error that silently introduces an issue; that
| context is lost when you squash.
| plorkyeran wrote:
| Storing every single version of the file which ever hit
| disk locally on my machine in the history would be the
| most accurate, yet no one seems to advocate for that.
| Even with immutable history, which versions go into the
| history is a choice the developer makes.
| coldtea wrote:
| > _Storing every single version of the file which ever
| hit disk locally on my machine in the history would be
| the most accurate, yet no one seems to advocate for
| that._
|
| You'd be surprised. It's only because we understand (and
| are used to) tool limitations (regarding storage, load,
| etc) that we don't advocate for that, not because some
| other way is philosophically better.
|
| I'd absolutely like to have "every single version of the
| file which ever hit disk locally on my machine in the
| history".
| emn13 wrote:
| I mean, this isn't even really all that far-fetched;
| other systems do work like that, e.g. Word's
| track changes or gdoc's history - or even a database's
| transaction log.
|
| And while those histories are typically unreadable, it is
| possible to label (even retroactively) relevant moments
| in "history"; and in any case just because a consumer-
| level wordprocessor doesn't export the history in a
| practical way doesn't mean a technical VCS couldn't do
| better - it just means we can't take git-as-is or google-
| docs-as-is and add millions of tiny meaningless "commits"
| and hope it does anything useful.
| Ekaros wrote:
| Is that with or without auto-save? Though, that might
| actually be interesting for some academic research if
| enough data was collected.
| sangnoir wrote:
| > Having commits that do not build represents the history
| more accurately.
|
| Sure it does, but sometimes that level of detail in
| history is not helpful. Individual keystrokes are an even
| finer/"more accurate" representation of history; but who
| wants that? At some point, having more granular detail
| becomes noise - the root of the disconnect is that people
| have a difference in opinion on which level that is: for
| some (like you), it's at individual commit-level. For
| others (like me), it's at merge-level: inspecting
| individual commits is like trying to parse someone's
| stream-of-consciousness garbage from 2 years ago. I
| _really_ don't care to know you were "fixing a typo" in
| a0d353 on 2019-07-15 17:43:32, but your commit is just
| tripping up my git-bisect for no good reason.
| IanCal wrote:
| Can you not just skip all non-merge commits?
| sangnoir wrote:
| Sure, I _can_ - but _should_ I? That's the fundamental
| difference in opinion (which I don't think can be
| reconciled). I don't need to know what the developer was
| thinking or follow the individual steps when they were
| developing a feature or fixing a bug; for me, the merge
| is the fundamental unit of work, and not individual
| commits. Caveat: I'm the commit-as-you-go type of
| developer, as most developers are (branching really is
| cheap in Git). If everyone were disciplined enough not to
| make commits out of WIP code, and every commit were self-
| contained and _complete_, I'd be all for taking commits
| as the fundamental unit of code change.
|
| If the author did something edgy or hard-to-understand
| with the change-set, I expect to see an explanation why
| it was done that way as a comment near the code in
| question, rather than as a sequence of commit messages;
| that is the last place I will look - but that's just me.
| azernik wrote:
| Representing the history more accurately is not a useful
| design goal.
| darig wrote:
| Representing the history more accurately is a useful
| design goal.
| user-the-name wrote:
| Why not? I do not see any reason to have a history at all
| for anything except to be able to go back to a specific
| version to track down a problem. Inaccurate history makes
| that less useful.
| chousuke wrote:
| Typo commits and the usual iteration during development
| aren't "accurate history". Noise in commit logs provides
| negative value.
|
| Ideally, each commit should be something that you could
| submit as a stand-alone patch to a mailing list; whether
| it's a single commit that was perfect from the get-go or
| fifty that you had to re-order and rewrite twenty times
| does not matter at all; the final commit message should
| contain any necessary background information.
|
| It would be needlessly restrictive to prevent users from
| making intermediate commits if that helps their workflow:
| I want to be able to use my source-code management tool
| locally in whichever way I please, and what you see
| publicly does not need to have _anything_ to do
| with my local workflow. Thus, being able to "rewrite
| history" is a necessary feature.
| jiofih wrote:
| Patch-perfect commits are an idealistic goal. The truth
| is that, as already mentioned, many of those typos and
| "dirty" commits can be the source of bugs that you're
| looking for. Hiding them hides the history.
| throwaway525142 wrote:
| You're not saying anything new as far as I can tell. Your
| grandparent already said what you said. I only disputed
| the "this costs next to nothing" part, which you don't
| seem to comment on.
| jiofih wrote:
| I was commenting exactly on that. You think having every
| commit build, at the cost of destroying history, is
| gaining something.
|
| I think representing history correctly is best, and agree
| that "squashing buys you next to nothing" other than
| visually pleasant output. Clearer?
|
| My favorite approach is rebase + non-ff merge. Best of
| both worlds.
| jhasse wrote:
| No. They think having every commit build is gaining
| something.
|
| You think the disadvantage (destroying history) is more
| important, but you can't say that it "buys you nothing".
| emn13 wrote:
| Ideally your branch commits would usually build too; but
| admittedly, I tend to bisect by hand, following the
| --first-parent. I suppose if the set of commits were
| truly huge that might be more of an issue. And of course,
| there are people that have hacked their way to success
| here: https://stackoverflow.com/a/5652323, but that
| sounds a little fragile.
| layoutIfNeeded wrote:
| I want to be able to tell _why_ a given line of code was
| introduced. Seeing "Fix indentation" in the output of `git
| blame` won't help me with that.
| emn13 wrote:
| git blame --first-parent
| elliotf wrote:
| I also want to be able to tell _why_ which is why I
| dislike working on codebases that squash commits. Too
| many times I've done a blame to see why a change was
| made, and it's a giant (> 10) list of commit messages.
| Oftentimes, the macro description of what was going on
| does not help me with the line-level detail.
|
| Also, in case it helps you in the future, `blame -wC` is
| what I use when doing blame; it ignores whitespace
| changes and tracks changes across files (changes happened
| before a rename, for example.)
| gregmac wrote:
| Neither does squashing though: you still can't tell if
| that line was introduced or modified.
|
| I've come across "fix indentation" or "fix typo" commits
| where a bug was introduced, like someone accidentally
| committed a change (maybe they were debugging something,
| or just accidentally modified it).
|
| For example: I'm tracing a bug where a value isn't
| staying cached. I find a line of code DefaultCacheAge=10
| (which looks way too short) and git blame shows the last
| change was modifying that value from 86400. What I do
| next will be very different if the commit message says
| "fix indentation" vs "added new foobar feature" or
| "reduced default cache time for (reason)".
| skeeter2020 wrote:
| Greater detail can ultimately lead to less information if
| noisy commits crowd out the important ones. IME you want a
| commit to mean something and that usually leads to tweaking
| the work/change/commit relationship, which is where
| squashing helps.
| bastardoperator wrote:
| Who squashes like this? I rebase all of my PR's not because
| I want to trash history, but because I want history to be
| meaningful. If I include all of my "whoops typo fix" and
| "pay respect to the linter gods" commits, I have made my
| default branch history much less readable.
|
| I would say what you're describing is a breakdown in CI/CD
| and code review. How is code that is that broken getting
| into your default branch in the first place?
| emn13 wrote:
| I certainly don't, but it's a standard option in git
| merge (and IIRC github supports it), so I'm pretty sure
| some teams do.
|
| As to rebases to clean up _history_ (and not just the PR
| itself)... personally, I don't think that's worth it. My
| experience with history like this is that it's relevant
| during review, and then around 95% of it is irrelevant -
| you may not know which 5% are relevant beforehand, but
| it's always some small minority. It's worth cleaning up a
| PR for review, but not for posterity. And when it comes
| to review, I _like_ commits like "pay respect to the
| linter gods" and the like, because they're easy to
| ignore, whereas if you touch code and reformat even
| slightly in one commit, it's often harder to skip the
| boring bits; to the point that I'll even intentionally
| commit poorly formatted code such that the diff is easy
| to read and then do the reformat later. Removing clear
| noise (as in code that changes back and forth and for no
| good reason) is of course nice, but it's easy to overdo;
| a few typo commits barely impact review-ability (imho),
| and rebases can and do introduce bugs - you must have
| encountered semantic merge conflicts before, and those
| are 10 times as bad with rebases, because they're
| generally silent (assuming you don't test each commit
| post-rebase), but leave the code in a really confusing
| situation, _especially_ when people fix the final commit
| in the PR, but not the one where the semantic merge
| conflict was introduced, and laziness certainly
| encourages that.
|
| It also depends on how proficient you are with merge
| conflicts and git chicanery. If you are; then the history
| is yours to reshape; but not everybody is, and then I'd
| rather review an honest history with some cruft, rather
| than a frankenstein history with odd seams and mismatched
| stuff in a commit that basically exists because "I kept
| on prodding git till it worked".
| brundozer wrote:
| I agree with you that such a commit has no place in the
| main branch. But it has no place in any branch that is
| shared with anyone, either. Git has enough ways to keep your
| own history clean at all times to not require a hack like
| squashed PRs to compensate for the lack of discipline of a
| team. With squashed PRs you lose so much valuable information
| that it becomes impossible to use commands like bisect or to
| have proper context on a blame.
| pdimitar wrote:
| I agree that you lose the benefit of a direct bisect but
| this is usually shrugged off with "you can go to the PR and
| inspect the commits one by one" which, while not ideal, is
| deemed a good tradeoff if you want your main branch to only
| contain big commits each laser-focused on one feature or
| bug.
|
| As I replied to @SQLite above, I am not saying this is the
| optimal state of affairs -- not at all. But it's what is
| required everywhere I ever worked for 19.5 years (and
| similar output was desired when we worked with CVS and
| Subversion and it was harder to achieve there).
|
| But I'll disagree this is lack of discipline. It's not that
| at all. It's a compromise between programmers,
| managers/supervisors, CTOs / directors of engineering, and
| release engineers. They want a bird's-eye view of the
| project in the main branch.
| swiley wrote:
| I don't want forum/web software built into my dvcs. I don't see
| how this improves over git.
| networked wrote:
| Fossil has enough wiki and theming features to power a
| customized website and to let you edit its contents in the
| browser. I've joked that Fossil SCM is secretly "Fossil CMS".
|
| My personal website, https://dbohdan.com/, is powered by
| Fossil. A year ago I was shopping for a wiki engine, didn't
| love any I looked at, and realized I could try something I was
| already familiar with: Fossil. It did take a few hacks to make
| it work how I wanted. The wiki lacks category and transclusion
| features and, at least for now [1], can't generate tables of
| contents. I've invented a simple notation for tags and generate
| a "tag page" [2] using a Tcl script [3]. The script runs every
| time I synchronize my local repository with dbohdan.com. The
| TOC is generated in IE11-compatible JavaScript in the reader's
| browser [4]. The redirects are in the Caddyfile (not in the
| repo). Maybe I'll migrate to a more full-featured wiki later
| [5], but I am enjoying this setup right now. I am happy I gave
| Fossil a try.
|
| Fossil also has a built-in forum engine [6]. I am thinking of
| migrating a forum running on deprecated software to it.
|
| Edit: My favorite music page and sitemap are generated on sync,
| too. [7] The sitemap uses Fossil's "unversioned content"
| feature to avoid polluting the timeline (commit history). [8]
|
| -----
|
| [1] In the forum thread
| https://fossil-scm.org/forum/forumpost/b635dc56cb?t=h
| DRH talks about implementing a server-side TOC.
|
| [2] The page lists the tags and what pages are tagged with
| each. Tags on other pages link to their section of the tag
| page. https://dbohdan.com/wiki/special:tags.
|
| [3] https://dbohdan.com/artifact/8297b54f5d
|
| [4] https://dbohdan.com/artifact/d81bb60a0e
|
| [5] PmWiki seems like a nice lightweight option--an order of
| magnitude less code than its closest competitor DokuWiki, very
| stable, and has a better page history view. Caveat: it is
| written in old school PHP. https://pmwiki.org/.
|
| [6] https://fossil-scm.org/home/doc/trunk/www/forum.wiki
|
| [7] https://dbohdan.com/wiki/music-links with
| https://dbohdan.com/artifact/053d0ff993,
| https://dbohdan.com/uv/sitemap.xml with
| https://dbohdan.com/artifact/c21444f7c9.
|
| [8] https://fossil-scm.org/home/doc/trunk/www/unvers.wiki
| sgbeal wrote:
| > The wiki lacks category ...
|
| Just FYI: we recently improved the internals to be able to
| add propagating tags to wiki pages[1], so it will eventually
| be possible to use those to categorize/group your wiki pages.
| What's missing now is UIs which can make use of that feature.
| The CLI tag command can make use of them, but that doesn't
| help your UI much.
|
| > ... and transclusion features
|
| For the wiki it seems unlikely to me that transclusion will
| ever be a thing. It can hypothetically be done with the
| embedded docs feature if the fossil binary is built with
| "th1-docs" support, but, alas, we can't currently support
| propagating tags on file-level content. (i have an idea how
| it might be integrated, but figuring out whether or not it
| internally makes sense requires trying it out (and that
| doesn't have a high priority).)
|
| [1] https://fossil-scm.org/forum/forumpost/3d4b79a3f9?t=h
| networked wrote:
| This is excellent news! I hope Fossil can eventually
| obsolete both my tag script and the JavaScript TOC.
|
| As for transclusions, I don't expect Fossil to implement
| them. While something like
| https://www.pmwiki.org/wiki/PmWiki/IncludeOtherPages would
| be cool, it is probably out of scope for Fossil.
| systems wrote:
| A long, long time ago Fossil destroyed code for Zed Shaw.
|
| And since that day, no one looked at Fossil the same way, at
| least not the same way they looked at git.
|
| https://www.mail-archive.com/fossil-users@lists.fossil-scm.o...
|
| I can't tell for sure how much of an impact this had on
| Fossil's adoption; it's hard to beat git no matter how good
| you are, but I think it was a big hit.
| sgbeal wrote:
| > long long long time ago Fossil destroyed code for Zed Shaw
|
| 1) No, it didn't. Please read the part of the thread after
| Zed's initial panic attack.
|
| 2) To the best of our[1] knowledge, fossil itself has never
| caused a single byte of data loss. When fossil checks in new
| data, it reads that data back (in the same SQL transaction
| the data was written in) to ensure than it can read what it
| wrote with 100% fidelity, so it's nearly impossible to get
| corrupted data into fossil without going into the db and
| massaging it by hand. People _have_ lost data by storing
| their only copy of a repository on failing /failed storage or
| on a network drive, but no software can protect against
| hardware failure and nobody in their right mind tries to
| maintain an active sqlite db over a network drive (plenty of
| people do it, despite the repeated warnings of anyone who
| knows anything about sqlite, and they have only themselves to
| blame when it goes pear shaped). Fossil makes syncing to/from
| a remote copy absolutely trivial, so any failure to regularly
| sync copies to a backup is end-user error.
|
| [1] = the fossil developers.
| wizzwizz4 wrote:
| Technically, Fossil didn't destroy the code; Zed destroyed it
| by running `fossil revert`. All Fossil did was run off into
| la la land where nothing makes sense and everything is
| invisible and the working directory is empty.
|
| Still an impressive bug - but the first rule of "my CVS has
| broken" is "stop running commands, and copy the CVS directory
| somewhere else". (I've needed to do this to a .git twice.)
| Had he done this, he wouldn't've lost work. While he
| shouldn't've _had_ to, "stop doing everything and take a
| read-only copy" is the _first step_ when any database
| containing important data has corrupted.
| Jach wrote:
| I'm pretty familiar with a lot of Zed's doings but I had
| missed this one. I doubt it impacted the adoption much,
| losing one advocate like that isn't going to doom your
| project.
|
| I finally gave Fossil a serious try last year for a few
| months, just on my own, but I don't think my opinions would
| change if I tried it in a team setting. I still love the idea,
| but the execution is... ghetto. Serviceable, certainly, but
| ghetto is the best overarching description I have for it. You
| have to be willing to look past a lot of things (sure, many of
| them are petty) in order to take it over dedicated, polished
| services for the things it combines. And git+[choice of issue
| tracker]+[choice of forum]+[choice of wiki]+[choice of
| project website]+etc. is the real competition against Fossil,
| not git+nothing, so even if it was better on the pure version
| control bits it would still be a tough battle. (Not to
| mention the elephant in the room: git+github is a pretty good
| kitchen sink on its own if you don't want to think about
| choices and just want something serviceable that's also a lot
| less ghetto.)
|
| I also realized how much I love git's staging area concept
| once it was gone -- even when I had to use Perforce a lot, at
| least Perforce has the concept of pending changelists so you
| have something similar. I've never been a big fan of git's
| rebase, but it's also brought up a lot as a feature people
| are unwilling to give up, and I see the appeal. In summary, I
| think the adoption issue is just that people who do
| eventually give it a shot find usability issues/missing
| functionality they aren't willing to put up with.
| papaf wrote:
| _And git+[choice of issue tracker]+[choice of
| forum]+[choice of wiki]+[choice of project website]+etc. is
| the real competition against Fossil_
|
| I think that this is only true for some projects. For some
| people, the ease of self hosted setup (one executable) and
| the fact that you can change the documentation, edit code
| and close bugs offline is a big win that no centralized
| service can compete with.
| Jach wrote:
| Sure, one executable is nice, that might be enough of a
| payoff for some people to overlook the rest. I think you
| may be underestimating the amount of choice that's out
| there though. For documentation alone there's countless
| options that don't require a centralized online thing.
| Three I'd use over Fossil again are 1) simple project
| doc/ folder with .md files inside (which you can render
| locally to look just like github with
| https://github.com/joeyespo/grip) 2) Doxygen or a
| language-specific equivalent if it's better 3)
| libreoffice docs stored either in the same repo or
| another one (perhaps a submodule).
|
| For project management I'm less familiar with the options
| out there but I'd be surprised if there was nothing that
| gives a really stellar offline experience. I'd give a
| preemptive win to Fossil on the narrow aspect that your
| issue tracking changes can be synced and merged
| automatically with a collaborative server when you come
| back online, whereas if you stood up your own instance of
| Trac for instance I'm not sure if they have any support
| for syncing. If you're working by yourself, though, then
| there's no problem, Trac and many others work just like
| Fossil and stand up a local server (or are dedicated
| standalone programs) and work the same whether you're
| offline or online. But when I'm working solo I prefer
| low-tech over anything that resembles Jira (and I don't
| even really dislike Jira) -- I've played with
| https://github.com/dspinellis/git-issue as another
| offline/off-platform option but in my most recent ongoing
| solo project I'm quite happy with a super low-tech issues
| text file that has entries like (easy to make with
| https://github.com/dhruvasagar/vim-table-mode)
|   +--------------+
|   | Add thing    |
|   +==============+
|   | Done whens / |
|   | other info   |
|   +--------------+
|
| and when I'm closing one I just move it to the issues-
| closed file as part of the closing commit. I might give
| it an identifier if I need to reference it in the
| code/over multiple commits.
| badsectoracula wrote:
| Which is a shame because from the followup emails it becomes
| clear that the loss was actually Zed Shaw's fault (he
| explicitly typed "fossil revert" which is what deleted his
| code) and not the bug (there was a bug he encountered but it
| didn't end up in data loss).
| metalliqaz wrote:
| from the linked thread:
|
| > I just cloned your repo. Everything is working fine.
| Breath, Zed.
|
| seems like he panicked and made it worse, then rage-quit
| jessermeyer wrote:
| Right. Reading the entire conversation shows the parent's
| messaging is FUD.
| boygobbo wrote:
| I've just read it myself and was very impressed by
| Richard Hipp's calm, courteous and honest demeanour. I
| would recommend this to anyone as a paragon of how to
| respond to a difficult situation.
| JoeQuery wrote:
| It's been a few years but my one and only interaction with
| Zed via an email exchange showed me he was quite unstable
| and prone to outbursts. Never meet your heroes, they say.
| scruple wrote:
| I had exactly one interaction with him, too, whereby he
| shared with me a very long list of companies that he knew
| of which were hiring at the time, this would've been
| around 2012 or so. I got quite a few interviews as a
| result, and quite a few offers, FWIW, though I ended up
| taking a job with a company that wasn't on this list.
|
| I understand both that he is and why he is such a
| polarizing figure. Just wanted to put my positive
| anecdote on the pile, since they seem to be less common
| when he comes up in online comments.
| JoeQuery wrote:
| I believe it. Some people exhibit pretty extreme
| behavior. It sounds like that was extremely nice of him
| :)
| judofyr wrote:
| Heh, this reminded me of _why's comments from the Zed drama
| waaaaay back: https://gist.github.com/brianjlandau/186701
|
| > Let me put it this way. Suppose you've got Zed Shaw. No,
| wait, say you've got "a person." (We'll call this person
| "Hannah Montana" for the sake of this exercise.) And you look
| outside and this young teen sensation is yelling, throwing
| darts at your house and peeing in your mailbox. For reals.
| You can see it all. Your mailbox is soaked. Defiled. The flag
| is up.
|
| > Now, stop and think about this. This is a very tough
| situation. This young lady has written one of THE premiere
| web servers in the whole wide world. Totally, insanely RFC
| complaint. They give it away on the street, but everyone
| knows its secretly worth like a thousand dollars. And there
| was nothing in that web server that hinted to these postal
| urinations.
| [deleted]
| asddubs wrote:
| I misread the title as "in a single line of C-code". I thought
| "must be a pretty long line"
| codazoda wrote:
| 91,933 characters.
|
| I was curious about the number of lines, and counting
| characters was a simple select-all from there. There are 2,592
| lines.
| ulises314 wrote:
| This pretty much sounds like my networks class final project.
| dmux wrote:
| There's a lot of cool technology in the Tcl and SQLite ecosystem.
| Wapp [0] is a tiny web application framework (a single file) also
| written by D. Richard Hipp
|
| [0] https://wapp.tcl.tk/home/doc/trunk/README.md
| ktpsns wrote:
| Here is the actual single C-code file:
| https://sqlite.org/althttpd/file?name=althttpd.c
|
| Something I absolutely love about text-based protocols such as
| HTTP/1 is how easily you can implement them in virtually any
| programming language. Sure, the implementation is not
| top-notch, but it just damn well works, it is portable, and it
| is understandable by humans. That's something that got lost
| with HTTP/2 and HTTP/3.
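| To illustrate the point, here is a minimal sketch of a toy
| HTTP/1.0 responder in portable C (not althttpd's code; the
| port and response are arbitrary, and error checking is
| omitted for brevity):
|
|   #include <stdio.h>
|   #include <string.h>
|   #include <unistd.h>
|   #include <sys/types.h>
|   #include <sys/socket.h>
|   #include <netinet/in.h>
|
|   int main(void){
|     int s = socket(AF_INET, SOCK_STREAM, 0);
|     struct sockaddr_in a;
|     memset(&a, 0, sizeof(a));
|     a.sin_family = AF_INET;
|     a.sin_port = htons(8080);            /* arbitrary port */
|     a.sin_addr.s_addr = htonl(INADDR_ANY);
|     bind(s, (struct sockaddr*)&a, sizeof(a));
|     listen(s, 8);
|     for(;;){
|       int c = accept(s, 0, 0);
|       char req[1024];
|       ssize_t n = read(c, req, sizeof(req)-1);
|       if( n>0 ){
|         req[n] = 0;  /* request is plain text: "GET / HTTP/1.0..." */
|         const char *body = "hello\n";
|         char resp[256];
|         snprintf(resp, sizeof(resp),
|           "HTTP/1.0 200 OK\r\n"
|           "Content-Type: text/plain\r\n"
|           "Content-Length: %d\r\n\r\n%s",
|           (int)strlen(body), body);
|         write(c, resp, strlen(resp));    /* reply is plain text too */
|       }
|       close(c);
|     }
|   }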
| anthk wrote:
| Gopher is easier than that.
| secondcoming wrote:
| Being text based is a huge flaw with HTTP in my opinion (and
| also elsewhere, like Redis). It leads to parsing bugs and
| overly verbose communications. Humans are good at reading text;
| CPUs prefer binary.
| dnautics wrote:
| Being binary based leads to errors and loss of momentum when
| you are debugging something that's deployed to prod. It's all
| about tradeoffs.
| mbreese wrote:
| The use of text for popular protocols is for a reason --
| computers don't write programs, people do. And while CPUs
| prefer binary, it's easier for programmers to
| read/write/reason about text. This makes it easier to work
| with a new protocol.
|
| From a practical perspective, with a binary protocol, it can
| be difficult to use across different languages or add support
| for a new language. If you use the simplest possible
| encoding, you'd send raw struct data. But this doesn't always
| work across different OS/arch/versions/etc. If the server is
| in C but the client is in Python, reading the binary
| protocol would require a far more complicated parser.
|
| Obviously a more formal encoding (protobuf, etc) would be
| preferred, but if you already need to use an encoding
| mechanism, why not wrap it in a text format? It's easier to
| write clients that can read/write text protocols in any
| language. The reason text protocols are so popular isn't that
| they are necessarily "better" but that they are easier to
| adopt. This is why the most popular protocols are text based.
| michaelmior wrote:
| Except it quickly gets messy when you start dealing with
| real data and making sure encoding and escaping is done
| correctly.
|
| > with a binary protocol, it can be difficult to use across
| different languages
|
| This is also true of text protocols that aren't well-
| designed. I don't think it's necessarily the case that
| binary protocols are more difficult to deal with. You just
| have a different set of concerns to address.
|
| > If you use the simplest possible encoding, you'd send raw
| struct data.
|
| This is the "simplest" in the sense that it's definitely
| easy to just copy this data on the wire, but I think this
| is a straw man. I don't think it's really any more
| difficult to write a simple protocol that uses binary data
| compared to text.
| mbreese wrote:
| _> I don't think it's really any more difficult to write
| a simple protocol that uses binary data compared to
| text._
|
| I don't really think so either... I mean, I've done both
| and it's really not terrible to use binary. I think text
| is marginally easier to parse, but once you have the
| routines to read the right endianness, the advantage is
| minor. As you said, the biggest concern (as always)
| should be the design. A good design can be implemented
| easily with either mode.
|
| However, it is significantly easier to debug a text
| protocol. Attaching a monitor or capturing packets is
| easier with text as the parsers are much easier and more
| generic.
| michaelmior wrote:
| That's fair. Although tools like Wireshark have made this
| much better.
| kijin wrote:
| HTTP is serious about backward compatibility, so a client that
| speaks HTTP/1.1 (or HTTP/1.0 + Host header) can still talk to
| most servers out there.
|
| Similarly, if you write a server that only speaks HTTP/1.1 (or
| HTTP/1.0 + Host header), you can put it behind a reverse proxy
| or load balancer that handles higher versions, does connection
| management, and terminates TLS. It will work perfectly fine,
| only without some of the latest performance optimizations that
| you might or might not even need.
| chippiewill wrote:
| > you can put it behind a reverse proxy or load balancer that
| handles higher versions, does connection management, and
| terminates TLS
|
| This is even standard practice in many production
| deployments. Typically you want the proxy or load balancer
| anyway and there's often little benefit (if any) to using
| HTTP/2 or HTTP/3 over a very low-latency, high reliability
| local network.
| bawolff wrote:
| That seems like a high risk of http desync attacks, if you're
| only implementing a subset of http/1.1.
| unixhero wrote:
| How would one attack a static page?
| bawolff wrote:
| Who said anything about static? This webserver supports
| cgi, which means it supports php, perl, etc.
|
| However even if it didn't, js based client-side apps are
| probably still attackable in the right set of
| circumstances.
| ric2b wrote:
| Why would you even need a reverse proxy for a static
| page? This isn't about that.
| unixhero wrote:
| I see. Thanks!
| TazeTSchnitzel wrote:
| Once you have to support HTTP/1.1 it's not really a text-based
| protocol, because support for chunked encoding is mandatory for
| clients. (Yes the chunk headers are technically text, but it's
| interspersed with arbitrary binary data. It's not something you
| can easily read in a text editor.)
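| For reference, a minimal example of what that interleaving
| looks like on the wire (the classic "Wikipedia" chunked body),
| written here as a C string constant:
|
|   /* hex chunk sizes in text, each followed by that many
|   ** bytes of arbitrary payload */
|   const char chunked_body[] =
|     "4\r\n"      /* next chunk is 4 bytes  */
|     "Wiki\r\n"
|     "5\r\n"      /* next chunk is 5 bytes  */
|     "pedia\r\n"
|     "0\r\n"      /* zero-length chunk ends the body */
|     "\r\n";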
| mbreese wrote:
| HTTP/1 requests for binary files also aren't text based under
| that criterion. The protocol itself is text based, but the
| payload(s) can be binary. That doesn't change between
| HTTP/1 and /1.1.
| textmode wrote:
| I make most HTTP requests using netcat or similar tcp clients
| so I write filters that read from stdin. Reading text files
| with the chunk sizes in hex interspersed is generally easy.
| Sometimes I do not even bother to remove the chunk sizes.
| Where it becomes an issue is when it breaks URLs. Here is a
| simple chunked transfer decoder that reads from stdin and
| removes the chunk sizes.
|
|   flex -8iCrfa <<eof
|   int fileno (FILE *);
|   xa "\15"|"\12"
|   xb "\15\12"
|   %option noyywrap nounput noinput
|   %%
|   ^[A-Fa-f0-9]+{xa}
|   {xa}+[A-Fa-f0-9]+{xa}
|   {xb}[A-Fa-f0-9]+{xb}
|   %%
|   int main(){ yylex();exit(0);}
|   eof
|   cc -std=c89 -Wall -pipe lex.yy.c -static -o yy045
|
| Example: Yahoo! serves chunked pages.
|
|   printf 'GET / HTTP/1.1\r\nHost: us.yahoo.com\r\nConnection: close\r\n\r\n' \
|     | openssl s_client -connect us.yahoo.com:443 -ign_eof | ./yy045
| lmilcin wrote:
| Binary protocols aren't (or at least don't have to be) any more
| difficult and frequently are even easier to implement.
|
| Text protocols have difficult problems like escaping or
| detecting the end of a particular field, which are a frequent
| source of mistakes.
|
| The issue is that many (especially scripting) languages treat
| binary data as second class.
|
| The only real issue is that inspecting the binary message
| visually is a little bit more difficult. You can usually easily
| tell if your data structure is broken if it is in text form.
| What I do is I usually have some simple converter that converts
| the binary message to text form and this helps me inspect the
| message easily.
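| Such a converter can be as small as a hex dump. A minimal
| sketch of the idea (a hypothetical helper, not a specific
| tool I use):
|
|   #include <stdio.h>
|   #include <ctype.h>
|
|   /* Dump a binary message as hex plus printable ASCII,
|   ** 16 bytes per row, so a broken field boundary is easy
|   ** to spot by eye. */
|   void dump_message(const unsigned char *p, size_t n){
|     for(size_t i = 0; i < n; i += 16){
|       printf("%06zx  ", i);
|       for(size_t j = i; j < i+16; j++){
|         if( j < n ) printf("%02x ", p[j]);
|         else        printf("   ");
|       }
|       printf(" ");
|       for(size_t j = i; j < i+16 && j < n; j++){
|         putchar(isprint(p[j]) ? p[j] : '.');
|       }
|       putchar('\n');
|     }
|   }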
| daitangio wrote:
| Yes and no. Debugging a binary protocol is a bit more
| difficult than a text one, which is the reason HTTP is ASCII
| based.
| squarefoot wrote:
| It depends on the nature of the payload. If I have to
| send text, I'll use a text based protocol because I can
| debug it more easily, but if I have to send binary
| information, I'd rather send it using well defined
| structures after taking into account word sizes, alignment,
| endianness, etc. on the involved hardware. Back in the day I
| had to do that with small agents on different architectures
| (IBM Power and x86) with different languages (C and Object
| Pascal), and the correct textbook way to do that was to use
| xml, but that way everything had to be converted to text,
| interpreted and then translated back for replies. No way,
| hardware was slow and we aimed at speed, so we used plain
| C/Pascal structures and no other manipulation except for
| adjusting type sizes and endianness if the hardware or
| language required it, to make all modules compatible no
| matter what they were written in and where they would run.
| I also built my own tools for testing and debugging which
| were nothing more than agents that sent predefined packets
| and echoed at screen the return values, so that any error
| in the data would be quickly spotted, while an XML
| translation of a corrupt buffer could have missed any error
| not contained in the marked-up text, hence hiding that there
| was a problem somewhere. A solution like that is highly debatable
| for sure, but I'm among the ones who want their software to
| crash loudly at the first problem, not try hiding it until
| it creates bigger damage.
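| A sketch of the kind of manual, endianness-explicit packing
| described above (the record layout and names are made up for
| illustration):
|
|   #include <stdint.h>
|
|   /* Hypothetical wire format: a fixed-layout record packed
|   ** big-endian byte by byte, so every architecture and
|   ** language reads the same bytes regardless of native
|   ** word order. */
|   typedef struct {
|     uint16_t type;
|     uint32_t value;
|   } Record;
|
|   static void pack_u16be(unsigned char *p, uint16_t v){
|     p[0] = (unsigned char)(v >> 8);
|     p[1] = (unsigned char)(v & 0xff);
|   }
|   static void pack_u32be(unsigned char *p, uint32_t v){
|     p[0] = (unsigned char)(v >> 24);
|     p[1] = (unsigned char)(v >> 16);
|     p[2] = (unsigned char)(v >> 8);
|     p[3] = (unsigned char)(v & 0xff);
|   }
|
|   /* Serialize into a caller-provided 6-byte buffer. */
|   void pack_record(unsigned char buf[6], const Record *r){
|     pack_u16be(buf,     r->type);
|     pack_u32be(buf + 2, r->value);
|   }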
| Iwan-Zotow wrote:
| > which is the reason HTTP is ascii based.
|
| well, other side of HTTP was telnet in the terminal - this
| is why it is ASCII to begin with
| tdeck wrote:
| This is not true. HTTP was developed for the first
| (graphical) web browser, WorldWideWeb. It was
| specifically intended to serve HTML and images, neither
| of which is particularly well suited for browsing with
| telnet.
| zaarn wrote:
| The reason HTTP is ASCII based isn't because it's easier to
| debug. It's because back then the other end was as likely
| to be a human as a piece of software. Because HTTP in it's
| early days barely had headers or even formatting, so people
| typed "GET /" at the server directly or used the same
| method to send mail directly.
|
| Nobody does that anymore and debugging is easily solved by
| converting your binary protocol to a textual form.
| choeger wrote:
| I don't think this is about human readability (although it
| might be to some extent during the design phase).
| Instead, I think this was the sensible thing to do when
| byte order was way more variable across a network. If you
| only send single bytes to begin with, you can as well use
| a textual format, IMO.
| throw0101a wrote:
| I question the assertion that humans were doing "GET /"
| or "HELO my.fq.dn" on any regular basis. There were mail
| clients from the very beginning for example:
|
| * https://en.wikipedia.org/wiki/History_of_email
|
| Perusing document-based information was also done via
| clients, first with Gopher and then with the WWW: either
| GUIs like Mosaic, or on the CLI via (e.g.) Lynx.
| michaelmior wrote:
| I definitely did back in the day :) But I'm only one
| single human.
| lmilcin wrote:
| I did both send mails over SMTP as well as downloading
| files from FTP using Telnet.
| jbverschoor wrote:
| same
| zaarn wrote:
| I would question if this weren't the case often enough
| that people would demand it be compatible with their
| teleprinter.
| mucholove wrote:
| I did implement a webserver (that supports WebSocket, a binary
| protocol) and also implemented a `make_binary_string`
| function along the way so as to not lose my bearings. printf
| is 100% amazing and I generally leave my printf statements
| inside all my code because I can switch them on and off very
| easily for any individual function by using macros. Also
| clutch :)
| chrismorgan wrote:
| And capitalisation issues, and encodings... give me a binary
| protocol to parse any time. They're normally comparatively
| well-defined, whereas text protocols are seldom properly
| defined and so have undefined behaviour left, right and
| centre, which inevitably leads to security bugs--or even if
| they _are_ well-defined, they're probably done in a way that
| makes your language's string type unsuitable, and makes
| parsing more complicated.
| pjc50 wrote:
| People say this, but then there's ASN.1, which has had
| critical security bugs:
| https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=asn.1
|
| I will agree that exhaustively defining text protocols is
| _extremely_ hard, starting from character set / encoding
| and getting worse from there.
| admax88q wrote:
| I guess if you're going to pick the absolute worst
| example of a binary protocol.
| lmilcin wrote:
| That is absolutely not fair.
|
| BER-TLV is a really nice protocol. I worked with it for a
| couple of years on an EMV application, which uses BER-TLV
| to communicate with the credit card, but it is also a very
| convenient format for all sorts of other uses and I would
| use it wherever I could. Think of it as JSON but in binary
| form. It is not complicated and I would not even bother
| parsing the messages -- I could interpret hex dumps of them
| by sight very easily.
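| To give a flavor, a minimal TLV walker in C (deliberately
| simplified: single-byte tags and short-form one-byte lengths
| only; real BER-TLV also has multi-byte tags and long-form
| lengths):
|
|   #include <stdio.h>
|   #include <stddef.h>
|
|   /* Walk a TLV buffer and print tag/length for each element. */
|   void walk_tlv(const unsigned char *p, size_t n){
|     size_t i = 0;
|     while( i + 2 <= n ){
|       unsigned tag = p[i++];
|       size_t len = p[i++];
|       if( len >= 0x80 || i + len > n ) break;  /* long form or truncated */
|       printf("tag %02x, %zu byte(s) of value\n", tag, len);
|       i += len;  /* a nested (constructed) TLV would recurse into p+i here */
|     }
|   }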
| lmilcin wrote:
| ASN.1 is not a binary protocol. It is a language to
| describe messages.
|
| Typically you create message description which is then
| compiled to code that can serialize/deserialize messages
| in BER-TLV or PER-TLV.
|
| I know because I wrote a complete parser/serializer for
| BER-TLV. It is simple protocol and any security issue is
| in the parser/serializer and not the protocol itself.
| That is for the simple reason that the protocol is nothing
| more than a format to serialize/deserialize the data.
| touisteur wrote:
| Just use RecordFlux for binary parsers proved absent of
| runtime errors... ;-)
| einpoklum wrote:
| Can you expand on that a little?
| michaelmior wrote:
| RecordFlux[0] is a DSL written in Ada for specifying
| messages in a binary protocol. Code for parsing these
| messages is then generated automatically with a number of
| useful properties automatically proven including that no
| runtime errors will occur.
|
| [0] https://github.com/Componolit/RecordFlux
| scruple wrote:
| > give me a binary protocol to parse any time
|
| I don't disagree in principle but I've come across a
| handful of _very poorly_ documentated binary protocols in
| my years and that is an extremely painful thing to deal
| with compared to text-based protocols.
| jrockway wrote:
| I think it's pretty easy to make poor use of HTTP to the
| same end. Imagine you're dumping traffic between a client
| and a server and the exchange is:
|
|   GET /asdfasdfasdfdsaf
|   X-Asdf-Asdf: 83e7234
|
|   HTTP/1.0 202
|   X-83e7233: 1
|   X-83f730b: 4
|
| It's text, but you still have no idea what's going on.
|
| Overall, I think it's kind of a wash. It's basically
| equally easy to take a documented text or binary protocol
| and write a working parser. Neither format solves any
| intrinsic problems -- malicious input has to be accounted
| for, you have to write fuzz tests, you have to deal with
| broken producers. It's a wash.
|
| People like text because they can type it into telnet and
| see something work, which is kind of cool, but probably
| not a productive use of time. I can type HTTP messages,
| but use curl anyway. (SMTP was always my favorite. "HELO
| there / MAIL FROM foo / RCPT TO bar / email body / ."
| Felt like a real conversation going on. Still not sure
| how to send an email that consists of a single dot on its
| own line though.)
| taejo wrote:
| > Still not sure how to send an email that consists of a
| single dot on its own line though.
|
| A line starting with a period is escaped by adding an
| extra period; the receiving side removes the first
| character of a line if it is a period.
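| As a sketch in C, the sending side of that rule (the
| function name is made up):
|
|   #include <stdio.h>
|
|   /* Dot-stuff one line of an outgoing SMTP message body:
|   ** if the line starts with '.', emit an extra '.' first,
|   ** so a lone "." can never be mistaken for end-of-data. */
|   void send_line(FILE *out, const char *line){
|     if( line[0]=='.' ) fputc('.', out);
|     fputs(line, out);
|     fputs("\r\n", out);
|   }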
| knome wrote:
| >Still not sure how to send an email that consists of a
| single dot on its own line though.
|
| It seems you prefix any line of data starting with a
| period with an additional period, to therefore
| distinguish it from the end of mail period.
|
| https://datatracker.ietf.org/doc/html/rfc5321#section-4.5.2
| chrismorgan wrote:
| Ugh, flashbacks. Now that you've mentioned that I feel
| the burning need to change my wording to add the
| condition "fairly described". I have also come across
| very poorly documented binary protocols!
| jchw wrote:
| They definitely don't have to be. I've echoed these
| sentiments before.
|
| However, HTTP/2 and HTTP/3 certainly _are_, though the
| reasons why they are complicated have nothing to do with
| choosing to use a binary based format. (They are complicated
| for good reason, though, and I hope that browsers and servers
| can continue to support HTTP/1 as the baseline till the sun
| burns out, just to make life easier.)
| slver wrote:
| It's not a property of text-based protocols, rather it's a
| property of simple protocols. HTTP/1 is not merely text based,
| it's ASCII based (technically ISO-8859-1, which includes
| ASCII). One char, one byte, one encoding. HTTP itself is mostly
| very simple, text "name: value" pairs separated by newlines,
| followed by arbitrary content as the body.
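|
| For instance, a complete request and response can be as small
| as:
|
|   GET /index.html HTTP/1.1
|   Host: example.com
|
|   HTTP/1.1 200 OK
|   Content-Type: text/html
|   Content-Length: 5
|
|   hello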
|
| I think the solution is to start with a simple protocol and
| upgrade to more complex protocols after that. While technically
| you don't need to support HTTP/1 to support 2 and 3, the
| upgrade over TCP happens mostly in that way.
| chrismorgan wrote:
| > _While technically you don 't need to support HTTP/1 to
| support 2 and 3, the upgrade over TCP happens mostly in that
| way._
|
| This is not actually true of HTTP/2 (the upgrade doesn't use
| HTTP/1), and slightly misleading of HTTP/3 (there's nothing
| to upgrade because it sits _beside_ TCP HTTP; but
| _advertising_ HTTP/3 support is currently done over TCP
| HTTP).
|
| Now for the details:
|
| HTTP/2 upgrade is done at the TLS level, via ALPN.
| Essentially the client TLS handshake says "hello there! BTW
| do you do HTTP/2?" and the server either responds "hi!" and
| starts talking HTTP/1, or "hi! Let's talk h2!" and starts
| talking HTTP/2.
|
| So it's perfectly possible (though not a good idea) to have a
| fully-functioning HTTP/2 server that doesn't speak a lick of
| HTTP/1.
|
| (HTTP/2 over cleartext, h2c, does use the old HTTP/1 Upgrade
| header mechanism, but h2c is more or less just not used by
| anyone.)
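|
| In code terms the ALPN offer is just a list of length-prefixed
| protocol names sent during the handshake. A rough client-side
| OpenSSL sketch (function name made up, error handling omitted):
|
|   #include <openssl/ssl.h>
|
|   /* offer h2 first, then http/1.1; each name is preceded
|   ** by its one-byte length */
|   static const unsigned char alpn[] = "\x02h2\x08http/1.1";
|
|   int offer_h2(SSL_CTX *ctx){
|     /* returns 0 on success for this particular call */
|     return SSL_CTX_set_alpn_protos(ctx, alpn, sizeof(alpn)-1);
|   }
|
| After the handshake, SSL_get0_alpn_selected() reports which
| protocol the server picked.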
|
| HTTP/3 upgrade, well, "upgrade" is the wrong word. It's
| operating over UDP rather than TCP, so you're not upgrading
| an existing HTTP thing to HTTP/3, you're starting a new
| connection because you've learned that the server supports
| HTTP/3. This bootstrapping is currently done by advertising h3
| support via the Alt-Svc header (HTTP/1+) or by an ALTSVC
| frame (HTTP/2+), which the client can then remember so it
| uses the best protocol next time.
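|
| For example, a response advertising HTTP/3 on UDP port 443,
| cacheable for a day, carries a header like:
|
|   Alt-Svc: h3=":443"; ma=86400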
|
| For best results, they're working on allowing you to
| advertise HTTP/3 support on DNS, so that after a while it
| should be genuinely possible (though a very bad idea) to have
| a fully-functioning HTTP/3 server that doesn't speak any TCP
| at all, yet works with sufficiently recent browsers in
| network environments that don't break HTTP/3.
| https://blog.cloudflare.com/speeding-up-https-and-http-3-neg...
| is good info on this part.
| saba2008 wrote:
| HTTP/1 simplicity comes not from its text nature, but from
| its clear concept and limited functionality.
|
| At core, it's just request-reply + key-value metadata.
| Whether it's text or binary does not matter much. But
| writing HTTP/2 frame types in letters would not make them
| any easier to understand.
| ahartmetz wrote:
| There are also keep-alive, caching (a big topic), chunked
| transfer encoding, header parsing peculiarities, and
| authentication in HTTP. The combination of these creates some
| nice opportunities for implementation bugs. Source: have
| worked on a client-side implementation.
|
| Now, HTTP/2 isn't even conceptually simple, I agree about
| that... it seems ugly.
| jbverschoor wrote:
| That came with http/1.1 ;-)
| merb wrote:
| > implementation bugs. Source: have worked on a client-side
| implementation
|
| well, because the hard part is the client side. caching is
| a client-side-only thing with http, keep-alive is a thing
| that a server pushes to a client, and the same goes for
| chunked transfer, which is not as easy for a client to
| implement as plain content-length was.
|
| basically a server only needs to implement certain headers,
| but a client needs to know them all. also most clients even
| accept bad servers, like content for HEAD requests, etc..
| most stateless protocols put a lot of burden onto clients.
|
| h2 on the other hand is stateful and keeps the same hard
| semantics onto the client side and also makes servers more
| complex, because it's a state machine.
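|
| For a sense of why chunked transfer is fiddly for clients: each
| chunk is a hex length line, then that many bytes, each part
| ending in CRLF, and a zero-length chunk terminates the body:
|
|   HTTP/1.1 200 OK
|   Transfer-Encoding: chunked
|
|   5
|   hello
|   7
|    world!
|   0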
| tokamak-teapot wrote:
| One good thing is that you don't have to support everything
| if you're writing a server. It just depends on what kind of
| use cases you want to support.
| secondcoming wrote:
| HTTP allows a TimedOut response to be sent by a server to a
| request you haven't yet sent. So it's not strictly
| request/reply.
| saba2008 wrote:
| '100 Continue' would probably be a better example of
| breaking the request/reply flow, as it provides useful
| functionality and requires non-trivial implementation and
| compatibility measures.
|
| '408 Request Timeout', meanwhile, is a somewhat dubious fig
| leaf over TCP RST.
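|
| The 100-continue dance, for reference: the client sends its
| headers with an Expect field, waits for the interim status,
| and only then transmits the body:
|
|   POST /upload HTTP/1.1
|   Expect: 100-continue
|   Content-Length: 1048576
|
|   HTTP/1.1 100 Continue
|
|   ...client now sends the 1 MiB body...
|
|   HTTP/1.1 200 OK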
| api wrote:
| Recording state info in static variables makes it suitable
| only for embedded use.
|
| I really hate when people do that. Static should only ever be
| const, init once, or something intrinsically singleton. There
| are very few exceptions.
| jmaygarden wrote:
| Are you thinking of C++ issues with constructors? I don't see
| anything wrong with the use of static variables in this
| single file C program. This isn't a library; it's a
| standalone web server.
| api wrote:
| I am thinking of multithreading. Mutable static variables
| pretty much destroy any possibility of multithreading
| without a major refactor. Test harnessing is an issue too.
|
| But if this only ever wants to be an app binary, I guess
| it's sort of okay.
| strangeattractr wrote:
| I prefer binary protocols, but I think you make a good point
| about HTTP/1, especially from a learning perspective. I
| remember how enlightening it was when I followed a tutorial
| to create an HTTP server and found that all I needed to do
| (ignoring TCP/IP) was write 'HTTP/1.1 200 OK' plus a few
| headers and some raw HTML in a string, and it would actually
| show up in my browser.
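|
| That experiment fits in a page of C. A toy sketch (POSIX
| sockets, port 8080 chosen arbitrarily, error handling mostly
| omitted, so not for real use):
|
|   #include <string.h>
|   #include <unistd.h>
|   #include <arpa/inet.h>
|   #include <netinet/in.h>
|   #include <sys/socket.h>
|
|   int main(void){
|     static const char zReply[] =
|       "HTTP/1.1 200 OK\r\n"
|       "Content-Type: text/html\r\n"
|       "Content-Length: 20\r\n"
|       "Connection: close\r\n"
|       "\r\n"
|       "<h1>it works!</h1>\r\n";
|     char zBuf[4096];
|     struct sockaddr_in addr;
|     int fd = socket(AF_INET, SOCK_STREAM, 0);
|     memset(&addr, 0, sizeof(addr));
|     addr.sin_family = AF_INET;
|     addr.sin_port = htons(8080);
|     addr.sin_addr.s_addr = htonl(INADDR_ANY);
|     bind(fd, (struct sockaddr*)&addr, sizeof(addr));
|     listen(fd, 16);
|     for(;;){
|       int conn = accept(fd, 0, 0);
|       if( conn<0 ) continue;
|       (void)read(conn, zBuf, sizeof(zBuf)); /* skim request */
|       write(conn, zReply, sizeof(zReply)-1);
|       close(conn);
|     }
|   }
|
| Point a browser at http://localhost:8080/ and the markup shows
| up, exactly as described.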
| geoffdunbar wrote:
| I love C, but it's pretty scary sometimes. 5 minutes ago: "I
| wonder if I can find a potential memory overwrite in 5
| minutes?"
|
| Sure enough, the function StrAppend potentially overflows a
| size_t size (without checking), and then writes into memory
| that could be past the end of the allocated buffer. Given 5
| minutes, I didn't look thoroughly at whether this is actually
| exploitable, but it's definitely a red flag for the code. Be
| careful out there! Hopefully I am missing something, or this
| is just a simple oversight, but I would carefully audit this
| code before using it.
|
| Submitted a ticket through the Althttpd website.
|
|   static char *StrAppend(char *zPrior, const char *zSep,
|                          const char *zSrc){
|     char *zDest;
|     size_t size;
|     size_t n0, n1, n2;
|
|     if( zSrc==0 ) return 0;
|     if( zPrior==0 ) return StrDup(zSrc);
|     n0 = strlen(zPrior);
|     n1 = strlen(zSep);
|     n2 = strlen(zSrc);
|     size = n0+n1+n2+1;
|     zDest = (char*)SafeMalloc( size );
|     memcpy(zDest, zPrior, n0);
|     free(zPrior);
|     memcpy(&zDest[n0],zSep,n1);
|     memcpy(&zDest[n0+n1],zSrc,n2+1);
|     return zDest;
|   }
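|
| For what it's worth, a hypothetical guard (not in althttpd)
| placed just before the SafeMalloc() call would close the hole;
| SIZE_MAX comes from <stdint.h>:
|
|   /* refuse the append rather than let n0+n1+n2+1 wrap */
|   if( n1 > SIZE_MAX - n0
|    || n2 > SIZE_MAX - n0 - n1
|    || SIZE_MAX - n0 - n1 - n2 < 1 ){
|     return 0;  /* signal failure, as for zSrc==0 */
|   }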
| spacechild1 wrote:
| > Sure enough, the function StrAppend potentially overflows a
| size_t size
|
| How could this happen in practice? The three strings would
| have to be larger than the available address space...
| ectospheno wrote:
| Yeah. The function in question is called in only one place.
| It would seem you'd need to send the web server more than a
| size_t of data for this to be an issue.
| gallier2 wrote:
| In most places it uses int for string and buffer lengths.
| It wouldn't surprise me if 2GiB of data could trigger
| several overflows.
| geoffdunbar wrote:
| Yes, absolutely. If the webserver is compiled 32-bit,
| that is only 4GB of data, which might be feasible? I
| don't know enough to say. Assuming a hacker kindly won't
| overflow your buffer is never a good idea.
|
| However, the presence of one piece of code that is not
| integer-overflow safe definitely makes me nervous. This
| is just the one I found in 5 minutes, what else is in
| there?
| thisgoodlife wrote:
| MAX_CONTENT_LENGTH is 250MB. You won't be able to send
| 4GB of data.
| acqq wrote:
| It's not an integer overflow that would be needed but an
| _unsigned_ overflow. The way I see it, on 32 bits, that
| means that the HTTP request would have to be bigger than
| what's available to both the user application and the
| OS together. In short, one just can't get the input
| request that big. Of course, if you manage that, you'll
| disprove this claim.
| ectospheno wrote:
| None that stand out to me, including what you posted. Do
| you have a real example?
| acqq wrote:
| Exactly. In a single-file C program nobody can expect to get
| universal library functions that work in any possible
| imaginable context. The only relevant context is the code
| the function is in. And in that context, the function is
| doing enough.
| jackewiehose wrote:
| And there's only one call to StrAppend() which is easily
| verified as safe.
| TZubiri wrote:
| Really simple, I implemented a couple. Remember that the
| letters after the status code are aesthetic, so make sure to
| put your style in there:
|
|   200 FINE
|   200 KTHX
|   404 MISS
|   403 NO
|   500 FUCK
|
| Another similar server in one file is busybox httpd command if
| you are interested
| https://git.busybox.net/busybox/tree/networking/httpd.c
| Sohcahtoa82 wrote:
| I always wanted an HTTP response code for when the server
| detects a malicious request. Like, 430 DONT BE AN ASSHOLE
| andai wrote:
| For people who like text-based protocols (and dislike
| surveillance and web bloat), I suggest taking a look at Gemini,
| which was designed so you could write a client for it over the
| weekend.
|
| https://gemini.circumlunar.space/
| peterhil wrote:
| Gemini is quite interesting in that it brings a nostalgic
| feeling from when the web was a new thing, but is also modern.
|
| The Lagrange browser seems quite polished.
| cube00 wrote:
| > A separate process is started for each incoming connection, and
| that process is wholly focused on serving that one connection.
|
| It makes you wonder just how "heavy" operating system processes
| actually are. We may not need to worry about the complexity of
| trying to run multiple async requests in a single
| process/thread in all cases.
| wongarsu wrote:
| Apache's thread-per-connection model used to run basically the
| entire internet until nginx came along and demonstrated 10k
| simultaneous connections on a single server.
|
| If you only have around 100 concurrent confections, a separate
| thread per connection is entirely feasible. A whole new process
| is probably fine on Linux, but e.g. Windows takes pretty long
| to spawn a process.
| jiofih wrote:
| You might want to review that knowledge. You can spawn a few
| thousand threads on a modern machine without much contention.
| merb wrote:
| well the thing is he confuses processes with threads.
| apache2-prefork used/uses processes and not threads.
| einpoklum wrote:
| If you have 100 concurrent confections you're probably baking
| cookies.
| TZubiri wrote:
| To be fair, I think it would be ideal if every service ran
| its own 100 concurrent connections, instead of everyone
| using one service that handles 1 trillion.
| wongarsu wrote:
| But you can probably get away with using one regular
| commercial oven instead of getting a baking tunnel oven :)
| minusf wrote:
| the default mode in apache/apache2 used to be _process_ per
| connection (pre forked workers) not thread per connection.
| threads came much later.
| frou_dh wrote:
| I like to remember that threading APIs were a later add-on
| after Unix had already existed for years. They're not
| fundamental like the Process is.
|
| Isn't it an appealing model to not even have to talk about
| threads because every process is 1 thread by definition?
| astrange wrote:
| Multiple processes are a bit easier to deal with than
| threads for servers - mostly because POSIX signals interact
| poorly with threads.
|
| It's also more secure because you should not mix different
| users' requests in the same process if you can avoid it.
|
| nginx runs on a one process per core model and more or less
| does everything correctly.
| lrem wrote:
| Linux processes are actually surprisingly lightweight, thanks
| to copy-on-write memory. I.e. fork is cheap, exec is
| expensive, and changing things in already-allocated memory is
| expensive. But this execution model does not do exec and
| mostly allocates new buffers for anything it needs to do. As
| an extra benefit, you get garbage collection as a gift from
| the kernel ;)
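|
| The pattern itself is tiny; a rough sketch of an
| accept-and-fork loop (not althttpd's actual code; althttpd
| is normally launched from xinetd, which does the accepting
| for it):
|
|   #include <signal.h>
|   #include <sys/socket.h>
|   #include <unistd.h>
|
|   void serve_forever(int listenFd){
|     signal(SIGCHLD, SIG_IGN);        /* auto-reap children */
|     for(;;){
|       int conn = accept(listenFd, 0, 0);
|       if( conn<0 ) continue;
|       if( fork()==0 ){               /* child: cheap, thanks COW */
|         close(listenFd);
|         /* ...read one request, write one reply... */
|         close(conn);
|         _exit(0);
|       }
|       close(conn);                   /* parent keeps listening */
|     }
|   }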
| tyingq wrote:
| Fork() on modern Linux is very fast/lightweight. This isn't
| true for all POSIXy operating systems though. This would
| perform really terribly, for example, on any Unix
| implementation from the early 2000s or before, and maybe on
| some current ones.
| cryptonector wrote:
| fork() is evil[0].
|
| [0] https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
| secondcoming wrote:
| The downside is excessive context switching, and sharing data
| between processes becomes difficult (counters, etc).
| sydthrowaway wrote:
| So what was the point of the async/callback web programming
| revolution if processes were good enough?
| ori_b wrote:
| Millions of concurrent connections. On 20 year old hardware
| with a small fraction of the power of today's.
| falcolas wrote:
| To add on to every other sibling comment, switching threads
| or processes requires a trip back up to kernel space (a
| context switch), instead of just remaining in user space. In
| this switch, all of your caches get busted.
|
| Not a problem for most folks, but when you want the greatest
| possible performance, you want to avoid these kinds of
| transitions. Basically, the same reason some folks use user-
| space networking stacks.
| jiofih wrote:
| Memory use. Even though threads / processes are "cheap"
| right now, it wasn't the case in the past, and they are
| still quite far from the couple-of-kBs per connection needed
| in async servers. You're not getting a million parallel
| processes processing requests any time soon.
| emn13 wrote:
| There wasn't a lot of point. Almost nobody needs this; but
| since everybody wants to do what the hyper-successful mega-
| scalers are doing...
| TeMPOraL wrote:
| The point was that you didn't have the ability to spawn new
| threads _at all_. async lets you pretend you have threads, at
| the cost of everything being run through a hidden event loop.
| cube00 wrote:
| I thought it was to get the best of both worlds in that you
| could max out a core with async to avoid context switching
| or waiting for blocked I/O but you still open up additional
| threads on more cores if you were becoming CPU bound.
| merb wrote:
| yeah, basically. a lot of people do not know this, but
| async/await has NOTHING to do with threads or processes.
| you can use threads with async/await, but you don't have
| to. async/await basically means you are running your "green
| threads"/tasks/promises on an event loop, and the event
| loop can be either single-threaded or it can run the stuff
| on multiple threads.
|
| a lot of people just did not get the difference between
| concurrency vs. parallelism. threads and processes are
| basically parallelism, while concurrency is async
| programming. good talk about that stuff from rob pike
| (go): https://www.youtube.com/watch?v=oV9rvDllKEg
| rakoo wrote:
| It's because they weren't good enough for the thousands of
| concurrent requests of C10k or the millions that came after
| it with C10m.
|
| Granted, 10k concurrent requests is a problem for the 1% of
| websites, so processes were (and still are) good enough for
| the long tail of personal or small-scale websites.
| ecmascript wrote:
| I'm all for SQLite and I am a fan of the author of the project,
| but for a webserver I have turned my back on Nginx in favor of
| https://caddyserver.com/ because of the simplicity.
|
| Caddy is just really awesome as a reverse proxy (2 line config!!)
| and I am in the process of moving all my projects to it. It is
| fast enough as well, since other things will be the bottleneck
| way before that.
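|
| For the curious, the two-line reverse proxy config really is
| just this (domain and upstream are placeholders):
|
|   example.com
|   reverse_proxy localhost:8080
|
| Caddy then obtains and renews the TLS certificate for
| example.com automatically.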
|
| I am not affiliated with Caddy in any way, just blown away by the
| quality of it.
| terminalserver wrote:
| The thing I love most about caddy is it automatically does all
| the ssl certificate garbage which is so painful in every other
| web server ever. Yes certbot makes it less painful but it's
| still a big PITA, unlike caddy where SSL is just like magic.
| KronisLV wrote:
| The only thing I don't really know how to do with it is round
| robin DNS for many servers with LetsEncrypt HTTPS.
|
| It feels like then I'd probably need either shared storage
| for the certificate files (which goes against the idea of
| decentralization somewhat) or to use a DNS challenge type.
|
| Anyone have experience with something like that?
| francislavoie wrote:
| Shared storage is the solution. Caddy supports multiple
| different storage backends (filesystem by default, and
| Redis, Consul, DynamoDB via plugins) and uses the storage
| to write locks so that one instance of Caddy can initiate
| the ACME order, and another can solve the challenge. See
| the docs: https://caddyserver.com/docs/automatic-https#storage
|
| I'm doing this exact thing, with the Redis plugin behind
| DNSRR and it works seamlessly.
| xupybd wrote:
| And zero dependencies!
|
| This might solve my problem with older servers that no longer
| support the latest SSL.
|
| I really need to upgrade those rickety old machines.
| petre wrote:
| You mean zero runtime deps because it pulls a lot of stuff
| when it does get built. Still great but I'd use traefik for
| more than 10 sites.
| francislavoie wrote:
| Caddy can serve thousands of sites without a sweat. What
| are your concerns exactly?
| petre wrote:
| The webui based config helps for lots of sites and my
| clients can do it themselves without bothering me.
| squiggleblaz wrote:
| I suppose you mean zero runtime dependencies? It seems to
| have a few dozen build dependencies.
|
| Runtime dependencies create a nuisance as you have to
| update several things together. On the other hand, they can
| allow components with separate update cycles and
| responsibilities to be updated separately.
|
| Build dependencies create maintainability and security
| problems. They can also solve maintainability and security
| problems. It depends on what your consideration is. But as
| a matter of practice, many developers seem so concerned
| with possible behavioral/API breakage that they like to
| pin to specific versions of their dependencies, which now
| means that you aren't getting any security fixes.
|
| (Technically, Althttpd doesn't achieve zero runtime
| dependencies in comparison to a modern http server that
| does HTTPS, because it requires a separate program to
| terminate TLS. But these connect through general mechanisms
| that are much easier to combine and update separately.)
|
| Everyone has to make a judgement about how they maintain
| their own systems, but excitement about "zero (runtime)
| dependencies!" isn't what should settle that judgement.
| Winsaucerer wrote:
| I fiddled around for many hours with traefik, and could not get
| it to do what I wanted -- something I'd done before and had a
| known example working config of.
|
| 10 minutes of caddy, I had everything running exactly as I
| wanted and the job was done.
| GordonS wrote:
| I tried Traefik for the first time around 6 months ago
| (version 2) - man, coming from nginx (which I wouldn't call
| simple), I found Traefik config to be _really_ confusing. It
| felt like I had to specify the same stuff 2 or 3 times, and
| in general it was just so unintuitive. And the docs (at the
| time at least) only showed snippets of trivial examples.
|
| I don't think I'd choose to use it again. Instead, I'll try
| Caddy, or HAProxy if I need massive performance.
| eatonphil wrote:
| A big missing feature in Caddy for me is an embedded language
| like Lua for nginx so you can write tiny hooks. The Caddy
| authors have indicated on HN a while ago that Caddy 2 may have
| an embedded scripting language but I can't find anything about
| it in their docs.
| coder543 wrote:
| Seems like it was postponed[0].
|
| For _very_ tiny hooks, you might be able to get away with
| using request matchers[1] and respond[2].
|
| [0]: https://caddy.community/t/missing-starlark-documentation/958...
|
| [1]: https://caddyserver.com/docs/caddyfile/matchers
|
| [2]:
| https://caddyserver.com/docs/caddyfile/directives/respond
| francislavoie wrote:
| Writing plugins for Caddy is so easy that it's generally not
| necessary to have scripting built in. You can build yourself
| a new binary with a Go plugin just by using the "xcaddy"
| build tool: https://caddyserver.com/docs/extending-caddy
|
| But yeah, it's still something at the back of our minds, and
| we were considering Starlark for this, but that hasn't really
| materialized because it's usually easier to just go with the
| plugin route.
| terminalserver wrote:
| Interesting but why? There's a bazillion web servers out there,
| surely one of them can do the job?
|
| What have I missed?
| TZubiri wrote:
| Simplicity
| squiggleblaz wrote:
| I don't know of any other well-known web server with the same
| featureset. For instance, it has no configuration file, it's
| run from xinetd statelessly/single-threaded, it runs itself in
| a chroot and it's short enough to be readable without specific
| effort.
|
| It also isn't brand new: it's been around since 2004. So that
| probably narrows the range of possible competitors even more.
|
| If you can find a webserver that meets all of those
| constraints, please let us know.
| rkeene2 wrote:
| filed [0] is written to be readable, and stateless, and runs
| from a chroot, and has no configuration file. It doesn't run
| from xinetd and it's multi-threaded, though.
|
| I wrote it because no other web server could serve files fast
| enough on my system (not lighttpd, not nginx, not Apache
| httpd, not thttpd) to keep movies from buffering.
|
| [0] https://filed.rkeene.org/
| e12e wrote:
| > I wrote it because no other web server could serve files
| fast enough on my system (not lighttpd, not nginx, not
| Apache httpd, not thttpd) to keep movies from buffering.
|
| Could you expand on that? What type of files, how many
| clients? I seem to recall plain apache2 from spinning rust
| streaming fine to vlc over lan - but last time I did that
| was before HD was much of a thing... Now I seem to stream
| 4k hdr over ZeroTierOne over the Internet to my Nvidia
| Shield via just DLNA/UpNP (still to vlc) just fine. But I'm
| considering moving to caddy and/or http/webdav - as a
| reasonable web server with support for range request seem
| to handle skipping in the stream much better.
| rkeene2 wrote:
| You might want to try filed !
|
| This was for serving MPEG4-TS files with, IIRC, H.264
| video and MPEG-III audio streams -- nothing fancy -- from
| a server running a container living on a disk attached
| via USB/1.1.
|
| While USB/1.1 has enough bandwidth to stream the video,
| the other HTTP servers were too slow with Range requests,
| because they would do things like wait for logs to
| complete and open the file to serve (which is synchronous
| and requires walking the slow disk tree).
| e12e wrote:
| > a server running a container living on a disk attached
| via USB/1.1.
|
| Ah, ok. That makes sense. USB 1.1 can certainly challenge
| cache layers and software assumptions.
|
| I do wonder how far apache2 might have been pushed,
| dropping logs and adjusting proxy/cache settings.
| rkeene2 wrote:
| I do not know, though I do know that just disabling
| logging wasn't sufficient.
| mro_name wrote:
| dependency awareness. External stuff bears surprises. Not
| everybody feels comfortable with that.
| terminalserver wrote:
| There's plenty of lightweight minimal web servers with
| minimal dependencies.
|
| But hang on, is dependency anxiety really the reason, or did
| you just make that up?
| nix23 wrote:
| >There's plenty of lightweight minimal web servers with
| minimal dependencies.
|
| In 2001?
| mro_name wrote:
| no, not made up - self-made means even fewer dependencies
| than a 3rd party without other deps. And minimal is still
| more than zero.
| fraktl wrote:
| You missed the date in the comment of the C file: 2001-09-15
|
| Back then, there weren't a bazillion web servers out there.
| A patchy server was still.. patchy. And the engine that
| solves problem X (c10k) was not created yet :)
|
| (for whoever reads my comment, I am referring to Apache and
| nginx)
| OJFord wrote:
| > [Althttpd ...] has run the https://sqlite.org/ website since
| 2004
|
| I don't know what the landscape was like in 2004 really, but
| probably at least an order of magnitude less than today's
| bazillion (whatever that would be!).
| aembleton wrote:
| I don't think Nginx was out then, so I was using Apache
| HTTPD. Maybe Dwayne considered that too heavy for what he
| needed to serve up.
| c17r wrote:
| Nginx's first public release was Oct 2004, so fits with
| your theory.
| unwind wrote:
| Nginx was first released to the public in 2004 [1]. Apache
| was released in 1995 [2].
|
| On a more personal note, wow! I had no idea I started using
| the Internet for realz _before_ the release of Apache, in
| 1994. This young made me feel, not.
|
| [1]: https://en.wikipedia.org/wiki/Nginx
|
| [2]: https://en.wikipedia.org/wiki/Apache_HTTP_Server
| mro_name wrote:
| > separate process is started for each incoming connection
|
| wow. Thanks also for elaborating the xinetd & stunnel4 configs.
| andrewmcwatters wrote:
| I have a great appreciation for D. Richard Hipp's work.
|   ** May you do good and not evil.
|   ** May you find forgiveness for yourself and forgive others.
|   ** May you share freely, never taking more than you give.
| abriosi wrote:
| It takes years of complexity to achieve this level of simplicity
| geocrasher wrote:
| I have yet to try running this for anything, but I do appreciate
| how it really sticks to the "do one thing well" ethos. Modern web
| servers can be extremely complicated with a lot of moving parts.
| This boils it down to just one thing and lets a person focus on
| the project instead of the infrastructure. Granted, it's very
| simplistic, but that's its strength.
| tyingq wrote:
| I do respect the technical chops around sqlite. However, I
| think a "fork for every single http request" server isn't
| really useful in many situations.
|
| That the sqlite website is able to run this way is more a
| testament to Linux's work on a lightweight/fast fork() than
| anything else. This would perform terribly on a more
| traditional Unix.
| vidarh wrote:
| I ran a webmail service with 2m users that forked _and_
| exec'd a CGI for every request 20 years ago. 20-year-old
| hardware was already fast enough that we were usually IO
| bound on the storage backend rather than constrained by the
| (much cheaper) frontends.
|
| Forking for every request is slow, sure.
|
| But if your code is written with it in mind it's faster than
| most people might expect, and most people never get to a
| scale where it matters.
|
| It's not the right choice for everything, but people have
| ironically gotten obsessed with things we introduced a long
| time ago as workarounds for slow hardware (and fork used to
| be slow on Linux too) decades after the original problems
| were largely solved.
|
| I do agree there are times this won't be useful, though.
| tyingq wrote:
| _" I ran a webmail service with 2m users that forked and
| exec'd a CGI for every request"_
|
| Yes, but that met expectations of that time period, and
| expectations for a webmail service. I'm curious if you also
| forked for every static asset...that's what this setup
| appears to do.
|
| I just don't see the benefit of sqlite choosing to use this
| today. It works, but there are other minimal http servers
| that would be just as simple, but would be faster and use
| fewer resources. I suppose they don't need to change it,
| but it's not really a great example of anything other than
| "fork is cheap on linux" to me.
| vidarh wrote:
| Performance expectations were if anything for the most
| part tighter than what people tend to get away with
| today. People hadn't gotten used to slow dynamic sites
| yet.
|
| We didn't fork for every static asset, but the vast
| majority of overall requests were dynamic past the initial
| pageload, so the vast majority of requests resulted in a
| fork.
|
| In terms of benefits, the simplicity is attractive. It's
| an approach that is in general quite resilient to errors.
| cogburnd02 wrote:
| What are your thoughts on darkhttpd?
|
| https://unix4lyfe.org/darkhttpd/
| yawaramin wrote:
| But pretty much nobody is running a more traditional Unix
| nowadays. Almost everyone uses Linux for web servers. So
| let's judge the tool based on its actual context, not on an
| unrealistic one.
| tyingq wrote:
| I'm saying it's not terribly interesting or broadly useful,
| unlike the rest of sqlite, which is. There are other
| minimal http servers that are vastly more efficient without
| being much more complicated.
| [deleted]
| fortran77 wrote:
| See Jef's original thttpd
|
| https://acme.com/software/thttpd/
| skywal_l wrote:
| I would like to see something like:
|
| althttpd -exec some_executable {}.method {}.body
|
| So you could quickly call an executable from a browser and
| redirect the output into the response.
| 0xdeadbeefbabe wrote:
| That's a good idea. Don't know why you are being modded down.
| xrstf wrote:
| You mean CGI[1]?
|
| [1] https://www.wikiwand.com/en/Common_Gateway_Interface
| pjc50 wrote:
| Isn't this just CGI again?
| pragma63 wrote:
| Always has been.
| notRobot wrote:
| Also see: darkhttpd: https://github.com/emikulic/darkhttpd
| chasil wrote:
| Similar is thttpd.
|
| I was not familiar with darkhttpd. Both of these are similar to
| the sqlite server in security design (chroot capability), but
| unlike it in that a single process serves all requests and does
| not fork.
|
| I have used stunnel in front of thttpd, and chrome has no
| complaints.
|
| https://acme.com/software/thttpd/
| mro_name wrote:
| althttpd has CGI, sometimes interesting.
| chasil wrote:
| The thttpd server also has CGI. I wrote about combining it
| with stunnel. The rfc-1867 tool is rather dated, and I've
| replaced it for my internal use:
|
| https://www.linuxjournal.com/content/secure-file-transfer
| 0xbadcafebee wrote:
| > As of 2018, the althttpd instance for sqlite.org answers about
| 500,000 HTTP requests per day (about 5 or 6 per second)
| delivering about 50GB of content per day (about 4.6
| megabits/second) on a $40/month Linode. The load average on this
| machine normally stays around 0.1 or 0.2
|
| Interesting. If the load avg is consistently low, it could mean
| they're over-paying for CPU. If this was a non-dedicated AWS
| instance you might want low load so you don't chew up CPU
| credits, but you'd also want to use an instance type that creates
| _some_ load so you're utilizing what you're paying for. Linode
| VPSes don't use CPU credits, so the calculation is a bit
| simpler.
| I'm also curious how much of that bandwidth couldn't be offset by
| a CDN or mirrors.
|
| If you were using a serverless platform, you'd ideally want to
| use something like static site hosting feature where you're
| mostly just paying for storage and egress. Or a serverless
| application platform to auto-scale traffic as needed. The main
| problem with doing this, of course, is the cost of egress: cloud
| providers with fancy serverless platforms often have redonkulous
| egress costs, so using a plain old VM on a VPS provider can be
| cheaper if you have more bandwidth demands than compute.
|
| (I am aware none of this is a concern if you'd rather just spend
| $40 and forget about it. I am a nerd.)
| GordonS wrote:
| > I'm also curious how much of that bandwidth couldn't be
| offset by a CDN or mirrors
|
| As you say, at $40/m it's academic for a lot of people, but
| AFAIK, the whole site is static, so presumably if you put it
| behind Cloudflare's free tier it would serve all but the file
| downloads from the edge. A pure guess, but I'd imagine that
| would mean serving 75% of requests from the edge.
| SQLite wrote:
| Look again. The entire Althttpd website is 100% dynamic.
| Notice that the hyperlink at the very top of this HN article
| is to a Markdown file (althttpd.md). A CGI runs to convert
| this into HTML for your web-browser.
|
| The core SQLite website has a lot of static content, but
| there are dynamic elements, such as Search
| (https://www.sqlite.org/search?s=d&q=sqlite) and the source
| code repository
| (https://www.sqlite.org/src/timeline?n=100&y=ci).
|
| So far today, 23.48% of HTTP requests to the sqlite.org
| domain are for dynamic content, according to server logs.
| SQLite wrote:
| Linode doesn't follow the cafeteria pricing style. You buy a
| package. $40/month is the minimum for us to get the disk space
| and I/O bandwidth we need. We could get by with less CPU,
| perhaps, but the extra memory and extra cores do reduce latency
| and they are nice to have on days when SQLite is a top story at
| HN. (Load avg has been running at about 0.95 all day today.)
| callumprentice wrote:
| I use SQLite in applications via their C/C++ interface and it's
| spectacular. I'd love to see a version of this that I could embed
| too.
| foobar33333 wrote:
| I'd be putting this in 1000 layers of sandboxing since it's
| a C program with network access.
| petee wrote:
| I hate to tell you but half the internet runs on C. You can't
| type ".com" without passing through C code
| dvfjsdhgfv wrote:
| Maybe I'm getting old but these days I'm having a hard time
| telling if comments like these are serious or sarcastic.
| foobar33333 wrote:
| It's both. Of course we have to rely on C now because most
| stuff is C but this is not an ideal situation and C shouldn't
| be used for new software.
| foo_barrio wrote:
| It's definitely not you. There is a general term for this
| called "Poe's Law" that says something like: "a sufficiently
| thorough parody is indistinguishable from the original". That
| GP might be an example of this.
| mro_name wrote:
| how many bugs might the 1000 layers bring? If each layer has
| 3 LOC, that's already more code than the webserver in the
| first place.
| Cthulhu_ wrote:
| Your point being? It's been serving sqlite.org just fine for
| all this time. You seem to be making some pretty big
| assumptions about security here without actually explaining
| what your specific concerns are.
| foxes wrote:
| I think the original comment is partially a joke. Maybe there
| isn't specifically anything wrong with this, but there is the
| fact that it is written in C. Historically there is a good
| precedent of this being an issue. C does not guarantee
| correctness to the same level as more modern languages.
|
| If it was written in Haskell or Rust for example you could be
| more sure about correctness. I believe for something like
| this, correctness is fairly important. Not to mention you
| probably wont even lose speed. As for if the code is
| understandable, it is a 2600 lines of terse-ish C [0]. Do you
| really think about the entire blob at the same time?
|
| [0] https://sqlite.org/althttpd/file?name=althttpd.c
| kahlonel wrote:
| People need to stop assuming that memory safety ==
| functional safety. They are two very different things. Rust
| ensures memory safety, but it won't stop you from making
| logical mistakes. You can't be "sure about correctness"
| unless proven mathematically.
| creatonez wrote:
| Does it being written by the Sqlite devs not improve the
| outlook for you?
| znpy wrote:
| if you run gnu/linux and are worried by C code running... I
| have bad news for you.
| ttt0 wrote:
| His operating system is probably GNU/Docker.
| foobar33333 wrote:
| Not far off it. I run Fedora Silverblue with its flatpak
| and podman.
| foobar33333 wrote:
| The google security team is working on fixing this with rust.
| atatatat wrote:
| Indirectly, they're also pushing for "fixing" of Firefox
| with Rust (only 84%[?] to go!)
| layoutIfNeeded wrote:
| Hello Rust user! Great meme you have there!
| spyke112 wrote:
| Sure, it lives in a single file, but considering the length,
| wouldn't it actually be better to split it into multiple
| files? I'm not that familiar with C; it just seems to be a
| "thing" in C to have giant files.
| dmux wrote:
| I've been thinking about this a lot recently. For most of my
| career I've been a Java programmer and for the majority of
| projects, each class is put in its own file. The amount of
| jumping (between files / following method calls) can get really
| tedious when you're trying to grok a code base. I've been
| working on Typescript projects recently where the standard has
| been to have slightly larger files -- possibly containing a
| class definition, but more often it's an entire module -- and
| it's actually been kind of nice to just read the entire thing
| top to bottom in one go. I've looked for studies on "locality"
| of source code, but haven't really found anything.
| GordonS wrote:
| Yes, it can be really jarring to have to constantly move
| between several small files.
|
| I mostly use C#, and a while back I settled on a middle
| ground, where closely-related classes and interfaces are
| grouped together in a single file.
|
| When I'm working on web apps/APIs, I usually follow the
| "feature folder" concept too, where all the most central
| parts are together in the same file.
| millerm wrote:
| SQLite is actually maintained in many files, but they are
| concatenated into one file for distribution. Here is the
| reasoning: https://sqlite.org/amalgamation.html
| Tepix wrote:
| Althttpd is less complex than other HTTPDs because it doesn't
| support encryption itself and instead recommends using
| stunnel4.
| thesnide wrote:
| Isn't that the same philosophy behind varnish?
| vmsp wrote:
| Blocks some referers by default:
|
|   static const char *azDisallow[] = {
|     "skidrowcrack.com",
|     "hoshiyuugi.tistory.com",
|     "skidrowgames.net",
|   };
|
| Anyone know why?
| banana_giraffe wrote:
| There's also this:
|
|   }else if( strcasecmp(zFieldName,"Referer:")==0 ){
|     zReferer = StrDup(zVal);
|     if( strstr(zVal, "devids.net/")!=0 ){
|       zReferer = "devids.net.smut";
|       Forbidden(230); /* LOG: Referrer is devids.net */
|     }
|
| I can appreciate why it's there, but it's still odd.
| Hypergraphe wrote:
| It is located in a #ifdef 0 block. So my guess is it might be
| for testing purpose.
| mikkelam wrote:
| I remember seeing these crackers around back in the day.
| Skidrow at least cracked some Grand Theft Auto games.
|
| Not sure why they're blocked
| lnxg33k1 wrote:
| One file of 2600 lines; I guess if you give up common sense
| and follow this approach you can even create a kernel in one
| file.
| mro_name wrote:
| have you ever looked at the source of sqlite?
| littlecranky67 wrote:
| AFAIK FreeRTOS is a kernel and distributed as a single file.
| nneonneo wrote:
| Maybe that was true at some point, but it's no longer true.
| FreeRTOS consists of a core of 6-7 .c files (some of which
| may be optional) plus 2-3 board support source files. See
| their GitHub mirror:
| https://github.com/FreeRTOS/FreeRTOS-Kernel
| Cthulhu_ wrote:
| Whose common sense though? You seem to have one particular
| opinion on how to organize a project, but that's not the only
| one. Sometimes, having just one file is easier.
| mhd wrote:
| For parsing/protocol handling the implementations where this
| was distributed amongst several files and/or classes have
| usually been the worst, in my experience.
|
| But now I want to see how badly you could Uncle Bob this thing.
| My screen should be wide enough for the resulting function
| names.
| alex_smart wrote:
| I am not sure exactly where you were trying to go with this,
| but 2600 lines is actually pretty short for a C program.
| brobdingnagians wrote:
| Depends on your development style and aims.
|
| One file is easy to add into a project, and the compiler
| optimizes translation units better, so you get a bit of a
| performance increase in some cases.
|
| Having "Find symbol in file" is nice too if you know you are
| looking for it just in this one file related to the code. Most
| editors aren't as ergonomic for finding "symbol in current
| directory" as they are for "symbol in current file".
| iveqy wrote:
| I thought forking webservers was too slow?
| layoutIfNeeded wrote:
| Too slow for what?
| dspillett wrote:
| It depends how complex a process you are forking and what OS
| you are running on. It has been some time since I wrote any
| real code for Linux or Windows that would be significantly
| affected by such things, but it used to be that forking could
| be almost as efficient under Linux as starting a new thread in
| an existing process. Under Windows this was very much not the
| case. Threads make communication between parts easier (no IPC
| needed) but that isn't usually an issue for the work a web
| server does.
|
| Compared to purely event-based web servers, forking can still
| be better, as no request should fully block another, or
| usually crash another (which is more likely with threads), and
| thread- or fork-based servers can make better use of
| concurrency, which is significant for CPU-heavy jobs.
|
| So swings & roundabouts. Each type (event, thread, process,
| some hybrid of the above) has strengths, and of course
| weaknesses.
| cesarb wrote:
| > but it used to be that forking could be almost as efficient
| under Linux as starting a new thread in an existing process.
|
| That's because internally it's nearly the same thing. Both
| forking and starting a new thread on Linux is a variant of
| the clone() system call, the only difference being which
| things are shared between parent and child.
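|
| Roughly, in terms of flags (simplified; the real glibc
| wrappers pass more arguments):
|
|   /*
|   ** fork()           ~ clone(SIGCHLD, ...)
|   ** pthread_create() ~ clone(CLONE_VM|CLONE_FS|CLONE_FILES|
|   **                          CLONE_SIGHAND|CLONE_THREAD|..., ...)
|   */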
| ankurpatel wrote:
| Performance is really bad. This is good for running a small HTTP
| server on an embedded device, but if the plan is to serve
| production web traffic, the performance is really bad. Below is
| a report from running the server with a minimal index.html page
| and hitting it with artillery.
|
|   All virtual users finished
|   Summary report @ 09:39:57(-0400) 2021-06-08
|     Scenarios launched:  33645
|     Scenarios completed: 2573
|     Requests completed:  2573
|     Mean response/sec:   42.57
|     Response time (msec):
|       min: 0
|       max: 9029
|       median: 2
|       p95: 6027.7
|       p99: 8778.8
|     Scenario counts:
|       Get index.html: 33645 (100%)
|     Codes:
|       200: 2573
|     Errors:
|       ETIMEDOUT: 31008
|       EPIPE: 48
|       ECONNRESET: 16
| e12e wrote:
| > hitting it with artillery
|
| This?
|
| https://github.com/artilleryio/artillery
| 40four wrote:
| Fair enough, but when considering the reasons and decisions
| behind using this server from the developers, isn't your point
| kind of moot?
|
| It's not optimized for high 'performance'. It's optimized for
| low resource usage, and the ability to reliably serve a large
| number of requests on a small budget, right?
|
| They state that the website is currently serving 500K requests
| & 50GB of bandwidth per day. Respectfully, this is quite the
| opposite of your 'only good for small embedded devices' claim.
|
| I think this is very interesting, and I'm glad I know this
| exists now! Worth considering if you have the right type of use
| case.
| wubawam wrote:
| That's not a lot of requests.
|
| My hobby website serves more traffic for a 1/4 of the cost
| and is easy to configure.
| krferriter wrote:
| And yet in years of using sqlite I have never once had a
| problem loading their website.
| dmux wrote:
| >but if plan is to use it for HTTP server to serve production
| web traffic performance is really bad.
|
| But it seems to be "good enough", no? As stated on the page, it
| serves 500k requests a day.
|
| Were you running your tests using xinetd or stunnel?
| SQLite wrote:
| Indeed. And a Citation biz-jet is way faster, flies higher,
| goes further, and carries more passengers than a Carbon Cub. On
| the other hand, the Citation costs more, burns more gas, takes
| more maintenance, and is more complex to fly, and you should
| not try to land a Citation on a sandbar in a remote Alaskan
| river.
|
| Choose the right tool for the job.
|
| Changing the https://sqlite.org/ website to run off of Nginx or
| Apache instead of althttpd would just increase the time I spend
| on administration and configuration auditing.
| dsalzman wrote:
| Love the Carbon Cub reference. STOL!!
| jmercouris wrote:
| It is not clear whether you would spend more time on
| administration with another webserver. I don't have
| experience with your webserver, but mine are 'set it' and
| 'forget it' affairs.
| nine_k wrote:
| The design goal is not top performance here. It is simplicity,
| observability of the source, and security.
|
| It absolutely will fail under a DDoS-like punishing load which,
| say, nginx would have a chance to fend off.
|
| It's still plenty _adequate_ for many real-world configurations
| and load patterns, much like Apache 1.x has been. Only this is
| like 2% the size of Apache 1.x.
| speg wrote:
| It serves sqlite.org just fine.
|
| Most people don't need FANG tools.
| 0xbadcafebee wrote:
| I'm not sure I would call Apache or Nginx "FANG tools" (or
| FAANG tools)
| zanethomas wrote:
| the beauty and power of simplicity
| 0xdeadbeefbabe wrote:
| Does z in zTmpNam or zProtocol signify global variable?
| https://sqlite.org/althttpd/file/althttpd.c
|
| Also, I'd like to complain about the hn hug of death, because it
| isn't happening.
| SQLite wrote:
| The "z" prefix is intended to denote a "zero-terminated
| string", or more specifically a pointer to a zero-terminated
| string.
| niutech wrote:
| There is also redbean (https://justine.lol/redbean/) - a single-
| file web server with embedded Lua interpreter as an Actually
| Portable Executable by Justine Tunney, the creator of
| Cosmopolitan.
| jedimastert wrote:
| Every time I come across Justine's work, I'm amazed.
| john-tells-all wrote:
| The idea you can have an executable that is incredibly small,
| _and_ runs on macOS/Linux/Windows, _and_ is very fast, _and_
| has features, is mind-blowing.
|
| Justine Tunney is a treasure!
| pstuart wrote:
| That is amazing. And it looks like she may embed SQLite as
| well.
| paulclinger wrote:
| Already happened;
| https://github.com/jart/cosmopolitan/pull/185 (Add sqlite3
| support to Lua scripts in Redbean) has been merged.
___________________________________________________________________
(page generated 2021-06-08 23:00 UTC)