[HN Gopher] The art of high performance computing
___________________________________________________________________
The art of high performance computing
Author : rramadass
Score : 415 points
Date : 2023-12-30 14:01 UTC (8 hours ago)
(HTM) web link (theartofhpc.com)
(TXT) w3m dump (theartofhpc.com)
| mkoubaa wrote:
| UT Austin really is a fantastic institution for HPC and
| computational methods.
| bee_rider wrote:
| Every BLAS you want to use has at least some connection to UT
| Austin's TACC.
| mgaunard wrote:
| Aren't the LAPACK people in Tennessee?
| bee_rider wrote:
| Sort of like BLAS, LAPACK is more than just one
| implementation. Dongarra described what everybody should do
| from Tennessee, but other places implemented it elsewhere.
| victotronics wrote:
| Not quite. Every modern BLAS is (likely) based on Kazushige
| Goto's implementation, and he was indeed at TACC for a while.
| And probably the best open-source implementation, "BLIS", is
| from UT Austin, though not connected to TACC.
| bee_rider wrote:
| Oh really? I thought BLIS was from TACC. Oops, mea culpa.
| RhysU wrote:
| https://github.com/flame/blis/
|
| Field et al, recent winners of the James H. Wilkinson
| Prize for Numerical Software.
|
| Field and Goto both collaborated with Robert van de
| Geijn. Lots of TACC interaction in that broader team.
| davidthewatson wrote:
| I was asked to share a TA role on a graduate course in HPC a
| decade ago. I turned down the offer.
|
| After a cursory glance, I can honestly say that if this book were
| available then, I'd have taken the opportunity.
|
| The combination of what I perceive to be Knuth's framing of art,
| along with carpentry and the need to be a better devops person
| than your devops person, is compelling.
|
| Kudos to the author for such an achievement. UT Austin seems to
| have achieved in computer science what North Texas State did in
| music.
| atrettel wrote:
| I took a course on scientific computing in 2013. It was cross-
| listed under both the computer science and applied math
| departments. The issue is that the field is pretty broad overall
| and a lot of topics were covered in a cursory manner, including
| anything related to HPC and parallel programming in particular. I
| don't regret taking the course, but it was too broad for the
| applications I was pursuing.
|
| I haven't looked at what courses are being offered in several
| years, but when I was a graduate student, I really would have
| benefited from a dedicated semester-long course on parallel
| computing, especially going into the weeds about particular
| algorithms and data structures in parallel and distributed
| computing. Those were handled in a super cursory manner in the
| scientific computing course I took, as if somehow you'd know
| precisely how to parallelize things the first time you try. I've
| since learned a lot of this stuff on my own and from colleagues
| over the years, as many people do in HPC, but books like these
| would have been invaluable as part of a dedicated semester-long
| course.
| dist1ll wrote:
| It's very interesting how abstracted away from the hardware HPC
| sometimes looks. The books seem to revolve a lot around SPMD
| programming, algo & DS, task parallelism, synchronization etc,
| but very little about computer architecture details like
| supercomputer memory subsystems, high-bandwidth interconnects
| like CXL, GPU architecture and so on. Are the abstractions and
| tooling already good enough that you don't need to worry about
| these details? I'm also curious if HPC practitioners have to
| fiddle a lot of black-box knobs to squeeze out performance?
| bee_rider wrote:
| I don't think I do HPC (I'll only use up to, say, 8 nodes at a
| time), but the impression I get is that they are already
| working on quite hard problems at the high level, so they need
| to lean on good libraries for the low-level stuff, otherwise it
| is just too much.
| atrettel wrote:
| Yes and no.
|
| MPI and OpenMP are the primary abstractions from the hardware
| in HPC, with MPI being an abstracted form of distributed-memory
| parallel computing and OpenMP being an abstracted form of
| shared-memory parallel computing. Many researchers write their
| codes purely using those, often both in the same code. When
| using those, you really do not need to worry about the
| architectural details most of the time.
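|
| For instance, a minimal hybrid sketch (illustrative only, not
| from any particular course; file name and compile command are
| assumptions): MPI gives you processes across nodes, OpenMP gives
| you threads inside each process.
|         /* hybrid.c -- e.g.: mpicc -fopenmp hybrid.c -o hybrid */
|         #include <mpi.h>
|         #include <omp.h>
|         #include <stdio.h>
|
|         int main(int argc, char **argv) {
|             MPI_Init(&argc, &argv);    /* one process per rank */
|             int rank;
|             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
|             /* threads share the rank's memory */
|         #pragma omp parallel
|             printf("rank %d, thread %d\n",
|                    rank, omp_get_thread_num());
|             MPI_Finalize();
|             return 0;
|         }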
|
| Still, some researchers who like to further optimize things do
| in fact fiddle with a lot of small architectural details to
| increase performance further. For example, loop unrolling is
| pretty common and can get quite confusing in my opinion. I
| vaguely recall some stuff about trying to vectorize operations
| by preferring addition over multiplication due to the
| particular CPU architecture, but I do not think I've seen that
| in practice.
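|
| (For illustration only, a toy 4-way unroll of a reduction; the
| function name is invented. The separate partial sums also break
| the serial dependency chain.)
|         double sum_unrolled(int n, const double *x) {
|             double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
|             int i;
|             for (i = 0; i + 3 < n; i += 4) { /* 4 adds per trip */
|                 s0 += x[i];   s1 += x[i+1];
|                 s2 += x[i+2]; s3 += x[i+3];
|             }
|             for (; i < n; i++) s0 += x[i];   /* remainder loop */
|             return (s0 + s1) + (s2 + s3);
|         }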
|
| Preventing cache misses is another major one, where some codes
| are written so that the most needed information is stored in
| the CPU's cache rather than memory. Most codes only handle this
| by ensuring column-major order loops for array operations in
| Fortran or row-major order loops in C, but the concept can be
| extended further. If you know the cache size for your
| processors, you could hypothetically optimize some operations
| to keep all of the needed information inside the cache to
| minimize cache misses. I've never seen this in practice but it
| was actively discussed in the scientific computing course I
| took in 2013.
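|
| A sketch of that idea (sizes invented; B would be tuned so a few
| B x B tiles fit in your actual cache):
|         #define N 512
|         #define B 64    /* tile size: the cache-tuning knob */
|         static double a[N][N], b[N][N], c[N][N];
|
|         void matmul_blocked(void) {
|             for (int ii = 0; ii < N; ii += B)
|               for (int kk = 0; kk < N; kk += B)
|                 for (int jj = 0; jj < N; jj += B)
|                   /* reuse each tile while it is cache-resident */
|                   for (int i = ii; i < ii + B; i++)
|                     for (int k = kk; k < kk + B; k++)
|                       for (int j = jj; j < jj + B; j++)
|                         c[i][j] += a[i][k] * b[k][j];
|         }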
|
| The use of particular GPUs depends heavily on the problem being
| solved, with some being great on GPUs and others being too
| difficult. I'm not too knowledgeable about that, unfortunately.
| bee_rider wrote:
| Of course, not every problem can be solved by BLAS, but if
| you are doing linear algebra, the cache stuff should be
| mostly handled by BLAS.
|
| I'm not sure how much multiplication vs addition matters on a
| modern chip. You can have a bazillion instructions in flight
| after all, as long as they don't have any dependencies, so
| I'd go with whichever option shortens the data dependencies
| on the critical path. The computer will figure out where to
| park longer instructions if it needs to.
| atrettel wrote:
| You're right that the addition vs. multiplication issue
| likely does not matter on a modern chip. I just gave the
| example because it shows how the CPU architecture can
| affect how you write the code. I do not recall precisely
| when or where I heard the idea, but it was about a decade
| ago --- ages ago by computing standards.
| MichaelZuo wrote:
| Memory architecture and bandwidth are still very important;
| most of IBM's latest performance gains for both mainframes and
| POWER are reliant on some novel innovations there.
| eslaught wrote:
| No, the abstractions are not sufficient. We do care about these
| details, a lot.
|
| Of course, not every application is optimized to the hilt. But
| if you _want_ to so optimize an application, exactly the things
| you're talking about are what come into play.
|
| So yes, I would expect every competent HPC practitioner to have
| a solid (if not necessarily intimate) grasp of hardware
| architecture.
| mgaunard wrote:
| Regardless of what you do, domain knowledge tends to be more
| valuable than purely technical skills.
|
| Knowing more numerical analysis will probably get you
| further in HPC than knowledge of specific hardware
| architectures.
|
| Ideally you want both, of course.
| jandrewrogers wrote:
| For most HPC, you will not be able to maximize parallelism and
| throughput without intimate knowledge of the hardware
| architecture and its behavior. As a general principle, you want
| the topology of the software to match the topology of the
| hardware as closely as possible for optimal scaling behavior.
| Efficient HPC software is strongly influenced by the nature of
| the hardware.
|
| When I wrote code for new HPC hardware, people were always
| surprised when I asked for the system hardware and architecture
| docs instead of the programming docs. But if you understood the
| hardware design, the correct way of designing software for it
| became obvious from first principles. The programming docs
| typically contained quite a few half-truths intended to make
| things seem easier for developers than a proper understanding
| would suggest. In fact, some HPC platforms failed in large part
| because they consistently misrepresented what was required from
| developers to achieve maximum performance in order to appear
| "easy to use", and then failed to deliver the performance the
| silicon was capable of when you actually wrote software the way
| the marketing implied would be effective.
|
| You can write HPC code on top of abstractions, and many people
| do, but the performance and scaling losses are often
| an unavoidable integer factor. As with most software, this was
| considered an acceptable loss in many cases if it allowed less
| capable software devs to design the code. HPC is like any other
| type of software in that most developers that notionally
| specialize in it struggle to produce consistently good results.
| Much of the expensive hardware used in HPC is there to mitigate
| the performance losses of worse software designs.
|
| In HPC there are no shortcuts to actually understanding how the
| hardware works if you want maximum performance. Which is no
| different from regular software; in HPC the hardware systems
| are just bigger and more complex.
| crabbone wrote:
| You'd be surprised how backwards and primitive the tools used
| in HPC actually are.
|
| Take for instance the so-called workload managers, of which the
| most popular are Slurm, PBS, UGE, and LSF. Only Slurm is really
| open-source; PBS has a community edition; the rest are
| proprietary products executed in the best traditions of
| enterprise software, which lock you into pathetically bad
| tools, ancient and backwards tech with crappy or nonexistent
| documentation, and inept tech support.
|
| The interface between WLMs and the user who wants to use some
| resources is through submitting "jobs". These jobs can be
| interactive, but most often they are the so-called "batch
| jobs". A batch job is usually defined as... a Unix Shell
| script, where the comments are parsed to interpret those as
| instructions to the WLM. In the world with dozens of
| configuration formats... they chose to do this: embed
| configuration into Shell comments.
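|
| To make it concrete, a typical Slurm batch job looks something
| like this (a sketch; the resource numbers and program name are
| invented):
|         #!/bin/bash
|         #SBATCH --job-name=my_sim
|         #SBATCH --nodes=2
|         #SBATCH --ntasks-per-node=48
|         #SBATCH --time=02:00:00
|         # everything above is configuration parsed out of
|         # comments; everything below is an ordinary shell script
|         srun ./my_sim input.dat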
|
| Debugging job failures is a nightmare, mostly because WLM
| software has really poor quality of execution. Pathetic error
| reporting. Idiotic defaults. Everything is so fragile it falls
| apart if you so much as look at it the wrong way.
| Working with it reminds me of the very early days of Linux,
| when sometimes things just wouldn't build, or would segfault
| right after you tried running them, and there wasn't much you
| could do besides spending days or weeks trying to debug it just
| to get some basic functionality going.
|
| When I have to deal with it, I feel like I'm in a steampunk
| movie. Some stuff is really advanced, and then you find out
| that this advanced stuff is propped up by some DIY retro
| nonsense you thought had died off decades ago. The advanced
| stuff is usually more on the hardware side, while the software
| is not keeping up with it for the most part.
| StableAlkyne wrote:
| > Working with it reminds me the very early days of Linux
|
| The other cool thing about HPC is it is one of the last areas
| where multi-user Unix is used! At least, if you're using a
| university or NSF cluster that is!
|
| The only other place I really see multiple humans using the
| same machine is SDF or the Tildes.
| victotronics wrote:
| It's Saturday afternoon.
|         [login1 ~:3] who | cut -d ' ' -f 1 | sort -u | wc -l
|         41
| bee_rider wrote:
| Having switched from LSF to slurm, I have to appreciate that
| the ecosystem is so bash-centric. Lots of re-use in the
| conversion. If I'd had to learn some kind of slurm-markup-
| language or slurmScript or find buttons in some SlurmWizard,
| it would have been a nightmare.
| crabbone wrote:
| Oh LSF... I don't know if you know this. LSF is perhaps the
| only system alive today that I know of that uses literal
| patches as a means of software distribution.
|
| First time I saw it, I had a flashback to the time when I
| worked for HP, and they were making some huge SAP knock-off,
| and that system was so labor-intensive to deploy that their QA
| process involved actual patches. As in, the pre-release QA
| cycle involved installing the system, validating it (which
| could take a few weeks), and if it wasn't considered DoD, the
| developers were given the final list of things they needed to
| fix, and those fixes had to be submitted as patches (sometimes
| literal diffs that had to be applied to the deployed system
| with the patch tool).
|
| This is, I guess, how the "patch version component" came to
| be in SemVer spec. It's kind of funny how lots of tools are
| using this component today for completely unrelated
| purposes... but yeah, LSF feels like the time is ticking
| there at a different pace :)
| OPA100 wrote:
| I've dug deeply into LSF in the last few years and it's like
| a car crash - you can't look away. It feels like something
| that started in the early unix days but was developed into
| perhaps the late 90s, but in reality LSF was only started in
| the 90s (in academia). As far as I can tell development all
| but stopped when IBM acquired it some ten years ago.
| convolvatron wrote:
| HPC software is one area where we have arguably regressed in
| the last 30 years. Chapel is the only light I see in the
| darkness
| victotronics wrote:
| You use a lot of scare quotes. Do you have any suggestions on
| how things could be different? You need batch jobs because
| the scheduler has to wait for resources to be available. It's
| kinda like Tetris in processor/time space. (In fact, that's
| my personal "proof" that workload scheduling is NP-complete:
| it's isomorphic to Tetris.)
|
| And what's wrong with shell scripts? It's a lingua franca,
| generally accepted across scientific disciplines, cluster
| vendors, workload managers, .... Considering the complexity
| of some setups (copy data to node-local file systems; run
| multiple programs, post-process results, ... ) I don't see
| how you could set up things other than in some scripting
| language. And then Unix shell scripts are not the worst idea.
|
| Debugging failures: yeah. Too many levels where something can
| go wrong, and it can be a pain to debug. Still, your average
| cluster processes a few million jobs in its lifetime. If more
| than a microscopic fraction of those failed, computing centers
| would need way more personnel than they have.
| romanows wrote:
| I really like using Slurm; the documentation is great
| (https://slurm.schedmd.com) and the model is pretty
| straightforward, at least for the mostly-single-node jobs I
| used it for.
|
| You can launch jobs via the command line, config in Bash
| comments, REST APIs, linking to their library, and I think a
| few more ways.
|
| I found it pretty easy to set up and admin. Scaling in the
| cloud was way less developed when I used it, so I just hacked
| in a simple script that allowed scaling up and down based on
| the job queue size.
|
| What do you like better and for what use-case? Mine was for a
| group of researchers training models, and the feature _I_
| desired most was an approximately fair distribution of
| resources (cores, GPU hours, etc.).
| dahart wrote:
| There is a lot of abstraction, but knowing which abstraction to
| use still takes knowing a lot about the hardware.
|
| > I'm also curious if HPC practitioners have to fiddle a lot of
| black-box knobs to squeeze out performance?
|
| In my experience with CUDA developers, yes the Shmoo Plot
| (https://en.wikipedia.org/wiki/Shmoo_plot, sometimes called a
| 'wedge' in some industries) is one of the workhorses of
| everyday optimization. I'm not sure I'd call it black-box, though
| maybe the net effect is the same. It's really common to have
| educated guesses and to know what the knobs do and how they
| work, and still find big surprises when you measure. The first
| rule of optimization is measure. I always think of Michael
| Abrash's first chapter in the "Black Book": "The Best Optimizer
| is Between Your Ears"
| http://twimgs.com/ddj/abrashblackbook/gpbb1.pdf. This is a
| fabulous snippet of the philosophy of high performance (even
| though it's PC game centric and not about modern HPC.)
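|
| In its simplest form that's just a performance metric at each
| knob setting. A toy 1-D sweep (the kernel, sizes, and "stride"
| knob are all invented for illustration):
|         #include <stdio.h>
|         #include <time.h>
|         #define N (1 << 22)
|         static double buf[N];
|
|         double kernel(int stride) {   /* 'stride' is the knob */
|             double s = 0.0;
|             for (int i = 0; i < N; i += stride) s += buf[i];
|             return s;
|         }
|
|         int main(void) {
|             for (int stride = 1; stride <= 64; stride *= 2) {
|                 struct timespec t0, t1;
|                 clock_gettime(CLOCK_MONOTONIC, &t0);
|                 volatile double r = kernel(stride);
|                 clock_gettime(CLOCK_MONOTONIC, &t1);
|                 /* one point on the shmoo plot */
|                 printf("stride %2d: %.3f ms\n", stride,
|                        (t1.tv_sec - t0.tv_sec) * 1e3 +
|                        (t1.tv_nsec - t0.tv_nsec) / 1e6);
|                 (void)r;
|             }
|             return 0;
|         }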
|
| Related to your point about abstraction, the heaviest knob-
| tuning should get done at the end of the optimization process,
| because as soon as you refactor or change anything, you have to
| do the knob tuning again. A minor change in register spills or
| cache access patterns can completely reset any fine-tuning of
| thread configuration or cache or shared memory size, etc.
| Despite this, some healthy amount of knob tuning is still done
| along the way to check & balance & get an intuitive sense of
| the local perf space of the code. (Just noticed Abrash talks a
| little about why this is a good idea.)
| squidgyhead wrote:
| Could you explain how you use a shmoo plot for optimization?
| Do you just have a performance metric at each point in
| parameter space?
| marcosdumay wrote:
| It's not intuitive, but HPC is more about scalability
| performance.
|
| You won't be able to use a supercomputer at all without
| scalability, and it's the one topic that is specific to it.
| But, of course, those computers' time is quite expensive, so
| you'll want to optimize for performance too. It's just
| secondary.
| bluedino wrote:
| I started in HPC about 2 years ago on a ~500 node cluster at a
| Fortune 100 company. I was really just looking for a job where
| I was doing Linux 100% of the time, and it's been fun so far.
|
| But it wasn't what I thought it would be. I guess I expected to
| be doing more performance oriented work, analyzing numbers and
| trying to get every last bit of performance out of the cluster.
| To be honest, they didn't even have any kind of monitoring
| running. I set some up, and it doesn't really get used. Once in
| a while we get questions from management about "how busy is the
| cluster", to justify budgets and that sort of thing.
|
| Most of my 'optimization' work ends up being things like making
| sure people aren't (usually unknowingly) requesting 384 CPUs
| when their script only uses 16, testing software to see what #
| of CPUs it works with before you see degradation, etc. I've
| only had the Intel profiler open twice.
|
| And I've found that most of the job is really just helping
| researchers and such with their work. Typically running either
| a commercial or open-source program, troubleshooting it, or
| getting some code written by another team on another cluster
| and getting it built and running on yours. Slogging through
| terrible Python code. Trying to get a C++ project built on a
| more modern cluster in a CentOS 7 environment.
|
| It can be fun in a way. I've worked with different languages
| over the years so I enjoy trying to get things working, digging
| through crashes and stack traces. And working with such large
| machines, your sense of normal gets twisted when you're on a
| server with 'only' 128GB of RAM or 20TB of disk.
|
| It's a little scary when you know the results of some of this
| stuff are being used in the real world, and the people running
| the simulations aren't even doing things right. Incorrect code,
| mixed-up source code, not using the data they think they are...
| I once found a huge bug that had existed for 3 years. Doesn't
| that invalidate all the work you've done on this subject?
|
| The one drawback I find is that a lot of HPC jobs want you to
| have a master's degree. Even just to run the cluster. Doesn't
| make sense to me; I'm not writing the software you're running,
| we aren't running some state of the art, TOP500 cluster. We're
| just getting a bunch of machines networked together and running
| some code.
| throwawaaarrgh wrote:
| I always found that funny too. A business that needs a
| powerful computing solution can come up with some amazingly
| robust stuff, whereas science/research just buys a big
| mainframe and hopes it works.
| justin66 wrote:
| > The one drawback I find is that a lot of HPC jobs want you
| to have a master's degree.
|
| Is it possible that pretty much any specialization, outside
| of the most common ones, engages in a lot of gatekeeping? I
| remember how difficult it appeared to be after I graduated to
| break into embedded systems (I never did). I persisted until
| I realized it doesn't even pay very well, comparatively.
| bayindirh wrote:
| HPC admin here, generally serving "long tail of science"
| researchers.
|
| In today's x86_64 hardware, there's no "supercomputer memory
| subsystem". It's just a glorified NUMA system, and the biggest
| problem is putting the memory close to your core, i.e. keeping
| data local in your NUMA node to reduce latencies.
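|
| E.g., a sketch of doing that by hand with libnuma (the buffer
| size is arbitrary; normally the scheduler and allocator handle
| this for you):
|         #include <numa.h>             /* link with -lnuma */
|         #include <stdio.h>
|
|         int main(void) {
|             if (numa_available() < 0) return 1; /* no NUMA here */
|             size_t bytes = 1 << 20;
|             /* placed on the calling thread's node, so the data
|                stays next to the core that touches it */
|             double *buf = numa_alloc_local(bytes);
|             buf[0] = 42.0;
|             printf("%f\n", buf[0]);
|             numa_free(buf, bytes);
|             return 0;
|         }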
|
| Your resource mapping is handled by your scheduler. It knows
| your hardware, hence it creates a cgroup which satisfies your
| needs and is as optimized as possible, then stuffs your
| application into that cgroup and runs it.
|
| The current king of high-performance interconnects is
| InfiniBand, and it accelerates MPI at the fabric level. You can
| send messages, broadcast, and reduce results like there's no
| tomorrow, because when a message arrives at your node, it's
| already reduced. When you broadcast, you send only a single
| message, which is broadcast at the fabric layer. Multiple-context
| IB cards have many queues, and more than one MPI job can run on
| the same node/card with queue/context isolation.
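|
| The MPI side of such a reduction is just an ordinary collective
| (a sketch); any in-fabric offload is transparent to the call:
|         #include <mpi.h>
|         #include <stdio.h>
|
|         int main(int argc, char **argv) {
|             MPI_Init(&argc, &argv);
|             int rank;
|             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
|             double local = (double)rank, global = 0.0;
|             /* the fabric can reduce this on the way in */
|             MPI_Allreduce(&local, &global, 1, MPI_DOUBLE,
|                           MPI_SUM, MPI_COMM_WORLD);
|             if (rank == 0) printf("sum of ranks = %f\n", global);
|             MPI_Finalize();
|             return 0;
|         }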
|
| If you're using a framework for GPU work, the architecture &
| optimization is done at that level automatically (the framework
| developers generally do the hard work). NVIDIA's drivers are
| pure black magic and handle some parts of the optimization,
| too. Inter-GPU connection is handled by a physical fabric,
| managed by the drivers and its own daemon.
|
| If you're CPU bound, your libraries are generally hand-tuned
| by their vendors (Intel MKL, BLAS, Eigen, etc.). I personally
| used Eigen, and it has processor-specific hints and optimizations
| baked in.
|
| The things you have to worry about are compiling your code for
| the correct architecture and making sure that the hardware you
| run on can satisfy your demands (i.e.: do not make too many
| random memory accesses; keep the prefetcher and branch
| predictor happy if you're trying to go "all-out fast" on the
| node; do not abuse disk access; etc.).
|
| On the number crunching side, keeping things independent (so
| they can be instruction level parallelized/vectorized), making
| sure you're not doing unnecessary calculations, and not abusing
| MPI (reducing inter-node talk to only necessary chatter) is the
| key.
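|
| A toy illustration of the independence point (names invented;
| the first loop assumes y[0] is already set by the caller):
|         /* serial dependency: y[i] needs y[i-1], so this
|            resists vectorization */
|         void prefix_sum(int n, const double *x, double *y) {
|             for (int i = 1; i < n; i++)
|                 y[i] = y[i-1] + x[i];
|         }
|
|         /* independent iterations: vectorizes freely at -O3 */
|         void axpy(int n, double a, const double *x, double *y) {
|             for (int i = 0; i < n; i++)
|                 y[i] = a * x[i] + y[i];
|         }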
|
| It's way easier said than done, but when you get the hang of
| it, it becomes second nature to think about these things, if
| they are your cup of tea.
| efxhoy wrote:
| I wrote scientific simulation software in academia for a few
| years. None of us writing the software had any formal software
| engineering training above what we'd pieced together ourselves
| from statistics courses. We wrote our simulations to run
| independently on many nodes and aggregated the results at the
| end, no use of any HPC features other than "run these 100
| scripts on a node each please, thank you slurm". That approach
| worked very well for our problem.
|
| I'd bet a significant part of compute work on HPC clusters in
| academia works the same way. The only things we paid attention
| to were the number of cores on the node and preferring
| node-local storage over the shared volumes for caching. No MPI.
|
| There are of course problems requiring "genuine" HPC clusters
| but ours could have run on any pile of workers with a job
| queue.
| teleforce wrote:
| Is there something wrong with the GitHub files? I cannot
| render any of the textbooks' PDF files.
|
| https://github.com/VictorEijkhout/TheArtofHPC_pdfs/blob/main...
| npalli wrote:
| I think the files are too large to render in the GitHub browser
| and they give an error. You can pick the 'download raw' option
| to download locally and read the file. Worked for me.
| TimMeade wrote:
| I just "git clone
| https://github.com/VictorEijkhout/TheArtofHPC_pdfs.git" on my
| local drive. Had it all in under a minute.
| rramadass wrote:
| Just amazed at how the author has created (and shared for free)
| such a comprehensive set of books, including ones teaching C++
| and Unix tools! There is something to learn for all programmers
| (HPC-specific or not) here.
|
| Related: Jorg Arndt's "Matters Computational" book and FXT
| library - https://www.jjj.de/fxt/
| rlupi wrote:
| I am interested in the more hardware management side of HPC (how
| problems are detected, diagnosed, mapped into actions such as
| reboot/reinstall/repairs, how these are scheduled and how that is
| optimized to provide the best level of service, how this is done
| if there are multiple objectives to optimize at once e.g. node
| availability vs overall throughput, how different topologies
| affect the above, how other constraints affect the above, and in
| general a system dynamics approach to these problems).
|
| I haven't found many good sources for this kind of information.
| If you are aware of any, please cite them in a comment below.
| synergy20 wrote:
| check out the OpenBMC project and the DTMF association
| timoteostewart wrote:
| DMTF (not DTMF)
|
| https://www.dmtf.org/
| CoastalCoder wrote:
| This seemed like a big topic when I was interviewing with Meta
| and nVidia some months ago.
|
| Meta had a few good YouTube videos about the problems of
| dealing with this many GPUs at scale.
| keefle wrote:
| Could you link me the YouTube videos/articles in question? It
| happens to be my research area, and I'm interested in knowing
| how big companies such as Meta deal with multi-GPU systems.
| CoastalCoder wrote:
| I don't have them bookmarked anymore, but they may have
| been from this playlist: [0]
|
| [0] https://www.youtube.com/playlist?list=PLBnLThDtSXOw_kePWy3CS...
| nyrikki wrote:
| Assuming you are moving past just the typical nonblocking
| folded-Clos networks or Little's Law, and want to have a more
| engineering focus, "queuing theory" is one discipline you want
| to dig into.
|
| Queuing theory seems trivial and easy as it is introduced, but
| it has many open questions.
|
| As an example, performance metrics for a system with random
| arrival times and independent service times with k servers
| (M/G/k) are still an open question.
|
| https://www.sciencedirect.com/science/article/pii/S089571770...
|
| There are actually lots of open problems in queuing theory that
| one wouldn't expect.
| cavisne wrote:
| This paper from Microsoft [1] is the coolest thing I've seen in
| this space. Basically workload-level (deep learning, in this
| case) optimization that allows jobs to be resized and preempted.
|
| [1] https://arxiv.org/pdf/2202.07848.pdf
| justin66 wrote:
| There is some really good content here for any programmer.
|
| And with volume 3, such a contrast: the author teaches C++17
| and... Fortran 2008.
| toddm wrote:
| Kudos to Victor for assembling such a wonderful resource!
|
| While I am not acquainted with him personally, I did my doctoral
| work at UT Austin in the 1990s and had the privilege of working
| with the resources (Cray Y-MP, IBM SP/2 Winterhawk, and mostly on
| Lonestar, a host name which pointed to a Cray T3E at the time)
| maintained by TACC (one of my Ph.D. committee members is still on
| staff!) to complete my work (TACC was called HPCC and/or CHPC if
| I recall the acronyms correctly).
|
| Back then, it was incumbent on the programmer to parallelize
| their code (in my case, using MPI on the Cray T3E in the UNICOS
| environment) and have some understanding of the hardware, if only
| because the field was still emergent and problems were solved by
| reading the gray Cray ring-binder and whichever copies of Gropp
| et al. we had on-hand. That and having a very knowledgeable
| contact as mentioned above :) of course helped...
| victotronics wrote:
| > Lonestar, a host name which pointed to a Cray T3E
|
| Lonestar5 was a Cray again. Currently Lonestar6 is an oil-
| immersion AMD Milan cluster with A100 GPUs. The times, they
| never stand still.
| LASR wrote:
| The hardware / datacenter side of this is equally fascinating.
|
| I used to work in AWS, but on the software / services side of
| things. But now and then, we would crash some talks from the
| datacenter folks.
|
| One key revelation for me was that increasing compute power in
| DCs is primarily a thermodynamics problem rather than an actual
| computing one. The nodes have become so dense that shipping
| power in and shipping heat out, with all kinds of redundancies,
| is an extremely hard problem. And it's not like you can perform
| a software update if you've discovered some inefficiencies.
|
| This was ~10 years ago, so probably some things have changed.
|
| What blows me away is that Amazon, starting out as an internet
| bookstore, is at the cutting edge of solving thermodynamics
| problems.
| projectileboy wrote:
| Seymour Cray used to say this all the way back in the 1970s:
| his biggest problems were associated with dissipating heat. For
| the Cray 2 he took an even more dramatic approach: "The
| Cray-2's unusual cooling scheme immersed dense stacks of
| circuit boards in a special non-conductive liquid called
| Fluorinert(tm)" (https://www.computerhistory.org/revolution/supercomputers/10...)
| cogman10 wrote:
| It always made me wonder why liquid cooling wasn't more of a
| thing for datacenters.
|
| Water has a massive amount of thermal capacity and can quickly
| and in bulk be cooled to optimal temperatures. You'd probably
| still need fans and AC to dissipate the heat of non-liquid-cooled
| parts, but for the big energy items like CPUs and GPUs/compute
| engines, you could ship out huge amounts of heat fairly quickly
| and directly.
|
| I guess the complexity and risk of a leak would be a problem,
| but for Amazon-sized data centers that doesn't seem like a
| major concern.
| jebarker wrote:
| I'm interested in what people think of the approach to teaching
| C++ used here. Any particular drawbacks?
|
| I'm a very experienced Python programmer with some C, C++, and
| CUDA, doing application-level research in HPC environments
| (ML/DL). I'd really like to level up my C++ skills, and looking
| through book 3, it seems aimed exactly at the right level for
| me: it doesn't move too slowly and teaches best practices (per
| the author) rather than trying to be comprehensive.
___________________________________________________________________
(page generated 2023-12-30 23:00 UTC)