Chris Adams is joined by Zachary Smith and My Truong, both members of the Hardware Standards Working Group at the GSF. They dive into the challenges of improving hardware efficiency in data centers, the importance of standardization, and how emerging technologies like two-phase liquid cooling systems can reduce emissions, improve energy reuse, and even support power grid stability. They also discuss grid operation and the potential of software-hardware coordination to drastically cut infrastructure impact.
Learn more about our people:
Find out more about the GSF:
Resources:
If you enjoyed this episode then please either:
TRANSCRIPT BELOW:
Zachary Smith: We've successfully made data centers into cloud computing over the past 20 or 25 years, where most people who use and consume data centers never actually see them or touch them. And so it's out of sight, out of mind in terms of the impacts of the latest and greatest hardware or refresh. What happens to a 2-year-old Nvidia server when it goes to die? Does anybody really know?
Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software.
I'm your host, Chris Adams.
Hello and welcome to Environment Variables, the podcast where we explore the latest in sustainable software development. I'm your host, Chris Adams. Since this podcast started in 2022, we've spoken a lot about green software, how to make code more efficient so it consumes fewer resources or runs on a wider range of hardware to avoid needless hardware upgrades, and so on.
We've also covered how to deploy services into data centers where energy is the cleanest, or even when energy is the cleanest, by timing compute jobs to coincide with an abundance of clean energy on the grid. However, for many of these interventions to work, they rely on the next layer down from software,
the hardware layer, to play along. And for that to work at scale, you really need standards. Earlier this year, the SSIA, the Sustainable and Scalable Infrastructure Alliance, joined the Green Software Foundation. So now there's a hardware standards working group, or HSWG, within the Green Software Foundation too.
Today we're joined by two leaders in the field who are shaping the future of sustainable software. So, oops, sustainable hardware. We've got Zachary Smith formerly of Packet and Equinix, and My Truong from ZutaCore. We'll be discussing hardware efficiency, how it fits into the bigger sustainability picture, the role of the Open19 standard, and the challenges and opportunities of making data centers greener.
So let's get started. So, Zachary Smith, you are alphabetically ahead of My Truong, Mr. Truong. So can I give you the floor first to introduce yourself and tell a little bit about yourself for the listeners?
Zachary Smith: Sure. Thanks so much, Chris. It's a pleasure being here and getting to work with My on this podcast. As you mentioned, my name's Zachary Smith. I've been an entrepreneur, primarily in cloud computing, for, I guess it's about 25 years now. I went to Juilliard. I studied music and ended up figuring that wasn't gonna pay my rent here in New York City, and in the early two thousands joined a Linux-based hosting company. That really gave me just this full stack view on having to put together hardware. We had to build our own computers, ran data center space, oftentimes helped build some of the data centers, connect them all with networks, travel all around the world, setting that up for our customers. And so I feel really fortunate because I got to touch kind of all layers of the stack. My career evolved to touch further into hardware. It just became a true passion about where we could connect software and hardware together through automation, through accessible interfaces, and other kinds of standardized protocols, and led me to start a company called Packet, where we did that across different architectures, x86 and ARM, which was really coming to the data center in the 2014/15 timeframe. That business was acquired by Equinix, one of the world's largest data center operators. And at that point we really had a different viewpoint on how we could impact scale, with the sustainability groups within Equinix as one of the largest green power purchasers in the world, and start thinking more fundamentally about how we use hardware within data centers, how data centers could speak more or be accessible to software users, which, as we'll unpack in this conversation, are pretty disparate types of users and don't often get to communicate in good ways. So, I've had the pleasure of being at operating companies. I now invest primarily in businesses around the use of data centers and technology, as well as circular models to improve efficiency and the sustainability of products.
Chris Adams: Cool. Thank you Zachary. And, My, can I give you the floor as well to introduce yourself from what looks like your spaceship in California?
My Truong: Thanks. Thanks, Chris. Yes. So, pleasure being here as well. Yeah, My Truong, I'm the CTO at ZutaCore, a small two-phase liquid cooling organization, very focused on bringing sustainable liquid cooling to the marketplace. I was very fortunate to cross over with Zach at Packet and Equinix, and have since taken my journey in a slightly different direction to liquid cooling. Super excited to join here. Unfortunately, I'm not a musician by classical training; I am a double E by training. I'm joining here from California, on the west coast in the Bay Area.
Chris Adams: Cool. Thank you for that, My. Alright then. So, my name is Chris. If you're new to this podcast, I work in the Green Web Foundation, which is a small Dutch nonprofit focused on an entirely fossil free internet by 2030. And I'm also the co-chair of the policy working group within the Green Software Foundation.
Everything that we talk about, we'll do our best to share links to in the show notes. And if there's any particular thing you heard us talking about that you're really interested that isn't in the show notes, please do get in touch with us because we want to help you in your quest to learn more about green software and now green hardware.
Alright then looks like you folks are sitting comfortably. Shall we start?
Zachary Smith: Let's do it.
Chris Adams: All right then. Cool. Okay. To start things off, Zachary, I'll put this one to you first. Can you just give our listeners an overview of what a hardware standards working group actually does and why having standards with like data centers actually helps?
I mean, you can assume that our listeners might know that there are web standards that make websites more accessible and easier to run on different devices, so there's a sustainability angle there, but a lot of our listeners might not know that much about data centers and might not know where standards would be helpful.
So maybe you can start with maybe a concrete case of where this is actually useful in helping make any kind of change to the sustainability properties of maybe a data center or a facility.
Zachary Smith: Yeah. That's great. Well, let me give my viewpoint on hardware standards and why they're so critical. We're really fortunate actually to enjoy a significant amount of standardization in consumer products, I would say. There's working groups, things like the USB Alliance, that have really provided, just in recent times, for example, standardization, whether that's through market forces or regulation, around something like USB-C, right, which allowed manufacturers and accessories and cables and consumers to not have extra or throw away good devices because they didn't have the right cable to match the port.
Right? And so beyond this interoperability aspect to make these products work better across an intricate supply chain and ecosystem, they also could provide real sustainability benefits in terms of just reuse. Okay. In data centers, the amazing thing is that we can unpack some of the complexities related to the supply chain. These are incredibly complex buildings full of very highly engineered systems that are changing at a relatively rapid pace. But the real issue from my standpoint is, we've successfully made data centers into cloud computing over the past 20 or 25 years, where most people who use and consume data centers never actually see them or touch them. And so it's out of sight, out of mind in terms of the impacts of the latest and greatest hardware or refresh. What happens to a 2-year-old Nvidia server when it goes to die? Does anybody really know? You kind of know in your home or with your consumer electronics, and you have this real waste problem, so then you have to deal with it.
You know not to put lithium ion batteries in the trash, so,
you find the place to put them. But you know, when it's the internet and it's so far away, it's a little bit hazy for, I think most people to understand the kind of impact of hardware and the related technology as well as what happens to it. And so that's, I'm gonna say, one of the challenges
in the broader sustainability space for data center and cloud computing. One of the opportunities is that, maybe different from consumer, we know actually almost exactly where most of this physical infrastructure shows up. Data centers don't move around usually. And so they're usually pretty big. They're usually attached to long-term physical plants, and there's not millions of them. There's thousands of them, but not millions. And so that represents a really interesting opportunity for implementing really interesting, which would seem complex, models. For example, upgrade cycles or parts replacement or upcycling of hardware. Those things are actually almost more doable logistically in data centers than they are in the broader consumer world because of where they end up. The challenge is that we have this really disparate group of manufacturers that frankly don't always have aligned incentives for making things work together. Some of them actually define their value by, "did I put my logo on the left or did I put my cable on the right?" You have a business model, which would be the infamous Intel tick-tock model, which is now maybe Nvidia. My, what's Nvidia's version of this?
I don't know. But its 18-month refresh cycles are really, like, put out as a pace of innovation, which are, I would say, in many ways quite good, but in another way, it requires this giant purchasing cycle to happen, and people build highly engineered products around one particular set of technology and then expect the world to upgrade everything around it. When you have data centers and the related physical plant, maybe 90 or 95% of this infrastructure can be very consistent. Things like sheet metal and power supplies and cables. And so, like, I think that's where we started focusing a couple years ago: "how could we create a standard that would allow different parts of the value chain throughout data center hardware, data centers, and related, to benefit from an industry-wide interoperability?" And that came to really fundamental things that take years to go through supply chain, and that's things like power systems, now what My is working on with related cooling systems, as well as operating models for that hardware in terms of upgrade or life cycling and recycling. I'm not sure if that helps, but this is why it's such a hard problem, but also so important to make a reality.
Chris Adams: So if I'm understanding, one of the advantages of having the standards here is that you get to decide where you compete and where you cooperate, with the idea being that, okay, we all have a shared goal of reducing the embodied carbon in maybe some of the materials you might use, but people might have their own specialized chips.
And by providing some agreed standards for how they work with each other, you're able to use say maybe different kinds of cooling, or different kinds of chips without, okay. I think I know, I think I know more or less where you're going with that then.
Zachary Smith: I mean, I would give a couple of very practical examples. Can we make computers that you can pop out the motherboard and have an upgraded CPU, but still use
Like the rest of the band. Yeah.
the power supplies, et cetera. Is that a possibility? Only with standardization could that work. Some sort of open standard. And standards are a little bit different in hardware.
I'm sure My can give you some color, having recently built Open19 V2 standard. It's different than the software, right? Which is relatively, I'm gonna say, quick to create,
quick to change.
And also different licensing models, but hardware specifications are their own beast and come with some unique challenges.
Chris Adams: Cool. Thank you for that, Zach. My, I'm gonna bring the next question to you, because we did speak a little bit about Open19, and that was a big thing with the SSIA. So as I understand it, the Open19 spec, which we referenced, was one of the big things that the SSIA was a kind of steward of. And as I understand it, there's already an existing, different standard that defines the dimensions of, say, a 19 inch rack in a data center.
So, they need to be the same size and everything like that. But that has nothing to say about the power that goes in and how you cool it or things like that. I assume this is what some of the Open19 spec was concerning itself with. I mean, maybe you could talk a little bit about why you even needed that, or if that's what it really looks into, and why that's actually relevant now, or why that's more important, say, halfway through the 2020s, for example.
My Truong: Yeah, so Open19, the spec itself, originated from a group of folks starting with the LinkedIn organization at the time. Yuval Bachar put it together along with a few others.
As that organization grew, it was inherited by the SSIA, which became a Linux Foundation project. What we did when we became a Linux Foundation project was rev the spec. The original spec was built around 12 volt power. It had a power envelope that was maybe a little bit lower than what we knew we needed to go to in the industry. And so what we did when we revised the spec was to bring both 48 volt power and a much higher TDP to it, and brought some consistency to the design itself.
So, as you were saying earlier, EIA/TIA has a 19 inch spec that defines a rail to rail dimension, but no additional dimensions beyond just that. And so what we did was we built a full, I'm gonna air quote, "mechanical API" for the software folks. So we consistently deliver something; you can create variation inside of that API, but the API itself is very consistent on how you mechanically bring hardware into a location, how you power it up, how you cool it. It allows for variations of cooling, but has a consistent API for bringing cooling into that IT asset. What it doesn't do is really dive into the rest of the physical infrastructure delivery. And that was very important in building a hardware spec: that we didn't go over and above what we needed to consistently deliver hardware into a location. And when you do that, what you do is you allow for a tremendous amount of freedom in how you go and bring the rest of the infrastructure to the IT asset.
So, in the same way, when you build a software spec, you don't really concern yourself with what language you put behind it, or how the rest of that infrastructure works, whether you have a communication bus or it's semi API driven with a callback mechanism. You don't really try to think too heavily around that.
You build the API and you expect the API to behave correctly. And so what that gave us the freedom to do is, when we started bringing 48 volt power, we could then start thinking about the rest of the infrastructure a little bit differently, when you bring consistent sets of APIs to cooling and to power. And so when we started thinking about it, we saw this trend line here: we knew that we needed to go think about 400 volt power. We saw the EV industry coming; there was a trend line towards 400 volt power delivery. What we did inside of that hardware spec was we left some optionality inside of the spec to go and change the way that we would go do work, right?
So we gave some optional parameters to the infrastructure teams to go and change up what they needed to do, so that they could deliver that hardware, that infrastructure, a little bit more carefully or correctly for their needs. So we didn't over-specify in particular areas. I'll give you a counter example: in other specifications out there, you'll see like a very consistent busbar in the back of the infrastructure that delivers power. It's great when you're at a
Chris Adams: So if I can just stop you for a second there, My. The busbar, that's the thing you plug a power connection into instead of a socket. Is that what you're referring to there?
My Truong: Oh, so, good question Chris. So in what you see in some of the Hyperscale rack at a time designs, you'll
see two copper bars sitting in the middle of the rack in the back delivering power. And that looks great for an at scale design pattern, but may not fit the needs of smaller or more nuanced design patterns that are out there. Does that make sense?
Chris Adams: Yeah. Yeah. So instead of having a typical, kinda like three-way kind of kettle style plug, the servers just connect directly to this bar to provide the power. That's that's what one of those bars is. Yeah. Gotcha.
My Truong: Yep. And so we went a slightly different way on that, where we had a dedicated power connection per device that went into the Open19 spec. And the spec is up, I think it's up still up on our ssia.org, website. And so anybody can go take a look at it and see the, mechanical spec there.
It's a little bit different.
Chris Adams: Okay. All right. So basically previously there was just a spec that said "computers need to be this shape if they're gonna be server computers in a rack." And then Open19 was a little bit more about saying, "okay, if you're gonna run all these at scale, then you should probably have some standards about how power goes in and how power goes out."
Because if nothing else, that allows them to be maybe somewhat more efficient. And there are various considerations like that that you can take into account. And you spoke about shifting from maybe 48 volts to like 400 volts, and that there is efficiency gained, which we probably don't need to go into too much detail about, when you do things like that, because it allows you to move more power along without so much being wasted, for example.
These are some of the things that the standards are looking into. And well, in the last 10 years, we've seen a shift from data center racks which use quite a lot of power to things which use significantly more. So maybe 10 years ago you might have had a cloud rack that would be between five and 15 kilowatts of power.
That's, like, tens of homes. And now we're looking at racks which might be, say, half a megawatt or a megawatt of power, which is maybe hundreds if not thousands of homes' worth of power. And therefore you need, say, refreshed and updated standards. And that's where the V2 thing is moving towards.
Right.
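To make that efficiency point concrete, here is a rough back-of-the-envelope sketch, not something from the conversation itself: for a fixed power draw over the same conductors, resistive losses fall with the square of the distribution voltage.

```latex
% Conduction loss for a fixed delivered power P over conductors of resistance R:
%   I = P / V,   so   P_loss = I^2 R = P^2 R / V^2
% Stepping from 12 V to 48 V at the same power cuts resistive losses by (48/12)^2 = 16x
% (or lets you carry the same power over thinner conductors).
\[
  P_{\text{loss}} = I^2 R = \frac{P^2 R}{V^2},
  \qquad
  \frac{P_{\text{loss}}(12\,\mathrm{V})}{P_{\text{loss}}(48\,\mathrm{V})}
  = \left(\frac{48}{12}\right)^2 = 16
\]
```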
My Truong: Okay.
Chris Adams: Okay, cool. So yeah.
Zachary Smith: Just, the hard thing about hardware standards is that the manufacturing supply chain moves slow,
unless you are an end-to-end verticalizer, like some of the hyperscale customers can really verticalize. They build the data center, they build the hardware, lots of the same thing.
They can force that. But a broader industry has to rely upon a supply chain. Maybe OEMs, third party data center operators, 'cause they don't build their own data center,
they use somebody else's. And so what we accomplished with V2 was to allow for this kind of innovation within the envelope. One of our guiding principles was: how could we provide the minimal amount of standardization that would allow for more adoption to occur while still gaining the benefits?
Chris Adams: Ah.
Zachary Smith: And so it's a really difficult friction point, because your natural reaction is to, like, solve the problem. Let's solve the problem as best we can.
But that injects so much opinion that it's very hard to get adopted throughout the broader industry. And so even things like cooling,
single phase or two phase, full immersion or not, this kind of liquid or this way, different types of pressure, whatever. There's all kinds of implications, whether those are technology use, those are regulatory situations across different environments, so I think like that's the challenge that we've had with hardware standards, is how to make it meaningful while still allowing it to evolve for specific use cases.
Chris Adams: Alright. Okay. So, I think I'm understanding a bit now. And I'll try and put it in context with some of the other podcast episodes we've done. So we've had people come on this podcast from, like, Google, for example, or Microsoft, and they talk about all these cool things in their entirely vertically designed data centers, where they're in control of the entire supply chain. They do all these cool things with the grid, right? But with all those designs, a lot of the time these might be custom designs, in the case of Google, that no one gets to see. Or in some cases, like say Meta or some other ones, it may be Open Compute, which is a different size to most people's data centers, for example. So you can't just drop that stuff in. Like, there's a few of them around, but it's still 19 inches that's the default standard in lots of places. And if I understand it, one of the goals of Open19 is to essentially bring everyone else along who has already standardized on these kinds of sizes, so they can start doing some of the cool grid aware, carbon aware stuff that you see people talking about, that you probably don't have that much access to if you're not already Meta, Google, or Facebook with literally R&D budgets in the hundreds of millions.
Zachary Smith: Yeah, maybe add some zeros there.
Yeah, I think absolutely, right, which is democratizing access to some of this innovation, right? While still relying upon and helping within the broader supply chain. For example, if EVs are moving into 400 volt, we can slipstream and bring that capability to data center hardware supply chains.
'Cause the people making power supplies or components or cabling are moving in those directions, right? But then it's also just allowing for the innovation, right? Like, I think we've firmly seen this in software. I think this is a great part of Linux Foundation, which is,
no one company owns the, you know, monopoly on innovation. And what we really wanted to see was not, like, can we make a better piece of hardware, but can we provide some more foundational capabilities so that hundreds of startups or different types of organizations that might have different ideas or different needs or different goals could innovate around the sustainability aspect of data center hardware. And I think what we're focused on now within GSF is really taking that to a more foundational level. There's just a huge opportunity right now, with everything happening in the data center construction industry, to really find an even more interesting place where we can take some of those learnings from hardware specifications and apply them to an even broader impact base.
Chris Adams: Ah, okay. Alright. I'll come back to some of this, because I know there's a project called Project Mycelium that Asim Hussain, the Executive Director of the Green Software Foundation, is continually talking about. But you mentioned, if I understand it, that this gives you more freedom to look at, say, instead of having tiny fans which scream at thousands and thousands of RPM, other ways that you could cool down chips, for example. And this is one thing that I know the hardware standards working group is looking at: finding ways to keep the servers cool. Like, as I understand it,
using liquid can be quite a bit more efficient than having tiny fans running at massive RPM to cool things down. But also, I guess there's a whole discussion about the different ways of cooling things which might reduce the local power draw and local water usage in a data center, for example.
And maybe this is one thing we could talk a little bit more about then, 'cause we've had people talk about, say, liquid cooling and things like that before, as some alternative ways to more sustainably cool down data centers, in terms of how much power they need, but also what their local footprint could actually be.
But we've never had people who have that much deep expertise in this. So maybe I could put the question to one of you. Let's say you're gonna switch to liquid cooling, for example, instead of using itty bitty fans, or even just slightly bigger fans running a little bit slower. How does that actually improve things? Maybe I could put this to you, My, 'cause I think this is one thing that you've spent quite a lot of time looking into. Like, where are the benefits? How do the benefits materialize if you switch from, say, air to a liquid cooling approach like this?
My Truong: Yeah, so on the liquid cooling front, there's a number of pieces here. The fans that you were describing earlier, they're moving air, which is effectively a fluid when you're using it in a cooling mode. At 25,000 RPM, you're trying to move more air across the surface, and it doesn't have a great amount of,
Zachary Smith: Heat transfer capability.
My Truong: removal and rejection. Yeah. heat transfer capabilities. Right. So in this world where we're not moving heat with air, we're moving it with some sort of liquid, either a single phase liquid, like water or a two-phase liquid taking advantage of two phase heat transfer properties.
There's a lot of significant gains and those gains really start magnifying here in this AI space that we're in today. And I think this is where Project Mycelium started to come into fruition was to really think about that infrastructure end to end. When you're looking at some of these AI workloads, especially AI training workloads, their ability to go and move hundreds of megawatts of power simultaneously and instantaneously becomes a tricky cooling challenge and infrastructure challenge. And so really what we wanted to be able to think through is how do we go and allow software to signal all the way through into hardware and get hardware to help go and deal with this problem in a way that makes sense.
So I'll give you a concrete example.
If you're in the single phase space and you are in a 100 megawatt or 200 megawatt data center site, which is what xAI built out in Memphis, Tennessee, when you're going and swinging that workload, you are swinging a workload from zero to a hundred percent back to zero quite quickly. In the timescale of around 40 milliseconds or so, you can move a workload from zero to 200 megawatts back down to zero. When you're connected to a grid,
Chris Adams: right.
My Truong: that's called a grid distorting power event, right?
You can go swing an entire grid by 200 megawatts, which is probably, like, maybe a quarter of the LA area; that's the ability to go and distort a grid pretty quickly. When you're on an isolated grid like ERCOT, this becomes a very tricky proposition for the grid to go and manage correctly. On the flip side of that, once you took the power, you created about 200 megawatts of heat as well. And when you start doing that, you have to really think about what you are doing with your cooling infrastructure. If you're a pump based system, like single phase, that means that you're probably having to spool up and spool down your pump system quite rapidly to go respond to that swing in power demand. But how do you know? How do you prep the system? How do you tell that this is going to happen? And this is where we really need to start thinking about these software hardware interfaces. Wouldn't it be great if your software workload could start signaling to your software or your hardware infrastructure, "Hey, I'm about to go and start up this workload, and I'm about to go and swing this workload quite quickly"? You would really want to go signal to your infrastructure and say, "yes, I'm about to go do this to you," and maybe you want to even signal to your grid, "I'm about to go do this for you" as well. Then you can start thinking about other options for managing your power systems correctly, maybe using something like a battery system to go and shave off that peak inside of the grid and manage that appropriately. So we can start thinking about this. Once we have this ability to go signal from software to hardware to infrastructure, and build that communication path, it becomes an interesting thought exercise where we can realize that this is just a software problem.
Having been in this hardware, software space, we've seen this before. And is it worth synchronizing this data up? Is it worth signaling this correctly through the infrastructure? This is the big question that we have with Project Mycelium. Like, it would be amazing for us to be able to do this.
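As a thought experiment, here is a minimal sketch of what that software-to-infrastructure signal could look like. To be clear, this is not Project Mycelium's actual interface; the endpoint, field names, and numbers below are hypothetical stand-ins.

```python
# Hypothetical sketch: a training scheduler warns the facility before a large power swing,
# so pumps can pre-spool and batteries can absorb the step instead of the grid.
# None of these endpoints or field names come from Project Mycelium or any real spec.
import json
import time
import urllib.request

FACILITY_ENDPOINT = "http://facility-controller.local/v1/power-events"  # hypothetical

def announce_ramp(delta_megawatts: float, ramp_seconds: float, lead_seconds: float) -> None:
    """Tell the cooling plant / battery controller that a load swing is coming."""
    event = {
        "expected_delta_mw": delta_megawatts,  # e.g. +200 MW when a training step kicks off
        "ramp_duration_s": ramp_seconds,       # e.g. 0.04 s, far faster than a grid likes
        "starts_in_s": lead_seconds,           # advance notice for pumps and batteries
        "timestamp": time.time(),
    }
    req = urllib.request.Request(
        FACILITY_ENDPOINT,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# A scheduler might call this just before launching a large training step:
# announce_ramp(delta_megawatts=200, ramp_seconds=0.04, lead_seconds=5)
```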
Chris Adams: Ah, I see.
My Truong: The secondary effects of this is to really go think through, now, if you're in Dublin where you have offshore power and you now have one hour resolution on data that's coming through about the amount of green power that's about to come through, it would be amazing for you to signal up and signal down your infrastructure to say, you should really spool up your workload and maybe run it at 150% for a while, right?
This would be a great time to go really take green power off grid and drive your workload on green power for this duration. And then as that power spools off, you can go roll that power need off for a time window. So being able to think about these things that we can create between the software hardware interface is really where I think that we have this opportunity to really make game changing and really economy changing outcomes.
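And on the green power side, a similarly hypothetical sketch of the decision logic. The forecast values, thresholds, and utilisation targets here are illustrative assumptions, not a real data feed or API.

```python
# Hypothetical sketch: scale a flexible workload against an hourly green-power forecast.
# Forecast values and thresholds are illustrative stand-ins, not a real feed.

def target_utilisation(green_power_share: float) -> float:
    """Map the share of low-carbon power on the grid (0.0-1.0) to a workload target.

    Run hot when the grid is green, throttle back when it is not."""
    if green_power_share >= 0.8:
        return 1.5   # "run it at 150% for a while", e.g. pull forward deferred jobs
    if green_power_share >= 0.5:
        return 1.0   # normal operation
    return 0.5       # defer flexible work until cleaner power arrives

# Example hourly forecast for the next four hours (fraction of low-carbon power):
forecast = [0.35, 0.55, 0.85, 0.9]
plan = [target_utilisation(share) for share in forecast]
print(plan)  # [0.5, 1.0, 1.5, 1.5]
```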
Chris Adams: Okay.
Zachary Smith: I have a viewpoint on that, Chris,
Chris Adams: too.
Yeah, please do.
Zachary Smith: My TLDR summary is like, infrastructure has gotten much more complicated and the interplay between workload and that physical infrastructure is no longer, "set it in there and just forget it and the fans will blow and the servers will work and nobody will notice the difference in the IT room."
These are incredibly complex workloads. Significant amount of our world is interacting with this type of infrastructure through software services. It's just got more complicated, right? And what we haven't really done is provide more efficient and advanced ways to collaborate between that infrastructure and the kind of workload. It's, still working under some paradigms that like, data centers, you put everything in there and the computers just run. And that's just not the case anymore. Right. I think that's what My was illustrating so nicely, is that workload is so different and so dynamic and so complex that we need to step up with some ways to, for the infrastructure and that software workload to communicate.
Chris Adams: Ah, I see. Okay. So I'll try and translate some of that for some of the listeners that we've had here. So you said something about, okay, a 200 megawatt power swing; that's not that far away from half a million people appearing on the grid, then disappearing from the grid, every 40 milliseconds.
And obviously that's gonna piss off the people who have to operate the grid. But that by itself is one thing, and that's also a change from what we had before, because typically cloud data centers were known for being good customers because they have a really flat, predictable power draw.
And now rather than having like a flat kind of line, you have something more like a kind of seesaw, a saw tooth, like up, down, up, down, up, down, up, down. And like if you just pass that straight through to the grid, that's a really good way to just like totally mess with the grid and do all kinds of damage to the rest of the grid.
But what it sounds like you're saying is actually, if you have some degree of control within the data center, you might say, "well, all this crazy spikiness, rather than pulling it from the grid, can I pull it from batteries, for example?" And then I might expose that familiar flat pattern to the rest of the grid, for example.
And that might be a way to make you more popular with grid operators, but also that might be a way to actually make the system more efficient. So that's one of the things you said there. So that's one kind of helpful thing there. But also you said that there is a chance to dynamically scale up when there are loads and loads of green energy, so you end up turning into a bit more of a kind of better neighbor on the grid, essentially.
And that can have implications because, like you said before, there's complexity at the power level, and it allows the data centers, rather than making that worse, to address some of those things. So it's complementary to the grid, is that what you're saying?
My Truong: Yeah. I think you got it, Chris. Yeah.
Exactly. So that's on the power side. I think that we have this other opportunity now that, as we're starting to introduce liquid cooling to the space as well, we're effectively, efficiently removing heat from the silicon. Especially in Europe,
this is becoming a very front and center conversation for data centers operating in Europe: this energy doesn't need to go to waste and be evacuated into the atmosphere. We have this tremendous opportunity to go and bring that heat into local municipal heat loops
and really think about that in a much more cohesive way. And so this is, again, where, as Zach was saying, we really need to think about this a bit comprehensively and really rethink our architectures to some degree with these types of workloads coming through. And so bringing standards around the hardware, the software interface, and then, as we start thinking through the rest of the ecosystem, how do we think through bringing consistency to some of this interface so that we can communicate "workload is going up, workload is going down. The city needs x amount of gigawatts of power into a municipal heat loop," to help the entire ecosystem out a little bit better. In the winter, probably Berlin or Frankfurt would be excited to have gigawatts of power in a heat loop to go and drive a carbon free heating footprint inside of the area. But then on the flip side of that, if you build a site that expects that in the winter, but in the summer you're not able to take that heat off, how do we think about more innovative ways of driving cooling then as well? How do we go and use that heat in a more effective way to drive a cooling infrastructure?
Chris Adams: So, okay. I'm glad you mentioned that example, 'cause I live in Germany and our biggest driver of fossil fuel use is heating things up when it gets cold. So if there's a way to actually use heat which doesn't involve burning more fossil fuels, I'm all for that. There is actually one question I might ask, which is: what are the coolants that people use for this kind of stuff? Because when we move away from air, you're not typically just using water in all of these cases; there may be different kinds of chemicals or different kinds of coolants in use, right?
I mean, maybe you could talk a little bit about that, because when we've looked at how we use coolant elsewhere, there's been different generations of coolants. And in Europe, I know there's a whole ongoing discussion about saying, "okay, if we're gonna have liquid cooling, can we at least make sure that the coolants we're using are actually not the things which end up being massively emitting in their own right," because one of the big drivers of emissions is end of life refrigerants and things like that. Maybe you could talk a little bit about what your options are if you're gonna do liquid cooling and what's on the table right now?
To actually do something which is more efficient, but is also a bit more non-toxic and safe if you're gonna have this inside a given space.
My Truong: Yeah. So in liquid cooling there's a number of fluids that we can use. The most well understood of the fluids, used on both the facility and the technical loop side, is standard de-ionized water. Just water across the cold plate. There's variations that are used out there with a propylene glycol mix to manage microbial growth. The organization that I'm part of, we use a two-phase approach, where we're taking a two-phase fluid and taking advantage of phase change to remove that heat from the silicon. And in this space, this is where we have a lot of conversations around fluids and fluid safety, and how we're thinking about that fluid and end of life usage of that fluid. Once you're removing heat with that fluid and putting it into a network, most of these heat networks are water-based heat networks, where you're using some sort of water with a microbial treatment and going through treatment regimes to manage that water quality through the system.
So this is a very conventional approach. Overall, there's good and bads to every system. Water is very good at removing heat from systems. But as you start getting towards megawatt scale, the size of plumbing that you're requiring to go remove that heat and bring that fluid through, becomes a real technical challenge.
And also
at megawatts. Yeah. Yeah.
Zachary Smith: If I'm not mistaken.
Also, there's challenges, if you're not doing a two-phase approach, to actually removing heat at a hot enough temperature that you can use it for something else, right?
My Truong: Correct. Correct, Zach. So there's a number of very technical angles to this. As you're going down that path, Zach, in single phase what we do is we have to move fluid across that surface at a good enough clip to make sure that we're removing heat and keeping that silicon from overheating. The downside of this is that as silicon requires colder and colder temperatures to keep it operating well, the opportunity to drive that heat source up high enough to be able to use in a municipal heat loop becomes lower and lower. So let's say, for example, your best in class silicon today is asking for what's known as a 65 degree TJ. That's a number that we see on the silicon side. So you're basically saying, "I need my silicon to be 65 degrees Celsius or lower to be able to operate properly." The flip side of that is you're gonna ask your infrastructure to go deliver water between 12 and 17 degrees Celsius to make sure that cooling is supplied. But then if you allow for, let's say, a 20 degree Celsius rise, your exit temperature on that water is only gonna be 20 degrees higher than the 17 degree inlet, so that water temperature is so low
And that's not a very nice shower, basically. Yeah.
You're in a lukewarm shower at best.
So then we have to spend a tremendous amount of energy to bring that heat quality up so that we can use it in a heat network. In two phase approaches, what we're taking advantage of is the physics of two-phase heat transfer, where, during phase change, you have exactly one temperature at which that fluid will phase change.
To a gas. Yeah. Yeah.
To a gas. Exactly.
Yeah.
And so the easiest way, like, we'll use the water example, though this is not typically what's used in two phase technologies, is that water at atmospheric pressure will always phase change at about a hundred degrees Celsius. It's not 101, it's not 99. It's always a hundred degrees Celsius at atmospheric pressure. So your silicon underneath that will always be at around a hundred degrees Celsius, or maybe a little bit higher depending on what your heat transfer characteristics look like. And this is the physics that we take advantage of. So when you're doing that, the vapor side becomes a very valuable energy source, and you can actually do some very creative things with it on two phase.
So every technology is a double-edged sword, and we're taking advantage of the physics of heat transfer to effectively and efficiently remove heat in two-phase solutions.
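To put rough numbers on the single-phase versus two-phase comparison, here is a back-of-the-envelope sketch using textbook water properties. Real two-phase systems use dielectric fluids with different latent heats and operating pressures, so treat these figures as illustrative only.

```python
# Back-of-the-envelope comparison for rejecting 1 MW of IT heat.
# Water properties are textbook values; real two-phase coolants are dielectric
# fluids with lower latent heat, so these numbers are illustrative only.

HEAT_LOAD_W = 1_000_000          # 1 MW of silicon heat to remove
CP_WATER = 4186                  # J/(kg*K), specific heat of liquid water
LATENT_HEAT_WATER = 2_256_000    # J/kg, heat of vaporisation at ~100 C

# Single-phase: sensible heat only. With a 20 C rise (e.g. 17 C in, 37 C out):
delta_t = 20
single_phase_flow = HEAT_LOAD_W / (CP_WATER * delta_t)
print(f"single-phase: {single_phase_flow:.1f} kg/s of water, outlet ~37 C (lukewarm)")

# Two-phase: the latent heat of boiling carries the energy at a fixed temperature,
# so far less fluid has to move and the heat arrives at the phase-change temperature.
two_phase_flow = HEAT_LOAD_W / LATENT_HEAT_WATER
print(f"two-phase:    {two_phase_flow:.2f} kg/s at the boiling point")
```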
Chris Adams: Ah, so I have one question about how that changes what a data center feels like to be inside, because I've been inside data centers and they are not quiet places to be. Like, I couldn't believe just how uncomfortably loud they are. And if you're moving away from fans, does that change how they sound, for example?
Because if, even if you're outside some buildings, people talk about some of the noise pollution aspects. Does a move to something like this mean that it changes some of it at all?
Zachary Smith: Oh yeah.
My Truong: Inside of the white space? Absolutely. Like, one of the things that we fear the most inside of a data center is dead silence.
You might actually be able to end up in a data center where there's dead silence, soon.
And that being a good thing. Yeah.
With no fans. Yeah. We'd love to remove the parasitic draw of power from fans moving air across data centers, just to allow that power to go back into the workload itself.
Chris Adams: So for context, if you haven't been in a data center... I mean, it was around, I think it felt like 80 to 90 decibels for me.
Yeah, it could have been more, actually. I mean, if you have something on a wearable or on a phone, as soon as it's above 90 decibels, that's
louder than lots of nightclubs, basically. So this is one thing that I felt, and it sounds like this can introduce some changes there as well, rather than just talking about energy and water usage. Right.
Zachary Smith: Yeah, most data center technicians wear ear protectors all the time, can't talk on the phone, have to scream at each other, because it's so loud. Certainly there's some really nice quality of life improvements that can happen when you're not blowing that much air around and spinning up multiple thousand
My Truong: 25,000 to 30,000 RPM fans will require you to wear double hearing protection to be able to even function in the space.
Yeah, that's the thing.
A lot of energy there.
Chris Adams: Oh, okay. Cool. So these are some of the shifts this makes possible. So the idea is you might have data centers where you're able to be more active in terms of actually working with the grid, because for all the kinds of things we might do as software engineers, there's actually a standard which makes sure that the things that we see Google doing, or Meta talking about in academic papers, could be more accessible to more people.
That's one of the things that having standards, and Open19 in particular, might do, because there's just so many more people using 19 inch racks and things like that. That seems to be one thing. So maybe I could actually ask you folks: this is one thing that you've been working on, and My, you're obviously running an organization, ZutaCore, here, and Zach, it sounds like you're working on a number of these projects.
Are there any particular open source projects or papers or things with some of the more wacky ideas or more interesting projects that you would point people to? Because when I talk about data centers and things like this, there's a paper called the Ecovisor paper, which is all about virtualizing power, so that you could have power from batteries going to certain workloads and power from the grid going to other workloads.
And we've always thought about it as going one way, but it sounds like with things like Project Mycelium, you can have things going the other way. Like, for people who are really into this stuff, are there any good repos that you would point people to? Or is there a particular paper that you found exciting that you would direct people to, who are still with us and still able to keep up with the, honestly, quite technical discussion we've had here?
Zachary Smith: Well, not to toot My's horn, but reading the Open19 V2 specification, I think, is worthwhile. Some of the challenges we dealt with at a kind of server and rack level, I think, are indicative of where the market is and where it's going. There's also great stuff within the OCP Advanced Cooling working group. And I found it very interesting, especially, to see some of what's coming from hyperscale, where they are able to move faster through a verticalized integration approach. And then I've just been really interested in following along with the power systems and related work from the EV industry. I think that's an exciting area where we can start to see data centers not as buildings for IT, but data centers as energy components.
So when you're looking at, whether it's EV or grid scale kind of renewable management, I think there's some really interesting tie-ins that our industry, frankly is not very good at yet.
Ah.
Most people who are working in data centers are not actually power experts from a generation or storage perspective.
And so there's some just educational opportunities there. I've found, just as one resource, My, I don't know if they have it, the seven by 24 conference group, which is the critical infrastructure conference, covering everything from water systems and power systems to data centers, has been really a great learning place for me.
But I'm not sure if they have a publication that is useful. We have some work to do in moving our industry into transparent Git repos.
My Truong: Chris, my favorite is actually the OpenBMC codebase. It provides a tremendous gateway where this used to be a very closed ecosystem, and it was very hard for us to think about being able to look through a code repo of a Redfish API. Being able to rev that spec in a way that could be useful and implementable into an ecosystem has been, like, my favorite place outside of hardware specifications.
Chris Adams: Ah, okay. So I might try and translate that. The BMC thing, this is basically the bit of computing which essentially tells software what's going on inside the hardware, how much power it's using and stuff like that. Is that what you're referring to? And is OpenBMC, like, something that used to be proprietary and there is now a more open standard, so that there's a visibility that wasn't there before?
Is that what it is?
My Truong: Right. That's exactly right. So in years past, you had a closed ecosystem on the service controller, or the BMC, the baseboard management controller module inside of a server, and being able to look into that code base was always very difficult at best and traumatic at worst. But having OpenBMC reference code out there,
being able to look and see an implementation and port that code base into running systems, has been very useful, I think, for the ecosystem, to go and get more transparency, as Zach was saying, into API driven interfaces.
oh.
What I'm seeing is that prevalence of that code base now showing up in a number of different places, and the patterns are being designed into, as Zach was saying, power systems. We're seeing this become more and more prevalent in power shelves, power control,
places where we used to not have access, or where we used to use programmable logic controllers to drive this. They're now becoming much more software ecosystem driven and opening up a lot more possibilities for us.
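For anyone who wants to poke at this themselves, here is a hedged example of reading power telemetry over a Redfish API, the interface OpenBMC exposes. The BMC address, credentials, and chassis ID are assumptions, and the legacy Power resource path varies between BMC implementations (newer firmware exposes a PowerSubsystem resource instead), so browse /redfish/v1 on your own BMC rather than assuming these exact paths.

```python
# Sketch: read power draw from a BMC via the DMTF Redfish API.
# The BMC address, credentials, chassis ID ("1") and the legacy /Power resource
# are assumptions; check your own BMC's schema before relying on these paths.
import requests

BMC = "https://bmc.example.internal"   # hypothetical BMC address
AUTH = ("metrics-user", "password")    # use a read-only account in practice

resp = requests.get(f"{BMC}/redfish/v1/Chassis/1/Power", auth=AUTH, verify=False)
resp.raise_for_status()
power = resp.json()

# The Power resource carries an array of PowerControl entries with consumed watts.
for control in power.get("PowerControl", []):
    print(control.get("Name"), control.get("PowerConsumedWatts"), "W")
```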
Chris Adams: Okay. I'm now understanding the whole idea behind Mycelium, like roots reaching down further down into the actual hardware to do things that couldn't be done before that. Okay. This now makes a lot more sense. Yeah.
Peel it back. One more layer.
Okay. Stacks within Stacks. Brilliant. Okay. This makes sense. Okay folks, well, thank you very much for actually sharing that and diving into those other projects.
We'll add some links to some of those things if we can. 'Cause I think with OpenBMC, that's one thing that is actually in production in a few places. I know that Oxide Computer uses some of this, but there's other providers who also have that as part of their stack now that you can see.
Right.
My Truong: We also put it into production when we were part of the Packet and Equinix team. So we have a little bit of experience in running this tech base in real production workloads.
Chris Adams: Oh wow. I might ask you some questions outside this podcast 'cause this is one thing that we always struggle with is finding who's actually exposing any of these numbers for people who are actually further up the stack because it's a real challenge. Alright. Okay, we're coming up to time, so, I just wanna leave one question with you folks, if I may.
If people have found this interesting and they want to like, follow what's going on with Zach Smith and My Truong, where do they look? Where do they go? Like, can you just give us some pointers about where we should be following and what we should be linking to in the show notes? 'Cause I think there's quite a lot of stuff we've covered here and I think there's space for a lot more learning actually.
Zachary Smith: Well, I can't say I'm using X or related on a constant basis, but I'm on LinkedIn @zsmith, connect with me there. Follow. I post occasionally on working groups and other parts that I'm part of. And I'd encourage, if folks are interested, like we're very early in this hardware working group within the GSF.
There's so much opportunity. We need more help. We need more ideas. We need more places to try. And so if you're interested, I'd suggest joining or coming to some of our working group sessions. It's very early and we're open to all kinds of ideas. As long as you're willing to, to copy a core value from Equinix,
speak up and then step up, we'd love the help. There's a lot to do.
Brilliant, Zach. And My, over to you.
My Truong: LinkedIn as well. Love to see people here as part of our working groups, and see what we can move forward here in the industry.
Chris Adams: Brilliant. Okay. Well, gentlemen, thank you so much for taking me through this tour all the way down the stack into the depths that we as software developers don't really have that much visibility into. And I hope you have a lovely morning slash day slash afternoon depending on where you are in the world.
Alright, cheers fellas.
Thanks Chris.
Thanks so much.
Hey everyone, thanks for listening. Just a reminder to follow Environment Variables on Apple Podcasts, Spotify, or wherever you get your podcasts. And please do leave a rating and review if you like what we're doing. It helps other people discover the show, and of course, we'd love to have more listeners.
To find out more about the Green Software Foundation, please visit greensoftware.foundation. That's greensoftware.foundation in any browser. Thanks again, and see you in the next episode.