Environment Variables
LLM Energy Transparency with Scott Chamberlin
August 7, 2025
In this episode of Environment Variables, host Chris Adams welcomes Scott Chamberlin, co-founder of Neuralwatt and ex-Microsoft Software Engineer, to discuss energy transparency in large language models (LLMs). They explore the challenges of measuring AI emissions, the importance of data center transparency, and projects that work to enable flexible, carbon-aware use of AI. Scott shares insights into the current state of LLM energy reporting, the complexities of benchmarking across vendors, and how collaborative efforts can help create shared metrics to guide responsible AI development.

TRANSCRIPT BELOW:

Scott Chamberlin: Every AI factory is going to be power constrained in the future. And so what does compute look like if power is the number one limiting factor that you have to deal with? 

Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software.

I'm your host, Chris Adams. 

Hello and welcome to Environment Variables, where we bring you the latest news and updates from the world of sustainable software development. I'm your host, Chris Adams. We talk a lot about transparency on this podcast when talking about green software, because if you want to manage the environmental impact of software, it really helps if you can actually measure it.

And as we've covered on this podcast before, measurement can very quickly become quite the rabbit hole to go down, particularly in new domains such as generative AI. So I'm glad to have our guest, Scott Chamberlin, here today to help us navigate as we plumb these depths. Why am I glad in particular?

Well, in previous lives, Scott not only built the Microsoft Windows operating system's power and carbon tracking tooling, getting deep into the weeds of measuring how devices consume electricity, but he was also key in helping Microsoft Azure work out their own internal carbon accounting standards. He then moved on to Intel to work on a few related projects, including work to expose these kinds of numbers in usable form to developers and to the people making the chips that go in these servers. His new project, Neuralwatt, is bringing more transparency and control to AI language models.

And a few weeks back when I was asking on LinkedIn for pointers on how to understand the energy usage from LLMs I use, he shared a link to a very cool demo showing basically the thing I was asking for: real-time energy usage figures from Nvidia cards directly in the interface of a chat tool. The video's in the show notes if you're curious.

And it is really cool. So Scott, thank you so much for joining us. Is there anything else that I missed that you'd like to add to the intro before we dive into any of this stuff?

Scott Chamberlin: No, that sounds good.

Chris Adams: Cool. Well, Scott, thank you very much once again for joining us. If you are new to this podcast, just a reminder, we'll try and share a link to every single project in the show notes.

So if there are things that are of particular interest, go to podcast.greensoftware.foundation and we'll do our best to make sure that we have links to any papers, projects, or demos, like we said. Alright, Scott, I've done a bit of an intro about your background and everything like that, and you're calling me from a kind of pleasingly green room today.

So maybe I should ask you, can I ask where you're calling from today and a little bit about like the place?

Scott Chamberlin: So I live in the mountains just west of Denver, Colorado, in a small town called Evergreen. I moved here in the big reshuffles just after the pandemic; like a lot of people, I wanted to shift to a slightly different lifestyle. And so yeah, my kids are growing up here, going to high school here, and yeah, I super enjoy it.

It gives me the ability to get outside right outside my door.

Chris Adams: Cool. All right. Thank you very much for that. So it's a green software podcast and you're calling from Evergreen as well, in a green room, right? Wow.

Scott Chamberlin: That's right. I actually have a funny story I want to share from the first time I was on this podcast. It was me and Henry Richardson from WattTime talking about carbon awareness. And I made some point about how, in the future, I believe everything's going to be carbon aware, and I used the specific example of my robot vacuum: it's certainly gonna be charging in a carbon-aware way at some point in the future.

I shared the podcast with my dad and he listened to it and he comes back to me and says, "Scott, the most carbon reduced vacuum is a broom."

Chris Adams: Well, he's not wrong. I mean, it's manual, but it definitely solves the problem, and it's definitely got lower embodied carbon, that's for sure, actually.

Scott Chamberlin: Yeah.

Chris Adams: Cool. So Scott, thank you very much for that. Now, I spoke a little bit about your career working in ginormous trillion-dollar or multi-billion-dollar tech companies, but you're now working at a startup, Neuralwatt. You mentioned in our prep call that, after leaving a couple of the big corporate jobs, you spent a bit of time building your own version of what a cloud might be.

And we kind of ended up calling it Scott Cloud: the most carbon-aware, battery-backed-up, green-software cloud possible, pretty much applying everything you learned in your various roles when you were basically paid to become an expert in this.

Can you talk a little bit about it? First of all, should I be calling it something other than Scott Cloud, and were there any particular takeaways from it? Because it sounded like quite an interesting project, and I think half of the people who listen to this podcast, if they had a bunch of time to build something like this, they'd probably build something similar.

So yeah, why did you build it, and were there any things you learned that you'd like to share?

Scott Chamberlin: Sure. So, I think it's important to know that I had spent basically every year from about 2019 through about 2022 trying to add features to existing systems to give them less environmental impact, lower CO2, both embodied as well as runtime carbon.

And I came to realize that adding these capabilities onto existing systems is always going to come with a significant amount of compromises and challenges, because, I mean, a core principle of carbon awareness is that there is going to be some trade-off with how the system was already designed.

And a lot of times it's fairly challenging to navigate those trade-offs. I tend to approach them fairly algorithmically, doing optimization on them, but I had always, in the back of my mind, thought about what a system would look like if the most important principle we were designing the system around was to minimize emissions. Like, if that was the number one thing, and then, say, performance came second, reliability came second; security has to come first before everything, but there's not a lot of trade-offs you have to make between carbon awareness and security. So I started thinking, "what does a data center architecture look like if this is the most important thing?"

So of course, it starts with the highest performance-per-watt hardware you can get your hands on, so really surveying the landscape of what that looked like. Then architecting everything we know about carbon awareness into the platform, so that developers don't necessarily have to put it into their code, but get to take advantage of it in a fairly transparent and automatic way. And so you end up having things like location shifting as a fundamental principle of how your platform looks to a developer. So the idea was, we'd have a data center in France and a data center in the Pacific Northwest of the United States, where you have fairly non-correlated solar and wind values, but you also have very green base loads, so you're not trying to overcome your base load from the beginning.

And that location shifting, I mean, not time shifting, sorry, location shifting, was basically transparent to the platform, and then time shifting was implemented for the appropriate parts. But it was all done with just standard open source software, in a way that minimized carbon while taking a little bit of a hit on performance and a little bit of a hit on latency, but in a way where the developer could continue to focus on performance and latency and get all the benefits of carbon reduction at the same time.
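For readers who want a concrete picture of the location shifting Scott describes, here is a minimal sketch of the routing decision, assuming a hypothetical fetch_carbon_intensity helper in place of a real feed such as Electricity Maps or WattTime; the region and zone names are illustrative too:

```python
# Sketch of carbon-aware location shifting between two regions.
# fetch_carbon_intensity is a hypothetical stand-in for a real
# carbon-intensity feed; the zone names are illustrative.
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    grid_zone: str


REGIONS = [Region("france", "FR"), Region("pacific-northwest", "US-NW")]


def fetch_carbon_intensity(grid_zone: str) -> float:
    """Return the zone's current carbon intensity in gCO2/kWh."""
    raise NotImplementedError("wire up a real carbon-intensity feed here")


def pick_region(regions: list[Region]) -> Region:
    # Route new work to whichever grid is cleanest right now, so the
    # developer never has to think about where their code runs.
    return min(regions, key=lambda r: fetch_carbon_intensity(r.grid_zone))
```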

Chris Adams: Ah, okay. So when you said system, you weren't talking about like just maybe like an orchestrator, like Kubernetes that just spins up virtual machines. You're talking about going quite a bit deeper down into that then, like looking at hardware itself?

Scott Chamberlin: It started with the hardware itself. 'Cause you have to have batteries, you have to have the ability to store renewable energy when it's available. You have to have low power chips, you have to have low powered networking, you have to have redundancy.

And there's always these challenges when you talk about shifting in carbon awareness around, I guess the phrase is, leaving your capital resources idle.

So you have to take costs into account with that. But the other goal, the other challenge I wanted to take on, was to have this all be location-based, very basic carbon accounting, and to get as close as theoretically possible to minimizing the carbon. Because it's not possible to get to zero without market-based mechanics when you're dealing with actual hardware.

So: get as close to net zero as possible on a location-based, very basic emissions accounting. That was the principle. And on that journey we got pretty far, to the point of being ready to productize it, but then we decided to pivot around energy and AI, which is where I'm at now.

So I don't have a lot of hard numbers for what that close-to-zero theoretical baseline is, but it's pretty close, and it's drastically smaller than what we are using in, say, hyperscale or public cloud today.

Chris Adams: Oh, I see. Okay. So rather than retrofitting a bunch of green ideas onto, I guess, hyperscale, big-box, out-of-town style data centers, which already have a bunch of assumptions baked into them, it was almost like a clean sheet of paper, basically. That's the thing you spent a bunch of time on. And it sounds like, if you were making some of this stuff transparent, it wasn't really a developer's job to figure out what it was like shifting a piece of code to run in, say, Oregon versus France, for example; the system would take care of that stuff.

You would just say, I want you to run this in the cleanest possible fashion, and as long as you respect my requirements about security or where the data's allowed to go, it would take care of the rest. Basically, that was the idea behind some of that, right?

Scott Chamberlin: That's the goal, because in the many years I've been spending on this, there's a great set of passionate developers that want to minimize the emissions of their code, but it's a small percentage. And I think the real change happens if you make it part of the platform, so that you get the majority of the benefit, maybe the 80th percentile of the benefit, by making it automatic in a way.

Chris Adams: The default?

Yeah. 

Scott Chamberlin: My software behaves as expected, but I get all the benefits of carbon reduction automatically. 'Cause developers already have so much to care about. And again, not every developer is actually able to make the trade-offs between performance and CO2 awareness appropriately.

Right? It's really hard, and we haven't made it easy for people. So that was the goal: how do you enable the system to do that for you, while the developer focuses on the principles they're used to focusing on, making their software fast, making it secure, making it reliable, making it have a good user experience, that kind of stuff.

Chris Adams: Ah, that's interesting though. So the green aspect is almost an implementation detail that doesn't necessarily need to be exposed to the developers. A bit like when people talk about designing systems for end users, there's a whole discussion about whether it's fair to expect someone to feel terrible for using Zoom or Netflix, when really it makes more sense to do the work yourself, as a designer or a developer, to design the system so it's green by default. So rather than trying to get people to change their behavior massively, you're essentially accepting that people are busy, distracted, and you're working at that level almost.

Scott Chamberlin: Yeah, I think that's exactly the right term: it is green by default. When I started working on this in Windows, so, like you referred to earlier, I created all the carbon-aware features in Windows, there was a debate early on about how we enable these. Should the carbon awareness feature be a user experience? Should the user be able to opt in, opt out, that kind of stuff? And it was actually my boss, when I was talking to him about this, who said, "if you're doing this, it has to be the default," right?

And so you're never going to make the impact on any system, at the scale we really need to make this impact, if people have to opt in. It has to be the default. And then sure, they can opt out if there are certain reasons they want a different behavior. But green by default has to be the main way we make impact.

Chris Adams: That's actually quite an interesting framing, because particularly when you talk about carbon awareness in devices themselves, there's a default, and then there's the thing you said before about how it's really important to leave people in control so they can override it. That feels like quite an important thing.

'Cause I remember when Apple rolled out the whole carbon-aware charging for their phones, for example. Some people were like, "oh, this is really cool, things are slightly greener by default based on what Apple have shown me." But there were some other people who absolutely hated it, because the user experience from their point of view was basically: I've got a phone, I need to charge it up, and I plugged it into my wall.

And then overnight there's been a really high-carbon grid period, so my phone hasn't been charged up, and I wake up, and now I've got to go to work and I've got no phone charge. And it just feels like this is exactly the thing: if you don't provide a sensible kind of get-out clause, then that can lead to a really awful experience as well.

So there is quite a lot of thought that needs to go into that kind of default, I suppose.

Scott Chamberlin: Definitely. The user experience of all of these things has to ultimately satisfy the expectations and needs of the users, right? Another learning experience we had, it was really a thought experiment, was when we were working in Windows on the ability to change the timer for how fast the device goes to sleep.

Because there's a drastic difference between an active mode and the sleep state, which is basically when the device will turn back on if you touch the mouse: the screen's off, and it goes into a low power state. And so one of the changes we made in Windows was to lower that value from the defaults.

And it's fairly complex how these defaults get set; basically, they're set by the OEMs in different power profiles. But we wanted to lower the default that all software was provided, and we did some analysis of what the ideal default would be. The question from the user experience point of view was: if we set this too low, will there be too many people turning it entirely off, rather than keeping what the old default was, which was like 10 minutes? So let's use these values, theoretically, I can't remember what the exact values were: old default, 10 minutes; new default, three minutes for going from active to sleep.

If three minutes was not the right value and we got maybe 20% of people turning it entirely off, is the carbon impact worse for the overall fleet of Windows devices because of those 20% of people turning it off, 'cause we created a bad user experience by changing the default? So we had to do all these analyses, and have this ability to really look for unintended consequences of changing these.

And that's why the user experience is really critical when you're dealing with some of these things.
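The unintended-consequences analysis Scott describes can be sketched as a simple expected-value model. All the wattages and hours below are made-up placeholders, not Microsoft's numbers, but they show how a 20% opt-out rate can swamp the savings from a shorter timer:

```python
# Toy model of the sleep-timer trade-off; every number is an assumption.
IDLE_AWAKE_W = 30.0  # assumed draw while idle but awake
SLEEP_W = 1.0        # assumed draw while asleep
IDLE_HOURS = 4.0     # assumed idle stretch per device per day


def per_device_wh(timer_min: float) -> float:
    """Daily idle energy for one device that keeps the default."""
    awake_h = min(timer_min / 60.0, IDLE_HOURS)
    return IDLE_AWAKE_W * awake_h + SLEEP_W * (IDLE_HOURS - awake_h)


def fleet_avg_wh(timer_min: float, opt_out_rate: float) -> float:
    """Fleet average, where opted-out devices never sleep while idle."""
    always_on = IDLE_AWAKE_W * IDLE_HOURS
    kept = (1 - opt_out_rate) * per_device_wh(timer_min)
    return kept + opt_out_rate * always_on


# Old default: 10 minutes, nobody opts out. New: 3 minutes, 20% opt out.
print(fleet_avg_wh(10, 0.0))  # ~8.8 Wh/day
print(fleet_avg_wh(3, 0.2))   # ~28.4 Wh/day: the opt-outs dominate
```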

Chris Adams: Ah, okay, that's quite a useful nuance to take into account, 'cause there's a whole discussion about setting defaults for green, but then there are also some of these other things. And now that you've said that, I realize I'm actually one of these terrible people who does that.

I mean, I'm using a Mac, right? And you see, when people are using a laptop and it starts to dim, they start touching the touchpad to make it brighten again, and you see people do that a few times. There's an application called Caffeine on a Mac, and that basically stops it going to sleep. And that's great. But it also introduces the question: is my ADD adult brain gonna remember to switch that back off again? These are the things that come up. So this is something I have direct experience of, so that is very much ringing true with me, actually.

Okay. So that was the thing you did with, I'm calling it Scott Cloud, but I assume there was another name for it. And that work eventually became Neuralwatt? Like, you went from there and moved into this stuff, right?

Scott Chamberlin: Right. So, Scott Cloud, or Carbon Net Zero Cloud, was basically a science experiment. And I wanted to deploy it for the purposes of really seeing, well, you learn so much when things are in production and you have real users. But before I did, I started talking to a lot of people I trusted in my network.

And one of my old colleagues from Microsoft and a good friend of mine really dug into it and started pushing me on some serious questions, like, "well, what does this really impact in terms of energy?" That project was a CO2 optimization exercise, and he's like, "well, what's the impact on energy? What's the impact on AI?" And actually, Asim Hussain asked the same question. He said, and this is, let's rewind, like a year ago, "you can't release anything today that doesn't have some story about AI," right? And this was just a basic compute platform with nothing specific about AI.

So both of those comments really struck home. I was like, okay, I gotta figure out this AI stuff, and I've gotta answer the energy question. That wasn't hard, 'cause energy was already being measured as part of the platform; I had just been focused on CO2. And it turned out there were some really interesting implications once we started to apply some of the optimization techniques to the GPU and how the GPU was being run, from an energy point of view. When we looked into it, that ended up being potentially more impactful in the short term than the overall platform. And so that colleague, Chad Gibson, really convinced me in our discussions to spin that piece out of the platform as the basis of the startup we decided to build, which we call Neuralwatt now.

So yeah, Neuralwatt is really the legacy of all that work, but the pieces we could take out of it that were focused on GPU energy optimization, within the context of AI growth and energy demands. Because those are becoming really critical challenges, not just for businesses, but critical challenges underlying all of our work on green software, and all of the work around trying to reduce the emissions of compute as a whole.

Right? And we're really looking at a new paradigm, with the exponential increase in the energy use of compute, and what behaviors that's driving in terms of getting new generators online, as well as what the user experience behaviors are when LLMs, or other AIs, are built into everything.

And so I felt that was really important to get focused on as quickly as possible. And that's where we really jumped off with Neuralwatt.

Chris Adams: Oh, I see. Okay. So basically there's a chunk of usage, and there's the ability to make an improvement in the existing fleet of servers that's already deployed around the world. But you see this thing which is growing really fast.

And if we look at things like the International Energy Agency's own report, Energy and AI, their various projections are saying that over the next five years, AI looks like it's probably gonna use the same energy as all data centers use today. So it makes more sense to try and blunt some of that growth as early as possible.

Or, like, that's where you felt you had more chance for leverage, essentially.

Scott Chamberlin: More chance for leverage, more interest in really having an impact. Because, I mean, we were really in a period of flat growth in terms of energy for data centers prior to the AI boom, because the increase in use of data centers was basically canceled out by the improvement in the energy efficiency of the systems themselves.

And there are a lot of factors that went into why that was relatively balancing out, but the deployment of GPUs, the deployment of massively parallel compute, and the utilization of those for AI, both training and inference, really changed that equation entirely. And so basically from 2019 on, we've seen data centers go from relatively flat growth to a very steep ramp in terms of energy growth.

Chris Adams: Okay. Alright. Now, we're gonna come back to Neuralwatt a little bit later, partly because the demo you shared was actually quite cool, and I still haven't seen anything else that provides that kind of live information. But one thing that I did learn when I was asking about this, and this probably speaks to your time working in a number of larger companies, is that there is a bit of an art to getting large companies, which are largely driven by, say, profits for the next quarter, to actually invest in transparency or sustainability measures. And one thing I saw, when I was asking on LinkedIn, okay, if I'm using various AI tools, what's out there that can expose these numbers to me?

Well, there was actually some work by a guy called Will Alpine, providing some metrics on an existing kind of AI pipeline. That's one of the only tools I've seen that exposes the numbers, or provides the numbers, from the actual cloud provider themselves. And as I understood it, that was almost like a passion project, funded by some internal kind of carbon fund or something.

Could you maybe talk a little bit about that, and what it's like getting large organizations to fund ideas like that? Because I found it really interesting to see, and as I understand it, the way there was a kind of pool of resources for employees to do that kind of work was actually quite novel, and not something I've seen in many places before.

Scott Chamberlin: Yeah, no, I think that was great work, and I'm a big fan of Will's work; I had the fortune to collaborate with him at that period of both of our careers. I don't think carbon work is easy to get done anywhere, in my experience, but Microsoft had a little bit of forethought in terms of designing the carbon tax. And we did have the ability to really vet projects that could have a material impact against Microsoft's net zero goals, and get those funded by the carbon tax that was implemented internally.

And so the mechanism was: as Microsoft built the capability to audit and report on their carbon, they would assign a dollar value to that for teams, and then that money went from those teams' budgets into a central budget that was then reallocated for carbon reduction goals.

And yeah, I think Will was really at the forefront of identifying that this AI, and we all just said ML back then, but now we all say AI, that this GPU energy use was a big driver of the growth. And so he did a ton of work to figure out what that looked like at scale, and to figure out the mechanics of really exposing it within the hyperscale cloud environment. NVIDIA's also done a great job in terms of keeping energy values in their APIs, exposed through their chips and through their drivers, so that you can use them fairly easily on a GPU; I would say it's more challenging to do so on CPUs, or the rest of the system. So he did a great job of collaborating with those interfaces to get that exposed in Azure, I think it's ML Studio it's called.

So that has been there for many years, this ability to see and audit your energy values if you're using the Azure platform. Yeah, it was super good work.
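The NVIDIA interfaces Scott is referring to are exposed through NVML. As a rough sketch, reading live power draw and the cumulative energy counter from the first GPU with the nvidia-ml-py bindings looks something like this (the energy counter needs a Volta-or-newer GPU):

```python
# Read GPU power and energy via NVML (pip install nvidia-ml-py).
from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetPowerUsage,
    nvmlDeviceGetTotalEnergyConsumption,
)

nvmlInit()
try:
    gpu = nvmlDeviceGetHandleByIndex(0)
    watts = nvmlDeviceGetPowerUsage(gpu) / 1000.0  # reported in milliwatts
    joules = nvmlDeviceGetTotalEnergyConsumption(gpu) / 1000.0  # millijoules
    print(f"GPU 0: {watts:.1f} W now, {joules:.0f} J since driver load")
finally:
    nvmlShutdown()
```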

Chris Adams: Yeah, so this was the thing. I forget the name of it, and I'm a bit embarrassed to forget it, but I'm just gonna play back what I think you're saying, 'cause when I was reading about this, it was something I hadn't seen in many other organizations. So there's an internal carbon levy, which is basically: for every ton that gets emitted, there's a dollar amount allocated to it, and that goes to a kind of internal, let's call it a carbon war chest, right? So there's a bunch of money that you can use, and any member of staff was basically able to say, I think we should use some of this to deliver this thing, because we think it's gonna provide some savings, or it's gonna help us hit whatever kind of sustainability targets we have.

And one of the things that came out of that was actual, meaningful energy figures if you're using these tools. And this is something the other clouds won't give you: you're definitely not gonna get it from Amazon right now, and Google will show you the carbon but won't show you the energy.

And if you're using ChatGPT, you definitely can't see this stuff. But it sounds like the APIs do exist. So it's largely been a case of staff being prepared, there being a kind of will inside the system, and people being able to win some of those fights to get time and money allocated to actually make this available for people, right?

Scott Chamberlin: The NVIDIA APIs definitely exist. I think the challenge is the methodology and the standards, right? So within a cloud there's a lot of complexity around how cycles and compute get assigned to users, and how you fairly and accurately account for that. GPUs happen to be a little bit simpler, 'cause we tend to allocate a single chip to a single user at a single time.

Whereas with CPUs, there's a lot of hyper-threading, most clouds are moving to oversubscription, and even single hardware threads are starting to get shared between multiple users. And how do we allocate, first, the energy, all of this starts with energy, and then the CO2, based on location?

And then there's the big complexity in terms of the perception these clouds want to have around net zero. Everyone wants to say they're net zero via market-based mechanics. And what's the prevailing viewpoint on what is allowed within the GHG Protocol, and what is the perception that the marketing team wants to have? Those are a lot of the challenges. At least with GPU energy, there are not huge technical challenges, but there are a lot of marketing and accounting and methodology challenges to overcome.
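One simple, and contestable, convention for the allocation problem Scott raises is to split a node's measured energy across tenants by their share of CPU time; a sketch:

```python
# Attribute measured node energy to tenants by CPU-time share.
# Real methodologies also have to handle idle power, oversubscription,
# and shared components, which is where the hard arguments live.
def attribute_energy(
    node_joules: float, cpu_seconds: dict[str, float]
) -> dict[str, float]:
    total = sum(cpu_seconds.values())
    if total == 0:
        return {tenant: 0.0 for tenant in cpu_seconds}
    return {t: node_joules * s / total for t, s in cpu_seconds.items()}


# A node that used 1 kWh (3.6 MJ), split between two tenants:
print(attribute_energy(3_600_000, {"tenant-a": 900, "tenant-b": 2700}))
# {'tenant-a': 900000.0, 'tenant-b': 2700000.0}
```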

Chris Adams: So that's interesting. Well, I did an interview with Kate Goldenring, who was working at Fermyon at the time. We'll share a link to that for people, and I'll also share some links to both the internal carbon levy and how large organizations have funded this kind of climate and green software stuff internally.

'Cause I think other people working inside their companies will find that useful. But I'm just gonna play back a little bit of what you said there, and then we'll talk about the demo you shared with me. So it does seem like GPUs, the things used as AI accelerators, can provide the numbers.

And that is actually something that's technically possible a lot of the time. And it sounds like that might be technically slightly less complex, at one level, than the way people sell cloud computing. 'Cause when we did the interview with Kate Goldenring, she explained to me that, okay, let's say there is a server and it's got maybe, say, 32 cores inside it. What tends to happen, because not everyone is using all 32 cores at the same time, is that you can pretty much get away with selling maybe 40 or 50 cores, because not everyone's using all the cores at the same time.

And that allows you to essentially sell more compute, so you end up making slightly more money, and you end up with a much more profitable service. That's been one of the offers of cloud. And from the perspective of the people who are customers, that is providing a degree of efficiency: if you don't need to build another server, because that one server is able to serve more customers, there's a kind of hardware efficiency argument.

But it sounds like you're saying that with GPUs, you don't have that kind of oversubscription thing. So you could get the numbers, but there's a whole bunch of other things that might make it a bit more complicated elsewhere, simply because it's a new domain and we're finding out there are new things that happen with GPUs, for example.

Scott Chamberlin: Yeah, that's exactly what I was trying to say. And I think we are seeing emerging GPU oversubscription, GPU sharing, so that will probably change at some point, at scale. The technology is certainly there: I think NVIDIA's acquisition of Run:ai enables some of this GPU sharing, and they acquired that company like six months ago, and it's now open source, so people can take advantage of that.

But yes, I think the core principle is that, from an embodied emissions point of view, and a green software point of view, it's relatively good practice to drive up the utilization of these embodied emissions you've already purchased and deployed. There are some performance implications around doing the sharing, in how it affects user experience. But today the state of the art with GPUs is that they're mostly just singly allocated: whether or not a GPU is fully utilized, it's utilized by a single customer at a time. But that is certainly changing.

Chris Adams: Oh, okay. So basically, if I'm using a tool, then I'm not sharing it with anyone else in the same way that we typically would be with cloud. Okay, that probably helps me understand that cool demo you shared with me. So maybe we'll just talk a little bit about that, 'cause this was actually pretty neat: when I asked, you showed me a video of literally the thing I was asking for, which was kind of handy, right?

So we'll share the link to the video, but the key thing Scott shared with me was that, using tools like, say, a ChatGPT thing or Anthropic, where I'm asking questions and I see tokens come out as I'm asking, we basically saw charts of real-time energy usage, changing depending on what I was actually doing.

And maybe you could talk a little bit about what's actually going on there and how you came to that. Because it sounds like Neuralwatt isn't just about trying to provide some transparency; there are actually some other things you can do. So not only do you see it, but you can manage some of the energy use in the middle of, like, an LLM session, for example, right?

Scott Chamberlin: So yeah, the first stage, the question is really just: can we measure what's happening today, and what does it really look like in terms of how you typically deploy, say, a chat interface or an inference system? Like I was mentioning, we have the ability to fairly easily, because NVIDIA does great work in this space, read those values on the GPU. Again, there are system implications for what's happening on the CPU, what's happening on the network, the disks; they tend to be outstripped by far, because these GPUs use so much energy.

So the first step in that demo is really just to show what the behavior is like. Because what we ultimately do within the Neuralwatt code is take over all of the energy management of the system, and we train our own models to basically shift the behavior of servers from one that is focused on maximizing performance for the available energy to one that balances performance against energy, in an energy efficiency mode, essentially. So we are training models that shift the behavior of the computer's energy use towards energy efficiency.

And so that's why we wanted to visualize multiple things. We wanted to visualize what the user experience trade-off is. Again, going back to the user experience: you have to have a great user experience if you're gonna be doing these things. And we wanted to visualize the potential gains, and the potential value add for our customers in making this shift.

Because, I think, Jensen Huang made a comment at GTC that we love: we are a power-constrained industry. Every AI factory is going to be power constrained in the future. And so what does compute look like if power is the number one limiting factor that you have to deal with?

That's why we really want to enable servers to operate differently than they have in the past, and we want there to be, essentially, think of it as energy awareness, right? That's the phrase I come back to. We want the behavior of servers to be energy aware because of these power constraints.

Chris Adams: Ah, okay. Alright. You said a couple of things that I just want to run by you to check. So there are all these new awarenesses: there's carbon aware, then there's grid aware, then there's energy aware. This is clearly an area where people are trying to figure out what to call things.

But the thing that you folks at Neuralwatt are doing, basically, is: yes, you have access to the power figures and you can make them available. And I'm just gonna try and run this by you, and I might need you to correct me on this, but it sounds a little bit like what you're doing is almost throttling the power that gets allocated to a given chip. 'Cause things like Linux or certain systems can introduce limits on the power that is allocated to a particular chip, but if you do that, it can have the unintended effect of making things run a little too slowly, for example.

But there's a bit of headroom there. And if you can go from giving it absolute power, like, take as much power as you want, to having a finite amount allocated, then you can basically still have a good, useful experience, but you can reduce the amount of power that's actually consumed. It sounds like you're doing something a little bit like that with the Neuralwatt thing. So rather than giving it carte blanche to take all the power, you're asking it to work within a power envelope, which means you're not having to use quite so much power to do the same kind of work. Is that it?

Scott Chamberlin: Yeah. So if you go back to the history, before we had GPUs everywhere, CPUs had a fairly, let's call it moderate, level of sophistication in terms of power management. They have sleep states, they have performance states, and there are components that run in the OS, called CPU governors, that govern how a CPU behaves relative to various constraints.

And so when you allocate, let's say, a Linux VM in the cloud, I don't know why this is, but a lot of them get allocated by default with a, the name of it is slipping my mind, but there are about five default CPU governors in the default Linux distros, and they get allocated with the powersave one, actually.

And so what that does is actually limit the top frequencies you can get to; it's essentially balancing power and performance as the default you get allocated. You can check these things, and you can change it to a performance mode, which is basically gonna use all of the capability of the processor at a much higher energy use.
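You can see the governor behavior Scott describes on most Linux machines through sysfs. A small sketch; the paths are the standard cpufreq locations, and writing the governor needs root:

```python
# Inspect, and optionally change, the CPU frequency governor via sysfs.
from pathlib import Path

cpufreq = Path("/sys/devices/system/cpu/cpu0/cpufreq")

# Typically includes some of: performance, powersave, ondemand,
# conservative, schedutil, userspace.
print((cpufreq / "scaling_available_governors").read_text().split())
print((cpufreq / "scaling_governor").read_text().strip())  # often "powersave"

# To switch this core to performance mode (root required):
# (cpufreq / "scaling_governor").write_text("performance")
```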

But on the GPU it's a lot less sophisticated, right? GPUs don't tend to support any sleep states other than just power off and on, and they do have different performance states, but they're not as sophisticated as the CPU has historically been. And so essentially Neuralwatt inserts itself into the OS and manages this in a more sophisticated manner, around exactly how you're describing.

We're making the trade-off, and we're learning the trade-off really through modeling. We're learning the trade-off to maintain great user experience but get power savings with our technology, and doing this across the system. So yes, I think your description is essentially very good, and we're just adding a level of sophistication into the OS beyond what exists today.
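Neuralwatt's learned policies aren't public, but the underlying knob being described, a GPU power cap, is exposed through NVML. A static cap like the sketch below is the blunt version of what Scott describes doing dynamically (setting the limit needs root):

```python
# Cap a GPU's power limit via NVML; a crude, static version of the
# dynamic power management described above. Requires root.
from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetPowerManagementLimitConstraints,
    nvmlDeviceSetPowerManagementLimit,
)

nvmlInit()
try:
    gpu = nvmlDeviceGetHandleByIndex(0)
    min_mw, max_mw = nvmlDeviceGetPowerManagementLimitConstraints(gpu)
    # Cap at 70% of the board maximum, staying above the floor.
    nvmlDeviceSetPowerManagementLimit(gpu, max(min_mw, int(max_mw * 0.7)))
finally:
    nvmlShutdown()
```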

Chris Adams: Okay. So basically, rather than being able to pull infinite power, there's an upper limit on how much it can pull. And the reason you're doing some of the training is that, based on how people use this, you'd like the upper limit available to match what's actually being needed, so that you're still giving enough room, but you're delivering some kind of savings.

Scott Chamberlin: Yeah, and it's important to understand that it's fairly complex, which is why we train models to do this rather than do it,

Chris Adams: Like sit at one level and just one and done. Yeah.

Scott Chamberlin: Because think about an LLM, right? There are essentially two large phases in inference for an LLM. The first phase is really compute heavy, and then the second phase is more memory heavy. And so we can use different power-performance trade-offs in those phases, and understanding what those phases are, and what the transition looks like from an observable state, is part of what we do. And then the GPU is just one part of the larger system, right?

It's engaged with the CPU, and a lot of these LLMs are engaged with the network. And so how do we balance all the trade-offs so as to maintain the great user experience for the best amount of power efficiency? That's essentially what our fitness function is when we're training.

Chris Adams: Ah, I think I understand that now. And what you said about those two phases: presumably one of them is taking a model and loading it into something like memory, and then there's a second part, which might be doing the lookups against that memory. Because the lookup process, when you're seeing the text come out, is quite memory intensive rather than compute intensive.

So if you're able to change how the power is used to reflect that, then you can deliver some kind of savings inside that. And if you scale that up to data center level, that's like 10%, 20%, I mean, maybe even, yeah. Do you have an idea of what kind?

Scott Chamberlin: We tend to shoot for at least 20% improvements in what I would call performance per unit of energy. So tokens per joule is the metric I tend to come back to fairly often. But again, how exactly you measure energy on these things, and what the right metric is, I think you need to use a bunch of them.

But I like tokens per joule 'cause it's fairly simple and it's easy to normalize. It gets super interesting in this conversation about inference-time compute and thinking LLMs and stuff like that, 'cause they're generating tons and tons of tokens, and not all of them are exposed; they're essentially there to improve the output. And so people use other metrics, but they're harder to normalize. So, long story short, I tend to come back to tokens per joule as my favorite.
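Tokens per joule is easy to instrument yourself using the NVML energy counter shown earlier. A rough sketch, which only counts GPU energy and ignores the CPU and network energy Scott notes also matter:

```python
# Measure tokens per joule for one generation call, GPU energy only.
from pynvml import (
    nvmlInit,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetTotalEnergyConsumption,
)

nvmlInit()
gpu = nvmlDeviceGetHandleByIndex(0)


def tokens_per_joule(generate) -> float:
    """`generate` is any callable that returns the tokens it produced."""
    start_mj = nvmlDeviceGetTotalEnergyConsumption(gpu)
    n_tokens = generate()
    used_j = (nvmlDeviceGetTotalEnergyConsumption(gpu) - start_mj) / 1000.0
    return n_tokens / used_j if used_j > 0 else float("inf")
```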

Chris Adams: So it sounds like the thing you're working on is, basically, through judicious use of power envelopes that more accurately match what's actually being required by a GPU or anything like that, you're able to deliver some savings. That's one of the things. And like you said before, when we were talking about Scott Cloud, it's transparent to the user.

I don't have to be thinking about my prompt or something like that. This is happening in the background, so my experience isn't changed, but I'm basically the recipient of that: 20% of the power is basically not being turned into carbon dioxide in the sky, for example, but other than that it's basically the same.

Scott Chamberlin: That's the goal, right? We've informed our work continually on: number one, user experience has to be great; number two, developer experience has to be great, which means the developer shouldn't have to care about it. So yeah, it's a single container download, it runs in the background, and it does all the governance in a fairly transparent way.

But we actually have a CO2 optimization mode as well. The default mode is really energy, but we can flip a switch and get an extra degree of variability where we're optimizing on CO2, average or marginal emissions. So we can vary the behavior of the system relative to the carbon intensity of the grid as well.
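Conceptually, the CO2 mode is a small step once the power knob exists: tighten the cap as the grid gets dirtier. A toy governor, where current_intensity would come from whatever carbon-intensity feed you trust, and where the thresholds and floor are illustrative assumptions:

```python
# Toy carbon-aware governor: scale the GPU power cap with grid intensity.
# The thresholds and floor are illustrative assumptions.
CLEAN_G_PER_KWH = 100.0  # at or below this, run at the full cap
DIRTY_G_PER_KWH = 600.0  # at or above this, run at the floor


def cap_fraction(current_intensity: float, floor: float = 0.6) -> float:
    """Fraction of the maximum power limit to apply, in [floor, 1.0]."""
    span = DIRTY_G_PER_KWH - CLEAN_G_PER_KWH
    dirtiness = (current_intensity - CLEAN_G_PER_KWH) / span
    return 1.0 - min(max(dirtiness, 0.0), 1.0) * (1.0 - floor)


print(cap_fraction(80.0))   # 1.0: clean grid, full power
print(cap_fraction(350.0))  # 0.8: halfway, cap at 80%
print(cap_fraction(700.0))  # 0.6: dirty grid, floor
```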

Chris Adams: Okay. Or possibly, if not the grid, then the 29 gas turbines that are powering that data center today, for example.

Scott Chamberlin: I think that's an emerging problem. And I actually would love to talk to somebody that has a data center with a microgrid with a gas turbine, because I do believe there's additional optimization available for these little microgrids that are being deployed alongside these data centers.

If you were to plumb this energy awareness all the way through, right: if your servers knew how your microgrid was behaving relative to the macro grid it was connected to, there are so many interesting optimizations available. People are looking at this from the point of view of the site infrastructure, but the reality is that all of the load is generated by the compute on the server.

Right? And that's what we're really trying to bring all the way through to where the load originates, and to the behavior there, while maintaining that user experience.

Chris Adams: Okay. So you said something interesting there, about the fact that GPU usage is a little bit less sophisticated right now: you said it's either all on or all off. And when you've got something drawing the power of multiple thousands of homes, that can go all on and all off very quickly, that's surely gotta have some kind of implications within the data center, but also for any poor people connected to the same grid as the data center, right? Because if you're basically making the equivalent of tens of thousands of people disappear from the grid, then reappear on the grid, in less than a second, there's gotta be some knock-on effect from that.

You spoke about gas turbines. Do you reckon we're gonna see people finding something in the middle to act like a kind of shock absorber for these changes that propagate through to the grid? Because if you're operating a grid, that feels like the kind of thing that's gonna really mess with your ability to provide a consistent quality of power to everyone else, if you've got the biggest user of energy also swinging up and down the most, surely.

Scott Chamberlin: Yeah, and it's certainly like a, I don't know if existential problem is the right word, but it's certainly a emerging, very challenging problem, 

Chris Adams: Mm-hmm. 

Scott Chamberlin: within the space of data centers, is essentially the syncing up of some of these behaviors among the GPUs, causing correlated spikes and drops in power. It has implications within your data center infrastructure, to the point where we hear from customers that they're no longer deploying some of their UPSs, their battery backups, within the GPU clusters, because they don't have the electronics to handle the load shifting so dramatically. And we're also getting emerging challenges on the grid, in terms of how these loads ramp up or down and affect, say, I won't get into it because I'm not an expert on generation on the grid and maintaining frequency, but it has implications for that as well. So in the software we can certainly smooth those things out, but there are also, I mean, there are weird behaviors happening right now in terms of trying to manage this.

My favorite, and I don't know if you've heard of this too, Chris, is that PyTorch has a mode now where they basically burn empty cycles to keep the power from dropping down dramatically. I think it's when weights are syncing; I'm not exactly sure when it was implemented, because I've only read about it. But when you need to sync weights across your network, some GPUs have to stop, and what they've implemented is some busy work so that the power doesn't drop dramatically and cause this really spiky behavior.

Chris Adams: Ah. 

Chris Adams: Yeah, I think what you're referring to is the PYTORCH_NO_POWERPLANT_BLOWUP flag, right? I remember reading about this in SemiAnalysis. It just blew my mind: the idea that you have to essentially keep the GPUs running, because the spike would be so damaging to the rest of the grid that they have to simulate some load, so that that change doesn't propagate through to the rest of the grid, basically.

Scott Chamberlin: Correct. And so we look at problems like that. There's that problem, and there's the similar problem when you start a training run, where all the GPUs basically start at the same time and create a surge. We help with some of those situations in our software.

But yes, I think some of the behaviors that are getting implemented, like the no-powerplant-blowup one, are probably not great from a green software point of view, because any time we're doing busy work, that's an opportunity to reduce energy and reduce CO2. And there probably are ways of managing that with a bit more sophistication, depending on the scale you're working at, that may have been more appropriate.

So this is definitely something that still needs to be looked at a little bit.
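As a toy illustration of the workaround being described, and not PyTorch's actual implementation, the idea is to burn throwaway FLOPs while real work is paused so that site power doesn't fall off a cliff:

```python
# Toy version of the "keep the power floor up" trick discussed above.
# A greener fix would smooth the load without wasting the energy.
import threading

import torch


def hold_power_floor(stop: threading.Event, size: int = 4096) -> None:
    a = torch.randn(size, size, device="cuda")
    b = torch.randn(size, size, device="cuda")
    while not stop.is_set():
        a @ b  # result discarded; exists only to keep power draw up
    torch.cuda.synchronize()


# Run it in a background thread during a pause (e.g. while weights
# sync), then set the event to resume normal work.
```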

Chris Adams: So, in the pre-AI days there was this notion of a thundering herd problem, where everything tries to use the same connection or do the same thing at the same time. It sounds like this PYTORCH_NO_POWERPLANT_BLOWUP flag is essentially the AI equivalent of seeing that problem coming, realizing it's of a much greater magnitude, and figuring, okay, we need to find an elegant solution to this in the long run, but right now we're just gonna use this thing. Because it turns out that having incredibly spiky power usage propagating outside the data center wreaks all kinds of havoc, basically, and we probably don't want to do that if we want to keep being connected to the grid.

Scott Chamberlin: Yeah. Spiky behavior at scale is really problematic, yes.

Chris Adams: Okay, cool. Dude, I'm so sorry, we've gone totally down this AI energy spikiness rabbit hole, but I guess that's what happens.

Scott Chamberlin: It's certainly something customers are really interested in. I mean, if we were to bubble up one level, there's this core challenge in the AI space where the energy people don't necessarily talk the same language as the software people.

And I think that's one place where maybe hyperscale has a little bit of an advantage, 'cause it emerged from software companies, but hyperscale is not the only game in town, especially when we're going to neoclouds and stuff like that. And so I think one of our side goals is really: how do we enable the people talking energy and infrastructure to have the same conversations, create requirements, and coordinate with the software people running the loads within the data centers? I think that's the only way to really solve this holistically.

Chris Adams: I think you're right. I mean, to bring this back to the world of green software, the Green Software Foundation did a merger with, I think they're called the SSIA, the Sustainable and Scalable Infrastructure Alliance, something like that. We had them on a couple of episodes a while ago, including one with a whole discussion about setting out hardware standards to help cross this barrier.

Because, as we've learned on this podcast, some changes you might make at the AI level can have all these quite significant implications. Not just the air quality and climate related issues of having masses and masses of on-premise gas turbines, but this whole thing about power quality, which is not something you've previously had to think about in terms of your relationship with other people, but which clearly needs to be on the agenda as we go forward.

Just like responsible developers. Before we go any further down there, I should just check, as we're coming up to time: we've spent all this time talking about this, and we've mentioned a couple of projects. Are there any other things, not related to spiky AI power, that you're looking at and find interesting? Now that you're on this podcast, are there projects you wish more people knew about, or things you've read about in the news, or people's work that you're really impressed by?

Scott Chamberlin: Yeah, I mean, I think a lot of people on this podcast probably know about AI Energy Score. I think that's a promising project. I really believe that we need the ability to understand both the energy and the CO2 implications of some of these models we're using, and the ability to compare them and compare the trade-offs.

I do think the level of sophistication needs to get a bit higher, because right now it's super easy to trade off model size and energy. Like, I can go single GPU, but I'm trading off capabilities for that. I think on one of my blog posts it was someone's idea: you really have to normalize against the capabilities and the energy at the same time when making your decisions about what the right model is for your use cases, relative to the energy available or, say, the CO2 goals you have. But yeah, I think eventually they'll get there in that project.

So I think that's a super promising project.
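The normalization Scott is asking for can start as simply as dividing a capability score by measured energy; the numbers here are made up purely to show the shape of the comparison:

```python
# Made-up numbers: normalize model choice by capability per unit energy.
models = {
    # name: (benchmark_score, joules_per_1k_tokens)
    "small-model": (62.0, 800.0),
    "large-model": (78.0, 3200.0),
}

for name, (score, joules) in models.items():
    print(f"{name}: {score / joules:.4f} score points per joule")
# The small model wins per joule; whether its capability clears the bar
# for your use case is the judgment call described above.
```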

Chris Adams: We will share a link to that. The AI Energy Score is entirely open source: you can run it for open models, you can run it for private models, and if you're someone with a budget, you can require suppliers to publish the results to the leaderboard, which would be incredibly useful, because this whole thing is about energy transparency.

Scott Chamberlin: Yeah,

Chris Adams: I'm glad you mentioned that. I think that's one of the more useful tools out there, and it's relatively easy to write into contracts, or to put into a policy for a team to adopt, for example.

Scott Chamberlin: Correct. Yep. No, a big fan, so.

Chris Adams: Well, that's good news for Boris. Boris, if you're hearing this, then yeah, thumbs up, and the rest of the team there. I only mention Boris 'cause he's the one on the team I know, and he's in the Climateaction.tech Slack that a few of us tend to hang out in.

Scott Chamberlin: Yeah, Boris and I talked last week. A big fan of his work. And I think Sasha Luccioni, who I've actually never met, is also the project lead on that one.

Chris Adams: Oh, Scott, we're coming up to time, and I didn't get a chance to talk about what's going on in France, with things like Mistral sharing some of their environmental impact figures. Literally just two days ago, Mistral, the French competitor to OpenAI, for the first time started sharing some environmental figures, in quite a lot of detail, more so than a single mention from Sam Altman about the energy used by an AI query. We actually got quite a lot of data about the carbon and the water usage and stuff like that.

But no energy, though. That's something we'll have to speak about another time. So hopefully I'll be able to get you on again and we can talk a little bit about that, and about, I don't know, the off-grid data centers of Crusoe and all the things like that. But until then, Scott, I've really enjoyed this deep dive with you, and I do hope that our listeners have been able to keep up as we've gone progressively more detailed.

And if you have stayed with us, listeners, we'll make sure that there are plenty of show notes, so that people who are curious about any of this stuff have plenty to read over the weekend. Scott, this has been loads of fun. Thank you so much for coming on, and I hope you have a lovely day in Evergreen.

Scott Chamberlin: Thanks Chris.

Chris Adams: Alright, take care of yourself, Scott. Thanks. 

Hey everyone, thanks for listening. Just a reminder to follow Environment Variables on Apple Podcasts, Spotify, or wherever you get your podcasts. And please do leave a rating and review if you like what we're doing. It helps other people discover the show, and of course, we'd love to have more listeners.

To find out more about the Green Software Foundation, please visit greensoftware.foundation. That's greensoftware.foundation in any browser. Thanks again, and see you in the next episode.