Environment Variables
AI Energy Measurement for Beginners
March 6, 2025
Host Chris Adams is joined by Charles Tripp and Dawn Nafus to explore the complexities of measuring AI's environmental impact from a novice’s starting point. They discuss their research paper, A Beginner's Guide to Power and Energy Measurement and Estimation for Computing and Machine Learning, breaking down key insights on how energy efficiency in AI systems is often misunderstood. They discuss practical strategies for optimizing energy use, the challenges of accurate measurement, and the broader implications of AI’s energy demands. They also highlight initiatives like Hugging Face’s Energy Score Alliance, discuss how transparency and better metrics can drive more sustainable AI development and how they both have a commonality with eagle(s)!

If you enjoyed this episode then please either:
Connect with us on Twitter, Github and LinkedIn!


TRANSCRIPT BELOW:

Charles Tripp: But now it's starting to be like, well, we can't build that data center because we can't get the energy to it that we need to do the things we want to do with it. We haven't taken that incremental cost into account over time, we just kind of ignored it. And now we hit, like, the barrier, right? 

Chris Adams: Hello, and welcome to Environment Variables, brought to you by the Green Software Foundation. In each episode, we discuss the latest news and events surrounding green software. On our show, you can expect candid conversations with top experts in their field who have a passion for how to reduce the greenhouse gas emissions of software.

I'm your host, Chris Adams. 

Welcome to Environment Variables, where we bring you the latest news and updates from the world of sustainable software development. I'm your host, Chris Adams. If you follow a strict media diet, switch off the Wi-Fi in your house, and throw your phone into the ocean, you might be able to avoid the constant stream of stories about AI in the tech industry. For the rest of us, though, it's basically unavoidable. So having an understanding of the environmental impact of AI is increasingly important if you want to be a responsible practitioner navigating the world of AI, generative AI, machine learning models, DeepSeek, and the rest. Earlier this year, I had a paper shared with me with the intriguing title A Beginner's Guide to Power and Energy Measurement and Estimation for Computing and Machine Learning. And it turned out to be one of the most useful resources I've since come across for making sense of the environmental footprint of AI. So I was over the moon when I found out that two of the authors were both willing and able to come on to discuss this subject today. So joining me today are Dawn Nafus and Charles Tripp, who worked on the paper and did all this research. And well, instead of me introducing them, they're right here, so I might as well let them do the honors themselves. I'm just going to work in alphabetical order. Charles, I think you're slightly ahead of Dawn. So, can I just give you the room to, like, introduce yourself?

Charles Tripp: Sure. I'm a machine learning researcher and algorithms researcher, and I've been programming pretty much my whole life since I was a little kid, and I love computers. I researched machine learning, and reinforcement learning in particular, at Stanford, started my own company, but kind of got burnt out on it.

And then I went to the National Renewable Energy Lab, where I applied machine learning techniques to energy efficiency and renewable energy problems. And while I was there, I started to realize that computing energy efficiency was, like, an increasingly important area of study on its own.

So I had the opportunity to sort of lead an effort there to create a program of research around that topic. And it was through that work that I started working on this paper, made these connections with Dawn. And I worked there for six years and just recently changed jobs to be a machine learning engineer at Zazzle.

I'm continuing to do this research. And, yeah. 

Chris Adams: Brilliant. Thank you, Charles. Okay, so the National Renewable Energy Lab, that's NREL that some people refer to?

Charles Tripp: That's right. It's one of the national labs. 

Chris Adams: Okay. Brilliant. And Dawn, I guess I should give you the space to introduce yourself, and welcome back again, actually. 

Dawn Nafus: Thank you. Great to be here. My name is Dawn Nafus. I'm a principal engineer now in Intel Labs. I also run the Socio Technical Systems Lab. And I also sit on Intel's Responsible AI Advisory Council, where we look after what kinds of machine learning tools and products do we want to put out the door. 

Chris Adams: Brilliant, thank you, Dawn. And if you're new to this podcast, I mentioned my name was Chris Adams at the beginning. I work at the Green Web Foundation. I'm the director of technology and policy there. I'm one of the authors of a report all about the environmental impact of AI last year, so I have some background on this. I also work as the policy chair in the Green Software Foundation Policy Working Group, so that's another thing that I do. And we'll do our best to make sure that we link to every single paper and project mentioned, so if there are any particular things you find interesting, please do look for the show notes. Okay, Dawn, shall we start? I think you're both sitting comfortably, right? Shall I begin?

Okay, good. So, Dawn, I'm really glad you actually had a chance to both work on this paper and share and let me know about it in the first place. And I can tell when I read through it, there was quite an effort to, like, do all the research for this.

So, can I ask, like, what was the motivation for doing this in the first place? And, like, was there any particular people you feel really should read it?

Dawn Nafus: Yeah, absolutely. We primarily wrote this for ourselves. In a way. And I'll explain what I mean by that. So, oddly, it actually started life in my role in Responsible AI, where I had recently advocated that Intel should adopt a Protect the Environment principle alongside our suite of other Responsible AI principles, right?

Bias and inclusion, transparency, human oversight, all the rest of it. And so, the first thing that comes up when you advocate for a principle, and they did actually implement it, is "what are you going to do about it?" And so, we had a lot of conversation about exactly that, and really started to hone in on energy transparency, in part because, you know, from a governance perspective, that's an easy thing to at least conceptualize, right? You can get a number.

Chris Adams: Mmm. 

Dawn Nafus: You know, it's the place where people's heads first go to. And of course it's the biggest part of, or a very large part of the problem in the first place. Something that you can actually control at a development level. And so, but once we started poking at it, it was, "what do we actually mean by measuring? And for what? And for whom?" So as an example, if we measured, say, the last training run, that'll give you a nice guesstimate for your next training run, but that's not a carbon footprint, right? A footprint is everything that you've done before that, which folks might not have kept track of, right?

So, you know, we're really starting to wrestle with this. And then in parallel, in labs, we were doing some socio technical work on, carbon awareness. And there too, we had to start with measuring. Right? You had to start somewhere. And so that's exactly what the team did. And they found interestingly, or painfully depending on your point of view, look, this stuff ain't so simple, right?

If what you're doing is running a giant training run, you stick CodeCarbon in or whatever it is, sure, you can get absolutely a reasonable number. If you're trying to do something a little bit more granular, a little bit trickier, it turns out you actually have to know what you're looking at inside a data center, and frankly, we didn't, as machine learning people primarily. And so, we hit a lot of barriers and what we wanted to do was to say, okay, there are plenty of other people who are going to find the same stuff we did, so, and they shouldn't have to find out the hard way. So that was the motivation.
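
For anyone who wants to try the kind of coarse, whole-run measurement Dawn describes, here is a minimal sketch using the open source CodeCarbon tracker she mentions. The project name, output directory, and training function are placeholders, and the number it reports is an estimate based on the tracker's own hardware and grid-intensity assumptions, not a full footprint.

```python
# Minimal sketch: wrap a training run with CodeCarbon, as mentioned above.
# "train_model", the project name, and the output directory are placeholders.
from codecarbon import EmissionsTracker


def train_model():
    ...  # your training loop goes here


tracker = EmissionsTracker(
    project_name="example-training-run",  # hypothetical name
    output_dir="./emissions",             # CSV report is written here
    measure_power_secs=15,                # sampling interval; coarser means less overhead
)
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()         # estimated kg CO2e for this run
    print(f"Estimated emissions for this run: {emissions_kg:.4f} kg CO2e")
```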

Chris Adams: Well, I'm glad that you did, because this was actually the thing that we found as well when we were looking into this: it looks simple on the outside, and then it turns out to feel a bit like a fractal of complexity, and there are various layers that you need to be thinking about. And this is one thing I really appreciated in the paper, that it was kind of broken out like that.

So you can at least have a model to think about it. And Charles, maybe this is actually one thing I can, like, hand over to you because I spoke about this kind of hierarchy of things you might do, like there's

stuff you might do at a data facility level or right all the way down to a, like, a node level, for example.

Can you take me through some of the ideas there? Because I know for people who haven't read the paper yet, that seemed to be one of the key ideas behind this, that there are different places where you might make an intervention. And this is actually a key thing to take away if you're trying to kind of interrogate this for the first time.

Charles Tripp: Yeah, I think it's, both interventions and measurement, or I should, it's really more estimation at any level. And it also depends on your goals and perspective. So it, like, if you are operating a data center, right? You're probably concerned with the entire data center, right? Like the cooling systems, the idle power draw, the, converting power to different levels, right?

Like transformer efficiency, things like that. Maybe even the transmission line losses and all of these things. And you may not really care too much about, like, the code level, right? So the types of measurements you might take there or estimates you might make are going to be different. They're gonna be at, like, the system level.

Like, how much is my cooling system using in different conditions, different operating conditions, environmental conditions? From a user's perspective, you might care a lot more about, like, how much energy, how much carbon is this job using? And that's gonna depend on those data center variables. But there's also a degree of like, well, the data center is going to be running whether or not I run my job.

Right? So I really care about my job's impact more. And then I might be caring about much shorter term, more local estimates, like ones that might come from measuring the power of the nodes that I'm running on, which was what we did at NREL, or much higher frequency, but less accurate, measurements that come from the hardware itself.

Most modern computing hardware has a way to get these hardware estimates of the current power consumption, and you could log those. And there are also difficulties once you start doing that: the measurement itself can cause energy consumption, right? And also potentially interfere with your software and cause it to run more slowly and potentially use more energy.

And so, like, there's difficulties there at that level. Yeah, but there's a whole suite of tools that are appropriate for different uses and purposes, right? Like measuring the power at the wall, going into the data center may be useful at the data center or multiple data center level. Still doesn't tell you all the story, right?

Like, the losses in the transmission lines and where that power came from are still not accounted for, right? But it also doesn't give you a sense for, like, what happens if I take interventions at the user level? It's very hard to see that from that high level, right? Because there are many things running on the system, different conditions there. From the user's point of view, they might only care about, like, you know, this one key piece of my software that's running, you know, like the kernel of this deep learning network.

How much energy is that taking? How much additional energy is that taking? And that's like a very different thing that very different measurements are appropriate for and interventions, right?

Like changing that little, you know, optimizing a little piece of code versus like, maybe we need to change the way our cooling system works on the whole data center or the way that we schedule jobs. Yeah, and the paper goes through many of these levels of granularity.
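
As a concrete illustration of the node-level hardware counters Charles describes, here is a rough sketch that polls an NVIDIA GPU's reported power draw via NVML and, where available, Intel's RAPL energy counter for the CPU package. The paths, device index, and sampling interval are assumptions to adapt to your own system, and the interval itself is a trade-off: sampling too often adds exactly the kind of measurement overhead discussed above.

```python
# Rough sketch: sample node-level power from hardware counters, as discussed above.
# The RAPL path, GPU index, and interval are assumptions; reading RAPL may need
# elevated permissions, and the counters themselves are estimates, not ground truth.
import time
from pathlib import Path

import pynvml  # NVIDIA management library bindings

RAPL_PATH = Path("/sys/class/powercap/intel-rapl:0/energy_uj")  # CPU package 0, if present


def sample_power(seconds=60, interval=1.0):
    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    cpu_start_uj = int(RAPL_PATH.read_text()) if RAPL_PATH.exists() else None

    gpu_watts = []
    end = time.time() + seconds
    while time.time() < end:
        gpu_watts.append(pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0)  # NVML reports milliwatts
        time.sleep(interval)  # coarser interval = less observer effect

    if cpu_start_uj is not None:
        cpu_joules = (int(RAPL_PATH.read_text()) - cpu_start_uj) / 1e6  # counter is in microjoules
        print(f"CPU package energy over window: {cpu_joules:.1f} J")
    print(f"Mean GPU power: {sum(gpu_watts) / len(gpu_watts):.1f} W over {len(gpu_watts)} samples")
    pynvml.nvmlShutdown()


if __name__ == "__main__":
    sample_power()
```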

Chris Adams: Yeah, so this is one thing that really kind of struck out at me because when you, it started at the kind of facility level, which is looking at an entire building where you mentioned things like say, you know, power coming into the entire facility. And then I believe you went down to looking at say the, within that facility, there might be one or more data centers, then you're going down to things like a rack level and then you're going down to

kind of at a node level, and then even going all the way down to, like, a particularly tight loop or the equivalent of that. And when you're looking at things like this, there are questions about what happens

if you make something particularly efficient at, say, the bottom level, the node level: that doesn't necessarily have an impact higher up, for example, because that capacity might just be reallocated to someone else.

For example, it might just be that there's a certain minimum amount of power draw that you aren't able to have much of an impact on. I mean, these are some of the things

I really appreciated the paper breaking out, because one thing that was, I guess, counterintuitive when I was looking at this was that things you might do at one level can actually hinder steps further down, for example, and vice versa.

Charles Tripp: Yeah, that's right. I mean, I think, two important sort of findings are, yeah, like battle scars that we got from doing these measurements. And one data set we produced is called BUTTER-E, which is like a really large scale measurement of energy consumption of training and testing neural networks and how the architecture impacts it.

And we were trying to get reasonable measurements while doing this. And one of the difficulties is that comparing measurements between runs on different systems, even if they're identically configured, can be tricky, because different systems, based on, you know, manufacturing variances, the heat, you know, like how warm is that system at that time,

anything that might be happening in the background or over the network, anything that might be just a little different about its environment, can have real, measurable impacts on the energy consumed. So, comparing energy consumption between runs on different nodes, even with identical configurations, we had to account for biases like, oh, this node draws a little bit more power than this one at idle.

And we have to like, adjust for that in order to make a clear comparison of what the difference was. And this problem gets bigger when you have different system configurations or even same configuration, but running in like a totally different data center. So that was like one tricky finding. And I think two other little ones I can mention, maybe we could go into more detail later. But, 

another one, like you mentioned, is the overall system utilization, and how that's impacted by a particular piece of software, a particular job running, is going to vary a lot based on what the other users of the system are doing and how that system is scheduled.

So, you can definitely get into situations where, yeah, I reduced my energy consumption, but at the total system level that energy is just going to be used some other time, especially if the energy savings I get are from shortening the amount of time I'm using a resource that then goes to someone else.

But it does mean that the computing is being done more efficiently, right? Like, if everyone does that, then more computing can be done within the same amount of energy. But it's hard to quantify that. Like, what is my impact? It's hard to say, right?
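
Charles's point about per-node idle baselines can be made concrete with a small sketch: measure each node's idle draw first, then subtract that baseline before comparing runs. The node names, wattages, and durations below are invented purely for illustration.

```python
# Illustrative sketch: correct for per-node idle-power bias before comparing runs.
# Node names, wattages, and durations are invented for illustration only.
idle_baseline_w = {"node-a": 212.0, "node-b": 219.5}  # measured while each node sat idle

runs = [
    {"node": "node-a", "mean_power_w": 385.0, "duration_s": 3600},
    {"node": "node-b", "mean_power_w": 391.0, "duration_s": 3600},
]

for run in runs:
    dynamic_w = run["mean_power_w"] - idle_baseline_w[run["node"]]
    dynamic_kwh = dynamic_w * run["duration_s"] / 3.6e6  # watt-seconds (joules) to kWh
    print(f"{run['node']}: {dynamic_w:.1f} W above idle, "
          f"~{dynamic_kwh:.3f} kWh attributable to the job")
```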

Chris Adams: I see, yeah. And Dawn, go on, I can see you nodding, so I want you to come in now.

Dawn Nafus: If I can jump in a bit, I mean, I think that speaks to one of the things we're trying to bring out, maybe not literally, but make possible, which is this: those things could actually be better aligned in a certain way, right? Like, the energy that is, you know, for example, when there is idle time, right?

I mean, there are things that data center operators can do to reduce that, right? You know, you can bring things into lower power states, all the rest of it, right? But at the same time, the developer can't control it, and if they don't actually know that's going on, and it's just like, well, it's there anyway, there's nothing for me to do, right, that's also a problem, right?

So in a way, you've got two different kinds of actors looking at it in very different perspectives. And the clearer we can get about roles and responsibilities, right, you can start to do things like reduce your power when things are idling. Yes, you do have that problem of somebody else is going to jump in. But Charles, I think as your work shows, you know, there's still some idling going on, even though you wouldn't think, so maybe you could talk a little bit about that.

Charles Tripp: Yeah, so one really interesting thing that I didn't expect going into doing these measurements in this type of analysis was, well, first, I thought, "oh great, we can just measure the power on each node, run things and compare them." And we ran into problems immediately. Like, you couldn't compare the energy consumption from two identically configured systems directly, especially if you're collecting a lot of data, because one is just going to use like slightly more than the other because of the different variables I mentioned.

And then when you compare them, you're like, well, that run used way more energy, but it's not because of anything about how the job was configured. It's just, that system used a little bit more. So if I switch them, I'd get the opposite result. So that was one thing. But then, as we got into it and we were trying to figure out, okay, well, now that we figured out a way to account for these variations, let's see what the impact is of running different software with different configurations, especially like neural networks, different configurations on energy consumption and our initial hypothesis was that it was based on mainly the size of the neural network and, you know, like how many parameters basically, like how many calculations, these sorts of things.

And if you look in the research, a lot of the research out there about making neural networks, and largely algorithms in general, more efficient focuses on how many operations, how many flops does this take, you know? And, look, we reduced it by a huge amount, so that means we get the same energy consumption reductions.

We kind of thought that was probably true for the most part. But as we took measurements, we found that had almost no connection to how much energy was consumed. And the reason was that the amount of energy consumed had way more to do with how much data was moved around on the computer. So how much data was loaded from the network?

How much data was loaded from disc? How much data was loaded from disc into memory, into GPU RAM for using the GPU, into the different caching levels and red, even the registers? So if we computed like how much data got moved in and out of like level two cache on the CPU, we could see that had a huge correlation, like almost direct correlation with energy consumption. Not the number of calculations.

Now, you could get in a situation where, like, basically no data is leaving cache, and I'm doing a ton of computing on that data. In that case, the number of calculations probably does matter, but in most cases, especially in deep learning, it has almost no connection; it's the amount of data moved. So then we thought, okay, well, it's the amount of data moved.

It's the data moving. The data has a certain cost. But then we looked deeper, and we saw that actually the amount of data moved is not really what's causing the energy to be consumed. It's the stalls while the system is waiting to load the data. It's waiting for the data to come from, you know, system memory into level three cache.

It needs to do some calculations on that data. So it's pulling it out while it's sitting there waiting. It's that idle power draw. Just it could be for like a millisecond or even a nanosecond or something, right? But it adds up if you have, you know, billions of accesses. Each of those little stalls is drawing some power, and it adds up to be quite a significant amount of power.

So we found that actually the driver of the energy consumption, the primary driver by far in what we were studying in deep learning was 

the idle power draw while waiting for data to move around the system. And this was like really surprising because we started with number of calculations, it turns out almost irrelevant.

Right. And then we're like, well, is it the amount of data moved around? It's actually not quite the amount of data moved around, but that does, like, cause the stalls whenever I need to access the data; it's really that idle power draw. And I think that's probably true for a lot of software.
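
To see why the stalls can dominate, here is a back-of-the-envelope sketch. The per-byte energy, idle power, and stall time are illustrative placeholders, not measurements from the BUTTER-E work; the point is only the orders of magnitude.

```python
# Back-of-the-envelope sketch: idle draw during micro-stalls versus the energy of
# moving the bytes themselves. All numbers are illustrative placeholders.
bytes_moved = 50e9          # 50 GB shuffled through the memory hierarchy
energy_per_byte_j = 20e-12  # tens of picojoules per byte, order of magnitude only
move_energy_j = bytes_moved * energy_per_byte_j

idle_power_w = 250.0        # node power while cores and GPU sit waiting on data
total_stall_s = 120.0       # accumulated micro-stalls over the whole run
stall_energy_j = idle_power_w * total_stall_s

print(f"Energy to move the data itself: ~{move_energy_j:.1f} J")
print(f"Idle draw during stalls:        ~{stall_energy_j:.0f} J")
print(f"Stalls dominate by a factor of ~{stall_energy_j / move_energy_j:.0f} in this example")
```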

Chris Adams: Yes. I think that does sound about right.

I'm just gonna try to see if I follow that, because I think there were a few quite important ideas there. But also, if you aren't familiar with how computers are designed, it might not be obvious, so I'll try to paraphrase it. So we've had this idea that the main thing is, like, the number of calculations being done. That's, like, what we thought was the key idea.

But, 

Charles Tripp: How much work, you know.

Chris Adams: Yeah, exactly. And what we know is that inside a computer you have, like, multiple layers of, let's call them, caches, or multiple layers where you might store data so it's easy and fast to access, but that starts quite small and then gets larger and larger, and a little bit slower, as you go.

So you might have, like you said, L2 cache, for example, and that's going to be much, much faster, but smaller than, say, the RAM on your system, and then if you go a bit further down, you've got, like, a disk, which is going to be way larger, but somewhat slower still. So moving between these stages so that you can process the data, that was actually one of the things that you were looking at, and then it turned out that, well, there is some correlation there, but one of the key drivers actually is the chip sitting in a ready state, waiting for that stuff to come in.

They can't really be asleep, because they know the data is going to have to come in and they'll have to process it. They have to be almost, like, anticipating at all these levels. And that's one of the big drivers of the resource use and

the energy use. 

Charles Tripp: I mean, so, like, what we saw was, we actually estimated how much energy it took, like, per byte to move data from, like, system RAM to level three cache to level two to level one to a register, at each level. And in some cases, it was so small we couldn't even really estimate it. But in most cases, we were able to get an estimate for that. A much larger cost, though, was initiating the transfer, and even bigger than that was just the idle power draw during the time that the program executed, and how long it executed for. And by combining those, we were able to estimate that most of that power consumption, like 99 percent in most cases, was from that idle time, even those little micro stalls waiting for the data to move around. And that's because moving the data, while it does take some energy, doesn't take that much in comparison to the amount of energy of, like, keeping the RAM on while the data is just, like, alive in the RAM, or keeping the CPU active, right?

Like, CPUs can go into lower power states, but generally at least part of the system has to shut down, so doing it at a very fine-grained scale is not really feasible.

Many systems can change power state at a faster rate than you might imagine, but it's still a lot slower than, you know, the per-instruction, per-byte level of, like, I need to load this data.

Like, okay, shut down the system and wait a second, right? Well, not a second, like a few nanoseconds. It's just not practical to do that. And so it's just keeping everything on during that time that's sucking up most of the power. So one strategy, a simple strategy, but difficult to implement in some cases, is to initiate that load, that transfer, earlier.

So if you can prefetch the data into the higher levels of memory before you hit the stall where you're waiting to actually use it,

you can probably significantly reduce this power consumption, due to that idle wait. But it's difficult to figure out how to properly do that prefetching. 
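
Charles's prefetching suggestion maps onto familiar knobs in, for example, PyTorch: background workers that load the next batch while the current one is processed, pinned host memory, and non-blocking copies to the GPU. This is a generic sketch of that pattern, not the setup used in the paper; the dataset, model, and sizes are placeholders.

```python
# Generic PyTorch sketch of the prefetching idea: overlap data loading and
# host-to-device copies with compute so the chip spends less time stalled.
# The dataset, model, and sizes are placeholders, not the paper's setup.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 256), torch.randint(0, 10, (10_000,)))
loader = DataLoader(
    dataset,
    batch_size=128,
    num_workers=4,      # workers prefetch upcoming batches in the background
    pin_memory=True,    # page-locked host memory enables asynchronous copies
    prefetch_factor=2,  # batches each worker keeps ready ahead of time
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(256, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, targets in loader:
    # non_blocking=True lets these copies overlap with GPU work already in flight
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```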

Chris Adams: Ah, I see. Thanks, Charles. So it sounds like, okay, we might kind of approach this and there might be some things which feel intuitive, but it turns out there are quite a few counterintuitive things.

And like, Dawn, I can see you nodding away sagely here and I suspect there's a few things that you might have to add on this. Because this is, I mean, can I give you a bit of space, Dawn, to kind of talk about some of this too, because I know that this is something that you've shared with me before, is that yeah, there are maybe some rules of thumb you might use, but it's never that simple, basically, or you realise actually that there's quite a bit more to it than that, for example.

Dawn Nafus: Exactly. Well, I think what I really learned out of this effort is that measurement can actually recalibrate your rules of thumb, right? So you don't actually have to be measuring all the time for all reasons, but even just take the simple, I mean, not so simple, story that Charles told. Okay, you know, so I spent a lot of time talking with developers and trying to understand how they work, and at a developer perception level, right?

What do they feel like? What's palpable to them, right? Send the stuff off, go have a cup of coffee, whatever it is, right? So they're not seeing all that, you know, and, you know, when I talk to them, most of them aren't thinking about the kinds of things that were just raised, right? Like how much data are you looking at a time?

You can actually set and tweak that. 

And that's the kind of thing, you know, folks develop an idea about, and they don't think too hard about it usually, right? So, with measuring, you can start to actually recalibrate the things you do see, right? I think this also gets back to, you know, why is it counterintuitive that it's some of these mechanisms and how you actually are training, as opposed to how many flops you're doing, how many parameters? Why is that counterintuitive?

Well, at a certain level, you know, the number of flops does actually matter, right? If we do actually have gigantic, you know, what I'm gonna call foundation-model-type-size stuff, and I'm gonna build out an entire data center for it, it does matter. But as you get, you know, down and down and more specific, it's a different ball game.

And there are these tricks of scale that run sort of throughout this stuff, right? Like the fact that, yes, you can make a credible claim that a foundation model will always be more energy intensive than, you know, something so small you can run on a laptop, right? That's always going to be true, right? No measurement necessary, right? You keep going down and down, and you're like, okay, let's get more specific. You can get to where our frustration really started: if you try to go to the extreme, right, try to chase every single electron through a data center, you're not going to do it. It feels like physics, it feels objective, it feels true, but at minimum you start to hit the observer effect, right, which is what we did.

My colleague Nicole Beckage was trying to measure at an epoch level, right, essentially a mini round of training. And what she found was that, you know, she was trying to sample so often that she was pulling energy out of the processing, and it just messed up the numbers, right? So you can try to get down into, you know, what feels like more accuracy, and then all of a sudden you're in a different ballpark. So these tricks of, like, aggregation and scale, and what you can say credibly at what level, I think are fascinating, but you kind of have to get a feel for it, in the same way that you can get a feel for, "yep, if I'm sending my job off, I know I have at least, you know, however many hours or however many days," right?
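
One way around the observer effect Dawn describes is to read a cumulative hardware energy counter once per epoch instead of polling power at high frequency. The sketch below uses NVML's total-energy counter, which recent NVIDIA GPUs expose; the training function is a placeholder, and CPU, memory, and cooling energy are not captured here.

```python
# Sketch: per-epoch GPU energy without high-frequency polling, to limit the
# observer effect described above. train_one_epoch() is a placeholder, and only
# GPU energy is captured; CPU, memory, and cooling are not included.
import pynvml


def train_one_epoch(epoch):
    ...  # your training loop goes here


pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

for epoch in range(10):
    start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(gpu)  # millijoules since driver load
    train_one_epoch(epoch)
    end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(gpu)
    print(f"epoch {epoch}: ~{(end_mj - start_mj) / 1000.0:.1f} J on the GPU")

pynvml.nvmlShutdown()
```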

Charles Tripp: There's also so much variation that's out of your control, right? Like, one run to another, one system to another, even different times when you ran on the same system, can cause measurable and in some cases significant variations in the energy consumption.

So it's more about, I think about understanding what's causing the energy consumption.

I think that's the more valuable thing to do. But it's easy to like, be like, "I already understand it." And

 I think there's a, there's like a historical bias towards number of operations because in old computers without much caching or anything like this, right? Like I restore old computers and, like an old 386 or IBM XT, right?

Like, it's running, it has registers in the CPU and then it has main memory. And almost everything is basically: how many operations I'm doing is going to closely correlate with how fast the thing runs and

probably how much energy it uses, because most of the energy consumption on those systems is just basically constant, no matter what I'm doing, right?

It's just it doesn't like idle down the processor while it's not working, right? And there's a historical bias. It's built up over time that, like, was focused on the, you know, and it's also at the programmer level. Like, I'm thinking about what is the computer doing? 

Chris Adams: What do I have control over?

Charles Tripp: But it's only through actually measuring it that you gain a clearer picture of, like, what is actually using energy.

And I think if you get that picture, then you'll gain an understanding more of

how can I make this software or the data center or anything in between like job allocation more energy efficient, but it's only through actually measuring that we can get that clear picture. Because if we guess, especially using kind of our biases from how we learn to use computers, how we learn about how computers work, we're actually very likely to get an incorrect understanding, incorrect picture of what's driving the energy consumption.

It's much less intuitive than people think.

Chris Adams: Ah, okay, there's a couple of things I'd like to comment on, and then, Dawn, I might give you a bit of space on this. So we were just talking about flops as a thing that people are used to looking at, and it's literally written into the AI Act, like, things above a certain number of flops are considered, you know, foundational models, for example. So, you know, that's a really good example of where this actually shows up.

And I guess the other thing that I wanted to kind of like touch on is that, I work in the kind of web land, and like, I mean, the Green Web Foundation is a clue in our organization's name. We've had exactly the same thing, where we've been struggling to understand the impact of, say, moving data around, and whether, how much credence you should give to that versus things happening inside a browser, for example.

It looks like you've got some similar kinds of issues and things to be wrestling with here. But Dawn, I wanted to give you a bit of space because both of you alluded to this idea of having an understanding of what you can and what you can't control, and how you might have a bias for doing one thing and then miss something much larger elsewhere, for example.

Can I maybe give you a bit of space to talk about this idea of, okay, well, which things do you, should you be focusing on, and also understanding of what's within your sphere of influence? What can you control? What can't you control, for example?

Dawn Nafus: Exactly. I think it's in a sense you've captured the main point, which is, you know, that measurements are most helpful when they are relevant to the thing you can control, right? So as a very simple example, you know, there are plenty of AI developers who have a choice in what data centers they can use.

There are plenty who don't, right? You know, when Charles worked at NREL, right, the supercomputer was there. That was it. You're not moving, right? 

So, if you can move, you know, that overall data center efficiency number really matters, because you can say, alright, "I'm putting my stuff here and not there." If you can't move, like, there's no need to mess with it. It is what it is, right? At the same time, and this gets us into this interesting problem, again, a tension between how you might look at it from a policy perspective versus what a developer might look at. We had a lot of, kind of, you know, can I say, come-to-Jesus?

We had a little moment

where we, can I say that on a podcast? I think I can. Where there was this question of, 

are we giving people a bum steer by focusing on, you know, granular developer-level stuff, right? Where so much of it actually is in how you run the data center, right? So again, you talk about tricks of scale. On the one hand, you know, the amount of energy that you might be directly saving just by, you know, using or not using something, by the time all of those things move through the grid and you're talking about energy coming off of the transmission cables, right, in aggregate might not actually be directly that big. It might be, but it might not be. And then you flip that around and you think about what aggregate demand looks like, and the fact that so much of AI demand, you know, is what's putting pressure on our electricity grid.

Right? Then that's the most effective thing you could do, is actually get these, you know, very specific individual jobs down and down, right? So, again, it's all about what you can control, but there are these, whatever perspective you take is just going to flip your, you know, your understanding of the issue around.

Chris Adams: So this was actually one thing I quite appreciated from the paper. It does touch on this idea that, yeah, you might be focusing on the thing that you feel you're able to control, but just because you're able to make one part of this very efficient, that doesn't necessarily translate into a saving higher up in the system, simply because if higher up in the system isn't set up to actually take advantage of that, then you might never achieve some of these savings. It's a little bit like when you're working in cloud, for example: people tell you to do all these things to optimize your cloud usage, but if people are not turning data centers off, at best you might be slowing the growth of infrastructure rollout in future. And these are much, much harder things to claim responsibility for, or to say, "yeah, if it weren't for me doing those things, we wouldn't have had that happen."

This is one of the things that I appreciated the paper making some allusions to. To be honest, when I was reading this, I was like, wow, there's obviously some stuff for beginners, but there's actually quite a lot here which is quite meaty for people who are thinking about it at a much larger, systemic level.

So there's definitely things like experts could take away from this as well. So, I just want to check, are there any particular takeaways the two of you would like to kind of draw people's attention to beyond what we've been discussing so far? Because I quite enjoyed the paper and there's a few kind of nice ideas from this. Charles, if I just give you a bit of space to, kind of, come in. 

Charles Tripp: Yeah. I've got, kind of two topics that I think build on what we talked about before, but could be really useful for people to be aware of. So one is, sort of one of the outcomes of our studying of the impact of different architectures, data sets, hyper parameter settings on deep neural network energy consumption was that the most efficient networks, most energy efficient networks, and largely that correlates with most time efficient as well, but not always, the most efficient ones were not the smallest ones, and they were not the biggest ones, right?

The biggest ones just required so much data movement. They were slow. The smallest ones took a lot more iterations, right? It took a lot more for them to learn the same thing. And the most efficient ones were the ones where the working sets, the amount of data that was moved around, matched the different cache sizes.

So as you made the network bigger, it got more efficient because it learned faster. Then, when it got so big that the data between layers, the communication between layers, for example, started to spill out of a cache level, it became much less energy efficient, because of those data movement stalls happening.

So we found that, like, there is an optimum point there. And for most algorithms this is probably true: if the working set is sized appropriately for the memory hierarchy, you gain the most efficiency, right? Because generally, like, if I can use more data at a time, I can get my software to work better, right, more efficiently. But there's a point where it falls out of the cache and that becomes less efficient. Exactly where that point is, is going to depend on the software. But I think focusing on that working set size and how it matches the hardware is a really key piece for almost anyone looking to optimize software for energy efficiency: think about how much data am I moving around and how does that map to the caches. So that's, like, a practical thing.
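
As a rough illustration of the working-set idea, the sketch below estimates the activation bytes passed between two dense layers for a given batch size and compares them against a cache size. The cache figure and layer widths are arbitrary examples, not guidance for any particular chip or the sizing used in the paper.

```python
# Rough sketch of the working-set idea: estimate the activations exchanged between
# two dense layers per batch and compare against a cache size. All sizes here are
# arbitrary examples, not recommendations for any particular chip.
BYTES_PER_FLOAT = 4
EXAMPLE_L2_CACHE_BYTES = 4 * 1024 * 1024  # illustrative cache size


def interlayer_working_set(batch_size, layer_width):
    # activations handed from one layer to the next, forward pass only
    return batch_size * layer_width * BYTES_PER_FLOAT


for width in (256, 1024, 4096, 16384):
    ws = interlayer_working_set(batch_size=128, layer_width=width)
    verdict = "fits in" if ws <= EXAMPLE_L2_CACHE_BYTES else "spills out of"
    print(f"layer width {width:>6}: ~{ws // 1024} KiB of activations, {verdict} the example cache")
```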

Chris Adams: Can I stop you? Because I find that quite interesting, in that a lot of the time as developers we're kind of taught to abstract away from

the underlying hardware, and that seems to be going the other way. That's saying, "no, you do need to be thinking about this. You can't. There's, you know, no magic trick." 

Charles Tripp: Right? And so, like, for neural networks, that could mean sizing my layers so that those working sets match the cache hierarchy, which is something that no one even considers. It's not even close in most architectures. Like, no one has even thought about this. The other thing is on your point about data center operations and kind of the different perspectives,

one thing that we started to think about as we were doing some of this work was that it might make sense to allocate time, or in the case of, like, a commercial data center, a commercial cloud operator, even charge fees based at least partly on the energy rather than the time, so as to incentivize them to use less energy, right?

Like, make things more energy efficient. Those can be correlated, but not always, right? And another piece of that same puzzle that I want to touch on is, from a lot of data center operators' perspective, they want to show their systems fully utilized, right? Like, there's demand for the system, so we should build an even bigger and better system. When it comes to energy consumption,

that's probably not the best way to go, because that means that those systems are sitting there probably doing inefficient things. Maybe even idling a lot of the time, right? Like, a user allocated the node, but it's just sitting there doing nothing, right? It may be more useful, instead of thinking about, like, how much is the system always being utilized,

to think about how much computation, or how many jobs, or whatever your utilization metric is, do I get, like, per unit energy, right? Or per unit carbon, right? And you may also think about, like, how much energy savings can I get by doing things like shutting down nodes when they're unlikely to be utilized, and more about having a dynamic capacity, right?

Like, at full tilt I can do however many flops or whatever, right? But I can also scale that down to reduce my idle power draw by, you know, 50 percent in low demand conditions. And if you have that dynamic capacity, you may actually be able to get even more throughput, but with less energy, because when there's no demand,

I'm, like, scaling down my data center, right? And then when there's demand, I'm scaling it up. But these are things that require cultural changes in data center operations to happen.
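
Charles's suggestion to track useful work per unit of energy, rather than raw utilization, can be sketched as a simple aggregation over job logs. The log fields and numbers here are invented for illustration; in practice the "useful work" unit would be whatever throughput metric a facility already reports.

```python
# Illustrative sketch: report throughput per unit of energy instead of raw
# utilization. Job log fields and numbers are invented for illustration.
jobs = [
    {"name": "train-a", "useful_work_units": 1200, "energy_kwh": 14.0},
    {"name": "train-b", "useful_work_units": 900, "energy_kwh": 6.5},
    {"name": "idle-reservation", "useful_work_units": 0, "energy_kwh": 3.2},
]

total_work = sum(j["useful_work_units"] for j in jobs)
total_kwh = sum(j["energy_kwh"] for j in jobs)

print(f"Cluster delivered {total_work / total_kwh:.1f} work units per kWh overall")
for j in jobs:
    rate = j["useful_work_units"] / j["energy_kwh"]
    print(f"  {j['name']}: {rate:.1f} units per kWh")
```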

Chris Adams: I'm glad you mentioned this, because, Dawn, I know that you had some notes about this. It sounds like in order to do that, you probably need different metrics exposed, or different kinds of transparency to what we have right now.

Probably more actually. Dawn, can I give you a bit of space to talk about this? Because this is one thing that you told me about before and it's something that is actually touched on in the paper quite a few times actually.

Dawn Nafus: Yeah, I mean, I think we can notice a real gap, in a way, between the kinds of things that Charles brings his attention to and the kinds of things that show up in policy environments, in responsible AI circles, right, where I'm a bit closer, and where we can be a bit vague. And I think we are at the stage where, at least my read on the situation is that, regardless of where you sit in the debates, and there are rip-roaring debates about what to do about the AI energy situation, transparency is probably the one thing we can get the most consensus on. But then, just back to that, what the heck does that mean? And I think we need a few more beats than are currently given to asking: what work are those measurements actually doing?

You know, some of the feedback we've gotten is, you know, "well, can't you just come up with a standard?" Like, what's the right standard? It's like, well, no, actually, if data centers aren't standard, and there are many different ways to build a model, then, yes, you can have a standard as a way of having a conversation across a number of different parties to do a very specific thing, like for example, Charles's example, you know, suggested that if we're charging on a per energy basis, that changes a whole lot. Right? But what you can't do is to say, this is the standard that is the right way to do it, and then that meets the requirement, because that's, you know, what we found is that clearly the world is far more, you know, complicated and specific than that.

So, I, you know, I would really encourage the responsible AI community to start to get very specific very quickly, which I don't yet see happening, but I think it's just on the horizon. 

Chris Adams: Okay. Well, I'm glad you mentioned maybe taking this a little bit wider, 'cause we've spent a lot of time talking about this paper, but there are other things happening in the world of AI, actually, and I wanna give you folks a bit of space to talk about anything that you would like to direct some attention to, or that you've seen and found particularly interesting.

Charles, can I give you some space first, and then give Dawn the same, to either shout out or point to some particular things that, if people have found this conversation interesting so far, they might want to be looking at? More data.

Charles Tripp: Yeah. I mean, I think, both in computer science at large and especially in machine learning, we've kind of had an attitude, especially within deep learning, of throwing more compute at the problem, right? And more data. The more data that we put through a model, and the bigger, the more complicated the model is, the more capable it can be.

But this brute force approach is one of the main things that's driving this increasing computing energy consumption. Right? And I think that it is high time that we start taking a look at making the algorithms we use more energy efficient instead of just throwing more compute. It's easy to throw more compute at it, which is why it's been done.

And also because there hasn't been a significant, like, material incremental cost, of like, "oh, you know, now we need twice as many GPUs. No big deal." But now we're starting to hit constraints because we haven't thought about that incremental energy cost. We haven't had to, as an industry at large, right?

Like, but 

now it's starting to be like, well, we can't build that data center because we can't get the energy to it that we need to do the things we want to do with it because we haven't taken that incremental cost into account over time, we just kind of ignored it. And now we hit like the barrier, right? 

And so I think thinking about the energy costs, and probably this means investing in finding more efficient algorithms, more efficient approaches, as well as more efficient ways to run data centers and run jobs, that's gonna become increasingly important, even as our compute capacity continues to increase.

The energy costs are likely to increase along with that as we use more and more, and we need to create more generation capacity, right? Like, it's expensive at the point where we're really driving that energy production, and that's going to be an increasingly important cost, as well as, like it is now, starting to be a constraint on what kind of computing we can do.

So I think investing in more efficient approaches is going to be really key in the future. 

Chris Adams: There's one thing that I think Dawn might come in on here, actually. It seems that you're talking about having more of a focus on surfacing the fact that resource efficiency is actually going to be something that we probably need to value, because as I understand it, so far it's not particularly visible in benchmarks or anything like that right now. And if you have benchmarks deciding what counts as a good model or a good use of this, until that's included, you're not going to have anything like this. Is that the kind of stuff you're suggesting we should probably have? Like, some more recognition of the energy efficiency of something, and that being the thing that you draw attention to, or that you include in counting something as good or not, essentially.

Dawn Nafus: You know, I have a particular view of efficiency. I suspect many of your listeners might, as well. You know, I think it's notable that at the moment, when we're seeing, you know, the model of the month apparently, or the set of models, DeepSeek, come onto the scene, immediately we're starting to see, for the first time, you know, the Jevons paradox showing up in the public discourse.

So this is the paradox that when you make things more efficient, you can also end up stimulating so much demand... 

Chris Adams: Absolute use grows even though it gets individually more efficient.

Dawn Nafus: Yeah, exactly. Again, this is, like, this topsy-turvy world that we're in. And so, you know, now the Jevons paradox is front page news. My view is that, yes, again, we need to be particular about what sorts of efficiencies we are looking for, and where, and not, you know, sort of willy-nilly create an environment, which I'm not saying you're doing, Charles, but what we don't want to do is create an environment where, if you can just say it's more efficient, then somehow, you know, we're all good, right? Which is, you know, what some of the social science of Energy Star has actually suggested is going on. 

With that said, right, 

I am a big fan of the Hugging Face Energy Star initiative. That looks incredibly promising. And I think one of the things that's really promising about it, so this is, you know, leaderboards where people put their models up on Hugging Face, there's some energy measurement that happens, some carbon measurement, and then, you know, leaderboards are created and all the rest of it. And I think one of the things it's really good at, right, and I can imagine issues as well, but A, you're creating a way to give some people credit for actually looking. B, you're creating a way of distinguishing between two models very clearly, right? So in that context, do you have to be perfect about how many kilowatts or watts or whatever it is? No, actually, right? You know, you're looking at more or less comparable models. But C, it also interjects this kind of path dependence. Like, who is the next person who uses it? Right?

That really matters. If you're setting up something early on, yes, they'll do something a little bit different. They might not just run inference on it. But you're, changing how models evolve over time and kind of steering it towards even, you know, having energy presence at all. So that's pretty cool to my mind.

So I'm looking forward to... 

Chris Adams: Cool. We'll share a link to the Hugging Face one. Do you know what it was called? I think it was initially called the Energy Star Alliance, and then I think they've been told that they need to change the name to the Energy Score Alliance, because I

think Energy Star turned out to be a trademark. But we can definitely add a link to that in the show notes, because this is actually something that's officially visible now. It's something that people have been working on since late last year, and we'll share a link to the actual GitHub repo, to the code on GitHub to run this, because it works for both closed source models and open source models. So it does give some of that visibility. Also in France, there is the Frugal LLM challenge, which also sounds similar to what you're talking about, this idea of essentially trying to pay a bit more attention to the energy efficiency aspect of this. And I'm glad you mentioned the DeepSeek thing as well, because suddenly everyone in the world is an armchair expert on William Stanley Jevons' paradox.

Everybody knows! Yeah. 

Dawn Nafus: Actually, if I could just add one small thing, since you mentioned the Frugal effort in France, there's a whole computer science community, sort of almost a step removed from the AI development community, that's really into just asking, "look, you know, what is the purpose of the thing that I'm building, period."

And that, you know, frugal computing, computing within limits, all of that world is really about how do we get, you know, just something that somebody is going to actually value, as opposed to getting to the next, you know, score on a benchmark leaderboard somewhere. So I think that's also kind of lurking in the background here.

Chris Adams: I'm glad you mentioned this. What we'll do is add links to both of those. And you immediately make me think of, so we're technologists mostly, the three of us talking about this, and I work in a civil society organization, and just this week there was a big announcement, a kind of set of demands from civil society about AI, that's being shared at the AI Action Summit, this big summit where all the great and good are meeting in Paris, as you alluded to, next week, to talk about what should we do about this. It's literally called Within Bounds, and we'll share a link to that. And it does talk about this: well, you know, if we're going to be using things like AI, we need to have a discussion about what they're for. And it's the first thing I've seen which actually has discussions about saying, well, we should actually be having some concrete limits on the amount of energy for this, because we've seen that if this is a constraint, it doesn't stop engineers.

It doesn't stop innovation. People are able to build new things. What we should also do is share a link to, I believe, Vlad Coroamă. We did an interview with him all about the Jevons paradox, I think late last year, and that's a really nice deep dive for people who want to sound knowledgeable in these conversations on LinkedIn or social media right now. It's a really useful one there as well. 

Okay, so we spoke a little bit about these ones here. Charles, are there any particular projects you'd like to kind of like name check before we start to wrap up? Because I think we're coming up to the hour now, actually.

Charles Tripp: I don't know, not particular, but I did mention earlier, you know, we published this BUTTER-E data set and a paper along with it, as well as a larger one without energy measurements called BUTTER. Those are available online. You can just search for it and you'll find it right away. I think, if that's of interest to anyone hearing this, you know, there's a lot of measurements and analysis in there, including, you know, all the details of analysis that I mentioned where we, had this journey from number of compute cycles to, like, amount of stall, in terms of what drives energy consumption. 

Chris Adams: Ah, it's visible so people can see it. Oh, that's really cool. I didn't realize about that. Also, while you're still here, Charles, while I have access to you, before we did this interview, you mentioned, there's a whole discussion about wind turbines killing birds, and you were telling me this awesome story about how you were able to model the path of golden eagles to essentially avoid these kind of bird strike stuff happening.

Is that in the public domain? Is something, can we link to that? That sounded super cool. 

Charles Tripp: There's several, papers. I'll have to dig up the links, but there's several papers we published and some software also to create these models. 

But yeah, I worked on a project where we looked at, we took, eagle biologists and computational fluid dynamics experts and machine learning experts.

And we got together and we created some models based off of real data, real telemetry tracking golden eagle flight paths in many locations, including at wind sites, and matched that up with the atmospheric conditions, the flow field, like orographic updrafts, which is where the wind hits, you know, like a mountain or a hill, and some of it blows up.

Right. And golden eagles take advantage of this, as well as thermal updrafts caused by heating at the ground causing the air to rise, to fly. Golden eagles don't really like flapping. They like gliding. And because of that, golden eagles and other soaring birds, their flight paths are fairly easy to predict, right?

Like, you may not know, like, oh, are they going to take a left turn here or a right turn there, but generally they're going to fly in the places where there are strong updrafts. And using actual data and knowledge from the eagle biologists and simulations of the flow patterns, we were able to create a model that allows wind turbines to be sited, and also operated, right?

Like, what, under what conditions, like, what wind conditions in particular and what time of year, which also affects the eagles' behavior, should I perhaps reduce my usage of certain turbines to reduce bird strikes? And in fact, we showed that it could be done without significantly, or even at all, impacting the energy production of a wind site.

You could significantly reduce the chances of colliding with a bird.

Chris Adams: And it's probably good for the birds too, as well, isn't it? Yeah.

Alright, we definitely need to find some links for that. That's going to be absolute catnip for the nerdy listeners who are into this. Dawn, can I just give you the last word? We'll add links to you and Charles online, but if there's anything that you would draw people's attention to before we wrap up, what would you plug here? 

Dawn Nafus: I actually did want to just give a shout out to the National Renewable Energy Lab, period. One of the things that is amazing about them, speaking of eagles, a different eagle, is that they have a supercomputer called Eagle. I believe they've got another one now. It is lovingly instrumented with all sorts of energy measurements, basically anything you can think to measure.

I think you can do it in there. There's another data set from another one of our co authors, Hilary Egan, that has some sort of jobs data. You can dig in and explore like what a real world data center job, you know, situation looks like. 

So I just want to give all the credit in the world to National Renewable Energy Lab and the stuff they do on the computing side.

It's just phenomenal.

Chris Adams: Yes, I would echo that very much. I'm a big fan of NREL and the output from them. It's really like a national treasure. Folks, thank you so much for taking me through all of this work, and diving in as deeply as we did, and referring to things that soar as well, actually, Charles. I hope we can do this again sometime soon, but otherwise, have a lovely day, and thank you once again for joining us. Lovely seeing you two again.

Charles Tripp: Good seeing you.

Chris Adams: Okay, ciao!  

Hey everyone, thanks for listening. Just a reminder to follow Environment Variables on Apple Podcasts, Spotify, or wherever you get your podcasts. And please do leave a rating and review if you like what we're doing. It helps other people discover the show, and of course, we'd love to have more listeners.

To find out more about the Green Software Foundation, please visit greensoftware.foundation. That's greensoftware.foundation in any browser. Thanks again, and see you in the next episode.