## TL;DR
- I reverse-engineered the encrypted protocol GoodWe smart meters and solar inverters use to send metrics to the cloud.
- I used this research to build a Prometheus exporter.
## The Sun: so hot right now
I got a solar PV system installed in my house in mid 2023. I did the bare minimum of research beforehand - just talked to a couple of different installers about pricing, sizing and the economics of a battery.
One thing I certainly did not do is any research into brands and their relative hackability or security merits. I just specified that I wanted to monitor the devices and see some metrics. The installer told me that this required a smart meter and a mobile app. Honestly I assumed that all brands would be equally horrific IoT junk, so I just went with the recommendation of the installer. At least that way the electrical functionality had to be reasonable, right?
The result of my lucky dip was a GoodWe DNS G3 Inverter and a GoodWe HomeKit 1000 Smart Meter. These devices look quite slick, and so does the website. They are also popular here in Australia, so my hopes were high that it would be easy to set up local monitoring, because surely someone else had figured out how to do it.
## Post-install setup
### Metrics? You need to be online.
Right after physical installation the system is producing power, but the metrics aren’t visible anywhere. The documented way to see metrics is to connect the device to GoodWe’s cloud, and then use their web UI or mobile app.
The devices act in simultaneous wireless AP and STA modes, and setup works like so:
- Connect to the device’s WLAN, which will be named `Solar-WiFiXXXXXXXX`, where the `X`s are the serial number of the device. The password is, naturally, `admin`.
- Visit the device’s web UI on `10.10.100.253`.
- Log in (using credentials `admin`/`admin`, of course!).
- In the web UI, select the WLAN that you want the device to use to connect to the Internet.
Now the devices are connected to GoodWe’s cloud. But you still can’t see any metrics.
### SEMS Portal account required
The next step is to go to GoodWe’s SEMS Portal and create an account. Then let the installer know that the devices are connected, and the email you used to create an account on the SEMS Portal. Then the installer will email GoodWe (!?) to tell them to associate your account with the serial number of the devices, and at some point GoodWe will action that request (I was assured they checked their inbox regularly).
Finally after a day or so the device’s metrics are visible in the SEMS Portal.
According to this flyer, it seems that the installer would have a portfolio of “power plants”, and they can use the SEMS Portal to perform “Fault self-analysis & troubleshooting”.
> SEMS includes a range of functions and features to ensure reliable operation and to deliver precise information to operators at the press of a button. It is accessible by multiple accounts with different levels of access for owners, installers and EPC companies.
## Post-setup state of play
So now these two devices were physically installed, and connected to GoodWe’s cloud over the internet via my isolated IoT VLAN. But I had questions:
- I wanted to scrape metrics locally, dammit! Why should I have to use the crappy cloud UI or equally bad mobile app?
- What else can GoodWe do with this connection? E.g. can they remotely administer the devices? If so, can I disable this “feature”?
It turns out that the inverter is powered by the solar panels, not by the grid. So it loses power and goes offline as soon as the sun goes down. And since I mostly have time to hack on this stuff after dark, I concentrated on the smart meter.
## Metrics extraction prior art
There is quite a cottage industry online documenting how to extract data from GoodWe inverters. They respond to queries over Modbus, an operational technology (OT) protocol standard. There are many GitHub repositories with useful information about the GoodWe Modbus protocol, such as:
- a Python library for extracting metrics;
- a Home Assistant integration, built on that library; and
- some GoodWe-specific field documentation.
Unfortunately, my HomeKit 1000 smart meter is not supported by any of these libraries.
## Hacking the HomeKit 1000
I’m presenting the process I followed in chronological order. So if you want to find out what actually worked, skip to the end.
### nmap
The first thing I did was fire up `nmap`, and point it at the HK1000.
It showed listening TCP port `23` - good old Telnet!
Connecting to this port and trying Username: `admin`, Password: `admin` gave me a command prompt!
$ nc 192.168.18.17 23
Login as:admin
Password:admin
CMD>?
cfg net os mft
CMD>
Poking around this prompt soon showed that it was pretty limited[^1], and it seemed to be a development interface that was left enabled. I couldn’t get any metrics out of it.
I also ran `nmap` in aggressive mode and was rewarded with a hard crash in the web server, and the device resetting back to factory settings.
### Packet capture
Sniffing the traffic from the device showed that it was connecting out to `tcp.goodwe-power.com:20001`, and sending packets at regular intervals.
However a quick look at the traffic revealed that while the serial number of my inverter was visible, the main body of the payload was a high-entropy blob.
So the metrics data I was after seemed to be encrypted.
I also found a GitHub comment which came to the same conclusion.
### Modbus
There is a GoodWe Modbus protocol spec sheet and register map floating around the internet which was invaluable in understanding how GoodWe encodes metrics from their inverters. From this documentation I built a Modbus scanner that simply queried every register. The address is only 2 bytes wide, so there are ~65k possible addresses.
Unfortunately the HK1000 only returns a value for a single register address. I forget which register it was, but it was something useless like Firmware Version.
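A register sweep of this kind can be sketched roughly as follows. This is only an illustration under stated assumptions, not the scanner described above: it assumes Modbus RTU framing with function code `0x03` (read holding registers), the unit id is a placeholder, and the transport used to deliver the frames to the device is left out entirely.

```python
import struct
from typing import Iterator


def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def read_request(unit: int, address: int, count: int = 1) -> bytes:
    """Build one Modbus RTU 'read holding registers' (0x03) request."""
    body = struct.pack(">BBHH", unit, 0x03, address, count)
    # Modbus RTU appends the CRC low byte first (little-endian).
    return body + struct.pack("<H", modbus_crc16(body))


def sweep(unit: int) -> Iterator[bytes]:
    """Yield one request per possible 16-bit register address (65536 total)."""
    for address in range(0x10000):
        yield read_request(unit, address)
```

Each yielded frame would be sent to the device in turn, recording which addresses produce a normal response rather than an exception or a timeout.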
### AA55 protocol
GoodWe devices also support another (older?) protocol known as the AA55 protocol. I couldn’t find much info about it except for another old spec sheet.
I built a scanner for this too, but the HK1000 didn’t respond to any queries.
### ZZ/5A5A protocol (mobile app)
The SEMS Portal mobile app has an interesting function where you can connect to the `Solar-WiFiXXXXXXXX` network, and configure the device using the app but without any authentication.
Sniffing this traffic (thanks to airodump-ng and Wireshark’s WPA2 decrypt support) shows that the device can be configured without authentication by sending plaintext UDP packets to the right port. Of course, this port is listening on all interfaces so it also probably works via whichever local wifi network you connect the device to. Gross.
However, this protocol appeared to only be used for network configuration. I didn’t find any way of extracting data from the device using this protocol.
### Firmware Reverse Engineering
After no success with the query protocols, I decided that maybe the network was the wrong approach and I should try firmware instead. I managed to dump the firmware of the device using the command prompt and a command similar to this[^2]:
echo -e 'admin\nadmin\nspi rd 0 2097152\n' | nc 192.168.18.17 23 | tee ~/download/hk1000.spi2.img
This hexdump is interspersed with log lines, and the bytes are transposed. So I dumped it twice, diffed the two dumps to eliminate the log lines, and fixed the transposition manually using vim.
Then I unhexlified the binary with xxd:
xxd -r -p hk1000.spi.img > hk1000.spi.bin.img
And ran binwalk over it:
binwalk -eM hk1000.spi.bin.img
This revealed that the OS was eCos RTOS on a MIPS architecture. I spent some time trying to reverse this binary using Ghidra, but honestly I just don’t know what I’m doing when it comes to binary reverse engineering.
Finally, while staring at the binwalk output, these lines caught my eye:
1976456 0x1E2888 AES Inverse S-Box
1977752 0x1E2D98 AES S-Box
### Packet Capture redux
Going back to the packet capture I finally noticed that the length of the encrypted blob section was always a multiple of 16, plus 2.
Wait a second… AES block size is 16 bytes!
## Analysis of the GoodWe metrics protocol
Since this was a black-box analysis, I had to rely on probing via the I/O I controlled: network and power.
### Network “glitching”[^3]
It was at this point that I found what would be the key to cracking the encryption scheme.
Back in October 2021, someone else did basically all the same work I did, and presented it at the Melbourne Linux User’s Group. Not only that, but they put their presentation online! Thank you Danny!
Anyway, Danny made a very interesting observation: if the internet connection went down, the device would buffer messages, and send them all at once when the connection came back up. Crucially, for buffered frames sent in the same second, the first few 16-byte blocks of ciphertext were identical!
I was able to replicate this locally!
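Checking two captured blobs for this pattern is simple to automate. A minimal sketch, assuming the encrypted blobs have already been cut out of two frames (the function name is made up for illustration):

```python
def shared_prefix_blocks(a: bytes, b: bytes, block_size: int = 16) -> int:
    """Count how many leading block_size-byte blocks of a and b are identical."""
    count = 0
    usable = min(len(a), len(b))
    for off in range(0, usable - block_size + 1, block_size):
        if a[off:off + block_size] != b[off:off + block_size]:
            break
        count += 1
    return count
```

A non-zero result for two frames sent in the same second is the signal to look for: with a properly randomised IV, even the first block would differ.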
### Empathy: a powerful reverse-engineering tool
When I’m looking at a problem like this, I like to put myself in the shoes of the developer. What kind of person are they? What are their motivations?
In this case, we can observe:
- Telnet left on in a production firmware image, with credentials `admin:admin`.
- `nmap` can crash the device hard enough to factory reset.
- Packets sent over TCP with identifying data (serial number) in the clear.
- The metrics seem to be poorly encrypted (identical section of ciphertext in consecutive frames).
- Unauthenticated configuration protocol.
- A web UI that looks like it was hacked together in an afternoon. Inspecting the source shows lots of commented out HTML blocks.
In Danny’s presentation, he used this slide after discovering the Telnet port password:
However I think this is more appropriate:
What these observations tell me is that GoodWe doesn’t put a great deal of effort into securing their devices, and therefore the developers working on this device didn’t have much incentive to create a secure protocol. So there’s a chance I can hack around their encryption.
Putting myself into the shoes of these developers, what would I need to implement a metric protocol?
- Framing: this is TCP; it’s a byte stream. So we need a header of some kind to know where frames start.
- Length: how many bytes after the header do we need to read to get the full frame?
- Detecting data corruption: not anything malicious, just bitflips.
Looking at the packet captures, it is easy to see `POSTGW` is the frame header, and the very next field looks like a big-endian encoded int32 with a value consistently three bytes shy of the length of the data before the next `POSTGW`.
That must be the length!
And finally: detecting data corruption. In the GoodWe Modbus document linked above, there is a description of the CRC used to detect data corruption. It is a standard Modbus CRC-16 (two bytes), designed to effectively detect bitflips. Again, assuming I am a software developer who is familiar with Modbus but who has been tasked with sending data over the internet (and didn’t really care much for security), why wouldn’t I use an algorithm or library I am already familiar with?
A quick check proves that running the data between the length field and the last two bytes through the Modbus CRC algorithm returns a value matching the last two bytes of the frame.
My best guess for the length field being three bytes shy of the length of data rather than two is that it is just a sloppy implementation with an off-by-one error, which matches my profile of the developers.
Another data point to paint a picture of the engineering quality: the CRC of frames from the client is encoded in big-endian byte order (same as all the other integers encoded in the protocol).
However the server sends the CRC in little-endian byte order.
Why?
Maybe the server is `x86` and the developer forgot to call `htons()`?
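Putting the framing observations together, a parser might look something like the sketch below. It rests on one reading of the description above, which is an assumption: the bytes following the 4-byte length field number `length + 3`, the last two of them are the CRC, and the CRC covers only the bytes between the length field and the CRC. The `crc_order` parameter is a made-up name that selects big-endian (`">"`, device) or little-endian (`"<"`, server) CRC encoding.

```python
import struct
from typing import Tuple

HEADER = b"POSTGW"


def modbus_crc16(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def parse_frame(buf: bytes, crc_order: str = ">") -> Tuple[bytes, bytes]:
    """Split one frame off the front of buf and return (body, remaining bytes)."""
    if not buf.startswith(HEADER):
        raise ValueError("missing POSTGW header")
    if len(buf) < len(HEADER) + 4:
        raise ValueError("incomplete frame")
    (length,) = struct.unpack_from(">I", buf, len(HEADER))
    start = len(HEADER) + 4
    # Assumption: the length field is three short of (body + 2-byte CRC).
    end = start + length + 3
    if len(buf) < end:
        raise ValueError("incomplete frame")
    body = buf[start:end - 2]
    (crc,) = struct.unpack(crc_order + "H", buf[end - 2:end])
    if crc != modbus_crc16(body):
        raise ValueError("CRC mismatch")
    return body, buf[end:]
```

Calling `parse_frame` repeatedly on the returned remainder walks through a buffer holding several frames back to back.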
Now I just had the encrypted blob to decipher.
### Encryption scheme
I guessed that they must be using AES in CBC mode because:
- The identical section of ciphertext in consecutive frames is a classic CBC failure mode when reusing IVs.
- This is an old mode and widely supported in libraries, making it easy to use.
- Since they don’t care about security they are hardly likely to be using AEAD modes.
When implementing a scheme using CBC, it is critically important that initialization vectors are not reused. Otherwise identical plaintext will give you identical ciphertext. Metrics from a smart meter are highly likely to be the same minute-to-minute, which is probably why we see identical sections of ciphertext in successive frames with the same IV!
A common practice is to prefix the IV to the ciphertext. This is known as an explicit initialization vector, and it doesn’t need to be secret - just randomly generated in a cryptographically secure manner. However what if you are running on a microcontroller without a NRBG? Or maybe you just don’t know or care about CBC footguns? Then you have to use some other “unique-ish” value!
The device is designed to only send metrics every minute. Therefore the developers may have assumed that time based IVs will be unique enough, without taking into account buffering on network outage.
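The effect of IV reuse in CBC can be reproduced in a few lines. In the sketch below a toy Feistel cipher built on SHA-256 stands in for AES so that it runs with only the standard library; it is not secure and is not what the device uses. Only the chaining logic in `cbc_encrypt` is the point, and all names are illustrative.

```python
import hashlib

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a 4-round Feistel permutation on 16-byte blocks. NOT secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC: XOR each plaintext block with the previous ciphertext block (or the IV), then encrypt."""
    if len(iv) != BLOCK or len(plaintext) % BLOCK:
        raise ValueError("IV must be one block; plaintext must be whole blocks")
    prev, out = iv, []
    for off in range(0, len(plaintext), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(plaintext[off:off + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return b"".join(out)


key, iv = b"k" * 16, bytes(16)
frame_a = b"A" * 32 + b"B" * 16   # two frames whose first 32 bytes match
frame_b = b"A" * 32 + b"C" * 16
ct_a, ct_b = cbc_encrypt(key, iv, frame_a), cbc_encrypt(key, iv, frame_b)
print(ct_a[:32] == ct_b[:32])     # same IV: matching plaintext prefix leaks through
print(ct_a[32:] == ct_b[32:])     # the blocks diverge where the plaintext does
```

With the same key and IV, the ciphertexts agree for exactly as long as the plaintexts do, which matches the pattern seen in the buffered frames.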
### Power “glitching”
The final and most difficult question: what is the encryption key?
The first thing I checked was what happened when the device rebooted: was there any key exchange or handshake? Fortunately the web UI has a reboot button, so it was easy to confirm that no, there is no key exchange on startup.
So because we are assuming AES (symmetric encryption), that probably means… fixed keys!
### Extracting the key
Since the keys are fixed, they are likely hard-coded.
AES can use 16, 24, or 32 byte keys, so I started by assuming a 16-byte key.
I suspected they’d use some string like `GoodWeSolarPower`, and store it as a static string or byte array.
I poked around in the firmware a bit with Ghidra, but didn’t find any promising strings.
But in any case, there was another problem. One of the properties of AES-CBC is that you can plug any IV and secret key into it and it will “decrypt”. But unless the IV and key are correct, the output will be garbage. So how to know if I manage to correctly guess the IV and key?
At this point I made another educated guess. The frame header and length field use ASCII characters and leading null bytes respectively. Assuming the plaintext metric data is similarly structured, it will have relatively low Shannon entropy. Another property of AES is that it is a secure block cipher. That is, the ciphertext should be indistinguishable from random bytes. Therefore, using the incorrect key or IV should result in high entropy garbage.
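For reference, Shannon entropy over bytes can be computed as below. This is a generic sketch rather than the exact code used: it returns bits per byte, from 0.0 for a constant buffer to 8.0 for a buffer in which every byte value is equally frequent.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())
```

Short buffers cap out well below 8.0 (a 64-byte blob cannot exceed 6 bits per byte), so any threshold has to be chosen relative to the blob length.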
Assuming the timestamp in the frame (which is null-padded to 16 bytes) is the IV, I wrote a really dumb tool to:
- step through the firmware one byte at a time, taking the next 16 bytes as a key.
- “decrypt” the encrypted blob using that key, and the timestamp prefix as the IV.
- calculate the entropy of the decrypted blob. If it is below a given threshold, print the plaintext and key.
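The steps above can be sketched as a sliding-window search. This is an approximation of the idea, not the original tool: the `decrypt` callable is injected (for example a thin wrapper around an AES-CBC decryptor from whichever crypto library is at hand), and the threshold value and function names are made up for illustration.

```python
import math
from collections import Counter
from typing import Callable, Iterator, Tuple

# (key, iv, ciphertext) -> plaintext; supplied by the caller.
Decrypt = Callable[[bytes, bytes, bytes], bytes]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def search_keys(
    firmware: bytes,
    blob: bytes,
    iv: bytes,
    decrypt: Decrypt,
    threshold: float = 4.0,
    key_len: int = 16,
) -> Iterator[Tuple[int, bytes, bytes]]:
    """Try every key_len-byte window of firmware as a key; yield low-entropy hits."""
    for offset in range(len(firmware) - key_len + 1):
        key = firmware[offset:offset + key_len]
        plaintext = decrypt(key, iv, blob)
        if shannon_entropy(plaintext) < threshold:
            yield offset, key, plaintext
```

A 2 MiB image gives roughly two million candidate keys, each needing one decryption of a short blob, which is why a scan like this finishes quickly.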
Fortunately although this was a very naïve brute force algorithm, one great thing about 2024 is that computers are fast.
Running this tool over the firmware dump from my device only took a few seconds and yielded… nothing. Huh.
Fortunately my previous googling efforts had discovered a public Google Drive with relatively recent updates (early 2023) containing firmware for (all?) GoodWe inverters[^4]. Running the tool over a firmware image for another device yielded… nothing again!
Finally on the third attempt, I got a single hit:
Of course! The key was just all bits set. Why not!?
I doubt this was actually hard-coded as a key anywhere in the binary blob.
I think I just got lucky that this firmware had a run of `0xff` bytes.
## Extracting meaning from the plaintext
Finally I had a plaintext with obvious structure, but nothing mapping fields to metrics values. However I did have an oracle: the SEMS Portal API! I was able to dump metric values for my smart meter using curl on the SEMS Portal API, and observe the metrics changing every time a packet was sent from the smart meter.
Then by eyeballing the packets and the values (assuming standard two’s complement signed integer encoding) it was relatively straightforward, though a little time consuming, to map offsets to metrics values.
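Once an offset has been matched to a metric, decoding is mechanical. The sketch below shows the idea with Python's `struct` module. The field names, offsets and widths are invented purely for illustration and are not the real layout; big-endian byte order is assumed, in line with the other integers in the protocol.

```python
import struct
from typing import Dict, Tuple

# Hypothetical layout: name -> (offset, struct format). Not the real field map.
FIELDS: Dict[str, Tuple[int, str]] = {
    "power_w": (0, ">i"),      # 32-bit signed, big-endian
    "voltage_dv": (4, ">h"),   # 16-bit signed, big-endian
}


def decode_fields(plaintext: bytes, fields: Dict[str, Tuple[int, str]] = FIELDS) -> Dict[str, int]:
    """Read each two's-complement integer from its offset in the plaintext."""
    return {
        name: struct.unpack_from(fmt, plaintext, offset)[0]
        for name, (offset, fmt) in fields.items()
    }
```

Signed decoding matters here because values such as grid power can go negative when exporting.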
This wiring diagram was helpful to understand that there were really only two CT sensors and every other metric was calculated from those two numbers:
## Prometheus & Grafana
I like Prometheus for gathering metrics.
So I built an exporter based on the research described above.
It works by conducting a man-in-the-middle attack on the protocol.
Pointing the HK1000 at the IP address of the exporter when it requests `tcp.goodwe-power.com` will cause the HK1000 to connect to the exporter instead of the GoodWe cloud.
Then the exporter will sniff the metrics out of the frames and forward them to the real `tcp.goodwe-power.com`.
The nice thing about this design is that you still get metrics in SEMS Portal. These metrics are visible to your installer, so if you have problems it is easy for them to troubleshoot. I also added support for my inverter, which uses approximately the same protocol.
In addition, the Prometheus exporter will reject any packets from the server that it doesn’t understand. So hopefully unsolicited firmware updates will be blocked.
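The forwarding half of this design can be sketched with `asyncio`. This is a bare-bones illustration rather than the exporter's actual code: `serve_proxy`, `pipe` and `on_uplink` are made-up names, frame parsing and decryption are omitted, and unlike the exporter described above it relays server replies unfiltered.

```python
import asyncio
from typing import Callable, Optional


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
               on_data: Optional[Callable[[bytes], None]] = None) -> None:
    """Copy bytes from reader to writer until EOF, optionally observing them."""
    try:
        while True:
            data = await reader.read(4096)
            if not data:
                break
            if on_data is not None:
                on_data(data)  # e.g. feed a frame parser / metrics collector
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def serve_proxy(listen_host: str, listen_port: int,
                      upstream_host: str, upstream_port: int,
                      on_uplink: Callable[[bytes], None]) -> asyncio.AbstractServer:
    """Accept device connections and relay them to the real upstream server."""

    async def handle(dev_reader: asyncio.StreamReader, dev_writer: asyncio.StreamWriter) -> None:
        up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
        await asyncio.gather(
            pipe(dev_reader, up_writer, on_uplink),  # device -> cloud, observed
            pipe(up_reader, dev_writer),             # cloud -> device, passed through
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

In a real deployment the upstream address has to be resolved through a resolver that is not subject to the local DNS override, otherwise the proxy would connect to itself.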
Finally, I created a dashboard in Grafana:
## Conclusions
- This exercise has reinforced my prejudice that IoT devices are horribly insecure. In the case of GoodWe, where they even have authentication, they use fixed default passwords such as `admin`, and leave Telnet debug interfaces listening on their production devices.
- Although the metrics protocol and encryption scheme are insecure, I didn’t find anything that could really be described as a security vulnerability as opposed to a design decision.
- Only the metrics were encrypted in the data sent to SEMS Portal over the internet. Not the model or serial number. So even with (bad) encryption, they have left the most sensitive data unprotected. I guess they are just obfuscating the metrics? Or maybe the boss asked for encryption? “He said encryption! Give him encryption!”.
- Conversely the hardware seems pretty good, functions well, and looks great!
- I spent months tinkering on this on-and-off. I was motivated by equal parts indignant anger at not being able to scrape metrics locally from a device so intimately integrated into my house and running on my network, and morbid curiosity about what security flaw I was going to uncover next. Now I understand what jwz means when he talks about writing software in self-defence.
## How to secure GoodWe devices
Finally, here’s my advice if you have a GoodWe device:
- Whatever else you do, keep these things off the public internet! Preferably in your private, firewalled IoT VLAN.
- There doesn’t seem to be a simple way to disable the `Solar-WiFiXXXXXXXX` WLAN after the devices are set up. So set a strong password, because the default is `admin`. You can do this via the web UI.
- The web server is listening on all interfaces, so it is accessible from your VLAN. Change the password for the web UI from `admin` to something a bit more secure. Note: not all devices have this option easily accessible. For example the HK1000 only allows changing this password via the Telnet interface.
For the paranoid:
- My Prometheus exporter drops incoming packets it doesn’t recognize. Only metrics will flow, not e.g. firmware updates (I hope - I haven’t seen any come through yet). So in theory it will block remote administration of the devices.
## Miscellaneous notes
This section contains a few notes I made that didn’t fit into the narrative of the blog post, but are interesting nonetheless.
### GoodWe’s Cyber Security claims
GoodWe has a page on Cyber Security on their website with a nice infographic, basically confirming everything I have just discovered:
> In order to prevent cyber-attacks on photovoltaic systems to the greatest extent, inverter manufacturers usually deploy various security policies on the equipment side and server side. Taking GoodWe as an example, to ensure the security of data transmission between the inverter and the server, we use the transmission protocols of CRC+AES and TLS respectively for communication with servers with different functions.
This is a great demonstration of how you can use secure cryptographic primitives such as AES-CBC, and still come up with an insecure encryption scheme.
### Hi-Flying and Xinwu
The GoodWe devices seem to use an IoT platform common to several Chinese manufacturers, for example Solarman. It has a unique discovery protocol where you broadcast a special packet to a given port, and the device replies with its IP, MAC, and SSID (which includes the device serial).
For example (in separate terminals):
nc -u -l -p 50123
192.168.18.17,907856FECDAB,Solar-WiFi12345678
echo -n WIFIKIT-214028-READ | nc -u -b -p 50123 192.168.18.255 48899
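The same probe can be sent from Python. A minimal sketch, assuming the reply is always the three comma-separated fields shown above; `Device`, `parse_reply` and `discover` are illustrative names. Using a single socket for both sending and receiving plays the role of the fixed source port in the `nc` commands.

```python
import socket
from typing import List, NamedTuple

PROBE = b"WIFIKIT-214028-READ"
DISCOVERY_PORT = 48899


class Device(NamedTuple):
    ip: str
    mac: str
    ssid: str


def parse_reply(data: bytes) -> Device:
    """Parse a reply of the form b'IP,MAC,SSID'."""
    ip, mac, ssid = data.decode("ascii").strip().split(",", 2)
    return Device(ip, mac, ssid)


def discover(broadcast_addr: str, timeout: float = 2.0) -> List[Device]:
    """Broadcast the probe and collect replies until the timeout expires."""
    found = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(PROBE, (broadcast_addr, DISCOVERY_PORT))
        try:
            while True:
                data, _addr = sock.recvfrom(1024)
                found.append(parse_reply(data))
        except socket.timeout:
            pass  # no more replies within the timeout
    return found
```

For the network in the example above, `discover("192.168.18.255")` would be the equivalent call.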
According to the config dumped from the Telnet command prompt, the chip in the HK1000 is the HF-A21, from a company called Hi-Flying, based in Shanghai. You can build your IoT device on top of this platform by loading your own application onto it, while the included OS takes care of the hardware, network etc.
An interesting part of the discovery protocol is the string `214028`.
Where does this come from?
Well, approximately 150km from the Hi-Flying office is Xinwu district, Wuxi.
According to Wikipedia:
> In 2013, the output value of Internet of Things (IoT) core industry in Wuxi New District exceeded 70 billion yuan, accounting for 38.4 percent of the output value of the whole high-tech industry in the district. Wuxi New District has formed a cloud computing industrial distribution, featuring hardware, platform and application.
Xinwu’s postcode is 214028.
### Remote administration
According to market researchers, GoodWe was the fifth largest supplier of solar inverters worldwide in 2022. GoodWe have full remote administration capability on the devices, including the capability to push firmware updates. This seems like a lot of power for any company, let alone a company headquartered in a totalitarian dictatorship, to have over national power grids.
### Batman mode
~~To validate the MITM functionality~~ For fun, I implemented Batman mode in the Prometheus exporter.
In this mode, rather than forwarding metrics to the SEMS Portal, the exporter replaces them with the Batman equation.
### DNS updates
The GoodWe devices send their metrics to `tcp.goodwe-power.com:20001`.
When I first started investigating the protocol in mid 2023, this resolved to an IP address in Alibaba Cloud.
However late last year this was updated to now resolve to a pair of ELBs in AWS.
In both Alibaba Cloud and in AWS they seem to be doing DNS load balancing, because while the SecurityTrails screenshots above show US IPs, from here in Australia both those domains resolved to IPs in Alibaba Cloud China (previously), and now to AWS Sydney.
[^1]: Here are the commands I figured out:
    - `?` display possible commands.
    - `? <command>` display command help.
    - `<command>` enter subcommand menu or execute command.
    - `up` go to parent command menu.

[^2]: This line is from my bash history, but I advise starting the length at a low value and slowly increasing it. From what I remember, at some point reading memory will cause the device to crash and reboot.

[^3]: Yes, I know this isn’t what is generally referred to as “glitching” in reverse engineering. But it is somewhat analogous.

[^4]: I’m linking the drive here, but of course it may be shut down at some point, or the firmware deleted. Hopefully someone takes a backup.