# Reverse-engineering an encrypted IoT protocol

## TL;DR

I reverse-engineered the encrypted protocol that GoodWe smart meters and solar inverters use to send metrics to the cloud.
I used this research to build a Prometheus exporter.

## The Sun: so hot right now

I got a solar PV system installed in my house in mid-2023. I did the bare minimum of research beforehand: I just talked to a couple of different installers about pricing, sizing and the economics of a battery.

One thing I certainly did not do is any research into brands and their relative hackability or security merits. I just specified that I wanted to monitor the devices and see some metrics. The installer told me that this required a smart meter and a mobile app. Honestly I assumed that all brands would be equally horrific IoT junk, so I just went with the recommendation of the installer. At least that way the electrical functionality had to be reasonable, right?

The result of my lucky dip was a GoodWe DNS G3 Inverter and a GoodWe HomeKit 1000 Smart Meter. These devices look quite slick, and so does the website. They are also popular here in Australia, so my hopes were high that it would be easy to set up local monitoring, because surely someone else had figured out how to do it.

## Post-install setup

### Metrics? You need to be online.

Right after physical installation the system is producing power, but the metrics aren’t visible anywhere. The documented way to see metrics is to connect the device to GoodWe’s cloud, and then use their web UI or mobile app.

The devices operate simultaneously in wireless AP and STA modes, and setup works like so:

1. Connect to the device’s WLAN, which will be named Solar-WiFiXXXXXXXX, where the Xs are the serial number of the device. The password is, naturally, admin.
2. Visit the device’s web UI at 10.10.100.253.
3. Log in (using credentials admin / admin, of course!).
4. In the web UI, select the WLAN that you want the device to use to connect to the Internet.

Now the devices are connected to GoodWe’s cloud. But you still can’t see any metrics.

### SEMS Portal account required

The next step is to go to GoodWe’s SEMS Portal and create an account. Then tell the installer that the devices are connected, along with the email address you used to create your SEMS Portal account. The installer will then email GoodWe (!?) asking them to associate your account with the serial numbers of the devices, and at some point GoodWe will action that request (I was assured they checked their inbox regularly).

Finally, after a day or so, the device’s metrics are visible in the SEMS Portal.

Screenshot of SEMS portal showing power usage over a day

According to this flyer, it seems that installers have a portfolio of “power plants”, and they can use the SEMS Portal to perform “Fault self-analysis & troubleshooting”.

> SEMS includes a range of functions and features to ensure reliable operation and to deliver precise information to operators at the press of a button. It is accessible by multiple accounts with different levels of access for owners, installers and EPC companies

## Post-setup state of play

So now these two devices were physically installed, and connected to GoodWe’s cloud over the internet via my isolated IoT VLAN. But I had questions:

- I wanted to scrape metrics locally, dammit! Why should I have to use the crappy cloud UI or equally bad mobile app?
- What else can GoodWe do with this connection? E.g. can they remotely administer the devices? If so, can I disable this “feature”?

It turns out that the inverter is powered by the solar panels, not by the grid. So it loses power and goes offline as soon as the sun goes down. And since I mostly have time to hack on this stuff after dark, I concentrated on the smart meter.

## Metrics extraction prior art

There is quite a cottage industry online documenting how to extract data from GoodWe inverters, which respond to queries over Modbus, an Operational Technology (OT) protocol standard. There are many GitHub repositories with useful information about the GoodWe Modbus protocol, such as:

- a Python library for extracting metrics;
- a Home Assistant integration, built on that library; and
- some GoodWe-specific field documentation.

Unfortunately, my HomeKit 1000 smart meter is not supported by any of these libraries.

## Hacking the HomeKit 1000

I’m presenting the process I followed in chronological order, so if you only want to know what actually worked, skip to the end.

### nmap

The first thing I did was fire up nmap and point it at the HK1000. It showed TCP port 23 listening: good old Telnet! Connecting to this port and trying Username: admin, Password: admin gave me a command prompt!

```
$ nc 192.168.18.17 23
Login as:admin
Password:admin
CMD>?
cfg         net         os          mft
CMD>
```

Poking around this prompt soon showed that it was pretty limited¹, and it seemed to be a development interface that was left enabled. I couldn’t get any metrics out of it.

I also ran nmap in aggressive mode and was rewarded with a hard crash of the web server and a factory reset of the device.

### Packet capture

Sniffing the traffic from the device showed that it was connecting out to tcp.goodwe-power.com:20001 and sending packets at regular intervals. However, a quick look at the traffic revealed that while the serial number of my inverter was visible, the main body of the payload was a high-entropy blob. So the metrics data I was after seemed to be encrypted.

I also found a Github comment which came to the same conclusion.

### Modbus

There is a GoodWe Modbus protocol spec sheet and register map floating around the internet, which was invaluable in understanding how GoodWe encodes metrics from their inverters. Using this documentation I built a Modbus scanner that simply queried every register. The register address is only 2 bytes wide, so there are only 65,536 possible addresses.
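A scanner like this boils down to generating one standard Modbus RTU "read holding registers" request (function code 0x03) for every 16-bit address. The Python sketch below only builds the frames: the unit (slave) address is device-specific and left as a parameter, the transport used to reach the device is deliberately left out, and the function names are illustrative rather than taken from any GoodWe library.

```python
from typing import Iterator


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers(unit: int, address: int, count: int = 1) -> bytes:
    """Build a Modbus RTU 'read holding registers' (0x03) request frame."""
    pdu = bytes([unit, 0x03]) + address.to_bytes(2, "big") + count.to_bytes(2, "big")
    # Standard Modbus RTU appends the CRC low byte first.
    return pdu + crc16_modbus(pdu).to_bytes(2, "little")


def all_register_requests(unit: int) -> Iterator[bytes]:
    """Yield one single-register request per 16-bit address (65,536 frames)."""
    for address in range(0x10000):
        yield read_holding_registers(unit, address)
```

For unit 1 and address 0 this produces the textbook frame `01 03 00 00 00 01 84 0A`, which is a handy check that the CRC byte order is right.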

Unfortunately the HK1000 only returns a value for a single register address. I forget which register it was, but it was something useless like Firmware Version.

### AA55 protocol

GoodWe devices also support another (older?) protocol known as the AA55 protocol. I couldn’t find much info about it except for another old spec sheet.

I built a scanner for this too, but the HK1000 didn’t respond to any queries.

### ZZ/5A5A protocol (mobile app)

The SEMS Portal mobile app has an interesting function: you can connect to the Solar-WiFiXXXXXXXX network and configure the device from the app without any authentication.

Sniffing this traffic (thanks to airodump-ng and Wireshark’s WPA2 decryption support) shows that the device can be configured without authentication by sending plaintext UDP packets to the right port. Of course, this port listens on all interfaces, so it probably also works via whichever local Wi-Fi network you connect the device to. Gross.

Screenshot of UDP dump showing the ZZ protocol

However, this protocol appeared to only be used for network configuration. I didn’t find any way of extracting data from the device using this protocol.

### Firmware Reverse Engineering

After no success with the query protocols, I decided that maybe the network was the wrong approach and I should try the firmware instead. I managed to dump the firmware of the device using the Telnet command prompt and a command similar to this²:

```
echo -e 'admin\nadmin\nspi rd 0 2097152\n' | nc 192.168.18.17 23 | tee ~/download/hk1000.spi2.img
```

This hexdump is interspersed with log lines, and the bytes are transposed. So I dumped it twice, diffed the two dumps to eliminate the log lines, and fixed the transposition manually using vim.

Then I unhexlified the binary with xxd:

```
xxd -r -p hk1000.spi.img > hk1000.spi.bin.img
```

And ran binwalk over it:

```
binwalk -eM hk1000.spi.bin.img
```

This revealed that the OS was eCos RTOS on a MIPS architecture. I spent some time trying to reverse this binary using Ghidra, but honestly I just don’t know what I’m doing when it comes to binary reverse engineering.

Finally, while staring at the binwalk output, these lines caught my eye:

```
1976456       0x1E2888        AES Inverse S-Box
1977752       0x1E2D98        AES S-Box
```

### Packet Capture redux

Going back to the packet capture, I finally noticed that the length of the encrypted blob section was always a multiple of 16, plus 2.

Wait a second… AES block size is 16 bytes!

## Analysis of the GoodWe metrics protocol

Since this was a black-box analysis, I had to rely on probing via the I/O I controlled: the network and power.

### Network “glitching”³

It was at this point that I found what would be the key to cracking the encryption scheme.

Back in October 2021, someone else did basically all the same work I did, and presented it at the Melbourne Linux User’s Group. Not only that, but they put their presentation online! Thank you Danny!

Anyway, Danny made a very interesting observation: if the internet connection went down, the device would buffer messages, and send them all at once when the connection came back up. Crucially, for buffered frames sent in the same second, the first few 16-byte blocks of ciphertext were identical!

I was able to replicate this locally!
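A small helper makes this observation easy to check on captured data. This sketch assumes the encrypted blobs have already been cut out of two buffered frames; the function name is illustrative.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def identical_leading_blocks(a: bytes, b: bytes) -> int:
    """Count how many leading 16-byte blocks two ciphertexts have in common."""
    count = 0
    # Only compare whole blocks present in both ciphertexts.
    for i in range(0, min(len(a), len(b)) - BLOCK_SIZE + 1, BLOCK_SIZE):
        if a[i:i + BLOCK_SIZE] != b[i:i + BLOCK_SIZE]:
            break
        count += 1
    return count
```

A result greater than zero for two frames sent in the same second is exactly the pattern described above.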

### Empathy: a powerful reverse-engineering tool

When I’m looking at a problem like this, I like to put myself in the shoes of the developer. What kind of person are they? What are their motivations?

In this case, we can observe:

- Telnet left on in a production firmware image, with credentials admin:admin.
- nmap can crash the device hard enough to factory reset.
- Packets sent over TCP with identifying data (serial number) in the clear.
- The metrics seem to be poorly encrypted (identical section of ciphertext in consecutive frames).
- Unauthenticated configuration protocol.
- A web UI that looks like it was hacked together in an afternoon. Inspecting the source shows lots of commented out HTML blocks.

In Danny’s presentation, he used this slide after discovering the Telnet port password:

Picard facepalm meme

However, I think this is more appropriate:

What these observations tell me is that GoodWe doesn’t put a great deal of effort into securing their devices, and therefore the developers working on this device didn’t have much incentive to create a secure protocol. So there’s a chance I can hack around their encryption.

Putting myself into the shoes of these developers, what would I need to implement a metric protocol?

- Framing: this is TCP; it’s a byte stream. So we need a header of some kind to know where frames start.
- Length: how many bytes after the header do we need to read to get the full frame?
- Detecting data corruption: not anything malicious, just bitflips.

Looking at the packet captures, it is easy to see that POSTGW is the frame header, and that the very next field looks like a big-endian int32 whose value is consistently three bytes shy of the length of the data before the next POSTGW. That must be the length!
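Splitting a captured TCP stream into frames then needs only the header and the length field. In this Python sketch, "three bytes shy" is read as: the bytes following the length field number `length + 3`, ending with the two CRC bytes. That interpretation, and the helper names, are assumptions for illustration.

```python
import struct
from typing import Iterator

HEADER = b"POSTGW"
LENGTH_FIELD = struct.Struct(">I")  # big-endian 32-bit length after the header
LENGTH_SLACK = 3  # the length field is 3 bytes shy of the real remaining size


def split_frames(stream: bytes) -> Iterator[bytes]:
    """Yield the bytes after each length field (frame body plus 2-byte CRC)."""
    pos = stream.find(HEADER)
    while pos != -1:
        start = pos + len(HEADER) + LENGTH_FIELD.size
        if start > len(stream):
            return  # truncated header: wait for more data
        (length,) = LENGTH_FIELD.unpack_from(stream, pos + len(HEADER))
        end = start + length + LENGTH_SLACK
        if end > len(stream):
            return  # incomplete frame: wait for more data
        yield stream[start:end]
        pos = stream.find(HEADER, end)
```

Stopping at an incomplete frame, rather than raising, suits a live TCP connection where the rest of the frame may still be in flight.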

And finally: detecting data corruption. In the GoodWe Modbus document linked above, there is a description of the CRC used to detect data corruption. It is a standard Modbus CRC-16 (two bytes), designed to effectively detect bitflips. Again, assuming I am a software developer who is familiar with Modbus but who has been tasked with sending data over the internet (and didn’t really care much for security), why wouldn’t I use an algorithm or library I am already familiar with?

A quick check confirms that running the data between the length field and the last two bytes through the Modbus CRC algorithm returns a value matching the last two bytes of the frame.

Annotated hexdump of a protocol frame: length in red, device type and serial in green, timestamp in blue and purple, and encrypted blob in yellow.

My best guess for the length field being three bytes shy of the length of data rather than two is that it is just a sloppy implementation with an off-by-one error, which matches my profile of the developers.

Another data point to paint a picture of the engineering quality: the CRC of frames from the client are encoded in big-endian byte order (same as all the other integers encoded in the protocol). However the server sends the CRC in little-endian byte order. Why? Maybe the server is x86 and the developer forgot to call htons()?
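The CRC check, including the byte-order quirk, can be sketched in Python as follows. It assumes `frame` is everything after the length field; the `from_server` flag and function names are illustrative.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def check_frame(frame: bytes, from_server: bool = False) -> bytes:
    """Verify the trailing CRC and return the frame body without it.

    Client frames carry the CRC big-endian; server frames carry it
    little-endian.
    """
    if len(frame) < 2:
        raise ValueError("frame too short to contain a CRC")
    body, trailer = frame[:-2], frame[-2:]
    received = int.from_bytes(trailer, "little" if from_server else "big")
    if received != crc16_modbus(body):
        raise ValueError("CRC mismatch")
    return body
```

The standard CRC-16/MODBUS check value for the ASCII string `123456789` is `0x4B37`, so a client would append `4B 37` and the server `37 4B`.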

Now I just had the encrypted blob to decipher.

### Encryption scheme

I guessed that they must be using AES in CBC mode because:

- The identical section of ciphertext in consecutive frames is a classic CBC failure mode when reusing IVs.
- This is an old mode and widely supported in libraries, making it easy to use.
- Since they don’t care much about security, they are hardly likely to be using an AEAD mode such as AES-GCM.

When implementing a scheme using CBC, it is critically important that initialization vectors are not reused. Otherwise, any leading blocks that two plaintexts have in common will encrypt to identical ciphertext blocks under the same key and IV. Metrics from a smart meter are highly likely to be the same minute-to-minute, which is probably why we see identical sections of ciphertext in successive frames with the same IV!
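To see why a repeated IV leaks this, here is CBC encryption written out in Python. The block function is a keyed SHA-256 stand-in, clearly not AES, so that the sketch runs with only the standard library; the chaining, and therefore the leak, is the same for any deterministic block cipher.

```python
import hashlib
from typing import Callable

BLOCK_SIZE = 16
BlockEncrypt = Callable[[bytes, bytes], bytes]  # (key, block) -> ciphertext block


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES: a deterministic keyed function, NOT a real cipher."""
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes,
                encrypt_block: BlockEncrypt = toy_block_encrypt) -> bytes:
    """CBC mode without padding: C_i = E(K, P_i XOR C_{i-1}), with C_0 = IV."""
    if len(iv) != BLOCK_SIZE or len(plaintext) % BLOCK_SIZE:
        raise ValueError("IV must be one block and plaintext whole blocks")
    previous, blocks = iv, []
    for i in range(0, len(plaintext), BLOCK_SIZE):
        block = plaintext[i:i + BLOCK_SIZE]
        # XOR with the previous ciphertext block (or the IV), then encrypt.
        previous = encrypt_block(key, bytes(p ^ c for p, c in zip(block, previous)))
        blocks.append(previous)
    return b"".join(blocks)
```

Each ciphertext block depends only on the key, the IV and the plaintext up to that point, so two frames with the same key and IV stay identical until the first block where their plaintexts differ.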

A common practice is to prefix the IV to the ciphertext. This is known as an explicit initialization vector, and it doesn’t need to be secret: it just needs to be randomly generated in a cryptographically secure manner. But what if you are running on a microcontroller without an NRBG (non-deterministic random bit generator)? Or maybe you just don’t know or care about CBC footguns? Then you have to use some other “unique-ish” value!

The device is designed to send metrics only once a minute, so the developers may have assumed that time-based IVs would be unique enough, without taking into account buffering during a network outage.
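To make the failure concrete: if the IV is derived from a timestamp with one-second resolution, every frame buffered within the same second gets the same IV. The 6-byte `YY MM DD hh mm ss` layout below is a hypothetical encoding chosen purely for illustration, not the frame's documented format.

```python
from datetime import datetime


def timestamp_iv(ts: datetime) -> bytes:
    """Hypothetical time-based IV: 6 timestamp bytes, null-padded to 16."""
    # Sub-second precision is discarded, so frames within one second collide.
    raw = bytes([ts.year - 2000, ts.month, ts.day, ts.hour, ts.minute, ts.second])
    return raw.ljust(16, b"\x00")
```

Two frames stamped 2024-01-02 03:04:05.0001 and 03:04:05.9 would share an IV, while a frame one second later would not.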

### Power “glitching”

The final and most difficult question: what is the encryption key?

The first thing I checked was what happened when the device rebooted: was there any key exchange or handshake? Fortunately the web UI has a reboot button, so it was easy to confirm that no, there is no key exchange on startup.

So because we are assuming AES (symmetric encryption), that probably means… fixed keys!

### Extracting the key

Since the keys are fixed, they are likely hard-coded. AES can use 16, 24, or 32 byte keys, so I started by assuming a 16-byte key. I suspected they’d use some string like GoodWeSolarPower, and store it as a static string or byte array. I poked around in the firmware a bit with Ghidra, but didn’t find any promising strings.

But in any case, there was another problem. One property of AES-CBC is that you can plug any IV and secret key into it and it will “decrypt”, but unless the IV and key are correct, the output will be garbage. So how would I know when I had guessed the IV and key correctly?

At this point I made another educated guess. The frame header and length field use ASCII characters and leading null bytes respectively. Assuming the plaintext metric data is similarly structured, it will have relatively low Shannon entropy. Another property of AES is that it is a secure block cipher: the ciphertext should be indistinguishable from random bytes. Therefore, decrypting with the wrong key should result in high-entropy garbage. (A wrong IV is less drastic: in CBC mode it only garbles the first 16-byte block.)
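Shannon entropy over the byte distribution is a few lines of Python. This minimal sketch reports bits per byte, from 0.0 (one repeated byte) to 8.0 (all 256 values equally often).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())
```

One caveat when picking a threshold: a blob of n bytes can score at most log2(n) bits per byte, so even random data looks less than fully random when the blob is short (a 128-byte blob cannot exceed 7.0).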

Assuming the timestamp in the frame (which is null-padded to 16 bytes) is the IV, I wrote a really dumb tool to:

1. Step through the firmware one byte at a time, taking the next 16 bytes as a key.
2. “Decrypt” the encrypted blob using that key, and the timestamp prefix as the IV.
3. Calculate the entropy of the decrypted blob. If it is below a given threshold, print the plaintext and key.
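The three steps above can be sketched in Python as follows. This is a reconstruction, not the original tool: it assumes AES-128 (16-byte keys), uses the third-party `cryptography` package for AES-CBC (imported only when the default decryptor is actually called), and its default threshold of 4.0 bits per byte is an arbitrary choice. The decryptor is a parameter so the scanning logic can be exercised without AES.

```python
import math
from collections import Counter
from typing import Callable, Iterator, Tuple

KEY_SIZE = 16  # assume AES-128
Decrypt = Callable[[bytes, bytes, bytes], bytes]  # (key, iv, ciphertext) -> plaintext


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


def aes_cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """AES-CBC decryption (no padding) via the third-party `cryptography` package."""
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()


def scan_firmware(firmware: bytes, iv: bytes, ciphertext: bytes,
                  threshold: float = 4.0,
                  decrypt: Decrypt = aes_cbc_decrypt) -> Iterator[Tuple[int, bytes, bytes]]:
    """Try every 16-byte window of `firmware` as a key; yield low-entropy hits.

    `ciphertext` must be a whole number of 16-byte blocks, i.e. with the two
    extra trailing bytes (assumed to be the CRC) already removed.
    """
    for offset in range(len(firmware) - KEY_SIZE + 1):
        key = firmware[offset:offset + KEY_SIZE]
        plaintext = decrypt(key, iv, ciphertext)
        if shannon_entropy(plaintext) < threshold:
            yield offset, key, plaintext
```

A run would look something like `for offset, key, pt in scan_firmware(firmware, iv, blob): print(offset, key.hex(), pt)`, where `firmware`, `iv` and `blob` are placeholders for the dump, the padded timestamp and the trimmed ciphertext.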

Fortunately although this was a very naïve brute force algorithm, one great thing about 2024 is that computers are fast.

Running this tool over the firmware dump from my device only took a few seconds and yielded… nothing. Huh.

Fortunately, my earlier googling had turned up a public Google Drive with relatively recent updates (early 2023) containing firmware for (all?) GoodWe inverters⁴. Running the tool over a firmware image for another device yielded… nothing again!

Finally on the third attempt, I got a single hit:

Screenshot of the firmware scan tool showing plaintext, the key used for decryption, and an entropy calculation of 2.862

Of course! The key was just all bits set. Why not!?

I doubt this was actually hard-coded as a key anywhere in the binary blob. I think I just got lucky that this firmware had a run of 0xff bytes.

## Extracting meaning from the plaintext

Finally I had a plaintext with obvious structure, but nothing telling me which fields mapped to which metric values. However, I did have an oracle: the SEMS Portal API! I was able to dump metric values for my smart meter using curl on the SEMS Portal API, and observe the metrics changing every time a packet was sent from the smart meter.

Then by eyeballing the packets and the values (assuming standard two’s complement signed integer encoding) it was relatively straightforward, though a little time consuming, to map offsets to metrics values.
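Once the offsets are known, decoding is just a table of (offset, format) pairs. In this Python sketch the field names and offsets are placeholders, not the real HK1000 layout, and big-endian byte order is an assumption carried over from the frame's other integers.

```python
import struct
from typing import Dict, Tuple

# Hypothetical field map: name -> (byte offset, struct format). Placeholders only.
FIELDS: Dict[str, Tuple[int, str]] = {
    "example_power_w": (0, ">h"),    # signed 16-bit, big-endian
    "example_energy_wh": (2, ">i"),  # signed 32-bit, big-endian
}


def decode_fields(plaintext: bytes,
                  fields: Dict[str, Tuple[int, str]] = FIELDS) -> Dict[str, int]:
    """Decode two's complement integers at fixed offsets in the plaintext."""
    return {name: struct.unpack_from(fmt, plaintext, offset)[0]
            for name, (offset, fmt) in fields.items()}
```

With this map, the bytes `FF FE 00 00 01 00` decode to -2 and 256, which shows why the signed formats matter for values that can go negative.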

This wiring diagram was helpful to understand that there were really only two CT sensors and every other metric was calculated from those two numbers:

High-level Homekit 1000 wiring diagram

## Prometheus & Grafana

I like Prometheus for gathering metrics, so I built an exporter based on the research described above. It works by conducting a man-in-the-middle attack on the protocol. If tcp.goodwe-power.com resolves to the IP address of the exporter when the HK1000 looks it up, the HK1000 connects to the exporter instead of the GoodWe cloud. The exporter then sniffs the metrics out of the frames and forwards them to the real tcp.goodwe-power.com.

The nice thing about this design is that you still get metrics in SEMS Portal. These metrics are visible to your installer, so if you have problems it is easy for them to troubleshoot. I also added support for my inverter, which uses approximately the same protocol.

In addition, the Prometheus exporter will reject any packets from the server that it doesn’t understand, so hopefully unsolicited firmware updates will be blocked.
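A stripped-down version of this man-in-the-middle design, written with Python's asyncio streams, might look like the sketch below. It is not the actual exporter: the metrics hook is a stub, and the rule for recognizing server packets (a `POSTGW` prefix) is a hypothetical placeholder. It also filters per TCP read rather than per reassembled frame, which a real implementation would need to fix.

```python
import asyncio
from typing import Callable, Optional

# Must be resolved WITHOUT the local DNS override that points this name at
# the proxy, otherwise the proxy would connect to itself.
UPSTREAM_HOST = "tcp.goodwe-power.com"
UPSTREAM_PORT = 20001
LISTEN_PORT = 20001

Filter = Callable[[bytes], Optional[bytes]]  # return None to drop the chunk


async def relay(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
                on_chunk: Filter) -> None:
    """Copy bytes from reader to writer, passing each chunk through on_chunk."""
    try:
        while (chunk := await reader.read(4096)):
            out = on_chunk(chunk)
            if out is not None:
                writer.write(out)
                await writer.drain()
    finally:
        writer.close()


def sniff_client(chunk: bytes) -> Optional[bytes]:
    # Stub: a real exporter would reassemble frames, decrypt them and update
    # Prometheus metrics here before forwarding the bytes unchanged.
    return chunk


def filter_server(chunk: bytes) -> Optional[bytes]:
    # Hypothetical allow-list: forward only data that looks like a known frame.
    return chunk if chunk.startswith(b"POSTGW") else None


async def handle(client_r: asyncio.StreamReader, client_w: asyncio.StreamWriter) -> None:
    try:
        up_r, up_w = await asyncio.open_connection(UPSTREAM_HOST, UPSTREAM_PORT)
    except OSError:
        client_w.close()
        return
    await asyncio.gather(relay(client_r, up_w, sniff_client),
                         relay(up_r, client_w, filter_server))


async def main() -> None:
    server = await asyncio.start_server(handle, "0.0.0.0", LISTEN_PORT)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())`. Closing the writer when one direction ends lets the other direction see EOF, so both halves of the connection wind down together.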

Finally, I created a dashboard in Grafana:

Household Power Grafana dashboard, showing a summary of power usage over a single day

## Conclusions

- This exercise has reinforced my prejudice that IoT devices are horribly insecure. In the case of GoodWe, where they even have authentication, they use fixed default passwords such as admin, and leave Telnet debug interfaces listening on their production devices.
- Although the metrics protocol and encryption scheme are insecure, I didn’t find anything that could really be described as a security vulnerability as opposed to a design decision.
- Only the metrics were encrypted in the data sent to SEMS Portal over the internet, not the model or serial number. So even with (bad) encryption, they have left the most sensitive data unprotected. I guess they are just obfuscating the metrics? Or maybe the boss asked for encryption? “He said encryption! Give him encryption!”
- Conversely, the hardware seems pretty good, functions well, and looks great!
- I spent months tinkering on this on and off. I was motivated in equal parts by indignant anger at not being able to scrape metrics locally from a device so intimately integrated into my house and running on my network, and by morbid curiosity about what security flaw I was going to uncover next. Now I understand what jwz means when he talks about writing software in self-defence.

## How to secure GoodWe devices

Finally, here’s my advice if you have a GoodWe device:

- Whatever else you do, keep these things off the public internet! Ideally, put them in a private, firewalled IoT VLAN.
- There doesn’t seem to be a simple way to disable the Solar-WiFiXXXXXXXX WLAN after the devices are set up, so set a strong password for it, because the default is admin. You can do this via the web UI.
- The web server listens on all interfaces, so it is accessible from your VLAN. Change the password for the web UI from admin to something more secure. Note: not all devices make this option easy to reach. For example, the HK1000 only allows changing this password via the Telnet interface.

For the paranoid:

- My Prometheus exporter drops incoming packets it doesn’t recognize. Only metrics will flow, not e.g. firmware updates (I hope; I haven’t seen any come through yet). So in theory it will block remote administration of the devices.

## Miscellaneous notes

This section contains a few notes I made that didn’t fit into the narrative of the blog post, but are interesting nonetheless.

### GoodWe’s Cyber Security claims

GoodWe has a page on Cyber Security on their website with a nice infographic, basically confirming everything I have just discovered:

> In order to prevent cyber-attacks on photovoltaic systems to the greatest extent, inverter manufacturers usually deploy various security policies on the equipment side and server side. Taking GoodWe as an example, to ensure the security of data transmission between the inverter and the server, we use the transmission protocols of CRC+AES and TLS respectively for communication with servers with different functions.

Infographic diagram of an inverter and a laptop connecting to the cloud. The inverter connection has a lock icon and is labelled AES 128. The laptop connection also has a lock icon and is labelled HTTPS.

This is a great demonstration of how you can use secure cryptographic building blocks such as AES-CBC and still come up with an insecure encryption scheme.

### Hi-Flying and Xinwu

The GoodWe devices seem to use an IoT platform common to several Chinese manufacturers, for example Solarman. It has an unusual discovery protocol: you broadcast a special packet to a given UDP port, and the device replies with its IP, MAC, and SSID (which includes the device serial).

For example (in separate terminals):

```
$ nc -u -l -p 50123
192.168.18.17,907856FECDAB,Solar-WiFi12345678
```

```
$ echo -n WIFIKIT-214028-READ | nc -u -b -p 50123 192.168.18.255 48899
```
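The same exchange in Python, as a minimal sketch: it assumes the device replies to the request's source port (which is what the fixed `-p 50123` in the netcat example relies on), so a single socket can both send and receive. `discover`, `parse_reply` and `Device` are illustrative names.

```python
import socket
from typing import List, NamedTuple

DISCOVERY_PORT = 48899
DISCOVERY_MESSAGE = b"WIFIKIT-214028-READ"


class Device(NamedTuple):
    ip: str
    mac: str
    ssid: str


def parse_reply(payload: bytes) -> Device:
    """Parse an 'ip,mac,ssid' discovery reply; raises ValueError if malformed."""
    ip, mac, ssid = payload.decode("ascii").strip().split(",", 2)
    return Device(ip, mac, ssid)


def discover(broadcast: str = "255.255.255.255", timeout: float = 2.0) -> List[Device]:
    """Broadcast the discovery request and collect replies until the timeout."""
    devices: List[Device] = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(DISCOVERY_MESSAGE, (broadcast, DISCOVERY_PORT))
        while True:
            try:
                payload, _addr = sock.recvfrom(1024)
            except socket.timeout:
                return devices
            try:
                devices.append(parse_reply(payload))
            except ValueError:
                continue  # ignore replies that don't match ip,mac,ssid
```

Passing a subnet broadcast address such as `192.168.18.255`, as in the netcat example, keeps the request on the IoT VLAN.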

According to the config dumped from the Telnet command prompt, the chip in the HK1000 is the HF-A21, from a company called Hi-Flying, based in Shanghai. You can build your IoT device on top of this platform by loading your own application onto it, while the included OS takes care of the hardware, network etc.

An interesting part of the discovery protocol is the string 214028. Where does this come from? Well, approximately 150 km from the Hi-Flying office is Xinwu district, Wuxi. According to Wikipedia:

> In 2013, the output value of Internet of Things (IoT) core industry in Wuxi New District exceeded 70 billion yuan, accounting for 38.4 percent of the output value of the whole high-tech industry in the district. Wuxi New District has formed a cloud computing industrial distribution, featuring hardware, platform and application.

Xinwu’s postcode is 214028.

### Remote administration

According to market researchers, GoodWe was the fifth largest supplier of solar inverters worldwide in 2022. GoodWe have full remote administration capability on the devices, including the capability to push firmware updates. This seems like a lot of power for any company, let alone a company headquartered in a totalitarian dictatorship, to have over national power grids.

### Batman mode

~~To validate the MITM functionality~~ For fun, I implemented Batman mode in the Prometheus exporter. In this mode, rather than forwarding metrics to the SEMS Portal, the exporter replaces them with the Batman equation.

Screenshot of SEMS Portal showing batman logo plotted on the power graph

### DNS updates

The GoodWe devices send their metrics to tcp.goodwe-power.com:20001. When I first started investigating the protocol in mid-2023, this resolved to an IP address in Alibaba Cloud. However, late last year it was updated to resolve to a pair of ELBs in AWS.

Screenshot of SecurityTrails showing the historical DNS records for tcp.goodwe-power.com

In both Alibaba Cloud and AWS they seem to be doing DNS load balancing: while the SecurityTrails screenshot above shows US IPs, from here in Australia both those domains resolved to IPs in Alibaba Cloud China (previously), and now resolve to AWS Sydney.

---

1. Here are the commands I figured out:
   - `?`: display possible commands.
   - `? <command>`: display command help.
   - `<command>`: enter subcommand menu or execute command.
   - `up`: go to parent command menu.
2. This line is from my bash history, but I advise starting the length at a low value and slowly increasing it. From what I remember, at some point reading memory will cause the device to crash and reboot.
3. Yes, I know this isn’t what is generally referred to as “glitching” in reverse engineering. But it is somewhat analogous.
4. I’m linking the drive here, but of course it may be shut down at some point, or the firmware deleted. Hopefully someone takes a backup.
