https://voidcomputing.hu/blog/worse-better-prettier/

AR glasses USB protocols: the Worse, the Better and the Prettier

Published on 2023.10.10

We've found a drop-in replacement for the Nreal Light, called the Grawoow G530 (or Metavision M53, and who knows how many other names), so we finally have 3 more protocols to write about in this blog.

Table of contents

* Background
* The Worse: Grawoow G530
  + USB interfaces
  + MCU Protocol
  + Getting the calibration data
  + IMU protocol
* The Better: Rokid Max
  + Protocol
  + Display modes
* The Prettier: XREAL Air
  + MCU protocol
  + Display modes
  + The IMU protocol
* Extra: Monado driver for the Rokid Max

Background

The previous blog post on the topic has become somewhat of a reference in the community, and has actually driven some sales for the company, so it seemed like a good idea to write about our more recent findings and share them with anyone interested. The post itself will probably be a bit dry for the casual reader. Sorry about that.

The Worse: Grawoow G530

[Photo: G530 on a dog]

We had been searching for a replacement for the XREAL Light since Day 1, because XREAL no longer supports or manufactures it. We needed glasses with stereo cameras and active support. Some months ago I was contacted on LinkedIn by a Chinese seller, and after a bit of talking, we bought a test piece. It wasn't as easy as just ordering from a webshop (and I had to do all kinds of import paperwork), but it was still smooth: send some emails, wire money, receive glasses.

I call it worse because it is a tiny bit worse in every regard: it looks cheaper, the plastic parts fit worse, and the protocol is missing some information that is crucial for good support. The main thing it has going for it is that it's still available from distributors.

The architecture is extremely similar to the XREAL Light, so much so that I'm only going to link the draft architecture pic.
Main components:

* All the standard DP-over-USB-C stuff, driving two micro-OLED displays on two USB 3 lanes
* A USB 3 RGB camera on the remaining two lanes
* Stereo grayscale cameras and an IMU driven by an OV580 (exactly the same as the Light, even the protocol... mostly)
* Distance sensor at the forehead
* Four physical buttons: brightness up/down and volume up/down
* Judging by the USB device descriptors, the MCU and audio functionalities are handled by the same chip.

Gotta love Chinese copying culture. By the way, the glasses seem to be widely white-labeled; the Metavision M53 seems to be the same hardware. Even the firmware and SDK say G530 and not M53.

USB interfaces

The device comes up as two hubs (one USB 3 and one USB 2) and 5 devices:

* Realtek Semiconductor Corp. RGB Camera
  * VID: 0bda, PID: 5880
  * Bog-standard USB 3 camera, capable of full HD at some pretty high frame rates
* CVT Electronics.Co.,Ltd G530
  * VID: 1ff7, PID: 0ff4
  * Interfaces:
    + 0: HID. This is the glasses control (MCU) endpoint.
    + 1, 2, 3: Audio
* OmniVision Technologies, Inc. USB Camera-OV580
  * VID: 05a9, PID: 0f87
  * The OV580 with a UVC (stereo camera) and a HID (IMU) interface

MCU Protocol

The MCU is controlled predominantly through control transfers, although there is also an interrupt endpoint for the forehead detector event. A command is always two control transfers: one to send the command and one to receive the result. The magic libusb parameters are:

Send:
  bmRequestType: 0x21 (CTRL_TYPE_CLASS|CTRL_RECIPIENT_INTERFACE|ENDPOINT_OUT)
  bRequest: 9
  wValue: 0x201
  wIndex: 0

Receive:
  bmRequestType: 0xa1 (CTRL_TYPE_CLASS|CTRL_RECIPIENT_INTERFACE|ENDPOINT_IN)
  bRequest: 1
  wValue: 0x102
  wIndex: 0

Note that these are the standard SetReport and GetReport HID requests (see Section 7.2 in the HID Device Class Definition), so they might also be available through standard report-based HID APIs.
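As a rough illustration of the two-step exchange, here is a minimal Python sketch that wraps the SetReport/GetReport pair. It assumes a pyusb-style device object whose `ctrl_transfer(bmRequestType, bRequest, wValue, wIndex, data_or_wLength, timeout)` method performs the transfer; the helper name `mcu_transact` is hypothetical, and the 256-byte reply length just mirrors the 256-byte control packets mentioned in the calibration section.

```python
"""Minimal sketch: one G530 MCU command/reply exchange over HID class requests."""

# HID class request parameters, as listed above (HID spec, Section 7.2).
SET_REPORT_REQUEST_TYPE = 0x21  # class | interface recipient | host-to-device
SET_REPORT = 0x09
SET_REPORT_VALUE = 0x0201

GET_REPORT_REQUEST_TYPE = 0xA1  # class | interface recipient | device-to-host
GET_REPORT = 0x01
GET_REPORT_VALUE = 0x0102

MCU_INTERFACE = 0   # wIndex: the HID (MCU) interface of the G530 device
REPLY_LENGTH = 256  # illustrative read size for the GetReport reply


def mcu_transact(dev, packet: bytes, timeout_ms: int = 1000) -> bytes:
    """Send one command packet with SetReport, then fetch the reply with GetReport.

    `dev` is assumed to be a pyusb-style device exposing
    ctrl_transfer(bmRequestType, bRequest, wValue, wIndex, data_or_wLength, timeout).
    """
    dev.ctrl_transfer(SET_REPORT_REQUEST_TYPE, SET_REPORT, SET_REPORT_VALUE,
                      MCU_INTERFACE, packet, timeout_ms)
    reply = dev.ctrl_transfer(GET_REPORT_REQUEST_TYPE, GET_REPORT, GET_REPORT_VALUE,
                              MCU_INTERFACE, REPLY_LENGTH, timeout_ms)
    return bytes(reply)
```

With pyusb, `dev` would typically come from `usb.core.find(idVendor=0x1ff7, idProduct=0x0ff4)`; on Linux you may need to detach the kernel HID driver from interface 0 first (`dev.detach_kernel_driver(0)`).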
The packet structure is as follows:

* Header: 2 bytes, fixed 0xaa, 0xbb
* Command: 2 bytes, big endian
* Additional data size: 2 bytes, big endian
* Additional data: variable, can be 0 bytes
* Checksum: 1 byte, the sum of all previous bytes excluding the 0xaa, 0xbb header, truncated to 8 bits

Example packets:

* Get serial number: [0xaa, 0xbb, 0x80, 5, 0, 0, 0x85]
* Response: [0xaa, 0xbb, 0x80, 5, 0, 5, 0x33, 0x31, 0x33, 0x33, 0x37, 0x8b]
  + (I accidentally overwrote the serial before I could read the original one.)

Commands (the data is empty for "Get" commands here):

| Command | ID | Data |
|---|---|---|
| Get firmware version | 0xffe1 | 8 bytes, unknown format |
| Get serial number | 0x8005 | The serial number as a UTF-8 string |
| Set serial number | 0x8004 | Same as above |
| Get display mode | 0x8007 | Display mode as a single byte: 0 is mirrored, any other nonzero value is SBS 60Hz |
| Set display mode | 0x8008 | Same as above |
| Get display brightness | 0x801d | Brightness as a single byte, 0-4 |
| Set display brightness | 0x801e | Same as above |

Some commands I haven't listed (or really tested), but they can easily be obtained from the SDK libraries:

* Sensor and camera enable/disable: all sensors and cameras are enabled and streaming by default, so there's no need to touch those
* Display settings, like brightness and contrast per channel, and all kinds of low-level DP stuff
* Audio volume
* Firmware update
* Sketchy stuff like getting and setting an HDCP key

You can also continuously read the interrupt endpoint 0x85, where you should get key and distance sensor events (in the same 0xaa 0xbb format as the control packets), but I only ever got the "glasses taken off" event, and it was not worth implementing.

Getting the calibration data

Unlike the XREAL protocols, where you can get the calibration JSON from the OV580, here you have to go through the above MCU protocol, using command IDs 0x8009 (metadata) and 0x800a (actual calibration data).
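The MCU packet framing and checksum are simple enough to sketch in a few lines of Python. These are hypothetical helpers (`build_packet`, `parse_packet`), not code from the SDK; the checksum is taken modulo 256, which is what the example response implies (the byte sum 0x18b ends up as 0x8b).

```python
"""Minimal sketch: building and checking G530 MCU packets (0xaa 0xbb framing)."""
import struct

HEADER = b"\xaa\xbb"


def checksum(body: bytes) -> int:
    # Sum of every byte after the 0xaa 0xbb header, truncated to one byte.
    return sum(body) & 0xFF


def build_packet(command: int, data: bytes = b"") -> bytes:
    # Command ID and additional data size are big-endian u16 values.
    body = struct.pack(">HH", command, len(data)) + data
    return HEADER + body + bytes([checksum(body)])


def parse_packet(packet: bytes) -> tuple[int, bytes]:
    """Return (command, data); raise ValueError on a malformed packet.

    Relies on the size field and ignores any trailing bytes, in case the
    GetReport reply comes back padded.
    """
    if len(packet) < 7 or packet[:2] != HEADER:
        raise ValueError("missing 0xaa 0xbb header")
    command, size = struct.unpack(">HH", packet[2:6])
    end = 6 + size
    if len(packet) < end + 1:
        raise ValueError("truncated packet")
    if checksum(packet[2:end]) != packet[end]:
        raise ValueError("bad checksum")
    return command, packet[6:end]
```

`build_packet(0x8005)` reproduces the "get serial number" packet above, and `parse_packet` on the quoted response returns `(0x8005, b'31337')`. The calibration metadata request would be `build_packet(0x8009)`.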
The metadata response looks something like this: [0, 0, 0, 241, 0, 0, 10, 210, 3, 142]

* 2 bytes header, which should be 0
* 2 bytes "max packet size" (big endian). We'll be doing 256-byte control packets anyway, but good to know, I guess?
* 4 bytes data size (big endian)
* 2 more unknown bytes

In the example above, that's a max packet size of 241 and a data size of 10 * 256 + 210 = 2770 bytes.

The "get calibration data" packet needs additional data: a 0 byte, then a 4-byte offset in big endian. So it's [0, 0, 0, 0, 0] for the first packet, [0, 0, 0, 0, 241] for the next, and so on. The response is the same 5 bytes followed by a 0 byte (so 6 in total), and then the actual data. If you request more data than the calibration file size, the packet will be shorter, or even empty. So requesting the metadata is kind of useless: you can just request data until you get an empty response.

IMU protocol

Fortunately, this is another pair of glasses that gives you an IMU stream out of the box, so you don't need to fight for it. All you have to do is continuously read 0x80-byte chunks from HID interrupt endpoint 0x89 of the OV580 device.

It is a large packet, and the SDK only parses the raw accelerometer, gyro and temperature data. A lot of the packet seems to be fixed bytes, and the only things that change (other than what we already know) are two sequence numbers. Yeah, sequence numbers, not even proper timestamps.

All data are transferred as little endian signed ints. The accelerometer and gyroscope conversion factors are the same as in the InvenSense MPU6050 docs (for the ±2 g and ±2000 deg/s ranges).

| Data | Offset | Size (bytes) | Conversion |
|---|---|---|---|
| Acceleration | 0x58 | 3*4 | Divide by 16384.0, then convert g to m/s² |
| Gyroscope | 0x3c | 3*4 | Divide by 16.4, then convert deg/s to rad/s |
| Temperature | 0x2a | 2 | Divide by 326.8, then add 25.0 |

The Better: Rokid Max

[Photo: Rokid Max on a dog]

The Rokid Max is a logical evolution of the Rokid Air. Better design, better fit, better protocol, and the DisplayPort part is apparently 2 ms quicker, reducing motion-to-photon latency. Everything else is pretty much the same, so much so that most of it can be handled by the same code.
They even kept the gimmicky focal adjustment knobs (even though the glasses are still unusable for people with astigmatism).

Protocol

The main new protocol element is "sensor data marker = 17" in the IMU data packets, which combines all the previous packet types into one. Its structure looks like this:

| Index | Bytes | Description |
|---|---|---|
| 0x00 | 1 | Sensor data marker (17) |
| 0x01 | 8 | Timestamp (little endian) |
| 0x09 | 3x4 | Gyroscope x, y and z readings as f32 |
| 0x15 | 3x4 | Accelerometer x, y and z readings as f32 |
| 0x21 | 3x4 | Magnetometer x, y and z readings as f32 |
| 0x2d | 1 | Physical key statuses (bitfield) |
| 0x2e | 1 | Proximity sensor status (near=0, far=1) |
| 0x2f | 1 | ? |
| 0x30 | 8 | Timestamp of last VSYNC (little endian) |
| 0x38 | 3 | ??? |
| 0x3b | 1 | Display brightness |
| 0x3c | 1 | Volume |
| 0x3d | 3 | ??? |

Display modes

The Max added a bunch of new display modes:

| Mode | SBS | Resolution | Refresh rate |
|---|---|---|---|
| 0 | | 1920x1080 | 60Hz |
| 1 | Yes | 3840x1080 | 60Hz |
| 2 | Yes* | 1920x1080 | 60Hz |
| 3 | | 1920x1080 | 120Hz |
| 4 | Yes | 3840x1200 | 90Hz |
| 5 | Yes | 3840x1200 | 60Hz |

*: This is a "half SBS" mode, meaning that it splits the regular HD image in half, and then stretches each half horizontally over each of the two displays.

Modes above 6 are equivalent to mode 3.

The Prettier: XREAL Air

[Photo: XREAL Air on a dog]

The XREAL Air is not an evolution of the Light; it is much more like the Rokid Max, but with a way better design. And I mean a lot better: the thing actually looks like regular (albeit a bit big) sunglasses. It is the first pair of AR glasses that passes the "Tram #4 test": I could wear it on Tram #4 and people wouldn't really notice. Except maybe for the cable hanging down.

Unfortunately it doesn't have a camera, so no inside-out 6DOF anymore. On the other hand, it has the absolute lowest display delay out of all 6 glasses we've described in these blog posts, so the image is rock stable even with dynamic head movements.

The protocol is weird. They kept the separate USB interfaces for the MCU and the IMU + DSP pair. Both protocols are different from the Light's.
Unfortunately, this post was written long after I finished working on the Air, so I'm writing it based on the code of ar-drivers-rs.

MCU protocol

Packets are sent with regular HID read() and write() primitives, over interface 4 (endpoints 0x86 and 0x07). The packet size is 0x40 both ways.

| Index | Bytes | Description |
|---|---|---|
| 0x00 | 1 | Header (0xfd) |
| 0x01 | 4 | Checksum (see below) |
| 0x05 | 2 | Length of additional data |
| 0x07 | 4 | Request ID (not checked by the MCU, only used to identify answers; can be anything) |
| 0x0b | 4 | Timestamp (also not checked, can be 0) |
| 0x0f | 2 | Command ID |
| 0x11 | 5 | Zeros (probably) |
| 0x16 | n | Additional data |

Every int is little endian. The checksum is CRC32 (Adler), like the Light's. It covers the bytes from index 5 to the end of the packet, i.e. (length field + 17) bytes, since the fields between index 5 and 0x16 take up 17 bytes.

Again, there is no need to individually enable events or hardware, so we only need the bare minimum of commands:

| Command | ID | Data |
|---|---|---|
| Get MCU FW version | 0x0026 | Version as a UTF-8 string |
| Get serial number | 0x0015 | The serial number as a UTF-8 string |
| Get display mode | 0x0007 | Display mode as a single byte |
| Set display mode | 0x0008 | Same as above |

There is a .js file in the official app that describes a lot more commands for both the Air and the Light. There aren't many interesting things there, just a couple of version strings, firmware update, reboot, and fiddling with the display.

Some asynchronous events also arrive on the same channel (sometimes between a command and its reply). They use the same packet format as the commands and replies. The only one worth looking for is ID 0x6c05, which is the key press (more precisely, key release) event.

Display modes

They also added a lot more display modes:

| Mode | SBS | Resolution | Refresh rate |
|---|---|---|---|
| 1 | | 1920x1080 | 60Hz |
| 3 | Yes | 3840x1080 | 60Hz |
| 4 | Yes | 3840x1080 | 72Hz |
| 5 | | 1920x1080 | 72Hz |
| 8 | Yes* | 1920x1080 | 60Hz |
| 9 | Yes | 3840x1080 | 90Hz |
| 10 | | 1920x1080 | 90Hz |
| 11 | | 1920x1080 | 120Hz |

*: This is a "half SBS" mode, meaning that it splits the regular HD image in half, and then stretches each half horizontally over each of the two displays.
This is the replacement for mode 1, which was a vertically stretched half-SBS mode on the Light.

Invalid display modes cause an error; I checked all 256 values.

The IMU protocol

IMU packets are also sent and received with regular HID read() and write(), over interface 3 (endpoints 0x84 and 0x05), with 0x40-sized packets.

| Index | Bytes | Description |
|---|---|---|
| 0x00 | 1 | Header (0xaa) |
| 0x01 | 4 | Checksum (same as the MCU checksum) |
| 0x05 | 2 | Length of additional data |
| 0x07 | 1 | Command ID |
| 0x08 | n | Additional data |

Every int is little endian. Interestingly, while the packet format is very different, the commands are exactly the same as the Light's:

| Command | ID | Command data |
|---|---|---|
| Get calibration file length | 0x14 | Calibration file ID according to the SDK; doesn't seem to affect anything. Can be empty. |
| Get calibration file part | 0x15 | Should be the block number. Doesn't do anything, can be empty. |
| Enable IMU stream | 0x19 | 0: disable, 1: enable |

The calibration file format is similar, although this time they didn't stuff 3 different files in there; you only get the JSON.

The IMU packet format is different and more compact, but the logic is the same:

| Index | Bytes | Description |
|---|---|---|
| 0x00 | 2 | Header (0x01, 0x02) |
| 0x02 | 2 | Temperature (raw data from the ICM-20602) |
| 0x04 | 8 | Timestamp (nanoseconds) |
| 0x0c | 2 | Gyroscope multiplier |
| 0x0e | 4 | Gyroscope divisor |
| 0x12 | 3 | Gyroscope X reading |
| 0x15 | 3 | Gyroscope Y reading |
| 0x18 | 3 | Gyroscope Z reading |
| 0x1b | 2 | Accelerometer multiplier |
| 0x1d | 4 | Accelerometer divisor |
| 0x21 | 3 | Accelerometer X reading |
| 0x24 | 3 | Accelerometer Y reading |
| 0x27 | 3 | Accelerometer Z reading |
| 0x2a | 2 | Magnetometer offset |
| 0x2c | 4 | Magnetometer divisor |
| 0x30 | 2 | Magnetometer X reading |
| 0x32 | 2 | Magnetometer Y reading |
| 0x34 | 2 | Magnetometer Z reading |

Yes, there are 3-byte signed integers in there. They are encoded the same way as "regular" 4-byte integers (little endian, two's complement), just on 3 bytes. Thankfully the Rust parsing library I use has built-in support for these, because converting them manually is a pain.
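To show how the mixed field widths fit together, here is a minimal Python sketch that decodes one Air IMU report using the offsets from the table above; `int.from_bytes(..., signed=True)` takes care of the 3-byte two's complement values. Several details are assumptions, not confirmed by the post: the multipliers and divisors are read as unsigned, the temperature and magnetometer fields as signed, and a scaled gyroscope/accelerometer reading is computed as raw * multiplier / divisor (units unspecified). The magnetometer fields are returned unscaled because the post doesn't say how the offset and divisor combine. The function names are hypothetical.

```python
"""Minimal sketch: decoding one XREAL Air IMU report (0x40 bytes).

Assumptions, not confirmed by the post: multipliers/divisors are unsigned,
temperature and magnetometer fields are signed, and a scaled reading is
raw * multiplier / divisor. Function names are hypothetical.
"""
import struct


def i24(buf: bytes, offset: int) -> int:
    """Read a 3-byte little-endian two's complement integer."""
    return int.from_bytes(buf[offset:offset + 3], "little", signed=True)


def scaled_triplet(buf: bytes, mul_offset: int) -> tuple:
    """Read a u16 multiplier and u32 divisor, then the three i24 readings after them."""
    multiplier, divisor = struct.unpack_from("<HI", buf, mul_offset)
    if divisor == 0:
        raise ValueError("zero divisor")
    first = mul_offset + 6  # readings start right after the 2 + 4 byte scale fields
    return tuple(i24(buf, first + 3 * i) * multiplier / divisor for i in range(3))


def parse_air_imu(buf: bytes) -> dict:
    if len(buf) < 0x36 or buf[0:2] != b"\x01\x02":
        raise ValueError("not an IMU sample")
    temperature_raw, timestamp_ns = struct.unpack_from("<hQ", buf, 0x02)
    mag_offset, mag_divisor = struct.unpack_from("<hI", buf, 0x2A)
    return {
        "temperature_raw": temperature_raw,  # raw ICM-20602 value
        "timestamp_ns": timestamp_ns,
        "gyro": scaled_triplet(buf, 0x0C),   # multiplier at 0x0c, readings at 0x12
        "accel": scaled_triplet(buf, 0x1B),  # multiplier at 0x1b, readings at 0x21
        "mag_offset": mag_offset,
        "mag_divisor": mag_divisor,
        "mag_raw": struct.unpack_from("<3h", buf, 0x30),  # left unscaled
    }
```

The results are still in the raw sensor frame, so any calibration data has to be applied with the coordinate-system difference in mind.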
One thing to note is that the coordinate system of the raw sensor readings is different from the calibration file's coordinate system.

Extra: Monado driver for the Rokid Max

I always wanted to support the Rokid Max, but I didn't really want to buy one just to do it. Thankfully, a kind soul from Canada got in contact with me on GitHub and paid for both the glasses and my time. Thanks again, Mauve.

The only extra condition was that I also had to make a Monado driver. Monado is a nice piece of software that implements the OpenXR API, so any app using OpenXR (major 3D engines and some AR desktops, for example) can use any Monado-supported hardware. They have a very friendly Discord, and the code is of very good quality. It was a joy to work with, and my code got reviewed basically instantly. Once the review comments were addressed, it was in trunk the next day.

Support for the Rokid Max has been merged to main. Some people are working on supporting the Nreal Air, and (as of writing) it works well, but there are some kinks to be ironed out. Maybe you can help :)

---------------------------------------------------------------------

If you need Augmented Reality problem solving, or want help implementing an AR or VR idea, drop us a mail at info@voidcomputing.hu