Broken Connections (and USB gadgets)

So I’ve been messing with USB ACM gadgets recently…

Primarily as a nicer interface to debug the #postmarketOS initramfs (historically we have just had this whole thing with telnet, and it’s not super fun).

Gadgets on my mind

A song to listen to while you read (if you're into that kind of thing)

Recently, I implemented a log dump feature that activates when the device can’t boot. Before, we just had this ominous error message (like “ERROR: root partition not found!”) on the splash, and NO way to debug further. You had to enable a console and disable the splash, which on a phone requires modifying the “Android boot image”, and really you had to build a modified ramdisk with the debug-shell hook installed so you could actually retrieve logs and poke about.

Well, what happens now is that we create a small disk image on the tmpfs, format it as fat32, fill it with a bunch of logs and info about the device and system, and then expose it as a mass storage device!

A white-on-black console log with timestamps on the left (all are 1808 seconds after boot), messages are prefixed “pmOS.rd”. It shows
32+0 records in
32+0 records out
33554432 bytes (32.0MB) copied, 0.033432 seconds, 957.2MB/s
loop0: detected capacity change from 0 to 65536
./_info
./blkid.txt
./cmdline.txt
./dmesg.txt
./fdt.dtb
./partitions.txt
./pmOS_init.txt
Making logs available via mass storage
Mass Storage Function, version: 2009/09/11
LUN: removable file: (no medium)
The ramdisk output while generating the log dump

The files are also compressed into a (gzip!!) archive so they can be easily dragged and dropped into a GitLab issue. A README file is also created with instructions on how to do this.
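For illustration, here is a minimal Python sketch of the archiving step. It is not the actual initramfs code: the post only says the archive is gzip-compressed, so the tar container, the function name archive_logs, and the paths are all assumptions.

```python
import tarfile
from pathlib import Path


def archive_logs(log_dir, archive_path):
    """Pack every regular file in log_dir into a gzip-compressed tar.

    Hypothetical helper: the tar container is an assumption, the post
    only mentions gzip.
    """
    log_dir = Path(log_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(log_dir.iterdir()):
            if path.is_file():
                # Store files under their bare names so they unpack flat.
                archive.add(path, arcname=path.name)
    return Path(archive_path)
```

Calling archive_logs("/tmp/logs", "/tmp/logdump.tar.gz") (both paths made up) would produce a single file that can be attached to an issue as-is.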

There are a few situations where this feature doesn’t automatically activate; for example, the initramfs can get stuck in an infinite loop waiting for the root partition. But hey, I’m pretty happy with this overall. I haven’t heard much feedback from folks making use of it since it shipped… But I think I’m getting mentioned on Matrix/GitLab/Mastodon for tech support less…

If you’ve found this feature useful, I’d love to hear about it! My contact info is on the home page.

The topic at hand

This post is mostly about this merge request which may or may not be merged at the time you’re reading this.

Anyways, it’s actually really easy to create an ACM gadget via configfs on the phone and launch a getty on it. This then shows up as a serial device on your PC (which you can open with picocom or whatever), and now you have a nice way to introspect the ramdisk!
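To give a feel for what "via configfs" means, here is a rough Python sketch of the directory and file layout the kernel expects for a single ACM function. It is not the postmarketOS implementation (which also sets up NCM): the function name create_acm_gadget and the names g1, c.1 and acm.usb0 are illustrative choices, and the configfs root is a parameter so the sketch can be tried against a scratch directory instead of /sys/kernel/config.

```python
import os
from pathlib import Path


def create_acm_gadget(configfs_root, udc_name, name="g1"):
    """Describe a single-function CDC ACM gadget under configfs_root.

    Hypothetical helper; on a real device configfs_root would be
    /sys/kernel/config and udc_name an entry from /sys/class/udc.
    """
    gadget = Path(configfs_root) / "usb_gadget" / name
    strings = gadget / "strings" / "0x409"  # 0x409 = US English
    config = gadget / "configs" / "c.1"
    function = gadget / "functions" / "acm.usb0"

    # On real configfs each mkdir asks the kernel to create the object.
    for directory in (strings, config, function):
        directory.mkdir(parents=True, exist_ok=True)

    # IDs and strings copied from the host dmesg output in this post.
    (gadget / "idVendor").write_text("0x18d1\n")
    (gadget / "idProduct").write_text("0xd001\n")
    (strings / "manufacturer").write_text("OnePlus\n")
    (strings / "product").write_text("OnePlus 6\n")
    (strings / "serialnumber").write_text("postmarketOS\n")

    # Linking the function into the configuration enables it.
    link = config / "acm.usb0"
    if not link.is_symlink():
        os.symlink(function, link)

    # Writing a UDC name binds the gadget to the controller, which is
    # the point where it shows up on the host.
    (gadget / "UDC").write_text(udc_name + "\n")
    return gadget
```

Once the gadget is bound, the device side of the serial link typically appears as /dev/ttyGS0, and that is the port a getty would be started on.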

dmesg command output in color. timestamps all 204497 seconds after boot.
usb 7-2.2.3: new high-speed USB device number 89 using xhci_hcd
usb 7-2.2.3: New USB device found, idVendor=18d1, idProduct=d001, bcdDevice= 6.09
usb 7-2.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 7-2.2.3: Product: OnePlus 6
usb 7-2.2.3: Manufacturer: OnePlus
usb 7-2.2.3: SerialNumber: postmarketOS
cdc_ncm 7-2.2.3:1.0: MAC-Address: d2:60:09:a0:ee:c6
cdc_ncm 7-2.2.3:1.0 usb0: register ‘cdc_ncm’ at usb-0000:0f:00.4-2.2.3, CDC NCM (NO ZLP), d2:60:09:a0:ee:c6
cdc_acm 7-2.2.3:1.2: ttyACM2: USB ACM device
cdc_ncm 7-2.2.3:1.0 enp15s0f4u2u2u3: renamed from usb0
dmesg on my host PC when debug-shell is enabled

A bit about CDC-ACM

I used to be pretty terrified of USB gadget devices; they always felt somehow strange and cursed to me. Probably this is in large part because my experience with them is primarily on phones, where if you get it wrong you have to re-flash the thing to try again. The gadget is your only lifeline and the idea of losing it is pretty… scary? Or frustrating, however the mood takes you.

Nowadays, with real, physical serial ports at my disposal, I’m not scared to break a few milk bottles, uhh, spill a few eggs… you know what I mean.

So really the only context that’s important for this post is the following:

Polishing the UX

So I wanted to have something like /etc/issue; a message that shows up when you open the terminal and tells you what’s up, points you to a URL to learn more about the initramfs, and gives you some useful commands (like pmos_continue_boot to quit debugging and finish booting). Making nice UX like this (in areas most people would just gloss over) is something I really enjoy about postmarketOS, both that I get to work on it, and that it’s something our community sees value in.

I went ahead and implemented this, writing the useful info to /README. The getty runs a wrapper which spawns a login shell. That shell sources /etc/profile, which is configured to cat /README, aaand:

example debug shell welcome message:
postmarketOS debug shell
https://postmarketos.org/debug-shell
Kernel: 6.9.0-rc4-sdm845-00112-g7756e134631e-dirty
Device: oneplus-enchilada
OS ver: edge
initrd: 2.6.0-r0
Run ‘pmos_continue_boot’ to continue booting.
Run ‘pmos_logdump’ to generate a log dump and expose it over USB.
swanky debug-shell welcome message

Well… This works totally great on my real serial port, but for some reason when I go to open the USB gadget ACM device, there’s just some garbage and a bunch of newlines… Hmmm

picocom output:
Type [C-a] [C-h] to see available commands
Terminal ready
~ #
~ #
~ #
~ # postmarketOS d
/bin/sh: postmarketOS: not found
~ #
~ #
~ #
~ # https  OS
/bin/sh: https: not found
~ #
~ #
~ #
~ #
~ #
~ #
~ #
~ # R
/bin/sh: R: not found
~ #
~ # Run ‘pmos_logd^C
~ #
Attempting to run picocom on an ACM gadget serial port (with a getty on the other side).

What do?

A lucky Google search later, I discovered this mini blog post describing EXACTLY the issue here! Wow, that saved me a lot of time :D

It turns out that for the whole time between the host CDC ACM driver loading and you opening the serial port, all the bytes being sent from the device are happily thrown away by the TTY layer, and echo’d back! wtf!

There is a quirk one can set for their specific vendor/product ID which will make the CDC ACM driver clear the ECHO flag on the TTY port during init. For some applications it might be enough to just make your device pretend to be one of the devices with this quirk. In this case we’re also setting up a CDC NCM gadget, and I’d rather not run into weird udev quirks down the line (and in any case many pmOS devices override the vendor/product IDs).
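The effect of the ECHO flag can be reproduced without any USB hardware. The Python sketch below uses a pseudo-terminal as a stand-in: the pty master plays the phone and the slave plays the host’s ttyACM port. This only illustrates the TTY-layer behaviour, it is not the cdc_acm driver, and the helper names set_echo and bytes_echoed_back are made up for the example.

```python
import os
import select
import termios


def set_echo(fd, enabled):
    """Set or clear the ECHO flag in the local-mode flags of a TTY."""
    attrs = termios.tcgetattr(fd)
    if enabled:
        attrs[3] |= termios.ECHO  # index 3 is c_lflag
    else:
        attrs[3] &= ~termios.ECHO
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def bytes_echoed_back(device_fd, data, timeout=0.5):
    """Write data from the 'device' side and collect whatever comes back.

    Keeps reading until nothing has arrived for `timeout` seconds.
    """
    os.write(device_fd, data)
    received = b""
    while select.select([device_fd], [], [], timeout)[0]:
        received += os.read(device_fd, 1024)
    return received


# The pty master stands in for the phone, the slave for the host TTY.
device_fd, host_tty_fd = os.openpty()

set_echo(host_tty_fd, True)
with_echo = bytes_echoed_back(device_fd, b"postmarketOS debug shell\n")

set_echo(host_tty_fd, False)
without_echo = bytes_echoed_back(device_fd, b"postmarketOS debug shell\n")

print("ECHO set:    ", with_echo)
print("ECHO cleared:", without_echo)
```

With ECHO set, the welcome text comes straight back to the side that sent it, and on the phone that side is a shell which tries to run it as a command. With ECHO cleared, nothing comes back.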

Honestly I’m a little miffed about this one, and about how nobody else has tried to fix it (this is ooooold code lol).

We could perhaps add a flag somewhere in the USB descriptors, an extension to the CDC ACM spec? But that risks breaking things…

There’s the f_serial (aka gser) gadget device, but it doesn’t have a host driver (it’s quite literally just a USB bulk endpoint where you spit data across)…

I’d love some suggestions! Bonus points if they don’t require making changes to the host ACM driver… I’m so not a fan of x86 kernel development ;P

Update: Workaround found

Since originally publishing this, I found a workaround which we’re now shipping in the postmarketOS initramfs.

Since the bug is due to the host PC echo’ing back all the data sent to it at the moment someone opens the host serial port (e.g. by running picocom /dev/ttyACM0), we can detect the port being opened by sending a single character to the port on the device side and then waiting for any data to be sent back.

In most cases the character is buffered until the host side port is opened; it’s then usually echo’d back and forth a few times (this depends heavily on some race conditions, which is why the behaviour is different every time). The worst case then is that it doesn’t get echo’d back and the user hits the enter key to get the prompt to appear.
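The idea can be sketched in a few lines of Python. This is not the code shipped in the initramfs, just an illustration under assumptions: fd is an already-opened device-side port, and the name wait_for_host_open, the carriage-return probe, and the 0.1 second settle time are all made up for the example.

```python
import os
import select


def wait_for_host_open(fd, probe=b"\r", timeout=None):
    """Send one probe byte, then block until anything comes back.

    Returns True once data has arrived (the echoed probe, or a key
    pressed by the user), False if `timeout` seconds pass first.
    timeout=None waits forever.
    """
    os.write(fd, probe)
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return False
    # Throw away whatever bounced back so the shell never sees it.
    # Stop once the line has been quiet for a short settle time.
    while select.select([fd], [], [], 0.1)[0]:
        if not os.read(fd, 1024):
            break  # end of file, the other side went away
    return True
```

A getty wrapper could call something like this before printing the welcome message and spawning the shell, so that nothing is written until someone is actually listening on the host side.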

This should be largely host-agnostic, and your implementation may vary depending on your use case. But for my particular case, this lets us avoid the user ending up with a bunch of weird garbage on their screen.