Cluster Computer Gotchas: U-Boot, Device Trees, iPXE
The Pine64 clusterboard with seven SOPINEs (and 28 ARM64 cores) is wonderful. Its hardware does, however, differ from the baseboard that holds a single SOPINE. For one, there is no HDMI or audio output, and a few IO ports that are unnecessary for cluster computing are omitted, such as the display panel and camera (MIPI) headers. There is a handy LED on GPIO pin 2 (PL7) of each of the 10×2 headers. There is also a Realtek RTL8211E with reported timing issues we have to deal with.
U-Boot
- Allwinner EMAC TX Timing
- Ethernet Voltage Ramp-Up
- Realtek RTL8211E Timing Fix
- Ethernet GMAC TX Delay
- Spurious Timing Issue
- Disable USB in U-Boot for Fast Booting
- Disabling Distribution Defaults
- SOPINE-on-Clusterboard Device Tree
- Clusterboard LEDs as Boot Stage Indicators
- Custom U-Boot Board for SOPINEs
- U-Boot SPL and Full Builds
- Clusterboard LEDs and Boot Progress Indicators (code)
- Clusterboard LEDs Demonstration
- Patch the SUNXI Watchdog Timer
- Device Trees from Scratch
- Error “.rodata will not fit in .sram”
- SPI NOR-Flash Booting
- Summary of Changed U-Boot Files
iPXE
U-Boot on Pine64 Clusterboard Gotchas
U-Boot is a ubiquitous bootloader on ARM64 boards, and this experience has let me dive deep into its C code and make helpful modifications. With U-Boot, it is possible to chain load iPXE to boot bare-metal hardware over HTTP(S). Very useful.
But first, here are some of the U-Boot gotchas the community has solved but that are not yet in mainline U-Boot as of v2021.04. My hope is that having this information in one place is helpful.
Allwinner EMAC TX Timing
In October 2020, a user (dippywood) on an Armbian forum came up with a script to patch a TX timing issue in the device tree for the SOPINE A64, the crux of which is increasing the TX delay from the default of 0 ps to 500 ps. As of 2021, this update is not in mainline U-Boot, so here is the procedure I followed.
The default TX delay of the EMAC is 0 ps, as seen below.
```c
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/drivers/net/sun8i_emac.c
if (!priv->use_internal_phy)
    parse_phy_pins(dev);

sun8i_pdata->tx_delay_ps = fdtdec_get_int(gd->fdt_blob, node,
                "allwinner,tx-delay-ps", 0); // <-- Default timing of zero
if (sun8i_pdata->tx_delay_ps < 0 || sun8i_pdata->tx_delay_ps > 700)
    printf("%s: Invalid TX delay value %d\n", __func__,
           sun8i_pdata->tx_delay_ps);
```
In the baseboard DTS, which is a surrogate for the clusterboard, there is no TX timing information: the default is then zero.
```
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/arch/arm/dts/sun50i-a64-sopine-baseboard.dts
&emac {
    pinctrl-names = "default";
    pinctrl-0 = <&rgmii_pins>;
    phy-mode = "rgmii";
    phy-handle = <&ext_rgmii_phy>;
    phy-supply = <&reg_dc1sw>;
    status = "okay";
};
```
Judging from other board DTS files and the patch in the forum, it suffices to add one (or two) lines to sun50i-a64-sopine-baseboard.dts and rename it to sun50i-a64-sopine-clusterboard.dts.
```
// REF: https://github.com/u-boot/u-boot/search?q=tx-delay-ps
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/arch/arm/dts/sun50i-a64-sopine-baseboard.dts
&emac {
    pinctrl-names = "default";
    pinctrl-0 = <&rgmii_pins>;
    phy-mode = "rgmii";
    phy-handle = <&ext_rgmii_phy>;
    phy-supply = <&reg_dc1sw>;
    allwinner,rx-delay-ps = <0>;   // Added
    allwinner,tx-delay-ps = <500>; // Added
    status = "okay";
};
```
Would phy-supply = <&reg_gmac_3v3>; and phy-mode = "rgmii-id"; be better, as used in the Rockchip DTS? DM me if you find out.
Going forward, let’s use sun50i-a64-sopine-clusterboard.dts as a new DTS file dedicated to the clusterboard.
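To make the units concrete, here is a small host-side sketch of what a value like 500 means once it leaves the device tree. The 0 to 700 range comes from the driver excerpt above. The 100 ps step size, the field position (bits 12:10), and the function name tx_delay_ps_to_field are my assumptions for illustration, not something copied from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define TX_DELAY_MAX_PS   700  /* upper bound from the driver's range check */
#define TX_DELAY_STEP_PS  100  /* assumed granularity of the delay chain */
#define TX_DELAY_SHIFT    10   /* assumed bit position of the TX delay field */
#define TX_DELAY_MASK     (0x7u << TX_DELAY_SHIFT)

/*
 * Convert a TX delay in picoseconds to a register field value.
 * Returns 0 on success, -1 if the value is outside 0..700 ps.
 */
int tx_delay_ps_to_field(int delay_ps, uint32_t *field)
{
    assert(field);

    if (delay_ps < 0 || delay_ps > TX_DELAY_MAX_PS)
        return -1;

    *field = ((uint32_t)(delay_ps / TX_DELAY_STEP_PS) << TX_DELAY_SHIFT) &
             TX_DELAY_MASK;
    return 0;
}
```

Under these assumptions, allwinner,tx-delay-ps = <500> becomes step 5 of 7, and anything above 700 is rejected in the same way the driver excerpt flags it as invalid.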
Ethernet Voltage Ramp-Up
I’ll give the Ethernet PHY more time for the VCC to ramp up, say 300ms, from 100ms just in case that is an issue. Why not?
```
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/arch/arm/dts/sun50i-a64-sopine-baseboard.dts
&reg_dc1sw {
    /*
     * Ethernet PHY needs 30ms to properly power up and some more
     * to initialize. 100ms should be plenty of time to finish
     * whole process.
     */
    regulator-enable-ramp-delay = <100000>; // <-- Increase to 300000
    regulator-name = "vcc-phy";
};
```
Realtek RTL8211E Timing Fix
In 2021, mainline U-Boot’s sopine_baseboard_defconfig still has no Realtek timing fix explicitly enabled. Actually, the Realtek PHY driver isn’t enabled at all, even though a March 2018 commit added a Realtek RTL8211E timing fix. See below.
```c
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/drivers/net/phy/realtek.c
static int rtl8211e_probe(struct phy_device *phydev)
{
#ifdef CONFIG_RTL8211E_PINE64_GIGABIT_FIX
    phydev->flags |= PHY_RTL8211E_PINE64_GIGABIT_FIX;
#endif
    return 0;
}
```
Let’s enable the RTL8211E in the clusterboard.
```
# REF: https://github.com/u-boot/u-boot/blob/v2021.01/configs/sopine_baseboard_defconfig
CONFIG_ARM=y
CONFIG_ARCH_SUNXI=y
CONFIG_SPL=y
..

# Add these lines below
# REF: https://forum.pine64.org/showthread.php?tid=5581&page=2
# REF: https://github.com/janwillies/u-boot/commit/73ce03a08b3989b874b88f4f17286308c6cd5eea
CONFIG_PHY_REALTEK=y
CONFIG_RTL8211E_PINE64_GIGABIT_FIX=y
```
Now the clusterboard will be able to communicate with the onboard switch and packets will reach the network.
Ethernet GMAC TX Delay
From linux-sunxi,
“For reliable Gigabit networking (1000Mbit operation), several sunxi devices require an important tweak that adjusts the relative timing of the clock and data signals to the PHY, in order to compensate for differing trace lengths on the PCB … Recent mainline U-Boot uses CONFIG_GMAC_TX_DELAY to initialize these devices accordingly. If a necessary GMAC TX delay isn’t set, then GBit Ethernet operation might be unreliable or won’t work at all.”
For completeness, let’s explore this more.
```c
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/board/sunxi/gmac.c
void eth_init_board(void)
{
    int pin;
    struct sunxi_ccm_reg *const ccm =
        (struct sunxi_ccm_reg *)SUNXI_CCM_BASE;

    /* Set MII clock */
#ifdef CONFIG_RGMII
    setbits_le32(&ccm->gmac_clk_cfg, CCM_GMAC_CTRL_TX_CLK_SRC_INT_RGMII |
        CCM_GMAC_CTRL_GPIT_RGMII);
    setbits_le32(&ccm->gmac_clk_cfg,
             CCM_GMAC_CTRL_TX_CLK_DELAY(CONFIG_GMAC_TX_DELAY)); // <-- GMAC TX delay used here
#else
...
```
After searching the U-Boot repo for CONFIG_GMAC_TX_DELAY, let’s add CONFIG_GMAC_TX_DELAY=0 to the sopine_clusterboard_defconfig because visually I can see there is an effort to make the trace lengths the same on the clusterboard. Nice.
Spurious Timing Issue
The generic ARMv8 timer on the A64 has been found to produce spurious timeouts. This tweak purports to fix the spurious timeouts seen while testing the SPI driver. The timeouts disappear when the number of bits checked is reduced to 10.
```c
// REF: https://github.com/u-boot/u-boot/blob/v2021.01/arch/arm/cpu/armv8/generic_timer.c
// REF: https://github.com/u-boot/u-boot/commit/d503bd07db493eec31c94e91ffd98c27b982b464
#elif CONFIG_SUNXI_A64_TIMER_ERRATUM
/*
 * This erratum sometimes flips the lower 11 bits of the counter value
 * to all 0's or all 1's, leading to jumps forwards or backwards.
 * Backwards jumps might be interpreted all roll-overs and be treated as
 * huge jumps forward.
 * The workaround is to check whether the lower 11 bits of the counter are
 * all 0 or all 1, then discard this value and read again.
 * This occasionally discards valid values, but will catch all erroneous
 * reads and fixes the problem reliably. Also this mostly requires only a
 * single read, so does not have any significant overhead.
 * The algorithm was conceived by Samuel Holland.
 */
unsigned long timer_read_counter(void)
{
    unsigned long cntpct;

    isb();
    do {
        asm volatile("mrs %0, cntpct_el0" : "=r" (cntpct));
    } while (((cntpct + 1) & GENMASK(9, 0)) <= 1); // <-- Changed from 10 to 9 (10 bits)

    return cntpct;
}
...
```
Let’s not forget to enable the erratum in the defconfig:
```
# Fix a spurious timer timeout issue
CONFIG_SUNXI_A64_TIMER_ERRATUM=y
```
After flashing several SOPINEs I’ve yet to notice any deleterious effects.
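To see what the loop condition in the workaround actually rejects, here is a host-side sketch of the same test with the number of bits as a parameter. The function names are mine. The expression mirrors the `((cntpct + 1) & GENMASK(n, 0)) <= 1` check from the excerpt, where GENMASK(9, 0) covers 10 bits and GENMASK(10, 0) covers 11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the lowest `bits` bits set, e.g. bits = 10 gives 0x3ff. */
static uint64_t low_bits_mask(unsigned int bits)
{
    assert(bits > 0 && bits < 64);
    return (UINT64_C(1) << bits) - 1;
}

/*
 * True when the lowest `bits` bits of the counter are all 0s or all 1s,
 * i.e. when the workaround would discard the value and read again.
 * Adding 1 turns "all 1s" into 0 and "all 0s" into 1 under the mask.
 */
bool counter_read_is_suspect(uint64_t cntpct, unsigned int bits)
{
    return ((cntpct + 1) & low_bits_mask(bits)) <= 1;
}
```

A value such as 0xBFF is rejected with a 10-bit check (its low 10 bits are all 1s) but accepted with the stock 11-bit check, which shows that the narrower mask discards more readings, not fewer.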
Disable USB in U-Boot for Fast Booting
Note: disabling DISTRO_DEFAULTS in U-Boot may break everything.
Disabling USB in U-Boot to shave off 10s of start-up time may result in BOOTP/DHCP packets not leaving the hardware. There is a MAC address, but no network egress of packets. For example,
```
ethernet@1c30000 Waiting for PHY auto negotiation to complete...... done
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
BOOTP broadcast 4
...
BOOTP broadcast 14
BOOTP broadcast 15
BOOTP broadcast 16
BOOTP broadcast 17
missing environment variable: pxeuuid
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/01-02-ba-aa-cf-c7-a3
*** ERROR: `serverip' not set
missing environment variable: bootfile
Retrieving file: pxelinux.cfg/00000000
...
```
After the previous timing fixes, DHCP works consistently.
```
ethernet@1c30000 Waiting for PHY auto negotiation to complete....... done
BOOTP broadcast 1
BOOTP broadcast 2
DHCP client bound to address 192.168.1.164 (265 ms)
Using ethernet@1c30000 device
TFTP from server 192.168.1.118; our IP address is 192.168.1.164
...
```
Disabling Distribution Defaults
If you disable CONFIG_DISTRO_DEFAULTS in your defconfig file to remove USB support, be sure to manually add back the various boot commands it formerly selected. Without this, our SPI flashing script will not run.
Add the following to your sopine_clusterboard_defconfig to “manually” add back distro defaults without USB support.
```
# Disable USB for faster booting
# CONFIG_DISTRO_DEFAULTS is not set
# CONFIG_USB_EHCI_HCD is not set
# CONFIG_USB_OHCI_HCD is not set
# CONFIG_SYS_USB_EVENT_POLL_VIA_INT_QUEUE is not set
# CONFIG_USB is not set
# CONFIG_CMD_USB is not set
# CONFIG_CMD_USB_MASS_STORAGE is not set
# CONFIG_DM_USB is not set

# However, since we disabled distro defaults,
# we need to manually add back everything
# that would be selected except USB.
CONFIG_AUTO_COMPLETE=y
CONFIG_CMDLINE_EDITING=y
CONFIG_CMD_BOOTI=y
CONFIG_CMD_BOOTZ=y
CONFIG_CMD_DHCP=y
CONFIG_CMD_ENV_EXISTS=y
CONFIG_CMD_EXT2=y
CONFIG_CMD_EXT4=y
CONFIG_CMD_FAT=y
CONFIG_CMD_FS_GENERIC=y
CONFIG_CMD_PART=y
CONFIG_CMD_PING=y
CONFIG_CMD_MII=y
CONFIG_USE_BOOTCOMMAND=y
```
SOPINE-on-Clusterboard Device Tree
There is no DTB file for the clusterboard. Let’s replace the contents of U-Boot’s arch/arm/dts/Makefile with just the essentials, remembering to add the new clusterboard DTB (which we will create soon).
```
# SPDX-License-Identifier: GPL-2.0+

dtb-$(CONFIG_MACH_SUN50I) += \
	sun50i-a64-pine64-lts.dtb \
	sun50i-a64-pine64-plus.dtb \
	sun50i-a64-pine64.dtb \
	sun50i-a64-sopine-baseboard.dtb \
	sun50i-a64-sopine-clusterboard.dtb # <-- Add the new DTB

targets += $(dtb-y)

# Add any required device tree compiler flags here
DTC_FLAGS +=

PHONY += dtbs
dtbs: $(addprefix $(obj)/, $(dtb-y))
	@:

clean-files := *.dtb *.dtbo *_HS
```
Customizing a device tree with fragments and aliases is beyond this article, but we can start by copying the sopine baseboard DTS, renaming the model, removing the missing hardware, and including inherited boards like below.
```
#include "sun50i-a64-sopine.dtsi"

/ {
    model = "SOPINE Clusterboard";
    compatible = "pine64,sopine-clusterboard", "pine64,sopine-baseboard", "pine64,sopine", "allwinner,sun50i-a64-plus", "allwinner,sun50i-a64";
```
We should be able to use sun50i-a64-sopine-clusterboard.dtb with the Linux distro as well.
Clusterboard LEDs as Boot Stage Indicators
As alluded to earlier, there are seven LEDs – one for each SOPINE – on GPIO pin 2 (PL7) of each of the 10×2 GPIO headers. Let’s make that work for us during boot.
First, the DTS file is missing LED info. Let’s borrow some code from Linus Torvalds’ Linux kernel tree that is not in mainline U-Boot:
```
#include <dt-bindings/leds/common.h>
#include "sun50i-a64-sopine.dtsi"

/ {
    model = "SOPINE Clusterboard";
    compatible = "pine64,sopine-clusterboard", "pine64,sopine-baseboard", "pine64,sopine", "allwinner,sun50i-a64-plus", "allwinner,sun50i-a64";

    // Add the below LED description to access PL7 on the clusterboard (possibly optional)
    // REF: https://github.com/torvalds/linux/blob/master/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-lts.dts
    leds {
        compatible = "gpio-leds";

        led {
            function = LED_FUNCTION_STATUS;
            color = <LED_COLOR_ID_GREEN>;
            gpios = <&r_pio 0 7 GPIO_ACTIVE_LOW>; /* PL7 */
        };
    };
```
An easy way to blink a given LED during boot is to invoke the following commands in a boot script (boot.scr):
```
setenv led_off 'gpio set PL7'
setenv led_on 'gpio clear PL7'
setenv blink_led 'run led_on; sleep 0.1; run led_off'
run blink_led
```
This script can be expanded, but only runs during the execution of that boot script, not throughout the boot process.
Can we get more creative? Absolutely. Let’s take advantage of some built-in U-Boot facilities to make the LED blink at different boot stages. We could use the LED API, but if you are like me you will find it unsatisfying. Let’s be more sophisticated with the LED we have per SOPINE.
Where to begin? Here is a breadcrumb I found at the bottom of common/init/board_init.c:
```c
/*
 * Board-specific Platform code can reimplement show_boot_progress () if needed
 */
__weak void show_boot_progress(int val) {}
```
Searching through the U-Boot source will yield the Kconfig for CONFIG_SHOW_BOOT_PROGRESS, revealing a series of boot stage integers we can use to blink an LED or more.
```
# REF: https://github.com/u-boot/u-boot/blob/v2021.01/common/Kconfig.boot#L492
config SHOW_BOOT_PROGRESS
    bool "Show boot progress in a board-specific manner"
    help
      Defining this option allows to add some board-specific code (calling
      a user-provided function show_boot_progress(int) that enables you to
      show the system's boot progress on some display (for example, some
      LEDs) on your board. At the moment, the following checkpoints are
      implemented:

      Legacy uImage format:

      Arg   Where                     When
      1     common/cmd_bootm.c        before attempting to boot an image
      -1    common/cmd_bootm.c        Image header has bad magic number
      2     common/cmd_bootm.c        Image header has correct magic number
      -2    common/cmd_bootm.c        Image header has bad checksum
      3     common/cmd_bootm.c        Image header has correct checksum
      -3    common/cmd_bootm.c        Image data has bad checksum
      4     common/cmd_bootm.c        Image data has correct checksum
      -4    common/cmd_bootm.c        Image is for unsupported architecture
      5     common/cmd_bootm.c        Architecture check OK
      -5    common/cmd_bootm.c        Wrong Image Type (not kernel, multi)
      6     common/cmd_bootm.c        Image Type check OK
      -6    common/cmd_bootm.c        gunzip uncompression error
      -7    common/cmd_bootm.c        Unimplemented compression type
      7     common/cmd_bootm.c        Uncompression OK
      8     common/cmd_bootm.c        No uncompress/copy overwrite error
      -9    common/cmd_bootm.c        Unsupported OS (not Linux, BSD, VxWorks, QNX)

      9     common/image.c            Start initial ramdisk verification
      -10   common/image.c            Ramdisk header has bad magic number
      -11   common/image.c            Ramdisk header has bad checksum
      10    common/image.c            Ramdisk header is OK
      -12   common/image.c            Ramdisk data has bad checksum
      11    common/image.c            Ramdisk data has correct checksum
      12    common/image.c            Ramdisk verification complete, start loading
      -13   common/image.c            Wrong Image Type (not PPC Linux ramdisk)
      13    common/image.c            Start multifile image verification
      14    common/image.c            No initial ramdisk, no multifile, continue.

      15    arch/<arch>/lib/bootm.c   All preparation done, transferring control to OS

      -30   arch/powerpc/lib/board.c  Fatal error, hang the system
      -31   post/post.c               POST test failed, detected by post_output_backlog()
      -32   post/post.c               POST test failed, detected by post_run_single()

      34    common/cmd_doc.c          before loading a Image from a DOC device
      -35   common/cmd_doc.c          Bad usage of "doc" command
      35    common/cmd_doc.c          correct usage of "doc" command
      -36   common/cmd_doc.c          No boot device
      36    common/cmd_doc.c          correct boot device
      -37   common/cmd_doc.c          Unknown Chip ID on boot device
      37    common/cmd_doc.c          correct chip ID found, device available
      -38   common/cmd_doc.c          Read Error on boot device
      38    common/cmd_doc.c          reading Image header from DOC device OK
      -39   common/cmd_doc.c          Image header has bad magic number
      39    common/cmd_doc.c          Image header has correct magic number
      -40   common/cmd_doc.c          Error reading Image from DOC device
      40    common/cmd_doc.c          Image header has correct magic number
      41    common/cmd_ide.c          before loading a Image from a IDE device
      -42   common/cmd_ide.c          Bad usage of "ide" command
      42    common/cmd_ide.c          correct usage of "ide" command
      -43   common/cmd_ide.c          No boot device
      43    common/cmd_ide.c          boot device found
      -44   common/cmd_ide.c          Device not available
      44    common/cmd_ide.c          Device available
      -45   common/cmd_ide.c          wrong partition selected
      45    common/cmd_ide.c          partition selected
      -46   common/cmd_ide.c          Unknown partition table
      46    common/cmd_ide.c          valid partition table found
      -47   common/cmd_ide.c          Invalid partition type
      47    common/cmd_ide.c          correct partition type
      -48   common/cmd_ide.c          Error reading Image Header on boot device
      48    common/cmd_ide.c          reading Image Header from IDE device OK
      -49   common/cmd_ide.c          Image header has bad magic number
      49    common/cmd_ide.c          Image header has correct magic number
      -50   common/cmd_ide.c          Image header has bad checksum
      50    common/cmd_ide.c          Image header has correct checksum
      -51   common/cmd_ide.c          Error reading Image from IDE device
      51    common/cmd_ide.c          reading Image from IDE device OK
      52    common/cmd_nand.c         before loading a Image from a NAND device
      -53   common/cmd_nand.c         Bad usage of "nand" command
      53    common/cmd_nand.c         correct usage of "nand" command
      -54   common/cmd_nand.c         No boot device
      54    common/cmd_nand.c         boot device found
      -55   common/cmd_nand.c         Unknown Chip ID on boot device
      55    common/cmd_nand.c         correct chip ID found, device available
      -56   common/cmd_nand.c         Error reading Image Header on boot device
      56    common/cmd_nand.c         reading Image Header from NAND device OK
      -57   common/cmd_nand.c         Image header has bad magic number
      57    common/cmd_nand.c         Image header has correct magic number
      -58   common/cmd_nand.c         Error reading Image from NAND device
      58    common/cmd_nand.c         reading Image from NAND device OK

      -60   common/env_common.c       Environment has a bad CRC, using default

      64    net/eth.c                 starting with Ethernet configuration.
      -64   net/eth.c                 no Ethernet found.
      65    net/eth.c                 Ethernet found.

      -80   common/cmd_net.c          usage wrong
      80    common/cmd_net.c          before calling net_loop()
      -81   common/cmd_net.c          some error in net_loop() occurred
      81    common/cmd_net.c          net_loop() back without error
      -82   common/cmd_net.c          size == 0 (File with size 0 loaded)
      82    common/cmd_net.c          trying automatic boot
      83    common/cmd_net.c          running "source" command
      -83   common/cmd_net.c          some error in automatic boot or "source" command
      84    common/cmd_net.c          end without errors

      FIT uImage format:

      Arg   Where                     When
      100   common/cmd_bootm.c        Kernel FIT Image has correct format
      -100  common/cmd_bootm.c        Kernel FIT Image has incorrect format
      101   common/cmd_bootm.c        No Kernel subimage unit name, using configuration
      -101  common/cmd_bootm.c        Can't get configuration for kernel subimage
      102   common/cmd_bootm.c        Kernel unit name specified
      -103  common/cmd_bootm.c        Can't get kernel subimage node offset
      103   common/cmd_bootm.c        Found configuration node
      104   common/cmd_bootm.c        Got kernel subimage node offset
      -104  common/cmd_bootm.c        Kernel subimage hash verification failed
      105   common/cmd_bootm.c        Kernel subimage hash verification OK
      -105  common/cmd_bootm.c        Kernel subimage is for unsupported architecture
      106   common/cmd_bootm.c        Architecture check OK
      -106  common/cmd_bootm.c        Kernel subimage has wrong type
      107   common/cmd_bootm.c        Kernel subimage type OK
      -107  common/cmd_bootm.c        Can't get kernel subimage data/size
      108   common/cmd_bootm.c        Got kernel subimage data/size
      -108  common/cmd_bootm.c        Wrong image type (not legacy, FIT)
      -109  common/cmd_bootm.c        Can't get kernel subimage type
      -110  common/cmd_bootm.c        Can't get kernel subimage comp
      -111  common/cmd_bootm.c        Can't get kernel subimage os
      -112  common/cmd_bootm.c        Can't get kernel subimage load address
      -113  common/cmd_bootm.c        Image uncompress/copy overwrite error

      120   common/image.c            Start initial ramdisk verification
      -120  common/image.c            Ramdisk FIT image has incorrect format
      121   common/image.c            Ramdisk FIT image has correct format
      122   common/image.c            No ramdisk subimage unit name, using configuration
      -122  common/image.c            Can't get configuration for ramdisk subimage
      123   common/image.c            Ramdisk unit name specified
      -124  common/image.c            Can't get ramdisk subimage node offset
      125   common/image.c            Got ramdisk subimage node offset
      -125  common/image.c            Ramdisk subimage hash verification failed
      126   common/image.c            Ramdisk subimage hash verification OK
      -126  common/image.c            Ramdisk subimage for unsupported architecture
      127   common/image.c            Architecture check OK
      -127  common/image.c            Can't get ramdisk subimage data/size
      128   common/image.c            Got ramdisk subimage data/size
      129   common/image.c            Can't get ramdisk load address
      -129  common/image.c            Got ramdisk load address

      -130  common/cmd_doc.c          Incorrect FIT image format
      131   common/cmd_doc.c          FIT image format OK

      -140  common/cmd_ide.c          Incorrect FIT image format
      141   common/cmd_ide.c          FIT image format OK

      -150  common/cmd_nand.c         Incorrect FIT image format
      151   common/cmd_nand.c         FIT image format OK
```
For the curious, sprinkled throughout U-Boot are bootstage markers like the following:
```c
bootstage_mark(BOOTSTAGE_ID_NET_ETH_START);
```
These markers end up calling show_boot_progress().
Now, all we have to do (but don’t do this yet) is add a new function to board/sunxi/board.c like so:
```c
#ifdef CONFIG_SHOW_BOOT_PROGRESS
void show_boot_progress(int progress)
{
    printf("%i\n", progress);
}
#endif
```
We can see the boot progress numbers appear. Very nice.
```
ethernet@1c30000 Waiting for PHY auto negotiation to complete....... done
170
BOOTP broadcast 1
170
BOOTP broadcast 2
DHCP client bound to address 192.168.1.164 (260 ms)
171
Using ethernet@1c30000 device
TFTP from server 192.168.1.118; our IP address is 192.168.1.164
Filename 'ipxe.efi'.
Load address: 0x42000000
Loading: ################
         2.4 MiB/s
done
Bytes transferred = 233984 (39200 hex)
81
82
84
```
What about making the clusterboard LEDs blink? We’ll come to that soon, but first we need to talk more about the hardware.
Custom U-Boot Board for SOPINEs
Do we really want to hack the common SUNXI board file to add LED support? Let’s complement it instead with a dedicated board for the clusterboard:
```
# configs/sopine_clusterboard_defconfig:
CONFIG_TARGET_SOPINE_CLUSTERBOARD=y
```
```
# board/sunxi/Makefile
obj-y += board.o
obj-y += sopine-clusterboard.o # <-- Add this line
obj-$(CONFIG_SUN7I_GMAC) += gmac.o
obj-$(CONFIG_MACH_SUN4I) += dram_sun4i_auto.o
obj-$(CONFIG_MACH_SUN5I) += dram_sun5i_auto.o
obj-$(CONFIG_MACH_SUN7I) += dram_sun5i_auto.o
```
Because I like to write portable code, I’ve added TARGET_SOPINE_CLUSTERBOARD to Kconfig.
```
# arch/arm/Kconfig
config ARM64_SUPPORT_AARCH32
    bool "ARM64 system support AArch32 execution state"
    depends on ARM64
    default y if !TARGET_THUNDERX_88XX
    help
      This ARM64 system supports AArch32 execution state.

### Add these lines above 'choice' #############
config TARGET_SOPINE_CLUSTERBOARD
    bool "Support SOPINEs on a Clusterboard"
    default y if MACH_SUNXI
    select ARCH_SUNXI
    select SUNXI_A64_TIMER_ERRATUM
    select SHOW_BOOT_PROGRESS
    select SUNXI_GPIO
    select DM_GPIO
################################################

choice
    prompt "Target select"
    default TARGET_HIKEY
...
```
Now we have a dedicated menu entry to select SOPINEs on the clusterboard.
And then, in our new sopine-clusterboard.c file, we can write this stub:
```c
// board/sunxi/sopine-clusterboard.c
#include <common.h> // for printf()

#ifdef CONFIG_TARGET_SOPINE_CLUSTERBOARD
#ifdef CONFIG_SHOW_BOOT_PROGRESS
void show_boot_progress(int progress)
{
    printf("%i\n", progress);
}
#endif
#endif
```
But what about the LEDs? I’ll get to that very shortly. Proper setup is crucial.
U-Boot SPL and Full Builds
If we want to get fancy when using SPL U-Boot, we can differentiate between the SPL U-Boot build and the full U-Boot build with CONFIG_SPL_BUILD. Why? U-Boot has separate flags for regular builds and SPL builds, for example:
```
CONFIG_DM_GPIO=y
CONFIG_SPL_DM_GPIO=n
```
Whoops. Not realizing this, a code block using DM (Driver Model) intended to run in both boot stages may fail in the initial SPL stage in strange ways.
```c
#ifdef CONFIG_SHOW_BOOT_PROGRESS
#ifdef CONFIG_SPL_BUILD
void show_boot_progress(int progress)
{
    printf("SPL %i\n", progress);
}
#else
void show_boot_progress(int progress)
{
    printf("FULL %i\n", progress);
}
#endif
#endif
```
Clusterboard LEDs and Boot Progress Indicators
We have enough knowledge to blink the status LEDs at various bootstages from a very early stage, but not time zero. My sopine-clusterboard.c file now looks like the following:
```c
#if defined(CONFIG_TARGET_SOPINE_CLUSTERBOARD) &&\
    defined(CONFIG_SHOW_BOOT_PROGRESS) &&\
    defined(CONFIG_DM_GPIO) &&\
    ! defined(CONFIG_SPL_BUILD)

#include <linux/delay.h>
#include <asm/gpio.h>
#include <bootstage.h>
#include <dm.h>

// This is just to disable LEDs if desired.
#define ENABLE_BOOT_PROGRESS_LED

// GPIO 1.8V LED on the clusterboard
#define PL7_LED_GPIO "PL7"
#define LED_OFF 1
#define LED_ON 0

static volatile bool full_init_done = false;
struct gpio_desc gpio_pin;

int set_clusterboard_led(int value)
{
    if (full_init_done) {
        dm_gpio_set_dir_flags(&gpio_pin, GPIOD_IS_OUT);
        dm_gpio_set_value(&gpio_pin, value);
    }
    return 0;
}

void blink_led(int count, int period_ms)
{
    for (int i = 0; i < count; i++) {
        set_clusterboard_led(LED_ON);
        mdelay(period_ms);
        set_clusterboard_led(LED_OFF);
        mdelay(period_ms);
    }
    mdelay(100); // Pause between blinks
}

void show_boot_progress(int progress)
{
    printf("\nProgress: %d\n", progress);

#ifndef ENABLE_BOOT_PROGRESS_LED
    return; // void function: no value to return
#endif

    switch(progress) {
    case -BOOTSTAGE_ID_NET_NETLOOP_OK: // -81
        // Usually TFTP error guessing at files
        break;

    // Bring eth0 up successfully
    case BOOTSTAGE_ID_NET_ETH_INIT: // 65
        printf("Ethernet init.\n");
        blink_led(1, 100);
        break;

    // Make some network request
    case BOOTSTAGE_ID_NET_START: // 80
        blink_led(1, 100);
        break;

    case BOOTSTAGE_ID_MAIN_LOOP: // 174
        printf("Preparing to autoboot.\n");
        blink_led(2, 100);
        break;

    case BOOTSTAGE_KERNELREAD_STOP: // 177
        printf("U-Boot handoff.\n");
        blink_led(6, 100);
        set_clusterboard_led(LED_ON);
        break;

    default: {
        if (progress < 0) {
            printf("Error code: %d.\n", progress);
            blink_led(100, 40);
        }
    }
    }
}

/**
 * This is critical for signalling when the GPIO
 * system is available to use the DM helper methods.
 */
int board_late_init(void)
{
    // Prep the clusterboard LED once
    int ret;

    ret = dm_gpio_lookup_name(PL7_LED_GPIO, &gpio_pin);
    if (ret) {
        printf("GPIO: '%s' not found (%d)\n", PL7_LED_GPIO, ret);
        return ret;
    }

    ret = dm_gpio_request(&gpio_pin, PL7_LED_GPIO);
    if (ret) {
        // -16 means "busy"
        printf("GPIO: requesting '%s' failed (%d)\n", PL7_LED_GPIO, ret);
        return ret;
    }

    printf("Clusterboard LED now available\n");
    full_init_done = true;

    return 0;
}

#endif
```
From now on, a single LED blink means some DHCP network request. Two blinks mean autoboot is starting. Six blinks mean kernel handoff from U-Boot has started. This last stage means, for me, that the iPXE secondary bootloader has started (which needs its own blinking logic for more LED status indications). This is a big win.
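If you want to check the blink scheme on a host machine before flashing anything, the policy can be pulled out of the hardware code. This is only a sketch: the stage numbers are copied from the comments in my listing above, and the names blinks_for_progress and blink_duration_ms are made up for illustration. They are not U-Boot functions.

```c
#include <assert.h>

/* Stage numbers copied from the comments in the board file listing. */
enum {
    STAGE_ETH_INIT        = 65,
    STAGE_NET_START       = 80,
    STAGE_NETLOOP_OK      = 81,
    STAGE_MAIN_LOOP       = 174,
    STAGE_KERNELREAD_STOP = 177,
};

#define ERROR_BLINKS 100

/* Number of blinks shown for a given boot progress value. */
int blinks_for_progress(int progress)
{
    switch (progress) {
    case -STAGE_NETLOOP_OK:      /* TFTP guessing at files: ignored */
        return 0;
    case STAGE_ETH_INIT:
    case STAGE_NET_START:
        return 1;
    case STAGE_MAIN_LOOP:
        return 2;
    case STAGE_KERNELREAD_STOP:
        return 6;
    default:                     /* any other negative value is an error */
        return progress < 0 ? ERROR_BLINKS : 0;
    }
}

/* Time spent blinking: on + off per blink, plus the 100 ms trailing pause. */
long blink_duration_ms(int count, int period_ms)
{
    assert(count >= 0 && period_ms >= 0);
    return 2L * count * period_ms + 100;
}
```

One thing this makes visible is the cost of the error path: 100 blinks at a 40 ms period block the boot for about 8.1 seconds on every negative progress value other than -81, while the six handoff blinks take 1.3 seconds.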
Demo
Patch the SUNXI Watchdog Timer
In mainline U-Boot as of 2021.04 there is no SUNXI support for a watchdog timer. Try enabling one in menuconfig and you may pull your hair out. Even patching the device tree with a watchdog section isn’t enough. We will need a community patch. To prove it to yourself, search in mainline for SUNXI_WDT_TIMEOUT. If you cannot find that string in the repo, please continue reading.
A simple fix for this problem is to enable the internal hardware watchdog timer that Pine64-based SoCs have. We can enable it in the defconfig:
```
# Enable watchdog functionality
# REF: drivers/watchdog/wdt-uclass.c
# REF: include/watchdog.h
CONFIG_WATCHDOG=y
CONFIG_CMD_WDT=y
CONFIG_WDT=y
CONFIG_SYSRESET=y
CONFIG_WATCHDOG_TIMEOUT_MSECS=60000
# CONFIG_HW_WATCHDOG is not set
```
Device Trees from Scratch
```
WARNING: could not set serial-number FDT_ERR_BADSTRUCTURE.
ERROR: root node setup failed - must RESET the board to recover.
ERROR: failed to process device tree
```
Device trees are a pretty neat concept. With them, we can describe our target hardware without recompiling the Linux kernel. We can also map GPIO pins, specify how to access LEDs, and state what network chip we are using. I borrowed a book from the library about device trees, but this device trees video was far more helpful.
Key device tree concepts:
- Device tree files (DTS) can be split into several files and are amalgamated by #include directives.
- The final DTS files are suffixed with .dts. Included files are suffixed with .dtsi.
- Included files are underlaid, not just inserted at the #include directive point.
- Including files are overlaid on top of DTSI files.
- Replace just some parts with labels, not with full re-definitions.
- We can use #define to replace hard-coded or magic numbers.
- A “cell” just means a 32-bit integer.
- Strings like #size-cells = <1>; are not comments.
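The “cell” idea is easier to hold onto with bytes in front of you. In a compiled (flattened) device tree, each cell is stored as a big-endian 32-bit value, so a property with two cells is simply eight bytes. The sketch below encodes a hypothetical reg = <0x01c30000 0x10000> by hand. The property value is only an example I picked to match the ethernet@1c30000 node name, and the helper names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one cell in the big-endian byte order used by flattened device trees. */
void cell_store(uint32_t value, uint8_t out[4])
{
    assert(out);
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Read one big-endian cell back into a host integer. */
uint32_t cell_load(const uint8_t in[4])
{
    assert(in);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Encode n cells back to back; returns the number of bytes written. */
size_t cells_store(const uint32_t *cells, size_t n, uint8_t *out)
{
    assert(cells && out);
    for (size_t i = 0; i < n; i++)
        cell_store(cells[i], out + 4 * i);
    return 4 * n;
}
```

With #address-cells = <1> and #size-cells = <1>, the first four bytes would be read as the address and the next four as the size, which is why those two lines matter even though they look like comments.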
How DTSI underlays work:
We can then easily disable HDMI features of the SoC (not present on the clusterboard) with labels without having to copy the entire hardware definition. For example,
```
// Main .dts file
&hdmi0 {
    status = "disabled";
};
```
Error “.rodata will not fit in .sram”
Having spent hours trying to trim SPL code here and there only to shave off mere hundreds of bytes, the following is all too common:
```
/usr/bin/aarch64-linux-gnu-ld.bfd: u-boot-spl section `.rodata' will not fit in region `.sram'
/usr/bin/aarch64-linux-gnu-ld.bfd: region `.sram' overflowed by 4896 bytes
make[2]: *** [../scripts/Makefile.spl:428: spl/u-boot-spl] Error 1
make[1]: *** [/u-boot/Makefile:1930: spl/u-boot-spl] Error 2
```
Taking a trip into the 700-page Allwinner A64 user manual we find the memory section.
SRAM A1 is 32 KiB (0x10000 - 0x17FFF), but what’s this? SRAM C starts right after SRAM A1 (0x18000 - 0x3FFFF). From the Allwinner block diagram, both memory blocks can be accessed by the same bus.
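A quick sanity check of the arithmetic helps here. The address ranges are the ones quoted above. Treating the 4896-byte overflow as being measured against the stock 0x7fa0 SPL limit is my assumption, and the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A memory region with an inclusive end address. */
struct mem_region {
    uint32_t start;
    uint32_t end;
};

static const struct mem_region SRAM_A1 = { 0x10000, 0x17FFF };
static const struct mem_region SRAM_C  = { 0x18000, 0x3FFFF };

/* Size of a region in bytes. */
uint32_t region_size(struct mem_region r)
{
    assert(r.end >= r.start);
    return r.end - r.start + 1;
}

/* True when `hi` begins at the byte immediately after `lo` ends. */
bool regions_contiguous(struct mem_region lo, struct mem_region hi)
{
    return lo.end + 1 == hi.start;
}

/* True if an image of `bytes` bytes fits under the given SPL size limit. */
bool spl_fits(uint32_t bytes, uint32_t max_size)
{
    return bytes <= max_size;
}
```

SRAM A1 works out to 0x8000 bytes (32 KiB) and SRAM C to 0x28000 bytes (160 KiB), with no gap between them. Under the assumption above, an SPL of 0x7fa0 + 4896 bytes fails the stock limit but fits comfortably under a 0xFFA0 limit.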
Let’s see if we can just overflow into SRAM C.
```c
// include/configs/sunxi-common.h
/** Modify these lines ********************************/
#if ! defined(CONFIG_TARGET_SOPINE_CLUSTERBOARD) &&\
    CONFIG_SUNXI_SRAM_ADDRESS == 0x10000
#define CONFIG_SPL_MAX_SIZE     0x7fa0      /* 32 KiB */
#ifdef CONFIG_ARM64
/* end of SRAM A2 for now, as SRAM A1 is pretty tight for an ARM64 build */
#define LOW_LEVEL_SRAM_STACK    0x00054000
#else
#define LOW_LEVEL_SRAM_STACK    0x00018000
#endif /* !CONFIG_ARM64 */
#elif CONFIG_SUNXI_SRAM_ADDRESS == 0x20000
#define CONFIG_SPL_MAX_SIZE     0x7fa0      /* 32 KiB */
/* end of SRAM A2 on H6 for now */
#define LOW_LEVEL_SRAM_STACK    0x00118000
/* Use the contiguous memory of SRAM C at 0x18000 */
#elif CONFIG_SUNXI_SRAM_ADDRESS == 0x10000
#define CONFIG_SPL_MAX_SIZE     0xFFA0      /* 64 KiB, just to be safe */
#ifdef CONFIG_ARM64
#define LOW_LEVEL_SRAM_STACK    0x00054000
#define CONFIG_SPL_PAD_TO       65536       /* same as CONFIG_SPL_MAX_SIZE */
#else
#error "Don't know what to do"
#endif
/******************************************************/
#else
#define CONFIG_SPL_MAX_SIZE     0x5fa0      /* 24KB on sun4i/sun7i */
#define LOW_LEVEL_SRAM_STACK    0x00008000  /* End of sram */
#endif

#define CONFIG_SPL_STACK        LOW_LEVEL_SRAM_STACK

/** Modify these lines ********************************/
#ifndef CONFIG_SPL_PAD_TO
#define CONFIG_SPL_PAD_TO       32768       /* decimal for 'dd' */
#endif
/******************************************************/
```
Then we must increase the SRAM size in the mksunxiboot tool. I’ll use 64 KiB (0x10000).
```c
// tools/mksunxiboot.c
// Increase the SRAM to match SRAM A2
// #define SUNXI_SRAM_SIZE 0x8000 /* SoC with smaller size are limited before */
#define SUNXI_SRAM_SIZE 0x10000
#define SRAM_LOAD_MAX_SIZE (SUNXI_SRAM_SIZE - sizeof(struct boot_file_head))
```
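To get a feel for how much room the larger limit buys, here is a rough calculation. It assumes the linker's `.sram` region length equals `CONFIG_SPL_MAX_SIZE` (an assumption on my part) and uses the 4896-byte overflow from the error message; the variable names are illustrative only.

```shell
OLD_MAX=$((0x7fa0))   # original CONFIG_SPL_MAX_SIZE
NEW_MAX=$((0xFFA0))   # enlarged CONFIG_SPL_MAX_SIZE
OVERFLOW=4896         # "region `.sram' overflowed by 4896 bytes"

# Approximate SPL size that the linker was trying to place.
needed=$((OLD_MAX + OVERFLOW))
# Bytes left over under the new limit.
headroom=$((NEW_MAX - needed))

echo "needed=${needed} new_max=${NEW_MAX} headroom=${headroom}"
# prints: needed=37568 new_max=65440 headroom=27872
```

Under that assumption the SPL needs roughly 37 KiB, so the 64 KiB limit leaves a comfortable margin.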
Success. Unless I’ve gotten lucky, this works repeatedly.
SPI NOR-Flash Booting
The SPI NOR flash on each compute module holds U-Boot, which can chainload iPXE to boot Linux over HTTP(S). In order to flash U-Boot to NOR flash, certain commands need to be enabled by default. The SOPINE uses the Winbond w25q128 NOR flash chip with an erase size of 4 KiB (you will need padding!) and a total of 16 MiB. Here are the settings:
```
# REF: sopine_clusterboard_defconfig
# Needed to flash the NOR Flash
CONFIG_CMD_SF=y
CONFIG_CMD_SPI=y
CONFIG_SPI_FLASH_WINBOND=y
```
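The padding remark comes down to rounding the image size up to the next multiple of the 4 KiB erase size. A minimal host-side sketch follows; `round_up_to_erase_size` is a hypothetical helper name, and the 600000-byte image size is made up for illustration.

```shell
ERASE_SIZE=4096   # erase size reported for the w25q128

# Round a byte count up to the next multiple of ERASE_SIZE.
round_up_to_erase_size() {
    echo $(( ($1 + ERASE_SIZE - 1) / ERASE_SIZE * ERASE_SIZE ))
}

round_up_to_erase_size 1        # prints 4096
round_up_to_erase_size 4096     # prints 4096
round_up_to_erase_size 600000   # prints 602112

# U-Boot works in hex, so show the padded size that way as well.
printf '0x%x\n' "$(round_up_to_erase_size 600000)"   # prints 0x93000
```

On the U-Boot side, prefixing the length with `+` in `sf erase` asks U-Boot to do this rounding itself.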
Then, we’ll need a boot.scr script. Here is the one I use with comments and LED status indication:
```
###############################
# U-Boot SPI NOR-flash writer #
# Copyright Eric Draken, 2021 #
###############################

setenv led_off 'gpio set PL7'
setenv led_on 'gpio clear PL7'
setenv blink_power 'run led_on; sleep 0.1; run led_off; sleep 0.1'
setenv memory_failure 'echo Memory failure; while true; do run blink_power blink_power; sleep 0.5; done'
setenv flash_failure 'echo Flash failure; while true; do run blink_power blink_power blink_power blink_power; sleep 0.5; done'

# SPI NOR flash: 128 Mbits
setenv spi_flash_size 1000000 # 128 Mbits, 16MiB
setenv uboot_filename armcube_loader.bin # Use your own u-boot-with-spl.bin

# kernel_addr_r=0x40080000, so...
setenv cmp_addr 0x40200000

# echo Displaying env vars (for the curious)
# printenv
# echo

echo SD contents
fatls ${devtype} ${devnum}:${distro_bootpart}
echo

# First probe the SPI flash
run blink_power
sf probe 0
echo
# e.g. SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB

if size ${devtype} ${devnum}:${distro_bootpart} ${uboot_filename}; then
    echo "Loading ${uboot_filename}..."
    run blink_power

    # REF: https://github.com/u-boot/u-boot/blob/master/doc/usage/load.rst
    load ${devtype} ${devnum}:${distro_bootpart} ${kernel_addr_r} ${uboot_filename} # Gives us ${filesize}
    echo "File size: 0x${filesize}"
    echo

    echo Erasing flash...
    run blink_power blink_power
    sf erase 0 +${filesize} || run memory_failure # The + is important for automatic padding
    echo

    echo Writing flash...
    run blink_power blink_power blink_power
    sf write ${kernel_addr_r} 0 ${filesize} || run memory_failure
    echo

    echo Verifying flash...
    sf read ${cmp_addr} 0 ${filesize} || run memory_failure
    cmp.b ${cmp_addr} ${kernel_addr_r} ${filesize} || run flash_failure

    echo Success!
    echo
    run led_on
else
    run flash_failure # Infinite blinking
fi
exit 0
```
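The listing above is plain text, while U-Boot expects boot.scr to carry an image header, which the `mkimage` tool from U-Boot adds. Here is a minimal sketch of that step, assuming the script text is saved as `boot.cmd` (a filename I picked). It uses a one-line stand-in script in a temporary directory so it can be tried without touching anything, and it skips the build if `mkimage` is not installed.

```shell
# Work in a scratch directory; swap the stand-in for the full listing above.
workdir=$(mktemp -d)
printf 'echo Hello from boot.scr\n' > "$workdir/boot.cmd"

if command -v mkimage >/dev/null 2>&1; then
    # -T script: wrap as a U-Boot script image; -C none: no compression.
    mkimage -C none -A arm64 -T script -d "$workdir/boot.cmd" "$workdir/boot.scr"
    result=built
else
    echo "mkimage not found; it ships with the U-Boot tools" >&2
    result=skipped
fi
echo "boot.scr: $result"
```

The resulting boot.scr goes onto the SD card's boot partition next to the U-Boot binary to be flashed.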
Summary of Changed U-Boot Files
Here is a summary of the files changed in or added to U-Boot as of v2021.04. I’ve created an overlay folder so that these files can be copied into the U-Boot source instead of making yet another U-Boot fork. Why? Many hours were spent trying to decipher community members’ hack branches and PRs against U-Boot. Instead, I’ll have these all in one place for ease of reference.
```
overlay
├── arch
│   └── arm
│       ├── cpu
│       │   └── armv8
│       │       └── generic_timer.c
│       ├── dts
│       │   ├── Makefile
│       │   └── sun50i-a64-sopine-clusterboard.dts
│       └── Kconfig
├── board
│   └── sunxi
│       ├── Makefile
│       └── sopine-clusterboard.c
├── configs
│   └── sopine_clusterboard_defconfig
├── include
│   └── configs
│       └── sunxi-common.h
└── tools
    └── mksunxiboot.c
```
iPXE Gotchas
iPXE complements U-Boot by adding the ability to boot over HTTP(S).
Disable Interrupts in iPXE
iPXE has a smaller code base, so instead of using overlay files, I create them via a Dockerfile. The problem here is that U-Boot is single-threaded while iPXE relies on interrupts. The fix essentially NOPs the cpu_nap() function, which results in furious while-no-input busy loops. That is fine because iPXE proceeds quickly to boot the kernel.
```sh
# Fix the problem when U-Boot is single-threaded, but iPXE uses interrupts,
# and prevent redefining cpu_nap() by avoiding efiarm_nap.h from loading.
echo "\
\n#undef NAP_PCBIOS\
\n#undef NAP_EFIX86\
\n#undef NAP_EFIARM\
\n#define NAP_NULL\
" >> config/local/nap.h
```
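One thing to watch: whether `echo` expands `\n` depends on the shell. The `echo` in dash (the `/bin/sh` on Debian and Ubuntu images) does, while the bash builtin does not unless given `-e`. A `printf` variant sidesteps the difference. The sketch below writes to a temporary file standing in for `config/local/nap.h` so it can be run anywhere.

```shell
out=$(mktemp)   # stand-in for config/local/nap.h

# printf repeats the '%s\n' format for each argument, one line per directive.
printf '%s\n' \
    '#undef NAP_PCBIOS' \
    '#undef NAP_EFIX86' \
    '#undef NAP_EFIARM' \
    '#define NAP_NULL' >> "$out"

cat "$out"
```

The file ends up with exactly the four preprocessor lines, regardless of which shell the Dockerfile's `RUN` uses.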
Add More Menu Items
iPXE can be souped up with more useful shell commands. iPXE is not limited in size like U-Boot is, so let’s add more commands. Again, I use a Dockerfile to achieve this. Here is pretty much the kitchen sink of commands we can add in the ARM64 implementation.
```dockerfile
# Add more menu items and features
RUN echo "\
\n#define DOWNLOAD_PROTO_NFS\
\n#define DOWNLOAD_PROTO_HTTPS\
\n#define DOWNLOAD_PROTO_FTP\
\n\
\n#define PING_CMD\
\n#define IPSTAT_CMD\
\n#define NSLOOKUP_CMD\
\n#define TIME_CMD\
\n#define REBOOT_CMD\
\n#define POWEROFF_CMD\
\n#define CONSOLE_CMD\
\n#define VLAN_CMD\
\n#define LOTEST_CMD\
\n#define PCI_CMD\
\n#define PROFSTAT_CMD\
\n\
\n#define IMAGE_EFI\
\n#undef NET_PROTO_IPV6\
" >> config/local/general.h
```
Disable iPXE USB
As with U-Boot, I’d like to disable USB functionality in iPXE to speed up loading. Again, this is done in the Dockerfile.
```dockerfile
# Disable USB
RUN echo "\
\n#undef USB_HCD_XHCI\
\n#undef USB_HCD_EHCI\
\n#undef USB_HCD_UHCI\
\n#undef USB_KEYBOARD\
\n#undef USB_BLOCK\
\n#undef USB_EFI\
" >> config/local/usb.h
```
Speed up the iPXE Watchdog Reset
Let’s shorten the watchdog timeout from five minutes to just one minute. iPXE then has one minute to hand off execution control to the kernel. This is sufficient for the use case of treating each node like a Lambda. Again, using a Dockerfile, here is the procedure.
```dockerfile
# Shorten the watchdog timeout from 5 minutes to 1 minute
# interface/efi/efi_watchdog.c
# #define WATCHDOG_TIMEOUT_SECS ( 5 * 60 )
RUN sed -i -E 's|^#define\s+WATCHDOG_TIMEOUT_SECS.+$|#define WATCHDOG_TIMEOUT_SECS 60|' interface/efi/efi_watchdog.c &&\
    grep -q -E '^#define WATCHDOG_TIMEOUT_SECS 60' interface/efi/efi_watchdog.c
```
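The substitution can be tried on a scratch file before baking it into an image. This sketch uses a two-line stand-in for `interface/efi/efi_watchdog.c` and swaps `\s` (a GNU sed extension) for the POSIX class `[[:space:]]`, which is my own portability tweak rather than part of the Dockerfile.

```shell
src=$(mktemp)   # stand-in for interface/efi/efi_watchdog.c
printf '%s\n' \
    '/* stand-in for interface/efi/efi_watchdog.c */' \
    '#define WATCHDOG_TIMEOUT_SECS ( 5 * 60 )' > "$src"

# Rewrite the whole #define line; the comment line is left untouched.
patched=$(sed -E 's|^#define[[:space:]]+WATCHDOG_TIMEOUT_SECS.+$|#define WATCHDOG_TIMEOUT_SECS 60|' "$src")

printf '%s\n' "$patched"
# prints:
# /* stand-in for interface/efi/efi_watchdog.c */
# #define WATCHDOG_TIMEOUT_SECS 60
```

The `grep -q` that follows the `sed` in the Dockerfile plays the same role as eyeballing this output: the build fails if the pattern ever stops matching.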