riscv: Memory Hot(Un)Plug support#93
Open
uestc-gr wants to merge 534 commits into
Open
Conversation
Based on the current openeuler_defconfig for riscv, use the following commands to generate the new openeuler_defconfig: cp arch/riscv/configs/openeuler_defconfig .config cat arch/riscv/configs/sg2042_defconfig >> .config make save_oedefconfig make update_oedefconfig Build and boot testing passed. Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
The SSWI device provides supervisor-level IPI functionality for a set of HARTs on a RISC-V platform. It provides a register to set an IPI (SETSSIP) for each HART connected to the SSWI device. The patch utilizes the feature to optimize IPI handling by avoiding Linux calls into firmware runtime, thus minimizing context switching expenses and removing the dependency on sbi-ipi. Co-developed-by: Xiaoguang Xing <xiaoguang.xing@sophgo.com> Signed-off-by: Xiaoguang Xing <xiaoguang.xing@sophgo.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
While working with the T-Head 1520 LicheePi4A SoC, certain conditions
arose that allowed me to reproduce a race issue in the sdhci code.
To reproduce the bug, you need to enable the sdio1 controller in the
device tree file
`arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows:
&sdio1 {
bus-width = <4>;
max-frequency = <100000000>;
no-sd;
no-mmc;
broken-cd;
cap-sd-highspeed;
post-power-on-delay-ms = <50>;
status = "okay";
wakeup-source;
keep-power-in-suspend;
};
When resetting the SoC using the reset button, the following messages
appear in the dmesg log:
[ 8.164898] mmc2: Got command interrupt 0x00000001 even though no
command operation was in progress.
[ 8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 8.180503] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000005
[ 8.186950] mmc2: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[ 8.193395] mmc2: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[ 8.199841] mmc2: sdhci: Present: 0x03da0000 | Host ctl: 0x00000000
[ 8.206287] mmc2: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
[ 8.212733] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x0000decf
[ 8.219178] mmc2: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 8.225622] mmc2: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003
[ 8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 8.238513] mmc2: sdhci: Caps: 0x3f69c881 | Caps_1: 0x08008177
[ 8.244959] mmc2: sdhci: Cmd: 0x00000502 | Max curr: 0x00191919
[ 8.254115] mmc2: sdhci: Resp[0]: 0x00001009 | Resp[1]: 0x00000000
[ 8.260561] mmc2: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[ 8.267005] mmc2: sdhci: Host ctl2: 0x00001000
[ 8.271453] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr:
0x0000000000000000
[ 8.278594] mmc2: sdhci: ============================================
I also enabled some traces to better understand the problem:
kworker/3:1-62 [003] ..... 8.163538: mmc_request_start:
mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0
stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0
sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0
can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1
retune_period=0
<idle>-0 [000] d.h2. 8.164816: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000
intmask_p=0x18000
irq/24-mmc2-96 [000] ..... 8.164840: sdhci_thread_irq:
msg=
irq/24-mmc2-96 [000] d.h2. 8.164896: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1
intmask_p=0x1
irq/24-mmc2-96 [000] ..... 8.285142: mmc_request_done:
mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0
stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0
sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0
data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0
hold_retune=1 retune_period=0
Here's what happens: the __mmc_start_request function is called with
opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO
bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is
triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE
is triggered. Depending on the exact timing, these conditions can
trigger the following race problem:
1) The sdhci_cmd_irq top half handles the command as an error. It sets
host->cmd to NULL and host->pending_reset to true.
2) The sdhci_thread_irq bottom half is scheduled next and executes faster
than the second interrupt handler for SDHCI_INT_RESPONSE. It clears
host->pending_reset before the SDHCI_INT_RESPONSE handler runs.
3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering
a code path that prints: "mmc2: Got command interrupt 0x00000001 even
though no command operation was in progress."
To solve this issue, we need to clear pending interrupts when resetting
host->pending_reset. This ensures that after sdhci_threaded_irq restores
interrupts, there are no pending stale interrupts.
The behavior observed here is non-compliant with the SDHCI standard.
Place the code in the sdhci-of-dwcmshc driver to account for a
hardware-specific quirk instead of the core SDHCI code.
Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 43658a542ebf ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241008100327.4108895-1-m.wilczynski@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Enable CONFIG_DRM_ETNAVIV=m Update CONFIG_AIC_FW_PATH="/lib/firmware/aic8800" And generate the new openeuler_defconfig. Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Due to a known bug found during the testing of video_memory, disable it until the issue is resolved. ``` [ +0.701834] BUG: Bad page state in process Media pfn:0c7d2 [ +0.005632] page:000000008ee78948 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc7d2 [ +0.009300] flags: 0xfffc00000004000(reserved|node=0|zone=0|lastcpupid=0x3fff) [ +0.007277] page_type: 0xffffffff() [ +0.003548] raw: 0fffc00000004000 0000000000000000 0000000000000122 0000000000000000 [ +0.007774] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ +0.007795] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set ... [ +0.008629] Hardware name: Sipeed Lichee Pi 4A 16G (DT) [ +0.005227] Call Trace: [ +0.002447] [<ffffffff800069ba>] dump_backtrace+0x28/0x30 [ +0.005409] [<ffffffff80a2f8aa>] show_stack+0x38/0x44 [ +0.005057] [<ffffffff80a3d652>] dump_stack_lvl+0x44/0x5c [ +0.005403] [<ffffffff80a3d682>] dump_stack+0x18/0x20 [ +0.005053] [<ffffffff802567aa>] bad_page+0x11a/0x162 [ +0.005056] [<ffffffff80256ede>] free_page_is_bad_report+0x42/0xa2 [ +0.006183] [<ffffffff802575b0>] free_unref_page_prepare+0x13e/0x24a [ +0.006363] [<ffffffff8025a11c>] free_unref_page+0x5a/0x1c0 [ +0.005574] [<ffffffff8025a48e>] __free_pages+0x106/0x10c [ +0.005402] [<ffffffff0263f502>] free_memblk_pages+0x76/0x26c [vidmem] [ +0.006627] [<ffffffff0263f9a6>] GFP_Free+0xae/0x150 [vidmem] [ +0.005832] [<ffffffff026400c2>] vidalloc_ioctl+0x4ba/0x870 [vidmem] [ +0.006437] [<ffffffff802dc506>] __riscv_sys_ioctl+0x96/0xc2 [ +0.005668] [<ffffffff80a3e18a>] do_trap_ecall_u+0x138/0x14a [ +0.005661] [<ffffffff80a49510>] ret_from_exception+0x0/0x64 [ +0.005730] Disabling lock debugging due to kernel taint ``` Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
fix: Error: unrecognized opcode cbo.clean (a0)
KUnit test error:
arch/riscv/mm/dma-noncoherent.c:54:
Error: unrecognized opcode cbo.clean (a0)', extension zicbom' required
Signed-off-by: Yafen Fang <yafen@iscas.ac.cn>
mainline inclusion commit de1ff306dcf4546d6a8863b1f956335e0d3fbb9b category: cleanup bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Now that the GIC-v3 callback can handle invocation with a fwspec parameter count of 0 lift the restriction in the core code and invoke select() unconditionally when the domain provides it. Preparatory change for per device MSI domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-3-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit ac81e94ab001c2882e89c9b61417caea64b800df category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Supporting per device MSI domains on ARM64, RISC-V and the zoo of interrupt mechanisms needs a bit more information than what the initial x86 implementation provides. Add the following fields: - required_flags: The flags which a parent domain requires to be set - bus_select_token: The bus token of the parent domain for select() - bus_select_mask: A bitmask of supported child domain bus types This allows to provide library functions which can be shared between various interrupt chip implementations and avoids replicating mostly similar code all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-4-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 6516d5a295356f8fd5827a1c0954d7ed5b2324dd category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Add a new domain bus token to prepare for device MSI which aims to replace the existing platform MSI maze. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-5-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit c88f9110bfbca5975a8dee4c9792ba12684c7bca category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Provide functions to create and remove per device MSI domains which replace the platform-MSI domains. The new model is that each of the devices which utilize platform-MSI gets now its private MSI domain which is "customized" in size and with a device specific function to write the MSI message into the device. This is the same functionality as platform-MSI but it avoids all the down sides of platform MSI, i.e. the extra ID book keeping, the special data structure in the msi descriptor. Further the domains are only created when the devices are really in use, so the burden is on the usage and not on the infrastructure. Fill in the domain template and provide two functions to init/allocate and remove a per device MSI domain. Until all users and parent domain providers are converted, the init/alloc function invokes the original platform-MSI code when the irqdomain which is associated to the device does not provide MSI parent functionality yet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-6-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 14fd06c776b5289a43c91cdc64bac3bdbc7b397e category: cleanup bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Switch all the users of the platform MSI domain over to invoke the new interfaces which branch to the original platform MSI functions when the irqdomain associated to the caller device does not yet provide MSI parent functionality. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-7-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 1a4671ff7a903e87e4e76213e200bb8bcfa942e4 category: cleanup bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 9c78c1a85c04bdfbccc5a50588e001087d942b08 category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- irq_create_fwspec_mapping() requires translation of the firmware spec to a hardware interrupt number and the trigger type information. Wired interrupts which are connected to a wire to MSI bridge, like MBIGEN are allocated that way. So far MBIGEN provides a regular irqdomain which then hooks backwards into the MSI infrastructure. That's an unholy mess and will be replaced with per device MSI domains which are regular MSI domains. Interrupts on MSI domains are not supported by irq_create_fwspec_mapping(), but for making the wire to MSI bridges sane it makes sense to provide a special allocation/free interface in the MSI infrastructure. That avoids the backdoors into the core MSI allocation code and just shares all the regular MSI infrastructure. Provide an optional translation callback in msi_domain_ops which can be utilized by these wire to MSI bridges. No other MSI domain should provide a translation callback. The default translation callback of the MSI irqdomains will warn when it is invoked on a non-prepared MSI domain. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-8-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 5aa3c0cf5bba6437c9e63a56f684f61de8b503d6 category: bugfix bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Users of the IRQCHIP_PLATFORM_DRIVER_{BEGIN,END} helpers rely on a fwspec containing only the fwnode (and crucially a number of parameters set to 0) together with a DOMAIN_BUS_ANY token to check whether a parent irqchip has probed and registered a domain. Since de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from select()"), ops->select() is called unconditionally, meaning that irqchips implementing select() now need to handle ANY as a match. Instead of adding more esoteric checks to the individual drivers, add that condition to irq_find_matching_fwspec(), and let it handle the corner case, as per the comment in the function. This restores the functionality of the above helpers. Fixes: de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from select()") Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reported-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Tested-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://lore.kernel.org/r/20240220114731.1898534-1-maz@kernel.org Link: https://lore.kernel.org/r/20240219-gic-fix-child-domain-v1-1-09f8fd2d9a8f@linaro.org Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 3095cc0d5b2c246ddfcb18f54ed5557640224b6a category: cleanup bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- In preparation for providing a special allocation function for wired interrupts which are connected to a wire to MSI bridge, split the inner workings of msi_domain_alloc_irq_at() out into a helper function so the code can be shared. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-9-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 2d566a498d6483ba986dadc496f64a20b032608f category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Provide a domain bus token for the upcoming support for wire to MSI device domains so the domain can be distinguished from regular device MSI domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-10-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 9d1c58c8004653b37721dd7b16f4360216778c94 category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- To support wire to MSI domains via the MSI infrastructure it is required to use the firmware node of the device which implements this for creating the MSI domain. Otherwise the existing firmware match mechanisms to find the correct irqdomain for a wired interrupt which is connected to a wire to MSI bridge would fail. This cannot be used for the general case because not all devices provide firmware nodes and all regular per device MSI domains are directly associated to the device and have not be searched for. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-11-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 0ee1578b00bcf5ef8e7955f0c6f02a624443eb29 category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- To support wire to MSI bridges proper in the MSI core infrastructure it is required to have separate allocation/free interfaces which can be invoked from the regular irqdomain allocaton/free functions. The mechanism for allocation is: - Allocate the next free MSI descriptor index in the domain - Store the hardware interrupt number and the trigger type which was extracted by the irqdomain core from the firmware spec in the MSI descriptor device cookie so it can be retrieved by the underlying interrupt domain and interrupt chip - Use the regular MSI allocation mechanism for the newly allocated index which returns a fully initialized Linux interrupt on succes This works because: - the domains have a fixed size - each hardware interrupt is only allocated once - the underlying domain does not care about the MSI index it only cares about the hardware interrupt number and the trigger type The free function looks up the MSI index in the MSI descriptor of the provided Linux interrupt number and uses the regular index based free functions of the MSI core. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-12-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit e49312fe09df36cc4eae0cd6e1b08b563a91e1bc category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Reroute interrupt allocation in irq_create_fwspec_mapping() if the domain is a MSI device domain. This is required to convert the support for wire to MSI bridges to per device MSI domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-13-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 9bbe13a5d414a7f8208dba64b54d2b6e4f7086bd category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Some platform-MSI implementations require that power management is redirected to the underlying interrupt chip device. To make this work with per device MSI domains provide a new feature flag and let the core code handle the setup of dev->pm_dev when set during device MSI domain creation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240127161753.114685-14-apatel@ventanamicro.com Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 9dbaf381008dfa2fad6225633004f7adb1bac252 category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Extend the ISA string parsing to detect the Smstateen extension. If the extension is enabled then access to certain 'state' such as AIA CSRs in VS mode is controlled by *stateen0 registers. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit a4f5f39849f39f62f5d4e88cbb600f95f927003d category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Add an entry for the Smstateen extension to the riscv,isa-extensions property. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 55d2a0bd5eadaade850efa9d3a7ffbb0aeb67198 category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Recently, we found that cross-die access to pagetable pages on ARM64 machines can cause performance fluctuations in our business. Currently, there are no PMU events available to track this situation on our ARM64 machines, so accurate pagetable accounting can help to analyze this issue, but now the PUD level pagetable accounting is missed. So introduce pagetable_pud_ctor/dtor() to help to get accurate PUD pagetable accounting, as well as converting the architectures which use generic PUD pagetable allocation to add corresponding PUD pagetable accounting. Moreover this patch will mark the PUD level pagetable with PG_table flag, which will help to do sanity validation in unpoison_memory(). On my testing machine, I can see more pagetables statistics after the patch with page-types tool: Before patch: flags page-count MB symbolic-flags long-symbolic-flags 0x0000000004000000 27326 106 __________________________g_________________ pgtable After patch: 0x0000000004000000 27541 107 __________________________g_________________ pgtable Link: https://lkml.kernel.org/r/876c71c03a7e69c17722a690e3225a4f7b172fb2.1695017383.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 8246601a7d391ce8207408149d65732f28af81a1 category: bugfix bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 If non-leaf PTEs I.E pmd, pud or p4d is modified, a sfence.vma is a must for safe, imagine if an implementation caches the non-leaf translation in TLB, although I didn't meet this HW so far, but it's possible in theory. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Fixes: c5e9b2c2ae82 ("riscv: Improve tlb_flush()") Link: https://lore.kernel.org/r/20231219175046.2496-2-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 40d1bb92a49313b3e0dc5513fdd2578362c40312 category: cleanup bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- This is to prepare for enabling MMU_GATHER_RCU_TABLE_FREE. No functionality changes. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231219175046.2496-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 69be3fb111e73bd025ce6d2322371da5aa497c70 category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- In order to implement fast gup we need to ensure that the page table walker is protected from page table pages being freed from under it. riscv situation is more complicated than other architectures: some riscv platforms may use IPI to perform TLB shootdown, for example, those platforms which support AIA, usually the riscv_ipi_for_rfence is true on these platforms; some riscv platforms may rely on the SBI to perform TLB shootdown, usually the riscv_ipi_for_rfence is false on these platforms. To keep software pagetable walkers safe in this case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h for more details. This patch enables MMU_GATHER_RCU_TABLE_FREE, then use *tlb_remove_page_ptdesc() for those platforms which use IPI to perform TLB shootdown; *tlb_remove_ptdesc() for those platforms which use SBI to perform TLB shootdown; Both case mean that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231219175046.2496-4-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion commit 3f910b7a522e064d7261f31a00d9c9dca31d902a category: feature bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1 -------------------------------- Activate the fast gup for riscv mmu platforms. Here are some GUP_FAST_BENCHMARK performance numbers: Before the patch: GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us After the patch: GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us The get time is reduced by 66.7%! IOW, 3x get speed! Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231219175046.2496-5-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
|
check patch done. log: https://jenkins.oerv.ac.cn/job/rvck-pipeline/job/check-patch/76/consoleFull |
|
Kernel build success! |
|
Lava check done! result url: https://lava.oerv.ac.cn/scheduler/job/342 |
|
kunit test done. log:https://jenkins.oerv.ac.cn/job/rvck-pipeline/job/kunit-test/84/consoleFull |
xmzzz
pushed a commit
that referenced
this pull request
May 11, 2026
stable inclusion from stable-v6.6.120 commit 857e7a2d5a94c9d97da52137a069a62eae42d4a5 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8839 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=857e7a2d5a94c9d97da52137a069a62eae42d4a5 -------------------------------- [ Upstream commit 24e17a29cf7537f0947f26a50f85319abd723c6c ] The xfstests' test-case generic/073 leaves HFS+ volume in corrupted state: sudo ./check generic/073 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/073 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent (see XFSTESTS-2/xfstests-dev/results//generic/073.full for details) Ran: generic/073 Failures: generic/073 Failed 1 of 1 tests sudo fsck.hfsplus -d /dev/loop51 ** /dev/loop51 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). ** Checking non-journaled HFS Plus Volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking multi-linked files. ** Checking catalog hierarchy. Invalid directory item count (It should be 1 instead of 0) ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000 CBTStat = 0x0000 CatStat = 0x00004000 ** Repairing volume. ** Rechecking volume. ** Checking non-journaled HFS Plus Volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking multi-linked files. ** Checking catalog hierarchy. ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. ** The volume untitled was repaired successfully. The test is doing these steps on final phase: mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo So, we move file bar from testdir_1 into testdir_2 folder. It means that HFS+ logic decrements the number of entries in testdir_1 and increments number of entries in testdir_2. Finally, we do fsync only for testdir_1 and foo but not for testdir_2. As a result, this is the reason why fsck.hfsplus detects the volume corruption afterwards. This patch fixes the issue by means of adding the hfsplus_cat_write_inode() call for old_dir and new_dir in hfsplus_rename() after the successful ending of hfsplus_rename_cat(). This method makes modification of in-core inode objects for old_dir and new_dir but it doesn't save these modifications in Catalog File's entries. It was expected that hfsplus_write_inode() will save these modifications afterwards. However, because generic/073 does fsync only for testdir_1 and foo then testdir_2 modification hasn't beed saved into Catalog File's entry and it was flushed without this modification. And it was detected by fsck.hfsplus. Now, hfsplus_rename() stores in Catalog File all modified entries and correct state of Catalog File will be flushed during hfsplus_file_fsync() call. Finally, it makes fsck.hfsplus happy. sudo ./check generic/073 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc3+ #93 SMP PREEMPT_DYNAMIC Wed Nov 12 14:37:49 PST 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/073 32s ... 32s Ran: generic/073 Passed all 1 tests Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20251112232522.814038-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 857e7a2d5a94c9d97da52137a069a62eae42d4a5) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
按照如下方法验证
1、qemu合入以下补丁
https://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/
2、内核打开CONFIG_MEMORY_HOTPLUG和CONFIG_MEMORY_HOTREMOVE和CONFIG_VIRTIO_MEM配置
3、启动虚拟机,配置qemu monitor和virtio-mem设备
qemu-system-riscv64
-nographic -machine virt
-smp 8
-M virt -cpu rv64
-m 16G,slots=3,maxmem=32G
-object memory-backend-ram,id=mem0,size=16G
-blockdev node-name=pflash0,driver=file,read-only=on,filename=RISCV_VIRT_CODE.fd
-blockdev node-name=pflash1,driver=file,filename=RISCV_VIRT_VARS.fd
-kernel ./rvck-olk/cloud-kernel/arch/riscv/boot/Image
-monitor unix:/tmp/qemu-monitor.sock,server,nowait
-object memory-backend-ram,id=vmem0,size=2G
-device virtio-mem-pci,id=vm0,memdev=vmem0,node=0
.......
4、在qemu monitor 中执行下述命令,完成内存的热插入和热拔出
qom-set vm0 requested-size XX
5、在riscv虚拟机中执行下述命令,将新增内存online
echo 1 > /sys/devices/system/memory/memoryX/online