riscv: Memory Hot(Un)Plug support #93

Open
uestc-gr wants to merge 534 commits into RVCK-Project:OLK-6.6 from uestc-gr:memhotplug

@uestc-gr uestc-gr commented Jul 8, 2025

Verify as follows:
1. Apply the following patch series to QEMU:
https://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/

2. Enable CONFIG_MEMORY_HOTPLUG, CONFIG_MEMORY_HOTREMOVE, and CONFIG_VIRTIO_MEM in the kernel configuration.

3. Boot the virtual machine with a QEMU monitor and a virtio-mem device configured:
qemu-system-riscv64 \
  -nographic -machine virt \
  -smp 8 \
  -M virt -cpu rv64 \
  -m 16G,slots=3,maxmem=32G \
  -object memory-backend-ram,id=mem0,size=16G \
  -blockdev node-name=pflash0,driver=file,read-only=on,filename=RISCV_VIRT_CODE.fd \
  -blockdev node-name=pflash1,driver=file,filename=RISCV_VIRT_VARS.fd \
  -kernel ./rvck-olk/cloud-kernel/arch/riscv/boot/Image \
  -monitor unix:/tmp/qemu-monitor.sock,server,nowait \
  -object memory-backend-ram,id=vmem0,size=2G \
  -device virtio-mem-pci,id=vm0,memdev=vmem0,node=0 \
.......

4. In the QEMU monitor, run the following command to hot-plug or hot-unplug memory:
qom-set vm0 requested-size XX

5. In the RISC-V guest, run the following command to bring the newly added memory online:
echo 1 > /sys/devices/system/memory/memoryX/online
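
Step 5 has to be repeated for every hot-plugged memory block. The following is a minimal Python sketch of that loop, assuming the sysfs layout used in step 5, where each `memoryX/online` file reads `0` or `1`. The function name `online_all_blocks` and its `root` parameter are hypothetical helpers for illustration and are not part of this PR.

```python
from pathlib import Path

# Directory used in step 5; passing another root makes the helper testable.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def online_all_blocks(root: Path = SYSFS_MEMORY) -> list[str]:
    """Write "1" to the `online` file of every memoryX block that reads "0".

    Returns the names of the blocks that were switched online.
    """
    onlined = []
    for block in sorted(root.glob("memory[0-9]*")):
        online_file = block / "online"
        if not online_file.is_file():
            continue  # not a memory block directory
        if online_file.read_text().strip() == "0":
            online_file.write_text("1")  # same effect as `echo 1 > .../online`
            onlined.append(block.name)
    return onlined
```

Inside the guest this would be called as `online_all_blocks()` with root privileges; whether a block can actually be onlined still depends on the kernel.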

xmzzz and others added 30 commits June 5, 2025 17:51
Based on the current openeuler_defconfig for riscv, use the following
commands to generate the new openeuler_defconfig:

cp arch/riscv/configs/openeuler_defconfig .config
cat arch/riscv/configs/sg2042_defconfig >> .config
make save_oedefconfig
make update_oedefconfig

Build and boot testing passed.

Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
The SSWI device provides supervisor-level IPI functionality for a
set of HARTs on a RISC-V platform. It provides a register to set
an IPI (SETSSIP) for each HART connected to the SSWI device.

The patch utilizes the feature to optimize IPI handling by avoiding
Linux calls into firmware runtime, thus minimizing context switching
expenses and removing the dependency on sbi-ipi.

Co-developed-by: Xiaoguang Xing <xiaoguang.xing@sophgo.com>
Signed-off-by: Xiaoguang Xing <xiaoguang.xing@sophgo.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
While working with the T-Head 1520 LicheePi4A SoC, certain conditions
arose that allowed me to reproduce a race issue in the sdhci code.

To reproduce the bug, you need to enable the sdio1 controller in the
device tree file
`arch/riscv/boot/dts/thead/th1520-lichee-module-4a.dtsi` as follows:

&sdio1 {
	bus-width = <4>;
	max-frequency = <100000000>;
	no-sd;
	no-mmc;
	broken-cd;
	cap-sd-highspeed;
	post-power-on-delay-ms = <50>;
	status = "okay";
	wakeup-source;
	keep-power-in-suspend;
};

When resetting the SoC using the reset button, the following messages
appear in the dmesg log:

[    8.164898] mmc2: Got command interrupt 0x00000001 even though no
command operation was in progress.
[    8.174054] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[    8.180503] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00000005
[    8.186950] mmc2: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[    8.193395] mmc2: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
[    8.199841] mmc2: sdhci: Present:   0x03da0000 | Host ctl: 0x00000000
[    8.206287] mmc2: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
[    8.212733] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x0000decf
[    8.219178] mmc2: sdhci: Timeout:   0x00000000 | Int stat: 0x00000000
[    8.225622] mmc2: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
[    8.232068] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[    8.238513] mmc2: sdhci: Caps:      0x3f69c881 | Caps_1:   0x08008177
[    8.244959] mmc2: sdhci: Cmd:       0x00000502 | Max curr: 0x00191919
[    8.254115] mmc2: sdhci: Resp[0]:   0x00001009 | Resp[1]:  0x00000000
[    8.260561] mmc2: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
[    8.267005] mmc2: sdhci: Host ctl2: 0x00001000
[    8.271453] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr:
0x0000000000000000
[    8.278594] mmc2: sdhci: ============================================

I also enabled some traces to better understand the problem:

     kworker/3:1-62      [003] .....     8.163538: mmc_request_start:
mmc2: start struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_arg=0x0 cmd_flags=0x2e1 cmd_retries=0 stop_opcode=0 stop_arg=0x0
stop_flags=0x0 stop_retries=0 sbc_opcode=0 sbc_arg=0x0 sbc_flags=0x0
sbc_retires=0 blocks=0 block_size=0 blk_addr=0 data_flags=0x0 tag=0
can_retune=0 doing_retune=0 retune_now=0 need_retune=0 hold_retune=1
retune_period=0
          <idle>-0       [000] d.h2.     8.164816: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x10000
intmask_p=0x18000
     irq/24-mmc2-96      [000] .....     8.164840: sdhci_thread_irq:
msg=
     irq/24-mmc2-96      [000] d.h2.     8.164896: sdhci_cmd_irq:
hw_name=ffe70a0000.mmc quirks=0x2008008 quirks2=0x8 intmask=0x1
intmask_p=0x1
     irq/24-mmc2-96      [000] .....     8.285142: mmc_request_done:
mmc2: end struct mmc_request[000000000d30cc0c]: cmd_opcode=5
cmd_err=-110 cmd_resp=0x0 0x0 0x0 0x0 cmd_retries=0 stop_opcode=0
stop_err=0 stop_resp=0x0 0x0 0x0 0x0 stop_retries=0 sbc_opcode=0
sbc_err=0 sbc_resp=0x0 0x0 0x0 0x0 sbc_retries=0 bytes_xfered=0
data_err=0 tag=0 can_retune=0 doing_retune=0 retune_now=0 need_retune=0
hold_retune=1 retune_period=0

Here's what happens: the __mmc_start_request function is called with
opcode 5. Since the power to the Wi-Fi card, which resides on this SDIO
bus, is initially off after the reset, an interrupt SDHCI_INT_TIMEOUT is
triggered. Immediately after that, a second interrupt SDHCI_INT_RESPONSE
is triggered. Depending on the exact timing, these conditions can
trigger the following race problem:

1) The sdhci_cmd_irq top half handles the command as an error. It sets
   host->cmd to NULL and host->pending_reset to true.
2) The sdhci_thread_irq bottom half is scheduled next and executes faster
   than the second interrupt handler for SDHCI_INT_RESPONSE. It clears
   host->pending_reset before the SDHCI_INT_RESPONSE handler runs.
3) The pending interrupt SDHCI_INT_RESPONSE handler gets called, triggering
   a code path that prints: "mmc2: Got command interrupt 0x00000001 even
   though no command operation was in progress."

To solve this issue, we need to clear pending interrupts when resetting
host->pending_reset. This ensures that after sdhci_thread_irq restores
interrupts, there are no stale interrupts pending.

The behavior observed here is non-compliant with the SDHCI standard.
Place the code in the sdhci-of-dwcmshc driver to account for a
hardware-specific quirk instead of the core SDHCI code.
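
The three-step race above can be replayed with a toy model. This is a simplified Python sketch and not the driver code: `ToyHost`, `run`, and the `clear_stale` switch are hypothetical names, and only the two interrupt bits seen in the trace (`0x10000` and `0x1`) are modelled.

```python
INT_RESPONSE = 0x1      # intmask=0x1 in the trace (SDHCI_INT_RESPONSE)
INT_TIMEOUT = 0x10000   # intmask=0x10000 in the trace (SDHCI_INT_TIMEOUT)


class ToyHost:
    """Tiny stand-in for the host state touched by the race."""

    def __init__(self):
        self.cmd = "CMD5"            # command in flight (opcode 5)
        self.pending_reset = False
        self.int_status = 0          # interrupts raised but not yet handled
        self.log = []

    def cmd_irq(self, intmask):
        """Top half for command interrupts."""
        if self.cmd is None:
            if self.pending_reset:
                return               # ignored while a reset is pending
            self.log.append(
                f"Got command interrupt {intmask:#010x} even though "
                "no command operation was in progress."
            )
            return
        if intmask & INT_TIMEOUT:    # step 1: handle the command as an error
            self.cmd = None
            self.pending_reset = True

    def thread_irq(self, clear_stale):
        """Bottom half: step 2, optionally dropping stale interrupts."""
        self.pending_reset = False
        if clear_stale:
            self.int_status = 0


def run(clear_stale):
    host = ToyHost()
    host.int_status = INT_TIMEOUT | INT_RESPONSE  # raised back to back
    host.int_status &= ~INT_TIMEOUT
    host.cmd_irq(INT_TIMEOUT)                     # step 1
    host.thread_irq(clear_stale)                  # step 2
    if host.int_status & INT_RESPONSE:            # step 3
        host.cmd_irq(INT_RESPONSE)
    return host.log
```

In this model `run(False)` reproduces the dmesg line, while `run(True)` returns an empty log because the stale response interrupt is dropped together with `pending_reset`.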

Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 43658a542ebf ("mmc: sdhci-of-dwcmshc: Add support for T-Head TH1520")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241008100327.4108895-1-m.wilczynski@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Enable CONFIG_DRM_ETNAVIV=m
Update CONFIG_AIC_FW_PATH="/lib/firmware/aic8800"
And generate the new openeuler_defconfig.

Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Due to a known bug found during the testing of video_memory, disable it
until the issue is resolved.

```
[  +0.701834] BUG: Bad page state in process Media  pfn:0c7d2
[  +0.005632] page:000000008ee78948 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xc7d2
[  +0.009300] flags: 0xfffc00000004000(reserved|node=0|zone=0|lastcpupid=0x3fff)
[  +0.007277] page_type: 0xffffffff()
[  +0.003548] raw: 0fffc00000004000 0000000000000000 0000000000000122 0000000000000000
[  +0.007774] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  +0.007795] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
...
[  +0.008629] Hardware name: Sipeed Lichee Pi 4A 16G (DT)
[  +0.005227] Call Trace:
[  +0.002447] [<ffffffff800069ba>] dump_backtrace+0x28/0x30
[  +0.005409] [<ffffffff80a2f8aa>] show_stack+0x38/0x44
[  +0.005057] [<ffffffff80a3d652>] dump_stack_lvl+0x44/0x5c
[  +0.005403] [<ffffffff80a3d682>] dump_stack+0x18/0x20
[  +0.005053] [<ffffffff802567aa>] bad_page+0x11a/0x162
[  +0.005056] [<ffffffff80256ede>] free_page_is_bad_report+0x42/0xa2
[  +0.006183] [<ffffffff802575b0>] free_unref_page_prepare+0x13e/0x24a
[  +0.006363] [<ffffffff8025a11c>] free_unref_page+0x5a/0x1c0
[  +0.005574] [<ffffffff8025a48e>] __free_pages+0x106/0x10c
[  +0.005402] [<ffffffff0263f502>] free_memblk_pages+0x76/0x26c [vidmem]
[  +0.006627] [<ffffffff0263f9a6>] GFP_Free+0xae/0x150 [vidmem]
[  +0.005832] [<ffffffff026400c2>] vidalloc_ioctl+0x4ba/0x870 [vidmem]
[  +0.006437] [<ffffffff802dc506>] __riscv_sys_ioctl+0x96/0xc2
[  +0.005668] [<ffffffff80a3e18a>] do_trap_ecall_u+0x138/0x14a
[  +0.005661] [<ffffffff80a49510>] ret_from_exception+0x0/0x64
[  +0.005730] Disabling lock debugging due to kernel taint
```

Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
fix: Error: unrecognized opcode cbo.clean (a0)
KUnit test error:
  arch/riscv/mm/dma-noncoherent.c:54:
    Error: unrecognized opcode `cbo.clean (a0)', extension `zicbom' required

Signed-off-by: Yafen Fang <yafen@iscas.ac.cn>
mainline inclusion
commit de1ff306dcf4546d6a8863b1f956335e0d3fbb9b
category: cleanup
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Now that the GIC-v3 callback can handle invocation with a fwspec parameter
count of 0, lift the restriction in the core code and invoke select()
unconditionally when the domain provides it.

Preparatory change for per device MSI domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-3-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit ac81e94ab001c2882e89c9b61417caea64b800df
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Supporting per device MSI domains on ARM64, RISC-V and the zoo of
interrupt mechanisms needs a bit more information than what the
initial x86 implementation provides.

Add the following fields:

  - required_flags: 	The flags which a parent domain requires to be set
  - bus_select_token:	The bus token of the parent domain for select()
  - bus_select_mask:	A bitmask of supported child domain bus types

This makes it possible to provide library functions which can be shared
between various interrupt chip implementations and avoids replicating
mostly similar code all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-4-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 6516d5a295356f8fd5827a1c0954d7ed5b2324dd
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Add a new domain bus token to prepare for device MSI which aims to replace
the existing platform MSI maze.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-5-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit c88f9110bfbca5975a8dee4c9792ba12684c7bca
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Provide functions to create and remove per device MSI domains which replace
the platform-MSI domains. The new model is that each of the devices which
utilize platform-MSI gets now its private MSI domain which is "customized"
in size and with a device specific function to write the MSI message into
the device.

This is the same functionality as platform-MSI but it avoids all the down
sides of platform MSI, i.e. the extra ID book keeping, the special data
structure in the msi descriptor. Further the domains are only created when
the devices are really in use, so the burden is on the usage and not on the
infrastructure.

Fill in the domain template and provide two functions to init/allocate and
remove a per device MSI domain.

Until all users and parent domain providers are converted, the init/alloc
function invokes the original platform-MSI code when the irqdomain which is
associated to the device does not provide MSI parent functionality yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-6-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 14fd06c776b5289a43c91cdc64bac3bdbc7b397e
category: cleanup
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Switch all the users of the platform MSI domain over to invoke the new
interfaces which branch to the original platform MSI functions when the
irqdomain associated to the caller device does not yet provide MSI parent
functionality.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-7-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 1a4671ff7a903e87e4e76213e200bb8bcfa942e4
category: cleanup
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 9c78c1a85c04bdfbccc5a50588e001087d942b08
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

irq_create_fwspec_mapping() requires translation of the firmware spec to a
hardware interrupt number and the trigger type information.

Wired interrupts which are connected to a wire to MSI bridge, like MBIGEN
are allocated that way. So far MBIGEN provides a regular irqdomain which
then hooks backwards into the MSI infrastructure. That's an unholy mess and
will be replaced with per device MSI domains which are regular MSI domains.

Interrupts on MSI domains are not supported by irq_create_fwspec_mapping(),
but for making the wire to MSI bridges sane it makes sense to provide a
special allocation/free interface in the MSI infrastructure. That avoids
the backdoors into the core MSI allocation code and just shares all the
regular MSI infrastructure.

Provide an optional translation callback in msi_domain_ops which can be
utilized by these wire to MSI bridges. No other MSI domain should provide a
translation callback. The default translation callback of the MSI
irqdomains will warn when it is invoked on a non-prepared MSI domain.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-8-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 5aa3c0cf5bba6437c9e63a56f684f61de8b503d6
category: bugfix
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Users of the IRQCHIP_PLATFORM_DRIVER_{BEGIN,END} helpers rely on a fwspec
containing only the fwnode (and crucially a number of parameters set to 0)
together with a DOMAIN_BUS_ANY token to check whether a parent irqchip has
probed and registered a domain.

Since de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction
from select()"), ops->select() is called unconditionally, meaning that
irqchips implementing select() now need to handle ANY as a match.

Instead of adding more esoteric checks to the individual drivers, add that
condition to irq_find_matching_fwspec(), and let it handle the corner case,
as per the comment in the function.

This restores the functionality of the above helpers.

Fixes: de1ff306dcf4 ("genirq/irqdomain: Remove the param count restriction from select()")
Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reported-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Link: https://lore.kernel.org/r/20240220114731.1898534-1-maz@kernel.org
Link: https://lore.kernel.org/r/20240219-gic-fix-child-domain-v1-1-09f8fd2d9a8f@linaro.org
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 3095cc0d5b2c246ddfcb18f54ed5557640224b6a
category: cleanup
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

In preparation for providing a special allocation function for wired
interrupts which are connected to a wire to MSI bridge, split the inner
workings of msi_domain_alloc_irq_at() out into a helper function so the
code can be shared.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-9-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 2d566a498d6483ba986dadc496f64a20b032608f
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Provide a domain bus token for the upcoming support for wire to MSI device
domains so the domain can be distinguished from regular device MSI domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-10-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 9d1c58c8004653b37721dd7b16f4360216778c94
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

To support wire to MSI domains via the MSI infrastructure it is required to
use the firmware node of the device which implements this for creating the
MSI domain. Otherwise the existing firmware match mechanisms to find the
correct irqdomain for a wired interrupt which is connected to a wire to MSI
bridge would fail.

This cannot be used for the general case because not all devices provide
firmware nodes and all regular per device MSI domains are directly
associated to the device and do not have to be searched for.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-11-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 0ee1578b00bcf5ef8e7955f0c6f02a624443eb29
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

To support wire to MSI bridges properly in the MSI core infrastructure it
is required to have separate allocation/free interfaces which can be
invoked from the regular irqdomain allocation/free functions.

The mechanism for allocation is:
  - Allocate the next free MSI descriptor index in the domain
  - Store the hardware interrupt number and the trigger type
    which was extracted by the irqdomain core from the firmware spec
    in the MSI descriptor device cookie so it can be retrieved by
    the underlying interrupt domain and interrupt chip
  - Use the regular MSI allocation mechanism for the newly allocated
    index which returns a fully initialized Linux interrupt on success

This works because:
  - the domains have a fixed size
  - each hardware interrupt is only allocated once
  - the underlying domain does not care about the MSI index it only cares
    about the hardware interrupt number and the trigger type

The free function looks up the MSI index in the MSI descriptor of the
provided Linux interrupt number and uses the regular index based free
functions of the MSI core.
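
The allocation and free flow described above can be illustrated with a toy model. This is a hedged Python sketch only: `ToyWiredMsiDomain`, its methods, and the Linux interrupt numbers starting at 100 are hypothetical and do not correspond to kernel symbols.

```python
class ToyWiredMsiDomain:
    """Fixed-size domain mapping MSI indices to (hwirq, trigger) cookies."""

    def __init__(self, size):
        self.size = size
        self.desc = {}            # MSI index -> (hwirq, trigger type) cookie
        self.virq_to_index = {}   # Linux interrupt number -> MSI index
        self._next_virq = 100     # arbitrary starting number for the sketch

    def alloc_wired(self, hwirq, trigger):
        # 1) Allocate the next free MSI descriptor index in the domain.
        index = next((i for i in range(self.size) if i not in self.desc), None)
        if index is None:
            raise RuntimeError("domain is full")
        # 2) Store hwirq and trigger type in the descriptor cookie.
        self.desc[index] = (hwirq, trigger)
        # 3) Stand-in for the regular MSI allocation of that index.
        virq = self._next_virq
        self._next_virq += 1
        self.virq_to_index[virq] = index
        return virq

    def free_wired(self, virq):
        # Look up the MSI index for the interrupt, then free by index.
        index = self.virq_to_index.pop(virq)
        del self.desc[index]
```

The underlying domain in this sketch never looks at the index itself, only at the stored cookie, which mirrors the reasoning given for why the scheme works.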

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-12-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit e49312fe09df36cc4eae0cd6e1b08b563a91e1bc
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Reroute interrupt allocation in irq_create_fwspec_mapping() if the domain
is a MSI device domain. This is required to convert the support for wire
to MSI bridges to per device MSI domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-13-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 9bbe13a5d414a7f8208dba64b54d2b6e4f7086bd
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Some platform-MSI implementations require that power management is
redirected to the underlying interrupt chip device. To make this work
with per device MSI domains provide a new feature flag and let the
core code handle the setup of dev->pm_dev when set during device MSI
domain creation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240127161753.114685-14-apatel@ventanamicro.com
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 9dbaf381008dfa2fad6225633004f7adb1bac252
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Extend the ISA string parsing to detect the Smstateen extension. If the
extension is enabled then access to certain 'state' such as AIA CSRs in
VS mode is controlled by *stateen0 registers.

Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit a4f5f39849f39f62f5d4e88cbb600f95f927003d
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Add an entry for the Smstateen extension to the riscv,isa-extensions
property.

Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 55d2a0bd5eadaade850efa9d3a7ffbb0aeb67198
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Recently, we found that cross-die access to pagetable pages on ARM64
machines can cause performance fluctuations in our business.  Currently,
there are no PMU events available to track this situation on our ARM64
machines, so accurate pagetable accounting can help to analyze this issue,
but now the PUD level pagetable accounting is missed.

So introduce pagetable_pud_ctor/dtor() to help to get accurate PUD
pagetable accounting, as well as converting the architectures which use
generic PUD pagetable allocation to add corresponding PUD pagetable
accounting.  Moreover this patch will mark the PUD level pagetable with
PG_table flag, which will help to do sanity validation in
unpoison_memory().

On my testing machine, I can see more pagetables statistics after the patch
with page-types tool:

Before patch:
        flags           page-count      MB  symbolic-flags                     long-symbolic-flags
0x0000000004000000           27326      106  __________________________g_________________       pgtable
After patch:
0x0000000004000000           27541      107  __________________________g_________________       pgtable

Link: https://lkml.kernel.org/r/876c71c03a7e69c17722a690e3225a4f7b172fb2.1695017383.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 8246601a7d391ce8207408149d65732f28af81a1
category: bugfix
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

If a non-leaf PTE (i.e. a pmd, pud or p4d entry) is modified, an
sfence.vma is required for safety: an implementation may cache the
non-leaf translation in the TLB. I have not met such HW so far, but it
is possible in theory.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: c5e9b2c2ae82 ("riscv: Improve tlb_flush()")
Link: https://lore.kernel.org/r/20231219175046.2496-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 40d1bb92a49313b3e0dc5513fdd2578362c40312
category: cleanup
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

This is to prepare for enabling MMU_GATHER_RCU_TABLE_FREE.
No functionality changes.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231219175046.2496-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 69be3fb111e73bd025ce6d2322371da5aa497c70
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

In order to implement fast gup we need to ensure that the page
table walker is protected from page table pages being freed from
under it.

riscv situation is more complicated than other architectures: some
riscv platforms may use IPI to perform TLB shootdown, for example,
those platforms which support AIA, usually the riscv_ipi_for_rfence is
true on these platforms; some riscv platforms may rely on the SBI to
perform TLB shootdown, usually the riscv_ipi_for_rfence is false on
these platforms. To keep software pagetable walkers safe in this case
we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in
include/asm-generic/tlb.h for more details.

This patch enables MMU_GATHER_RCU_TABLE_FREE, then uses:

* tlb_remove_page_ptdesc() for those platforms which use IPI to perform
  TLB shootdown;

* tlb_remove_ptdesc() for those platforms which use SBI to perform TLB
  shootdown.

Both cases mean that disabling interrupts will block the free and
protect the fast gup page walker.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231219175046.2496-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
mainline inclusion
commit 3f910b7a522e064d7261f31a00d9c9dca31d902a
category: feature
bugzilla: https://github.com/RVCK-Project/rvck-olk/issues/1

--------------------------------

Activate the fast gup for riscv mmu platforms. Here are some
GUP_FAST_BENCHMARK performance numbers:

Before the patch:
GUP_FAST_BENCHMARK: Time: get:53203 put:5085 us

After the patch:
GUP_FAST_BENCHMARK: Time: get:17711 put:5060 us

The get time is reduced by 66.7%! IOW, 3x get speed!
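
The percentages follow directly from the two benchmark lines. A small Python check, where the helper name `reduction_and_speedup` is only illustrative:

```python
def reduction_and_speedup(before_us, after_us):
    """Return (time reduction in percent, speedup factor)."""
    reduction_pct = (before_us - after_us) / before_us * 100
    speedup = before_us / after_us
    return reduction_pct, speedup


# "get" times from the GUP_FAST_BENCHMARK output, in microseconds.
get_reduction, get_speedup = reduction_and_speedup(53203, 17711)
print(f"get: {get_reduction:.1f}% less time, {get_speedup:.2f}x faster")
# prints "get: 66.7% less time, 3.00x faster"
```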

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231219175046.2496-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Hangfan Li <lihangfan@iscas.ac.cn>
oervci commented Jul 8, 2025

oervci commented Jul 8, 2025

Kernel build success!

oervci commented Jul 8, 2025

Lava check done! result url: https://lava.oerv.ac.cn/scheduler/job/342

oervci commented Jul 11, 2025

xmzzz pushed a commit that referenced this pull request May 11, 2026
stable inclusion
from stable-v6.6.120
commit 857e7a2d5a94c9d97da52137a069a62eae42d4a5
category: bugfix
bugzilla: https://atomgit.com/openeuler/kernel/issues/8839

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=857e7a2d5a94c9d97da52137a069a62eae42d4a5

--------------------------------

[ Upstream commit 24e17a29cf7537f0947f26a50f85319abd723c6c ]

The xfstests' test-case generic/073 leaves HFS+ volume
in corrupted state:

sudo ./check generic/073
FSTYP -- hfsplus
PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.17.0-rc1+ #4 SMP PREEMPT_DYNAMIC Wed Oct 1 15:02:44 PDT 2025
MKFS_OPTIONS -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/073 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent
(see XFSTESTS-2/xfstests-dev/results//generic/073.full for details)

Ran: generic/073
Failures: generic/073
Failed 1 of 1 tests

sudo fsck.hfsplus -d /dev/loop51
** /dev/loop51
Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
Executing fsck_hfs (version 540.1-Linux).
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
Invalid directory item count
(It should be 1 instead of 0)
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000
CBTStat = 0x0000 CatStat = 0x00004000
** Repairing volume.
** Rechecking volume.
** Checking non-journaled HFS Plus Volume.
The volume name is untitled
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
** The volume untitled was repaired successfully.

The test performs these steps in its final phase:

mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

So, we move file bar from testdir_1 into the testdir_2 folder. This means
that the HFS+ logic decrements the number of entries in testdir_1 and
increments the number of entries in testdir_2. Finally, we do fsync only
for testdir_1 and foo but not for testdir_2. This is why fsck.hfsplus
detects the volume corruption afterwards.

This patch fixes the issue by adding hfsplus_cat_write_inode() calls
for old_dir and new_dir in hfsplus_rename() after hfsplus_rename_cat()
completes successfully. hfsplus_rename_cat() modifies the in-core
inode objects for old_dir and new_dir but it doesn't save these
modifications in the Catalog File's entries. It was expected that
hfsplus_write_inode() would save these modifications afterwards.
However, because generic/073 does fsync only for testdir_1 and foo,
the testdir_2 modification was not saved into the Catalog File's
entry, and the Catalog File was flushed without this modification.
fsck.hfsplus then detected the inconsistency. Now, hfsplus_rename()
stores all modified entries in the Catalog File, and the correct
state of the Catalog File is flushed during the hfsplus_file_fsync()
call. Finally, it makes fsck.hfsplus happy.

sudo ./check generic/073
FSTYP         -- hfsplus
PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc3+ #93 SMP PREEMPT_DYNAMIC Wed Nov 12 14:37:49 PST 2025
MKFS_OPTIONS  -- /dev/loop51
MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch

generic/073 32s ...  32s
Ran: generic/073
Passed all 1 tests
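
The rename and fsync sequence from generic/073 can be mimicked with two dictionaries. This Python sketch is a deliberately simplified model, not HFS+ code: `rename`, `fsync_dir`, `simulate`, and the `write_both` flag are hypothetical names, with `record` standing for the Catalog File entries and `in_core` for the in-core inode objects.

```python
def rename(record, in_core, old_dir, new_dir, write_both):
    """Move one entry; optionally store both folders in the catalog."""
    in_core[old_dir] -= 1
    in_core[new_dir] += 1
    if write_both:  # models the added write calls for old_dir and new_dir
        for folder in (old_dir, new_dir):
            record[folder] = in_core[folder]


def fsync_dir(record, in_core, folder):
    """fsync of a single folder only saves that folder's entry."""
    record[folder] = in_core[folder]


def simulate(write_both):
    in_core = {"testdir_1": 1, "testdir_2": 0}   # bar starts in testdir_1
    record = dict(in_core)                       # catalog is consistent at start
    rename(record, in_core, "testdir_1", "testdir_2", write_both)
    fsync_dir(record, in_core, "testdir_1")      # testdir_2 is never fsynced
    return record
```

With `write_both=False` the model leaves testdir_2 at 0 entries although it holds one file, matching the fsck message "It should be 1 instead of 0"; with `write_both=True` both counts are correct.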

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20251112232522.814038-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 857e7a2d5a94c9d97da52137a069a62eae42d4a5)
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>