Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Workstation 17.6.0 #281

Open
wants to merge 48 commits into
base: master
Choose a base branch
from

Conversation

philipl
Copy link

@philipl philipl commented Sep 7, 2024

This isn't really mergeable given how this repo is constructed but it's a working 17.6.0 source tree with the compatibility patches we need for 6.9.0 and 6.10.0. (and 6.11 doesn't seem to require anything additional)

philipl and others added 18 commits September 7, 2024 14:31
The definition of COMPAT_LINUX_VERSION_CHECK_LT() macro lacks surrounding
paretheses so that negated tests like !COMPAT_LINUX_VERSION_CHECK_LT(...)
expand to something completely different. This could be worked around
easily by adding parentheses to each place the macro is used in an
expression but it makes much more sense to fix the macro definition so that
the macro does not serve as a trap.
When building against kernel 4.12 and newer, macro name GDT_SIZE used in
(vmmon) include/segs.h collides with macro defined in (kernel source)
arch/x86/include/asm/segment.h, resulting in its redefinition.

To prevent potential problems, rename vmmon's GDT_SIZE to VMMON_GDT_SIZE
and GDT_LIMIT to VMMON_GDT_LIMIT. (There is no GDT_LIMIT in mainline kernel
source but let's be consistent.)
The PCI_VENDOR_ID_VMWARE macro is defined in mainline pci_ids.h since
commit 94e57fea6202 ("PCI: Move PCI_VENDOR_ID_VMWARE to pci_ids.h") in
v3.18-rc1.

The PCI_DEVICE_ID_VMWARE_VMXNET3 macro is defined in mainline pci_ids.h
since commit b1226c7db1d9 ("vmxnet3: Move PCI Id to pci_ids.h") in
v4.10-rc1.

The MSR_MISC_FEATURES_ENABLES_CPUID_FAULT macro is defined in mainline
since commit e9ea1e7f53b8 ("x86/arch_prctl: Add ARCH_[GET|SET]_CPUID") in
v4.12-rc1.

The CR3_PCID_MASK is defined in mainline asm/processor-flags.h since commit
6c690ee1039b ("x86/mm: Split read_cr3() into read_cr3_pa() and
__read_cr3()") in v4.13-rc1.
As discussed in

  https://bugzilla.suse.com/show_bug.cgi?id=1059674

the reason for multiple objtool warnings is the fact that vmmon module
defines its own Panic() function which never returns. While it is marked as
such which is used by the compiler for optimization, there is no way to
find this from object file.

While this seems innocuous, it might result in problems with unwinder
later. The quickest way around is to replace vmmon's own Panic() with
standard kernel panic() until a cleaner solution is found.
Remove .cache.mk inside vmmon-only and vmnet-only when executing
"make clean". This file can cause issues when upgrading gcc as make will
still look for includes inside older gcc includes directory.

File .cache.mk with cache of generated variables was created by build since
kernel v4.15-rc1, commit 3298b690b21c ("kbuild: Add a cache for generated
variables") until the feature was removed in v4.18-rc1, commit e08d6de4e532
("kbuild: remove kbuild cache").
After mainline commit 13c01139b171 ("x86/headers: Remove APIC headers from
<asm/smp.h>") in 5.9-rc1, APIC headers are no longer included via
<asm/smp.h> so that linux/hostif.c will use incorrect fallback definitions
of SPURIOUS_APIC_VECTOR, POSTED_INTR_VECTOR and ERROR_APIC_VECTOR even if
built against kernel where these are defined.

Include <asm/irq_vectors.h> in linux/hostif.c explicitly to avoid that.
The do_gettimeofday() helper was removed by commit e4b92b108c6c
("timekeeping: remove obsolete time accessors") in v5.0-rc1 and since
commit c766d1472c70 ("y2038: hide timeval/timespec/itimerval/itimerspec
types") in v5.6-rc3, struct timeval should no longer be used in kernel
code.

Convert the do_gettimeofday() relics in VNetBridge (which are only compiled
with LOGLEVEL >= 4) completely to ktime based interface.
SLE15-SP5 backported mainline commit adeef3e32146 ("net: constify
netdev->dev_addr") from 5.17-rc1 into their "5.14" kernel. Add an extra
hack to the version check to fix SLE15-SP5 build.
While vmmon module already uses explicit module_init() and module_exit()
for its init and cleanup function, vmnet relies on traditional magic names
init_module() and cleanup_module(). Apparently this has an unfortunate side
effect that the two functions are not identified as indirect call targets
by objdump and they get "sealed" when the module is built against and
loaded into an IBT enabled kernel.

Starting with 6.3-rc1, objtool is going to warn about this issue,
indicating that the legacy module initialization is deprecated and
module_init() and module_exit() macros should be used instead so do that
for vmnet as well.
Some cross page functions need an explicit endbr64 instruction as they are
indirect branch targets but are not recognized as such. VMware 17 uses home
cooked ENDBR macro rather than standard ASM_ENDBR defined in kernel.

Use ASM_ENDBR instead and define it as empty if not available (kernel
before 5.18-rc1) so that we do not generate useless endbr64 instructions
when building against kernel which does not support IBT or has it disabled.
Mainline commit 6308499b5e99 ("net: unexport csum_and_copy_{from,to}_user")
in 5.19-rc1 unexports csum_and_copy_to_user as no in-tree module is using
it. A clean solution would probably be rewriting the code to use iovec
iterator as csum_and_copy_to_iter() is still exported (or perhaps
skb_copy_and_csum_datagram() might be used instead). Anything like this
would be way too intrusive so it would have to wait for VMware developers.

Workstation 17.0.0 handles this with a call to csum_partial_copy_nocheck()
inside a user_access_begin()/user_access_end() block which lets the build
succeed but objtool still warns about a call to csum_partial_copy_nocheck()
with UACCESS enabled. Based on the reasoning in commit message of mainline
commit ea24213d8088 ("objtool: Add UACCESS validation"), this workaround
does indeed seem questionable.

Use the older workaround combining csum_partial() with copy_to_user() like
in workstation-16.2.4 branch instead. This will be less efficient but
hopefully the performace hit will not be noticeable.
Mainline commit c304eddcecfe ("net: wrap the wireless pointers in struct
net_device in an ifdef") in 5.19-rc1 makess ieee80211_ptr member present in
struct net_device only if CONFIG_CFG80211 is enabled. Workstation 17.0.0
checks if CONFIG_CFG80211 is enabled but only when CONFIG_WIRELESS_EXT is
not. Thus a build against a kernel with CONFIG_WIRELESS_EXT enabled and
CONFIG_CFG80211 disabled still fails. Also, the newly introduced version
check is pointless, there is no point checking dev->ieee80211_ptr with
CONFIG_CFG80211 disabled (fortunately it will be null anyway).

Rewrite the check in VNetBridgeIsDeviceWireless() to check each of the two
pointers only if it actually exists.
Two functions use "foo()" rather than proper "foo(void)" in their
definitions and as reported, clang compiler treats it as an error.

While at it, also mark VNetFreeInterfaceList() static to make its
definition match the declaration.
As a side effect of mainline commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail") in 6.5-rc1, __pte_offset_map(), called by
pte_offset_map(), is no longer exported. WMware developers decided to hack
around this by replacing pte_offset_map() by pte_offset_kernel() which does
not seem to be a good idea and apparently may trigger warn checks in RCU
code on some systems as mentioned in the discussion on issue mkubecek#223.
Therefore let's use the same solution as we had for 17.0.2 and older
versions as it does not show these problems.

Based on an upstream IRC discussion and the hva_to_pfn_*() family of
functions in KVM code, what PgtblVa2MPNLocked() does seems to be an
incomplete and partial open coded logic of get_user_pages() and as it is
only used to get PFN from a virtual address, it can be easily implemented
using get_user_pages() family.

Without knowledge what exactly are the PFNs used for in VMware, it is hard
to guess the right flags, these seem to work and have been tested by
multiple users over last few weeks.

We could likely use get_user_pages() also on older kernels and it might be
actually cleaner and more reliable as existing open coded implementation
does not seem to handle some corner cases but without knowledge of VMware
internals, it will be safer to stick to existing code where possible.
This also requires tweaking the Makefiles to allow for building for non-active
kernels
@lizi-garden
Copy link

It works fine on openSUSE Tumbleweed with kernel 6.10.7-1-default

Remember to turn off Secure Boot

@the-black-wolf
Copy link

You are a life saver. This should somehow be merged back into new workstation-17.6.0 branch

@nobody5050
Copy link

Should this be compatible with 6.11?

@philipl
Copy link
Author

philipl commented Sep 16, 2024

Yes, it seems to work fine without additional patches.

@ndobbs
Copy link

ndobbs commented Sep 29, 2024

This got me out of a real bind - thank you!

Works great on Fedora 40 with kernel 6.10.11-200.fc40.x86_64

@breno-ca
Copy link

Works like a charm on Pop!_OS 22.04 with kernel 6.9.3-76060903-generic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants