WSL no internet connection / DNS issues #11693

Open
1 of 2 tasks
cyberjj999 opened this issue Jun 14, 2024 · 73 comments
@cyberjj999

cyberjj999 commented Jun 14, 2024

Windows Version

Microsoft Windows [Version 10.0.22621.3737]

WSL Version

2.2.4.0

Are you using WSL 1 or WSL 2?

  • WSL 2
  • WSL 1

Kernel Version

5.15.153.1-microsoft-standard-WSL2

Distro Version

No response

Other Software

No response

Repro Steps

  • ping google.com returns ping: google.com: Temporary failure in name resolution
  • pip install streamlit (or other packages) returns a network error message; same for sudo apt install

This indicates a clear network problem.

Expected Behavior

No network problems: ping works and pip install <package> succeeds.

Actual Behavior

Clear Network/Internet Connection Problem:

  • ping google.com returns ping: google.com: Temporary failure in name resolution
  • pip install streamlit (or other packages) returns a network error message; same for sudo apt install

My host machine (Windows 11) has no internet issues at all.

What I've Tried

  1. Disabled Windows Firewall entirely, shut down WSL, and pinged google.com again: doesn't work

  2. Ran the following commands to flush my DNS on Windows:

netsh winsock reset 
netsh int ip reset all
netsh winhttp reset proxy
ipconfig /flushdns

then restarted my computer: doesn't work

  3. Updated my /etc/resolv.conf and /etc/wsl.conf to use the nameservers 8.8.8.8 and 8.8.4.4, and even made /etc/resolv.conf immutable... and it doesn't work.
sudo rm /etc/resolv.conf
sudo bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
sudo bash -c 'echo "[network]" > /etc/wsl.conf'
sudo bash -c 'echo "generateResolvConf = false" >> /etc/wsl.conf'
sudo chattr +i /etc/resolv.conf
  4. Disabled the "fast start-up" option in Power Options, then restarted my computer... still doesn't work

  5. Changed from my company WiFi to my personal mobile hotspot: doesn't work

Suspected Reasons

  1. Change of network (VPN?) but disabling VPN doesn't yield a meaningful difference

  2. Windows Update (including Quality Updates)

Somehow WSL appears to have been updated automatically with an old kernel version, even though my WSL Ubuntu is installed from the Microsoft Store?

But my WSL version seems to be okay

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2

Appreciate Any Help

  • I'm totally at a loss and would really appreciate any help!! My last resort is to uninstall WSL and reinstall :(

Diagnostic Logs

Added WSL Logs

WslLogs-2024-06-16_22-35-16.zip


Logs are required for review from WSL team

If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'.
Otherwise please attach logs by following the instructions below, your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.

How to collect WSL logs

Download and execute collect-wsl-logs.ps1 in an administrative powershell prompt:

Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/WSL/master/diagnostics/collect-wsl-logs.ps1" -OutFile collect-wsl-logs.ps1
Set-ExecutionPolicy Bypass -Scope Process -Force
.\collect-wsl-logs.ps1

The script will output the path of the log file once done.

Once completed, please upload the output files to this GitHub issue.

Click here for more info on logging
If you choose to email these logs instead of attaching to the bug, please send them to [email protected] with the number of the github issue in the subject, and in the message a link to your comment in the github issue and reply with '/emailed-logs'.


@kohlerdominik

kohlerdominik commented Jun 14, 2024

I have the exact same issue, started today as well.

WslLogs-2024-06-14_17-43-54.zip

@cyberjj999
Author

@kohlerdominik It's a really annoying problem.

I found a quick fix in case it really bothers you. It doesn't seem to affect my project code, but things feel slower in general.

I just downgraded to WSL1 and it works:

# Open Powershell
C:\WINDOWS\system32>wsl --list

Windows Subsystem for Linux Distributions:
Ubuntu (Default)

# Set WSL to Version 1
C:\Users\priv>wsl --set-version Ubuntu 1

It says it takes a few minutes but in reality it took almost an hour for me.

I'm hoping a proper fix will be here so I can switch back to WSL2.

Cheers and hope it helps!

P.S. I tried to switch back to WSL2 afterwards and the network issue still persisted, so I guess switching back and forth doesn't work 👎

@kohlerdominik

@cyberjj999 after investigating, I found that /etc/resolv.conf is a symbolic link to a non-existent file. After creating that file with the following content, name resolution works again:

nameserver [our company dns server ip - don't add this if you only require internet dns]
nameserver 1.1.1.1

This is quite an ugly workaround, though, because it works statically only for our work environment and the internet.

I guess the resolv.conf should usually be taken from the Windows network adapter (so the DHCP configuration is properly forwarded), but this broke.
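The dangling-symlink situation described above can be checked without guessing. The following is a minimal Python sketch, assuming it is run inside the distro; the function name `resolv_conf_status` is made up for illustration and is not part of any WSL tooling.

```python
import os


def resolv_conf_status(path="/etc/resolv.conf"):
    """Classify a resolver config path: dangling symlink, symlink, regular file, or missing."""
    if os.path.islink(path):
        target = os.readlink(path)
        # os.path.exists() follows the link, so it is False when the target is gone.
        if not os.path.exists(path):
            return f"dangling symlink -> {target}"
        return f"symlink -> {target}"
    if os.path.exists(path):
        return "regular file"
    return "missing"


if __name__ == "__main__":
    print(resolv_conf_status())
```

A "dangling symlink" result would match the state reported in this comment; any other result suggests looking elsewhere.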

@dcasota

dcasota commented Jun 15, 2024

Hi,
There are quite a few differences between WSL1 and WSL2.
See #4150 (comment). The thread is mentioned in the weblink above.

I'm using NAT mode, and this works perfectly for Photon OS on WSL2.

@Wrong-Code

Wrong-Code commented Jun 15, 2024

Yes, there is a difference between version 1 and 2, but currently version 2 doesn't work anymore, and version 1 still performs as expected. Thanks to @cyberjj999 for the hint to temporarily switch to version 1.

Running 10.0.22631.3737 FWIW

@dcasota

dcasota commented Jun 15, 2024

On x86_64,
WSL 2.1.5 works flawlessly with version 2. Regular installations are not affected, because wsl --update does not update to 2.2.4.
WSL with version 1 is not an option for distributions with systemd requirements.

@cyberjj999
Author

@dcasota

On x86_64, WSL 2.1.5 works flawlessly with version 2. Regular installations are not affected, because wsl --update does not update to 2.2.4. WSL with version 1 is not an option for distributions with systemd requirements.

I'm still deeply lost with what's happening. I happen to need to use ollama which requires wsl2 so my current approach of reverting to wsl1 will not work.

I have uploaded the logs above.

Incidentally, I saw in another issue (#11675) mentioning similar things, and your suggestion for WSL reinstallation.

# see https://learn.microsoft.com/en-us/windows/wsl/install-manual#step-1---enable-the-windows-subsystem-for-linux

# Open a Powershell Terminal (Administrator) 
dism /online /disable-feature /FeatureName:Microsoft-Windows-Subsystem-Linux
dism /online /disable-feature /featurename:VirtualMachinePlatform
dism /online /disable-feature /FeatureName:Microsoft-Hyper-V
rm "$env:userprofile\.wslconfig"

# reboot

# Open a Powershell Terminal (Administrator) 
dism /online /enable-Feature /All /FeatureName:Microsoft-Windows-Subsystem-Linux /norestart
dism /online /enable-feature /All /Featurename:VirtualMachinePlatform /norestart
dism /online /enable-Feature /All /FeatureName:Microsoft-Hyper-V /norestart
bcdedit /set hypervisorlaunchtype auto

# reboot

# Open a Powershell Terminal (Administrator) 
wsl --install

I'll give it a shot tomorrow if there's no alternative solution.

Can I ask regarding the backup for WSL which you suggested: wsl --export <distribution-name> <path\filename.tar>, if WSL2 still doesn't have connection after I do your WSL reinstallation step, how can I re-import these backup contents into my new WSL2?

(I have custom bash scripts/commands + other installations like ollama and it'd be nice if i can easily re-import them to my new WSL2...)
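For the backup and restore question above, a rough Python sketch of the round trip is shown here. It relies on the `wsl --export <Distro> <FileName>` and `wsl --import <Distro> <InstallLocation> <FileName>` forms; the distribution name and paths in the usage note are placeholders, and the helper names are invented for this example. By default it only prints the commands.

```python
import subprocess


def export_cmd(distro, tar_path):
    """Command that writes the distribution's filesystem to a tar file."""
    return ["wsl", "--export", distro, tar_path]


def import_cmd(distro, install_dir, tar_path):
    """Command that registers a distribution from a previously exported tar file."""
    return ["wsl", "--import", distro, install_dir, tar_path]


def run(cmd, dry_run=True):
    """Print the command when dry_run is set; otherwise execute it and return its exit code."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode
```

For example, `run(export_cmd("Ubuntu", r"D:\backup\ubuntu.tar"))` before the reinstallation and `run(import_cmd("Ubuntu", r"C:\WSL\Ubuntu", r"D:\backup\ubuntu.tar"))` afterwards (both paths hypothetical). An imported distribution may start as root until a default user is configured again.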


Diagnostic information
Issue was edited and new log file was found: https://github.com/user-attachments/files/15858698/WslLogs-2024-06-16_22-35-16.zip
Detected appx version: 2.2.4.0

@dcasota

dcasota commented Jun 16, 2024

@cyberjj999

  • fyi Ollama does not require WSL, it's an option. But yes, I'm using this option, too.
  • Before considering a reinstallation of WSL, back up your work locally, e.g. to the hard disk, USB drive, etc. Shut down the VM(s) with wsl --terminate <distribution-name> (or wsl --shutdown for all of them) and use the export functionality.
  • Run wsl --uninstall. Run the recipe mentioned above.
  • Reinstalling WSL x86_64 actually is a reset to version 2.1.5. Hence there shouldn't be a networkMode related issue as in 2.2.4.
  • After the reinstallation, use the import function. Configure .wslconfig as needed.

From my perspective, Ollama, security-hardened by VMware Photon OS (one of the origins of Cbl mariner), runs best in VMware Workstation. But built-in WSL in Windows 11 as homelab use-case has become good enough. Not all functions are fail-safe yet.

Clarification:
wsl --unregister <distribution-name> deletes a custom distribution, but
wsl --uninstall followed by a WSL reinstallation does not.

@cyberjj999
Author

@craigloewen-msft @dcasota

Thanks for contributing. As an update, I've tried even more steps:

  1. registered another Ubuntu instance (didn't work)
  2. followed the whole process of deregistering WSL, uninstalling it + uninstalling Ubuntu completely + reimporting the profile back (didn't work)
  3. instead of reimporting, I created an entirely new profile (still didn't work)

The network issue still persists despite all the attempts.

I sincerely seek your support, especially since I need WSL2 to run certain programs... thanks

@dcasota

dcasota commented Jun 20, 2024

Which constellation does not work?

  • wsl 2.1.5 + networkingMode=mirrored
  • wsl 2.2.4 + networkingMode=mirrored
  • ...

In my homelab,

  • wsl 2.1.5 + networkingMode=nat (ipv4 only) works.
  • wsl 2.2.4 + networkingMode=nat (ipv4 only) does not work.
  • wsl 2.1.5 + networkingMode=mirrored (ipv4 only) stops after half a minute, hence does not work.
  • removing kb5039212 helped for a successful wsl 2.1.5 re-installation

All tests were on VMware By Broadcom Photon OS 5.0 guest with findings from March to June 2024. I didn't start testing yet the new possibilities as described in https://devblogs.microsoft.com/commandline/whats-new-in-the-windows-subsystem-for-linux-in-may-2024/.

@cyberjj999
Author

@dcasota

I'm using the default WSL configuration which is utilizing NAT by default, I presume.

I tried uninstalling the kb5039212 windows update as you mentioned: doesn't work with WSL v2.2.4

I downgraded to WSL v2.1.5 using the .msi installer you shared and confirmed the downgrade with wsl --version, but the internet connection still didn't work.

I'm starting to think it could be a network configuration issue, but I really have no clue what happened 1-2 weeks ago that randomly caused this.

^ If the issue is VPN, wouldn't it work once I switch network? Because that didn't work... I'm rather clueless now...

@dcasota

dcasota commented Jun 21, 2024

Troubleshooting WSL lists quite a few known issues, e.g. constellations with Cisco AnyConnect VPN, antivirus software that prevents WSL internet access, etc.
Logfiles are needed for further investigations. Simply go step by step through the collecting wsl logs recipe.

Edited: From your logs provided and using Windows Performance Analyzer for the first time, I'm afraid, in System activity > 'regions of interest' >table details view mostly is empty and in WPP Trace >Process(Name) are displayed as unknown.
A Microsoft WSL support engineer may help by providing a curated methodology to analyze logfiles.

Otherwise I would collect the information the classic way: 1) Windows eventlogs, 2) in .wslconfig set debugConsole=true, 3) in the Linux distro collect journalctl content e.g. show errors since last reboot with journalctl -p 3 -xb.

@cyberjj999
Author

@craigloewen-msft are there any updates? Thank you @dcasota I already included my log as you've mentioned.

Currently working with raw windows and there's quite a number of packages that I can't even install to get my dev work going.

@dcasota

dcasota commented Jun 27, 2024

@cyberjj999 From what I've seen, it matches to your observations. The service goal to safely deliver package updates at granular level from Windows to distros seems to work, but hiding the complexity wasn't possible in that time.

In registry, there are a bunch of entries with the naming schema <package>~<guid>~<architecture>~~<fileversionraw>, e.g. in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\Packages\ . To get an idea, the following lists all .mum packages in correlation with KB5039212:

Get-Childitem -Path Registry::HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\"Component Based Servicing"\Packages -recurse -ErrorAction SilentlyContinue | get-itemproperty | foreach-object{ if ($_.InstallLocation -ilike "*KB5039212*") {$_.InstallName}}

The delay I had set for KB5039212 is over. It seems that the KB has been updated at a granular level. Now I get

get-childitem c:\windows\system32\wsl* -include *.dll,*.exe | foreach-object { "{0}`t{1}" -f $_.Name, [System.Diagnostics.FileVersionInfo]::GetVersionInfo($_).FileVersionRaw }

wsl.exe 10.0.22621.3672
wslapi.dll      10.0.22621.3672
wslconfig.exe   10.0.22621.3672
wslg.exe        10.0.22621.3672

and not the .3737-files anymore. Wsl still is on version 2.1.5. I didn't update for the moment as it works flawlessly.

@AnonymousWP

AnonymousWP commented Jun 27, 2024

I also figured out internet isn't working for me anymore at all (Ubuntu 22.04, WSL 2.2.4). I first thought it was Docker or a temporary bug that could be fixed by a wsl --shutdown, but it isn't. Pinging IP addresses doesn't work, and changing the DNS server in resolv.conf doesn't work either. Seems like a new Windows update messed this up.

@AnonymousWP

I fixed the issue by setting networkingMode to mirrored, so: networkingMode=mirrored in the wslconfig.conf file. Make sure you restart WSL after that: wsl --shutdown. The default for networkingMode is nat, which seems to be broken now.

@dcasota

dcasota commented Jun 27, 2024

@AnonymousWP From your description

  • wsl 2.2.4 with networkingMode=mirrored --> works
  • wsl 2.2.4 with networkingMode=nat --> fails

Wrong or Right?

Yes, resolv.conf, docker and distro-specific issues can be excluded.
Actually wsl 2.1.5 with networkingMode=nat(default) works flawlessly, too.

@AnonymousWP

Yes, correct. Now the question is: what's causing this? I got an update for WSL from the Microsoft Store some days (almost a week) ago, or is it the Windows update?

@dcasota

dcasota commented Jun 27, 2024

I don't know the root cause.

The history was

  • wsl 2.2.4 from the GitHub repo has been out for a while. It didn't work in my environment with networkingMode=nat, so I downgraded to 2.1.5. networkingMode=mirrored in wsl 2.1.5 caused networking issues after half a minute. I didn't invest time in research, and simply used nat.
  • With KB5039212 from June 11th, there was implicitly an update to 2.2.4. Hence, the already known issue was back. I uninstalled the kb, but that didn't work completely. In %windir%/system32 there were wsl* bits with fileversionraw-identification '3737'. A complete wsl uninstallation and reinstallation was the trick to recover to a working setup.
  • I delayed updates for a week, so e.g. KB5039212 could not reinstall automatically. Later, it has been reinstalled and the wsl bits in %windir%\system32 got the wsl2.1.5-matching fileversionraw-identification '3672'.

Where is wslconfig.conf placed? In my environment, the configuration is in %userprofile%\.wslconfig and inside the distro the configuration file is in /etc/wsl.conf, but there is no wslconfig.conf (?)

@cyberjj999
Author

cyberjj999 commented Jun 28, 2024

@AnonymousWP thanks for your feedback: i added a .wslconfig file in my %userprofile% / C:/Users/MyUser/.wslconfig with the following content

[wsl2]
networkingMode=mirrored

and I still suffer from a lack of internet connection.
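Since several comments toggle keys in the [wsl2] section of %UserProfile%\.wslconfig, here is a small Python sketch that sets one key in the file's text using the standard configparser module. The helper name `set_wsl2_option` is hypothetical, and the sketch assumes the file is plain INI with every key under a section header; note that configparser drops comments when it writes the text back.

```python
import configparser
import io


def set_wsl2_option(text, key, value):
    """Return .wslconfig text with key=value set in the [wsl2] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. networkingMode
    parser.read_string(text)
    if not parser.has_section("wsl2"):
        parser.add_section("wsl2")
    parser.set("wsl2", key, value)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(set_wsl2_option("[wsl2]\nfirewall=false\n", "networkingMode", "mirrored"))
```

The change only takes effect after WSL is restarted with wsl --shutdown, as noted earlier in the thread.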

@dcasota I think the logs above might have been collected when my WSL was 2.2.4, but regardless, my current version is 2.1.5 (after I ran the .msi file you sent to downgrade my WSL). Afterwards, I also uninstalled the security updates you mentioned and... it still doesn't work.

Not quite sure what's left to try.

I'm considering switching to an entire dual-boot set up at the moment, but admittedly it'd be quite inconvenient...

@dcasota

dcasota commented Jun 28, 2024

@cyberjj999 wsl 2.1.5 with networkingMode=nat and Windows settings ipv4 only should work.
wsl 2.1.5 with networkingMode=mirrored didn't work in my homelab either.

@Wrong-Code

Wrong-Code commented Jun 29, 2024

I think I have found what's going on with this issue, at least on Windows 11. With version 22H2, Microsoft introduced the Hyper-V firewall, and depending on your Windows Defender Firewall configuration, the default settings for the Hyper-V firewall may negatively impact WSL2 connections.

TL;DR solution if you are stuck with this issue (at least it works for me)

Note: The Hyper-V firewall can only be configured with PowerShell. Currently, there is no specific support for configuring it via GPOs, nor is there a GUI.

From an elevated PowerShell prompt, run this cmdlet:

Get-NetFirewallHyperVVMCreator

You should get the following:

VMCreatorId  : {40E0AC32-46A5-438A-A0B2-2B479E8F2E90}
FriendlyName : WSL

Now get the default configuration for WSL. Run:

Get-NetFirewallHyperVVMSetting -PolicyStore ActiveStore -Name '{40E0AC32-46A5-438A-A0B2-2B479E8F2E90}'

You should get the following:

Name                  : {40E0AC32-46A5-438A-A0B2-2B479E8F2E90}
Enabled               : True
DefaultInboundAction  : Block
DefaultOutboundAction : Block
LoopbackEnabled       : True
AllowHostPolicyMerge  : True

Allow the outbound traffic for WSL2:

Set-NetFirewallHyperVVMSetting -Name '{40E0AC32-46A5-438A-A0B2-2B479E8F2E90}' -DefaultOutboundAction Allow

Your Linux distros should now be able to connect to the LAN/Internet.
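To check the setting programmatically, the "Key : Value" text printed by Get-NetFirewallHyperVVMSetting can be parsed. The Python sketch below works on captured output (for instance pasted into a string); the function names are invented for this example, and it assumes the output keeps the layout shown above.

```python
def parse_key_values(text):
    """Parse PowerShell-style 'Key : Value' lines into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def outbound_blocked(settings):
    """True when the default outbound action for the VM creator is Block."""
    return settings.get("DefaultOutboundAction", "").lower() == "block"


if __name__ == "__main__":
    sample = "Enabled               : True\nDefaultOutboundAction : Block\n"
    print(outbound_blocked(parse_key_values(sample)))
```

If `outbound_blocked` reports True for the WSL creator ID, the Set-NetFirewallHyperVVMSetting step above is the change this comment suggests.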

Longer version

If Windows firewall is configured to block all the outbound (or inbound) connections for which there is no rule, your WSL2 distros outbound/inbound connections will be blocked as well. The reason is that, by default, both inbound and outbound traffic to/from WSL2 distros is blocked. Configuring WSL2 to work in mirrored networking mode (which is the one I need) will not change this.

After having tried all the possible solutions, downgrades, upgrades, ... suggested on this bug issue, including the removal and reinstallation of the entire WSL (and depending) subsystems, I was always back to square one: none of my WSL2 distros would connect to the LAN or the Internet. I noticed however that ICMP traffic was permitted everywhere (or everywhere your core/border firewall, if you happen to have one, permits).

I realized I had to investigate the Windows firewall. Only, I could not find any trace of blocked connections in the Windows firewall log. Moreover, even using the nifty Windows Firewall Control (WFC) utility by Binisoft (now Malwarebytes) WSL2 distros attempts to communicate did not raise any notification. With WFC, I normally use the Medium Filtering, which corresponds in Windows Defender Firewall parlance to block inbound or outbound connections for which there is no rule. I changed temporarily WFC configuration to Low Filter, which enables all the outbound connections even if there is no specific rule for them, and immediately my Linux distros started to communicate.

A quick Internet search with keywords WSL2 and firewall pointed me to this Microsoft page, Configure Hyper-V firewall, a feature I was not aware of. I quickly realized that with my Windows firewall configuration, which is not the default, the default settings of the new Hyper-V firewall cause the issue. The TL;DR steps I've shown above are what fixed my WSL2 connection problems.

A couple of related notes:

  • Even with the default Hyper-V firewall settings, ICMP traffic is strangely permitted even though all the in/out communications are blocked.

  • Another funky thing is that I couldn't find a way to have the Hyper-V firewall log its actions. Indeed, I discovered it was causing the issue by accident, not because I had log entries that recorded the blocks.

  • The latest Windows 11 cumulative patch (June 2024, KB5039212) can be installed once the above fix is in place. My current WSL config is:

WSL --version
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3737

I downgraded WSL to 2.1.5.0 following this long thread, but I have the feeling it should work even with the latest WSL version.

EDIT

Confirmed: upgrading to the latest WSL 2.2.4.0 the fix still works. There should be no need to remove the latest Windows patches, downgrade or reinstall WSL whatsoever. All considered, I also don't think KB5039212 has anything to do with this issue.

EDIT 2

ICMP traffic is permitted because there are specific rules for that. See the output of

Get-NetFirewallHyperVRule -VMCreatorId '{40E0AC32-46A5-438A-A0B2-2B479E8F2E90}'

@dcasota

dcasota commented Jun 30, 2024

@Wrong-Code The regression complexity is the main culprit for the many wsl issues. Obviously this was and is not the case yet.

Windows 11 Pro with KB5039212, wsl 2.1.5, hyper-V firewall settings { DefaultInboundAction=Allow / DefaultOutboundAction=Allow / LoopbackEnabled=true / AllowHostPolicyMerge=true } does not work with networkingMode=mirrored in my homelab with Photon OS 5 and ipv4only with no vlan/vpn/bond.

In March, I started with some similar Hyper-V firewall research findings on hardening possibilities in https://github.com/dcasota/photonos-scripts/wiki/Photon-OS-on-WSL2#hardening.

A good solution pattern should include

  1. antivirus interoperability
  2. windows 11 pro/.. : security updates, feature updates such as before 23H2 / with 23H2 and without KB5039212 / with KB5039212
  3. hyper-V release version capabilities and firewall settings
  4. ipv4 only / ipv6only / both
  5. wsl release e.g. 2.1.5 / 2.2.4
  6. networkingMode=nat / networkingMode=mirrored
  7. various adapters wired / wlan / usb-c ethernet adapters
  8. extended ethernet functions: vlan, vpn, bonds
  9. distro release version capabilities: systemd, nvidia

Hence, I would say there is a solution regression gap somewhere between 4-8.

It's a pity that there is no open-source joint venture with VMware By Broadcom. Planning together existing and future virtual hardware capabilities (device firmware, kernel, virtual devices, drivers, default settings) for x86_64 and arm64 would be helpful for terrestrial edge and datacenter solutions. Network configuration management, network setup, networking event brokering: this is NOT solved in cbl-mariner and therefore not in wsl.
Unfortunately, without more features inside the network stack, answers in situations like "WSL no internet connection / DNS issues" seem to end with a learning curve. Users hate this. It does not help.

The software delivery method in Windows through .mum packages seems to work. Facing the 50++ .mum package entries with the naming schema <package>~<guid>~<architecture>~~<fileversionraw> in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\Packages is quite challenging to maintain in business-as-usual bug situations, probably for Windows developers, but in any case for customers.

@1MLightyears

1MLightyears commented Aug 27, 2024

@CatalinFetoiu Sorry for late response, been busy with my work :(


I have changed my /etc/wsl.conf and /mnt/wsl/resolv.conf; the content in /etc/wsl.conf is now automatically filled in and is as follows:

# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 10.255.255.254
search modem

Though, the result remains the same (ping: www.google.com: Temporary failure in name resolution). What else can I do?


PS: my .wslconfig had been modified to:

[wsl2]
firewall=false
debugConsole=false

PPS: I'm not sure whether my systemd-resolved service is the key to the problem: it is not active (output from journalctl -xeu systemd-resolved.service):

: systemd-resolved.service: Failed to execute /lib/systemd/systemd-resolved: Permission denied
: systemd-resolved.service: Failed at step EXEC spawning /lib/systemd/systemd-resolved: Permission denied

Though this file is actually 755.

@CatalinFetoiu
Collaborator

@1MLightyears thanks.
can you please collect a trace using the following commands?
(If you no longer encounter the issue with the script exiting early you might use the script instead)

download https://github.com/microsoft/WSL/blob/master/diagnostics/wsl_networking.wprp
wpr.exe -start .\wsl_networking.wprp -filemode
in WSL, run ping google.com
wpr.exe -stop .\logs.etl

collect and share logs.etl

@1MLightyears

@CatalinFetoiu Hi! I'd like to provide some good news.
I noticed your update commit on diagnostics/collect-networking-logs.ps1 yesterday and tried it again. It worked! Below is the log captured:
WslLogs-2024-08-31_14-12-40.zip
and I think you might prefer this.

@CatalinFetoiu
Collaborator

@1MLightyears thanks. it looks like you attached a WslLogs zip, do you have a WslNetworkingLogs zip generated by the collect-networking-logs.ps1?

If you still encounter issues with collect-networking-logs.ps1, please try the wpr commands I shared

@1MLightyears

1MLightyears commented Sep 6, 2024

@1MLightyears thanks. it looks like you attached a WslLogs zip, do you have a WslNetworkingLogs zip generated by the collect-networking-logs.ps1?

If you still encounter issues with collect-networking-logs.ps1, please try the wpr commands I shared

Hi @CatalinFetoiu, the problem with collect-networking-logs.ps1 remains, but the good news is that I troubleshot it myself and now it works well. I'm thinking of making a PR to the WSL repo fixing this.


Here is the log from a successful run of the fixed collect-networking-logs.ps1:
WslNetworkingLogs-2024-09-06_14-18-10.zip

@CatalinFetoiu
Collaborator

@1MLightyears great to hear the script worked, please feel free to open a PR with the change you made. thanks!

@1MLightyears

@CatalinFetoiu Sorry to disturb you but is there any update on this issue?
The WSL networking logs have been attached.
Thank you!

@torgeros

torgeros commented Sep 30, 2024

I was able to solve mine by setting

[wsl2]
dnsTunneling=false

in %UserProfile%\.wslconfig.

@CatalinFetoiu
Collaborator

@1MLightyears thanks for your patience. I looked at WslNetworkingLogs-2024-09-06_14-18-10.zip

DNS requests for google.com are sent to 127.0.0.1, port 53
this is unexpected, because 10.255.255.254 is configured as DNS server in /etc/resolv.conf (this is the DNS proxy that is used as part of DNS tunneling), so we expect DNS queries to be sent to 10.255.255.254

how are you reproducing the issue? (e.g. are you running ping google.com?)
do you have additional DNS configurations done in Linux that are setting up 127.0.0.1 as DNS server?

thanks
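To compare the configured DNS server with the one seen in a trace, the nameserver entries can be read straight from the resolv.conf text. This is a minimal Python sketch; the function name `nameservers` is an illustrative choice, and it only handles the plain "nameserver <address>" form.

```python
def nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines, ignoring comments and other keywords."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    with open("/etc/resolv.conf", encoding="utf-8") as handle:
        print(nameservers(handle.read()))
```

If the address observed in the trace (here 127.0.0.1) is not in the returned list, the process doing the lookup is probably not reading this file, which is the mismatch being asked about.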

@1MLightyears

@CatalinFetoiu Yes, I reproduced the issue with a ping www.google.com. Also, the nslookup can resolve the domain name correctly:

# nslookup www.google.com
Server:         10.255.255.254
Address:        10.255.255.254#53

Non-authoritative answer:
Name:   www.google.com
Address: 142.250.70.196
Name:   www.google.com
Address: 2404:6800:4015:800::2004

But ping just doesn't resolve it. This issue troubles me a lot, as I need to run apt update and it also raises Temporary failure resolving 'us.archive.ubuntu.com'.
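One possible reason nslookup and ping disagree is that they resolve names differently: nslookup sends its own DNS queries, while ping generally goes through the C library's getaddrinfo and its configuration files. Python's socket.getaddrinfo wraps that same C library call, so the sketch below can help tell the two paths apart. The function name `system_resolve` is made up for this example.

```python
import socket


def system_resolve(host):
    """Resolve host through the system resolver; return (addresses, error message or None)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        return [], str(exc)
    # Each entry's sockaddr starts with the address; deduplicate across socket types.
    return sorted({info[4][0] for info in infos}), None


if __name__ == "__main__":
    print(system_resolve("www.google.com"))
```

If this fails with a name resolution error while nslookup succeeds, the problem likely sits in the system resolver path rather than in the DNS server itself.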

@1MLightyears

1MLightyears commented Oct 2, 2024

I was able to solve mine by setting

[wsl2]
dnsTunneling=false

in %UserProfile%\.wslconfig.

@torgeros Sadly I've tried this before and it doesn't work. The problem is that I don't know whether it's a problem of wsl or a problem about Linux configuration...

@CatalinFetoiu
Collaborator

@1MLightyears thanks for following up. Could you please collect and share the following strace outputs? those should give us a hint on why ping uses the wrong DNS server (127.0.0.1 instead of 10.255.255.254)

strace -f ping google.com
strace -f nslookup google.com

@shigenobuokamoto

$ sudo systemctl --now disable systemd-resolved

Isn't this it?

@1MLightyears

@1MLightyears thanks for following up. Could you please collect and share the following strace outputs? those should give us a hint on why ping uses the wrong DNS server (127.0.0.1 instead of 10.255.255.254)

strace -f ping google.com
strace -f nslookup google.com

The stderr output of these two commands is as follows:
strace_ping.txt
strace_nslookup.txt

@shigenobuokamoto Thank you for your reply! Sadly after it's disabled, the ping still gives Temporary failure in name resolution. I've double checked that systemd-resolved is inactive.

@CatalinFetoiu
Collaborator

CatalinFetoiu commented Oct 8, 2024

@1MLightyears thanks for sending the strace logs

from strace_ping, there are failures to open /etc/resolv.conf (and other DNS related files) with error permission denied, so ping seems to fall back to using 127.0.0.1 as DNS server, which will not work

it's not immediately clear why this happens

to narrow down the problem, can you please share the following?

  1. does dig @10.255.255.254 google.com work (does it return an IP address for google.com) ?
  2. is there a difference when you run "sudo ping google.com" vs "ping google.com" ?
  3. what is the output of ls -l /etc/resolv.conf?

newfstatat(AT_FDCWD, "/etc/nsswitch.conf", 0x7ffe385ff100, 0) = -1 EACCES (Permission denied)
newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
newfstatat(AT_FDCWD, "/etc/resolv.conf", 0x7ffe385ff220, 0) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/etc/host.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
futex(0x7f6aaad1132c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
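Long strace logs are easier to review when the denied paths are pulled out first. The Python sketch below scans strace text for calls that returned EACCES and lists the first quoted path of each, without duplicates; the function name `denied_paths` is hypothetical, and the pattern assumes strace's usual `syscall("path", ...) = -1 EACCES (...)` line layout.

```python
import re

# First quoted string on a line whose result is -1 EACCES.
DENIED = re.compile(r'"([^"]+)".*=\s*-1\s+EACCES')


def denied_paths(strace_text):
    """Return paths that hit 'Permission denied', in order of first appearance."""
    seen = []
    for line in strace_text.splitlines():
        match = DENIED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    sample = 'openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)'
    print(denied_paths(sample))
```

Run against the excerpt above, it reports /etc/nsswitch.conf, /etc/resolv.conf and /etc/host.conf.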

@1MLightyears

@CatalinFetoiu Updates:

does dig @10.255.255.254 google.com work (does it return an IP address for google.com) ?
Here is the output:

# dig @10.255.255.254 www.google.com
;; communications error to 10.255.255.254#53: timed out
;; communications error to 10.255.255.254#53: timed out
;; communications error to 10.255.255.254#53: timed out

; <<>> DiG 9.18.12-0ubuntu0.22.04.2-Ubuntu <<>> @10.255.255.254 www.google.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached

I also tried a plain dig www.google.com and it gave me:

# dig  www.google.com

; <<>> DiG 9.18.12-0ubuntu0.22.04.2-Ubuntu <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62020
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         0       IN      A       142.250.70.132

;; Query time: 9 msec
;; SERVER: 172.17.64.1#53(172.17.64.1) (UDP)
;; WHEN: Wed Oct 09 20:25:46 AEDT 2024
;; MSG SIZE  rcvd: 62

is there a difference when you run "sudo ping google.com" vs "ping google.com" ?

There is no difference, as I'm running all the commands (including all the WSL commands above) as the root user.

what is the output of ls -l /etc/resolv.conf?
It's 777:

# ls -l /etc/resolv.conf
lrwxrwxrwx 1 lightyears root 20 Oct  9 20:24 /etc/resolv.conf -> /mnt/wsl/resolv.conf

I double checked it and ensured that the content is:

nameserver 172.17.64.1
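
To compare what resolv.conf says with what actually answers, one option is to read the nameserver entries from the file and query each one directly with dig. A minimal sketch, assuming dig is installed; the helper names `list_nameservers` and `query_nameservers` are invented here.

```shell
# Print every nameserver address listed in a resolv.conf-style file.
# list_nameservers is a hypothetical helper name.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query each listed server directly for one name.
# +short prints only the answers; the short timeout keeps an
# unreachable server from stalling the loop.
query_nameservers() {
    for ns in $(list_nameservers "$1"); do
        echo "== $ns"
        dig +short +time=2 +tries=1 "@$ns" www.google.com || echo "query failed"
    done
}

# Usage: query_nameservers /etc/resolv.conf
```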

But when I check the mode of /mnt/wsl/resolv.conf and of /etc/host.conf (another file that gets Permission denied), it shows:

# ls -l /mnt/wsl/resolv.conf
-rw-r--r-- 1 lightyears root 197 Oct  9 20:24 /mnt/wsl/resolv.conf
# ls -l /etc/host.conf
-rw-r--r-- 1 lightyears root 92 Oct 15  2021 /etc/host.conf

Nevertheless, 644 should be enough to read it... I have no idea why access is denied.

@shigenobuokamoto

@1MLightyears

[FAILED] Failed to start Dispatcher daemon for systemd-networkd.
See 'systemctl status networkd-dispatcher.service' for details.

it looks like the error is occurring when trying to run systemd-networkd.
WSL sets up the network-related stuff itself, so stop and disable these services:

systemctl --now disable systemd-networkd networkd-dispatcher

@CatalinFetoiu
Collaborator

CatalinFetoiu commented Oct 12, 2024

@1MLightyears could you please collect the following additional diagnostics, to help understand what's causing the permission denied issue? Please run ping google.com one more time before collecting the below

dmesg &> dmesg.log and share this file
journalctl &> journalctl.log and share this file
output of ls -lah /etc
output of ls -lah /mnt
output of ls -lah /mnt/wsl
mount &> mount.log and share this file
content of /etc/passwd
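
The items above could be gathered in one go with something like the sketch below. The function name `collect_diag`, the default directory and the output file names are assumptions for illustration, not an official WSL tool.

```shell
# Collect the requested diagnostics into one directory.
# collect_diag and the file names below are hypothetical choices.
collect_diag() {
    out="${1:-wsl-diag}"
    mkdir -p "$out" || return 1
    # Any of these commands may fail (for example on missing privileges);
    # stderr is captured too, so the failure itself ends up in the log.
    dmesg                 > "$out/dmesg.log"      2>&1 || true
    journalctl --no-pager > "$out/journalctl.log" 2>&1 || true
    ls -lah /etc          > "$out/ls-etc.txt"     2>&1 || true
    ls -lah /mnt          > "$out/ls-mnt.txt"     2>&1 || true
    ls -lah /mnt/wsl      > "$out/ls-mnt-wsl.txt" 2>&1 || true
    mount                 > "$out/mount.log"      2>&1 || true
    cat /etc/passwd       > "$out/passwd.txt"     2>&1 || true
    echo "diagnostics written to $out"
}

# Usage, after reproducing the failure with ping: collect_diag wsl-diag
```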

thanks

@1MLightyears

@1MLightyears

[FAILED] Failed to start Dispatcher daemon for systemd-networkd.
See 'systemctl status networkd-dispatcher.service' for details.

it looks like the error is occurring when trying to run systemd-networkd. WSL will set up the network-related stuff, so stop it.

systemctl --now disable systemd-networkd networkd-dispatcher

@shigenobuokamoto There is still a failure when pinging... it seems that WSL does set up the network-related stuff, but the native Linux part doesn't work well...
I double checked the journal and noticed that networkd-dispatcher is also raising an error:
pam_systemd(login:session): Failed to create session: The name org.freedesktop.login1 was not provided by any .service files
And I noticed that when I enabled these services, there was a Created symlink /etc/systemd/system/dbus-org.freedesktop.network1.service → /lib/systemd/system/systemd-networkd.service.
It looks like a third-party dbus is redirecting systemd-networkd to itself, though I never set that up, and it failed to start. Does that look normal?

@1MLightyears

@1MLightyears could you please collect the following additional diagnostics, to help understand what's causing the permission denied issue? Please run ping google.com one more time before collecting the below

dmesg &> dmesg.log and share this file
journalctl &> journalctl.log and share this file
output of ls -lah /etc
output of ls -lah /mnt
output of ls -lah /mnt/wsl
mount &> mount.log and share this file
content of /etc/passwd

thanks

Thank you @CatalinFetoiu , here is the logs.zip

@CatalinFetoiu
Collaborator

@1MLightyears thanks
the /etc/passwd output shows the uid of root is 999. this needs to be 0 instead
root:x:999:0::/root:/bin/bash

if you replace this line in /etc/passwd with the line below, then run "wsl --shutdown" and restart WSL, the permission issue should be fixed
root:x:0:0:root:/root:/bin/bash
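
A quick way to confirm the value before and after the edit is to read the third field of the root entry. This is just a sketch; `passwd_uid` is a made-up helper name.

```shell
# Print the numeric uid recorded for a user in a passwd-format file.
# passwd_uid is a hypothetical helper name used only in this sketch.
passwd_uid() {
    awk -F: -v user="$1" '$1 == user { print $3; exit }' "$2"
}

uid=$(passwd_uid root /etc/passwd)
if [ "$uid" = "0" ]; then
    echo "root uid is 0 (expected)"
else
    echo "root uid is '$uid', expected 0"
fi
```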

cc @OneBlue

@shigenobuokamoto

@1MLightyears

And I noticed that when I enabled these services, there was a Created symlink /etc/systemd/system/dbus-org.freedesktop.network1.service → /lib/systemd/system/systemd-networkd.service.

this symlink is created when you enable systemd-networkd.

to verify, i installed WSL + Ubuntu.
systemd-networkd is disabled.

$ systemctl list-unit-files | grep systemd-network
systemd-network-generator.service            disabled        enabled
systemd-networkd-wait-online.service         disabled        enabled
systemd-networkd-wait-online@.service        disabled        enabled
systemd-networkd.service                     disabled        enabled
systemd-networkd.socket                      disabled        enabled

i tried enabling systemd-networkd and it did not break anything.
so enabling it is probably not the problem.

the issue seems to be that for some reason systemd-resolved is unable to recognize /etc/resolv.conf.

would you mind trying the following?

  1. restart systemd-resolved
$ systemctl restart systemd-resolved
  2. add a DNS server to systemd-resolved and restart it

/etc/systemd/resolved.conf.d/dnsserver.conf

[Resolve]
DNS=1.1.1.1
$ systemctl restart systemd-resolved
  3. stop systemd-resolved
$ systemctl stop systemd-resolved
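
For the DNS server step, the drop-in file could be created with a small helper like the one below. This is a sketch under assumptions: `write_dns_dropin` is a hypothetical name, and the directory argument only exists so the helper can be tried outside /etc first.

```shell
# Write a systemd-resolved drop-in that sets an explicit DNS server.
# write_dns_dropin is a hypothetical helper; the second argument
# defaults to the drop-in directory mentioned above.
write_dns_dropin() {
    dns="$1"
    dir="${2:-/etc/systemd/resolved.conf.d}"
    mkdir -p "$dir" || return 1
    printf '[Resolve]\nDNS=%s\n' "$dns" > "$dir/dnsserver.conf"
}

# Usage (as root):
#   write_dns_dropin 1.1.1.1 && systemctl restart systemd-resolved
```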

@yogch

yogch commented Dec 5, 2024

For me nothing works. I've tried everything.
It is not specifically related to DNS.

For example, I tried to ping 8.8.8.8, which does not involve DNS at all.
I see the packets go out and return on the Windows 11 host, but they are not forwarded back to the WSL container.

Is there any workaround?
I cannot use it like this.

@CatalinFetoiu
Collaborator

@yogch thanks for reaching out. since you mentioned your problem is not related to DNS, could you please open a separate issue? I also recommend trying networkingMode=mirrored to see if it resolves your problem

@yogch

yogch commented Dec 8, 2024

@CatalinFetoiu, thanks.
I did try that, it doesn't help.
The only thing that helps is https://github.com/sakai135/wsl-vpnkit

@CatalinFetoiu
Collaborator

@yogch thanks for following up, sorry to hear it did not fix your issue. are you using any VPN when you are encountering the issue (if so, can you please share the name of the VPN product?)

@yogch

yogch commented Dec 15, 2024

@CatalinFetoiu, I'm using Check Point Endpoint Security as a VPN client
