No raw posts (article elements) were found in this page #66

Closed
Kaosam opened this issue Sep 26, 2024 · 5 comments

Comments

@Kaosam

Kaosam commented Sep 26, 2024

When calling the get_posts function, PageParser can't find any post URLs and reports:

No raw posts (article elements) were found in this page.

Same issue as kevinzg#1113

Kaosam changed the title from "No raw posts (elements) were found in this page" to "No raw posts (article elements) were found in this page" on Sep 26, 2024
@moda20
Owner

moda20 commented Oct 4, 2024

@Kaosam can you share your starting script?
Also, it seems it thinks your browser is not supported; maybe you are using a non-standard User-Agent.

@Kaosam
Author

Kaosam commented Oct 5, 2024

@Kaosam can you share your starting script? Also, it seems it thinks your browser is not supported; maybe you are using a non-standard User-Agent.

@moda20 This is the part of the script that scrapes groups using a cookies file:

    posts = []
    try:
        posts_count = 0
        # get the iterator of all posts
        posts_iter = get_posts(post_urls=iter(post_urls) if post_urls else None, group=group, cookies=cookie)
        # keep only the first n_posts posts
        for post in posts_iter:
            if posts_count >= n_posts:
                break
            sleep(sleep_seconds)
            posts.append(post)
            posts_count += 1
        # successfully got posts with the current cookie
        return posts
    except Exception as e:
        # there was an error with the current cookie: go to the next one
        logger.debug(f"Could not scrape with cookie {cookie}: {e}")
        last_exception = e  # save the last exception for logging purposes
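
For anyone reading along: the fragment above looks like the body of a loop that tries one cookie after another, although the enclosing function is not shown. A minimal, self-contained sketch of that pattern might look like the following. The function name `first_posts_with_cookie_fallback` and the `fetch` callable are hypothetical stand-ins (in the script above, `fetch` would wrap the `get_posts(...)` call), and unlike the original this version does not write the cookie itself to the log.

```python
import logging
from itertools import islice
from time import sleep
from typing import Any, Callable, Iterable, Iterator, Optional

logger = logging.getLogger(__name__)


def first_posts_with_cookie_fallback(
    fetch: Callable[[Any], Iterator[dict]],
    cookies: Iterable[Any],
    n_posts: int,
    sleep_seconds: float = 0.0,
) -> list:
    """Return up to n_posts posts from the first cookie that works.

    `fetch` is a hypothetical callable: it takes one cookie and returns
    an iterator of posts, for example
    lambda c: get_posts(group=group, cookies=c).
    """
    last_exception: Optional[Exception] = None
    for cookie in cookies:
        posts = []
        try:
            # islice stops after n_posts items without draining the iterator
            for post in islice(fetch(cookie), n_posts):
                sleep(sleep_seconds)
                posts.append(post)
            # this cookie worked, so stop trying others
            return posts
        except Exception as e:
            # this cookie failed: remember why and try the next one
            logger.debug("Could not scrape with this cookie: %s", e)
            last_exception = e
    # every cookie failed (or none were given)
    raise RuntimeError("No cookie produced any posts") from last_exception
```

Passing the fetch step in as a callable keeps the retry logic testable without network access; whether the original script raises or returns an empty list when every cookie fails is not shown, so raising here is an assumption.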

@pitzmoni

pitzmoni commented Oct 7, 2024

Which user-agent is working currently?

@kbalicki

kbalicki commented Oct 8, 2024

@Kaosam can you share your starting script? Also, it seems it thinks your browser is not supported; maybe you are using a non-standard User-Agent.

Is there any user agent that still works for you?

@Kaosam
Author

Kaosam commented Oct 9, 2024

It seems that loading the mbasic headers from a JSON file, following #22:

    with open('./mbasicHeaders.json', 'r') as file:
        _scraper.mbasic_headers = json.load(file)

in combination with the mbasic base URL:

    for post in get_posts('NintendoAmerica', base_url="https://mbasic.facebook.com", start_url="https://mbasic.facebook.com/NintendoAmerica?v=timeline", pages=1):

solved the issue.
Thank you!
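
In case it helps others who land here, the two pieces of the workaround can be wrapped in small helpers. This is only a sketch: `load_headers` and `mbasic_timeline_url` are hypothetical names, the contents of `mbasicHeaders.json` are assumed to be a flat JSON object of header names to string values (see #22 for the actual file), and the wiring into `_scraper.mbasic_headers` and `get_posts` is shown in comments exactly as reported in this thread, not verified independently.

```python
import json
from pathlib import Path

MBASIC_BASE_URL = "https://mbasic.facebook.com"


def load_headers(path):
    """Load request headers from a JSON file.

    Assumes the file holds a flat JSON object mapping header names to
    string values; raises ValueError otherwise.
    """
    headers = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(headers, dict):
        raise ValueError(f"{path}: expected a JSON object of header names to values")
    bad = [name for name, value in headers.items() if not isinstance(value, str)]
    if bad:
        raise ValueError(f"{path}: non-string values for headers {bad}")
    return headers


def mbasic_timeline_url(account):
    """Build the mbasic timeline URL in the form used in this thread."""
    return f"{MBASIC_BASE_URL}/{account}?v=timeline"


# Wiring, as reported in this thread (requires the scraper library):
#   _scraper.mbasic_headers = load_headers("./mbasicHeaders.json")
#   for post in get_posts("NintendoAmerica", base_url=MBASIC_BASE_URL,
#                         start_url=mbasic_timeline_url("NintendoAmerica"), pages=1):
#       ...
```

Validating the headers file up front makes a malformed or missing file fail with a clear message instead of surfacing later as another "No raw posts" error.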

Kaosam closed this as completed on Oct 9, 2024