Comments extraction issue #198

gqwang16 · 2021-04-03T23:20:36Z

I use `get_posts("the page I scrape", pages=3, options={'comments': True})` to extract the comments. However, although the comment count is nonzero, `comments_full` contains nothing. Does anyone know why, or how to extract the comments?
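To narrow down the symptom described here, it can help to list exactly which posts report a nonzero comment count but come back with an empty `comments_full`. A minimal sketch, assuming each post dict yielded by `get_posts` carries the count under `comments`, the scraped comments under `comments_full` (the field named in the question), and the ID under `post_id`; the helper name `posts_missing_comments` is hypothetical:

```python
from typing import Iterable, Iterator, Optional


def posts_missing_comments(posts: Iterable[dict]) -> Iterator[Optional[str]]:
    """Yield the post_id of every post whose comment count is nonzero
    but whose comments_full is empty or missing.

    Assumes the key names 'comments', 'comments_full' and 'post_id'."""
    for post in posts:
        count = post.get("comments") or 0        # None is treated as 0
        full = post.get("comments_full") or []   # None is treated as empty
        if count > 0 and not full:
            yield post.get("post_id")
```

With a real scrape you would pass the generator straight in, e.g. `list(posts_missing_comments(get_posts("the page I scrape", pages=3, options={'comments': True})))`, and share the resulting IDs when reporting the problem.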

lgjluis · 2021-04-04T01:00:00Z

Hi @gqwang16,

Facebook limits how many requests you can make without being logged in. I use a proxy to get the information.

lgjluis · 2021-04-04T01:00:36Z

Hi @kevinzg,

I have a problem with the comments: they include data from other posts. Do you know how I can fix this?

gqwang16 · 2021-04-04T01:35:15Z

Hi, could you please share how you use a proxy to get the information? I'm new to Python, and I would really appreciate any resources or Python examples for extracting Facebook comments. Thanks a lot for the help! My email is ***@***.***


lgjluis · 2021-04-04T01:50:35Z

I run a rotating proxy as a local service. In facebook_scraper.py, change the constructor to:

```python
# Constructor of the FacebookScraper class in facebook_scraper.py
def __init__(self, session=None, requests_kwargs=None):
    if session is None:
        session = HTMLSession()
        session.headers.update(self.default_headers)

    if requests_kwargs is None:
        # Send all HTTP and HTTPS traffic through the local proxy
        requests_kwargs = {
            'proxies': {
                'http': 'http://{ip}:{port}',
                'https': 'http://{ip}:{port}',
            }
        }

    self.session = session
    self.requests_kwargs = requests_kwargs
```

Replace `{ip}` and `{port}` with your proxy's IP address and port.

neon-ninja · 2021-04-07T01:20:09Z

Hi @gqwang16, can you tell us which page or post is causing the problem? I tested with the Nintendo page (#188) and it works fine. @lgjluis, same question for you.

lgjluis · 2021-04-07T04:40:40Z

Hi @neon-ninja,

If I enable comments, after a while Facebook closes the connection and the scraper stops extracting information. That's why I use a proxy.

neon-ninja · 2021-04-07T04:42:58Z

Gotcha. Yes, extracting comments makes more requests to Facebook's servers, so it triggers a temporary IP ban sooner.

ccolonna · 2021-04-19T07:44:03Z

@lgjluis, could you please share a working script or a simple example of how to use a rotating proxy?

neon-ninja · 2021-04-19T08:48:23Z

Twint supports Tor, including reloading Tor to get a new IP when the current one is banned. That might be worth porting here.

lgjluis · 2021-04-23T16:22:23Z

Hi @Christian-Nja, I use a rotating proxy running in a Docker container.
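The thread doesn't include a script for this. If you don't run a rotating-proxy service like the Docker setup mentioned here, one client-side alternative (a different technique, swapped in for illustration) is to keep a list of proxies yourself and build the proxy settings from the next one each time you create a scraper. A minimal sketch; `ProxyRotator`, `proxy_requests_kwargs`, and the addresses are hypothetical placeholders, and the dict shape follows the `requests_kwargs` in the facebook_scraper.py change earlier in the thread:

```python
from itertools import cycle
from typing import Iterable, Tuple


def proxy_requests_kwargs(ip: str, port: int) -> dict:
    """Build requests-style proxy settings for one HTTP proxy
    (same shape as the requests_kwargs in the facebook_scraper.py change)."""
    url = f"http://{ip}:{port}"
    return {"proxies": {"http": url, "https": url}}


class ProxyRotator:
    """Hand out proxy settings round-robin from a fixed list of (ip, port) pairs."""

    def __init__(self, endpoints: Iterable[Tuple[str, int]]):
        endpoints = list(endpoints)
        if not endpoints:
            raise ValueError("need at least one proxy endpoint")
        self._pool = cycle(endpoints)  # loops over the list forever

    def next_kwargs(self) -> dict:
        ip, port = next(self._pool)
        return proxy_requests_kwargs(ip, port)
```

Rotating per scraper instance (rather than per request) keeps each session on a single IP, which is closer to what the Docker rotating proxy does transparently behind one endpoint.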

ccolonna · 2021-04-27T07:22:40Z

OK, thank you. Just to be sure which technique to follow:

- With user credentials: broad access to information, but it's a single login, so Facebook can temporarily ban that user for heavy scraping.
- Without user credentials: limited access to information and a risk of IP bans, but with a rotating proxy you can scrape as much as you want.

Is this correct? That is, you can't get the benefits of being logged in and of IP rotation at the same time?

neon-ninja · 2021-04-27T07:29:12Z

Sounds about right. Depending on what you're trying to do, #212 might also be useful. Additionally, if you have multiple accounts, rotating their cookies might work.
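Cookie rotation is only suggested here, not implemented in the thread. A minimal sketch of the scheduling half, assuming you have exported one cookies file per account; the file names and `cookie_schedule` are hypothetical, and how each file is handed to the scraper (for example a `cookies` argument, if your facebook-scraper version provides one) is an assumption to check against the library's documentation:

```python
from itertools import cycle
from typing import Iterator, Sequence


def cookie_schedule(cookie_files: Sequence[str], runs_per_account: int) -> Iterator[str]:
    """Yield the cookies file to use for each successive scraping run,
    moving to the next account after `runs_per_account` runs and wrapping around.

    An empty list of files yields nothing."""
    if runs_per_account < 1:
        raise ValueError("runs_per_account must be >= 1")
    for path in cycle(cookie_files):
        for _ in range(runs_per_account):
            yield path
```

Switching accounts between runs (rather than mid-scrape) avoids having to swap cookies inside an existing session.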

abubelinha · 2021-05-09T23:35:46Z

@neon-ninja @lgjluis
In addition to using proxies, is there a parameter to slow down the rate of facebook-scraper requests?
I'd like it to wait a bit (e.g. one second) between requests, if that helps keep my IP from being banned. I'm not scraping much; I just want a cron job that backs up the comments of a given Facebook page to a database.

When I call get_posts() once, does that make a single request, or does the function internally run a loop of requests that I can't slow down?

neon-ninja · 2021-05-10T00:37:35Z

@abubelinha actually, `get_posts` returns a generator, and requests are only made as you iterate through it. Note that each page contains 4 posts. So:

```python
import time

from facebook_scraper import get_posts

for post in get_posts("Nintendo"):
    print(post.get("post_id"))
    time.sleep(0.25)  # pause 0.25 s after each post
```

This should add about a one-second delay between requests: the loop sleeps 0.25 s per post, and each request fetches a page of 4 posts.
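The same idea can be packaged as a small wrapper around any lazy generator, so the pause is stated per page rather than split across posts. A minimal sketch; `throttle` is a hypothetical helper, not part of facebook-scraper, and the page size of 4 is taken from the comment above:

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttle(items: Iterable[T], delay: float, every: int = 1,
             sleep: Callable[[float], None] = time.sleep) -> Iterator[T]:
    """Yield items unchanged, pausing `delay` seconds after every `every` items.

    With a lazy generator such as get_posts, the pause runs before the next
    item is pulled, so it also delays the request that fetches that item.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    for count, item in enumerate(items, start=1):
        yield item
        if count % every == 0:
            sleep(delay)
```

For example, `for post in throttle(get_posts("Nintendo"), delay=1.0, every=4): ...` pauses once per page of 4 posts. With comments enabled, each post may trigger extra requests (as noted earlier in the thread), so `every=1` may be the safer choice. The `sleep` parameter exists only so the pauses can be checked without waiting.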

webcoderz · 2021-05-14T12:44:14Z

> Twint has support for tor, and reloading tor if IP banned - might be worth porting here

This is what brought me here today: I was checking whether you were planning to implement this feature. It would be a tremendous help.

webcoderz · 2021-05-15T13:29:30Z

PS: https://github.com/twintproject/twint/blob/e7c8a0c764f6879188e5c21e25fb6f1f856a7221/twint/get.py#L73
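Porting Tor support would involve two pieces: routing traffic through Tor's SOCKS proxy, and asking Tor for a new circuit (usually a new exit IP) through its control port after an IP ban. A minimal sketch of both, assuming a local Tor daemon on the default SOCKS port 9050 with a control port on 9051 protected by a password; the function names are hypothetical and this is not a port of Twint's code:

```python
import socket

# Assumed local Tor setup: SOCKS on 9050, ControlPort 9051 with a password.
TOR_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050
TOR_CONTROL_PORT = 9051


def tor_requests_kwargs(host: str = TOR_HOST, port: int = TOR_SOCKS_PORT) -> dict:
    """Build requests-style proxy settings that route traffic through Tor.

    The socks5h scheme makes DNS lookups go through Tor as well."""
    url = f"socks5h://{host}:{port}"
    return {"proxies": {"http": url, "https": url}}


def new_tor_identity(password: str, host: str = TOR_HOST,
                     port: int = TOR_CONTROL_PORT) -> bool:
    """Send AUTHENTICATE and SIGNAL NEWNYM on the Tor control port.

    Returns True if Tor answered both commands with '250 OK'.
    The password must not contain double quotes."""
    command = f'AUTHENTICATE "{password}"\r\nSIGNAL NEWNYM\r\n'.encode()
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(command)
        reply = b""
        # Read until both reply lines arrive or Tor closes the connection
        while reply.count(b"\r\n") < 2:
            chunk = conn.recv(1024)
            if not chunk:
                break
            reply += chunk
    lines = reply.decode(errors="replace").split("\r\n")
    return lines[:2] == ["250 OK", "250 OK"]
```

Using SOCKS proxies with requests needs its SOCKS extra (`pip install requests[socks]`); the dict returned by `tor_requests_kwargs` has the same shape as the `requests_kwargs` in the proxy change earlier in the thread.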
