Extract direct link for image and posts #213
Comments
This issue seems specific to groups, not pages. The photo issue is tricky: if an account is needed to resolve the URL to the full quality image, the scraper cannot resolve it unless you feed it cookies. It would still be possible to extract the low quality image. Perhaps we should always extract the low quality image, and also try to extract the full quality image when possible. As for the URL problem, the regex needs to be updated for group posts. This problem was also reported in #165. |
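A minimal sketch of the fallback proposed above, not the scraper's actual code. The function name, the `resolve_full_quality` callback, and the dictionary keys are hypothetical and only illustrate the idea: the low quality URL is always returned, and the full quality URL is added only when resolving it succeeds.

```python
from typing import Callable, Dict, Optional


def extract_image_urls(
    low_quality_url: str,
    resolve_full_quality: Callable[[], str],
) -> Dict[str, Optional[str]]:
    """Always keep the low quality URL; add the full quality one if it resolves.

    Both the callback and the key names are assumptions for illustration.
    """
    result: Dict[str, Optional[str]] = {
        "image_lowquality": low_quality_url,
        "image": None,
    }
    try:
        # May fail when no cookies are supplied and a login is required.
        result["image"] = resolve_full_quality()
    except Exception:
        # Keep the low quality URL as the only usable link.
        pass
    return result


def needs_login() -> str:
    """Stand-in for a resolver that fails without an account."""
    raise PermissionError("login required")


without_cookies = extract_image_urls("https://example.invalid/low.jpg", needs_login)
with_cookies = extract_image_urls(
    "https://example.invalid/low.jpg",
    lambda: "https://example.invalid/full.jpg",
)
```

Without cookies the caller still gets a usable link; with cookies both links are available.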
In my case I need the images because they can contain information. (I'm building a Facebook page to RSS feed converter, so the goal is to avoid needing an account.) In general, I think it would be good to return a working link based on whatever is available, even if the quality ends up being poor. But I understand the reason for offering the best quality link. Failing that, is there a way to get the HTML of the posts? I could not find out whether that is possible. From there I could extract the link myself, which could be enough for me. As for the direct links to group posts, they all seem to be built the same way: |
@Breizhux I've raised a pull request to always return the low quality image (possibly in addition to the high quality one), see #217. It is possible to get the HTML, the parameter is I've also raised a separate pull request to fix the regexes for group posts: #216 |
Thanks for the change @neon-ninja . Note that the merge now empties the |
I'm currently facing the same problem, not in a group but with a shared post on a page
|
In the time since you posted that comment, that post is no longer the first on the page. This code works fine though:
from facebook_scraper import get_posts

# Fetch one specific post by URL, using cookies from a logged-in session
posts = list(get_posts(
    post_urls=["https://m.facebook.com/story.php?story_fbid=4059210580793355&id=2261226117258486"],
    cookies="cookies.txt"
))
print(posts[0]["image"])
outputs |
Unfortunately this doesn't do the trick for me:
|
You might need to recreate |
I thought I had already done that, but it works now. |
I pushed a commit (21ac8c4) that warns when a non en_US locale is present in the result HTML, which should help with this kind of problem. |
Hello,
I noticed that the URLs of the retrieved images are not always usable. Sometimes they are direct links on the domain "scontent-cdt1-1.xx.fbcdn.net". But sometimes the URL is not direct and requires authentication to resolve; in that case the domain is m.facebook.com.
Public page: https://fr-fr.facebook.com/groups/saintyves.rennes/
Post concerned: https://www.facebook.com/groups/saintyves.rennes/permalink/1360623547663812/
URL retrieved for the image: https://m.facebook.com/photo/view_full_size/?fbid=3861145620587869&ref_component=mbasic_photo_permalink&ref_page=%2Fwap%2Fphoto.php&refid=13&__tn__=%2Cg
The following URL would be much more useful: https://scontent-cdt1-1.xx.fbcdn.net/v/t1.6435-9/s960x960/176314059_3861145627254535_6708760356773320290_n.jpg?_nc_cat=110&ccb=1-3&_nc_sid=825194&_nc_ohc=nTqGUQ-o0h0AX_bnWV-&_nc_ht=scontent-cdt1-1.xx&tp=7&oh=c7ab3b2c862064ae5c12503f6707f434&oe=60A68072
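To tell the two kinds of image URL apart on the consumer side, something like the sketch below might do. It relies only on the two examples above and assumes that links on an `fbcdn.net` host are direct while everything else may need a login; `is_direct_image_url` is a hypothetical helper, not part of the scraper.

```python
from urllib.parse import urlparse


def is_direct_image_url(url: str) -> bool:
    """Return True if the URL points at an fbcdn.net host.

    Assumption: fbcdn.net links can be fetched without an account,
    as in the example above; this is not a documented guarantee.
    """
    host = urlparse(url).hostname or ""
    return host == "fbcdn.net" or host.endswith(".fbcdn.net")


# Shortened forms of the two URLs from this ticket
cdn_url = "https://scontent-cdt1-1.xx.fbcdn.net/v/t1.6435-9/s960x960/photo.jpg"
login_url = "https://m.facebook.com/photo/view_full_size/?fbid=3861145620587869"

for url in (cdn_url, login_url):
    kind = "direct" if is_direct_image_url(url) else "may need login"
    print(f"{kind}: {url}")
```

Checking the parsed hostname rather than searching the whole string avoids false matches when `fbcdn.net` only appears in a query parameter.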
While creating this ticket, I noticed that the URL of the post is not usable either.
The URL retrieved for the post is: https://facebook.com/439909623068547/posts/1360623547663812
While this URL is usable without an account: https://www.facebook.com/groups/saintyves.rennes/permalink/1360623547663812/
I understand the idea of retrieving URLs even when an account is needed to access them. But I think it would be very convenient to also provide the direct URL, usable without an account.
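As a possible workaround, the permalink form can be rebuilt from the retrieved post URL. This is only a sketch based on the two URLs above: `post_id_from_url` and `group_permalink` are hypothetical helpers, and the group name (`saintyves.rennes` here) has to be supplied by the caller because the retrieved URL only contains a numeric ID.

```python
import re
from typing import Optional


def post_id_from_url(url: str) -> Optional[str]:
    """Extract the numeric post ID from a .../posts/<id> URL, if present."""
    match = re.search(r"/posts/(\d+)", url)
    return match.group(1) if match else None


def group_permalink(group_name: str, post_id: str) -> str:
    """Build a group permalink in the form shown in this ticket."""
    return f"https://www.facebook.com/groups/{group_name}/permalink/{post_id}/"


retrieved = "https://facebook.com/439909623068547/posts/1360623547663812"
post_id = post_id_from_url(retrieved)
if post_id is not None:
    print(group_permalink("saintyves.rennes", post_id))
```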