modified: twitterscraper/tweet.py #100

hengruo · 2018-03-04T06:33:33Z

I added a new field in Tweet: reply_to_id.
If tweet A is a reply to tweet B, then reply_to_id = B.id;
If tweet A doesn't reply to any tweet, then reply_to_id = A.id.

This field lets us reconstruct the reply tree of tweets.
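
As a rough illustration of how such a field could be used, here is a minimal sketch that groups tweets by the tweet they reply to. It assumes the convention described above (a tweet whose reply_to_id equals its own id is not a reply); the function name `build_reply_tree` and the sample ids are hypothetical and not part of twitterscraper.

```python
from collections import defaultdict


def build_reply_tree(tweets):
    """Group tweets by the tweet they reply to.

    `tweets` is an iterable of (id, reply_to_id) pairs. A tweet whose
    reply_to_id equals its own id is treated as a root (not a reply).
    Returns (roots, children), where `children` maps a tweet id to the
    ids of the tweets that reply to it.
    """
    roots = []
    children = defaultdict(list)
    for tweet_id, reply_to_id in tweets:
        if tweet_id == reply_to_id:
            roots.append(tweet_id)
        else:
            children[reply_to_id].append(tweet_id)
    return roots, dict(children)


# Hypothetical ids, for illustration only.
sample = [("1", "1"), ("2", "1"), ("3", "1"), ("4", "4")]
roots, children = build_reply_tree(sample)
```

With the sample data, tweets "1" and "4" end up as roots and tweets "2" and "3" are listed as replies to "1".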

taspinar · 2018-03-06T14:43:18Z

twitterscraper/tweet.py

 class Tweet:
-    def __init__(self, user, fullname, id, url, timestamp, text, replies, retweets, likes, html):
+    def __init__(self, user, fullname, id, url, timestamp, text, reply_to_id, replies, retweets, likes, html):


Placing a new argument at this position breaks backward compatibility for callers that pass arguments positionally. I suggest moving it to the end of the argument list.

The newly implemented 'reply_to_user' is not passed to the Tweet class and hence will not appear in the output.
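
To make the backward-compatibility point concrete, here is a sketch of the constructor with the new argument moved to the end. The class below is a simplified stand-in for twitterscraper's Tweet, and the `None` default is an assumption added for illustration; it is not part of the PR.

```python
class Tweet:
    """Simplified stand-in for twitterscraper's Tweet class."""

    # The new argument comes last, so existing positional calls still bind
    # replies, retweets, likes and html to the right parameters.
    # The default value of None is an assumption for this sketch.
    def __init__(self, user, fullname, id, url, timestamp, text,
                 replies, retweets, likes, html, reply_to_id=None):
        self.user = user
        self.fullname = fullname
        self.id = id
        self.url = url
        self.timestamp = timestamp
        self.text = text
        self.replies = replies
        self.retweets = retweets
        self.likes = likes
        self.html = html
        self.reply_to_id = reply_to_id


# A call written against the old ten-argument signature keeps working.
old_style = Tweet("user", "Full Name", "1", "/user/status/1", None,
                  "text", "0", "2", "3", "<p></p>")

# New callers can pass the extra field by keyword.
new_style = Tweet("user", "Full Name", "2", "/user/status/2", None,
                  "text", "0", "2", "3", "<p></p>", reply_to_id="1")
```

Had reply_to_id been inserted before replies, the old-style call above would have bound "0" to reply_to_id and then failed with a missing-argument error for html.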

taspinar · 2018-03-06T15:42:36Z

twitterscraper/tweet.py

@@ -38,7 +39,8 @@ def from_soup(cls, tweet):
                'span', 'ProfileTweet-action--favorite u-hiddenVisually').find(
                    'span', 'ProfileTweet-actionCount')['data-tweet-stat-count'] or '0',
            html=str(tweet.find('p', 'tweet-text')) or "",
-        )
+            reply_to_id = tweet.findChildren()[0]['data-conversation-id'] or '0',


This can also be achieved with
reply_to_id = tweet.find('div', 'tweet')['data-conversation-id'] or '0'

taspinar · 2018-03-06T15:49:22Z

twitterscraper/tweet.py

@@ -17,6 +17,7 @@ def __init__(self, user, fullname, id, url, timestamp, text, replies, retweets,
        self.retweets = retweets
        self.likes = likes
        self.html = html
+        self.reply_to_id = reply_to_id


self.reply_to_id = 0 if id == reply_to_id else reply_to_id
sets it to zero if it is equal to the tweet id, i.e. if the tweet is not a reply to anyone. Giving reply_to_id a value even when the tweet is not a reply is misleading: people would have to compare id and reply_to_id before they can be sure it is a reply.
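
The normalization suggested above could be sketched as a small helper. The function name `normalize_reply_to_id` is hypothetical, and returning the string '0' (to match the `or '0'` fallback used in from_soup) rather than the integer 0 is an assumption.

```python
def normalize_reply_to_id(tweet_id, reply_to_id):
    """Return '0' when the tweet is not a reply, else the id it replies to."""
    # Attribute values parsed from HTML are strings, so compare as strings
    # to avoid a mismatch between e.g. 10 and "10".
    if not reply_to_id or str(reply_to_id) == str(tweet_id):
        return '0'
    return str(reply_to_id)
```

A consumer can then test `reply_to_id != '0'` to decide whether a tweet is a reply, without comparing it against the tweet's own id.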

taspinar · 2018-03-06T16:05:42Z

I know it is possible to retrieve the contents of a tweet if you know the username and id with "https://twitter.com//status/".
And I was wondering if it is possible to retrieve the contents of a tweet by id only. I have looked on the internet and I could not find a good answer.
If this is not possible, and the original tweet is not in the list of scraped tweets, the reply_to_id is not very useful.

One way in which you can find out the username belonging to the original tweet is with the following command:
reply_to_users = json.loads(tweet.find('div', 'tweet')['data-reply-to-users-json']) or []

This retrieves a JSON list containing, among other things, the username, screen_name and id_str of everyone who has participated in the conversation.
If the list has a length of 1, the tweet was not a reply to anyone and the list only contains information about the current tweet.
If the list contains more than 1 element, the tweet was a reply, and the last element in the list contains information about the user of the original tweet.

If it is not possible to retrieve a tweet by id only, I suggest you also include the username of the original tweet.
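
The rule described above (one element means no reply, otherwise the last element is the original author) could be sketched as follows. The helper name `original_user` and the sample payload are hypothetical; only `screen_name` and `id_str` are field names mentioned in this thread, and the `name` key is an assumption.

```python
import json


def original_user(reply_to_users_json):
    """Return the screen_name of the user being replied to, or None.

    `reply_to_users_json` is the raw string taken from the
    data-reply-to-users-json attribute (empty or None if absent).
    """
    users = json.loads(reply_to_users_json) if reply_to_users_json else []
    # A single entry describes only the current tweet's author.
    if len(users) <= 1:
        return None
    # Per the description above, the last entry is the original author.
    return users[-1].get("screen_name")


# Hypothetical attribute value, for illustration only.
sample_attr = json.dumps([
    {"id_str": "111", "screen_name": "replier", "name": "Replier"},
    {"id_str": "222", "screen_name": "original", "name": "Original"},
])
```

Calling `original_user(sample_attr)` yields the screen name of the last entry, which could be stored next to reply_to_id as the username of the original tweet.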

hengruo · 2018-03-07T19:00:16Z

Your suggestions are very useful! I store tweets in my database, so I didn't consider the case where tweets need to be fetched online by id alone. I'll fix it.

taspinar · 2018-03-14T17:52:44Z

twitterscraper/query.py

        if html_response:
            html = response.text
        else:
-            json_resp = response.json()
+            json_resp = ujson.loads(response.text)


What is the difference between json.loads() and ujson.loads()? If there is no clear reason to use ujson instead of json, I prefer json.
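
For context, ujson is a third-party, C-accelerated JSON parser, so the practical difference is mainly speed at the cost of an extra dependency. If both were to be supported, one possible approach is an optional import with a fallback to the standard library. This is only a sketch; the helper name `parse_json` is hypothetical and not part of twitterscraper.

```python
import json

try:
    import ujson  # optional third-party package; may not be installed
except ImportError:
    ujson = None


def parse_json(text):
    """Parse a JSON string with ujson if available, else the stdlib json."""
    loader = ujson.loads if ujson is not None else json.loads
    return loader(text)
```

In query.py this would be called as `parse_json(response.text)`, which keeps the standard library as the only hard requirement.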

taspinar · 2018-03-14T17:56:46Z

twitterscraper/query.py

-        limit_per_pool = roundup(limit, poolsize)
-    else:
-        limit_per_pool = None
+    limit_per_pool = limit


This change will make twitterscraper scrape approximately P*limit tweets (where P is the poolsize) instead of the given limit. Please remove this change.
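
The arithmetic behind this can be sketched as below. The body of `roundup` is an assumption (ceiling division), since its definition is not shown in this diff, and `approx_total` is a hypothetical helper used only to compare the two behaviours.

```python
def roundup(numerator, denominator):
    """Ceiling division; assumed to match the helper used in query.py."""
    return -(-numerator // denominator)


def approx_total(limit, poolsize, split_limit=True):
    """Approximate number of tweets scraped across all pool workers."""
    # Original behaviour: each worker gets its share of the limit.
    # Changed behaviour: each worker gets the full limit.
    per_pool = roundup(limit, poolsize) if split_limit else limit
    return per_pool * poolsize
```

For example, with a limit of 100 and a poolsize of 20, splitting the limit gives 5 tweets per worker and about 100 in total, whereas passing the full limit to every worker allows up to 2000.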

taspinar · 2018-03-14T18:29:15Z

twitterscraper/tweet.py

@@ -38,7 +39,9 @@ def from_soup(cls, tweet):
                'span', 'ProfileTweet-action--favorite u-hiddenVisually').find(
                    'span', 'ProfileTweet-actionCount')['data-tweet-stat-count'] or '0',
            html=str(tweet.find('p', 'tweet-text')) or "",
-        )
+            reply_to_id = tweet.find('div', 'tweet')['data-conversation-id'] or '0',
+            reply_to_user = tweet.find('div', 'tweet')['data-mentions'] or "",


This is already implemented in PR #98. It may be best to remove it here.

taspinar

See added comments.

hengruo added 3 commits March 4, 2018 01:28

modified: twitterscraper/tweet.py

ec4f17c

modified: twitterscraper/query.py

c90b031

modified: twitterscraper/query.py

47390ec

taspinar reviewed Mar 6, 2018

taspinar mentioned this pull request Mar 6, 2018

Add tracking of mentions to tweets #98

Open

modified: twitterscraper/tweet.py

9d41825

taspinar reviewed Mar 14, 2018

taspinar requested changes Mar 14, 2018

taspinar mentioned this pull request May 30, 2018

How to retrieve a users replies? #116

Open

taspinar mentioned this pull request Jun 15, 2019

Can I obtain the replies of a tweet? #193

Closed

lapp0 added the stale label Jun 8, 2020
