I had the same problem.
@Naveen
Were you able to resolve the error?
Regarding the script that outputs GeoJSON: it works, but it reports geo_tweets values greater than total_tweets, both with line 51 commented out and with it left in.
My input file is one tweet per line, so the line count checks out:
prompt:~/Brunila/process$ wc -l self-driving_2018-11-26.json
468 self-driving_2018-11-26.json
The file included 446 unique users who tweeted with or without geo data.
The file included 353 unique users who tweeted with geo data, including 'location'.
The users with geo data accounted for 720 of the 468 total tweets.
My output count reconciles:
prompt:~/Brunila/process$ grep '"user_id":' geo_data.self-driving_2018-11-26.json | wc -l
353
With the conditional on line 51 commented out:
The file included 446 unique users who tweeted with or without geo data.
The file included 438 unique users who tweeted with geo data, including 'location'.
The users with geo data accounted for 898 of the 468 total tweets.
prompt:~/Brunila/process$ grep '"user_id":' geo_data.self-driving_2018-11-26.json | wc -l
438
Big thanks for sharing this script. If I figure out what seems to be counted twice I’ll be sure to share.
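In case it helps anyone else chasing the same discrepancy, here is a minimal sketch (hypothetical field handling, not the author's actual code) of how a per-field counter could count one tweet twice and push geo_tweets past total_tweets, alongside a per-tweet count that cannot:

```python
# Hypothetical reconstruction of the double count: if a script increments
# its geo counter once per matching field, a tweet that carries both
# 'coordinates' and a profile 'location' is counted twice -- which would
# let geo_tweets exceed total_tweets.

GEO_FIELDS = ("coordinates", "place")

def count_geo_tweets_buggy(tweets):
    count = 0
    for tweet in tweets:
        for field in GEO_FIELDS:
            if tweet.get(field):
                count += 1  # one increment PER field, not per tweet
        if tweet.get("user", {}).get("location"):
            count += 1      # profile location counted on top of the above
    return count

def count_geo_tweets_fixed(tweets):
    # Count each tweet at most once, however many geo fields it carries.
    return sum(
        1 for t in tweets
        if any(t.get(f) for f in GEO_FIELDS)
        or t.get("user", {}).get("location")
    )
```

With three tweets, two of which carry both a geo field and a profile location, the buggy count comes out above the total tweet count while the fixed count does not.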
Hope this message finds you well. Thank you for posting this, it’s fantastic.
I’m a researcher working on scraping geo-tagged Tweets. I’m only interested in the first category of geocoding that you suggest (i.e. precise coordinates). My question is, why didn’t you just do a bounding box within the stream.filter command? (See here: https://github.com/Ccantey/GeoSearch-Tweepy/blob/master/GeoTweepy.py). I’m asking because I wonder whether the bounding box only provides Tweets of type 1 (exact coordinates), or whether it also includes type 2 and 3 Tweets.
If the latter is true, I was thinking of running a script similar to the one you describe above, except using the bounding box as an initial search filter (instead of the hashtag you used), and then excluding your second and third iteration steps.
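If it does turn out that the bounding box also returns type 2 and 3 Tweets, the client-side step I'd add would look roughly like this (a sketch only; field names come from the raw streaming JSON, the function names are mine):

```python
import json

# Sketch: operates on raw JSON lines from a stream filtered with a bounding
# box (e.g. stream.filter(locations=[...]) as in the linked GeoTweepy.py),
# keeping only type-1 Tweets (exact coordinates).

def is_exact_coordinates(tweet):
    # Type-1 Tweets carry a GeoJSON Point in 'coordinates'; types 2 and 3
    # leave the field null and rely on 'place' or the profile location.
    coords = tweet.get("coordinates")
    return coords is not None and coords.get("type") == "Point"

def keep_type1(lines):
    for line in lines:
        tweet = json.loads(line)
        if is_exact_coordinates(tweet):
            yield tweet
```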
Thanks in advance!