Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Lots of missing main keywords from Search Console
Hello Moz Community, A while back I noticed in Google Search Console that the volume of impressions and clicks dropped off a cliff. I also noticed that this was for primary head keywords that generally had a decent volume of impression share. The natural initial reaction was 'oh, I must have lost those rankings' but upon checking I realized these rankings are still in existence. Admittedly most are page 2 or 3, but still within the confines of being captured in GSC. Is there a logical reason why these keywords have just gone from search console? An example keyword would be something like 'online football management game' and the website is https://www.worldelitesoccer.com There is 0 queries in my search console data that includes the word 'football'. Thanks, Ben
Reporting & Analytics | | melaniedsg0 -
Google Analytics Question - Impressions & Queries Up, Sessions Down
I'm working with a client who, according to the Google Query report, impressions and sessions are up since we've started work with them about 6 months ago, but Google sessions are down. In moz, we're seeing a gradual, but steady increase in search visibility specifically with Google. Note: this is all organic. From when we started tracking queries, the first month we were tracking there were 43,581 impressions and 690 click throughs for the month. This past month there were 98,293 queries and 1015 clicks throughs for the month (granted not year over year data) - of these 1,015 clicks, 995 of them were from web. However, for those same time periods, sessions from Google are down over 30% - 1,750 vs. 1,189. I'm not sure how to interpret this. I realize that clicks and sessions are not a straightforward comparison, but I would think that if clicks were up according to the query report that sessions would also be up. Is it that some of these clicks are bouncing and therefore not being tracked as a session? Is there a potential issue with how data is being tracked?
Reporting & Analytics | | Corporate_Communications0 -
Can not divide in different properties a domain in Search Console (Webmaster Tools)
Dear Moz Community, I hope you can give me a hand with the following questions. Im in charge of SEO of an ecommerce site in LATAM. It´s service is available in several countries, therefore each country has it subdirectory Eg. /ar /pe /co /bo /cl /br,etc... (in the future we will move to differente ccTLDs). I have been recomended to split or create different Search Console or Webmaster Tools properties (one for each subdirectory) but when Im creating a new property with a subdirectory, lets say www.domain.com/ar, Webmaster tools starts creating a property for www.domain.com/ar/ (NOTICE THE LAST SLASH) and it returns since that page doesn´t exist, what do you recomend me to do? Best wishes, Pablo Lòpez C
Reporting & Analytics | | pablo_carrara0 -
How Do Queries And Impressions Relate?
For one of our keywords, i have 2,500 impressions this past month, but there were only 1,300 queries according to Google's keyword planner. How can I have more impressions than queries? If anything, I thought it would be the other way around. If someone could flush this out for me, I'd be incredibly grateful. Thanks, Ruben
Reporting & Analytics | | KempRugeLawGroup0 -
Google Making all searches secure - "Not provided" data to increase in Analytics
A lot of you might already be aware of the recent Google change at encrypting all search activity except for clicks on ads. Rand did a whiteboard session on this recently. How is everyone planning to adjust their research data to accommodate for this change?
Reporting & Analytics | | SEO5Team0 -
(Not provided) organic search results in Analytics
Hi, So from what I've read, (not provided) organic search results in Analytics is a result of the user being logged into Google. The problem I have is about 20% of the organic search results are (Not provided), so what is mainly left are branded search terms. I think I'm clutching at straws, but is there anything I can do to see these organic traffic sources?
Reporting & Analytics | | JuiceBoxOM0 -
How can I track search engine optimization data in Google analytics?
My website is linked to a Google Analytics web property. But, I am not able to track search engine optimization data in Google Analytics. So, How can I get it done?
Reporting & Analytics | | CommercePundit0 -
How to measure number of visits from Google News coming from Google Universal Search (NOT referral coming directly coming from news.google.com) with google analyitcs
I'm running a news site, and I have a problem of accuratly measuring which traffic is REALLY coming from google news. I analyzed a lot of individual articles and I come to the conclusion, that the visits, that come from the google news section in the universal search results are counted as "normal" search engine traffic in google analytics. So if you do a Google search for a topic that includes links from Google news, you don't get an accurate referral count. As an example, if you do a search for "eBay", incorporated into the page 1 search results you may also see Google news results as well.
Reporting & Analytics | | Mulle
If someone clicks on that Google news link that appears in Google search, it shows up in Google analytics as a referral from Google search, when it was actually from a Google news referral. I was already checking google analytics and google news help forums and searched SEO blogs for this. But I wasn't able to find a working solution. Can anybody help me out with this problem? Thanks so much, Matthias0