How do the Quoras of this world index their content?
-
I am helping a client index lots and lots of pages, more than one million pages. They can be seen as questions on Quora. In the Quora case, users are often looking for the answer on a specific question, nothing else.
On Quora there is a structure setup on the homepage to let the spiders in. But I think mostly it is done with a lot of sitemaps and internal linking in relevancy terms and nothing else... Correct? Or am I missing something?
I am going to index about a million question and answers, just like Quora. Now I have a hard time dealing with structuring these questions without just doing it for the search engines. Because nobody cares about structuring these questions. The user is interested in related questions and/or popular questions, so I want to structure them in that way too.
This way every question page will be in the sitemap, but not all questions will have links from other question pages linking to them. These questions are super longtail and the idea is that when somebody searches this exact question we can supply them with the answer (onpage will be perfectly optimised for people searching this question). Competition is super low because it is all unique user generated content.
I think best is just to put them in sitemaps and use an internal linking algorithm to make the popular and related questions rank better. I could even make sure every question has at least one other page linking to it, thoughts?
Moz, do you think when publishing one million pages with quality Q/A pages, this strategy is enough to index them and to rank for the question searches? Or do I need to design a structure around it so it will all be crawled and each question will also receive at least one link from a "category" page.
-
Wow, that is insane right?
https://www.quora.com/sitemap/questions?page_id=50
I wonder how long this carries on.
-
Quora don't seem to have a XML sitemap but a HTML one :
[https://www.quora.com/robots.txt](https://www.quora.com/robots.txt) refers to [https://www.quora.com/sitemap](https://www.quora.com/sitemap)
-
Yes there are many challenges and external linking is definitely one of them.
What do you think about sitemaps to get this longtail indexed? I think that a lot can be indexed by submitting the sitemaps.
-
There are many challenges to building a really large site. Most of them are related to building the site, but one that often kills the success of the site is the ability to get the pages into the index and keep them there. This requires a steady flow of spiders into the deepest pages of the site. If you don't have continuous and repetitive spider flow the pages will be indexed, but then forgotten, before the spiders return.
An effective way to get deep spidering is have powerful links permanently connected to many deep hub pages throughout the site. This produces a flow of spiders into the site and forces them to chew their way out, indexing pages as they go. These links must be powerful or the spiders will index a couple of pages and die. These links must be permanent because if they are removed the flow of spiders will stop and pages in the index will be forgotten.
The goal of the hub pages is to create spider webs through the site that allow spiders to index all of the pages on short link paths, rather than requiring the spiders to crawl through long tunnels of many consecutive links to get everything indexed.
Lots of people can build a big site, but only some of those people have the resources to get the powerful, permanent links that are required to get the pages indexed and keep them in the index. You can't rely on internal links alone for the powerful, permanent links because most spiders that enter any site come from external sources rather than spontaneously springing up deep in the bowels of your website.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should m-dot sites be indexed at all
I have a client with a site with a m-dot mobile version. They will move it to a responsive site sometime next year but in meanwhile I have a massive doubt. This m-dot site has some 30k indexed pages in Google. Each of this page is bidirectionally linked to the www. version (rel="alternate on the www, rel canonical on the m-dot) There is no noindex on the m-dot site, so I understand that Google might decide to index the m-dot pages regardless of the canonical to the www site. But my doubts stays: is it a bad thing that both the version are indexed? Is this having a negative impact on the crawling budget? Or risking some other bad consequence? and how is the mobile-first going to impact on this? Thanks
Intermediate & Advanced SEO | | newbiebird0 -
Duplicated Content with Index.php
Good Afternoon, My website uses Joomla CMS and has the htaccess rewrite code enabled to ensure the use of search engine friendly URLs (SEF's). While browsing the crawl diagnostics I have found that Moz considers the /index.php URL a duplicate to our root. I will always under the impression that the htaccess rewrite took care of that issue and obviously I would like to address it. I attempted to create a 301 redirect from the index.php URL to the root but ran into an issue when attempting to login to the admin portion of the website as the redirect sent me back to the homepage. I was curious if anyone had advice for handling the index.php duplication issue, specifically with Joomla. Additionally, I have confirmed that in Google Webmasters, under URL parameters, the index.php parameter is set as 'Representative URL'.
Intermediate & Advanced SEO | | BrandonEML0 -
Opinions on Boilerplate Content
Howdy, Ideally, uniqueness for every page's title, description, and content is desired. But when a site is very, very large, it becomes impossible. I don't believe our site can avoid boilerplate content for title tags or meta-descriptions. We will, however, markup the pages with proper microdata so Google can use this information as they please. What I am curious about is boilerplate content repeated throughout the site for the purpose of helping the user, as well as to tell Google what the page is about (rankings). For instance, this page and this page offer the same type of services, but in different areas. Both pages (and millions of others) offer the exact same paragraph on each page. The information is helpful to the user, but it's definitely duplicate content. All they've changed is the city name. I'm curious, what's making this obvious duplicate content issue okay? The additional unique content throughout (in the form of different businesses), the small, yet obvious differences in on-site content (title tags clearly represent different locations), or just the fact that the site is HUGELY authorative and gets away with it? I'm very curious to hear your opinions on this practice, potential ways to avoid it, and whether or not it's a passable practice for large, but new sites. Thanks!
Intermediate & Advanced SEO | | kirmeliux0 -
All Thin Content removed and duplicate content replaced. But still no success?
Good morning, Over the last three months i have gone about replacing and removing all the duplicate content (1000+ page) from our site top4office.co.uk. Now it been just under 2 months since we made all the changes and we still are not showing any improvements in the SERPS. Can anyone tell me why we aren't making any progress or spot something we are not doing correctly? Another problem is that although we have removed 3000+ pages using the removal tool searching site:top4office.co.uk still shows 2800 pages indexed (before there was 3500). Look forward to your responses!
Intermediate & Advanced SEO | | apogeecorp0 -
Technical Automated Content - Indexing & Value
One of my clients provides some Financial Analysis tools, which generate automated content on a daily basis for a set of financial derivatives. Basically they try to estimate through technical means weather a particular share price is going up or down, during the day as well as their support and resistance levels. These tools are fairly popular with the visitors, however I'm not sure on the 'quality' of the content from a Google Perspective. They keep an archive of these tools which tally up to nearly a 100 thousand pages, what bothers me particularly is that the content in between each of these varies only slightly. Textually there are maybe up to 10-20 different phrases which describe the move for the day, however the page structure is otherwise similar, except for the Values which are thought to be reached on a daily basis. They believe that it could be useful for users to be able to access back-dated information to be able to see what happened in the past. The main issue is however that there is currently no back-links at all to any of these pages and I assume Google could deem these to be 'shallow' provide little content which as time passes become irrelevant. And I'm not sure if this could cause a duplicate content issue; however they already add a Date in the Title Tags, and in the content to differentiate. I am not sure how I should handle these pages; is it possible to have Google prioritize the 'daily' published one. Say If I published one today; if I had to search "Derivative Analysis" I would see the one which is dated today rather then the 'list-view' or any other older analysis.
Intermediate & Advanced SEO | | jonmifsud0 -
Wordpress Duplicate Content
We have recently moved our company's blog to Wordpress on a subdomain (we utilize the Yoast SEO plugin). We are now experiencing an ever-growing volume of crawl errors (nearly 300 4xx now) for pages that do not exist to begin with. I believe it may have something to do with having the blog on a subdomain and/or our yoast seo plugin's indexation archives (author, category, etc) --- we currently have Subpages of archives and taxonomies, and category archives in use. I'm not as familiar with Wordpress and the Yoast SEO plugin as I am with other CMS' so any help in this matter would be greatly appreciated. I can PM further info if necessary. Thank you for the help in advance.
Intermediate & Advanced SEO | | BethA0 -
Duplicate Content Help
seomoz tool gives me back duplicate content on both these URL's http://www.mydomain.com/football-teams/ http://www.mydomain.com/football-teams/index.php I want to use http://www.mydomain.com/football-teams/ as this just look nice & clean. What would be best practice to fix this issue? Kind Regards Eddie
Intermediate & Advanced SEO | | Paul780 -
How to resolve Duplicate Page Content issue for root domain & index.html?
SEOMoz returns a Duplicate Page Content error for a website's index page, with both domain.com and domain.com/index.html isted seperately. We had a rewrite in the htacess file, but for some reason this has not had an impact and we have since removed it. What's the best way (in an HTML website) to ensure all index.html links are automatically redirected to the root domain and these aren't seen as two separate pages?
Intermediate & Advanced SEO | | ContentWriterMicky0