Are the CSV downloads malformatted, when a comma appears in a URL?
-
Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having with however is a CSV exported from our crawl diagnostics summary that I've downloaded.
The CSV contains all the data fine, however I am having problems with it when a URL contains a comma. I am making a little tool to work with the CSVs we download and I can't parse it properly because there sometimes URLs contain commas and aren't quoted the same as other fields, such as meta_description_tag, are.
Is there something simple I'm missing or is it something that can be fixed?
Looking forward to learn more about the various tools. Thanks for the help.
-
I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked in to it as I have - there are many factors involved if I was to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEO Moz developers who might put in a fix. Hint hint.
-
The first rule in this business is "You can't trust programmers"
I should know, I am a programmer and I used to manage teams of them.
You can't trust them to write something perfect, because they will always make huge assumptions, based on what they know.
They should know that URLs can contain commas, and they should quote them.
If they didn't do that in the final field, it is a deficiency in the code and your stuff isn't going to workunless you fix it manually.
What you need to do to fix this is to add a quote after the 10th comma and also add one at the end of each line.
Unfortunately, even that is a problem.
The problem is there are other fields that may not be quoted, some of which can start with http://
There can also be line breaks in the title field, and possibly even in the link text field.
Quotes and other characters are escaped with double quotes.
Titles and link text can also contain commas, so it is very complex.
Some of the fields are a bigger mess because it depends on the link text, and if the link text contains an image, you'll have quotes and equals signs, commas and all kinds of stuff. You can also have upper ascii characters and multibyte characters.
They did actually quote the first URL, if it contains commas.
They really should have quoted every field
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
/essions/essions keeps appending to 1 url on our website
Moz keeps giving us an error showing URL too long, when I investigate the offending url, I get this in the crawl. We can't work out what /essions is or why it's appending to the end of the url. Is this a Moz or website issue? <colgroup><col width="841"></colgroup>
Moz Pro | | NickWillWright
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/ |
| https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/ |0 -
Youtube traffic page url referral
Hello, How can I see which videos from Youtube that has my domain inserted in their description url drive traffic to my domain? I can see in GA how many visitors are coming from Youtube to my domain, but I can't see what Youtube video pages has driven traffic. Any help?
Moz Pro | | xeonet320 -
Duplicate Page Content on pages that appear to be different?
Hi Everyone! My name's Ross, and I work at CHARGED.fm. I worked with Luke, who has asked quite a few questions here, but he has since moved on to a new adventure. So I am trying to step into his role. I am very much a beginner in SEO, so I'm trying to learn a lot of this on the fly, and bear with me if this is something simple. In our latest MOZ Crawl, over 28K high priority issues were detected, and they are all Duplicate Page Content issues. However, when looking at the issues laid out, the examples that it gives for "Duplicate URLs" under each individual issue appear to be completely different pages. They have different page titles, different descriptions, etc. Here's an example. For "LPGA Tickets", it is giving 19 Duplicate URLs. Here are a couple it lists when you expand those:
Moz Pro | | keL.A.xT.o
http://www.charged.fm/one-thousand-one-nights-tickets
http://www.charged.fm/trash-inferno-tickets
http://www.charged.fm/mylan-wtt-smash-hits-tickets
http://www.charged.fm/mickey-thomas-tickets Internally, one reason we thought this might be happening is that even though the pages themselves are different, the structure is completely similar, especially if there are no events listed or if there isn't any content in the News/About sections. We are going to try and noindex pages that don't have events/new content on them as a temporary fix, but is there possibly a different underlying issue somewhere that would cause all of these duplicate page content issues to begin appearing? Any help would be greatly appreciated!0 -
Which URL to use for my campaign?
My outdated CMS redirects from a clean url root domain to an ugly url. Should I use the ugly one to start a my campaign? So when you type in www.site.com it redirects to www.site.com/site/c.leJRIROrEpH/b.5699537/k.BEF4/Home.htm Should I use the url it redirects to when starting a campaign?
Moz Pro | | mstanwyck0 -
4XX (Client Error) Report - Referrer URL Feature Request
Why not allow me to click on the individual 404 errors in the 4XX (Client Error) report and allow me to actually see the where the broken links are coming from (as a hyperlinked url) so I can locate/click and fix them? Providing the referring url w/hyperlink will stop me from having to jump between the report and searching in site explorer, or downloading the complete error report and sorting.
Moz Pro | | WaterGuy0 -
Tool for tracking actions taken on problem urls
I am looking for tool suggestions that assist in keeping track of problem urls, the actions taken on urls, and help deal with tracking and testing a large number of errors gathered from many sources. So, what I want is to be able to export lists of url's and their problems from my current sets of tools (SEOmoz campaigns, Google WM, Bing WM,.Screaming Frog) and input them into a type of centralized DB that will allow me to see all of the actions that need to be taken on each url while at the same time removing duplicates as each tool finds a significant amount of the same issues. Example Case: SEOmoz and Google identify urls with duplicate title tags (example.com/url1 & example.com/url2) , while Screaming frog sees that example.com/url1 contains a link that is no longer valid (so terminates in a 404). When I import the three reports into the tool I would like to see that example.com/url1 has two issues pending, a duplicated title and a broken link, without duplicating the entry that both SEOmoz and Google found. I would also like to see historical information on the url, so if I have written redirects to it (to fix a previous problem), or if it used to be a broken page (i.e. 4XX or 5XX error) and is now fixed. Finally, I would like to not be bothered with the same issue twice. As Google is incredibly slow with updating their issues summary, I would like to not important duplicate issues (so the tool should recognize that the url is already in the DB and that it has been resolved). Bonus for any tool that uses Google and SEOmoz API to gather this info for me Bonus Bonus for any tool that is smart enough to check and mark as resolved issues as they come in (for instance, if a url has a 403 error it would check on import if it still resolved as a 403. If it did it would add it to the issue queue, if not it would be marked as fixed). Does anything like this exist? how do you deal with tracking and fixing thousands of urls and their problems and the duplicates created from using multiple tools. Thanks!
Moz Pro | | prima-2535090 -
Canonical link on canonical url
This might seem a bit of an odd one, but we seem to be going around in circles on this when using the on page optimizer tool. We have an ecommerce site (magento) which by default is putting a canonical link in the header on every product page. For example; www.example.com/product1.html has the But when we run the on page optimiser tool, we're losing points on the critical section for not having canonical set correctly. If we remove the tag, we get the tick and the a grade, but then further down the report we lose a tick for not using canonical links. What are we missing here?
Moz Pro | | andyjsi0 -
I'm not getting the csv emailed in OSE? Anyone else?
The new version of the tool does not allow me to download the data via csv immediately. It says it will email in 5-10 minutes and never comes? Anyone else having this problem? Cheers
Moz Pro | | josey0