Log in

No account? Create an account
Vexen Crabtree 2015


Vexen Crabtree's Live Journal

Sociology, Theology, Anti-Religion and Exploration: Forcing Humanity Forwards

Previous Entry Share Flag Next Entry
Vexen Crabtree 2015

Counting URLs (SQL, in Access environment)

I'm trying to do something quite simple in SQL but can't get it right. I suddenly found myself creating 3 queries just to do this one thing, and I'm sure I should be able to do it in one query. I have a list of URLs in a table (T_URLs including the field URL_URL), and I simply want to return a list of URLs, ordered by domain, with a count of how many URLs are in each domain.


1. Extract the Domain part of the URL. For example the domain of http://www.vexen.co.uk/countries/best.html is http://www.vexen.co.uk/:
2. I need to Group By and Order By this URL_Domain field, which is where my problems start. So I need extra SQL like:
3. I then want to count how many URLs are in that domain. The result should be a list of URLs, with an extra "Domain_Q" field that counts how many other URLs are also in that domain:

In total:
SELECT Mid(URL_URL,1,InStr(8,URL_URL,"/")) AS Domain_URL, Count(URL_URL) AS Domain_Q



This will later let me iterate through every URL that exists in a domain which has more than 10 URLs in it.

But the SQL so far returns a (correct) list of URL_URLs and Domain_URLs, but, the Domain_Q says "1" for every line in the results.

It works if I put the Count_Q in a different SQL query, but then I end up creating a circular reference when I try to bring the Domain_Q back into the URL list. Can anyone work out (a) what I'm talking about, and (b) what nifty little trick I'm missing to get this done sensibly?

  • 1
At a quick glance, have you tried removing URL_URL from the GROUP BY?

That would be the sensible approach, I agree, but, it gives an error if I do that:

"You tried to execute a query that does not include the specificed expression 'URL_URL' as part of an aggregate function."

Which is an age-old error message that has demonized me for years, and I've never understood, but, the way to get rid of it is to include the offending field in the Group By list.

Shouldn't you be counting Domain_URL 's , and not URL_URL 's? That line creates a URL entry for a domain, and I'm guessing if the same old domain doesn't come up there'll be no new entry in the database table?

Also a sensible question; the problem is that there is no field called Domain_URLs so it causes an error, it is worked-out from URL_URL. This is why I've had to put that part in a seperate query - although it still seems like there should be a way round that, I can't find it!

Hmm, and if you ORDER BY URL_Domain. Or cheat and add max(URL_URL) or something. I really should sit down and think about this.

Well don't worry, I've used a 3-query solution so I'm just going to forget about it. One query does a SELECT URL_URL, URL_Domain, another does "COUNT(URL_Domain) GROUP BY URL_Domain) and then a third brings them together in a LEFT JOIN.

Max(URL_URL) would (if it works for non-numeric) give the last URL_URL from an alphabetical list (i.e., MAX('A','B','C','D','E','F') would be 'F' so it wouldn't work as a count.

I prefer to have things in as few quesries as possible (my db, Ziggy, now has 50), but it is always a trade-off between time-taken (too much, on this issue!) and efficiency.

Thanks for the help though!

  • 1