Elvish Wanderer (aelana) wrote in lj_nifty,

Search your LJ without killing the server! (and for more than 50 articles)

Hopefully this is OK. It is a standard search engine spider that runs once a week, so I don't see why it would be a problem. And since I know the great LJ folks are out there reading LJ_NIFTY, we will know soon enough if it is ;)

So, onward.
What is it?
There is a free service out there called freefind that lets you create search pages for your sites. It adds ads to the results page, but you can completely template the results page so it looks like the rest of your journal. It works well and (hopefully) without much load on the system.

Where is it?
http://www.freefind.com/

OK... how do I set it up so that it works and doesn't index all of my pages five hundred times?

Here is how I set it up (which works nicely, since you can even search most of the comments as well):

Create a calendar LJ style in raw mode and include only the following:

CALENDAR_YEAR_DISPLAYED=>

CALENDAR_YEAR_LINK=><A HREF="http://www.livejournal.com/customview.cgi?username=username&styleid=styleid&year=%%yyyy%%">%%yyyy%%</A><BR>

CALENDAR_DAY=>

CALENDAR_DAY_NOEVENT=>

CALENDAR_YEAR_LINKS=>%%years%%

CALENDAR_EMPTY_DAYS=>

CALENDAR_PAGE<=
<HTML>
<HEAD>
<TITLE>%%name%%%%name-'s%% Posts</TITLE>
</HEAD>
<body>
%%yearlinks%%
%%months%%
</body>
</html>
<=CALENDAR_PAGE

CALENDAR_MONTH=><A HREF="%%urlmonthview%%">%%monlong%%, %%yyyy%%</A>

CALENDAR_WEBSITE=>

CALENDAR_NEW_YEAR=>

CALENDAR_SORT_MODE=>forward

CALENDAR_DAY_EVENT=>

CALENDAR_WEEK=>


Of course, replace username with your username and styleid with the style ID of the new style. Jot down the style ID.
Set the website as: http://www.livejournal.com/customview.cgi?user=username&styleid=styleid&year=2001
(It doesn't matter what year. Again, replace username with your username and styleid with the style ID. This requires a paid account to work.)
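
If you would rather not paste the pieces together by hand, here is a minimal sketch of building that website address. The function name, the sample username "exampleuser" and the style ID 123456 are made up for illustration; substitute your own.

```python
from urllib.parse import urlencode

def customview_url(user, styleid, year=2001):
    """Build the customview.cgi address to give freefind as the website.

    The year is arbitrary; the calendar style links to every year anyway.
    """
    query = urlencode({"user": user, "styleid": styleid, "year": year})
    return "http://www.livejournal.com/customview.cgi?" + query

# Hypothetical username and style ID, for illustration only.
print(customview_url("exampleuser", 123456))
```

Open the printed address in a browser first; you should see nothing but the year and month links from the style above.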

Then go into the build index tab on freefind, then into exclude pages, and specify the following:
http://www.livejournal.com/*
http://www.livejournal.com/talkread.bml?journal=username* index=yes follow=yes
http://www.livejournal.com/view/?type=month&user=username* index=no follow=yes
http://www.livejournal.com/customview.cgi?user=username* index=no follow=yes
http://www.livejournal.com/customview.cgi?username=username* index=no follow=yes
http://www.livejournal.com/talkread.bml?journal=username*thread=* index=no follow=no

(Note: make sure to keep all the asterisks in place. They are necessary.)
Then go into the build index tab on freefind, then into set starting points, and add
http://www.livejournal.com/

(No, this is not telling it to index all of LJ. See the exclude pages above.)

Then go into the build index tab on freefind, select schedule-reindexing, and choose
one of the "Every weekday" options (that is, once a week on a given day). Please don't index more than once a week. My guess is a lot of people are going to want to do this, and if we all index once a day, even though it is a "nice" spider, brad will want it squashed :)

Then go into the build index tab on freefind, select set indexing speed, and choose
Slow (see the note above on re-indexing).

Then you can go into customization and create a template for the results page, and go into HTML to get the form you will want to add to your style, bio, website, or wherever.


Ok... what is it doing, and what is that style anyway?
(technical details behind a lj-cut)


Style 182619 is a calendar style that shows only the year links and the month links (each month link goes to the detailed month view, where all the posts in that month are linked). The exclude list then tells freefind to follow all of these links, plus the links on those pages that match the follow=yes rules above (most notably the talkread.bml links), and to index only the talkread.bml pages. The last exclude rule, the one for the thread links, prevents it from indexing the same comments more than once for each post. It gets very messy if you leave it out.
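
To make the effect of the exclude list concrete, here is a rough sketch that classifies a URL the way I understand the rules to behave. It is not freefind's actual code. Two things are assumptions on my part: that the last matching rule wins (which reproduces the behaviour described above), and that the first rule, which has no flags, means "neither index nor follow". The username "exampleuser" and the sample URLs are made up.

```python
import re

USER = "exampleuser"  # hypothetical username

# (pattern, index, follow), in the order they appear in exclude pages.
# Flags on the first rule are an assumption; the list gives it none.
RULES = [
    ("http://www.livejournal.com/*", False, False),
    (f"http://www.livejournal.com/talkread.bml?journal={USER}*", True, True),
    (f"http://www.livejournal.com/view/?type=month&user={USER}*", False, True),
    (f"http://www.livejournal.com/customview.cgi?user={USER}*", False, True),
    (f"http://www.livejournal.com/customview.cgi?username={USER}*", False, True),
    (f"http://www.livejournal.com/talkread.bml?journal={USER}*thread=*", False, False),
]

def glob_to_regex(pattern):
    # Only "*" is treated as a wildcard; "?" and everything else is literal.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

def classify(url):
    """Return (index, follow) for a URL, assuming the LAST matching rule wins."""
    result = (True, True)  # no rule matched: crawl normally
    for pattern, index, follow in RULES:
        if glob_to_regex(pattern).fullmatch(url):
            result = (index, follow)
    return result

base = "http://www.livejournal.com/"
for url in [
    base + f"customview.cgi?user={USER}&styleid=123456&year=2001",
    base + f"view/?type=month&user={USER}&y=2001&m=5",
    base + f"talkread.bml?journal={USER}&itemid=42",
    base + f"talkread.bml?journal={USER}&itemid=42&thread=7",
    base + "users/someoneelse/",
]:
    print(classify(url), url)
```

Under those assumptions, only the plain talkread.bml pages come out as index=yes; the calendar and month pages are followed but not indexed, and thread pages and the rest of LJ are skipped entirely.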


What problems are there with it?
It only indexes public posts.
Some comments won't be indexed because of LJ's collapsing comments feature. I don't know how to get around this without allowing freefind to index threads, which would index comments many times over and eat up your freefind limit. For example, if a comment thread looked like this:
Comment
- Response
-- Response to response
The main comment would be indexed once, the response twice, the response to the response three times, and so on.
There are limits to the size of the website that freefind will index for free; look at the site for details.
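
To put a number on the comment duplication described above, here is a small sketch that counts how many copies would end up indexed if thread pages were allowed. It follows the example in this post, where a comment nested d levels deep is counted d + 1 times; the nested-tuple representation of a thread is just something made up for the illustration.

```python
def copies_indexed(thread, depth=0):
    """Total indexed copies of comments if every thread page were indexed.

    `thread` is a (text, replies) tuple. A comment at nesting depth d
    is counted d + 1 times, matching the example in the post.
    """
    text, replies = thread
    return (depth + 1) + sum(copies_indexed(reply, depth + 1) for reply in replies)

# The three-comment chain from the example above.
chain = ("Comment", [("Response", [("Response to response", [])])])
print(copies_indexed(chain))  # 1 + 2 + 3 = 6 copies for 3 comments
```

A deep back-and-forth thread grows roughly with the square of its length, which is why the thread exclude is worth the few comments it loses.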