#duraspace IRC Log

IRC Log for 2010-02-03

Timestamps are in GMT/BST.

[4:05] * DuraLogBot (~PircBot@fedcommsrv1.nsdlib.org) has joined #duraspace
[4:05] * Topic is 'Welcome to DuraSpace - This channel is logged - http://duraspace.org/irclogs/'
[4:05] * Set by cwilper on Tue Jun 30 16:32:05 EDT 2009
[8:06] * mhwood (~mhwood@2001:18e8:3:171:218:8bff:fe2a:56a4) has joined #duraspace
[9:32] * tdonohue (~tdonohue@c-98-228-50-55.hsd1.il.comcast.net) has joined #duraspace
[11:59] * vhollister (~440e1e7e@gateway/web/freenode/x-jjgjdgbwdlwlglyr) has joined #duraspace
[14:08] * grahamtriggs (~grahamtri@cpc1-stev6-2-0-cust340.9-2.cable.virginmedia.com) has joined #duraspace
[14:35] * mdiggory (~mdiggory@64.50.88.162.ptr.us.xo.net) has joined #duraspace
[14:38] * stuartlewis (~stuartlew@gendiglt02.lbr.auckland.ac.nz) has joined #duraspace
[14:51] * keithg (~keith-noa@lib-kgilbertson.library.gatech.edu) has joined #duraspace
[14:56] * jtrimble (~jtrimble@maag127.maag.ysu.edu) has joined #duraspace
[14:56] * carynn (~80c8e115@gateway/web/freenode/x-ismqbhcjuupjsbfs) has joined #duraspace
[14:58] <tdonohue> hi all, we'll be starting our DSpace Dev Meeting here in a few minutes. I sent a general agenda out to dspace-devel listserv: https://sourceforge.net/mailarchive/forum.php?thread_name=4B69AE6B.9060407%40duraspace.org&forum_name=dspace-devel
[15:00] <tdonohue> Ok, looks like we're at top of the hour. stuartlewis: would you like to start us off with an update on the 1.6 release status?
[15:00] <stuartlewis> OK
[15:01] <stuartlewis> Remaining issues are here:
[15:01] <stuartlewis> http://jira.dspace.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=10042
[15:01] * richardrodgers (~richardro@pool-173-76-16-252.bstnma.fios.verizon.net) has joined #duraspace
[15:01] <stuartlewis> 5 of these 9 are documentation issues and are in the process of being taken care of.
[15:01] <mdiggory> link doesn't work
[15:01] <tdonohue> http://jira.dspace.org/jira/secure/IssueNavigator.jspa?reset=true&pid=10020&status=1&fixfor=10020
[15:01] <stuartlewis> (Ah - probably because it is a saved search I have in JIRA)
[15:02] <tdonohue> (I think that's the link Stuart meant)
[15:02] <stuartlewis> Can we have a quick update on the other 4 issues?
[15:02] <jtrimble> Yes, and after Tim and Stuart quit beating me with the rod, I will finish today.
[15:02] <jtrimble> JUST KIDDING
[15:02] <stuartlewis> :)
[15:02] <stuartlewis> First one: http://jira.dspace.org/jira/browse/DS-336
[15:02] <tdonohue> :) no worries jtrimble
[15:02] <stuartlewis> mdiggory: Is that just docs, so can be ignored for RC2?
[15:03] <mdiggory> yes, I think so
[15:04] <stuartlewis> OK - thanks
[15:04] <stuartlewis> What about http://jira.dspace.org/jira/browse/DS-247 Same for that?
[15:04] <jtrimble> Avada Kedavra to the jira docs.
[15:04] <mdiggory> Yes.
[15:05] <kshepherd> re DS-418: Went from being a big jspui i18n fixup to a tiny tiny patch to ItemTag. I want to commit it (in addition to Stuart's patch), but so far only tested by me and one other user. Claudia's going to test when she has time, but the sooner some folks can test and vote on that patch the better I guess.
[15:05] <mdiggory> I think maybe we should put something in place to review the release versions for services prior to a final release.
[15:06] <stuartlewis> OK - DS-418: next
[15:06] <stuartlewis> http://jira.dspace.org/jira/browse/DS-418
[15:06] <stuartlewis> Is everyone happy if Kim commits this patch?
[15:06] <kshepherd> only two remaining minor issues: i think session locale should revert to browser locale upon logout (rather than persisting eperson/UI locale), and browser caching is still a bit annoying when testing i18n changes :P
[15:06] <richardrodgers> +1 commit patch
[15:06] <tdonohue> +1 I'd say commit it, and we can have some concentrated testing after 1.6RC2
[15:06] <mhwood> +1
[15:07] <stuartlewis> 1) What is required to fix that? Invalidate the session again (so ignore my patch?) and 2) Anything easy that can be done?
[15:07] <stuartlewis> +1
[15:07] <kshepherd> stuartlewis: it's an issue that won't impact people 99% of the time, so i think leave for 1.7 and we'll probably be thinking hard about wider session invalidation issues before that
[15:08] <stuartlewis> Then finally we have http://jira.dspace.org/jira/browse/DS-440
[15:08] <stuartlewis> mdiggory: Can you give us an update about that one?
[15:10] <mdiggory> I think that we can take one of two directions with this
[15:10] * cccharles (~user@131.104.62.47) has joined #duraspace
[15:10] <mdiggory> 1.) improve the spider detection on a number of fronts before release.
[15:11] <mdiggory> 2.) accept the population tool that stuart wrote and push those improvements to happen after release
[15:11] <grahamtriggs> imho, reverting to browser locale on log out is irrelevant - the same person could well continue using the repository, and expect it to remain in the same language.
[15:12] <kshepherd> grahamtriggs: ok. that makes that easy then.
[15:12] <tdonohue> mdiggory: it's sounding like your #1 is a nice to have, but I worry how much it'd delay things?
[15:12] <kshepherd> i'm used to thinking about people in labs or on other public computers
[15:12] <stuartlewis> mdiggory: Are there any performance issues using the script I wrote with its 1/4 million or so IP addresses?
[15:12] <grahamtriggs> what would be more pressing is extending support to auto-login capabilities... ie. someone remaining 'logged in' beyond the lifetime of a single server session... where they would expect to keep the same locale
[15:14] * stuartlewis must confess to not testing the spiders.txt file. I test its population, but didn't see how it affected the solr stats.
[15:14] * tdonohue confesses to the same...I don't have enough data in my Solr to see if a 1.6MB spiders.txt file causes a big slowdown
[15:15] <stuartlewis> mdiggory: How is the spiders.txt file used?
[15:15] <stuartlewis> Is it used to stop spider entries being put INTO the solr indexes, or is it used to filter them when getting results OUT of the solr indexes?
[15:16] <mdiggory> sorry... need a minute...
[15:17] <kshepherd> yeh.. it's an interesting point -- my own stats system keeps all the crawls and just filters them out in the queries. initially i thought i might want some crawling stats, you see, but.. they've not been useful and just caused lots of table bloat basically
[15:18] <kshepherd> on the other hand, only stopping the entries being indexed makes it harder to retroactively remove a spider you've just discovered
[15:18] <mhwood> One obvious way to condense the file and probably speed matching would be to extend the format to take masked addresses: a.b.c.d/n or a.b.c.d/e.f.g.h
[15:18] <stuartlewis> I'm not too concerned which method it uses, just want to make sure there is no performance impact.
[15:18] <kshepherd> mhwood: yep
[15:19] <stuartlewis> mhwood: Doesn't need to be that complicated - I think just network classes would be enough (x.y / x.y.z) as that is how the iplists.com do it.
[15:19] <tdonohue> looking at code, it looks like SolrLogger.getIgnoreSpiders() creates a query using each line of spiders.txt....that's going to be a monster query if there's 129,000 IPs listed
[15:19] <stuartlewis> tdonohue: That was my worry.
[15:19] <tdonohue> http://scm.dspace.org/svn/repo/dspace/trunk/dspace-stats/src/main/java/org/dspace/statistics/SolrLogger.java
[15:20] <tdonohue> (see bottom of file, the last method: getIgnoreSpiders())
[15:20] <stuartlewis> Could a cron job be created to nightly remove spider entries from the index?
[15:20] <mdiggory> ok, sorry, I'm back
[15:20] <kshepherd> stuartlewis: there's a problem with the 'partial' addresses used in iplists
[15:20] <mhwood> The best-known spiders also advertise themselves in the agent string, and exclusion code often looks for those strings as one of several filters.
[15:20] <kshepherd> proper netblocks would be much better
[15:21] <kshepherd> otherwise a line like "64.68.91" will match IPs like "164.68.91.2"
[15:21] <kshepherd> well, that's easily fixed.. not a good example
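Editorial sketch of the matching problem raised above: a partial address such as "64.68.91" compared as a plain substring also hits "164.68.91.2", while the masked a.b.c.d/n form mhwood suggests does not. The class and method names below are hypothetical and are not taken from DSpace; only IPv4 is handled.

```java
// Illustrative sketch only: NetblockMatcher is a hypothetical class, not DSpace code.
final class NetblockMatcher {

    /** Converts a dotted-quad IPv4 address to its 32-bit value (held in a long). */
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if ip lies inside the block written as a.b.c.d/n. */
    static boolean inBlock(String cidr, String ip) {
        int slash = cidr.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Expected a.b.c.d/n but got: " + cidr);
        }
        int bits = Integer.parseInt(cidr.substring(slash + 1).trim());
        if (bits < 0 || bits > 32) {
            throw new IllegalArgumentException("Bad prefix length in: " + cidr);
        }
        // Keep the top `bits` bits of the 32-bit address, clear the rest.
        long mask = (0xFFFFFFFFL << (32 - bits)) & 0xFFFFFFFFL;
        long base = toLong(cidr.substring(0, slash));
        return (toLong(ip) & mask) == (base & mask);
    }

    /** The naive comparison: does the partial address appear anywhere in ip? */
    static boolean substringMatch(String partial, String ip) {
        return ip.contains(partial);
    }
}
```

With these definitions, substringMatch("64.68.91", "164.68.91.2") reports a (false) hit, while inBlock("64.68.91.0/24", "164.68.91.2") rejects the address and inBlock("64.68.91.0/24", "64.68.91.200") accepts it.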
[15:21] <mdiggory> there would be a twofold approach to managing this
[15:21] <kshepherd> but.. etc.
[15:21] <mdiggory> 1.) newly found bots that retrieved the robots.txt would be automatically excluded
[15:22] <mdiggory> 2.) user added bots would be added to the exclusions list.
[15:22] <mdiggory> 3.) A prune script would be authored that removed bots from the Solr repo
[15:22] <tdonohue> ok, so based on that, the proper usage of spiders.txt is just for manual exclusions list?
[15:23] <stuartlewis> OK - so spiders.txt is for manually added IPs.
[15:23] <mdiggory> we would want multiple spider lists
[15:23] <stuartlewis> How easy is it to get 1) and 3) going in the next 48 hours?
[15:23] <mdiggory> one for manual additions, others for the download script
[15:24] <stuartlewis> I'm not convinced (1) will work. It may be that only 1 of Google's thousands of crawlers will download robots.txt, and the rest will then obey it. That method would only stop the bot that requested robots.txt, and not all the others.
[15:24] <stuartlewis> Can't we use user-agent as a quick and dirty fix?
[15:24] <mdiggory> We beed to add Agent Header filtering
[15:25] <mdiggory> beed = need
[15:25] <stuartlewis> (the dspace.log -> solr convertor uses this, and instantly wipes out 70-80% of all item views)
[15:25] <mdiggory> not quick and dirty / we need multiple approaches
[15:25] <mdiggory> we need that code to be centralized and reusable in the UI as well
[15:25] <mhwood> Heh, we need RFC3514 extended to also define the Bad Bot Bit.
[15:25] <stuartlewis> But I think due to time constraints we need to choose 1 that will give us 80% success, and then work on the others later?
[15:26] <tdonohue> could we get away with a "good enough" version....(1) user agent filtering, (2) manual spider exclusion list
[15:26] <mdiggory> I tend to agree
[15:26] <mhwood> The original bug was just to put *something* in the file when it ships.
[15:26] <mdiggory> mhwood: the problem is this
[15:27] <mdiggory> the size of the list is not scalable
[15:27] <mdiggory> it cannot be placed into the solr QueryFilter
[15:27] <mdiggory> as the QueryFilter has limitations on the length of the query
[15:27] <stuartlewis> So what can usefully be put in the file for RC2?
[15:28] <mdiggory> The real underlying change is that bot exclusion need to be proactive, not after the fact
[15:28] <richardrodgers> or do we just doc how to add to it manually?
[15:29] <tdonohue> mdiggory: I'm still confused on what the code currently does. Does it ONLY exclude based on what is listed in spiders.txt, or does it do anything else?
[15:29] <mdiggory> I need to allocate some time here to solve it.
[15:29] <stuartlewis> mdiggory: Any chance of that happening before the end of the week, as we'd planned to release RC2 by Friday.
[15:29] <mdiggory> The code currently appends the spider bot list to the statistics graph generation query
[15:29] <mhwood> By "proactive", you mean keeping bot accesses out of the data, not straining them out of search results later?
[15:30] <mdiggory> the Solr repo holds all the spider bot reads as well as real user reads
[15:30] <mdiggory> mhwood: correct
[15:30] <tdonohue> mdiggory: thanks...i understand a bit better now how it's just using spiders.txt
[15:31] <mdiggory> We can break this up into smaller parts and put some in rc2 and the other in next iteration
[15:31] <stuartlewis> mdiggory: Hopefully there will be no more iterations before 1.6 final is released, so this is our last chance to try and do the best we can for 1.6.
[15:32] <mdiggory> understandable
[15:32] <tdonohue> mdiggory: any time to devote this week on your end?
[15:32] <tdonohue> I could help...I know Solr, but I need to dig into this specific code
[15:32] <stuartlewis> We're aiming for a mid-to-late Feb release, pending no nasties jumping out of RC2.
[15:33] <stuartlewis> Can we agree on an initial tactic (e.g. user agent filtering from a config file) and look at how we get that implemented quickly?
[15:33] <tdonohue> +1 I think user agent filtering would be nice to try to implement quickly...and leave anything else for after 1.6
[15:34] <stuartlewis> +1 user agent filtering, ditto rest for post 1.6
[15:34] <mdiggory> I think we can approach these projects individually as long as we agree on a couple important convergence points
[15:34] <mdiggory> 1.) User Agent Filtering that Stuart created in his log processor need to be callable from UI code
[15:35] <mdiggory> 2.) We need to rewrite the SpiderDetector to use multiple files (configure which is "manual")
[15:35] <mdiggory> 3.) Rewrite the Spider list downloader to put each list in a separate file
[15:36] <mhwood> The downloader is the only piece which has to know which file(s) are "manual".
[15:36] <mdiggory> 4.) Adjust SolrLogger to not log events that come from bots to Solr
[15:36] <mdiggory> mhwood: true enough
[15:36] <stuartlewis> 1.) My code actually used reverse lookups on the domain names as dspace.log doesn't hold user-agent. This tactic would be too slow anyway for solr, so stick with user-agent.
[15:36] <mdiggory> 5.) The CLI app to prune the Solr repo
[15:37] <mdiggory> But we want to have a list of User Agents, same as spider IP's
[15:37] <stuartlewis> I'd be happy to release 1.6 with just (1), so I think that needs to be priority. Any volunteers to tackle it?
[15:37] <mdiggory> so we have something to compare against that can be updated manually or via download when necessary
[15:38] <tdonohue> mdiggory: correct, we need a configurable list of User Agents...but that's not what stuart's code does
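Editorial sketch of the "configurable list of User Agents" idea discussed above: a request is treated as a spider when its User-Agent header contains any configured pattern, so the hit can be dropped before it is logged to Solr. UserAgentFilter is a hypothetical class, not the code that shipped in DSpace, and the patterns shown in the usage note are illustrative assumptions.

```java
// Illustrative sketch only: UserAgentFilter is a hypothetical class, not DSpace code.
final class UserAgentFilter {

    /** Configured patterns, trimmed and lower-cased for case-insensitive matching. */
    private final String[] patterns;

    UserAgentFilter(String... configuredPatterns) {
        patterns = new String[configuredPatterns.length];
        for (int i = 0; i < configuredPatterns.length; i++) {
            patterns[i] = configuredPatterns[i].trim().toLowerCase(java.util.Locale.ROOT);
        }
    }

    /**
     * True if the User-Agent header contains any configured pattern.
     * A missing or blank header is treated as "not a spider" here; that is a
     * policy choice made for this sketch, not something decided in the meeting.
     */
    boolean isSpider(String userAgent) {
        if (userAgent == null || userAgent.trim().isEmpty()) {
            return false;
        }
        String agent = userAgent.toLowerCase(java.util.Locale.ROOT);
        for (String pattern : patterns) {
            if (!pattern.isEmpty() && agent.contains(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller might build the filter once from a config file, for example new UserAgentFilter("googlebot", "slurp", "msnbot"), and skip the statistics event whenever isSpider(request.getHeader("User-Agent")) returns true. Substring matching keeps the list short, at the cost of occasionally matching an ordinary browser whose agent string happens to contain a pattern.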
[15:38] <mdiggory> I'm willing to invest some time in the others prior to the rc
[15:38] <mdiggory> you want to cut the RC on friday?
[15:39] <mdiggory> stuartlewis: 1.) My code actually used reverse lookups on the domain names as dspace.log doesn't hold user-agent. This tactic would be too slow anyway for solr, so stick with user-agent.
[15:39] <stuartlewis> If possible, Friday would be good.
[15:39] <mdiggory> how do you get a User Agent header from a RDNS?
[15:39] <stuartlewis> You don't, but the hostname of the machine gives it away.
[15:40] <stuartlewis> if ((dns.endsWith(".googlebot.com.")) ||
[15:40] <stuartlewis> (dns.endsWith(".crawl.yahoo.net.")) ||
[15:40] <stuartlewis> (dns.endsWith(".search.msn.com.")))
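Editorial note: the three lines above are only the condition from Stuart's log converter. A self-contained version might look like the sketch below. The class and method names are hypothetical; the three suffixes are the ones quoted, and the trailing dot is the root label of a fully qualified name returned by the reverse lookup.

```java
// Illustrative sketch only: CrawlerHostCheck is a hypothetical wrapper around the quoted condition.
final class CrawlerHostCheck {

    /** Hostname suffixes taken from the fragment quoted in the log. */
    private static final String[] CRAWLER_SUFFIXES = {
        ".googlebot.com.",
        ".crawl.yahoo.net.",
        ".search.msn.com."
    };

    /** True if a reverse-resolved hostname belongs to one of the listed crawler domains. */
    static boolean isCrawlerHost(String dns) {
        if (dns == null || dns.isEmpty()) {
            return false;
        }
        String name = dns.toLowerCase(java.util.Locale.ROOT);
        // Normalise to the fully qualified form so both "host" and "host." are accepted.
        if (!name.endsWith(".")) {
            name = name + ".";
        }
        for (String suffix : CRAWLER_SUFFIXES) {
            if (name.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

Because each suffix starts with a dot, a look-alike name such as "fakegooglebot.com." does not match. As noted in the discussion, a reverse lookup per request would be too slow for live logging, so this approach only suits offline log conversion.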
[15:41] <mdiggory> I think that if you configure it, Tomcat does this RDNS lookup
[15:41] <stuartlewis> So, any volunteers to look at excluding hits from entering solr based on a user-agent exclusion list?
[15:41] <stuartlewis> mdiggory: But since user-agent is passed anyway, why bother?
[15:42] <mdiggory> I said I would invest some time in it
[15:42] <mdiggory> stuartlewis: quite true ;-)
[15:42] <tdonohue> I can help look into it, if Mark needs help.
[15:42] <stuartlewis> Ok - thanks Mark & Tim.
[15:42] <tdonohue> Are we wanting something by Friday on the User Agent stuff?
[15:42] <stuartlewis> If possible?
[15:43] <mdiggory> I'll probably only have this evening / tomorrow evening to look at it.
[15:43] <richardrodgers> do we want to advertise RC2 for testing purposes?
[15:43] <stuartlewis> Ok - we'll re-evaluate the situation on Friday then.
[15:43] <mdiggory> There's already UA detection code in both the JSPUI and XMLUI that Larry added
[15:43] <stuartlewis> richardrodgers: Yes - we'll be doing that.
[15:43] <tdonohue> mdiggory: if you can give me some "hints" based on your analysis this evening/tomorrow, I might be able to help along quickly
[15:44] <mdiggory> I'll be online, let's bounce it around in the ticket.
[15:44] <tdonohue> richardrodgers: yes, we'll need to advertise RC2. But, I think as we still don't have a "set in stone" date, we're forced to wait to send out a message
[15:44] <tdonohue> mdiggory: sounds good...i'll be online as well
[15:45] <stuartlewis> One final patch I'd like to put in 1.6: I'm worried that the session invalidation code we put in to tighten up security is possibly going to break some authN methods such as shibboleth. Would anyone have a problem if I therefore made it dspace.cfg option, defaulting to invalidate, but with the option to turn that off?
[15:46] <stuartlewis> Anyone unhappy with me doing that?
[15:46] <tdonohue> stuartlewis: not quite sure...don't understand full context -- but seems like reasonable logic
[15:46] <richardrodgers> no - could always be removed if unneeded
[15:46] <stuartlewis> Ok - thanks.
[15:46] <mhwood> Sounds okay. We can take it out later if not needed.
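Editorial sketch of the option Stuart proposes: a boolean setting that defaults to invalidating the session and can be switched off for authentication methods that break. The property key below is a hypothetical placeholder, not the name used in dspace.cfg, and SessionPolicy is not a DSpace class.

```java
// Illustrative sketch only: the key name and class are hypothetical.
final class SessionPolicy {

    /** Hypothetical property key; the real dspace.cfg name is not given in the log. */
    static final String KEY = "example.session.invalidate";

    /**
     * Returns true unless the property is explicitly set to "false".
     * A missing or blank value falls back to the secure default (invalidate).
     */
    static boolean shouldInvalidate(java.util.Properties config) {
        String raw = config.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return true;
        }
        return !"false".equalsIgnoreCase(raw.trim());
    }
}
```

Treating anything other than an explicit "false" as true keeps the tightened security behaviour in place when the setting is mistyped or omitted.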
[15:46] <stuartlewis> I think that is the 1.6 update completed. Thanks.
[15:47] <mdiggory> stuartlewis: I agree
[15:47] <tdonohue> ok. sounds good :) Next agenda item was Google Summer of Code.
[15:47] <tdonohue> I'm assuming we all want to do this again this year?
[15:47] <mdiggory> I'm worried about the session recreation stuff.
[15:47] <tdonohue> http://groups.google.com/group/google-summer-of-code-discuss
[15:48] <mdiggory> I was trying to find time to start planning, I'm excited to see you beating me to it
[15:48] <tdonohue> we still have a while to decide...but the application procedure for organizations runs March 8-12th
[15:48] <stuartlewis> Yes - we should do GSoC again - but make sure we run it tied in to a 1.7 (or other) release so that we actually see the benefit of some of the work in a release. I think I'm right in saying that to date, over the past 4 years, no GSoC code has ever made it into a release.
[15:48] <mdiggory> my recommendation start working on this now. Last year we were behind on everything
[15:48] <kshepherd> i don't mean to sound cynical, but how much GSoC work actually gets used by the dspace community?
[15:49] <tdonohue> kshepherd: I think stuartlewis is right...none of it has been used...but I think the results have sometimes been informative
[15:49] <mdiggory> I use it and if we can get to the point where we are properly modularizing, they would be better taken advantage of
[15:49] <tdonohue> (and it's unfortunate that none has ever been used)
[15:49] <kshepherd> ok
[15:49] <stuartlewis> I think we need to define our scope for 1.7, and assign GSoC projects based on that. Not *all* need to do that - some can be speculative work, but it would be nice to see some payback from GSoC into DSpace for the effort the mentors put in.
[15:50] <mdiggory> it's not that none has ever been used, it's that the expectation in the "maintenance" group here is that everything that is contributed to dspace should be shoehorned into the codebase under trunk or it has no value
[15:50] <mdiggory> that is a wrong perspective
[15:50] <mhwood> Agree, but we haven't had anywhere else to put it.
[15:50] <kshepherd> hm
[15:50] <tdonohue> mdiggory: i agree...that's why I said the work has been "informative"...it's not worthless by any means
[15:51] <mdiggory> we have a perfectly good "modules" and "sandbox" for the projects
[15:51] <mdiggory> The problem with what's happened last year and in the past is one of "follow through"
[15:52] <mhwood> Good for developing, but where is the "1.6 tested add-ons" ?
[15:52] <mdiggory> it's actually the mentors' fault that the code did not continue on and get into the trunk or modules where appropriate (myself included)
[15:52] <mdiggory> that's why it's critical that the code go into the SCM
[15:53] <tdonohue> ok. well it sounds like we all agree that GSOC would be good. we just need to organize/scope better this year?
[15:53] <stuartlewis> tdonohue: Yes - agreed.
[15:53] <mdiggory> yes, and I'd like to suggest that with now being Duraspace, we might consider inviting the Fedora folks to participate
[15:54] <mdiggory> and open up the scope a bit
[15:54] <stuartlewis> +1 if they want to join the party.
[15:54] <richardrodgers> sure, its really a matter of interested mentors from their side
[15:55] <stuartlewis> tdonohue: Are you the best person to be org admin this year as you are in the middle of it all, or would prefer to find a volunteer?
[15:55] <tdonohue> mdiggory: good thought...I can ask that of the fedora folks...but we do need their involvement
[15:55] <mdiggory> we couldn't include them with out it ;-)
[15:56] <tdonohue> I'd prefer a volunteer on GSOC (if anyone is willing), though I'm here for support in helping branch with Fedora (if that works out)
[15:56] <tdonohue> As I'll admit I've *never* been a mentor, it may not be best for me to lead it
[15:56] <mdiggory> What we need is an Admin
[15:57] <stuartlewis> vhollister: Any ideas of a suitable candidate?
[15:57] <tdonohue> she's offline I believe...she had to head out
[15:57] <tdonohue> what do you need out of an Admin, maybe I don't quite understand?
[15:57] <mdiggory> Admin is responsible for organizing the schedule and meeting activities, also with mediation and getting the invoices/travel receipts for GSoC Mentors summit to Michele etc
[15:58] <tdonohue> Ok. makes sense then (this is all new to me) :)
[15:58] <mdiggory> Mentors last year were not proactively involved with the GSoC Mentors community; unless we can alleviate that, the work falls on the Admin's shoulders
[15:59] <mdiggory> we could make more "roles" and distribute that workload
[15:59] <tdonohue> I'd be willing to share in those Admin functions. But, as I don't have the background as some of you, you may have better ways to keep GSOC moving (and not stagnating)
[16:00] <tdonohue> But, I'm willing to try and pester and keep things moving along :)
[16:00] <tdonohue> So. If anyone else is interested in helping with some of the organizational aspects, I'd appreciate it. We'll leave it at that for now. We still have some time before we even apply for GSOC
[16:01] <tdonohue> Final topic: if you read my email , you saw my note about a "call for proposals" around improving our release process/timeline, etc.
[16:01] <mdiggory> TBH, I've been doing most of this work since Rob left, and based on what I've seen at the Mentors summit, I think I need a break, someone with time to invest into talking a lot with students and mentors would be appropriate. They need to be keeping the gears properly greased
[16:02] <tdonohue> I'm not wanting to "open up a can of worms" right now, but I just wanted to get everyone thinking about it
[16:02] <mdiggory> I didn't see it yet
[16:02] * jtrimble (~jtrimble@maag127.maag.ysu.edu) Quit (Quit: Leaving)
[16:02] <tdonohue> mdiggory: thanks for letting me know
[16:02] <tdonohue> https://sourceforge.net/mailarchive/forum.php?thread_name=4B69AE6B.9060407%40duraspace.org&forum_name=dspace-devel
[16:02] <tdonohue> see #4 in this list
[16:03] <tdonohue> essentially, I've created a new Wiki page for any proposals anyone may want to bring up (so we can begin bringing ideas more out in the open, etc): http://wiki.dspace.org/index.php/Proposals
[16:03] <mdiggory> shorter cycles, more releases. +1
[16:04] <tdonohue> The first type of thing I'd really love feedback on is our Release Process, the general release procedures, cycle length, etc. So, if anyone has any proposals, I'd love it if you wrote something up or just send me an email
[16:04] <mdiggory> asynchronous releases, more modules with separate release cycles
[16:04] <mhwood> The way to include input from repo managers, I think, is to sit in on their discussions and have them sit in on ours.
[16:04] * stuartlewis isn't sure the community is ready for asynchronous releases.
[16:04] <mdiggory> its unclear to me how to sit in on "their" discussions
[16:04] <tdonohue> You'll also notice that Val & I wrote up an initial proposal on including repo managers: http://wiki.dspace.org/index.php/ReleaseAdvisoryTeamProposal
[16:05] <tdonohue> This is just a brainstorm by Val & I, and we'd love feedback, response, likes/dislikes, etc.
[16:05] <mhwood> Yeah, as yet there's no "their" there. When there's a forum, we need to follow it.
[16:05] <mdiggory> my concern is that too much structure in the process will reduce forward innovation
[16:06] <mhwood> Too much structure and we bog down. Not enough and we quickly build things no one uses.
[16:06] <tdonohue> So, in general, I guess, if there are any ideas that have been in the back of your mind that you'd like to voice or suggest -- now is a great time to write up a small bulleted list, or paragraph, etc.
[16:06] <grahamtriggs> stuartlewis: I tend to agree - it's a nice bit of idealism in gaining development momentum / involving more developers, but big question mark over if people maintaining an installation wants lots of little things going on, or synchronized releases
[16:07] <mdiggory> That is why modularizing development is also important. If something is built that no one uses, then it just doesn't get used.
[16:08] <grahamtriggs> I'll also note that quite a few modular environments only really got their act together through synchronizing release schedules - most notably Eclipse
[16:08] <mhwood> Better to listen to what people say they would like to use, and not build stuff no one wants.
[16:08] <mdiggory> being modular, if it doesn't get used, it isn't in the way of those modules that are used
[16:08] <tdonohue> Ok...I really did mean to leave it at that. I didn't want to start up too much discussion/argument right now (though if you have time, feel free). I just wanted to get everyone thinking about things :)
[16:08] <mdiggory> mhwood: people will build what they want
[16:08] <mdiggory> we do, our clients do, you do
[16:10] <tdonohue> So, with that...I'd like to suggest we close the meeting. But again, I encourage you all to add proposals, comment on the ones that are there, etc.
[16:11] <mhwood> OK, seconded.
[16:11] <tdonohue> (and let me know if anyone has questions)
[16:11] <tdonohue> ok...meeting closed :)
[16:11] <richardrodgers> thanks all, bye
[16:11] * richardrodgers (~richardro@pool-173-76-16-252.bstnma.fios.verizon.net) Quit (Quit: richardrodgers)
[16:11] <kshepherd> cheers folks
[16:13] <mdiggory> grahamtriggs: I'm not sure I would say Eclipse "got its act together" ;-)
[16:14] <grahamtriggs> mdiggory: neither would I really, but it's definitely a lot closer to staging a performance with its annual releases
[16:15] <mhwood> There is, anyway, something to be said for gathering up all the modules and putting them through a release cycle together.
[16:20] * carynn (~80c8e115@gateway/web/freenode/x-ismqbhcjuupjsbfs) Quit (Quit: Page closed)
[16:27] <mdiggory> I never argued against that.
[16:28] * PeterDietz (~PeterDiet@ACK5859s3.lib.ohio-state.edu) has joined #duraspace
[16:28] <mhwood> True. There's also something to be said for releasing the modules as they become ready, for those who want to use them before they've gone through system test.
[16:33] <kshepherd> :q
[16:33] <kshepherd> err
[16:33] <grahamtriggs> There are lots of things that could be said to be useful for different reasons. But if you asked me to guess the community's view, I would put them in the "so what modules / versions work together then?" camp
[16:35] * keithg (~keith-noa@lib-kgilbertson.library.gatech.edu) has left #duraspace
[16:36] <mhwood> Some will want a nice virtually-boxed version that is all tested together. Some will want a particular module as quickly as it can be stabilized and be willing to work with the maintainers on any rough edges.
[16:38] <mdiggory> Yes, we provide both types of support to our clients.
[16:39] <mhwood> So I think this argues for a hybrid strategy: modules release when they are believed ready, but from time to time we put the whole package through a release process as a unit.
[16:51] <tdonohue> a slightly different twist is the Ubuntu release schedule...every few releases is "LTS" Long Term Support, highly tested/stable release....the others still work, but are more for people who want the latest features before they make it into the next LTS release. (not the same as completely modular releases, but a similar concept which allows "early adopters" to get to features before they may be considered fully tested)
[16:53] <mhwood> The thing is: a Linux distro is 99% selected releases of components from elsewhere. That may not be the best model for DSpace, which is mostly itself.
[16:56] <mdiggory> All the work we currently try to contribute back to DSpace is assessed for modularity and placed into the codebase accordingly
[16:57] <mdiggory> dspace-services is a module, work on improving it should be done under the release cycle for it. Which may or may not align with dspace trunk
[16:58] <mdiggory> That is why I brought it up in the meeting. However, I answered my own question
[16:58] <tdonohue> mhwood: not saying it is...just pointing out a different way of "packaging up" specific versions of modules and saying: "these seem semi-stable together -- so early adopters may be interested in these latest versions", and then later "repackaging" and going through the full release process
[16:58] <mdiggory> because I released dspace-services 2.0.0 prior to our cutting the first release candidate for 1.6.0
[16:59] <mdiggory> since there are no "bug fixes" for it, we are still targeting that version for the 1.6.0 release
[17:00] <mdiggory> if there were work done in the next year to address bugs or new features in dspace-services, and we release those in interim or official release builds, your repository is more than welcome to experiment with utilizing them by bumping up the version number
[17:01] <mdiggory> without having to fester over merging source code and recompiling something you may not even comprehend
[17:01] <tdonohue> mdiggory: but the key to that is 'bumping up the version number'. 99% won't want to do that cause they may not be as "supported" as out-of-the-box
[17:02] <mdiggory> sorry, my experience is that clients want things "fixed, working and supported" by us, not necessarily the community at large... thus the stability issue is a localized phenomenon.
[17:03] <mdiggory> extrapolating that back to my experiences at MIT, and it still holds true.
[17:03] <tdonohue> yea...but even there you were at one of the rare institutions that had a committer on staff :)
[17:04] <tdonohue> of the 700 institutions using DSpace, only a handful have committers :)
[17:04] <tdonohue> (and you'd probably be surprised that not many more seem to have even dedicated programmers for their IR -- or at least, not that I've heard...that's more hearsay)
[17:05] <kshepherd> tdonohue: that's certainly true for .nz at least
[17:05] <mdiggory> If you need to customize DSpace, you have three choices.
[17:05] <mdiggory> 1.) hire a company, 2.) hire a person, 3.) do it yourself
[17:07] <mdiggory> All I am saying is that the local work always comes first, unless there is some mandate to have it returned to the community
[17:08] <mdiggory> this happens because of funding.
[17:08] <mdiggory> for development
[17:09] <tdonohue> yea..I understand your points. My counterpoint is that, although modular releases are nice for us (us being the committers), the majority of the DSpace institutions probably won't care and won't really use them. This isn't to say I'd be against module releases...I just don't think that's a big selling point for most of our DSpace community
[17:09] <PeterDietz> I wonder if the crowd sourcing translations idea might be suitable for a GSoC project
[17:10] <kshepherd> PeterDietz: nice idea
[17:10] <mdiggory> It will be when the modules provide features not seen in the trunk
[17:10] <mhwood> It goes beyond "won't care" and all the way to "will it break if I drop out-of-sync bits into it?"
[17:10] <mdiggory> such as faceted search and discovery.
[17:10] <mdiggory> or usage statistics
[17:10] <mdiggory> or CAS Authentication,
[17:10] <mdiggory> or Authority Control Plugins
[17:11] <mdiggory> or SRW/U
[17:11] <tdonohue> mdiggory: even with nice cool features like that, most the community won't have the staff or funding to be an "early adopter" -- they'll likely wait till it's fully supported in the next release
[17:11] <tdonohue> I think mhwood is also right that people fear "falling out of sync"
[17:11] <mhwood> tdonohue, some features may *never* be bundled in the next release.
[17:12] <PeterDietz> I'm also wondering about using some tool like Google Moderator to capture IR manager's / developer's / user's ideas and then to have them vote them up and down
[17:12] <tdonohue> PeterDietz: I actually had never seen that (google moderator)...interesting
[17:12] <kshepherd> PeterDietz: mm.. the lack of 'vote against' is the only thing stopping JIRA from serving that purpose, imho
[17:13] <mhwood> Some modules will be core DSpace; some will be add-ons forever. Some may be adopted into the core eventually.
[17:13] <mdiggory> I think we have clearly shown that if you release a module for DSpace and support it properly, there will be a demand for it.
[17:14] <mdiggory> Actually, my objective is always that the core be "reduced" mhwood, that there is difference between "supported" and "packaged" into the release
[17:16] <grahamtriggs> mhwood: Linux distro isn't the greatest example of modular releases. Anything that needs to be loaded by the kernel results in lots of tight coupling and upgrade issues on minor revisions (ie. VM software)
[17:19] <mhwood> I think we need a little more discussion of how we think modularity will be used. But for now I'm expected at home. Sorry to always be rushing off....
[17:21] * mhwood (~mhwood@2001:18e8:3:171:218:8bff:fe2a:56a4) has left #duraspace
[17:23] <tdonohue> I'll admit, I'm being a bit of the "devil's advocate" here. I don't disagree about modularity and addons. I'm just trying to make sure we are keeping in mind that we (the committers/developers) may want asynchronous module releases more than others in the community (at least until it's very easy to add/swap modules in an Admin UI, similar to WordPress, etc.)
[17:27] <grahamtriggs> mdiggory: I see that as limited perspective though. Yes, people will desire interesting new functionality. And so you start with one, and can say - well, it works with these versions of DSpace.
[17:28] <grahamtriggs> But then you have more and more modules, and then you aren't saying I want module X... you are saying I want module A,C,E,F,G... and they all have to work together in the same system
[17:29] <grahamtriggs> if the releases are falling out of sync with each other, the likelihood of you building a stable system that contains all the modules that you want decreases
[17:30] <mdiggory> I don't disagree. But that is not what we have at this time: if A,B are hardcoded in the core and C is incompatible with B, then you cannot have A,C,E,F,G
[17:31] <grahamtriggs> that's what happened with Eclipse... at one point, it was virtually impossible to build even a barely working IDE with the features you wanted, due to version incompatibilities across the modules
[17:31] <grahamtriggs> I'm still not a fan of Eclipse, but at least you stand half a chance of having an IDE that will actually start with the annual releases
[17:32] * tdonohue thinks that's exactly why he gave up on Eclipse and switched to NetBeans
[17:34] <tdonohue> I agree with modularizing, and not hardcoding as much into the "core". But, I think we still need to be able to provide a stable, "out-of-the-box" application (which could just be a well-tested set of modules) that we can concentrate support on
[17:45] * vhollister (~440e1e7e@gateway/web/freenode/x-jjgjdgbwdlwlglyr) Quit (Quit: Page closed)
[18:08] * tdonohue (~tdonohue@c-98-228-50-55.hsd1.il.comcast.net) Quit (Quit: Leaving.)
[19:16] * stuartlewis (~stuartlew@gendiglt02.lbr.auckland.ac.nz) Quit (Quit: stuartlewis)
[20:05] * mdiggory (~mdiggory@64.50.88.162.ptr.us.xo.net) has left #duraspace
[21:49] * mdiggory (~mdiggory@cpe-66-74-212-9.san.res.rr.com) has joined #duraspace
[22:13] * mdiggory (~mdiggory@cpe-66-74-212-9.san.res.rr.com) Quit (Quit: mdiggory)
[22:25] * mdiggory (~mdiggory@cpe-66-74-212-9.san.res.rr.com) has joined #duraspace
[23:00] * mdiggory (~mdiggory@cpe-66-74-212-9.san.res.rr.com) Quit (Quit: mdiggory)
[23:18] * mdiggory (~mdiggory@cpe-66-74-212-9.san.res.rr.com) has joined #duraspace

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.