#duraspace IRC Log


IRC Log for 2010-11-17

Timestamps are in GMT/BST.

[1:24] * stuartlewis (~stuartlew@s-lewis.itss.auckland.ac.nz) Quit (Quit: stuartlewis)
[4:31] * ksclarke (~kevin@adsl-39-74-150.clt.bellsouth.net) Quit (Ping timeout: 264 seconds)
[4:49] * ksclarke (~kevin@adsl-39-74-150.clt.bellsouth.net) has joined #duraspace
[6:29] -card.freenode.net- *** Looking up your hostname...
[6:29] -card.freenode.net- *** Checking Ident
[6:29] -card.freenode.net- *** Your forward and reverse DNS do not match, ignoring hostname
[6:29] -card.freenode.net- *** No Ident response
[6:29] [frigg VERSION]
[6:29] * DuraLogBot (~PircBot@184.73.169.236) has joined #duraspace
[6:29] * Topic is '[Welcome to DuraSpace - This channel is logged - http://irclogs.duraspace.org/]'
[6:29] * Set by cwilper!ad579d86@gateway/web/freenode/ip.173.87.157.134 on Fri Oct 22 01:19:41 UTC 2010
[6:57] * ksclarke (~kevin@adsl-39-74-150.clt.bellsouth.net) Quit (Quit: Leaving.)
[7:03] * sbayliss (~IceChat7@188-222-88-173.zone13.bethere.co.uk) Quit (Quit: bye!)
[7:06] * Tonny_DK (~thl@130.226.36.117) has joined #duraspace
[8:38] * mdiggory (~mdiggory@ip72-199-217-116.sd.sd.cox.net) Quit (Quit: mdiggory)
[13:14] * grahamtriggs (~grahamtri@62.189.56.2) has joined #duraspace
[13:43] * bradmc (~bradmc@dhcp-18-111-21-94.dyn.mit.edu) has joined #duraspace
[14:30] * tdonohue (~tdonohue@c-98-228-50-55.hsd1.il.comcast.net) has joined #duraspace
[14:41] * ksclarke (~kevin@adsl-39-74-150.clt.bellsouth.net) has joined #duraspace
[15:11] * alxp (~alxp@PC044.ROBLIB.UPEI.CA) has joined #duraspace
[15:45] * mdiggory (~mdiggory@ip72-199-217-116.sd.sd.cox.net) has joined #duraspace
[15:47] * Tonny_DK (~thl@130.226.36.117) Quit (Quit: Leaving.)
[17:22] * bradmc_ (~bradmc@dhcp-18-111-21-94.dyn.mit.edu) has joined #duraspace
[17:22] * bradmc (~bradmc@dhcp-18-111-21-94.dyn.mit.edu) Quit (Read error: Connection reset by peer)
[17:22] * bradmc_ is now known as bradmc
[17:43] * grahamtriggs (~grahamtri@62.189.56.2) has left #duraspace
[19:14] * sandsfish (~sandsfish@dhcp-18-111-15-13.dyn.mit.edu) has joined #duraspace
[19:36] * grahamtriggs (~grahamtri@cpc2-stev6-2-0-cust333.9-2.cable.virginmedia.com) has joined #duraspace
[19:55] * cccharles (~chris@131.104.62.55) has joined #duraspace
[19:56] <tdonohue> DSpace Developer Meeting will be starting in 5 minutes. Today's agenda: https://wiki.duraspace.org/display/DSPACE/DevMtg+2010-11-17
[19:58] * mhwood (~mhwood@2001:18e8:3:171:218:8bff:fe2a:56a4) has joined #duraspace
[19:59] * keithg (~keith-noa@lib-kgilbertson.library.gatech.edu) has joined #duraspace
[20:01] * richardrodgers (~richardro@pool-96-237-109-32.bstnma.fios.verizon.net) has joined #duraspace
[20:01] <tdonohue> Hi all -- Time for the DSpace Devel meeting. Here's the agenda: https://wiki.duraspace.org/display/DSPACE/DevMtg+2010-11-17
[20:01] <tdonohue> Pretty open ended again today -- mostly I wanted to give us time to discuss 1.7, Testathon and any JIRA issues requiring immediate attention for 1.7.0
[20:02] <tdonohue> I listed 2 JIRA issues (in agenda) which had some recent back & forth, that didn't seem to come to a conclusion. Shall we start there? Or is there anything more pressing, PeterDietz?
[20:04] <tdonohue> ok -- we'll just get started with DS-640 then: https://jira.duraspace.org/browse/DS-640 It sounds like there was some disagreement around whether the /browse path should throw a 400, or just go to browse-by-title?
[20:04] <kshepherd> i think 640 just needs a vote, probably...
[20:04] <kshepherd> yeah, i think that's pretty much all that's left to decide
[20:05] <tdonohue> anyone feel strongly about 640, one way or the other?
[20:05] <richardrodgers> I agree we need a vote, but my concern is that we might not be fully informed about consequences.
[20:05] <grahamtriggs> add a method that returns the default index - returns the index configured in a 'browse.default.index' config value, or 'title' if none exists, or throws an exception if it can't even do that
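A minimal sketch of the lookup grahamtriggs describes, written in Python for brevity (DSpace itself is Java). Only the 'browse.default.index' key and the 'title' fallback come from the discussion; the function name, the dict-based config, the set of known indices, and the exception class are hypothetical. It also assumes that a configured value naming an unknown index counts as an error.

```python
class BrowseConfigError(Exception):
    """Hypothetical error: no usable default browse index could be found."""


def default_browse_index(config, known_indices):
    """Return the default browse index name.

    Uses the 'browse.default.index' config value when present,
    falls back to 'title' when it is absent, and raises when the
    chosen name is not one of the known browse indices.
    """
    # Fall back to 'title' only when the key is missing from the config.
    name = config.get("browse.default.index", "title")
    if name not in known_indices:
        raise BrowseConfigError(f"no usable default browse index: {name!r}")
    return name
```

With a config of {"browse.default.index": "dateissued"} and that index defined, the call returns "dateissued"; with an empty config it returns "title" as long as a title index exists.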
[20:06] <tdonohue> richardrodgers: how so?
[20:06] * stuartlewis (~stuartlew@121.98.157.79) has joined #duraspace
[20:06] <richardrodgers> For instance, there is the question of crawlers - and their heuristic url behavior
[20:07] <mhwood> Both humans and robots should get the information they need in this case, to the extent that we can work out what they need. It could return 400 AND the browse-by-title content, if that makes sense.
[20:07] <sandsfish> hm, that's an interesting alternative.
[20:07] <mhwood> We also need to avoid being too clever in trying to guess what crawlers will do.
[20:08] <tdonohue> richardrodgers: I think we already did some basic analysis of that for Google in the last detailed discussion (see DS-640 comments). If you look at these google results, Google at least is treating URLs with different querystrings as *different* URLs: http://www.google.com/search?q=browse++Nuclear+site%3Ahttp%3A%2F%2Fdspace.mit.edu
[20:08] <sandsfish> generally, i imagine if a crawler is looking for something at a URL and it gets valid content back, it will think that is a valid URL. that's just my take.
[20:08] <grahamtriggs> and if a crawler or proxy takes that 400 to mean that nothing under /browse exists / works?
[20:08] <mhwood> I suppose there are no standards to point the way....
[20:09] <richardrodgers> yes, and that's the problem, because there is now an infinite URL space
[20:09] <tdonohue> grahamtriggs -- that is a good point, I guess. Don't know what the standards are for crawlers
[20:09] <richardrodgers> consisting of invalid browse parameters
[20:10] <kshepherd> it doesn't seem right to me that google would do that
[20:10] <mhwood> Can you point to some recommendation which says that Google would be wrong to do that?
[20:10] <kshepherd> that means nobody can run pages which require certain params
[20:11] <richardrodgers> kshepherd: perhaps, but right or not, we don't know
[20:11] <grahamtriggs> why are we only concerning ourselves with what Google does? there are other crawlers, and possibly more importantly, web proxies out there
[20:12] <mhwood> define google {Google and the other 10% of our crawler traffic}
[20:12] <richardrodgers> mhwood: my understanding was that Google Scholar stopped crawling a lot of DSpace sites because of that Cocoon bug that allowed non-404 responses for i=invalid URLS
[20:13] <PeterDietz> hi all. I didn't realize the time.. my first meeting since time change
[20:13] <grahamtriggs> what constitutes an invalid url? passing invalid parameters to a url doesn't [necessarily] make the url invalid
[20:13] <tdonohue> well, sounds like there's a lot of uncertainty. :)
[20:13] * alxp (~alxp@PC044.ROBLIB.UPEI.CA) Quit (Quit: alxp)
[20:14] <richardrodgers> grahamtriggs: again, I'm not taking a stand, just saying we need better intelligence to decide...
[20:15] <tdonohue> shall we go with the option of potential "least concern"? : (1) for invalid browse types, throw 404, (2) for missing browse type, just default to "browse by title"
[20:15] <grahamtriggs> richardrodgers: agreed, but it's still a valid question to have in mind if we are thinking about that
[20:16] * alxp (~alxp@PC044.ROBLIB.UPEI.CA) has joined #duraspace
[20:16] <richardrodgers> I can reach out to Anurag for at least Google Scholar opinion....
[20:17] <tdonohue> yea -- but, Google Scholar acts differently from regular google crawlers (from my understanding). So, even then, I'm not sure if it will make things clearer?
[20:17] <mhwood> tdonohue's suggestion does seem to sharply limit the potential for damage. When we genuinely don't know what to do, 404; when there is a sensible guess, carry it out.
[20:18] <sandsfish> i'd agree with that.
[20:18] <mhwood> Too bad there's no status code for "we had to guess what you wanted, and this may not be it."
[20:19] <tdonohue> my suggestion actually probably should be changed to "(2) for missing browse type, just default to *first* browse index -- so, by default this would be browse by *date*, actually"
[20:19] <richardrodgers> tdonohue: you may be right :(
[20:19] <stuartlewis> +1 for Tim's idea
[20:19] <sandsfish> +1
[20:20] <mhwood> Maybe do this now, and revisit later, attempting to generate some interest among crawler operators in a common recommendation?
[20:20] <keithg> +1, but curious about Google opinion anyway
[20:20] * alxp (~alxp@PC044.ROBLIB.UPEI.CA) Quit (Client Quit)
[20:20] <tdonohue> mhwood & keithg -- yea, I think it's still good to ask the crawlers what they expect. We could change this later as we learn more
[20:21] <richardrodgers> I certainly agree that the internal system error should go...
[20:21] <grahamtriggs> why does Scholar even deal with 'browse interfaces' anyway, and not sitemaps? I know it's nice for them to work with possibly a wider range of sources, but that shouldn't mean ignoring better options
[20:21] <mhwood> Indeed, if there is a declared sitemap, they should just use it and not crawl. "should"
[20:21] <tdonohue> Scholar doesn't use sitemaps (from what I've heard). They do use the Browse by Title, however. Again, this is just hearsay
[20:22] <tdonohue> whereas, normal google spiders *do* use the sitemap
[20:22] <grahamtriggs> actually, they say they want browse by date: http://scholar.google.com/intl/en/scholar/inclusion.html
[20:23] <tdonohue> (we can surely verify this with Anurag though -- he'd be able to tell us the differences between Google Scholar and normal google spiders)
[20:23] <stuartlewis> Jim Rutherford and I met with the scholar people a few years back - they didn't really want to tell us anything. So secretive.
[20:23] <mhwood> Time for Google corporate to ask Google Scholar leadership for the very good reason that they don't use the sitemap listing that corporate spent so much to set up.
[20:23] <tdonohue> grahamtriggs -- my mistake. I knew there was some browse by they used -- must be date :)
[20:24] <stuartlewis> Oh well - are we all happy with Tim's suggestion as where to go? I know both Pere and Kim have looked at this, and both had to stop as we kept changing our collective minds about what we wanted.
[20:25] * kshepherd wonders if repositories turn up in http://academic.research.microsoft.com, or if it's mostly journals
[20:25] <tdonohue> It sounds like we are +3 the idea to just do (1) invalid browse type = 404, and (2) missing browse type = first browse index (default=date)
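The two rules tdonohue summarises can be sketched as a small resolver, written in Python for brevity (DSpace itself is Java). This is an illustration under stated assumptions, not the DS-640 patch: the function name, the `browse_type` parameter, and the example index names are hypothetical, and the ordered list stands in for the configured browse indices.

```python
def resolve_browse(browse_type, indices):
    """Return (http_status, index_name) for a /browse request.

    browse_type: the requested browse type, or None when it is missing.
    indices: configured browse indices in order; the first is the default.
    """
    if browse_type is None:
        # Rule 2: missing browse type falls back to the first browse index.
        return 200, indices[0]
    if browse_type in indices:
        return 200, browse_type
    # Rule 1: an unknown browse type is answered with 404 and no content.
    return 404, None


# Hypothetical index order, with a date index first as in the discussion.
EXAMPLE_INDICES = ["dateissued", "author", "title", "subject"]
```

Under these assumptions, resolve_browse(None, EXAMPLE_INDICES) selects "dateissued" with a 200, while resolve_browse("nonsense", EXAMPLE_INDICES) yields a 404.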
[20:26] <tdonohue> anyone else have a strong opinion, or does this just pass as +3 for now? (and we revisit later after 1.7.0 after we learn more)
[20:26] <richardrodgers> I'd +1 also, we can revisit as we learn more.
[20:27] <mhwood> JIRA should be marked that we are not finished thinking even though there is a fix in place.
[20:28] <tdonohue> Ok -- sounds like the idea passes. Any volunteers to complete the fixes for DS-640?
[20:29] <tdonohue> mhwood -- I'd agree -- either that, or open a second issue and schedule immediately for 1.8.0 (which now exists in JIRA) so that we remember to revisit.
[20:30] <tdonohue> kshepherd? interested in finishing up DS-640 now that we finally know what we want? Or, stuartlewis, would Pere be interested you think?
[20:31] <stuartlewis> I think Pere is busy with end of term assignments now.
[20:32] <kshepherd> i can finish the patch as long as we're truly decided ;)
[20:32] <tdonohue> yes -- we are :) I'll open up a new issue to force us to revisit this idea with 1.8.0, and link it to DS-640
[20:33] <richardrodgers> kshepherd: you will be excused if we go after it again
[20:33] <tdonohue> Ok -- another issue requiring a decision. Should we enable the new Google Scholar <meta> tags by default in 1.7.0? https://jira.duraspace.org/browse/DS-396
[20:34] <stuartlewis> Is there any good reason not to?
[20:34] <tdonohue> I feel rather strongly that we probably *should* enable these by default, just like the DC <meta> tags are enabled by default, and have been since 1.5.x. But, I want to make sure others agree with me
[20:35] <PeterDietz> I think we were just being cautious with introducing an optional feature and not wanting to break anything. But it serves a valuable purpose to have this on by default for all instances
[20:35] <tdonohue> stuartlewis -- not to my knowledge. I just noticed that currently they are not enabled by default in 1.7.0 RC1
[20:35] <mhwood> Yes, default on.
[20:35] <richardrodgers> There is really only one consumer (Scholar), and they seem to want it, so it's their risk
[20:36] <stuartlewis> If no good reason not to, +1 for turning on by default
[20:36] <sandsfish> I think we should enable by default. We also have Scholar looking at what it's outputting, and other than a few default configuration tweaks, they seem to understand what they'll be getting.
[20:36] <tdonohue> sounds like we are in agreement then. sandsfish, do you want to do the honors?
[20:36] <sandsfish> Sure thing.
[20:37] <PeterDietz> I think the only problem with it is that scholar wants to do a lot with the tags, but the dummy data doesn't have anything particularly interesting about it. They were interested in having publisher, patent, journal info, etc. come in through the citation tags
[20:37] <sandsfish> I'll flip the switch in dspace.cfg. tdonohue, also I'm fine with moving the mapping config file to crosswalks if there's general agreement on that. It's neither here nor there to me.
[20:38] <PeterDietz> So I would say it's a useful crosswalk, and we should consider using the crosswalk interface to clean up the implementation, but it's working
[20:38] <tdonohue> ok. time to open up the floor. PeterDietz or others -- are there any other JIRA issues (or testathon suggestions) that need immediate attention from us?
[20:38] <tdonohue> sandsfish -- yea, I think that config should be moved to /config/crosswalks , just to be consistent with where the DC <meta> config file is currently stored.
[20:39] <PeterDietz> I brought up one csv import/export getting broken from the major refactoring that I passed to stuart, but he's since fixed that
[20:40] <PeterDietz> a few issues raised have been from us having inconsistent settings on the testathon site. reinstall cleared out the embargo / discovery settings, so a few testers noticed that.
[20:40] <tdonohue> (whoops -- sorry, PeterDietz. probably was during my update of the server on Friday. my bad!)
[20:41] <PeterDietz> I'm not sure that the testing process has really garnered a thorough investigation of each feature, such that we're certain things aren't broken. The unit testing framework is a big help, so we have more confidence than just having our fingers crossed
[20:42] <tdonohue> PeterDietz & others: How do we all feel about this release so far? Do we want to do an RC2? Do we want to release early? We have some flexibility in the remainder of our schedule: https://wiki.duraspace.org/display/DSPACE/DSpace+Release+1.7.0+Notes
[20:42] <PeterDietz> ...but for the things that have been reported they seem to be producing useful data that we can follow up on. but no blockers have come up... that haven't been resolved
[20:43] <PeterDietz> I'm all for refreshing the demo site after each batch of bugs gets fixed, so over the weekend I would be fine cutting rc2, so that people aren't hindered by known bugs that have already been resolved in trunk
[20:44] <mhwood> It also shows that testing gets results. That might encourage more testers.
[20:44] <sandsfish> I'm for an RC2
[20:45] <tdonohue> so, we don't have to follow this schedule exactly -- but, currently we had: Dec 3 = release RC2 , Dec 6-15 = More testing/bug fixes, Dec 17 = Final Release
[20:46] <tdonohue> but, PeterDietz -- you are welcome to schedule an RC2 early if you'd rather.
[20:47] <tdonohue> (just a reminder -- next week is Thanksgiving in the USA -- so, activity will likely be lower next week, at least from the USA)
[20:47] <richardrodgers> PeterDietz: how labor-intensive for you to cut an RC?
[20:47] <PeterDietz> I'm thinking: RC1 == two weeks ago, RC2 == this weekend, RC3 == three weeks (dec 3), 1.7.0 == Dec 17
[20:48] <tdonohue> sounds good to me. I'm perfectly OK with doing 3 release candidates
[20:48] <grahamtriggs> maybe we should separate RC2 from demo... we can easily update demo to the latest code without releasing RC2. Is it worth us actually pushing out another RC as a release as well?
[20:48] <PeterDietz> I haven't done a sonatype release yet, but the first time through was a learning, and broken process. But it's just following a guide, and takes about an hour, then a lot of waiting.. Sonatype could mean less waiting.
[20:49] <mhwood> A bunch of little stuff probably doesn't need an RC, whereas just 2-3 substantial fixes or one showstopper should be enough.
[20:49] <PeterDietz> I'm not sure of the value of the RC, as I don't think people would be downloading a .zip of the RC2 code. I would almost like to do another RC cut, just so that the final 1.7.0 cut goes smoothly
[20:50] <PeterDietz> I would be fine running demo off of trunk updated weekly
[20:50] <tdonohue> PeterDietz -- yea, it'd probably be good for you to do at least one more RC release, just to get the Sonatype release process down (and well documented)
[20:51] <tdonohue> Also, FYI -- we actually have had some downloads of RC1 from sourceforge. Not a ton, but ~50 total: http://sourceforge.net/projects/dspace/files/
[20:52] <richardrodgers> Still, if even a fraction did a test install, that's good
[20:52] * HardyPottinger (80ce8627@gateway/web/freenode/ip.128.206.134.39) has joined #duraspace
[20:53] <tdonohue> So, PeterDietz -- do you want to update the release schedule on the 1.7.0 Release Notes? That way we all know the next steps
[20:54] <PeterDietz> In that case, I would like to make an effort to improve the getting started guides. DSpace certainly has a rough learning curve, especially if you're expecting to open it in an IDE, and press the green play button. So things like installing should be simple and follow standard conventions, or make sure we link to alternates, such as alt tomcat configs. Or how to move your code into its own source-control-repo
[20:56] <tdonohue> ok. I'm all for that, PeterDietz. We just need a volunteer or two to take a lead on the updates :) I know it's fallen by the wayside recently and hasn't been kept as well up-to-date.
[20:56] * cccharles (~chris@131.104.62.55) Quit (Remote host closed the connection)
[20:57] <PeterDietz> A developer for an institution at DSUG'10 last week was umm, how do I do things to dspace. An honest question, and then I talked about how we added the scholar code, and after I explained a much simpler task, how to display the community-list with a SQL upper(community_name) so that things are sorted case-insensitive, then it all began to make sense for him
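A small sketch of the case-insensitive sort PeterDietz mentions, using Python's built-in sqlite3 module so it runs standalone. The `community` table and the helper function are hypothetical stand-ins, not DSpace's actual schema or code; only the idea of ordering by upper(community_name) comes from the discussion.

```python
import sqlite3


def sorted_community_names(names):
    """Return names ordered case-insensitively via ORDER BY upper(...)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical one-column table standing in for the community list.
        conn.execute("CREATE TABLE community (community_name TEXT)")
        conn.executemany(
            "INSERT INTO community (community_name) VALUES (?)",
            [(name,) for name in names],
        )
        rows = conn.execute(
            "SELECT community_name FROM community "
            "ORDER BY upper(community_name)"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Without the upper() call, a plain ORDER BY under a byte-wise collation would place "Banana" before "apple", because uppercase letters sort ahead of lowercase ones.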
[20:58] <PeterDietz> ok, I'll update the 1.7 release notes to have an updated timeline that follows what our next steps are.
[20:58] <kshepherd> i would call this "getting started with development" rather than just "getting started" ;) i'm pretty sure most installations have no customisations outside of CSS/XSL or some minor JSP edits
[20:58] <tdonohue> I think the idea is a great one. Are you looking for volunteers to help out? Is it just a matter of updating this page with improved/updated content: https://wiki.duraspace.org/display/DSPACE/Guide+to+Developing+with+DSpace
[20:59] <kshepherd> developer guides are awesome, but it would pay to make sure we don't scare people off by talking about IDEs and SQL if they just want to install dspace ;)
[20:59] <tdonohue> definitely a good point. There's (at least) two levels of "getting started" :)
[21:00] <mhwood> Yes. Developers shouldn't be scared so easily, and others would probably be helped more by making installation simpler.
[21:00] <kshepherd> the folks at sun.ac.za do quite a lot of wiki-writing, and i notice more and more people following their installer guides
[21:00] <kshepherd> i wonder if they'd be keen to join in a more centralised effort on the dspace wiki?
[21:01] <tdonohue> kshepherd -- I've actually asked them in the past, never really got a response. But, feel free to ask again :)
[21:02] <tdonohue> Ok -- we're running over time here. Anything else you want to make sure to cover today, PeterDietz?
[21:02] <PeterDietz> P.S. What is the git repository tab in Jira? Is that the fedora folks messing around?
[21:02] <tdonohue> PeterDietz -- yes. Fedora folks are playing with Git. They may start using it more heavily
[21:03] <tdonohue> PeterDietz -- if you have a DSpace Git repo, we can also hook that up to JIRA. I was thinking of playing around more with Git + DSpace myself, but likely won't have a chance till after 1.7.0
[21:03] <PeterDietz> I've switched my whole local setup to git. I'm not going back.
[21:04] <PeterDietz> I don't have any other things to add I guess, so thanks all for being able to ramp up your efforts so that we've been able to do 1.7, not "done", but almost
[21:05] <mhwood> Looks like most Eclipse users choose EGit?
[21:06] <PeterDietz> Netbeans has something nbgit, which is only good to mark up your code as being green, red, and blue. But I think it's still safest to do things from the command line. I don't trust GUIs for this yet
[21:06] <tdonohue> I think most of the Fedora team (at least those in DuraSpace) is using IDEA + Git. Though, I've never tried that
[21:07] <tdonohue> Ok -- sounds like we are wrapping things up here. Feel free to stick around to continue any discussions. But, the meeting itself is officially closed.
[21:08] <richardrodgers> Thanks everyone - bye
[21:08] <kshepherd> cheers all
[21:08] * richardrodgers (~richardro@pool-96-237-109-32.bstnma.fios.verizon.net) Quit (Quit: richardrodgers)
[21:08] <sandsfish> Bye everyone.
[21:08] * sandsfish (~sandsfish@dhcp-18-111-15-13.dyn.mit.edu) Quit (Quit: sandsfish)
[21:09] <mhwood> Thanks all.
[21:12] * HardyPottinger (80ce8627@gateway/web/freenode/ip.128.206.134.39) Quit (Quit: Page closed)
[21:12] <kshepherd> mhwood: were you wanting to do something with ds-720?
[21:12] <kshepherd> i should get onto looking at statistics.items.*
[21:13] <mhwood> No, I just noticed it was still "received" and opened it.
[21:14] <kshepherd> ah yes, i keep forgetting that part ;)
[21:27] * bradmc (~bradmc@dhcp-18-111-21-94.dyn.mit.edu) Quit (Quit: bradmc)
[21:29] * ttt (5229fd08@gateway/web/freenode/ip.82.41.253.8) has joined #duraspace
[21:30] * tdonohue (~tdonohue@c-98-228-50-55.hsd1.il.comcast.net) Quit (Read error: Connection reset by peer)
[21:48] * ttt (5229fd08@gateway/web/freenode/ip.82.41.253.8) Quit (Quit: Page closed)
[21:48] * keithg (~keith-noa@lib-kgilbertson.library.gatech.edu) has left #duraspace
[21:52] * stuartlewis (~stuartlew@121.98.157.79) Quit (Quit: stuartlewis)
[22:02] * ksclarke (~kevin@adsl-39-74-150.clt.bellsouth.net) Quit (Ping timeout: 265 seconds)
[22:05] * mhwood (~mhwood@2001:18e8:3:171:218:8bff:fe2a:56a4) has left #duraspace
[22:21] * ksclarke (~kevin@adsl-39-74-150.clt.bellsouth.net) has joined #duraspace
[22:54] * snail (~stuart@130.195.179.88) Quit (Read error: Connection reset by peer)
[23:42] * grahamtriggs (~grahamtri@cpc2-stev6-2-0-cust333.9-2.cable.virginmedia.com) Quit (Quit: grahamtriggs)
[23:43] * bradmc (~bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) has joined #duraspace

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.