#duraspace IRC Log

IRC Log for 2009-07-15

Timestamps are in GMT/BST.

[0:21] * grahamtriggs (n=grahamtr@cpc3-stev1-0-0-cust857.lutn.cable.ntl.com) has joined #duraspace
[0:22] * grahamtriggs (n=grahamtr@cpc3-stev1-0-0-cust857.lutn.cable.ntl.com) Quit (Client Quit)
[0:40] * mdiggory (n=mdiggory@cpe-76-176-188-83.san.res.rr.com) Quit ()
[2:40] * mdiggory (n=mdiggory@cpe-76-176-188-83.san.res.rr.com) has joined #duraspace
[4:58] * grahamtriggs (n=trig01@195.128.10.96) has joined #duraspace
[5:23] * mdiggory (n=mdiggory@cpe-76-176-188-83.san.res.rr.com) Quit ()
[6:38] * grahamtriggs (n=trig01@195.128.10.96) has left #duraspace
[7:05] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) has joined #duraspace
[7:10] * grahamtriggs (n=trig01@195.128.10.96) has joined #duraspace
[8:06] * mhwood (i=mwood@mhw.ulib.iupui.edu) has joined #duraspace
[8:10] -christel- [Global Notice] Hi all, I need to remove a server from production, it will be somewhat noisy and approximately 5,000 users will be affected. Apologies for the inconvenience and have a good day.
[9:10] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-f7600575c9208bd0) has joined #duraspace
[9:10] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-f7600575c9208bd0) Quit (Client Quit)
[10:19] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) Quit (Read error: 104 (Connection reset by peer))
[10:20] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) has joined #duraspace
[11:51] * tdonohue (i=80ae241d@gateway/web/freenode/x-c90f92d969527dbd) has joined #duraspace
[11:54] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-e19415a0ea5452a8) has joined #duraspace
[12:00] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-e19415a0ea5452a8) Quit ("Page closed")
[12:02] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-9a6227d7b0fdae68) has joined #duraspace
[12:07] * mdiggory (n=mdiggory@64.50.88.162.ptr.us.xo.net) has joined #duraspace
[12:07] <bradmc> Anyone around for dspace committers?
[12:08] <mhwood> Aye.
[12:08] <bradmc> Sorry for the slight delay. Topics?
[12:08] <bradmc> Update on the orphan jira items idea: I'm still working on it, as soon as I have our intern lined up, we'll schedule the first one (it's summertime, harder to get people).
[12:10] <mdiggory> Hi, likewise, just getting settled in
[12:12] <mdiggory> There the Experimental Confluence
[12:12] <mdiggory> Space
[12:12] <mdiggory> 2.) DSpace 2.0
[12:13] <mdiggory> 3.) DSpace Statistics / Event Services
[12:13] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-9a6227d7b0fdae68) has left #duraspace
[12:13] * cod3r (i=844850ae@gateway/web/freenode/x-559a7ea6583f8fd2) has joined #duraspace
[12:13] * cod3r (i=844850ae@gateway/web/freenode/x-559a7ea6583f8fd2) has left #duraspace
[12:13] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-67b23a0570ad2fb5) has joined #duraspace
[12:14] <mdiggory> 4.) Possible Cocoon 2.2 Block issues with 1.5.2 / 1.6
[12:15] * gaurav_hiiii (i=844850ae@gateway/web/freenode/x-67b23a0570ad2fb5) has left #duraspace
[12:16] <bradmc> Not quite sure we have a quorum, but it's logged, so we'll take a shot. Mark, let's go ahead in your order.
[12:16] <mdiggory> You take what you can get
[12:17] <mdiggory> So, Brad and I discussed setting up some space in the Fedora confluence instance rather than setting up an entirely separate instance.
[12:17] <mdiggory> Chris has setup a space for us just a short while ago here:
[12:18] <mhwood> One organization with two Confluences does sound strange. :-)
[12:18] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) Quit (Read error: 104 (Connection reset by peer))
[12:18] <mdiggory> http://fedora-commons.org/confluence/display/DSTEST
[12:18] <grahamtriggs> You are not permitted to perform this operation.
[12:18] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) has joined #duraspace
[12:19] <mdiggory> you personally?
[12:19] <mhwood> And neither am I.
[12:19] <mdiggory> correct, we are excluding Graham from being able to edit the wiki from this point forward
[12:19] <mdiggory> ;-)
[12:19] <grahamtriggs> Yeah, it could be taking exception to me. I am logged in ;)
[12:20] <mdiggory> So our first issue, permission ;-)
[12:20] <grahamtriggs> Fix the permissions, point a 'wiki.dspace.org' like URL at it: +1. Ok next topic ;)
[12:20] <mdiggory> I have an export/conversion of the dspace wiki to experiment with placing here
[12:21] <bradmc> Let's be clear about the status of that wiki space. I assume we're going to operate in an investigation mode until some date. Graham says today, I'm thinking something more like August <something>
[12:21] <mdiggory> bradmc: correct, that was going to be my next comment
[12:22] <mdiggory> We have some work to do getting it properly ported.
[12:22] <bradmc> Shall we go week to week and see when we want to propose flipping a switch?
[12:22] <mdiggory> We do not want a repeat of last time
[12:22] <grahamtriggs> I was joking about going on the initial import. But yes, a move asap would be beneficial. Particularly if we get the Scroll plugin going and get focus documentation efforts on the new Wiki
[12:22] <mdiggory> we can have the documentation be a project separate from the wiki porting
[12:22] * grahamtriggs thinking that we need to document 1.6 sooner rather than later
[12:23] <mdiggory> I think we need to start identifying documentation concerns as JIRA issues?
[12:23] <bradmc> Note that on documentation, Jeffrey Trimble is working hard and ready for input. We should figure out how that interrelates with the wiki. I still need to add that Jira field for doc status, I'll get that done.
[12:24] <mdiggory> It was recently pointed out to me that someone was delinquent in updating the build, install and webapp customization instructions in 1.5.1
[12:24] <grahamtriggs> http://www.k15t.com/confluence/display/TEST/Scroll+Test+Space
[12:25] <bradmc> Yes, that issue is on Jeff's list, and when he's ready with a doc update, we'll post it on ds.org, independent of any other release schedules.
[12:25] <bradmc> The question is getting him any text any of you want to write.
[12:26] <mhwood> Need a doc Component in JIRA?
[12:27] <mdiggory> This is why I think we want JIRA issues for it, to track the activity. Right now it's happening in private emails, and while I am going to contribute, it would be easier to manage in JIRA
[12:28] <mhwood> +1 capture doc. bugs, improvements in JIRA just like other parts of the product.
[12:29] <bradmc> I'm fine with that; how does it compare / relate to issues in other components that have documentation associated with them?
[12:29] <mhwood> That also makes it easier to not have to have Jeff do it all and remember it all.
[12:30] <mdiggory> I think having a component called doc makes sense
[12:31] <mhwood> If a doc. issue is part of a code issue then it probably belongs with the code component. But many doc. issues are primarily about the documentation itself.
[12:31] <mdiggory> I don't think it's a problem, you can select multiple components right?
[12:32] <mdiggory> seems "components" are more like "tags"
[12:32] <grahamtriggs> you could also link a doc issue with relevant code issues
[12:33] <mdiggory> and I wonder if its more important to have them represent salient focus areas in the project rather than concrete subprojects
[12:34] <bradmc> Okay, so we'll add the component for Docs, and also add the Doc status field to all issues, with Graham's proposed states: needed, not required, in attachment(s),
[12:34] <bradmc> in description, in comments
[12:34] <mhwood> Try a "documentation" component now and reorganize later if needed?
[12:35] <mdiggory> if its easy for Brad to setup +1
[12:36] <bradmc> Done.
[12:36] * bradmc wipes sweat from brow.
[12:38] <mhwood> Best if the documentation gardener announces this?
[12:38] <bradmc> I'll follow up with Jeffrey. Shall we move to next issue?
[12:38] <mdiggory> best if the documentation gardener first knows about it
[12:40] <mdiggory> quick 2.0 update is more related to the two GSoC projects now related to it
[12:40] <mdiggory> https://scm.dspace.org/svn/repo/modules/storage-fedora/trunk/
[12:40] <mdiggory> and
[12:40] <mdiggory> https://scm.dspace.org/svn/repo/modules/rest/trunk/
[12:41] <mdiggory> will hold those GSoC project work activities
[12:41] <mdiggory> the project reorganization that we have done to date in DSpace 2.0 has settled down and I am deeming it concluded
[12:42] <mdiggory> the ultimate outcome is that modules from dspace 1.x and DSpace 2.x can be managed/maintained within the same repository space
[12:42] <mdiggory> https://scm.dspace.org/svn/repo/modules/
[12:43] <mdiggory> which currently contains a mix of i18n, alpha and beta grade addons for dspace
[12:44] <mdiggory> In DSpace 2.0 these will all have their own release schedules
[12:44] <mhwood> Wow, quite a list.
[12:44] <mdiggory> and versioning
[12:45] * bradmc completes adding "Documentation Status" to Jira's field list.
[12:45] <mdiggory> Yes, there's quite a mix of stuff here now
[12:45] <mdiggory> and the status of all of it needs to be addressed somehow
[12:46] <mdiggory> some are projects I dragged along from MIT's activities like dspace-history, policy,sesame
[12:47] <mdiggory> dspace-srw is the OCLC SRW webapp with just the DSpace driver
[12:47] <mdiggory> SIP is a contribution from Larry Stone for constructing DSpace METS packages
[12:48] <mdiggory> radius ia a contributes radius authenication servlet for jspui
[12:48] * mdiggory thinks he needs a new keyboard
[12:49] <mhwood> A single page gathering all modules together, with what-it-does, current version, status, link to more info., link to SCM would probably be well received.
[12:49] <bradmc> Okay, stats and events?
[12:50] <mdiggory> Geoip, Services, Solr and Solr-stats are @mire's current activity around statistics
[12:50] <bradmc> mhwood: +1
[12:50] <mdiggory> I've passed some work around concerning the UsageEvent changes proposed for 1.6, but am very open to propositions
[12:52] <mdiggory> More specifically, I may augment the proposal to instead be replacing the calls to UsageEvent in JSPUI and XMLUI with
[12:52] <mdiggory> https://scm.dspace.org/svn/repo/modules/dspace-services/trunk/src/main/java/org/dspace/services/LoggingService.java
[12:52] <mdiggory> and having a default UsageEvent based version of this
[12:52] <mdiggory> in dspace-api
[12:53] <mdiggory> The large question is "Is there anyone using UsageEvent in the community?"
[12:53] * grahamtriggs (n=trig01@195.128.10.96) has left #duraspace
[12:53] <mhwood> Besides me? :-)
[12:53] <mdiggory> Minho did use it in their port to 1.5
[12:54] <mdiggory> The major improvement I have been looking to get in place for UsageEvent is multicasting (or stacking of UsageEvent listeners)
[12:55] <mdiggory> And the easiest way I've discovered to do this is to adopt usage of the DSpace 2.0 EventService implementation
[12:55] <mdiggory> rather than implementing one from scratch
[12:56] <mdiggory> This would replace the LoggingService api posted above with the DSpace 2.0 Event Service API
[12:57] <mdiggory> https://scm.dspace.org/svn/repo/dspace2/core/trunk/api/src/main/java/org/dspace/services/EventService.java
[12:58] <mdiggory> AbstractUsageEvent would then be an EventListener rather than a Service
[12:58] <mdiggory> https://scm.dspace.org/svn/repo/dspace2/core/trunk/api/src/main/java/org/dspace/services/model/EventListener.java
[12:58] <mhwood> UsageEvent only exists because folks didn't think the other event service should be extended to non-content-model events. If there's a better way to do it all with one mechanism now, let it be so.
[12:59] <mhwood> +1 making usage event listeners stackable. I should have done that to begin with.
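The stackable (multicast) listener idea can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the names UsageMessage, UsageListener and UsageDispatcher are hypothetical and are not the DSpace UsageEvent API or the DSpace 2.0 EventService API. The sketch also assumes the message carries only an action plus a (type, id) pair, and that dispatch is synchronous and in-process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class StackedListenersSketch {

    /** Hypothetical message: identifies the object by (type, id) only, never holds the object. */
    static final class UsageMessage {
        final String action;
        final int objectType;
        final int objectId;

        UsageMessage(String action, int objectType, int objectId) {
            this.action = action;
            this.objectType = objectType;
            this.objectId = objectId;
        }
    }

    /** Hypothetical listener contract: receive the message and return quickly. */
    interface UsageListener {
        void receive(UsageMessage message);
    }

    /** Hypothetical dispatcher that delivers each message to every registered listener. */
    static final class UsageDispatcher {
        private final List<UsageListener> listeners = new CopyOnWriteArrayList<>();

        void register(UsageListener listener) {
            listeners.add(listener);
        }

        void fire(UsageMessage message) {
            for (UsageListener listener : listeners) {
                try {
                    listener.receive(message);
                } catch (RuntimeException e) {
                    // A failing listener must not prevent the others from seeing the message.
                }
            }
        }
    }

    /** Registers three listeners (one of them broken) and fires a single message. */
    public static List<String> demo() {
        List<String> seen = new ArrayList<>();
        UsageDispatcher dispatcher = new UsageDispatcher();
        dispatcher.register(m -> seen.add("log:" + m.action + ":" + m.objectType + "/" + m.objectId));
        dispatcher.register(m -> { throw new IllegalStateException("broken listener"); });
        dispatcher.register(m -> seen.add("stats:" + m.objectId));
        dispatcher.fire(new UsageMessage("view", 2, 42));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Listeners are called in registration order, and a listener that needs more than the identifiers has to look the object up itself.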
[13:00] <mdiggory> Minho changes UsageEvent purpose to include content model changes
[13:02] * bradmc starts lurking while attending another meeting.
[13:02] <mdiggory> There is one dilemma I currently face, the UsageEvent API and the 2.0 Event service do not pass actual DSpaceObjects through to their listeners/providers
[13:02] <mdiggory> just the identifiers for the objects (type, id)
[13:03] <mhwood> So we lose information across the adapter?
[13:03] <mdiggory> This is a controversial question: is it fair to pass the object itself or is that really bad form?
[13:03] <mdiggory> mhwood: we lose the ability to efficiently see the state of the source object
[13:05] <mdiggory> The original EventManager designers took a very conservative stance that passing the object through opens the system up to possible inefficiencies
[13:05] <mdiggory> and state problems
[13:06] <mdiggory> they perceived a community of Event Consumer developers that would hold onto the objects and try to do things to them
[13:07] <mdiggory> like change their state and call update etc
[13:07] <mdiggory> or place them on a queue for processing later asynchronously
[13:08] <mdiggory> so the decision was made for the EventManager/Event to be just a message, not an object container
[13:08] <mdiggory> I tend to agree and think we have the appropriate capability in 2.0... I don't think I want to port the 2.0 caching tier to DSpace 1 however
[13:09] <mhwood> And then the design becomes dependent on the designer's understanding what event consumers will need to know about those objects.
[13:10] <mdiggory> CachingService is the "savior" here in 2.0. You can always get back to the original object through it in your providers etc
[13:10] <mhwood> I think I agree, though, that one should have to do a bit of work to get references to the actual objects, to make one think about problematic uses.
[13:12] <mdiggory> But, without caching, looking those object up again in the EventListeners will be costly and slow the system down unnecessarily when the object is already instantiated somewhere else in the request cycle
[13:12] <mdiggory> In 1.x that current location is the Context object
[13:14] <mhwood> How much re-lookup is likely? What additional information would be necessary in the message to obviate the majority of re-lookups?
[13:15] <mdiggory> The problem is that makes the Service have to be much more complex about guessing what information is relevant
[13:15] <mhwood> Or: what are event consumers wanting to do for themselves that could be done for them at low cost?
[13:16] <mdiggory> for instance, in the statistics case, we want to take the DSpaceObjects metadata, attribute and parentage to construct a contextual logging event
[13:16] <mdiggory> this means we need a DSpaceObject to retrieve those from.
[13:16] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) Quit (Read error: 54 (Connection reset by peer))
[13:17] <mdiggory> Likewise, we want certain other metadata attributes like contributors , titles, dates, etc
[13:17] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) has joined #duraspace
[13:18] <mdiggory> my point is that it would be best if the Event / EventService did just identify the object and force the providers/consumers to get a reference to it on their own, but it would also be very important to assure that is efficient by passing through a cached version of the object in something like the Context
[13:18] <mhwood> And the event sink needs to know all of that? I kind of expected that enough identifying information would be passed through to allow a whole 'nother process to look up what it wants later.
[13:19] <mdiggory> mhwood: I think we are in agreement here.
[13:21] <mhwood> The question is not just what do we need to know, but when do we need to know it? We only need to know the title of an item, for instance, when we want to do something with it (like display it).
[13:22] <mhwood> At least, I have difficulty imagining many folk wanting to do real-time statistics on the *titles* of things.
[13:24] <mdiggory> but authors of things and subjects of things, these are more relevant
[13:26] <mdiggory> I think we will be safe with passing through the Context for the moment somehow so that the provider can complete its processing using its cache if available.
[13:27] <mdiggory> In 2.0 Caching is hidden from the application inside the service
[13:27] <mhwood> Should we settle for BIG WARNINGS on the event interfaces, that you are in an area that is supposed to be fast and not hang onto references?
[13:28] <mdiggory> So doing something like new DSpace().getStorageService().getEntity(String id) is how everything would access the object
[13:28] <mdiggory> and the service may or may not implement caching for efficiency
[13:30] <mhwood> Meanwhile, though, most of us aren't there yet, and we may have to just live with the idea that this is one place where Things Will Be Different In 2.0.
[13:30] <mdiggory> Yes, but it also means that the EventService implementation needs to assure this as well it can't be anything but a synchronous in process service
[13:31] <mdiggory> I'm trying to bring the Mountain to Mohammed
[13:32] <mdiggory> by providing a bare bones ServiceManager for 1.x
[13:33] <mhwood> My thinking is that this is a case of "things will be easier and simpler in the next release, but for now please be careful in these ways."
[13:35] <mdiggory> mhwood: I'm trying to prove that we can be both careful and innovative at the same time
[13:38] <mhwood> Well, you know better than I at this point how much time and effort are involved in bringing these properties back to 1.x vs. just waiting until one is on 2.x and they're there already.
[13:38] <mdiggory> I'm being careful by bringing this stuff to the community for discussion prior to committing anything directly on 1.6, likewise, by working on these projects independently in the modules space, we allow them to evolve publicly to a state of maturity rather than dumping them into the dspace-api/etc
[13:42] <mdiggory> mhwood: the problem is that 2.x needs more developers who understand the development model we are designing here. There will need to be stepping stones between 1.x and 2.x; those are important both to allow 2.x more time to mature and for 1.x to benefit from all the learning going on there
[13:43] <mdiggory> maybe time to switch to that last uncovered topic
[13:43] <mdiggory> since you are here
[13:43] <mhwood> And backporting some of that model also helps to grow such developers. Got it.
[13:43] <mdiggory> Cocoon 2.2 blocks in 1.5.2
[13:45] <mhwood> Cocoon blocks: I am in a twisty maze of passages, all different and most undocumented.
[13:46] <mdiggory> this is a dilemma with cocoon, the nice thing about Cocoon 3.0 is that you will probably no longer look in the Cocoon community for this detail and instead look in the more properly documented Spring community instead.
[13:47] <mhwood> Figures, I was just about ready to see if I could do something about the appalling Cocoon documentation.
[13:47] <mdiggory> so where I get confused concerning the issue that's arising is that it doesn't happen for everyone
[13:48] <mhwood> That is weird. There's speculation that spiders are involved -- that some of them are trying to follow bogus paths through the namespace and triggering the bug.
[13:49] <mhwood> A colleague here thinks our troubles became noticeable just after he registered our site with a specialized indexer. I don't know which but I can find out.
[13:50] <mhwood> My recollection is that we've had this problem at a low level for a few weeks now (since we went to 1.5.2) but it came to a head just last week. That would jibe with his observations.
[13:51] <mhwood> I have a stupid little script that just asks for "HEAD http://hostname/" periodically, and I saw this site disappear and reappear at intervals but could never catch it (until last week).
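A periodic HEAD check of this kind could look something like the Java sketch below. It is not mhwood's script: the class name StatusProbe, the timeouts, and the requirement to pass the URL on the command line are all assumptions made for illustration. It issues one HEAD request and reports the status code, which is also enough to see whether a nonexistent page answers 404, 200 or 500.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusProbe {

    /** Sends a single HEAD request and returns the HTTP status code of the response. */
    public static int headStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("HEAD");
        // Report the status of this exact URL rather than of a redirect target.
        conn.setInstanceFollowRedirects(false);
        // Timeouts are arbitrary values chosen for this sketch.
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StatusProbe <url>");
            return;
        }
        System.out.println(args[0] + " -> " + headStatus(args[0]));
    }
}
```

Running it from cron or a loop against both the site root and a deliberately bogus path would show outages and wrong status codes side by side.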
[13:52] <mdiggory> when it comes to spiders... you need a good initial tier of defense
[13:52] <mdiggory> For my previous work at MIT it was mod_cband
[13:53] <mdiggory> using it we could both throttle good crawlers and bad agents
[13:53] <mhwood> I'll check into that one.
[13:53] <mdiggory> and deliver more bandwidth for true users
[13:54] <mhwood> One reason I've been squawking about getting the status codes right is that I want to provide sitemaps to help the search engines use less of our cycles.
[13:58] <mdiggory> I agree that that is important
[13:58] <mhwood> Hmmm, independent of the current problem, we probably need to figure out how to reliably take up new releases of our dependencies in a modular way. Ugh, the Cocoon documentation issue again.
[13:59] <mdiggory> is the current sitemaps implementation inadequate?
[13:59] <mhwood> I can't tell -- I can't inform Google of our sitemaps until I can verify the site, and I can't verify the site until they get a 404 for a nonexistent page.
[14:00] <mdiggory> in the Cocoon case, we need to consider not using the "parent" module containers and sepect the specific artifacts we only require
[14:00] <mdiggory> sepect?
[14:01] <mhwood> I don't know what it means either, but I was going to let it go -- the overall meaning is clear.
[14:03] <mhwood> If my diagnosis is correct, asking for "/no-such-page.html" returns a 500 because that case is shipped off to a file reader which throws FileNotFoundException, which the <handle-errors> setup is not prepared for.
[14:03] <mdiggory> we should also consider utilizing "excludes" more in our dependencies to secure against getting unwanted dependencies that may be added by the third party
[14:03] <mhwood> "excludes" sounds good.
[14:04] <mdiggory> So it looks for the file in the webapp directory?
[14:04] <mdiggory> or is it looking elsewhere
[14:05] <mdiggory> like static/...
[14:05] <mhwood> I think it's looking for the file in the right place. It's just that, if there *is* no such file in the right place, we should return 404 but this goes to the <otherwise> clause and returns 500.
[14:06] <mhwood> Yes, I think static/ is the place. I'm drowning in sitemaps right now....
[14:07] <mhwood> Unwinding a bit, Google is deliberately looking for a file that they expect not to find, to check that we are returning the proper status -- I suppose, so that they can rely on our status codes in other cases.
[14:09] <mhwood> I actually fixed this issue once, inbetween 1.5.1 and 1.5.2, but someone introduced the static/ stuff and now these requests go down a very different path. (And I got bogged down trying to provide an integration test so it doesn't get "unfixed" again....)
[14:14] <mhwood> *sigh*, I see I've done it again. I should've posted an issue in JIRA, but I thought, "I'll just fix it and post the fix with the problem", it took way longer than I thought, and now nobody knows about it and I'm still tinkering.
[14:14] <mdiggory> Looking as well
[14:15] <mdiggory> the thing is that the themes were supposed to be where static content originates from
[14:15] <mdiggory> not some separate directory
[14:16] <mdiggory> the way themes is mounted above and below the static stuff seems odd
[14:16] <mhwood> The static-content fun begins around line 365 of the top sitemap.
[14:17] <mdiggory> I do not like this...
[14:17] <mdiggory> <map:match pattern="*.txt">
[14:17] <mdiggory> <map:read src="static/{1}.txt" />
[14:17] <mdiggory> </map:match>
[14:17] <mdiggory> <map:match pattern="*.html">
[14:17] <mdiggory> <map:read src="static/{1}.html" />
[14:17] <mdiggory> </map:match>
[14:18] <mhwood> Yup, that's where I think the 500 problem came in.
[14:18] <mdiggory> something I've learned in creating themes is that you want static content to be mapped explicitly where it is located, I.E. the way themes is setup to resolve static content
[14:19] <mdiggory> I vote this goes away...
[14:20] <mhwood> That should solve my status code problem. IIRC the first fix was down in the theme stuff, and was simply to specify 404 in such cases instead of letting it default to 200.
[14:21] <mdiggory> I'm still trying to grok why the last themes/themes.xmap mount is required
[14:21] <mdiggory> I need to work on this all being delivered as blocks in 2.0
[14:22] <mhwood> That is the only reference to themes.xmap that I can find. We do need one.
[14:23] <mdiggory> Why this is hurting my head is that the "themes/*/**" gets the theme sitemap and I think is where the theme is executed
[14:24] <mdiggory> so its unclear to me the purpose of the later mount
[14:25] <mhwood> themes/themes.xmap does the theme selection and sets the default theme.
[14:25] <mdiggory> I also suspect that
[14:25] <mdiggory> <map:match pattern="static/*/**">
[14:25] <mdiggory> <map:read src="static/{1}/{2}" />
[14:25] <mdiggory> </map:match>
[14:25] <mdiggory> should be
[14:26] <mdiggory> <map:match pattern="static/**">
[14:26] <mdiggory> <map:read src="static/{1}" />
[14:26] <mdiggory> </map:match>
[14:28] <mdiggory> I think I see, they are trying to allow things like "/my-file.html" to have global level static html, but do not want to store it at the global level...
[14:28] <tdonohue> mdiggory and mhwood....just catching up on your chat....I had added that "static/" stuff in based on a feature request for globally static content (across themes)...e.g. robots.txt, flat HTML "help" pages, google webmaster tools html, etc.
[14:29] <tdonohue> i had thought that i had tested it to ensure it throws 404 when not found, but it's possible that a bug got in there somewheres
[14:29] <mdiggory> tdonohue: I think we should be careful about how we allow static content into xmlui
[14:30] <tdonohue> understood...but, we needed a way to allow for 'robots.txt' and static help pages (Similar to jspui), and I had presented this as a solution...if there's a better way, feel free to rework it though
[14:30] <mdiggory> we do want a good robots.txt in place regardless
[14:30] <tdonohue> i was just explaining the *need* that it was meant to fulfill
[14:31] <mdiggory> yes, certainly, we have a need, just thinking about best practice
[14:31] <mdiggory> sorry didn't mean to sound disapproving if I was
[14:32] <mhwood> Well, robots.txt is a special case: we need one in every DSpace and it has a specific name and path. So maybe special-case that file in the sitemap, and then think about other cases?
[14:32] <tdonohue> also that pattern matching against "*.txt" and "*.html" should only be matching against "http://mydspace.edu/*.txt" and "http://mydspace.edu/*.html"...so, I'm not entirely sure that this is where the 500 problem is...
[14:33] <mdiggory> I need to setup a test instance
[14:33] <mdiggory> doh, wait, I already do...
[14:34] <mhwood> The 500 problem, if I understand it, is that if you ask for /nosuchfile.txt or /nosuchfile.html, and the file doesn't exist, the reader throws an exception that <handle-errors> isn't configured to understand, so it goes through <otherwise> and comes out as 500.
[14:35] <tdonohue> mhwood: right, i think that's 100% correct...so, in my mind, why not change the <handle-errors> to catch that exception? I also wonder if the <otherwise> should just return 404 by default, to be honest
[14:35] <mdiggory> http://pastebin.com/m74bf005d
[14:37] <tdonohue> hmm...so, why isn't that error being caught by the "not-found" case? If it's a FileNotFoundException, shouldn't it be caught by that?
[14:37] <mhwood> Well, the 500 catch-all seems appropriate to me. I tried defining a FileNotFoundException case, and it does pick up the error, but for some reason now I get 200 for that case no matter that I specified "status-code='404'". I got pulled away before I could figure out why.
[14:37] <tdonohue> odd
[14:38] <mdiggory> it is happening at
[14:38] <mdiggory> <map:match pattern="*.html">
[14:38] <mdiggory> <map:read src="static/{1}.html"/>
[14:38] <mdiggory> </map:match>
[14:38] <mhwood> the not-found exception matches org.apache.cocoon.ResourceNotFoundException.
[14:38] <mdiggory> so /foo.html doesn't work while /static/foo.html does work
[14:39] <mdiggory> and returns the appropriate code
[14:39] <mhwood> The reader throws FileNotFoundException.
[14:41] <tdonohue> mdiggory: so, you only get that exception if you go to http://my.dspace.edu/foo.html, but not if you go to http://my.dspace.edu/static/foo.html ? am I understanding that correctly?
[14:41] <mdiggory> correct, likewise /xmlui/bar/foo.txt returns the correct 404 error page as well
[14:43] <mdiggory> likewise http://localhost:8082/xmlui/foo.bar returns appropriate 404 error page
[14:44] <tdonohue> something is definitely broken then...cause this works fine with our IDEALS XMLUI (on Dspace 1.5.2): http://www.ideals.uiuc.edu/robots.txt versus http://www.ideals.uiuc.edu/static/robots.txt (the latter throws a 404 as its not
[14:45] <tdonohue> (latter throws a 404 as the path isn't valid)
[14:47] <mdiggory> it seems that the FileNotFound exception is wrapped in a org.apache.cocoon.ProcessingException: Failed to process reader
[14:49] <mdiggory> we may try adding the exception and seeing if we can unroll it in the configuration
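What "unrolling" a wrapped exception means can be shown in plain Java. The sketch below is not Cocoon code and does not use Cocoon's exception selector. WrapperException is a hypothetical stand-in for a wrapping exception such as the ProcessingException mentioned above, and the 404/500 mapping is an assumption for illustration. The point is only that the decision has to look through the cause chain rather than at the outermost exception.

```java
import java.io.FileNotFoundException;

public class UnrollSketch {

    /** Hypothetical stand-in for an exception that wraps the real cause. */
    static class WrapperException extends Exception {
        WrapperException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Walks the cause chain ("unrolls" it) and returns 404 if a
     * FileNotFoundException is found anywhere in it, otherwise 500.
     */
    public static int statusFor(Throwable thrown) {
        for (Throwable current = thrown; current != null; current = current.getCause()) {
            if (current instanceof FileNotFoundException) {
                return 404;
            }
        }
        return 500;
    }

    /** A missing file wrapped in another exception should still map to 404. */
    public static int demoWrapped() {
        Throwable wrapped = new WrapperException(
                "Failed to process reader",
                new FileNotFoundException("static/foo.html"));
        return statusFor(wrapped);
    }

    /** An unrelated failure falls through to the catch-all 500. */
    public static int demoOther() {
        return statusFor(new IllegalStateException("something else broke"));
    }

    public static void main(String[] args) {
        System.out.println("wrapped missing file -> " + demoWrapped());
        System.out.println("unrelated failure    -> " + demoOther());
    }
}
```

Matching only on the outermost exception type would send the first case to the catch-all branch, which is the 500 behaviour described in the discussion.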
[14:52] <mdiggory> http://cocoon.apache.org/2.2/core-modules/core/2.2/1379_1_1.html
[14:58] <tdonohue> that looks like the right way to go in general...we should be unrolling more of these exceptions, likely
[14:59] <mdiggory> I did this, but as mhwood found, the exception is caught and handled, but the status code is 200 instead of 404
[15:00] <mhwood> Um, I just tried http://www.ideals.uiuc.edu/static/robots.txt and I got a 200 with the "we can't find the page you asked for" page.
[15:05] <tdonohue> mhwood...yep, you are right...that is broken...and i notice accessing a non-existing file throws the same error: http://www.ideals.uiuc.edu/foo.html
[15:05] <tdonohue> it is broke
[15:14] <mdiggory> digging... have errors being handled, but status code is still 200
[15:14] <mdiggory> I added an exception case for java.io.FileNotFoundException
[15:15] <mhwood> Yes, that's what I tried. Got 200 instead of the configured 404, haven't yet worked out why.
[15:15] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) Quit (Read error: 104 (Connection reset by peer))
[15:23] <mdiggory> apparently we can set the status to anything but 404
[15:23] <mdiggory> 404 always returns 200
[15:23] <mhwood> <marvin>Oh, good.</marvin>
[15:24] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) has joined #duraspace
[15:24] <tdonohue> crap..how the hell did that happen?
[15:24] <mdiggory> http://www.nabble.com/Cocoon-2.2-error-handlers-tt23272049.html#a23272049
[15:26] <tdonohue> so, that's just how cocoon 2.2 works? that's crappy
[15:26] <mdiggory> http://wiki.apache.org/cocoon/ErrorHandling
[15:26] <mdiggory> there's something more going on here
[15:26] <mdiggory> Error type and !ResourceNotFound exception
[15:29] <mhwood> That's the old type= stuff. Deprecated in 2.2, use exception selector instead.
[15:37] <mdiggory> yes, not trying to use it, just seeing legacy cruft around 404
[15:38] * mdiggory feels like he's in the "bing" commercial http://www.google.com/search?q=cocoon+2.2+404+status+header&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
[15:57] <mdiggory> ok, it seems to be an issue in the serialize component and occurs throughout the application, even when manakin theme rendering occurs
[15:59] <mdiggory> looking at the Reference sitemap I see there's no real error handling there
[16:01] <tdonohue> fun fun...well, i gotta head to a few meetings, but i'll try and get back on #duraspace tomorrow morning to help out as necessary
[16:01] * tdonohue (i=80ae241d@gateway/web/freenode/x-c90f92d969527dbd) Quit ("Page closed")
[16:12] <mdiggory> https://issues.apache.org/jira/browse/COCOON-2218
[16:12] * mdiggory feels he has come full circle
[16:17] <mhwood> So it just spontaneously un-broke at some point? Humph.
[16:19] <mhwood> So, can you tell whether 2.2-dev is later than released 2.2? Presumably yes, otherwise we supposedly should not see this problem.
[16:20] <mdiggory> testing
[16:22] <mdiggory> mhwood: did you try cocoon-servlet-service-impl-1.2.0.jar
[16:23] <mhwood> Not with this one. I did with the problem I just filed in JIRA today, but haven't been able to make the app. start.
[16:29] <mhwood> I *am* in a twisty little maze of passages.
[17:01] * bradmc (n=bradmc@207-172-69-79.c3-0.smr-ubr3.sbo-smr.ma.static.cable.rcn.com) Quit ()
[17:02] <mhwood> Gotta go, thanks everybody, back tomorrow.
[17:09] * mhwood (i=mwood@mhw.ulib.iupui.edu) has left #duraspace
[17:13] <mdiggory> mdiggory is stuck between a rock and a hard place
[17:35] * bradmc (n=bradmc@c-71-233-44-204.hsd1.ma.comcast.net) has joined #duraspace
[17:39] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) Quit (Read error: 104 (Connection reset by peer))
[17:39] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) has joined #duraspace
[17:50] * bradmc_ (n=bradmc@c-71-233-44-204.hsd1.ma.comcast.net) has joined #duraspace
[17:50] * bradmc (n=bradmc@c-71-233-44-204.hsd1.ma.comcast.net) Quit (Read error: 104 (Connection reset by peer))
[18:09] * bradmc_ (n=bradmc@c-71-233-44-204.hsd1.ma.comcast.net) Quit ()
[19:21] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) Quit (Read error: 104 (Connection reset by peer))
[19:33] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) has joined #duraspace
[20:21] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) Quit (Read error: 104 (Connection reset by peer))
[20:49] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) has joined #duraspace
[20:52] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) Quit (Read error: 104 (Connection reset by peer))
[20:53] * mdiggory (n=mdiggory@64.50.88.162.ptr.us.xo.net) Quit (Read error: 110 (Connection timed out))
[20:55] * stuartlewis (n=stuartle@gendig21.lbr.auckland.ac.nz) has joined #duraspace
[22:10] * mdiggory (n=mdiggory@cpe-76-176-188-83.san.res.rr.com) has joined #duraspace

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.