#duraspace IRC Log

IRC Log for 2013-06-12

Timestamps are in GMT/BST.

[6:49] -morgan.freenode.net- *** Looking up your hostname...
[6:49] -morgan.freenode.net- *** Checking Ident
[6:49] -morgan.freenode.net- *** Found your hostname
[6:49] -morgan.freenode.net- *** No Ident response
[6:49] * DuraLogBot (~PircBot@atlas.duraspace.org) has joined #duraspace
[6:49] * Topic is '[Welcome to DuraSpace - This channel is logged - http://irclogs.duraspace.org/]'
[6:49] * Set by cwilper!ad579d86@gateway/web/freenode/ip.173.87.157.134 on Fri Oct 22 01:19:41 UTC 2010
[12:38] * mhwood (mwood@mhw.ulib.iupui.edu) has joined #duraspace
[13:12] * tdonohue (~tdonohue@c-67-177-111-99.hsd1.il.comcast.net) has joined #duraspace
[13:14] * misilot (~misilot@p-body.lib.fit.edu) has left #duraspace
[19:54] * hpottinger (~hpottinge@mu-162198.dhcp.missouri.edu) has joined #duraspace
[19:55] * helix84 (~a@ip4-95-82-147-170.cust.nbox.cz) has joined #duraspace
[19:58] * l_a_p (~chatzilla@31.188.17.226) has joined #duraspace
[19:59] * kstamatis__ (4f6ba400@gateway/web/freenode/ip.79.107.164.0) has joined #duraspace
[20:00] <tdonohue> Hi all, welcome. It's now time for our weekly DSpace Developers Meeting. Today's general agenda: https://wiki.duraspace.org/display/DSPACE/DevMtg+2013-06-12
[20:00] <kompewter> [ DevMtg 2013-06-12 - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DevMtg+2013-06-12
[20:01] * robint (522a6b02@gateway/web/freenode/ip.82.42.107.2) has joined #duraspace
[20:01] <robint> hi all
[20:01] <tdonohue> First and foremost though... I wanted to welcome kstamatis_ (Kostas Stamatis), who is our newest DSpace Committer! Welcome & Congrats, kstamatis_
[20:01] <mhwood> Welcome!
[20:02] <kstamatis__> Many thanks to all
[20:02] <tdonohue> hi as well, robint :)
[20:02] <helix84> welcome, kostas!
[20:02] <kstamatis__> @helix84 Thanks
[20:03] <hpottinger> Welcome, kstamatis__ good to see you here!
[20:03] <robint> Just tried looking up 'welcome' in greek, but I don't think I should risk it :)
[20:03] * bollini (~chatzilla@adsl-ull-225-49.47-151.net24.it) has joined #duraspace
[20:04] <tdonohue> As normal, we're going to start this meeting with a quick review of a few GitHub Pull Requests (just to keep on top of them and make sure we are giving feedback on them)
[20:04] <kstamatis__> As I said in a previous email to the list, I feel a little bit awkward, like a fish out of water.
[20:04] <kstamatis__> I will get used to it soon, however!
[20:04] <mhwood> I remember that feeling.
[20:04] <tdonohue> kstamatis__ : no worries. I honestly think we all start out that way. You are more than welcome to just listen in & also ask plenty of questions if something is confusing or unclear
[20:05] * hpottinger still feels that way, but we're a friendly bunch, you'll see.
[20:05] <tdonohue> So, in terms of Pull Requests, we're starting today at #189: https://github.com/DSpace/DSpace/pull/189
[20:05] <kompewter> [ SSO authentication module - RemoteUser by iwellaway · Pull Request #189 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/189
[20:06] <kstamatis__> @hpottinger I have noticed that (friendly bunch) from the very first moments as a committer!
[20:06] <helix84> what kind of authentication is this?
[20:06] <tdonohue> SSO = "Single Sign-On"
[20:06] <helix84> like when Squid inserts a RemoteUser HTTP header?
[20:06] <tdonohue> but, SSO is vague, I agree
[20:06] <helix84> SSO is a generic term, is it a specific one, too?
[20:07] <helix84> this should have an associated jira issue
[20:07] * aschweer (~schweer@schweer.its.waikato.ac.nz) has joined #duraspace
[20:07] <mhwood> I think it's used generically here. I don't know of a specific meaning.
[20:07] <tdonohue> yea, I agree. I think SSO is meant generically here. I also agree we need to ask for a corresponding JIRA ticket
[20:08] <mhwood> This may refer to some kind of container-managed authn.
[20:08] <helix84> also, what about JSPUI? all auth methods so far have been common, is this one implemented only for XMLUI?
[20:09] <hpottinger> mhwood, I infer the same, from the description.
[20:10] <helix84> this may be useful, I just think it needs a ticket and a bit more background. E.g. what SSO systems work this way and how we can test this.
[20:10] <robint> If we get a Jira ticket we could add a request to look for someone to add the jspui code
[20:10] <aschweer> also, from the comments it looks like the configuration is in the ldap config file -- probably not a good spot
[20:10] <hpottinger> it looks like the code just assumes that if RemoteUser returns a value, all is well, feels like Apache Basic Auth to me
[20:11] <tdonohue> Hmmm... #189 seems to be a "messy" pull request. It's accidentally removing some seemingly unrelated (Stats / Open Search) settings from this sitemap.xmap: https://github.com/DSpace/DSpace/pull/189/files#L4L21
[20:11] <kompewter> [ SSO authentication module - RemoteUser by iwellaway · Pull Request #189 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/189/files#L4L21
[20:11] <mhwood> LDAP? Where?
[20:11] <helix84> another question i have is - could this be potentially used for SSO between our XMLUI and JSPUI (without third-party software)?
[20:11] <aschweer> https://github.com/DSpace/DSpace/pull/189/files#L0R35 but might just be lack of coffee at my end
[20:11] <kompewter> [ SSO authentication module - RemoteUser by iwellaway · Pull Request #189 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/189/files#L0R35
[20:12] <mhwood> That may be just copy/paste problems.
[20:13] <tdonohue> aschweer: yea, I don't think that comment is accurate. There's no such changes in the PR itself
[20:13] <aschweer> well I suppose then at least the comment needs to be fixed ;) but I'll be quiet now and get a coffee
[20:13] <mhwood> It wouldn't be the only class having comments belonging to another class.
[20:13] <tdonohue> In any case, it sounds like we need a JIRA ticket here. We may need some PR cleanup (likely just ensuring he's working against latest "master" and cleanup comments)
[20:13] <mhwood> Yes to ticket and to cleanup.
[20:14] <tdonohue> beyond that though, this all sounds reasonable. We also need JSPUI too.
[20:14] <mhwood> Yes again.
[20:14] <helix84> the ldap comment makes sense
[20:14] <helix84> it's something i used before
[20:14] <helix84> not with dspace
[20:16] <mhwood> Hmmm, it is using LDAP configuration data to make decisions. That may need more cleanup.
[20:17] <helix84> you see, with a RemoteUser header, you know who's logged in, but you still need to fetch user details from somewhere
[20:17] <tdonohue> I added a quick comment to #189, and linked to this discussion log
[20:18] <tdonohue> Feel free to add more suggestions to #189 though, if I didn't properly capture everything.
[20:18] <mhwood> authentication-ldap.login.specialgroups probably shouldn't be used, though. Needs authentication-remoteuser.login.specialgroups.
[20:19] <tdonohue> We'll stop the PR reviews there for today, as we're already nearly 20mins in.
[20:20] <helix84> mhwood: not necessarily. with hierarchical ldap, you get the full DN based on username
[20:20] <helix84> mhwood: and specialgroups are based on full DN
[20:20] <tdonohue> The only other agenda item I had for today was to keep us thinking about DSpace 4.0 release (for Nov/Dec 2013).
[20:21] <tdonohue> We still could use a few more volunteers for the Release Team (to work with mhwood & hpottinger who already volunteered). If anyone else is interested (or even just wanting to learn more), feel free to get in touch.
[20:21] <mhwood> Yes please.
[20:21] <hpottinger> come on, it's fun! :-)
[20:22] <tdonohue> We also probably want to think more about 4.0 Release Schedule. We haven't set any deadlines/timelines yet. But, looking back at last year, for 3.0 our first "deadline" (Code Submission Deadline) was late August...which is rapidly approaching :)
[20:23] <tdonohue> We are more than welcome to change how we want to approach this 4.0 Release though. There's no need to set the same sort of deadlines as 3.0. We can also try something new if we wish. I'm just reporting how things began last year.
[20:24] <helix84> just noting that we were very lenient last year and allowed things to come in late, which resulted in several pushbacks of the date. this year's RT may want to re-think that policy. or not.
[20:24] <mhwood> So, what features do people want to get into 4.0?
[20:25] <mhwood> That is: do you have features that you want to contribute to 4.0? what are they?
[20:25] <helix84> I think DCAT expects the metadata improvements they designed, but I haven't noticed any volunteer for that yet
[20:25] <hpottinger> I think it's worth it to try to get DAO finished up and tested
[20:26] <tdonohue> helix84++ Yes, there were some date pushbacks on 3.0 cause of lenient dates. It seems like we always encounter minor pushbacks in some form, but it would be good to try to minimize them if we can :)
[20:26] <helix84> I don't mean to speak for Joao, but he said he will have time to resume DAO & SpringUI work in August, so he might miss the deadline.
[20:26] <tdonohue> Some other 4.0 early ideas are on our 4.0 Release Page: https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.0+Notes
[20:26] <kompewter> [ DSpace Release 4.0 Notes - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/DSpace+Release+4.0+Notes
[20:27] <hpottinger> Hmmm.... DAO is a huge piece of work, we'd need it early, not late, if at all, I think.
[20:27] <helix84> but he also asked for help with DAO, so volunteers are surely welcome
[20:27] <helix84> it has been marked as 50% complete for months now, no one else has chipped in
[20:28] <helix84> is there no time, no interest or is it unclear what he needs help with?
[20:28] <mhwood> It's never been clear to me what DAO does for us.
[20:28] <tdonohue> I recall from past DAO discussions that there have been several folks interested, but a lot of confusion around how best to chip in (and what's there is not well documented yet).
[20:29] <helix84> mhwood: as I understand it, broadly speaking - storage abstraction
[20:29] <mhwood> I don't deprecate DAO; I don't understand it well enough to deprecate or encourage.
[20:30] <helix84> mhwood: you have surely noticed a lot of "if (oracle)" sprinkled around
[20:31] <mhwood> I wonder if a few stored procedures would fix those.
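The "storage abstraction" helix84 mentions can be pictured with a minimal sketch. This is not the code in PR #161: the `ItemDao` interface, both implementations, the factory, and the SQL strings are hypothetical names made up for illustration. The idea is only that each database dialect lives in its own class, so callers no longer branch on "if (oracle)".

```java
// Hypothetical DAO interface: callers ask for a query and never see the dialect.
interface ItemDao {
    String recentItemsQuery(int limit);
}

// Hypothetical PostgreSQL-flavoured implementation (uses LIMIT).
class PostgresItemDao implements ItemDao {
    @Override
    public String recentItemsQuery(int limit) {
        return "SELECT * FROM item ORDER BY last_modified DESC LIMIT " + limit;
    }
}

// Hypothetical Oracle-flavoured implementation (uses ROWNUM instead of LIMIT).
class OracleItemDao implements ItemDao {
    @Override
    public String recentItemsQuery(int limit) {
        return "SELECT * FROM (SELECT * FROM item ORDER BY last_modified DESC) "
             + "WHERE ROWNUM <= " + limit;
    }
}

// The only place that still knows which database is configured.
class ItemDaoFactory {
    static ItemDao forDatabase(String dbName) {
        return "oracle".equalsIgnoreCase(dbName) ? new OracleItemDao() : new PostgresItemDao();
    }
}

class DaoSketch {
    public static void main(String[] args) {
        ItemDao dao = ItemDaoFactory.forDatabase("oracle");
        System.out.println(dao.recentItemsQuery(10));
    }
}
```

In this sketch the dialect decision is made once, in the factory, rather than at every call site. Whether the real DAO work is structured this way is not stated in the discussion.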
[20:31] <tdonohue> regarding the DAO work. In case anyone else is searching for it, here's the JIRA ticket (DS-1438) and it's also PR#161 https://github.com/DSpace/DSpace/pull/161
[20:31] <kompewter> [ DAO Implementation (Status: 50%) - Community help requested by lyncodev · Pull Request #161 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/161
[20:31] <kompewter> [ https://jira.duraspace.org/browse/DS-1438 ] - [#DS-1438] DAO implementation - DuraSpace JIRA
[20:31] <hpottinger> in order for us to help, we need more of a rallying cry, I think
[20:31] <helix84> then I also heard some keywords like Spring and Hibernate, but you Java guys have to pardon my superficial knowledge
[20:32] <mhwood> Hibernate is an Object/Relational Mapping thingy. I haven't worked with it (or any others) yet.
[20:33] <helix84> so, is there anyone else interested or at least curious about DAO (in addition to richard)?
[20:33] <tdonohue> Here's Joao's notes on his work so far. In it he says that Hibernate replaces our DatabaseManager. https://wiki.duraspace.org/display/~joaomelo/DAO+Implementation Context object is also changed to use a Hibernate Session.
[20:33] <kompewter> [ Log In - DuraSpace Wiki ] - https://wiki.duraspace.org/display/~joaomelo/DAO+Implementation
[20:34] <tdonohue> (hmm..looks like you may have to login to the wiki to get to that page.)
[20:35] <tdonohue> I'm interested in seeing DAO happen. But, I will admit I'm not likely to have any time in the near future to help. Plus this is more of a "backend" thing -- not super flashy to most folks, but still may be worthwhile cleanup in our architecture.
[20:35] <mhwood> There are some smaller pieces that could be split off and just done, as they seem minor. Thinking mainly of the "deprecated tables" stuff.
[20:36] <hpottinger> count me as interested/curious, I'd like to be able to organize our database access/storage around a "building block"
[20:36] <helix84> if anyone is curious about the Context change, that has a separate ticket: DS-1205
[20:36] <kompewter> [ https://jira.duraspace.org/browse/DS-1205 ] - [#DS-1205] DSpace org.dspace.core.Context caching problem - DuraSpace JIRA
[20:38] <mhwood> My inexperience is speaking here, but I worry that we wind up replacing a well-characterized language (SQL) with 69,000 methods that implement a subset of it.
[20:38] <tdonohue> helix84: I think that's a different Context change? Joao mentions a new "ContextV2" that he created in the DAO work that now uses Hibernate session underneath (instead of DatabaseManager class)
[20:38] <mhwood> We have some rather complex queries.
[20:38] <hpottinger> currently running 1205 in production, and the patch for 1205 is already in Master, I believe
[20:39] <helix84> tdonohue: yes, a less intrusive fix was merged in 3.x and the larger changes are part of the DAO work
[20:39] <helix84> tdonohue: I mentioned it because the ticket explains the motivation
[20:39] <bollini> mhwood: you really shouldn't worry too much. Hibernate-produced queries can be tuned as needed (you can also write raw SQL queries if you really need them)
[20:41] <tdonohue> I will mention: since DAO touches nearly everything (via Context & Hibernate)... and since we have no one seemingly able to drive this quickly forward, this DAO work seems less likely for 4.0. But, it'd be nice to keep pushing it forward little by little so that it could possibly be ready for 5.0 (in 2014).
[20:42] <tdonohue> In other words, even though I'd like DAO.... I would want to make sure it was in *early* so that it gets well tested/hardened. It seems unlikely (from this discussion) that this work will get in early for 4.0
[20:42] <hpottinger> tdonohue++
[20:42] <bollini> tdonohue: +1 we should decide early that dao/hibernate is the main change for 5.0
[20:43] <mhwood> Time does seem short.
[20:43] <tdonohue> bollini : yes, I agree. We'd want to decide early on
[20:43] <hpottinger> maybe we can find something "flashy" that has DAO as a dependency
[20:44] <tdonohue> hpottinger: one possible thing would be MySQL support. That pops up every now and then. But, it never happens cause it's too hard to do right now
[20:44] <mhwood> Replacing the RDBMS with piles of RDF, for example? It's been mentioned before....
[20:45] <helix84> hpottinger: SpringUI
[20:45] <mhwood> What *is* the connection with Spring here?
[20:45] <helix84> mhwood: no idea, but the code is out there to look at
[20:46] <tdonohue> oh, yea, and the SpringUI that Joao/Lyncode is also working on may be the "flashy" thing that interests people (I don't know much about the SpringUI work yet...though)
[20:46] <bollini> just to put another big question on the table. I'm thinking about a way to solve the need for a "plugin market" like wordpress, ojs, etc. I'm collecting some requirements such as: ability to manage plugin data and configuration, hot installation, etc. Is anyone else looking into this? what do you think about the OSGI technology?
[20:46] <bollini> mhwood: SpringUI uses the Spring MVC library, a subproject of the Spring Framework
[20:47] <mhwood> OSGI doesn't sound doable by 4.0 unless someone has been doing a lot of work on it quietly. Maybe by 6.0.
[20:47] <tdonohue> bollini: I think having a "plugin" solution is of high interest to many people. But, I don't know of anyone looking at it in great detail yet.
[20:47] <mhwood> Yes, Spring MVC I get. But what does it have to do with DAO?
[20:48] <tdonohue> I've heard OSGI can be rather difficult in some ways (but very flexible).. Beyond that, I haven't much experience with it myself yet.
[20:48] <helix84> mhwood: I'm out of my comfort zone here, but you may want to search for spring here, there are many occurrences: https://github.com/DSpace/DSpace/pull/161/files
[20:48] <hpottinger> Richard Rodgers' MDS has a plugin structure he's tinkering with, I believe
[20:48] <kompewter> [ DAO Implementation (Status: 50%) - Community help requested by lyncodev · Pull Request #161 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/161/files
[20:48] <bollini> mhwood: spring mvc works better with standard JavaBeans.... and this is a big connection point with hibernate
[20:50] <robint> Got to duck out early, cheers all
[20:50] * robint (522a6b02@gateway/web/freenode/ip.82.42.107.2) Quit (Quit: Page closed)
[20:51] <tdonohue> Any other 4.0 topics / ideas we'd like to touch on here?
[20:51] <helix84> we might want to skip REST today since Peter is not here
[20:51] <bollini> upgrade to solr 4? is someone working on it?
[20:52] <mhwood> DOI: I was in a meeting this week with librarians who want to move forward on this. Will work with Pascal-Nicolas to see if we can get this ready in time.
[20:52] <tdonohue> haven't heard of anyone working on Solr 4. Might be good to ask @mire (as they do a lot of Solr stuff), but I haven't heard of anything yet.
[20:52] <helix84> kevin did the Solr 3.3.0 -> 3.5.0 upgrade in DSpace 3
[20:52] <tdonohue> mhwood: good to know on DOI
[20:53] * PeterDietz (~peterdiet@128.146.173.70) has joined #duraspace
[20:53] <helix84> has anyone heard about the ORCID work done for Dryad? they had a speaker at ELAG and (planned) at OAI8
[20:53] <helix84> here he comes. hi Peter
[20:53] <PeterDietz> hi all. Had a meeting
[20:53] <bollini> as part of the dspace-cris work we have already moved to solr 4, so we can easily upgrade all the dspace "client" code (discovery and statistics). However, we are not comfortable with the dspace-solr upgrade
[20:53] <aschweer> helix84: I was just checking whether there's a timeline on the ORCID/Dryad wiki page. Doesn't look like it though.
[20:54] <tdonohue> https://wiki.duraspace.org/display/~ryscher/ORCID+Integration
[20:54] <kompewter> [ Log In - DuraSpace Wiki ] - https://wiki.duraspace.org/display/~ryscher/ORCID+Integration
[20:55] <helix84> oh, and we have updates to SWORD ready, thanks to Richard Jones, that need a review
[20:55] <hpottinger> I'm planning on using that new SWORD stuff soonish
[20:56] <tdonohue> PeterDietz: any updates to share on REST API? We were touching base on stuff that may or may not be ready for 4.0.
[20:56] <helix84> https://github.com/DSpace/DSpace/pull/229
[20:56] <kompewter> [ Updated SWORDv2 module by richard-jones · Pull Request #229 · DSpace/DSpace · GitHub ] - https://github.com/DSpace/DSpace/pull/229
[20:56] <tdonohue> hpottinger: cool...let us know how it all goes. Sounds like some nice SWORD updates at a glance
[20:57] <PeterDietz> REST updates, Harvard has been approaching the design through "intent", I've been approaching it by code-jamming, and seeing what you get by implementing technology
[20:57] <PeterDietz> intent, meaning.. What do you intend to use the API for, and how might you want to be able to fetch information from it..
[20:58] <PeterDietz> one example: You start by abstracting away some internally named variables, and aliasing them to other things.
[20:58] <helix84> PeterDietz: I'm still bothered by the fact that we have 2 production-grade implementations and you're working on a new experimental one. I think when the time comes, that one will be rushed into 4.0.
[20:58] <tdonohue> ok. So, is there anything needing broader feedback (even from developers/committers) yet? Just curious if we should make time to discuss anything in one of our upcoming meetings
[21:00] <PeterDietz> I do need to mention the works-in-progress on the developer list
[21:00] * fasseg (~fas@HSI-KBW-078-043-007-220.hsi4.kabel-badenwuerttemberg.de) has left #duraspace
[21:01] <bollini> I need to leave. bye
[21:01] <PeterDietz> I might be fine with an approach, of "blessing" an existing api (hedtek|wijiti), and promoting that. And then saying that from a technology standpoint, we'd prefer to mature it to another framework
[21:01] <tdonohue> PeterDietz: please do, when you get the chance. If it makes sense, we definitely can make time in one of these meetings to discuss REST "works-in-progress" if you all could use the feedback (and Harvard folks are more than welcome to join us)
[21:02] * bollini (~chatzilla@adsl-ull-225-49.47-151.net24.it) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[21:02] <PeterDietz> Since it's nobody's full-time job to develop an API, it's hard to promise that anything can be completed in time, without being able to dedicate effort to it
[21:02] <mhwood> If we can get 80% of what people want and grow it later without too much pain, I think that would be fine.
[21:03] <tdonohue> PeterDietz: That is a definite possibility. Technically, as long as we are OK we keep the same general REST 'syntax', we can swap out whatever is "underneath" without affecting things built on the API. It's just frustrating to others if we change the entire syntax of the API.
[21:03] <PeterDietz> ...but then does api v2 (think JERSEY or something), need to implement compatibility for v1 (hedtek or wijiti)
[21:03] <tdonohue> s/we are OK we/we/
[21:03] <kompewter> tdonohue meant to say: PeterDietz: That is a definite possibility. Technically, as long as we keep the same general REST 'syntax', we can swap out whatever is "underneath" without affecting things built on the API. It's just frustrating to others if we change the entire syntax of the API.
[21:04] <PeterDietz> meaning, will someone build a Drupal/CMS/Wordpress/Omeka plugin against our get-it-out-the-door API, and then when we want to upgrade the implementation, it's gotta support old-syntax
[21:04] <PeterDietz> I don't need these questions to stop progress, but if you can start from a clean slate (since I'm unaware of production client implementations of API), then try to get to where you want to get
[21:05] <helix84> PeterDietz: do you have any use for the test cases developed for Hedtek in your implementation? (we should give it a name. Jersey API?)
[21:05] <tdonohue> PeterDietz: Yes, ideally we either keep support for "old-syntax" in that scenario (v1 to v2), or we provide a clear "upgrade" path (e.g. "here's how you translate 'old-syntax' to 'new-syntax'")
[21:05] <mhwood> Good point. Would have to e.g. release new API in 5.0, deprecate old; move old out of the main code to "contributed" or "archived" add-on status in 6.0.
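The "upgrade path" tdonohue describes (translating old syntax to new syntax) could look something like the sketch below. Everything in it is hypothetical: the `/v1/...` and `/v2/...` paths are invented for illustration and do not describe the hedtek, wijiti, or any other real DSpace REST API. It only shows the technique of keeping a table of rewrite rules so that old client requests can be mapped onto a newer syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RestPathTranslator {
    // Hypothetical v1 -> v2 rewrite rules (regex -> replacement), checked in insertion order.
    private final Map<String, String> rules = new LinkedHashMap<>();

    RestPathTranslator() {
        rules.put("^/v1/items/(\\d+)\\.json$", "/v2/items/$1");
        rules.put("^/v1/collections/(\\d+)\\.json$", "/v2/collections/$1");
    }

    // Returns the translated path, or the input unchanged when no rule applies.
    String translate(String oldPath) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (oldPath.matches(rule.getKey())) {
                return oldPath.replaceAll(rule.getKey(), rule.getValue());
            }
        }
        return oldPath;
    }

    public static void main(String[] args) {
        RestPathTranslator translator = new RestPathTranslator();
        System.out.println(translator.translate("/v1/items/42.json"));
    }
}
```

A table like this could double as documentation of the old-to-new mapping, which is the alternative to keeping full v1 support that is raised in the discussion.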
[21:06] <hpottinger> Maybe what we can commit to for 4.0 is an API spec?
[21:06] <mhwood> Meanwhile I have to go. Thanks all!
[21:07] * mhwood (mwood@mhw.ulib.iupui.edu) has left #duraspace
[21:07] <tdonohue> "Nice to have for 4.0" : API Spec + Beta implementation. (That "Beta implementation" could be hedtek, wijiti or something else)
[21:07] <helix84> hpottinger: I don't think that's a good idea - unless we provide a complete implementation with it
[21:08] <helix84> hpottinger: what I'm trying to say is that you may not want spec to get ahead of implementation, because the spec will then limit you in what you may want to do, as you learn by implementing it
[21:08] <PeterDietz> Hopefully we still have "plenty of time" to be able to get something quality built. I'd hate to paint myself as the person behind the lack of a DSpace REST API. I'd just like to use this moment to aspire to the API we want
[21:10] <tdonohue> regarding a DSpace API spec: This is also something we could look at other REST APIs for pointers/ideas.... e.g. Fedora
[21:10] <helix84> PeterDietz: sorry if I sound critical about this. I appreciate your work, I really do. I just see there has already been a lot of wheel reinventing in this area, which frustrates the other teams.
[21:11] <hpottinger> I like that this API design process is coming about from a discussion between consumers and producers of the API, I may be naive, but that sounds to me like a process that will definitely end up with a great API (in other words, I agree with helix84)
[21:12] <PeterDietz> wheel reinventor.. I like that. It's probably easier to get a patent mention.
[21:12] <hpottinger> I also had the same initial reaction to the idea of making another API, but it does seem like that's where the work needed to be
[21:14] <helix84> also (I was assured about this in a recent discussion about an unrelated product), the eat your own dogfood approach works best for developing a great API. I.e. develop a fully functional interface on top of it. Yeah, we don't really need another interface in DSpace, but many people prefer to work in other languages, so this may be actually a good idea.
[21:15] <PeterDietz> ...which is what Harvard is working on. They've built some type of stub api intent-erface.. Which might not be connected to dspace-api (core), but it spits out JSON for variable components. And they've been making some JS widgets against that.
[21:16] <PeterDietz> completely un-related, but I'm hoping that the DAO work continues, because I'm intrigued by this neo4j graph database thing.
[21:16] <tdonohue> My point of view here is: If there are limitations we've already realized in HedTek or Wijiti, I'm perfectly OK with working towards a new API. But, no matter what we start with in a "v1", chances are we're gonna find problems that need to be fixed in a "v2" (that's just the nature of things). So, the real question becomes...are HedTek or Wijiti "good enough" for a "v1" API, or is there something else we can build quickly that
[21:16] <helix84> neo4j++
[21:17] <helix84> although I don't think DAO work will help us do anything with it, it's just something we may use "in addition", not "instead"
[21:17] <tdonohue> I think the only thing we want to avoid is the scenario of building a new API that also isn't "good enough" and then we have hedtek, wijiti, and "new API" all sitting around gathering dust. :)
[21:17] <PeterDietz> I've kicked the hedtek version a bit, and it works (completely?) for my initial needs
[21:18] <helix84> PeterDietz: can you reuse the test cased from it?
[21:18] <PeterDietz> I haven't played with wijiti version, but not for any good reason
[21:18] <helix84> cases
[21:18] <PeterDietz> Their test cases look like they make a dependency between JORUM's DB content, and some json files.. And they look for pattern matching
[21:20] <helix84> so, can they actually be run by us for the hedtek API? Do we have the mock data? (I haven't tried)
[21:20] * tdonohue notes that obviously our meeting has gone "over" time. Feel free to head out (as some already have) if you need to. I'm not going to call any more agenda items...but folks are welcome to continue discussions here (I'll stick around for a bit as well)
[21:21] <PeterDietz> ok, it looks like they have a bunch of DB fixtures, to help populate the sample DB
[21:23] <helix84> PeterDietz: I also wanted to ask you if you did some more work on ES statistics. It feels like 95% complete and I'd really like to use it. I do run it in production, it just has some rough edges (which I commented on).
[21:24] <PeterDietz> oh right. Yes, I've been building more things into it
[21:24] <helix84> any teasers?
[21:24] <PeterDietz> DSpace 3 uses ES 0.18.6, I've upgraded to 0.19.11, which helps to facilitate multi-ES server replication
[21:25] <helix84> one thing that is really necessary is localhost restriction, like Solr has
[21:25] <PeterDietz> basically, it helped me to migrate our ES data from our DSpace server's ES server node, to a dedicated standalone ES server
[21:25] <helix84> you mean online migration?
[21:26] <PeterDietz> i'm currently working on kicking out robots based on useragent
[21:27] <helix84> that's one of the things that could potentially be shared with Solr
[21:27] <PeterDietz> i've also built entire-site level stats
[21:27] <PeterDietz> you can see your entire repository usage
[21:27] <helix84> yay!
[21:28] <helix84> I hope we can go over my notes on ES together at OR.
[21:28] <tdonohue> One thing I've wondered about ES Statistics. Is there any need to ever "optimize" or cleanup an ES index? Just been curious, cause I know we have all those optimization options for Solr via "dspace stats-utils". Wasn't sure if we needed anything similar for ES at some point or if ES does this all differently.
[21:30] <tdonohue> RE: the useragent filtering stuff & site wide stats - Cool! I'd love to see both, and also like to see if we can find someone else to perhaps "port" them to Solr Stats as well.
[21:30] <PeterDietz> another cool feature, if you request a date-range in the report for less than a month (i.e. 29 days or less), then it shows you daily statistics
[21:30] <aschweer> I have site-wide stats for solr -- I might ask for time to contribute back. http://otago.ourarchive.ac.nz/stats
[21:30] <hpottinger> +1 for localhost restriction for ES stats, I had to defer any playing with deploying ES for our production upgrade to 3.1, but I really want to get back into that soon
[21:30] <PeterDietz> as opposed to being grouped by the month
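The rule PeterDietz describes (daily figures for ranges of 29 days or less, monthly grouping otherwise) can be written as a small helper. This is a sketch, not the actual Elasticsearch statistics code: the `StatsGranularity` class and `bucketFor` method are hypothetical names, the way the range length is counted (end date excluded) is an assumption, and it uses the modern `java.time` API rather than whatever date handling the DSpace 3 code uses.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class StatsGranularity {
    enum Bucket { DAY, MONTH }

    // Ranges of 29 days or less get daily buckets; longer ranges are grouped by month.
    // Assumption: the length is the number of days from 'from' up to (not including) 'to'.
    static Bucket bucketFor(LocalDate from, LocalDate to) {
        long days = ChronoUnit.DAYS.between(from, to);
        return days <= 29 ? Bucket.DAY : Bucket.MONTH;
    }

    public static void main(String[] args) {
        // A twelve-day range: daily buckets.
        System.out.println(bucketFor(LocalDate.of(2013, 6, 1), LocalDate.of(2013, 6, 12)));
        // A range of several months: monthly buckets.
        System.out.println(bucketFor(LocalDate.of(2013, 1, 1), LocalDate.of(2013, 6, 12)));
    }
}
```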
[21:31] <kompewter> [ OUR Archive - Usage Statistics ] - http://otago.ourarchive.ac.nz/stats
[21:31] <kstamatis__> Goodbye everyone. I need to wake up early in the morning!
[21:31] <PeterDietz> bye kstamatis__, congratulations!
[21:31] <helix84> hpottinger: why not just use iptables?
[21:31] <tdonohue> bye kstamatis__! Thanks for joining us!
[21:31] <PeterDietz> we have this blocked by firewall, nobody can touch :9200
[21:32] <PeterDietz> ..so maybe, I can think about it.
[21:32] <tdonohue> aschweer: that'd be an excellent contribution for 4.0, if you can find the time (I know a ton of folks who would love that!)
[21:32] <PeterDietz> Other ES work I was doing was to help make it so that you can move your ES OFF of your DSpace server
[21:32] <hpottinger> firewall is fine, I'm talking about provisioning so ES binds to localhost instead of the first available interface
[21:32] <kstamatis__> Really was very useful. I was always searching for what you are mentioning in here. Things start getting in a straight direction for me.
[21:33] <PeterDietz> we have tons of VMs on campus.. we just bought tons and tons of server space / memory / cpus. So instead of jam-packed VMs, we've distributed everything..
[21:33] <kstamatis__> PeterDietz: Thanks!
[21:33] <helix84> OUR - that's a nice acronym. We have a similar one, which translates loosely to something like YOUR :)
[21:33] <tdonohue> kstamatis__ : great. Definitely let us know if you have any questions :)
[21:33] * kstamatis__ (4f6ba400@gateway/web/freenode/ip.79.107.164.0) Quit (Quit: Page closed)
[21:34] <aschweer> helix84: I can't take the credit on that one, but I do like it :)
[21:35] <PeterDietz> aschweer: I like your statistics portal front page. It's lots of information that helps to understand what's (being used) in your repository.
[21:38] <helix84> btw this is another area where we have too many implementations - log files, solr-based, ES-based, Minho (which I'm fairly sure Joao will want to use as a model for his own implementation), Analytics (a lot of interest, just one actual implementation) - we might want to think of converging a bit
[21:39] <aschweer> PeterDietz: thanks :) we got there after a few iterations. I think originally they used a custom module that kshepherd based off someone else's stats
[21:42] <tdonohue> helix84: Yes, I agree on converging -- at least on what we support centrally. Ideally, we'd have one or two "out-of-the-box" stats engines...and others which are "third party plugins" (which may be supported by committers and/or other developer teams). The problem remains though that we don't have a good plugin model here.
[21:43] <tdonohue> Because we've lacked a good plugin model, we tend to encounter these scenarios where we have several implementations of the same thing (UIs, Stats engines, Browse engines, etc.) in the primary codebase. Ideally, out-of-the-box, we'd try to limit to one or two implementations -- and let everything else be plugins.
[21:43] <tdonohue> (that's just my personal opinion here)
[21:43] <helix84> basically things like extracting legacy stats from log files, bot filters could be shared to a great extent
[21:44] * l_a_p (~chatzilla@31.188.17.226) Quit (Quit: ChatZilla 0.9.90 [Firefox 16.0.2/20121024073032])
[21:44] <tdonohue> yes, at the very least, the configurations for which bots to filter should be shared
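A shared bot filter along the lines discussed here might be as simple as one list of user-agent markers consulted by every statistics engine. The sketch below is an assumption-laden illustration, not existing DSpace code: the `BotFilter` class is a hypothetical name, the marker list is an example rather than a real DSpace configuration, and plain case-insensitive substring matching is a simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BotFilter {
    private final List<String> botMarkers;

    // The marker list would come from one shared configuration source.
    BotFilter(List<String> botMarkers) {
        this.botMarkers = botMarkers;
    }

    // True when the user agent contains any configured marker (case-insensitive).
    // A missing or empty user agent is treated as "not a bot" in this sketch.
    boolean isBot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : botMarkers) {
            if (ua.contains(marker.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}

class BotFilterDemo {
    public static void main(String[] args) {
        // Example markers only; a real deployment would load its own list.
        BotFilter filter = new BotFilter(Arrays.asList("googlebot", "bingbot", "crawler", "spider"));
        System.out.println(filter.isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
    }
}
```

Because the filter only depends on the marker list, the same instance could in principle sit in front of both the Solr and the ES loggers, which is the kind of sharing suggested above.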
[21:46] <tdonohue> (Sidenote: Personally, I suspect it may be about time to think about deprecating the old "log-based" statistics. I think both Solr & ES are doing a better job and are better supported at this point...as is the Minho add-on)
[21:47] <aschweer> tdonohue: that depends on which aspect of the log-based stats you're talking about. # items added per month and the like are used by most/all of my institutions and you can't get that from Solr/ES
[21:47] <aschweer> (at least afaik)
[21:49] <helix84> apropos, workflow stats seem broken in DSpace 3: https://jira.duraspace.org/browse/DS-1573
[21:49] <kompewter> [ [#DS-1573] DSpace statistics: incorrectly parsing event names from log file entries - DuraSpace JIRA ] - https://jira.duraspace.org/browse/DS-1573
[21:49] <kompewter> [ https://jira.duraspace.org/browse/DS-1573 ] - [#DS-1573] DSpace statistics: incorrectly parsing event names from log file entries - DuraSpace JIRA
[21:49] <tdonohue> aschweer: very true. If there are parts that the log-based stats do better...then we should either keep those parts (or migrate them to Solr & ES, if it'd make sense to). I just know that there is some duplication in "log-based" and Solr/ES, and Solr/ES is more accurate/descriptive in those cases.
[21:51] <aschweer> tdonohue: yes for sure. the view / download / search stats can probably go away from the log-based stats
[21:52] <tdonohue> I think it's just confusing for new DSpace users to see multiple (often duplicative) statistics pages in "Log-based" and either Solr or ES. That's the part I'd like to avoid, if we can. :)
[21:53] <helix84> tdonohue++ just see yesterday's/today's thread by Terry Brady
[21:53] <tdonohue> So, it's a matter of either consolidating...or making it clear which statistics pages measure which "type of system statistics"
[21:53] <tdonohue> helix84: yep, I saw that thread :)
[21:53] <aschweer> yes, pulling the content stats away from the usage stats makes a lot of sense to me. separating by the nature rather than by source of data
[21:53] <aschweer> anyway, gotta go. bye everyone!
[21:54] * aschweer (~schweer@schweer.its.waikato.ac.nz) Quit (Quit: leaving)
[21:54] <helix84> honestly, I was confused myself in some aspects of the different statistics "engines" (for a lack of a better term)
[21:57] <helix84> PeterDietz: did you also add page views in addition to downloads to ES?
[22:02] * tdonohue (~tdonohue@c-67-177-111-99.hsd1.il.comcast.net) has left #duraspace
[22:02] * hpottinger (~hpottinge@mu-162198.dhcp.missouri.edu) has left #duraspace
[22:25] * helix84 (~a@ip4-95-82-147-170.cust.nbox.cz) has left #duraspace

These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.